24th Annual Conference on Learning Theory (COLT), Budapest 2011

Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments

author: Gábor Bartók, Department of Computer Science, ETH Zurich
published: Aug. 2, 2011, recorded: July 2011, views: 3795

Description

In a partial-monitoring game, the learner repeatedly chooses an action, the environment responds with an outcome, and the learner then suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The learner's goal is to minimize regret, the difference between the learner's cumulative loss and the cumulative loss of the best fixed action in hindsight. Assuming that the outcomes are generated i.i.d. from an arbitrary and unknown probability distribution, we characterize the minimax regret of any partial-monitoring game with finitely many actions and outcomes. It turns out that the minimax regret of any such game is either zero, Θ(√T), Θ(T^{2/3}), or Θ(T), where T is the number of rounds. We provide a computationally efficient learning algorithm that achieves the minimax regret to within a logarithmic factor for any game.
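The protocol and the regret definition above can be made concrete with a minimal simulation sketch. It is not the algorithm from the talk: the function name `play_game`, the toy loss and feedback matrices, and the uniform-random policy are all hypothetical choices made for illustration. The sketch assumes losses and feedback are given as matrices indexed by (action, outcome) and that outcomes are drawn i.i.d. from a fixed distribution.

```python
import random

def play_game(loss, feedback, outcome_probs, choose_action, horizon, seed=0):
    """Simulate `horizon` rounds and return the learner's regret in hindsight."""
    rng = random.Random(seed)
    n_actions = len(loss)
    outcomes = list(range(len(outcome_probs)))
    learner_loss = 0.0
    # Cumulative loss each fixed action would have suffered on the realized outcomes.
    fixed_loss = [0.0] * n_actions
    history = []  # (action, feedback) pairs: everything the learner observes
    for _ in range(horizon):
        action = choose_action(history, rng)
        # Stochastic environment: i.i.d. outcome, never shown to the learner.
        outcome = rng.choices(outcomes, weights=outcome_probs)[0]
        learner_loss += loss[action][outcome]
        for a in range(n_actions):
            fixed_loss[a] += loss[a][outcome]
        # The learner sees only the feedback symbol, not the loss or the outcome.
        history.append((action, feedback[action][outcome]))
    return learner_loss - min(fixed_loss)

# Hypothetical 2-action, 2-outcome game (illustrative values only).
toy_loss = [[0.0, 1.0], [1.0, 0.0]]
toy_feedback = [["x", "x"], ["a", "b"]]  # action 0 reveals nothing about the outcome

def uniform_policy(history, rng):
    """Placeholder learner that ignores feedback and picks an action at random."""
    return rng.randrange(2)

if __name__ == "__main__":
    print(play_game(toy_loss, toy_feedback, [0.7, 0.3], uniform_policy, horizon=1000))
```

In this toy game only action 1 yields informative feedback, so a learner has to pay for information by sometimes playing it. Trade-offs of this kind are what separate the regret classes listed in the abstract.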
