Batched Bandit Problems
author: Philippe Rigollet,
Department of Mathematics, Massachusetts Institute of Technology, MIT
published: Aug. 20, 2015, recorded: July 2015, views: 1707
published: Aug. 20, 2015, recorded: July 2015, views: 1707
Slides
Related content
Report a problem or upload files
If you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Description
Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic multi-armed bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches gives already close to minimax optimal regret bounds and we also evaluate the number of trials in each batch. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
Link this page
Would you like to put a link to this lecture on your homepage?Go ahead! Copy the HTML snippet !
Write your own review or comment: