The optimistic principle for online planning in Markov decision processes
author: Rémi Munos,
SequeL lab, INRIA Lille - Nord Europe
published: May 28, 2013, recorded: September 2012, views: 2632
Description
Given an initial state, what is the best possible action that can be returned by a planning algorithm that is given a finite numerical budget (e.g. a number of calls to a model of the state-transition and reward functions)? We investigate optimistic strategies and provide regret bounds in terms of a new measure of the complexity of the planning problem.
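To make the setting in the description concrete, here is a minimal sketch of one optimistic strategy, under assumptions that the description does not state: transitions are deterministic, rewards lie in [0, 1], and future rewards are discounted by a factor `gamma` < 1. It is not necessarily the algorithm presented in the lecture. The names `optimistic_plan` and `toy_model` are hypothetical and chosen only for illustration. The planner repeatedly expands the leaf whose upper bound on achievable return is largest, stops when the budget of model calls is exhausted, and returns the first action of the best path found so far.

```python
import heapq
from itertools import count


def optimistic_plan(state, actions, model, gamma, budget):
    """Return a first action using at most `budget` calls to `model`.

    Assumptions (not from the source): `model(state, action)` returns a
    deterministic `(next_state, reward)` pair, rewards are in [0, 1], and
    0 <= gamma < 1. Returns None if the budget cannot cover one expansion.
    """
    tie = count()  # tie-breaker so the heap never compares states
    # Heap entries: (-upper_bound, tie, discounted_sum, depth, state, first_action)
    frontier = [(-1.0 / (1.0 - gamma), next(tie), 0.0, 0, state, None)]
    best_sum, best_action = -1.0, None
    calls = 0
    while frontier and calls + len(actions) <= budget:
        # Optimism: expand the leaf with the largest upper bound.
        _, _, path_sum, depth, s, first = heapq.heappop(frontier)
        for a in actions:
            next_state, reward = model(s, a)
            calls += 1
            child_sum = path_sum + gamma ** depth * reward
            child_first = a if first is None else first
            if child_sum > best_sum:
                best_sum, best_action = child_sum, child_first
            # Every unseen reward is at most 1, so the tail is bounded by
            # gamma^(depth + 1) / (1 - gamma).
            bound = child_sum + gamma ** (depth + 1) / (1.0 - gamma)
            heapq.heappush(
                frontier,
                (-bound, next(tie), child_sum, depth + 1, next_state, child_first),
            )
    return best_action


def toy_model(state, action):
    """Hypothetical model: 'a' pays 0.5 once, 'b' leads to reward 1 forever."""
    if state == "start":
        return ("dead", 0.5) if action == "a" else ("good", 0.0)
    if state == "good":
        return ("good", 1.0)
    return ("dead", 0.0)


print(optimistic_plan("start", ("a", "b"), toy_model, 0.9, budget=2))  # a
print(optimistic_plan("start", ("a", "b"), toy_model, 0.9, budget=6))  # b
```

In this toy problem a budget of 2 calls only sees the immediate rewards and returns the myopic action `a`, while a budget of 6 calls is enough to discover that `b` leads to a better discounted return. The quality of the returned action as a function of the budget is the kind of quantity a regret bound describes.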