Autonomous Exploration in Reinforcement Learning

author: Peter Auer, Chair for Information Technology, Montanuniversität Leoben
published: Jan. 25, 2012, recorded: December 2011, views: 4399
Description

One of the striking differences between current reinforcement-learning algorithms and early human learning is that animals and infants appear to explore their environments with autonomous purpose, in a manner appropriate to their current level of skill. To analyse such autonomous exploration theoretically, an evaluation criterion is needed for comparing exploration algorithms; unfortunately, no commonly agreed criterion has been established yet. As one possible criterion, we consider in this work the navigation skill of a learning agent after a given number of exploration steps. In particular, we ask how many exploration steps are required until the agent has learned reliable policies for reaching all states within a certain distance of a start state. (Related but more general objectives are also of interest.)
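To make the evaluation criterion concrete: for a given distance L, the target set is the collection of states reachable from the start within L steps. The sketch below (an illustration, not code from the talk) computes that set by breadth-first search, assuming a deterministic, tabular MDP with a hypothetical `transitions[s][a]` lookup:

```python
from collections import deque

def states_within(transitions, start, L):
    """States reachable from `start` in at most L steps.

    `transitions[s][a]` gives the successor of state s under action a
    (a deterministic MDP, assumed here only to keep the sketch short).
    The exploration objective of the talk can then be phrased as:
    learn a reliable policy for reaching every state in this set.
    """
    dist = {start: 0}          # shortest distance from the start state
    queue = deque([start])
    while queue:
        s = queue.popleft()
        if dist[s] == L:       # do not expand beyond the horizon
            continue
        for nxt in transitions[s].values():
            if nxt not in dist:
                dist[nxt] = dist[s] + 1
                queue.append(nxt)
    return set(dist)
```

In an infinite (but discrete) MDP this set is finite for every L, which is what lets the navigation skill be measured horizon by horizon.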

While this learning problem can be addressed in a straightforward manner for finite MDPs, it becomes much more interesting for potentially infinite (but discrete) MDPs. There we can analyse how the agent's navigation skill extends to ever more distant states as the exploration time increases. We show that an optimistic exploration strategy learns reliable policies after a number of exploration steps that is linear in the number of reachable states and in the number of actions. The number of reachable states is not known to the algorithm in advance; the algorithm adapts to it.
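To convey the flavour of such a strategy (this is an illustrative sketch, not the algorithm analysed in the talk), the code below explores a deterministic, discrete MDP optimistically: it always plans, over the transitions observed so far, a shortest path to the nearest state within the horizon that still has an untried action. A one-step reset action back to the start state is assumed, as is common in autonomous-exploration models; names such as `optimistic_explore` are hypothetical:

```python
from collections import deque

def optimistic_explore(transitions, start, actions, horizon):
    """Optimistically explore a deterministic MDP (illustrative sketch).

    `transitions[s][a] -> next state` is unknown to the agent; it only
    learns a transition by trying the action.  The agent repeatedly
    walks to the nearest "frontier" state (within `horizon` of the
    start, with an untried action) and tries one new action there.
    Returns (exploration steps taken, observed transition table).
    """
    known = {}           # state -> {action: observed next state}
    dist = {start: 0}    # shortest observed distance from start
    state, steps = start, 0

    def plan(cur):
        # BFS over *observed* transitions to the nearest frontier state.
        prev = {cur: None}
        queue = deque([cur])
        while queue:
            s = queue.popleft()
            if dist[s] <= horizon and len(known.get(s, {})) < len(actions):
                path, node = [], s           # reconstruct action path
                while prev[node] is not None:
                    node, a = prev[node]
                    path.append(a)
                return path[::-1]
            for a, t in known.get(s, {}).items():
                if t not in prev:
                    prev[t] = (s, a)
                    queue.append(t)
        return None                          # no frontier reachable

    while True:
        path = plan(state)
        if path is None:
            if state == start:
                return steps, known          # horizon fully learned
            state, steps = start, steps + 1  # assumed reset action
            continue
        for a in path:                       # walk to the frontier
            state = known[state][a]
            steps += 1
        a = next(b for b in actions if b not in known.get(state, {}))
        nxt = transitions[state][a]          # try one new action
        known.setdefault(state, {})[a] = nxt
        if nxt not in dist or dist[state] + 1 < dist[nxt]:
            dist[nxt] = dist[state] + 1
        state = nxt
        steps += 1
```

On a six-state chain with two actions and horizon 3, this agent fully learns every state within distance 3 of the start while only observing, not further exploring, the more distant states. The total step count in such examples scales with the number of reachable states times the number of actions, matching the flavour of the linear bound stated above.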


Download slides: nipsworkshops2011_auer_learning_01.pdf (523.0 KB)



Reviews and comments:

Stavros, March 11, 2012 at 1:37 p.m.:

Awesome video for unsupervised learning with the baby :)
