Location: EU Supported » PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning » European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Bled 2009 » Workshops

Solving Deterministic Policy (PO)MDPs using

author: Thomas Furmston, Department of Computer Science, University College London
published: Oct. 20, 2009, recorded: September 2009, views: 3211

Description

Solving Markov Decision Processes and their partially observable extension means finding policies that maximise the expected reward. We follow the rephrasing of this problem as learning in a related probabilistic model. Our trans-dimensional distribution formulation obtains equivalent results to previous work in the infinite horizon case and also rigorously handles the finite horizon case without discounting. In contrast to previous expositions, our framework elides auxiliary variables, simplifying the algorithm development. For any MDP there is an optimal policy that is deterministic, meaning that this important case needs to be dealt with explicitly. Whilst this case has been discussed by previous authors, their treatment has not been formally equivalent to an EM algorithm, but rather based on a fixed point iteration analogous to policy iteration. In contrast, we derive a true EM approach for this case and show that it has a significantly faster convergence rate than non-deterministic EM. Our approach extends naturally to the POMDP case as well. In the special case of deterministic environments, standard EM algorithms break down, and we show how this can be addressed using a convex combination of the original deterministic environment and a fictitious stochastic 'antifreeze' environment.
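
The convex combination mentioned at the end of the abstract can be illustrated with a minimal sketch. The abstract does not say what the fictitious stochastic environment looks like, so the uniform distribution over successor states used here is an assumption, as are the function name `antifreeze_mix`, the mixing weight `eps`, and the toy transition table.

```python
def antifreeze_mix(det_next, n_states, eps):
    """Blend a deterministic transition table with a stochastic one.

    det_next[s][a] is the single successor state of action a in state s.
    ASSUMPTION: the fictitious environment is uniform over all states;
    the abstract does not specify its form.
    Returns p[s][a][s2] = (1 - eps) * [s2 == det_next[s][a]] + eps / n_states.
    """
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    mixed = []
    for s in range(n_states):
        row = []
        for succ in det_next[s]:
            # Start from the uniform (fictitious) part, weighted by eps.
            dist = [eps / n_states] * n_states
            # Add the deterministic part, weighted by 1 - eps.
            dist[succ] += 1.0 - eps
            row.append(dist)
        mixed.append(row)
    return mixed


# Hypothetical 3-state chain with two actions: 0 = stay, 1 = move right (wrapping).
det_next = [[0, 1], [1, 2], [2, 0]]
p = antifreeze_mix(det_next, n_states=3, eps=0.3)
print(p[0][1])  # distribution over successors of state 0 under action 1
```

With `eps = 0` the original deterministic environment is recovered. For any `eps > 0` every transition has non-zero probability, which is the kind of smoothing the 'antifreeze' term is meant to suggest.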

See Also:

Download slides: ecmlpkdd09_furmston_sdpema_01.pdf (257.7 KB)
