Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation

author: Richard S. Sutton, Department of Computing Science, University of Alberta
published: Sept. 17, 2009,   recorded: June 2009,   views: 7406
Download slides: icml09_sutton_fgdm_01.pdf (1.9 MB)



Description

Sutton, Szepesvári and Maei (2009) recently introduced the first temporal-difference learning algorithm compatible with both linear function approximation and off-policy training, and whose complexity scales only linearly in the size of the function approximator. Although their gradient temporal difference (GTD) algorithm converges reliably, it can be very slow compared to conventional linear TD (on on-policy problems where TD is convergent), calling into question its practical utility. In this paper we introduce two new related algorithms with better convergence rates. The first algorithm, GTD2, is derived and proved convergent just as GTD was, but uses a different objective function and converges significantly faster (but still not as fast as conventional TD). The second new algorithm, linear TD with gradient correction, or TDC, uses the same update rule as conventional TD except for an additional term which is initially zero. In our experiments on small test problems and in a Computer Go application with a million features, the learning rate of this algorithm was comparable to that of conventional TD. This algorithm appears to extend linear TD to off-policy learning with no penalty in performance while only doubling computational requirements.


Reviews and comments:

Comment 1, GreatShoot, March 24, 2013 at 9:58 p.m.:

He has a very nice red shirt, as we saw right at the start. I'm still wondering why the cameraman stays focused on the lecturer the whole time, as if this were a political speech with no need to look at any of the complex formulas or diagrams.
