The Difference in Temporal Difference Learning #39

Open

nikolay-apanasov opened this issue Dec 27, 2024 · 0 comments
Thank you for your article. I think it provides an incredibly intuitive perspective on these different algorithms for value learning. Nevertheless, there is one aspect I think could be improved. Although you define the "update toward" operator in a footnote, I find its use with respect to TD learning confusing. In particular, as defined on page 120 of Sutton and Barto, the TD(0) update is

    V(S_t) \leftarrow V(S_t) + \alpha [ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) ]

The quantity in the brackets is what Sutton and Barto define as the TD error, and from an intuitive perspective, I think it really explains the difference in Temporal-Difference Learning. Your notation makes the target of the update explicit, and the distinction between the targets of TD(0) and Monte Carlo is indeed essential, but I think the TD error is just as essential to understanding the algorithm.
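To illustrate what I mean, here is a minimal sketch of a tabular TD(0) update in Python (the names `td0_update`, `alpha`, `gamma`, and the toy two-state example are mine, not from your article or from Sutton and Barto): the TD error is the quantity that actually drives the update.

    # Minimal sketch of a tabular TD(0) update (hypothetical names, not from the article).
    # V is a dict mapping states to value estimates.

    def td0_update(V, s, reward, s_next, alpha=0.1, gamma=0.99):
        """Apply one TD(0) update for the observed transition (s, reward, s_next)."""
        # TD error: how far the current estimate V[s] is from the bootstrapped target
        # R_{t+1} + gamma * V(S_{t+1}).
        td_error = reward + gamma * V[s_next] - V[s]
        # Move V[s] a fraction alpha of the way toward the target.
        V[s] += alpha * td_error
        return td_error

    # Example: a single update on a two-state toy problem.
    V = {"A": 0.0, "B": 0.0}
    delta = td0_update(V, s="A", reward=1.0, s_next="B")
    print(V["A"], delta)  # 0.1 1.0 (with alpha=0.1, gamma=0.99, and V["B"]=0)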
