The Difference in Temporal Difference Learning #39

Open

nikolay-apanasov opened this issue Dec 27, 2024 · 0 comments
Thank you for your article. I think it provides an incredibly intuitive perspective on these different algorithms for value learning. Nevertheless, there is one aspect I think could be improved. Although you define the "update toward" operator in a footnote, I find its use with respect to TD learning confusing. In particular, as defined on page 120 of Sutton and Barto, the TD(0) update is

    V(S_t) \leftarrow V(S_t) + \alpha [ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) ]

The quantity in the brackets is what Sutton and Barto define as the TD error, and from an intuitive perspective, I think it really explains the difference in Temporal-Difference Learning. Your notation makes the target of the update explicit, and the distinction between the targets of TD(0) and Monte Carlo is indeed essential, but I think the TD error is just as essential to understanding the algorithm.
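To illustrate what I mean, here is a minimal sketch of a tabular TD(0) update in Python (the names `td0_update`, `alpha`, `gamma`, and the toy two-state example are mine, not from your article or from Sutton and Barto): the TD error is the quantity that actually drives the update.

    # Minimal sketch of a tabular TD(0) update (hypothetical names, not from the article).
    # V is a dict mapping states to value estimates.

    def td0_update(V, s, reward, s_next, alpha=0.1, gamma=0.99):
        """Apply one TD(0) update for the observed transition (s, reward, s_next)."""
        # TD error: how far the current estimate V[s] is from the bootstrapped target
        # R_{t+1} + gamma * V(S_{t+1}).
        td_error = reward + gamma * V[s_next] - V[s]
        # Move V[s] a fraction alpha of the way toward the target.
        V[s] += alpha * td_error
        return td_error

    # Example: a single update on a two-state toy problem.
    V = {"A": 0.0, "B": 0.0}
    delta = td0_update(V, s="A", reward=1.0, s_next="B")
    print(V["A"], delta)  # 0.1 1.0 (with alpha=0.1, gamma=0.99, and V["B"]=0)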
