Thank you for your article. I think it provides an incredibly intuitive perspective on these different algorithms for value learning. Nevertheless, I think there is one aspect that could be improved. Although you define the "update toward" operator in a footnote, I think the use of that operator with respect to TD learning is confusing. In particular, as defined on page 120 of Sutton and Barto, the TD(0) update is

$$V(S_t) \leftarrow V(S_t) + \alpha \left[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \right].$$
The quantity in the brackets is defined by Sutton and Barto as the TD error, and from an intuitive perspective, I think it is really what distinguishes Temporal-Difference Learning. Your notation makes explicit the target of the update, and the distinction between the targets of TD(0) and Monte Carlo is indeed essential, but I think the TD error is just as essential to understanding the algorithm.
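For concreteness, here is a minimal sketch of that update in Python with the TD error computed explicitly, rather than folded into an "update toward" operator. The function and parameter names (`td0_update`, `alpha`, `gamma`) are my own illustration, not from the article:

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """Apply one TD(0) update to a tabular value function V.

    The bracketed quantity from Sutton & Barto is computed explicitly
    as `td_error` so the structure of the update stays visible.
    """
    # TD error: delta_t = R_{t+1} + gamma * V(S_{t+1}) - V(S_t)
    td_error = r + gamma * V[s_next] - V[s]
    # Move V(S_t) a step of size alpha along the TD error
    V[s] = V[s] + alpha * td_error
    return td_error

# Toy usage: two states, a reward of 1 on the transition s0 -> s1
V = {"s0": 0.0, "s1": 0.5}
delta = td0_update(V, "s0", 1.0, "s1")
print(delta, V["s0"])  # prints the TD error and the updated estimate for s0
```

Written this way, the target $R_{t+1} + \gamma V(S_{t+1})$ and the TD error both appear directly, which is the point I am making about the notation.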