Rework garage.torch.optimizers #2177
base: master
This change does not yet pass tests, but is 90% complete.
Can you add a little bit more explanation for the design here? I'm concerned about using an ADT as the blanket input to policies, which makes the interface pretty complicated even in the simplest use cases.
The core motivation here is to provide a way for recurrent and non-recurrent policies to share the same API at optimization time. This PR only adds the bare minimum fields needed for recurrent policies to have reasonable
cnn_output = self._cnn_module(observations)
mlp_output = self._mlp_module(cnn_output)[0]
logits = torch.softmax(mlp_output, axis=1)
dist = torch.probability.Categorical(logits=logits)
Should this be torch.distributions.Categorical?
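For reference, a minimal sketch of how the distribution could be built with torch.distributions.Categorical. The tensor below is a hypothetical stand-in for mlp_output, assumed to hold unnormalized scores of shape (batch, n_actions); it is not taken from the PR. Categorical normalizes whatever is passed as logits=, so a softmax output passed through logits= would be normalized a second time. The sketch shows the two consistent alternatives.

```python
import torch
from torch.distributions import Categorical

# Hypothetical stand-in for mlp_output: unnormalized scores,
# assumed shape (batch, n_actions).
mlp_output = torch.tensor([[1.0, 2.0, 3.0],
                           [0.0, 0.0, 0.0]])

# Option 1: pass the raw scores as logits; Categorical normalizes them.
dist_from_logits = Categorical(logits=mlp_output)

# Option 2: normalize explicitly and pass the result as probs.
probs = torch.softmax(mlp_output, dim=1)
dist_from_probs = Categorical(probs=probs)

# Both constructions describe the same distribution.
actions = dist_from_logits.sample()             # shape: (batch,)
log_probs = dist_from_logits.log_prob(actions)  # shape: (batch,)
```

Either option yields the same probabilities; which one fits better here depends on what the MLP module is meant to output.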
089c20f to 017274f
017274f to df3a137
WIP torch optimizer refactor