
Understanding Gradient Inversion #37

Open
richielo opened this issue Mar 27, 2019 · 6 comments

@richielo

Sorry if this appears to be a stupid question. I am trying to implement gradient inversion in PyTorch based on the paper, but I would like to ask for some clarification. Is the inversion done on all the layers, or just on the last layer? If it's the former, we would have to keep the output of each layer.

Thanks a lot for your help in advance

@wbwatkinson

The Inverted Gradients technique is applied to the back-propagation gradients output by the critic before they are applied to the actor. The relevant code is in dqn.cpp, lines 922-965 (starting near the line DLOG(INFO) << " [Backwards] " << critic_net_->name();).

I'm implementing a similar algorithm in Keras. I'll be interested in seeing your PyTorch implementation when you have it up and running.
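
For reference, here is a minimal PyTorch-style sketch of the inverting-gradients rule described in the paper. This is only an illustration under the assumption that dq_da holds the critic's gradient dQ/da with respect to the actor's output and that each action parameter is bounded to [p_min, p_max]; the function name and the surrounding DDPG-style update are illustrative, not code from this repository.

```python
import torch

def invert_gradients(dq_da, actions, p_min, p_max):
    """Scale dQ/da so each action parameter is pushed back toward [p_min, p_max].

    dq_da   -- gradient of Q with respect to the actor's output (same shape as actions)
    actions -- the actor's output values
    p_min, p_max -- per-dimension bounds (floats or broadcastable tensors)
    """
    width = p_max - p_min
    # If the gradient would increase a parameter, scale it by the remaining
    # headroom to the upper bound; otherwise scale it by the distance to the
    # lower bound. The scale turns negative once a bound is exceeded, which
    # inverts the gradient and pulls the parameter back into range.
    upscale = (p_max - actions) / width
    downscale = (actions - p_min) / width
    return torch.where(dq_da > 0, dq_da * upscale, dq_da * downscale)

# Illustrative placement in a DDPG-style actor update (actor, critic, states,
# p_min, p_max and actor_optimizer are assumed to exist elsewhere):
#   actions = actor(states)
#   q = critic(states, actions)
#   dq_da = torch.autograd.grad(q.sum(), actions)[0]
#   dq_da = invert_gradients(dq_da, actions.detach(), p_min, p_max)
#   actor_optimizer.zero_grad()
#   actions.backward(-dq_da)   # minus sign because optimizers minimize and we ascend Q
#   actor_optimizer.step()
```

The key point is that the inversion is applied only to dQ/da, before it is back-propagated through the actor's layers, so nothing needs to be done layer by layer.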

@liyuyuc

liyuyuc commented May 31, 2019

Hi, can you share your implementation? If not, could you share your experiment results?

@wbwatkinson

This code is very raw, and there are some problems with the learning, which is slow and unstable. Specifically, the 4 discrete action values (used to probabilistically select which of the four actions will be executed) eventually all move close to 1.0 and fluctuate slightly, making it too easy for the agent to select the wrong action.

The code is available in this repository:
https://github.com/wbwatkinson/ddpg-hfo-python

You should be looking at lines 488-507 (https://github.com/wbwatkinson/ddpg-hfo-python/blob/6989b849eb9b90e03fbecaf49463a11505ab92bf/src/ddpg.py#L488). I'm a bit new to Python, so no guarantees that this is pythonic. Also, as mentioned, I have at least one error somewhere in the code, but I don't think it is in the inverting gradients algorithm. I welcome any feedback you or anyone else has.

@wbwatkinson

I had an error in the code... corrected now (https://github.com/wbwatkinson/ddpg-hfo-python). Unless there are other questions about the inverting gradients algorithm, I recommend closing this.

@liyuyuc

liyuyuc commented Jun 6, 2019

Hi, can you tell me which error you corrected?

@wbwatkinson

I think it would be best to discuss the specifics of the Python code in the other repository. That said, I made two changes that stabilized learning.

  1. Correction to the inverting gradients algorithm (I had a sign error in the calculation):
     wbwatkinson/ddpg-hfo-python@357170d#r33838064
  2. Added gradient clipping (a minimal sketch follows below)
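
For the second change, here is a minimal, generic PyTorch sketch of gradient clipping; the network, loss, and max_norm value are illustrative placeholders, not values taken from the repository.

```python
import torch
import torch.nn as nn

# Hypothetical tiny actor network, just to show where the clipping call goes.
actor = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)

states = torch.randn(8, 4)
loss = -actor(states).mean()  # placeholder for the actual actor loss

optimizer.zero_grad()
loss.backward()
# Clip the global gradient norm before the parameter update; max_norm=1.0 is illustrative.
torch.nn.utils.clip_grad_norm_(actor.parameters(), max_norm=1.0)
optimizer.step()
```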
