per_sample gradient is None but grad is populated #578
Comments
Thanks for raising this issue. The reason is that Opacus computes per-sample gradients (`grad_sample`) using hooks, so it only works for standard layers. You can pass …
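To make "computes per-sample gradients using hooks" concrete, here is a toy sketch of the general idea for a single `nn.Linear` layer. It is not Opacus's implementation. A forward hook saves the layer input, and a gradient hook on the layer output combines it with the output gradient into one weight gradient per sample. The helper name `attach_per_sample_hooks` is hypothetical. Storing the result on `param.grad_sample` mirrors the attribute name Opacus uses, but everything else here is plain PyTorch.

```python
import torch
import torch.nn as nn


def attach_per_sample_hooks(layer: nn.Linear) -> None:
    """Toy hook-based per-sample gradients for nn.Linear (not Opacus's code).

    After backward, layer.weight.grad_sample has shape (batch, out, in) and
    layer.bias.grad_sample has shape (batch, out). If the layer is called more
    than once per forward pass, each call's hook overwrites the previous
    result instead of accumulating it.
    """

    def forward_hook(module, inputs, output):
        activations = inputs[0].detach()  # layer input, shape (batch, in)

        def on_output_grad(grad_output):
            g = grad_output.detach()  # dL/d(output) per sample, shape (batch, out)
            # Per-sample weight gradient is the outer product g_i a_i^T.
            module.weight.grad_sample = torch.einsum("bo,bi->boi", g, activations)
            if module.bias is not None:
                module.bias.grad_sample = g.clone()

        output.register_hook(on_output_grad)  # fires during loss.backward()

    layer.register_forward_hook(forward_hook)


torch.manual_seed(0)
layer = nn.Linear(3, 2)
attach_per_sample_hooks(layer)
x = torch.randn(4, 3)
layer(x).pow(2).sum().backward()
# Summing the per-sample gradients over the batch recovers the ordinary .grad.
print(torch.allclose(layer.weight.grad_sample.sum(0), layer.weight.grad, atol=1e-5))
```

The key consequence for this issue: the per-sample gradient only exists if the module's hooks actually run. A parameter can still receive an ordinary `.grad` through paths the hooks never see.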
Hi. I was using the …
It should also work with …
Exact same error message; no difference. I tried with both …
Hi, I have a similar error. Was this issue resolved, @anirban-nath?
@RobRomijnders, feel free to share your code here so we can help you better.
One particular LayerNorm module in my code prevents Opacus from running successfully. It is defined the same way as three or four other LayerNorm modules in my code and is used in two places. When I call `loss.backward()`, the layer's `grad` is populated but its per-sample gradient (`grad_sample`) is not, so Opacus throws the error "Per sample gradient is not initialized. Not updated in backward pass?"
Under what circumstances is this possible?
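One circumstance that produces exactly this pattern (offered as a possibility to check, not a diagnosis of this particular model) is when a module's parameters take part in the computation without the module itself being called, for example through `torch.nn.functional.layer_norm`. Autograd still accumulates `.grad` on the parameters, but anything that relies on module hooks never sees that use. The sketch compares both paths for a plain `nn.LayerNorm`; `hook_calls_and_grad` is a hypothetical helper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def hook_calls_and_grad(use_module: bool):
    """Return (number of forward-hook calls, whether norm.weight.grad is set)."""
    torch.manual_seed(0)
    norm = nn.LayerNorm(4)
    calls = []
    norm.register_forward_hook(lambda module, inputs, output: calls.append(module))
    x = torch.randn(2, 4)
    if use_module:
        y = norm(x)  # goes through nn.Module.__call__, so the hook fires
    else:
        # Reuses the module's parameters functionally: autograd still tracks
        # norm.weight and norm.bias, but the module's hooks never run.
        y = F.layer_norm(x, norm.normalized_shape, norm.weight, norm.bias, norm.eps)
    (y ** 2).sum().backward()
    return len(calls), norm.weight.grad is not None


print(hook_calls_and_grad(use_module=True))   # (1, True)
print(hook_calls_and_grad(use_module=False))  # (0, True)
```

In the second case `.grad` is set even though no hook ran, which is the same "grad populated, per-sample gradient missing" combination reported here.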
PS: This is how the norm is defined:
```python
decoder_norm = nn.LayerNorm(d_model)
self.decoder = TransformerDecoder(decoder_layer, num_decoder_layers, decoder_norm,
                                  return_intermediate=return_intermediate_dec)
```
This is how it is used; the usages are marked with comments beside them:
```python
class TransformerDecoder(nn.Module):
```
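To narrow down which parameters are affected, and whether a shared norm really runs twice per forward pass, two small plain-PyTorch helpers can be used. This is a debugging sketch: `count_forward_calls`, `params_missing_grad_sample` and the toy `ToyDecoder` (which loosely imitates a norm applied both to intermediate outputs and to the final output) are hypothetical. Reading `p.grad_sample` assumes Opacus's convention of storing per-sample gradients in that attribute; on a model without Opacus attached, every parameter with a `.grad` is reported.

```python
from collections import Counter

import torch
import torch.nn as nn


def count_forward_calls(model: nn.Module, *inputs) -> Counter:
    """Run one forward pass and count how often each named submodule is called."""
    counts = Counter()
    handles = []
    for name, module in model.named_modules():
        if name:  # skip the root module itself
            handles.append(module.register_forward_hook(
                lambda m, inp, out, name=name: counts.update([name])))
    try:
        with torch.no_grad():  # counting only, no graph needed
            model(*inputs)
    finally:
        for handle in handles:
            handle.remove()
    return counts


def params_missing_grad_sample(model: nn.Module) -> list:
    """Names of parameters whose .grad is set but that carry no grad_sample."""
    return [name for name, p in model.named_parameters()
            if p.grad is not None and getattr(p, "grad_sample", None) is None]


class ToyDecoder(nn.Module):  # hypothetical stand-in, not the model from this issue
    def __init__(self, d_model: int = 4):
        super().__init__()
        self.layer = nn.Linear(d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        intermediate = self.norm(self.layer(x))  # first use of self.norm
        return self.norm(intermediate + x)       # second use of self.norm


torch.manual_seed(0)
model = ToyDecoder()
x = torch.randn(2, 4)
print(count_forward_calls(model, x))
model(x).pow(2).sum().backward()
# Call this after loss.backward() on the wrapped model to list suspect parameters.
print(params_missing_grad_sample(model))
```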