Setting grad to None after training to avoid memory leak #1
Hi,
during training I've observed a memory leak when training several models one after another. I used your example code from the README to construct a reproduction, see below. I'm using Python 3.7 and PyTorch 1.13.1. Per training run I observe an increase of about 10 MiB in memory usage. This is especially an issue when training larger models, where it eventually results in an out-of-memory error.
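Since the original snippet is not reproduced here, the following is only an illustrative sketch of how such a leak can be observed, not the original example. The `train_once` helper is a hypothetical stand-in for one training run; `backward(create_graph=True)` mirrors how hypergradient methods typically backpropagate and is what makes the leak visible, because each `param.grad` then keeps the autograd graph (and with it the model) alive:

```python
import psutil
import torch
import torch.nn as nn

def train_once():
    # Hypothetical stand-in for one training run; the real reproduction
    # used the library's README example.
    model = nn.Linear(100, 10)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(100):
        loss = model(torch.randn(32, 100)).pow(2).mean()
        opt.zero_grad()
        # create_graph=True keeps the graph reachable from param.grad,
        # the situation described in pytorch/pytorch#82528
        loss.backward(create_graph=True)
        opt.step()
    # Nothing sets param.grad = None here, so the graphs built during
    # this run are never released.

# Print the process's resident set size after each run; across runs
# the memory usage keeps growing instead of returning to a baseline.
process = psutil.Process()
for run in range(10):
    train_once()
    print(f"run {run}: RSS = {process.memory_info().rss / 2**20:.1f} MiB")
```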
The memory leak is discussed in pytorch/pytorch#82528. In your code there is already a `begin` method in the `Optimizable` class which sets `param.grad` to `None`. However, it seems `param.grad` also needs to be set to `None` at the end of training a model to avoid leaking memory across several training runs. Hence, I suggest adding an `end` method which is called at the end of each training run.