Using dynamic growth & pruning? #20
Comments
Yes, theoretically you can call the pruning function at each mini-batch iteration. If you look at the code, it is currently only called at the end of each epoch; you just need to move this function into the training loop to achieve pruning/growth at each step. Another option that is already baked in, but comes with predefined static behavior, is the prune_every_k_steps variable: setting it to 1 executes the prune/regrowth cycle on every mini-batch.
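To make the first option concrete, here is a minimal, self-contained sketch of calling a prune/regrowth cycle once per mini-batch. The prune_and_regrow function below is a hand-rolled stand-in (magnitude pruning plus random regrowth), not the library's actual method, and the model, densities, and data are made up for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def prune_and_regrow(model, masks, prune_rate=0.2):
    # Stand-in for the library's prune/regrowth function: drop the
    # smallest-magnitude active weights, regrow the same number at random.
    for name, weight in model.named_parameters():
        if name not in masks:
            continue
        mask = masks[name]
        n_prune = int(prune_rate * mask.sum().item())
        if n_prune == 0:
            continue
        # Prune: zero out the n_prune smallest-magnitude active weights.
        active = mask.view(-1) > 0
        threshold = torch.kthvalue(weight.abs().view(-1)[active], n_prune).values
        mask[(weight.abs() <= threshold) & (mask > 0)] = 0.0
        # Regrow: reactivate the same number of inactive weights at random.
        inactive_idx = (mask.view(-1) == 0).nonzero(as_tuple=True)[0]
        regrow = inactive_idx[torch.randperm(len(inactive_idx))[:n_prune]]
        mask.view(-1)[regrow] = 1.0
        weight.mul_(mask)  # keep pruned weights at zero

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
masks = {name: (torch.rand_like(p) < 0.2).float()  # ~20% initial density
         for name, p in model.named_parameters() if p.dim() > 1}
with torch.no_grad():
    for name, p in model.named_parameters():
        if name in masks:
            p.mul_(masks[name])  # start from the sparse initialisation

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(256, 20), torch.randint(0, 2, (256,))
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(x, y), batch_size=32)

for epoch in range(2):
    for data, target in loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(data), target)
        loss.backward()
        optimizer.step()
        # Prune/growth every mini-batch instead of once per epoch
        # (the same cadence as setting prune_every_k_steps = 1).
        prune_and_regrow(model, masks, prune_rate=0.2)
```

The second option mentioned above (setting prune_every_k_steps to 1) should give the same cadence without touching the training loop yourself.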
Thank you for your reply! I'm having a bit of trouble understanding the workings and objective of the calc_growth_redistribution method in the code.
Thanks for your comment. The method determines the redistribution of weights. The problem is what to do if weights are redistributed to layers that are already full (and cannot regrow weights), or if more weights are regrown than a layer can fit. I could keep track of these cases, but I found it easier and more general to anneal the redistribution over time (up to 1000 iterations). The residual is the overflow from layers that are too full, and it is redistributed for up to 1000 iterations. In some cases it is not easy to redistribute the weights and the annealing procedure does not converge within 1000 iterations; in that case, the best solution found after 1000 iterations is taken, but it might not be exactly proportional to the metric used to determine the redistribution fractions. I hope this makes it a bit clearer. It is definitely a confusing function, and I see that I forgot to clean up some artifacts, as you have pointed out. Let me know if you have more questions.
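If it helps, the annealing idea can be summarised with a small sketch (hypothetical and simplified, not the actual calc_growth_redistribution code): allocate regrowth proportionally to the redistribution fractions, clamp each layer at its capacity, and feed the overflow (the residual) back into the next iteration, for at most 1000 iterations.

```python
def allocate_regrowth(fractions, capacities, total_regrowth, max_iters=1000):
    """Sketch of residual-annealed redistribution.

    fractions:  per-layer redistribution fractions (summing roughly to 1)
    capacities: how many weights each layer can still regrow
    """
    alloc = {name: 0.0 for name in fractions}
    residual = float(total_regrowth)
    for _ in range(max_iters):
        open_layers = [n for n in fractions if alloc[n] < capacities[n]]
        if residual < 1.0 or not open_layers:
            break
        total_frac = sum(fractions[n] for n in open_layers)
        overflow = 0.0
        for n in open_layers:
            share = residual * fractions[n] / total_frac
            space = capacities[n] - alloc[n]
            alloc[n] += min(share, space)
            overflow += max(share - space, 0.0)  # the part that did not fit
        residual = overflow  # redistribute the residual in the next iteration
    # If this never converges, the allocation after max_iters is used even though
    # it may not be exactly proportional to the redistribution fractions.
    return alloc

# Example: a small layer overflows, so its share spills over to the other layers.
print(allocate_regrowth({"conv1": 0.5, "conv2": 0.3, "fc": 0.2},
                        {"conv1": 10, "conv2": 500, "fc": 50},
                        total_regrowth=100))
```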
Hi Tim, Thanks again for your reply! I have a few more questions :)
Below is a comparison of the existing snippet vs RigL's implementation. Since there is no check on layer capacity, the actual sparsity ends up lower than the intended one. In the output below, the intended density was 0.2 (i.e. 80% sparsity).
Here's the source to produce this output. Adding a threshold (something like
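For reference, the capacity check in RigL's ERK initialisation works roughly like the sketch below (a simplification in my own words and variable names, not the reference code): layers whose allocated density would exceed 1.0 are frozen as fully dense and the scaling factor epsilon is recomputed for the rest, so the overall density still hits the target.

```python
import numpy as np

def erk_densities(shapes, density, erk_power_scale=1.0):
    """Per-layer densities for ERK initialisation with a capacity check (sketch).

    shapes:  dict of layer name -> weight shape
    density: intended overall density (e.g. 0.2 for 80% sparsity)
    """
    dense_layers = set()
    while True:
        divisor, rhs = 0.0, 0.0
        raw_prob = {}
        for name, shape in shapes.items():
            n_param = np.prod(shape)
            if name in dense_layers:
                # A fully dense layer eats into the budget of the others.
                rhs -= n_param * (1.0 - density)
            else:
                rhs += n_param * density
                # ERK score: sum of dimensions relative to number of parameters.
                raw_prob[name] = (np.sum(shape) / np.prod(shape)) ** erk_power_scale
                divisor += raw_prob[name] * n_param
        epsilon = rhs / divisor
        # Capacity check: if the largest allocation exceeds a density of 1.0,
        # freeze that layer as dense and recompute epsilon for the rest.
        max_prob = max(raw_prob.values())
        if max_prob * epsilon > 1.0:
            dense_layers.update(n for n, p in raw_prob.items() if p == max_prob)
        else:
            break
    return {name: 1.0 if name in dense_layers else epsilon * raw_prob[name]
            for name in shapes}

# Example: at an intended overall density of 0.2, the small layers come out dense
# and the large conv layer absorbs the remaining sparsity, so the total still matches.
shapes = {"conv1": (64, 3, 3, 3), "conv2": (128, 64, 3, 3), "fc": (10, 128)}
print(erk_densities(shapes, density=0.2))
```

Without that check, over-allocated layers are effectively clipped at fully dense without compensating elsewhere, which is why the overall sparsity comes out lower than intended.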
Great catch! Would you mind submitting a pull request for this? I feel like you would be able to quickly pinpoint and fix this issue.
Sure, I would be happy to contribute. Would you prefer adding RigL's implementation of ERK for this? It does better than trying to tune
We used @TimDettmers's sparselearning as the base for our RigL-reproducibility code. Our code has deviated significantly since then, but I could patch in the ERK initialisation change if it's still welcome.
Hi, is it possible to use dynamic growth and pruning currently by just updating the masks each step?
I'm looking to implement something like RigL (Evci et al. 2020).
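For context, the per-step update in RigL itself boils down to: drop the lowest-magnitude active weights and regrow the same number of inactive weights with the largest gradient magnitude. A rough sketch for a single weight tensor (names and details are mine, simplified from the paper; `grad` is the dense gradient of the loss with respect to `weight`, including masked entries):

```python
import torch

@torch.no_grad()
def rigl_mask_update(weight, grad, mask, drop_fraction=0.3):
    """One RigL-style drop/grow step for a single weight tensor (sketch)."""
    n_drop = int(drop_fraction * mask.sum().item())
    n_grow = min(n_drop, int((mask == 0).sum().item()))
    if n_drop == 0 or n_grow == 0:
        return mask
    new_mask = mask.clone().view(-1)
    # Drop: among active weights, remove the n_drop smallest magnitudes.
    drop_score = torch.where(mask.bool(), weight.abs(),
                             torch.full_like(weight, float("inf")))
    drop_idx = torch.topk(drop_score.view(-1), n_drop, largest=False).indices
    new_mask[drop_idx] = 0.0
    # Grow: among weights inactive before the drop, activate the n_grow
    # largest gradient magnitudes.
    grow_score = torch.where(mask.bool(), torch.full_like(grad, float("-inf")),
                             grad.abs())
    grow_idx = torch.topk(grow_score.view(-1), n_grow, largest=True).indices
    new_mask[grow_idx] = 1.0
    # Newly grown connections start from zero, as in the paper.
    weight.view(-1)[grow_idx] = 0.0
    return new_mask.view_as(mask)
```

In practice this would run for every sparse layer at some interval of steps, with the new mask also applied to the optimizer's momentum buffers and the drop fraction annealed over training (RigL uses a cosine decay).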