-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SpinQuant pass #1557
base: main
Are you sure you want to change the base?
Add SpinQuant pass #1557
Conversation
5e5658e
to
2da4605
Compare
360e955
to
827cd4f
Compare
2da4605
to
2d14fa9
Compare
827cd4f
to
6c2cfe2
Compare
28222ad
to
7d4169d
Compare
7d4169d
to
706371d
Compare
|
||
# optimizer | ||
optimizer = SGDG( | ||
rotation_params, lr=training_args.learning_rate, weight_decay=training_args.weight_decay, stiefel=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is stiefel always true here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it is required to be True to do the cayley sgd for orthagonal matrices. without it, it behaves the same as normal sgd. original implementation here https://github.com/facebookresearch/SpinQuant/blob/44dbc26056ee9e319dd8ce24bfbf7203785f5c77/optimize_rotation.py#L109
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why the stiefel default is False, and it also implements the case when stiefel is False?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about the reason but yes, it implements the False case too.
I got it from https://github.com/facebookresearch/SpinQuant/blob/main/train_utils/optimizer.py#L57 which was in based on the torch sgd implementation https://github.com/pytorch/pytorch/blob/main/torch/optim/sgd.py#L26.
Maybe they wanted to have both modes be available in the same optimizer for completeness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, if we don't (and won't) support the other case, we should remove those dead codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good! I will remove the False case and simplify the optimizer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried doing this, but the options are more tied than I expected. There are if/else conditions that involve stiefel=true/else + related parameters. I cannot verify the correctness of a modified optimizer so decided to keep it as is. Even the original implementation of spinquant copied the optimizer directly from the source code of the paper that introduced the algorithm.
85eac59
to
e0a7806
Compare
Describe your changes
Add a new pass do weight rotation using SpinQuant.
Checklist before requesting a review
lintrunner -a
(Optional) Issue link