
Add SpinQuant pass #1557

Open
jambayk wants to merge 1 commit into main from jambayk/spinquant
Conversation

jambayk
Contributor

@jambayk commented Jan 17, 2025

Describe your changes

Add a new pass to do weight rotation using SpinQuant.

  • Similar to QuaRot, this pass also only performs offline weight rotation.
  • The concept is similar to QuaRot but the rotation weights are trained on a calibration dataset to improve activation quantization quality.
  • Only per_token and per_tensor activation quantization are supported. Groupwise quantization is not supported yet, since we don't expect it to be used in subsequent QDQ models.
  • Common training utils have been abstracted out of the LoRA passes.
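
As a rough illustration of what offline weight rotation means here (hypothetical shapes and tensor names, not the pass's actual API): an orthogonal matrix can be folded into a pair of adjacent weight matrices so the end-to-end computation is unchanged, while the rotated hidden activations become easier to quantize:

```python
import torch

def random_orthogonal(n: int) -> torch.Tensor:
    # QR decomposition of a random Gaussian matrix yields an orthogonal Q.
    q, _ = torch.linalg.qr(torch.randn(n, n, dtype=torch.float64))
    return q

# Hypothetical pair of adjacent linear layers (hidden size 8).
hidden = 8
w_down = torch.randn(hidden, hidden, dtype=torch.float64)  # produces hidden states
w_up = torch.randn(hidden, hidden, dtype=torch.float64)    # consumes hidden states

r = random_orthogonal(hidden)

# Offline rotation: fold R^T into the producer and R into the consumer,
# so the product W_up R R^T W_down equals the original W_up W_down.
w_down_rot = r.T @ w_down
w_up_rot = w_up @ r

x = torch.randn(hidden, dtype=torch.float64)
y_orig = w_up @ (w_down @ x)
y_rot = w_up_rot @ (w_down_rot @ x)
assert torch.allclose(y_orig, y_rot)  # end-to-end output unchanged
```

In SpinQuant, instead of being random, R is trained on a calibration dataset so the rotated activations are friendlier to quantization; the fusion itself works the same way.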

Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documents if necessary.
  • Lint and apply fixes to your code by running lintrunner -a
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.
  • Is this PR including examples changes? If yes, please remember to update example documentation in a follow-up PR.

(Optional) Issue link

@jambayk marked this pull request as draft January 17, 2025 00:31
@jambayk force-pushed the jambayk/spinquant branch 2 times, most recently from 28222ad to 7d4169d on January 22, 2025 22:56
Base automatically changed from jambayk/quarot to main January 22, 2025 23:30
@jambayk marked this pull request as ready for review January 22, 2025 23:33

# optimizer
optimizer = SGDG(
    rotation_params, lr=training_args.learning_rate, weight_decay=training_args.weight_decay, stiefel=True
)
Contributor

Is stiefel always true here?

Contributor Author

Yes, it is required to be True to do the Cayley SGD for orthogonal matrices. Without it, the optimizer behaves the same as normal SGD. Original implementation here: https://github.com/facebookresearch/SpinQuant/blob/44dbc26056ee9e319dd8ce24bfbf7203785f5c77/optimize_rotation.py#L109
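
To make the distinction concrete, here is a minimal sketch of a Cayley-transform update step (illustrative only; the real SGDG optimizer in the linked repo also handles momentum and per-group options). The key property is that the update multiplies the parameter by an orthogonal factor, so an orthogonal rotation matrix stays orthogonal, which plain SGD does not guarantee:

```python
import torch

def cayley_step(r: torch.Tensor, grad: torch.Tensor, lr: float) -> torch.Tensor:
    # Project the Euclidean gradient onto a skew-symmetric matrix A,
    # the tangent direction on the orthogonal group.
    g = grad @ r.T
    a = g - g.T
    i = torch.eye(r.shape[0], dtype=r.dtype)
    # Cayley transform of a skew-symmetric matrix is orthogonal,
    # so q @ r remains orthogonal.
    q = torch.linalg.solve(i + (lr / 2) * a, i - (lr / 2) * a)
    return q @ r

r = torch.eye(4, dtype=torch.float64)
r = cayley_step(r, torch.randn(4, 4, dtype=torch.float64), lr=0.1)
assert torch.allclose(r @ r.T, torch.eye(4, dtype=torch.float64), atol=1e-8)
```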

Contributor

Curious why the stiefel default is False, and why the optimizer also implements the case when stiefel is False?

Contributor Author

I am not sure about the reason, but yes, it implements the False case too.

I got it from https://github.com/facebookresearch/SpinQuant/blob/main/train_utils/optimizer.py#L57, which was based on the torch SGD implementation https://github.com/pytorch/pytorch/blob/main/torch/optim/sgd.py#L26.
Maybe they wanted to have both modes available in the same optimizer for completeness.

Contributor

IMO, if we don't (and won't) support the other case, we should remove that dead code.

Contributor Author

Sounds good! I will remove the False case and simplify the optimizer.

Contributor Author

I tried doing this, but the options are more intertwined than I expected. There are if/else conditions involving stiefel=True/False plus related parameters, and I cannot verify the correctness of a modified optimizer, so I decided to keep it as is. Even the original SpinQuant implementation copied the optimizer directly from the source code of the paper that introduced the algorithm.
