Switch dynamic FP8 grouped gemm to accept tensor inputs #3552

Open
wants to merge 1 commit into
base: main

jwfromm (Contributor) commented Jan 6, 2025

Summary:
As we continue the long march towards optimal MOE performance, we've identified that in prefill, having to artificially split the inputs and then check that each group is valid introduces non-trivial overhead. Since all the inputs must be in consecutive memory anyway, it's better to require contiguous tensors rather than TensorLists.
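To make the difference between the two input styles concrete, here is a minimal sketch in plain Python. It is not the FBGEMM API and ignores FP8 quantization entirely. The function names and the `rows_per_group` argument are hypothetical, and this PR description does not say how the real kernels are told where each group begins. The point is only that the list style validates every pre-split group, while the contiguous style validates once and walks the input by offset.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain (M, K) x (K, N) matrix multiply on nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def grouped_gemm_list(xs: Sequence[Matrix], ws: Sequence[Matrix]) -> List[Matrix]:
    """List-style input: the caller pre-splits x, and every group is checked."""
    out = []
    for x, w in zip(xs, ws):
        # Per-group validation; this work grows with the number of groups.
        if any(len(row) != len(w) for row in x):
            raise ValueError("group shape mismatch")
        out.append(matmul(x, w))
    return out


def grouped_gemm_contiguous(
    x: Matrix, ws: Sequence[Matrix], rows_per_group: Sequence[int]
) -> Matrix:
    """Contiguous input: one check up front, then groups are found by offset.

    `rows_per_group` is an assumed way to describe group boundaries.
    """
    if len(rows_per_group) != len(ws) or sum(rows_per_group) != len(x):
        raise ValueError("group sizes do not cover the input")
    out: Matrix = []
    start = 0
    for rows, w in zip(rows_per_group, ws):
        # Slicing stands in for the pointer offset a real kernel would compute.
        out.extend(matmul(x[start:start + rows], w))
        start += rows
    return out
```

Both functions compute the same products. The contiguous version simply moves the bookkeeping from the caller (splitting) and the callee (per-group checks) into a single size check plus offset arithmetic.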

While this change may sound simple, it required switching the kernels to a templated implementation. Although it is a large refactor, templating is the most elegant way to support various input and output types for shared kernels.

I also removed some of the now-outdated fbgemm profiling scripts. They likely aren't useful going forward anyway.

Differential Revision: D67881909


netlify bot commented Jan 6, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit cfced91
🔍 Latest deploy log https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/677c6c901175b400089d8413
😎 Deploy Preview https://deploy-preview-3552--pytorch-fbgemm-docs.netlify.app

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D67881909
