Switch dynamic FP8 grouped gemm to accept tensor inputs #3552
Summary:
As we continue the long march towards optimal MoE performance, we've identified that in prefill, having to artificially split inputs and then check that each group is valid introduces non-trivial overhead. Since all the inputs must be in consecutive memory anyway, it's better to require them to be contiguous tensors rather than TensorLists. A rough sketch of what that interface change looks like is shown below.
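The following sketch illustrates the shape of the change using hypothetical op names, argument names, and tensor layouts (the actual FBGEMM signatures may differ): per-group TensorLists are replaced by single contiguous tensors plus per-group sizes, so the caller no longer splits inputs and the op no longer validates each group on the hot path.

```cpp
// Hypothetical interface sketch; names and layouts are illustrative,
// not the actual FBGEMM op signatures.
#include <ATen/ATen.h>

// Before: one tensor per group, which forces the caller to split inputs
// and the op to check every group for validity.
at::Tensor f8f8bf16_grouped_list(
    at::TensorList XQ,        // G tensors of FP8 activations
    at::TensorList WQ,        // G tensors of FP8 weights
    at::TensorList x_scale,   // G tensors of rowwise activation scales
    at::TensorList w_scale);  // G tensors of rowwise weight scales

// After: single contiguous tensors; group boundaries come from sizes.
at::Tensor f8f8bf16_grouped_dynamic(
    at::Tensor XQ,       // [total_M, K] stacked FP8 activations
    at::Tensor WQ,       // [G, N, K] FP8 weights
    at::Tensor x_scale,  // [total_M] rowwise activation scales
    at::Tensor w_scale,  // [G, N] rowwise weight scales
    at::Tensor m_sizes); // [G] rows per group
```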
While this change may sound simple, it required switching the kernels to a templated implementation. Although that makes it a large refactor, templating is the cleanest way to support various input and output types with shared kernels.
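To make the templating idea concrete, here is a minimal sketch assuming a naive row-per-block grouped GEMM: one kernel body parameterized over the element types, with a thin host-side dispatcher picking the instantiation. The kernel and dispatcher names, the rowwise-scale layout, and the `group_start` offset convention are all illustrative assumptions, not the actual FBGEMM kernels.

```cuda
// Minimal sketch of a shared, type-templated grouped GEMM kernel.
// All names and layouts here are hypothetical.
#include <cstdint>
#include <cuda_fp8.h>
#include <cuda_fp16.h>
#include <cuda_bf16.h>

// One shared kernel body, parameterized over element types so the same
// code serves FP8 inputs with either BF16 or FP16 outputs.
template <typename InputType, typename OutputType>
__global__ void grouped_gemm_naive_kernel(
    const InputType* XQ,         // [total_M, K] activations, groups stacked
    const InputType* WQ,         // [G, N, K] weights
    const float* x_scale,        // [total_M] rowwise activation scales
    const float* w_scale,        // [G * N] rowwise weight scales
    OutputType* out,             // [total_M, N] outputs
    const int64_t* group_start,  // [G + 1] row offsets into XQ/out
    int64_t K,
    int64_t N) {
  const int g = blockIdx.y;                        // group index
  const int64_t row0 = group_start[g];
  const int64_t rows = group_start[g + 1] - row0;  // rows in this group
  const int64_t m = blockIdx.x;                    // row within the group
  if (m >= rows) {
    return;  // grid is sized for the largest group; skip padding rows
  }
  for (int64_t n = threadIdx.x; n < N; n += blockDim.x) {
    float acc = 0.0f;
    const InputType* x_row = XQ + (row0 + m) * K;
    const InputType* w_row = WQ + (static_cast<int64_t>(g) * N + n) * K;
    for (int64_t k = 0; k < K; ++k) {
      acc += static_cast<float>(x_row[k]) * static_cast<float>(w_row[k]);
    }
    acc *= x_scale[row0 + m] * w_scale[static_cast<int64_t>(g) * N + n];
    out[(row0 + m) * N + n] = static_cast<OutputType>(acc);
  }
}

// Thin dispatcher: the output dtype selects the instantiation; everything
// else is shared between the two cases.
void launch_grouped_gemm(
    const __nv_fp8_e4m3* XQ,
    const __nv_fp8_e4m3* WQ,
    const float* x_scale,
    const float* w_scale,
    void* out,
    const int64_t* group_start,
    int num_groups,
    int64_t max_rows_per_group,
    int64_t K,
    int64_t N,
    bool bf16_out,
    cudaStream_t stream) {
  const dim3 grid(static_cast<unsigned>(max_rows_per_group), num_groups);
  const dim3 block(128);
  if (bf16_out) {
    grouped_gemm_naive_kernel<__nv_fp8_e4m3, __nv_bfloat16>
        <<<grid, block, 0, stream>>>(
            XQ, WQ, x_scale, w_scale,
            static_cast<__nv_bfloat16*>(out), group_start, K, N);
  } else {
    grouped_gemm_naive_kernel<__nv_fp8_e4m3, __half>
        <<<grid, block, 0, stream>>>(
            XQ, WQ, x_scale, w_scale,
            static_cast<__half*>(out), group_start, K, N);
  }
}
```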
I also removed some of the now-outdated fbgemm profiling scripts. They likely aren't useful going forward anyway.
Differential Revision: D67881909