FA3 KV Cache is slower than FA2 KV Cache #1465
Comments
Can you post a short script to benchmark the two?
Hi, thanks for the quick response! Here is the code (actually, I don't know how to install FA2 and FA3 in the same conda env):
The GPU I use is an NVIDIA H100 80GB HBM3.
I'm trying to benchmark the performance of flash-decoding.
The figure I show was produced with the following parameters:
I've also checked other settings and cannot reproduce the results of PR#1236.
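Since the thread comes down to timing two decode implementations against each other, here is a minimal timing harness in plain Python. It is a sketch, not the script mentioned in this thread, and the names `bench` and `compare` are hypothetical helpers, not part of FlashAttention. It runs untimed warm-up iterations and reports the median latency. GPU kernels launch asynchronously, so for CUDA work a synchronizing callable such as PyTorch's `torch.cuda.synchronize` should be passed as `sync`; otherwise the clock mostly measures launch overhead.

```python
import statistics
import time
from typing import Callable, Dict


def bench(fn: Callable[[], object],
          warmup: int = 10,
          iters: int = 100,
          sync: Callable[[], None] = lambda: None) -> float:
    """Return the median wall-clock latency of fn() in milliseconds.

    `sync` is called right before starting and right before stopping the
    clock. For asynchronous GPU work, pass e.g. torch.cuda.synchronize so
    that queued kernels finish inside the timed region.
    """
    for _ in range(warmup):  # untimed runs: autotuning, caches, clock ramp-up
        fn()
    times_ms = []
    for _ in range(iters):
        sync()  # drain any previously queued work before starting the clock
        start = time.perf_counter()
        fn()
        sync()  # wait for this call's work before stopping the clock
        times_ms.append((time.perf_counter() - start) * 1e3)
    return statistics.median(times_ms)


def compare(candidates: Dict[str, Callable[[], object]], **kwargs) -> Dict[str, float]:
    """Benchmark each named callable with identical settings."""
    return {name: bench(fn, **kwargs) for name, fn in candidates.items()}


if __name__ == "__main__":
    # Hypothetical stand-ins: in a real comparison these lambdas would wrap
    # the FA2 and FA3 decode calls on identical q / k_cache / v_cache inputs.
    results = compare(
        {"slow": lambda: time.sleep(0.002), "fast": lambda: time.sleep(0.0005)},
        warmup=2,
        iters=20,
    )
    for name, ms in results.items():
        print(f"{name}: {ms:.3f} ms")
```

Because `bench` only needs a callable, the same script can be run once in an FA2 environment and once in an FA3 environment (same shapes, dtype, and cache lengths) and the medians compared, which avoids installing both packages in one conda env. Synchronizing on every iteration adds a small fixed cost; for very short kernels, CUDA events (`torch.cuda.Event(enable_timing=True)`) give tighter numbers.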
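Flash-decoding targets the decode phase, where a single query token attends over a long KV cache: it splits the cached keys and values along the sequence dimension so the chunks can be processed in parallel, then merges the per-chunk results with a log-sum-exp rescaling. The plain-Python sketch below shows only that split-and-merge arithmetic for one query and one head; it is not the FlashAttention kernel, and the function names are hypothetical. It illustrates why splitting is exact: each chunk keeps its own max score `m` and normalizer `l`, and rescaling by `exp(m - m_global)` makes the merged result equal ordinary softmax attention.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def partial_attention(q: Sequence[float], keys: Sequence[Sequence[float]],
                      values: Sequence[Sequence[float]],
                      scale: float) -> Tuple[float, float, Vector]:
    """Attention of one query over one KV chunk.

    Returns (m, l, acc): the chunk's max score, the sum of exp(score - m),
    and the value rows weighted by exp(score - m), not yet normalized.
    """
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    l = sum(weights)
    acc = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))]
    return m, l, acc


def split_kv_attention(q, keys, values, num_splits: int) -> Vector:
    """Split the KV sequence into chunks, then merge the chunk results exactly."""
    scale = 1.0 / math.sqrt(len(q))
    n = len(keys)
    chunk = math.ceil(n / num_splits)
    # Each chunk is independent; a GPU kernel would run these in parallel.
    partials = [partial_attention(q, keys[i:i + chunk], values[i:i + chunk], scale)
                for i in range(0, n, chunk)]
    # Reduction: rescale every chunk to the global max before summing.
    m_global = max(m for m, _, _ in partials)
    l_global = sum(l * math.exp(m - m_global) for m, l, _ in partials)
    dim = len(values[0])
    out = [0.0] * dim
    for m, _, acc in partials:
        factor = math.exp(m - m_global)
        for d in range(dim):
            out[d] += factor * acc[d]
    return [x / l_global for x in out]


def reference_attention(q, keys, values) -> Vector:
    """Unsplit softmax(q . K^T / sqrt(d)) V for a single query."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    p = [math.exp(s - m) for s in scores]
    z = sum(p)
    return [sum(pi * v[d] for pi, v in zip(p, values)) / z for d in range(len(values[0]))]


if __name__ == "__main__":
    import random
    random.seed(0)
    d, n = 4, 37
    q = [random.gauss(0, 1) for _ in range(d)]
    K = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    V = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    a = split_kv_attention(q, K, V, num_splits=5)
    b = reference_attention(q, K, V)
    # The two results differ only by floating-point rounding.
    print(max(abs(x - y) for x, y in zip(a, b)))
```

In FA2's `flash_attn_with_kvcache`, the split count is exposed as the `num_splits` argument, where 0 lets the library pick it heuristically, so the split count is one variable worth recording alongside the other benchmark parameters.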