[FlashInfer] Upgrade to 0.2.0 #11194
Conversation
Looking forward to the update!
    sm_scale: float

def infer_global_hyperparameters(model: nn.Module) -> GlobalHyperparameters:
This function can collect all per-layer parameters, and only assert that the results are the same. See the sketch below.
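A minimal sketch of that suggestion, assuming the function receives an already-collected mapping of layer name to parameters. The PerLayerParameters fields mirror the values quoted elsewhere in this review; everything else (names, signature) is hypothetical, not the code in this PR.

    from dataclasses import dataclass
    from typing import Dict, Optional

    @dataclass
    class PerLayerParameters:
        # Hypothetical container for the per-layer values discussed in this PR.
        window_left: int
        logits_soft_cap: Optional[float]
        sm_scale: float

    def infer_global_hyperparameters(
            per_layer_params: Dict[str, PerLayerParameters]) -> PerLayerParameters:
        # Collect the parameters of every attention layer and assert that they
        # all agree, since the FlashInfer wrappers are configured once for the
        # whole model rather than per layer.
        assert per_layer_params, "No attention layers were found"
        global_params = next(iter(per_layer_params.values()))
        for layer_name, layer_params in per_layer_params.items():
            assert layer_params == global_params, (
                f"Layer {layer_name} uses {layer_params}, which differs from "
                f"{global_params}; the FlashInfer backend assumes all layers "
                f"share the same hyperparameters.")
        return global_params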
@@ -495,6 +597,8 @@ def __init__(self, input_builder: "ModelInputForGPUBuilder"):
        self.sliding_window = input_builder.sliding_window
        self.block_size = input_builder.block_size
You can remember the vllm_config here by calling get_current_vllm_config(), for example as sketched below.
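For illustration, a sketch of what that could look like, assuming get_current_vllm_config is exported from vllm.config and keeping only the builder attributes visible in the diff; this is not the code in the PR.

    from vllm.config import get_current_vllm_config  # assumed import path

    class FlashInferMetadataBuilder:

        def __init__(self, input_builder: "ModelInputForGPUBuilder"):
            self.sliding_window = input_builder.sliding_window
            self.block_size = input_builder.block_size
            # Remember the vLLM config at construction time so later code does
            # not have to rely on set_current_vllm_config() being active.
            self.vllm_config = get_current_vllm_config()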
        # - `window_left`
        # - `logits_soft_cap`
        # - `sm_scale`
        model = self.runner.model
vllm_config.compilation_config.static_forward_context is a dict mapping layer prefix to attention layer. You can collect the sliding window, etc. from there; there is no need to iterate over the model's submodules.
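A sketch of collecting the values that way, reusing the PerLayerParameters dataclass sketched earlier. The impl attribute names (sliding_window, logits_soft_cap, scale) are assumptions about what the attention layer exposes, not confirmed by this PR.

    from typing import Dict

    from vllm.config import VllmConfig

    def get_per_layer_parameters(
            vllm_config: VllmConfig) -> Dict[str, "PerLayerParameters"]:
        # static_forward_context maps layer prefix -> attention layer, so the
        # hyperparameters can be read directly instead of walking submodules.
        layers = vllm_config.compilation_config.static_forward_context
        per_layer_params: Dict[str, "PerLayerParameters"] = {}
        for prefix, layer in layers.items():
            impl = layer.impl  # attribute names below are assumptions
            window_size = getattr(impl, "sliding_window", None)
            per_layer_params[prefix] = PerLayerParameters(
                window_left=window_size[0] if window_size is not None else -1,
                logits_soft_cap=getattr(impl, "logits_soft_cap", None),
                sm_scale=impl.scale,
            )
        return per_layer_params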
@@ -178,6 +179,9 @@ def __init__(self, runner):
        self._decode_wrapper = None
        self._prefill_wrapper = None

        # Global hyperparameters shared by all attention layers
        self.global_hyperparameters: Optional[PerLayerParameters] = None
Remember the vllm_config here?
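One way to fold that suggestion into this constructor; only the attributes visible in the diff are kept, the import path is an assumption, and PerLayerParameters refers to the earlier sketch.

    from typing import Optional

    from vllm.config import get_current_vllm_config  # assumed import path

    class FlashInferState:

        def __init__(self, runner):
            self.runner = runner
            self._decode_wrapper = None
            self._prefill_wrapper = None
            # Remember the vLLM config so the global hyperparameters can later
            # be collected from compilation_config.static_forward_context.
            self.vllm_config = get_current_vllm_config()
            # Global hyperparameters shared by all attention layers
            self.global_hyperparameters: Optional["PerLayerParameters"] = None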
vllm/worker/model_runner.py
with set_current_vllm_config(self.vllm_config):
    # To make vLLM config available during worker initialization
    attn_metadata = (self.attn_state.
                     graph_capture_get_metadata_for_batch(
                         batch_size,
                         is_encoder_decoder_model=self.
                         model_config.is_encoder_decoder,
                     ))
Then we don't need this change. You also need to update this line to pass CI: Line 200 in f0ef372
LGTM, thanks for the contribution!
Dockerfile
# RUN --mount=type=cache,target=/root/.cache/pip \
#     . /etc/environment && \
#     if [ "$TARGETPLATFORM" != "linux/arm64" ]; then \
#         python3 -m pip install https://github.com/flashinfer-ai/flashinfer/releases/download/v0.2.0.post1/flashinfer-0.2.0.post1+cu121torch2.4-cp${PYTHON_VERSION_STR}-cp${PYTHON_VERSION_STR}-linux_x86_64.whl; \
#     fi
delete prior to landing?
It was commented out in bb44221. We might need it again soon, so I'm just leaving it commented out for now.
This PR upgrades the FlashInfer attention backend to v0.2.0.