[Bugfix] Validate lora adapters to avoid crashing server #11727

joerunde · 2025-01-03T22:24:19Z

This PR addresses an issue where loading an invalid lora adapter will crash the engine and shut the server down. This happens because dynamically loaded adapters are first loaded just-in-time, during model execution. Invalid adapters will raise at load time, and any exceptions raised during model execution will shut down the engine and the server.

Instead, this PR updates the /v1/load_lora_adapter handler to first call engine.add_lora to ensure that the adapter can be successfully loaded. On error, the adapter is not added to the list of available models to use. This should result in a better user experience, as users will immediately know if there is a problem with their adapter when they try to load it, instead of having it crash their server later when they try to use it.
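As a rough illustration of the validate-before-register flow described above, here is a minimal, self-contained sketch. It is not the code from this PR: `FakeEngine`, `AdapterRegistry`, the simplified `add_lora(name, path)` signature, and the status codes are all hypothetical stand-ins chosen for the example.

```python
import asyncio
from dataclasses import dataclass, field


class FakeEngine:
    """Hypothetical stand-in for the engine; raises on invalid adapters."""

    async def add_lora(self, name: str, path: str) -> None:
        # Assumption for the sketch: any path containing "invalid" fails to load.
        if "invalid" in path:
            raise ValueError(f"could not load adapter from {path}")


@dataclass
class AdapterRegistry:
    """Tracks the adapters that clients are allowed to request."""

    engine: FakeEngine
    adapters: dict[str, str] = field(default_factory=dict)

    async def load_lora_adapter(self, name: str, path: str) -> tuple[int, str]:
        if name in self.adapters:
            return 400, f"adapter '{name}' is already loaded"
        try:
            # Load eagerly so a bad adapter fails here, not during model execution.
            await self.engine.add_lora(name, path)
        except Exception as exc:
            # The adapter is never registered, so later requests cannot select it.
            return 400, f"failed to load adapter '{name}': {exc}"
        self.adapters[name] = path
        return 200, f"adapter '{name}' added"


async def demo():
    registry = AdapterRegistry(FakeEngine())
    ok = await registry.load_lora_adapter("good", "/adapters/good")
    bad = await registry.load_lora_adapter("bad", "/adapters/invalid")
    return ok, bad, sorted(registry.adapters)
```

Running `asyncio.run(demo())` exercises both paths: the valid adapter is registered, while the invalid one is rejected with an error response and never appears in the registry.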

FIX #11702

Signed-off-by: Joe Runde <[email protected]>

github-actions · 2025-01-03T22:24:33Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

jeejeelee · 2025-01-04T06:56:34Z

Very useful feature! I will review this PR ASAP. Thanks!

vllm/lora/worker_manager.py

vllm/engine/multiprocessing/client.py

vllm/entrypoints/openai/serving_models.py

jeejeelee

Thank you very much for your contribution. Overall, LGTM.

joerunde added 3 commits January 2, 2025 16:30

🚧 WIP validate dynamic lora adapters

70fc214

Signed-off-by: Joe Runde <[email protected]>

♻️ Clean up mp engine integration

a8745c0

Signed-off-by: Joe Runde <[email protected]>

🐛 Implement add_lora in old AsyncLLMEngine

f6c940d

Signed-off-by: Joe Runde <[email protected]>

joerunde requested review from DarkLight1337, robertgshaw2-neuralmagic and simon-mo as code owners January 3, 2025 22:24

mergify bot added the frontend label Jan 3, 2025

jeejeelee self-requested a review January 4, 2025 06:55

haitwang-cloud reviewed Jan 6, 2025

vllm/lora/worker_manager.py (resolved)

varun-sundar-rabindranath reviewed Jan 6, 2025

vllm/engine/multiprocessing/client.py (outdated, resolved)

varun-sundar-rabindranath reviewed Jan 6, 2025

vllm/entrypoints/openai/serving_models.py (resolved)

joerunde added 3 commits January 6, 2025 15:30

🐛 add add_lora in v1.AsyncLLM

f207845

Signed-off-by: Joe Runde <[email protected]>

🔊 add logs on adapter load/unload

c0354c8

Signed-off-by: Joe Runde <[email protected]>

♻️ simplify output checks in mp client

f33158e

Signed-off-by: Joe Runde <[email protected]>

joerunde requested review from WoosukKwon, njhill, ywang96, comaniac and alexm-neuralmagic as code owners January 6, 2025 23:07

jeejeelee reviewed Jan 8, 2025

vllm/entrypoints/openai/serving_models.py (resolved)

joerunde added 5 commits January 8, 2025 13:03

♻️ load new adapter before evicting LRU

bae8f8f

Signed-off-by: Joe Runde <[email protected]>

🐛 fix lora id counter

f0e238e

Signed-off-by: Joe Runde <[email protected]>

🧪 add lora robustness test

7d4f033

Signed-off-by: Joe Runde <[email protected]>

🐛 fixup LRU eviction

9335d4d

Signed-off-by: Joe Runde <[email protected]>

🧪 stress test dynamic loras

e307210

Signed-off-by: Joe Runde <[email protected]>

joerunde force-pushed the lora-loading branch from cf12e6a to e307210 Compare January 8, 2025 22:50

🐛 crash on invalid static lora adapters

cf95295

Signed-off-by: Joe Runde <[email protected]>

jeejeelee reviewed Jan 9, 2025

vllm/lora/worker_manager.py (outdated, resolved)

mgoin added the ready label Jan 9, 2025

♻️ re-order lora lru cache logic

a9bdcdf

Signed-off-by: Joe Runde <[email protected]>

shahedy2276541 approved these changes Jan 9, 2025

jeejeelee approved these changes Jan 10, 2025

Merge branch 'main' of https://github.com/vllm-project/vllm into lora…

711ea01

…-loading

DarkLight1337 enabled auto-merge (squash) January 10, 2025 03:21

youkaichao disabled auto-merge January 10, 2025 07:56

youkaichao merged commit ac2f3f7 into vllm-project:main Jan 10, 2025
59 of 61 checks passed
