Fix hanging on prompt counts > max model context len #74

masahi · 2023-11-21T02:01:14Z

No description provided.

masahi · 2023-11-21T02:01:34Z

serve/mlc_serve/engine/async_connector.py

@@ -85,7 +85,7 @@ async def generate(self, request: Request) -> AsyncIterator[RequestOutput]:
                if output.is_finished:
                    return
        except asyncio.CancelledError:
-            asyncio.to_thread(self.engine.cancel, request.request_id)
+            await asyncio.to_thread(self.engine.cancel, request.request_id)


@jroesch The bug fix is included here
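Why the one-word change matters: `asyncio.to_thread` is a coroutine function, so calling it without `await` only creates a coroutine object. Nothing is scheduled, `engine.cancel` never runs, and Python emits a "coroutine ... was never awaited" RuntimeWarning when that object is garbage-collected. Below is a minimal sketch of the difference. It assumes a hypothetical `FakeEngine` stand-in and a simplified `generate`; neither is the real mlc_serve code.

```python
import asyncio


class FakeEngine:
    """Hypothetical stand-in for the engine; records cancelled request ids."""

    def __init__(self):
        self.cancelled = []

    def cancel(self, request_id):
        # Blocking call, offloaded to a worker thread via asyncio.to_thread.
        self.cancelled.append(request_id)


async def generate(engine, request_id, await_cancel):
    try:
        await asyncio.sleep(10)  # stands in for streaming RequestOutputs
    except asyncio.CancelledError:
        if await_cancel:
            # Fixed version: the coroutine is awaited, so cancel() actually runs.
            await asyncio.to_thread(engine.cancel, request_id)
        else:
            # Buggy version: creates a coroutine object that is never run.
            asyncio.to_thread(engine.cancel, request_id)


async def run(await_cancel):
    engine = FakeEngine()
    task = asyncio.create_task(generate(engine, "req-0", await_cancel))
    await asyncio.sleep(0)  # let the task start and block in sleep()
    task.cancel()           # simulate the client disconnecting
    await task              # generate() swallows CancelledError, so this returns
    return engine.cancelled
```

With `await_cancel=True` the engine sees the cancellation; with `False` the request is silently left behind in the engine.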

masahi · 2023-11-21T02:02:12Z

serve/mlc_serve/engine/staging_engine_worker.py

@@ -58,7 +58,7 @@ def __init__(
        self.cache_manager = model_module.cache_manager
        self.tokenizer = model_module.tokenizer
        self.model_artifact_config = model_module.model_artifact_config
-
+        self.max_context_length = self.model_artifact_config.max_context_length


@sunggg I'm assuming that max_context_length is always available in the artifact. Let me know if that is not the case.

Correct. It is safe to assume so, at least for the models we are currently interested in.
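The diff above only stores `max_context_length` on the worker; the third commit ("compare against max_context_len") uses it to stop over-long prompts from hanging. The sketch below illustrates that kind of check, assuming the hang comes from a request that can never fit staying queued forever. `Request`, `Worker`, `add`, `queue` and `errors` are hypothetical names for illustration, not the actual staging_engine_worker code.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    # Hypothetical request shape; the real mlc_serve Request differs.
    request_id: str
    prompt_token_ids: list


@dataclass
class Worker:
    max_context_length: int
    queue: list = field(default_factory=list)   # requests eligible for scheduling
    errors: dict = field(default_factory=dict)  # request_id -> error message

    def add(self, requests):
        for req in requests:
            num_tokens = len(req.prompt_token_ids)
            if num_tokens > self.max_context_length:
                # Fail fast: in this sketch such a prompt can never be scheduled,
                # so queueing it would leave the client waiting indefinitely.
                self.errors[req.request_id] = (
                    f"prompt has {num_tokens} tokens, exceeding "
                    f"max_context_length={self.max_context_length}"
                )
            else:
                self.queue.append(req)
```

Reporting the error immediately lets the caller finish the request with a failure instead of blocking on output that will never arrive.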

* fix cancelled request not awaited
* fix working
* compare against max_context_len

---------

Co-authored-by: Masahiro Masuda <[email protected]>

Masahiro Masuda added 3 commits November 20, 2023 23:57

* eee69aa: fix cancelled request not awaited
* 894dcd7: fix working
* ddf844a: compare against max_context_len


masahi merged commit e06cb16 into octoml:batch-serving Nov 21, 2023
5 checks passed

masahi added a commit that referenced this pull request Nov 21, 2023

Fix hanging on prompt counts > max model context len (#74)

21a9211

