Fix calculation of remaining number of cache slots (prompt tokens not accounted) #126

Merged
2 commits merged into octoml:batch-serving on Dec 19, 2023

Conversation

masahi
Copy link
Member

@masahi masahi commented Dec 19, 2023

@sunggg @elvin-n

The hang happens when we run out of cache slots and need to evict some requests: we were not accounting for the prompt token counts when calculating the number of remaining free blocks, so we failed to detect the need for eviction when it arose.

for seq_id, tokens in self.allocated_decode_tokens.items():
    prompt_seq_id = get_prompt_sequence_id(seq_id.request_id)
    prompt_tokens = self.allocated_prompt_tokens[prompt_seq_id]
    # Count the prompt tokens together with the decode tokens so the
    # free-block calculation sees each sequence's full footprint.
    total_tokens.append(prompt_tokens + tokens)
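
For context, a minimal sketch of how per-sequence totals like these could be turned into a remaining-block count. The names `count_free_blocks`, `num_blocks`, and `block_size` are illustrative assumptions, not necessarily the repo's actual API:

import math
from typing import List

def count_free_blocks(total_tokens: List[int], num_blocks: int, block_size: int) -> int:
    # Each sequence occupies ceil(tokens / block_size) cache blocks,
    # covering both its prompt and its generated tokens.
    used_blocks = sum(math.ceil(t / block_size) for t in total_tokens)
    return num_blocks - used_blocks

If the prompt tokens are omitted from `total_tokens`, the used-block estimate comes out too low and the eviction check never fires, which is the bug this PR fixes.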
masahi (Member, Author) commented:

This code runs on a hot path, and the naive loop incurs a noticeable perf regression (5.89 -> 5.78 req / sec) for 13B. However, I suggest merging this as is and following up with a non-regressing solution later.
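
One possible non-regressing follow-up, sketched here purely as an illustration and not what was merged: keep a running per-sequence total so the hot path no longer joins prompt and decode counts in a Python-level loop. All names below (`TokenAccounting` and its methods) are hypothetical.

class TokenAccounting:
    def __init__(self):
        self.allocated_prompt_tokens = {}  # prompt_seq_id -> prompt token count
        self.allocated_total_tokens = {}   # seq_id -> prompt + decode token count

    def add_prompt(self, prompt_seq_id, num_tokens):
        self.allocated_prompt_tokens[prompt_seq_id] = num_tokens

    def start_decode(self, seq_id, prompt_seq_id):
        # Seed the per-sequence total with the prompt length once, up front.
        self.allocated_total_tokens[seq_id] = self.allocated_prompt_tokens[prompt_seq_id]

    def append_decode_token(self, seq_id):
        # O(1) update per generated token instead of a full re-scan on the hot path.
        self.allocated_total_tokens[seq_id] += 1

    def total_tokens(self):
        # The eviction check can read the combined totals directly.
        return list(self.allocated_total_tokens.values())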

sunggg (Member) left a comment:

Confirmed that this resolves the hang reported by @elvin-n. Thank you @masahi for the hot fix and @elvin-n for spotting the danger!

sunggg merged commit f32375a into octoml:batch-serving on Dec 19, 2023. 1 check passed.