[Param] Recheck and update repetition penalty parameter #202
Conversation
vvchernov commented Feb 12, 2024 (edited):
- Recheck that the HF TGI API and the vLLM approach to repetition penalty are the same
- Correct the repetition penalty calculation so that prompt tokens are also taken into account (see the sketch below)
- Clean up code: rename sampling_metadata -> sampling_state everywhere
cc @sunggg
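For context, a sketch of the penalty rule being rechecked here: both HF TGI and vLLM divide positive logits by the penalty and multiply negative ones, for every token already seen. Having the mask cover prompt tokens as well as generated ones is the fix this PR makes. Names below are illustrative, not this repo's API:

```python
import torch

def apply_repetition_penalty(
    logits: torch.Tensor,     # (batch, vocab) raw scores
    seen_mask: torch.Tensor,  # (batch, vocab) bool, True where the token occurred
    penalty: float,           # > 1.0 discourages repetition
) -> torch.Tensor:
    # Divide positive logits by the penalty and multiply negative ones,
    # so a seen token always becomes less likely when penalty > 1.
    penalized = torch.where(logits > 0, logits / penalty, logits * penalty)
    return torch.where(seen_mask, penalized, logits)
```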
sunggg commented:
Thank you for the quick action, @vvchernov! A couple of questions.
In the sampler module:

```diff
@@ -372,20 +382,39 @@ def from_sampling_params(
     )


-def adjust_logits(logits, sampling_metadata, vocab_size):
+def get_bin_counts_and_mask(
```
sunggg commented:
Same comment as above. Would it make sense to move this into SamplingState prep and perform it on the CPU? If the performance impact is not bad, it seems better to prepare it there, since we are looking for a way to make the prep process more asynchronous.
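For context on the helper under discussion: vLLM's function of the same name builds per-request token occurrence counts with a scatter-add and derives a presence mask. A sketch along those lines, not necessarily this PR's exact code:

```python
import torch

def get_bin_counts_and_mask(
    tokens: torch.Tensor,  # (batch, max_len) token ids, padded with vocab_size
    vocab_size: int,
    batch_size: int,
):
    # One extra bin absorbs the padding id so it never pollutes real counts.
    bin_counts = torch.zeros(
        (batch_size, vocab_size + 1), dtype=torch.long, device=tokens.device
    )
    bin_counts.scatter_add_(1, tokens, torch.ones_like(tokens))
    bin_counts = bin_counts[:, :vocab_size]
    mask = bin_counts > 0  # True where a token appears at least once
    return bin_counts, mask
```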
binarybana commented:
Valery talked about this in the issue; repasting some of it here:

> In my view, the right place is SamplingState, but it is recalculated for every new request. That does not look good, since in general it can be computed once, but redoing this would take significant time. Maybe SamplingParams in Request should be replaced by SamplingState, with some sharing for the parameter n > 1. What do you think?
>
> For now I plan to store it in SamplingParams as a quick fix, but we need a better place.
sunggg commented:
Ah, thanks @binarybana, this is a great point. @vvchernov, can we follow up on this? We can separate what does and does not need to be updated each iteration, as you raised.
vvchernov commented:
Oh, thanks @binarybana for sharing my ideas! I restated them in other words above.

> can we follow up on this? We can separate what does and does not need to be updated each iteration, as you raised.

First of all, this looks like a separate task; if supporting a correct repetition penalty is the high priority, it is better to do that first. I see the larger task as a redesign of the API: it should be done carefully, and we need more detailed discussion. I can prepare a separate draft PR with some rough code that implements my idea, and we can discuss there how it should look. I am also still mindful of logits_processor. Right now it looks like it was added for JSON mode, but in the general view it does the same thing as the sampler. It would be better to rethink that as part of the redesign too.
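A rough illustration of the split being proposed: per-request data built once versus state refreshed each decode step. Class and field names here are hypothetical, not the repo's actual API:

```python
from dataclasses import dataclass, field
from typing import List

import torch

@dataclass
class RequestSamplingData:
    """Hypothetical: built once when the request arrives."""
    repetition_penalty: float
    prompt_mask: torch.Tensor  # (vocab,) bool, precomputed from the prompt

@dataclass
class StepSamplingState:
    """Hypothetical: refreshed every decode iteration."""
    output_token_ids: List[int] = field(default_factory=list)

    def seen_mask(self, request: RequestSamplingData, vocab_size: int) -> torch.Tensor:
        # Only the output side changes per step; the prompt mask is reused.
        out_mask = torch.zeros(vocab_size, dtype=torch.bool)
        if self.output_token_ids:
            out_mask[torch.tensor(self.output_token_ids)] = True
        return request.prompt_mask | out_mask
```

With a split like this, only the output-token mask needs recomputing per iteration, which is the separation raised above.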
@vvchernov, thanks for the quick action on this one. Can you also measure the performance impact with and without this enabled? Ideally for sequences of ~2k input tokens and 50 output tokens under fairly loaded conditions (5-15 VUs), to match customer workloads.
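Before a full load test, a micro-benchmark of just the penalty arithmetic can give a first signal. A hedged sketch assuming CUDA and illustrative shapes; the 5-15 VU scenario above would still need a proper load-testing harness:

```python
import time

import torch

batch, vocab = 16, 32000
logits = torch.randn(batch, vocab, device="cuda")
# Pretend roughly 2k distinct vocab entries were seen per request (~6%).
seen = torch.rand(batch, vocab, device="cuda") < 0.06

torch.cuda.synchronize()
start = time.perf_counter()
for _ in range(1000):
    out = torch.where(logits > 0, logits / 1.1, logits * 1.1)
    out = torch.where(seen, out, logits)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start
print(f"{elapsed / 1000 * 1e6:.1f} us per call")
```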
In test_sampler.py:

```diff
     expected = torch.clone(logits)
     for batch_size in range(batch_size):
         logit_bias = list_logit_bias[batch_size]
         for idx, val in logit_bias.items():
             expected[batch_size][idx - 1] += val
-    new_logits = adjust_logits(logits, sampling_metadata, vocab_size)
+    new_logits = adjust_logits(logits, sampling_state, vocab_size)
     assert torch.allclose(expected, new_logits)


 def _test_penalties_checker():
```
sunggg commented:
Can we also add a test case?
vvchernov commented:
My colleague is extending the tests for sampling parameters, including repetition_penalty. See #200.
sunggg commented:
It is okay for it to be very basic, so please add one. We have not been doing this so far, but I'd like to recommend that every PR include unit tests that validate at least the basic functionality. With the SLM migration, we will set up CI as well.
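A very basic test of the kind requested could assert just the direction of the effect. A sketch with the penalty rule inlined; a real test would call the repo's adjust_logits with a SamplingState instead:

```python
import torch

def test_repetition_penalty_lowers_seen_token_logits():
    vocab_size = 8
    logits = torch.full((1, vocab_size), 2.0)
    seen = torch.zeros(1, vocab_size, dtype=torch.bool)
    seen[0, 3] = True  # pretend token 3 appeared in the prompt or output

    penalty = 1.5  # inlined penalty rule for illustration
    out = torch.where(logits > 0, logits / penalty, logits * penalty)
    out = torch.where(seen, out, logits)

    assert out[0, 3] < logits[0, 3]                # the seen token is penalized
    assert torch.equal(out[0, :3], logits[0, :3])  # unseen tokens are untouched
```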
Force-pushed from 4c554a0 to f439b97.
Also in test_sampler.py:

```diff
         sampling_params,
         list_past_output_tokens=past_output_tokens,
         dtype=dtype,
         dev=dev,
         vocab_size=vocab_size,
     )
     torch.cuda.current_stream().wait_stream(_copy_stream)
-    return sampling_metadata
+    return sampling_state


 def _test_temperature(temp=0, batch_size=1):
```
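For readers unfamiliar with the wait_stream pattern in the hunk above: tensors are prepared on a side stream so host-to-device copies can overlap with compute, and the main stream then waits on it. A generic sketch of the idiom, not this repo's code:

```python
import torch

_copy_stream = torch.cuda.Stream()

def copy_to_gpu_async(host_tensor: torch.Tensor) -> torch.Tensor:
    # host_tensor should live in pinned memory for the copy to be truly async.
    with torch.cuda.stream(_copy_stream):
        dev_tensor = host_tensor.to("cuda", non_blocking=True)
    # Work queued on the current stream after this point waits for the copy.
    torch.cuda.current_stream().wait_stream(_copy_stream)
    return dev_tensor
```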
vvchernov commented:
@sunggg, why do we have the _ prefix on these tests? I added that prefix to functions that cannot be run with pytest, but the functions in test_sampler.py do not require a compiled model and can be called under pytest. If there is no special reason, I propose removing the underscore.
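For reference, pytest's default collection pattern only picks up functions whose names start with test_, which is why the underscore prefix hides these. A minimal illustration:

```python
# test_naming_demo.py
def _test_hidden():    # NOT collected: doesn't match pytest's default "test_*" pattern
    raise AssertionError("pytest never runs this")

def test_collected():  # collected and run by a plain `pytest` invocation
    assert True
```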
vvchernov commented:
We can ask Iliya to do it in his PR (#200), since he is extending these tests now.
sunggg commented:
No specific reason; I just followed the other test cases. I'm okay with removing it for pytest.

Based on the offline discussion, we decided to merge this given the urgency. @vvchernov, let's add the test case in #200.