
[MLS-272] Fix Special Token Encode Difference #201

Conversation

@zxybazh zxybazh commented Feb 11, 2024

Currently in MLC Serve, we may produce input tokens that differ slightly from the expected ones due to the handling of special tokens. Take the recent CodeLlama 70B model as an example; its official documentation gives the following chat example:

chat = [
    {"role": "system", "content": "System prompt    "},
    {"role": "user", "content": "First user query"},
    {"role": "assistant", "content": "Model response to first query"},
    {"role": "user", "content": "Second user query"},
]

After applying the chat template with tokenizer.apply_chat_template(chat, tokenize=False), it becomes

'<s>Source: system\n\n System prompt <step> Source: user\n\n First user query <step> Source: assistant\n\n Model response to first query <step> Source: user\n\n Second user query <step> Source: assistant\nDestination: user\n\n '

The reference token sequence for this prompt is:

reference_tokens = [1, 7562, 29901, 1788, 13, 13, 2184, 9508, 32015, 7562, 29901, 1404, 13, 13, 3824, 1404, 2346, 32015, 7562, 29901, 20255, 13, 13, 8125, 2933, 304, 937, 2346, 32015, 7562, 29901, 1404, 13, 13, 6440, 1404, 2346, 32015, 7562, 29901, 20255, 13, 14994, 3381, 29901, 1404, 13, 13, 29871]

However, in our implementation, even though the generated prompt is identical

'<s>Source: system\n\n System prompt <step> Source: user\n\n First user query <step> Source: assistant\n\n Model response to first query <step> Source: user\n\n Second user query <step> Source: assistant\nDestination: user\n\n '

the generated token sequence has an extra token 1 (the BOS token for <s>) at the beginning:

[1, 1, 7562, 29901, 1788, 13, 13, 2184, 9508, 32015, 7562, 29901, 1404, 13, 13, 3824, 1404, 2346, 32015, 7562, 29901, 20255, 13, 13, 8125, 2933, 304, 937, 2346, 32015, 7562, 29901, 1404, 13, 13, 6440, 1404, 2346, 32015, 7562, 29901, 20255, 13, 14994, 3381, 29901, 1404, 13, 13, 29871]

The difference comes from the add_special_tokens option used during tokenizer encoding. The transformers library's apply_chat_template function defaults to add_special_tokens=False when tokenizing (the code can be found here). This PR adopts the same default and produces the same token sequence as the official reference code.
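The double-BOS effect can be illustrated with a toy encoder. This is a hypothetical sketch, not the real transformers API: toy_encode, BOS_ID, and the tiny vocabulary are invented for illustration, but real tokenizers behave analogously when the rendered template already contains a literal "<s>".

```python
BOS_ID = 1  # hypothetical BOS token id, matching <s> in the example above

def toy_encode(text, vocab, add_special_tokens=True):
    # The rendered template already contains a literal "<s>", which maps
    # to BOS_ID; add_special_tokens=True then prepends a second BOS.
    ids = [BOS_ID if tok == "<s>" else vocab[tok] for tok in text.split()]
    if add_special_tokens:
        ids = [BOS_ID] + ids
    return ids

vocab = {"Source:": 7562, "system": 1788}  # toy vocabulary
rendered = "<s> Source: system"

print(toy_encode(rendered, vocab, add_special_tokens=True))   # [1, 1, 7562, 1788]
print(toy_encode(rendered, vocab, add_special_tokens=False))  # [1, 7562, 1788]
```

With add_special_tokens=False, the only BOS comes from the "<s>" already present in the rendered prompt, matching the reference tokens.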

It also remains an open question whether we should adopt the apply_chat_template function for chat template application and tokenization, rather than using the tokenizer directly.
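The two candidate pipelines can be sketched side by side. StubTokenizer below is a hypothetical stand-in (a toy vocabulary, not a real Hugging Face tokenizer) so the comparison runs offline; on a real tokenizer the same two call patterns apply.

```python
class StubTokenizer:
    # Hypothetical minimal stand-in for a Hugging Face tokenizer, used
    # only to compare the two pipelines offline.
    bos_id = 1

    def encode(self, text, add_special_tokens=True):
        # "<s>" maps to bos_id; other whitespace-split tokens map to
        # their character length (a toy vocabulary).
        ids = [self.bos_id if t == "<s>" else len(t) for t in text.split()]
        return ([self.bos_id] + ids) if add_special_tokens else ids

    def apply_chat_template(self, chat, tokenize=False):
        text = "<s> " + " ".join(m["content"] for m in chat)
        if tokenize:
            # Mirrors transformers: tokenizes with add_special_tokens=False.
            return self.encode(text, add_special_tokens=False)
        return text

def render_then_encode(tokenizer, chat):
    # Path taken by this PR: render to text, then encode WITHOUT
    # re-adding special tokens (the template already contains "<s>").
    prompt = tokenizer.apply_chat_template(chat, tokenize=False)
    return tokenizer.encode(prompt, add_special_tokens=False)

def template_encode(tokenizer, chat):
    # Alternative raised above: let apply_chat_template tokenize directly.
    return tokenizer.apply_chat_template(chat, tokenize=True)

chat = [{"role": "user", "content": "hello"}]
tok = StubTokenizer()
print(render_then_encode(tok, chat))  # [1, 5]
print(template_encode(tok, chat))     # [1, 5] -- the two paths agree
```

Since apply_chat_template tokenizes with add_special_tokens=False internally, both paths should yield identical ids; the one-step path simply removes the opportunity to pick the wrong encode option.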

CC @sunggg

@zxybazh zxybazh changed the title [WIP] Fix Special Token Encode Difference [MLS-272] Fix Special Token Encode Difference Feb 12, 2024
@zxybazh zxybazh marked this pull request as ready for review February 12, 2024 08:21
@sunggg (Member) left a comment


LGTM, thank you, @zxybazh! One minor comment for readability.

serve/mlc_serve/engine/model_module.py (review comment, outdated, resolved)
@zxybazh (Author) commented Feb 13, 2024

Comments addressed, please take another look @sunggg

@sunggg (Member) left a comment


Thanks for the quick reflection, @zxybazh!

@sunggg sunggg merged commit 5b57390 into octoml:batch-serving Feb 13, 2024
1 check passed
sunggg added a commit that referenced this pull request Feb 13, 2024
Revert "[MLS-272] Fix Special Token Encode Difference (#201)"

This reverts commit 5b57390.
Lunderberg pushed a commit to Lunderberg/mlc-llm that referenced this pull request Feb 27, 2024