[MLS-272] Fix Special Token Encode Difference #201
Merged
Currently in MLC Serve, we may encounter input tokens that differ slightly from the expected ones due to the handling of special tokens. Take the recent CodeLlama 70B model as an example: the official document gives a chat example, and after applying the tokenizer's chat template with

`tokenizer.apply_chat_template(chat, tokenize=False)`

it becomes a single rendered prompt string, and the official reference code then encodes that prompt into a reference token sequence.
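For reference, a minimal sketch of that flow (the model id is the CodeLlama 70B Instruct checkpoint on the Hugging Face Hub; the chat messages below are stand-ins, not the exact example from the official document):

```python
from transformers import AutoTokenizer

# Stand-in chat, not the exact example from the official document.
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-70b-Instruct-hf")
chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a function that computes fib(n)."},
]

# Render the chat template to a plain prompt string (no tokenization yet).
prompt = tokenizer.apply_chat_template(chat, tokenize=False)
print(prompt)
```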
However, in our implementation, even though the generated prompt is the same, the encoded token sequence has an extra token `1` (the BOS token) at the beginning. The difference comes from the `add_special_tokens` option used when doing tokenizer encoding: the `transformers` lib's `apply_chat_template` function uses `add_special_tokens=False` by default when tokenizing (code can be found here). This PR uses the same option default and gets the same generated tokens as the official reference code.
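A small sketch of the difference, assuming the `tokenizer` and `prompt` from the snippet above (the exact token ids depend on the prompt):

```python
# With the default add_special_tokens=True, the Llama tokenizer prepends BOS (id 1).
with_special = tokenizer.encode(prompt)
# With add_special_tokens=False, the encoding matches the official reference tokens.
without_special = tokenizer.encode(prompt, add_special_tokens=False)

print(with_special[:5])     # begins with an extra 1 (the BOS token <s>)
print(without_special[:5])  # begins directly with the prompt tokens
```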
It also remains a question whether we should adopt the `apply_chat_template` function as a potential way to do both chat template application and tokenization using the tokenizer directly (a rough sketch of that usage is at the end of this description).

CC @sunggg
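A rough sketch of that alternative, assuming the same `tokenizer` and `chat` as above:

```python
# Let apply_chat_template render the template and tokenize in one call.
# Per the default described above, this encodes with add_special_tokens=False.
token_ids = tokenizer.apply_chat_template(chat, tokenize=True)
print(token_ids)  # a list of token ids
```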