
[Question]: Token indices sequence length is longer than the specified maximum sequence length for this model (614 > 512). Running this sequence through the model will result in indexing errors #165

Open · lifengyu2005 opened this issue Jun 19, 2024 · 2 comments
Labels: question (Further information is requested)

@lifengyu2005
Describe the issue

I use the following configuration; why is it throwing an error? I see many 512 settings in the llmlingua installation path. Do I need to retrain the model, or is this an issue with the llmlingua version?

self.model_compress = PromptCompressor(
    model_name="/xxx/llmlingua/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,  # Whether to use llmlingua-2
    llmlingua2_config={
        "max_batch_size": 100,
        "max_force_token": 4096,
    },
)

llmlingua version 0.2.2

lifengyu2005 added the question label on Jun 19, 2024
iofu728 (Contributor) commented Jun 20, 2024

Hi @lifengyu2005, thanks for your support. These logs appear to be warnings. Did your program crash because of these warnings? Please provide more details to help us identify the issue.

cornzz commented Sep 11, 2024

@lifengyu2005 This warning comes from the tokenizer, not from the model itself; you can reproduce it as shown below.
The model used in LLMLingua-2 can only handle inputs of up to 512 tokens, so the compressor divides the prompt into 512-token chunks and compresses each chunk separately. Even when you see this warning, everything is working as intended.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/llmlingua-2-xlm-roberta-large-meetingbank")
tokens = tokenizer.encode("Loooong prompt...")
# Warning logged by the tokenizer:
# Token indices sequence length is longer than the specified maximum sequence length for this model (1500 > 512). Running this sequence through the model will result in indexing errors
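The chunking behaviour described above can be sketched as follows. This is only an illustration of the idea, not llmlingua's actual implementation; `chunk_token_ids` is a hypothetical helper:

```python
def chunk_token_ids(ids, max_len=512):
    """Split a list of token ids into chunks of at most max_len tokens,
    so each chunk fits within the model's 512-token limit."""
    return [ids[i:i + max_len] for i in range(0, len(ids), max_len)]

# A 614-token sequence (the length from the warning in this issue)
# is split into one full 512-token chunk plus a 102-token remainder.
chunks = chunk_token_ids(list(range(614)))
print([len(c) for c in chunks])  # -> [512, 102]
```

Each chunk is then passed through the model separately, which is why the tokenizer's length warning is harmless here.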
