I was evaluating how well (Long)LLMLingua achieves the requested compression rate (focusing on the rate parameter, not target_tokens) and came to these conclusions:
- For smaller prompts (< 150 tokens), barely any compression can be achieved, if any at all
- The requested compression rate is best achieved for prompts of around 2000 tokens
- For longer prompts (> 5000 tokens), the requested rate is overshot (or undershot)
More detailed results are below.
My question is: am I doing something wrong when invoking LLMLingua, or is this behaviour normal?
I adhered to the usage examples in README.md:
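The calls looked roughly like this (a minimal sketch following the README example; the prompt and rate values are placeholders, not the exact ones from the runs below):

```python
from llmlingua import PromptCompressor

prompt = "..."  # one of the truncated GovReport prompts

# Default model is Llama 2 7B; model_name="gpt2" was used for the GPT-2 runs.
llm_lingua = PromptCompressor()

# Request a compression rate via `rate` (not target_token).
# For the LongLLMLingua runs the README additionally suggests rank_method="longllmlingua"
# and related arguments.
result = llm_lingua.compress_prompt(prompt, rate=0.5)

print(result["origin_tokens"], result["compressed_tokens"])
print(result["compressed_prompt"])
```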
I tested with the default Llama 2 7B as well as with GPT-2. With the smaller model (GPT-2) the overall deviation seems smaller than with the bigger model.
(Prompt lengths measured using the GPT-3.5 tokenizer)
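For example (tiktoken assumed here as the GPT-3.5 tokenizer implementation):

```python
import tiktoken

# Token count with the GPT-3.5 tokenizer (cl100k_base via tiktoken)
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
prompt = "..."  # prompt text
print(len(enc.encode(prompt)))
```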
(Plots: LLMLingua with Llama 2, LLMLingua with GPT-2, LongLLMLingua with Llama 2, LongLLMLingua with GPT-2)
In contrast, LLMLingua-2 adheres to the requested compression rate quite well, only slightly overshooting the requested rate:
(Plot: LLMLingua-2)
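The LLMLingua-2 runs followed the README's LLMLingua-2 example, roughly (model name as in the README; the rate value is illustrative):

```python
from llmlingua import PromptCompressor

# LLMLingua-2 uses a dedicated classifier model and the use_llmlingua2 flag
llm_lingua2 = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)

prompt = "..."  # same truncated GovReport prompts as above
result = llm_lingua2.compress_prompt(prompt, rate=0.5)
print(result["origin_tokens"], result["compressed_tokens"])
```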
The prompts I used are truncated from the longest prompt in the LongBench GovReport task (link).
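Something along these lines reproduces the test prompts (the Hugging Face dataset/config/field names and the truncation lengths here are illustrative, not necessarily the exact ones I used):

```python
from datasets import load_dataset
import tiktoken

# Load the LongBench GovReport split and take the longest document
data = load_dataset("THUDM/LongBench", "gov_report", split="test")
longest = max(data, key=lambda ex: len(ex["context"]))["context"]

# Truncate it to a range of lengths with the GPT-3.5 tokenizer
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
tokens = enc.encode(longest)
prompts = {n: enc.decode(tokens[:n]) for n in (150, 500, 1000, 2000, 5000, 10000)}
```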
I suppose I answered the question myself while investigating further:
Small prompts not being compressed was a bug; for bigger prompts I guess the rate is overshot because the iterative_rate should be increased to get a better threshold.
Reopening as I cannot figure out how to correctly use LLMLingua without overshooting the target compression rate.
No matter how I set iterative_size, large prompts (2K+ tokens) are overcompressed.
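For example, a sweep like the following (values illustrative) still compresses well past the requested rate on a 2K+ token prompt:

```python
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor()  # default Llama 2 7B
long_prompt = "..."  # a 2K+ token prompt, e.g. a truncated GovReport document

# Sweep iterative_size (values illustrative); everything else as in the README example
for size in (100, 200, 400, 800):
    result = llm_lingua.compress_prompt(long_prompt, rate=0.5, iterative_size=size)
    actual = result["compressed_tokens"] / result["origin_tokens"]
    print(f"iterative_size={size}: requested rate 0.5, actual {actual:.2f}")
```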