
[Question]: Achieved compression rate with (Long)LLMLingua not meeting expectations? #195

Open
cornzz opened this issue Nov 14, 2024 · 3 comments
Labels
question Further information is requested

Comments


cornzz commented Nov 14, 2024

I was evaluating how well (Long)LLMLingua is able to achieve the requested compression rate (focusing on the rate parameter, not target_tokens) and came to these conclusions:

  • For smaller prompts (< 150 tokens), barely any compression is achieved, if any at all
  • The requested compression rate is best achieved for prompts of around 2000 tokens
  • For longer prompts (> 5000 tokens), the requested rate is overshot (or undershot)

More detailed results are below.
My question is: am I doing something wrong when invoking LLMLingua, or is this behaviour normal?
I adhered to the usage examples in README.md:

Code snippet
from llmlingua import PromptCompressor

compressor = PromptCompressor(
    model_name="NousResearch/Llama-2-7b-hf",  # or "openai-community/gpt2"
    device_map="balanced"
)
...
def compress(prompt, rate, question=""):
    if longllmlingua:  # flag set elsewhere: True for LongLLMLingua, False for LLMLingua
        res = compressor.compress_prompt(
            [prompt],
            question=question,
            rate=rate,
            condition_in_question="after_condition",
            reorder_context="sort",
            dynamic_context_compression_ratio=0.3,
            condition_compare=True,
            rank_method="longllmlingua",
        )
    else:
        res = compressor.compress_prompt(prompt, rate=rate)
    return res

I tested with the default Llama 2 7B as well as with GPT-2. It seems that the overall deviation is smaller with the smaller model (GPT-2) than with the bigger model.

(Prompt lengths measured using the GPT-3.5 tokenizer)
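For reference, this is roughly how the deviation was measured (a minimal sketch; the tiktoken encoding choice and the achieved_rate helper are my own, not part of LLMLingua):

import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def achieved_rate(original: str, compressed: str) -> float:
    # Ratio of compressed to original token count, both measured with the GPT-3.5 tokenizer
    return len(enc.encode(compressed)) / len(enc.encode(original))

# Example: compare against the requested rate
# result = compress(prompt, rate=0.5)
# deviation = achieved_rate(prompt, result["compressed_prompt"]) - 0.5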

[Figure: LLMLingua with Llama 2]

[Figure: LLMLingua with GPT-2]

[Figure: LongLLMLingua with Llama 2]

[Figure: LongLLMLingua with GPT-2]

In contrast, LLMLingua-2 adheres to the requested compression rate quite well, only slightly overshooting the requested rate:

[Figure: LLMLingua-2]

The prompts I used are truncated from the longest prompt in the LongBench GovReport task (link).
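In case it helps with reproduction, a minimal sketch of how the differently sized test prompts were derived (the helper name and the exact token counts are illustrative):

import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def truncate_to_tokens(text: str, n_tokens: int) -> str:
    # Keep only the first n_tokens tokens of the source prompt
    return enc.decode(enc.encode(text)[:n_tokens])

# prompts = [truncate_to_tokens(longest_govreport_prompt, n) for n in (150, 500, 2000, 5000)]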

cornzz added the question label Nov 14, 2024
cornzz commented Nov 14, 2024

-- Moved to separate issue: #196 --

cornzz commented Nov 15, 2024

I suppose I answered the question myself while investigating further:
Small prompts not being compressed was a bug; for bigger prompts, I guess the rate is overshot because iterative_size should be increased to get a better threshold.

cornzz closed this as completed Nov 15, 2024
cornzz commented Jan 3, 2025

Reopening, as I cannot figure out how to use LLMLingua correctly without overshooting the target compression rate.
No matter how I set iterative_size, large prompts (2K+ tokens) are overcompressed.
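For completeness, this is the kind of call I have been trying (a sketch; the rate and iterative_size values are just examples):

res = compressor.compress_prompt(
    prompt,
    rate=0.5,
    iterative_size=500,  # tried several values; large prompts still end up overcompressed
)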

cornzz reopened this Jan 3, 2025