[Question]: LongBench BM25 reproduce #161

Open · JUNE515 opened this issue May 30, 2024 · 3 comments
Labels: question (Further information is requested)

Comments

JUNE515 commented May 30, 2024

Describe the issue

I'm interested in your LongLLMLingua results on LongBench.
I reproduced the LongBench BM25 2,000-token constraint using ChatGPT.
Unlike your paper's results, the performance I get is too high: the TREC task score is 72.5, and most of the other tasks are also high.
I would like to know how you produced the BM25 result.
Below are the parameters I used to reproduce BM25; I'd appreciate it if you could tell me which ones differ.
I use the same split and parameters for the other tasks (only q_format and first_inst change according to the original LongBench config).

Thank you

# first_inst and q_format follow the original LongBench config for the TREC task;
# `input`, `df`, and `i` come from the LongBench sample being processed.
first_inst = "Please determine the type of the question below. Here are some examples of questions."
q_format = "{input}"
question = q_format.format(input=input)
instruction = first_inst

# split the retrieved contexts into chunks of 4 lines each
contexts_list = df["ctxs"][i].split("\n")
contexts_list = [
    "\n".join(contexts_list[ii : ii + 4]) for ii in range(0, len(contexts_list), 4)
]

compressed_prompt = llm_lingua.compress_prompt(
    contexts_list,
    instruction=instruction,
    question=question,
    target_token=1800,
    condition_compare=True,
    condition_in_question="after",
    rank_method="bm25",
    use_sentence_level_filter=False,
    use_token_level_filter=False,
    context_budget="+100",
    dynamic_context_compression_ratio=0.4,  # enable dynamic context compression
)
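
One way to sanity-check a reproduction like this is to verify the realized compression rate rather than assuming the target was met, since an under-compressed prompt would inflate downstream scores. A minimal sketch, assuming the dict keys returned by recent llmlingua releases:

    # compress_prompt returns token statistics alongside the compressed text;
    # if compressed_tokens stays close to origin_tokens, the 2,000-token
    # constraint was effectively loose.
    print(compressed_prompt["origin_tokens"])      # tokens before compression
    print(compressed_prompt["compressed_tokens"])  # tokens after compression
    print(compressed_prompt["ratio"])              # e.g. "3.2x"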

@JUNE515 JUNE515 added the question Further information is requested label May 30, 2024
@iofu728 iofu728 self-assigned this Jun 3, 2024
iofu728 (Contributor) commented Jun 3, 2024

Hi @JUNE515, thanks for your support in LLMLingua. I checked the parameters you used and found that your actual compression rate might be relatively low. You can refer to the following code:

compressed_prompt = llm_lingua.compress_prompt(  # PromptCompressor is not callable; compress_prompt is the entry point
    contexts_list,
    "",                             # empty instruction
    question,
    target_token=2048,
    use_sentence_level_filter=True,
    condition_in_question="none",
    reorder_context="original",     # i.e., no reordering
    dynamic_context_compression_ratio=0,
    condition_compare=False,
    concate_question=False,
    context_budget="+0",
    use_context_level_filter=True,  # demonstration-level filtering
    use_token_level_filter=False,
    rank_method="bm25",
    token_budget_ratio=1.0,
)
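
Neither snippet above shows how llm_lingua is constructed. A minimal setup sketch based on the LLMLingua README; the model name shown is the library's default scoring model, stated explicitly here:

    from llmlingua import PromptCompressor

    # LLMLingua scores tokens with a small causal LM;
    # NousResearch/Llama-2-7b-hf is the library default.
    llm_lingua = PromptCompressor(model_name="NousResearch/Llama-2-7b-hf")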

JUNE515 (Author) commented Jun 3, 2024

Thanks for your response, @iofu728.

I have one more question.
I also reproduced the LongBench LongLLMLingua 2,000-token constraint using ChatGPT.
But I get 22.0 on the summarization task (5.4 lower), 65.1 on the few-shot task (4.2 lower), and 49.4 on code (7.2 lower).
My results seem low even though both the context split method and the parameters match yours.

I apply the same split method and parameters as in your code.ipynb repobench-p example.
I would like to know how you produced the LongLLMLingua result.

Thank you

iofu728 (Contributor) commented Jun 6, 2024

Hi @JUNE515, thanks for your support.

You can refer to the LongBench script at https://github.com/microsoft/LLMLingua/blob/main/experiments/llmlingua2/evaluation/eval_longbench.py. Our experiments run in completion mode. For more details, see https://github.com/microsoft/LLMLingua/blob/main/Transparency_FAQ.md#how-to-reproduce-the-result-in-llmlingua-series-work.
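
For reference, "completion mode" means sending the compressed prompt to a completions-style endpoint rather than the chat API. A minimal sketch using the OpenAI Python client; the model name and generation parameters here are illustrative, not necessarily the paper's exact settings:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # gpt-3.5-turbo-instruct is an illustrative completions-mode model,
    # not confirmed as the one used in the paper's experiments.
    response = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt=compressed_prompt["compressed_prompt"],
        max_tokens=64,
        temperature=0.0,
    )
    print(response.choices[0].text)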
