[Question]: LongBench BM25 reproduce #161

Open · JUNE515 opened this issue May 30, 2024 · 3 comments
Labels: question (Further information is requested)

Comments

JUNE515 commented May 30, 2024

Describe the issue

I'm interested in your LongLLMLingua results on LongBench.
I reproduced the LongBench BM25 2,000-token constraint using ChatGPT.
Unlike your paper's results, the performance I get is too high: the TREC task score is 72.5, and most of the other tasks are also high.
I would like to know how you produced the BM25 result.
Below are the parameters I used to reproduce BM25; I'd appreciate it if you could tell me which ones differ.
I use the same split and parameters for the other tasks (only q_format and first_inst change according to the original LongBench config).

Thank you

# first_inst and q_format follow the original LongBench config for the TREC task;
# `input`, `df`, and `i` come from the LongBench sample being processed.
first_inst = "Please determine the type of the question below. Here are some examples of questions."
q_format = "{input}"
question = q_format.format(input=input)
instruction = first_inst

# split the retrieved contexts into chunks of 4 lines each
contexts_list = df["ctxs"][i].split("\n")
contexts_list = [
    "\n".join(contexts_list[ii : ii + 4]) for ii in range(0, len(contexts_list), 4)
]

compressed_prompt = llm_lingua.compress_prompt(
    contexts_list,
    instruction=instruction,
    question=question,
    target_token=1800,
    condition_compare=True,
    condition_in_question="after",
    rank_method="bm25",
    use_sentence_level_filter=False,
    use_token_level_filter=False,
    context_budget="+100",
    dynamic_context_compression_ratio=0.4,  # enable dynamic context compression
)
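
One way to sanity-check a reproduction like this is to verify the realized compression rate rather than assuming the target was met, since an under-compressed prompt would inflate downstream scores. A minimal sketch, assuming the dict keys returned by recent llmlingua releases:

    # compress_prompt returns token statistics alongside the compressed text;
    # if compressed_tokens stays close to origin_tokens, the 2,000-token
    # constraint was effectively loose.
    print(compressed_prompt["origin_tokens"])      # tokens before compression
    print(compressed_prompt["compressed_tokens"])  # tokens after compression
    print(compressed_prompt["ratio"])              # e.g. "3.2x"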

@JUNE515 JUNE515 added the question Further information is requested label May 30, 2024
@iofu728 iofu728 self-assigned this Jun 3, 2024
iofu728 (Contributor) commented Jun 3, 2024

Hi @JUNE515, thanks for your support in LLMLingua. I checked the parameters you used and found that your actual compression rate might be relatively low. You can refer to the following code:

compressed_prompt = llm_lingua.compress_prompt(  # PromptCompressor is not callable; compress_prompt is the entry point
    contexts_list,
    "",                             # empty instruction
    question,
    target_token=2048,
    use_sentence_level_filter=True,
    condition_in_question="none",
    reorder_context="original",     # i.e., no reordering
    dynamic_context_compression_ratio=0,
    condition_compare=False,
    concate_question=False,
    context_budget="+0",
    use_context_level_filter=True,  # demonstration-level filtering
    use_token_level_filter=False,
    rank_method="bm25",
    token_budget_ratio=1.0,
)
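
Neither snippet above shows how llm_lingua is constructed. A minimal setup sketch based on the LLMLingua README; the model name shown is the library's default scoring model, stated explicitly here:

    from llmlingua import PromptCompressor

    # LLMLingua scores tokens with a small causal LM;
    # NousResearch/Llama-2-7b-hf is the library default.
    llm_lingua = PromptCompressor(model_name="NousResearch/Llama-2-7b-hf")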

JUNE515 (Author) commented Jun 3, 2024

Thanks for your response, @iofu728.

I have one more question.
I also reproduced the LongBench LongLLMLingua 2,000-token constraint using ChatGPT.
But I get 22.0 on the summarization task (5.4 lower), 65.1 on the few-shot task (4.2 lower), and 49.4 on code (7.2 lower).
My results seem low even though both the context split method and the parameters match yours.

I apply the same split method and parameters as in your code.ipynb repobench-p example.
I would like to know how you produced the LongLLMLingua result.

Thank you

iofu728 (Contributor) commented Jun 6, 2024

Hi @JUNE515, thanks for your support.

You can refer to the LongBench script at https://github.com/microsoft/LLMLingua/blob/main/experiments/llmlingua2/evaluation/eval_longbench.py. Our experiments run in completion mode. For more details, see https://github.com/microsoft/LLMLingua/blob/main/Transparency_FAQ.md#how-to-reproduce-the-result-in-llmlingua-series-work.
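
For reference, "completion mode" means sending the compressed prompt to a completions-style endpoint rather than the chat API. A minimal sketch using the OpenAI Python client; the model name and generation parameters here are illustrative, not necessarily the paper's exact settings:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # gpt-3.5-turbo-instruct is an illustrative completions-mode model,
    # not confirmed as the one used in the paper's experiments.
    response = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt=compressed_prompt["compressed_prompt"],
        max_tokens=64,
        temperature=0.0,
    )
    print(response.choices[0].text)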
