What is the 1-shot / half-shot / quarter-shot constraint in experiments? #185

Open
21-10-4 opened this issue Sep 23, 2024 · 7 comments

21-10-4 commented Sep 23, 2024

I still cannot understand this. The 1-shot constraint means the original tokens (containing one example), but what does the half-shot constraint refer to? Half an example?

Originally posted by @21-10-4 in #164 (comment)


21-10-4 commented Sep 23, 2024

Looking forward to a reply, thank you.


cornzz commented Sep 27, 2024

I also want to know what the compression targets are for GSM8K / BBH for 1-shot / half-shot etc. What is the target_token?

I was also wondering what zero-shot means here, specifically for the LongBench benchmark:
I suppose it is clear for the tasks where both context and input are given: in that case one would just leave context empty and only insert input in the prompt? But what about the summarization tasks, or the lcc task, where there is only context but no input at all?


cornzz commented Sep 30, 2024

@iofu728 sorry for bothering, but what exactly is the definition of "zero-shot" in the context of the ZeroScrolls benchmark? As stated here, ZeroScrolls is already a zero-shot benchmark by itself:

"ZeroSCROLLS is a zero-shot benchmark for natural language understanding over long texts."

so I am confused why there is an extra row for "zero-shot" for the ZeroScrolls benchmark in Table 2?

[Screenshot of Table 2, taken 2024-09-30]

@dongziyu1016

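I was also wondering what zero-shot means here, specifically for the LongBench benchmark: I suppose it is clear for the tasks where both context and input are given: in that case one would just leave context empty and only insert input in the prompt? But what about the summarization tasks, or the lcc task, where there is only context but no input at all?

I also want to know how summarization tasks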

iofu728 self-assigned this Oct 22, 2024
iofu728 added the "question" (Further information is requested) label Oct 22, 2024

iofu728 commented Oct 22, 2024

Hi @21-10-4, @cornzz, and @dongziyu1016, thanks for your questions, and apologies for the delayed response.

  1. "1-shot", "half-shot", and "quarter-shot" refer to the number of tokens used in the prompt. "1-shot" means only one example is retained, while "half-shot" and "quarter-shot" indicate that the compressed tokens are equivalent to half and one-quarter of the average tokens used by a demonstration, respectively.
  2. Zero-shot refers to not using any context or demonstrations beyond the question. For summarization, we retain 25 tokens before and after the document, while for LCC, we only retain the code context corresponding to the question.
  3. Apologies for the confusion. In ZeroScrolls, zero-shot means no context information is used. You can refer to the following code for further details:
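    import json
    from datasets import load_dataset
    from tqdm import tqdm

    # TASKS, get_zero_scrolls, and encoding (a tokenizer) are assumed to be defined elsewhere.
    res = []
    for task in TASKS:
        dataset = load_dataset("tau/zero_scrolls", task)["validation"]
        for ii, jj in tqdm(enumerate(dataset), total=len(dataset)):
            (prompt, question), output = get_zero_scrolls(jj, task)
            # No separate question: fall back to the first 200 tokens of the prompt.
            if not question:
                question = encoding.decode(encoding.encode(prompt)[:200])
            res.append({"id": ii, "task": task, "prompt": question, "output": output})
    with open("prompt/zero_scrolls/zero_shot.json", "w") as f:
        json.dump(res, f)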
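
For point 1, one way to read the "half-shot" and "quarter-shot" budgets is as fractions of the average demonstration length. The sketch below only illustrates that reading and is not the authors' script: `count_tokens` is a hypothetical whitespace tokenizer standing in for the real one, and `shot_budgets` and the sample demonstrations are invented for the example.

```python
def count_tokens(text: str) -> int:
    """Hypothetical stand-in tokenizer: counts whitespace-separated words."""
    return len(text.split())


def shot_budgets(demonstrations: list[str]) -> dict[str, float]:
    """Derive token budgets from the average demonstration length."""
    if not demonstrations:
        raise ValueError("need at least one demonstration")
    avg = sum(count_tokens(d) for d in demonstrations) / len(demonstrations)
    return {
        "avg_demo_tokens": avg,
        "half-shot": avg / 2,      # half of an average demonstration
        "quarter-shot": avg / 4,   # a quarter of an average demonstration
    }


# Two made-up demonstrations, 7 and 11 whitespace tokens long.
demos = ["Q: 1 + 1 ? A: 2", "Q: what is 2 + 2 ? A: it is 4"]
budgets = shot_budgets(demos)
```

With a real tokenizer the absolute numbers would differ, but the ratios between the budgets would stay the same.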


cornzz commented Oct 26, 2024

@iofu728 thanks for your response! I have some follow-up questions:

"1-shot" means only one example is retained

I do not quite understand this. In the GSM8K results table, under the 1-shot constraint, the value in the "Tokens" column for LLMLingua-2 is 457. However, the longest demonstration in the uncompressed CoT prompt (prompt_hardest.txt) is only 429 tokens long, so the 1-shot constraint cannot mean that only one of the demonstrations is retained? (I assume that only the tokens of the CoT demonstrations are counted in the "Tokens" column, as the value for Full-Shot corresponds exactly to the token count of prompt_hardest.txt.)

Zero-shot refers to not using any context or demonstrations beyond the question.

How exactly is the prompt built for the zero-shot case? Is the {context} placeholder in the prompt template literally just filled with an empty string, so that e.g. the Narrative QA prompt is Story: \n\nNow, answer the question based on the story...? This leads to instruct models answering "There is no story provided." without even attempting to generate an answer. Is this intended?
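
To make the question concrete, here is a minimal sketch of what "filling the placeholder with an empty string" would produce. The template is a hypothetical, simplified stand-in (the real LongBench template is longer), and `build_prompt` is an invented helper, so this only illustrates the behaviour being asked about.

```python
# Hypothetical, simplified template; not the actual LongBench prompt.
TEMPLATE = (
    "Story: {context}\n\n"
    "Now, answer the question based on the story.\n\n"
    "Question: {input}\n\nAnswer:"
)


def build_prompt(context: str, question: str) -> str:
    """Fill the template; an empty context leaves the 'Story:' slot blank."""
    return TEMPLATE.format(context=context, input=question)


# Zero-shot under this reading: the context is simply the empty string.
zero_shot_prompt = build_prompt("", "Who is the narrator?")
```

The resulting prompt begins with an empty "Story:" slot, which is what seems to trigger the "There is no story provided." replies.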

For summarization, we retain 25 tokens before and after the document

Could you clarify: does this mean you keep 25 tokens from the beginning of the context / document and 25 tokens from the end, and cut out the middle?
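
The interpretation I have in mind could be sketched as follows. This is my assumption about what "25 tokens before and after" means, not a confirmed description of the authors' preprocessing; `keep_head_and_tail` is an invented helper and the token list is dummy data.

```python
def keep_head_and_tail(tokens: list[str], n: int = 25) -> list[str]:
    """Keep the first n and last n tokens, dropping everything in between."""
    if n <= 0:
        return []
    if len(tokens) <= 2 * n:
        return list(tokens)  # short input: nothing to cut
    return tokens[:n] + tokens[-n:]


# Hypothetical 100-token document: tok0 ... tok99.
document = [f"tok{i}" for i in range(100)]
truncated = keep_head_and_tail(document)
```

Under this reading a summarization sample would keep 50 tokens of the document in total, regardless of its original length.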

while for LCC, we only retain the code context corresponding to the question

I do not understand: how do I find out which part of the code context corresponds to the question? The prompt template given in eval_longbench.py is "Please complete the code given below. \n{context}Next line of code:\n". Perhaps you meant the repobench-p task, where there is also a question field for each sample containing the relevant code, while the context field only contains additional code for context?
Could you also clarify how zero_shot works for all other tasks, especially Passage Count and Passage Retrieval?

What exactly does "no context information is used" mean, especially for summarization tasks, where there has to be something to summarize? Given your code example, which I do not quite understand (what is get_zero_scrolls() and what exactly does it return?), it seems that up to 200 tokens of the prompt are retained when question is empty. Does that mean there are 200 tokens of context retained for the zero-shot case?


cornzz commented Nov 19, 2024

@pzs19 Hi, sorry for bothering, but it is still not clear to me how to reproduce the zero-shot results. From your co-author's previous response, it seems that a different approach was used for each task. Could you clarify that?
