Describe the bug
This concerns only LLMLingua / LongLLMLingua.
As part of #195 I noticed that prompts smaller than iterative_size were not being compressed. The iterative_size parameter is 200 by default, which causes the algorithm to ignore all tokens at certain prompt lengths below 200.
To be exact, it effectively means the following by default:
Prompts below 66 tokens are not compressed at all
For prompts between 66 and 98 tokens, compression gradually starts working: the longer the prompt, the larger the share of it that is considered
At 99 tokens, compression drops back to none, then increases again as the prompt length grows towards 200
Token lengths here are those produced by the compression model tokenizer.
Not sure if this is a bug or not, @iofu728?
The exact behaviour can be seen in this graph; an explanation is below.
--
(From #61 I understand that iterative_size determines the length of the segments $s \in S$ from Eq. (5)?)
Why this is likely a bug:
Assume a 50-token prompt. First, end is set to the length of the prompt (compressed_input_ids is the original prompt here); see LLMLingua/llmlingua/prompt_compressor.py, line 1561 in 2dbdbd3.
In the get_compressed_input() call (lines 1724 to 1729 in 2dbdbd3), the end parameter is set to end - iterative_size + delta_end. With delta_end being the prompt length + 2 here, this gives 50 - 200 + 52 = -98.
In get_compressed_input() (lines 1424 to 1428 in 2dbdbd3), the need_idx[end:] = 1 operation then causes all tokens to be kept: since end is -98, the slice starts 98 positions from the end and therefore covers the whole 50-token prompt. The result of the thresholding is ignored (need_idx signifies which tokens should be kept).
Normally, the operations in lines 1426 and 1427 limit compression to the segment that is currently being processed in the Iterative Token-level Prompt Compression algorithm. As demonstrated, this breaks the algorithm when the prompt is smaller than iterative_size.
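The effect of the negative end value can be reproduced in isolation. This is only a minimal sketch: a plain Python list stands in for the need_idx tensor, the thresholding result is made up for illustration, and none of it is copied from the library's code.

```python
prompt_len = 50
iterative_size = 200          # default value
delta_end = prompt_len + 2    # as stated in the issue for this example

end = prompt_len                          # initial end: length of the prompt
end = end - iterative_size + delta_end    # value passed to get_compressed_input()

# Hypothetical thresholding result: every second token would be kept.
need_idx = [i % 2 == 0 for i in range(prompt_len)]
kept_before = sum(need_idx)

# List equivalent of `need_idx[end:] = 1`.
# With end = -98 the slice covers the whole 50-element list.
need_idx[end:] = [True] * len(need_idx[end:])
kept_after = sum(need_idx)

print(end, kept_before, kept_after)  # -98 25 50
```

Every token ends up marked as kept, regardless of what the thresholding decided.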
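Graph explanation
The exact behaviour, i.e. how much of the prompt is considered for compression at different prompt lengths, can be seen here:
https://www.desmos.com/calculator/d7dbbqsdbv
The x-axis is the prompt length; the y-axis shows how many tokens of the prompt are considered for compression.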
The $i$ variable is iterative_size
$s(x)$ is the initial value of end in iterative_compress_prompt()
$d$ is delta_end (and iterative_size inside get_compressed_input())
$f(x)$ is the end parameter in get_compressed_input()
$g(x)$ yields the number of tokens considered for compression after need_idx[end:] = 1
Only $g(x)$ is displayed here.
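The definitions above can also be evaluated directly. The sketch below is a simplified model, not the library code: it assumes that delta_end equals the prompt length + 2 for every prompt shorter than iterative_size (the issue states this only for the 50-token example) and that need_idx has exactly one entry per prompt token. The function name considered_tokens is hypothetical.

```python
def considered_tokens(prompt_len: int, iterative_size: int = 200) -> int:
    """Model of g(x): tokens still subject to thresholding after need_idx[end:] = 1."""
    s = prompt_len                  # s(x): initial value of end
    d = prompt_len + 2              # d: delta_end (assumption, see lead-in)
    f = s - iterative_size + d      # f(x): end passed to get_compressed_input()

    need_idx = [False] * prompt_len                 # False = not forced to be kept
    need_idx[f:] = [True] * len(need_idx[f:])       # list version of need_idx[end:] = 1
    return need_idx.count(False)


for x in (50, 66, 67, 98, 99, 100, 150):
    print(x, considered_tokens(x))
```

Under these assumptions the model gives 0 considered tokens up to a length of 66, 96 of 98 tokens at length 98, 0 again at length 99 and 2 at length 100, which matches the thresholds described at the top of the issue.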
Steps to reproduce
You can reproduce this using the official LLMLingua demo, trying to compress the following context of length 100 with target_token set to -1 and ratio to 0.5 (question and instruction left empty):
This report provides background information and issues for Congress regarding China's actions in the South China Sea (SCS) and East China Sea (ECS), with a focus on implications for U.S. strategic and policy interests. Other CRS reports focus on other aspects of maritime territorial disputes involving China. The issue for Congress is how the United States should respond to China's actions in the SCS and ECS—particularly China's island-building and base-construction activities in the Spratly
The actual compression ratio will be 1.0x. If you now remove the last word, "Spratly", compression suddenly works and the result is compressed 2x.
This is because the token count has now dropped to 98 where, as previously mentioned, compression fully works. If you further reduce the number of words, the compression ratio will fall again, reaching 1.0x at a prompt length of around 66 tokens.
How to fix:
I suppose a possible fix would be adding the following at the beginning of get_compressed_input():
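    if end < iterative_size:
        end = iterative_size
This way, prompts shorter than iterative_size are still compressed. I don't think this introduces side effects for other cases, as end shouldn't be smaller than iterative_size other than in this specific case. This screenshot shows the behaviour with this fix, $n(x)$ being the new end:
https://www.desmos.com/calculator/69cm0iasqz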
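To illustrate what the guard would change, here is a minimal sketch built on the same simplified model as the graph: delta_end is assumed to be the prompt length + 2, need_idx is assumed to have one entry per token, and inside get_compressed_input() the iterative_size argument is delta_end. The function name is hypothetical and this has not been run against the library itself.

```python
def considered_tokens_with_fix(prompt_len: int, iterative_size: int = 200) -> int:
    """Tokens still subject to thresholding when end is clamped as proposed."""
    d = prompt_len + 2                      # delta_end (assumption)
    end = prompt_len - iterative_size + d   # end passed to get_compressed_input()

    # Proposed guard; inside the function, iterative_size is delta_end (d).
    if end < d:
        end = d

    need_idx = [False] * prompt_len                   # False = not forced to be kept
    need_idx[end:] = [True] * len(need_idx[end:])     # need_idx[end:] = 1
    return need_idx.count(False)


for x in (50, 66, 99, 100, 199):
    print(x, considered_tokens_with_fix(x))
```

In this model the clamped end always lies past the last token, so the slice is empty and the whole prompt stays subject to thresholding for every length below 200.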
Semi-related (bug?)
This line (LLMLingua/llmlingua/prompt_compressor.py, line 1586 in 2dbdbd3) means that a remaining segment at the end of a prompt will be ignored in the compression if the prompt length is not divisible by iterative_size. This is because end is incremented by iterative_size after each iteration, and if the remaining segment is smaller than iterative_size it is skipped.
An example for a prompt of length 500, rate 0.5 (at least 3 iterations would be needed to process all tokens):
First iteration: the first 200 tokens are compressed to 100 tokens, the prompt length is now 400; the get_compressed_input() call sets end to 100, which is then incremented to 300 in line 1742.
Second iteration: the next 200 tokens are compressed to 100 tokens, the prompt length is now 300; the get_compressed_input() call sets end to 200, which is then incremented to 400.
Third iteration: there is no third iteration since the prompt length is 300 which is smaller than end, which is now 400. Therefore the last 100 tokens are not processed and left uncompressed.
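The three steps above can be traced with a small simulation. This is a simplified model rather than the actual loop: it assumes the loop runs while end does not exceed the current prompt length, that every segment is compressed exactly at the given rate, and that the prompt is at least iterative_size tokens long. The function name ignored_tail is hypothetical.

```python
def ignored_tail(prompt_len: int, iterative_size: int = 200, rate: float = 0.5):
    """Return (iterations, tokens left unprocessed at the end of the prompt)."""
    length = prompt_len
    end = iterative_size
    iterations = 0

    while end <= length:                        # assumed loop condition
        kept = int(iterative_size * rate)       # tokens surviving in this segment
        length -= iterative_size - kept         # the prompt shrinks
        end = end - iterative_size + kept       # end as set by get_compressed_input()
        end += iterative_size                   # increment after each iteration
        iterations += 1

    processed_until = end - iterative_size      # everything before this was handled
    return iterations, length - processed_until


iterations, ignored = ignored_tail(500)
print(iterations, ignored, ignored / 500)  # 2 100 0.2
```

For a 500-token prompt the model stops after two iterations with 100 original tokens untouched. A 400-token or 600-token prompt leaves no tail in the same model.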
In this case, 20% of the prompt is ignored completely. This graph shows how the ignored percentage of the prompt changes with prompt length: https://www.desmos.com/calculator/8kohofzyb5
The effect of this can also be seen in the results of #195, where the achieved compression ratio of prompts between 250 and 750 tokens deviates quite a bit from the expected ratio, presumably because significant portions of the original prompts were ignored in the compression process.
Of course this can be diminished by setting a smaller iterative_size, but even with the default value there should be a way to process the remaining tokens at the end of the prompt?
I don't have a solution here, as the algorithm breaks if you simply do one more iteration and I don't have time to look into a proper solution...
Expected Behavior
Prompts should be compressed even when smaller than iterative_size
Logs
No response
Additional Information
No response
cornzz changed the title from "[Bug]: Prompts smaller than iterative_size are not compressed at all" to "[Bug]: Prompts smaller than iterative_size are not compressed" on Nov 14, 2024
cornzz added a commit to cornzz/LLMLingua that referenced this issue on Nov 14, 2024