[Question]: How was edit sim for code tasks calculated? #207

cornzz · 2025-01-09T12:39:21Z

In the LongBench evaluation, fuzzywuzzy's fuzz.ratio was used to calculate edit similarity (Levenshtein distance) for the predicted code lines (see here).

However, I noticed that this function returns different results depending on whether python-Levenshtein, an optional dependency of fuzzywuzzy, is installed. Could you tell me whether python-Levenshtein was installed when you ran the evaluation?

On another note, in both cases this library does not actually return the correct Levenshtein distance: THUDM/LongBench#96
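To illustrate both points, here is a minimal stdlib-only sketch. It does not call fuzzywuzzy itself. It assumes (as far as I can tell from the sources, so treat it as an assumption) that fuzzywuzzy falls back to difflib.SequenceMatcher when python-Levenshtein is missing, and that the python-Levenshtein ratio charges a cost of 2 for a substitution. The helper names below are mine, not part of either library.

```python
from difflib import SequenceMatcher


def difflib_ratio(a: str, b: str) -> float:
    """2*M/T, where M comes from difflib's greedy longest-matching-block search.

    Assumed to mirror the fallback backend used without python-Levenshtein.
    """
    return SequenceMatcher(None, a, b).ratio()


def edit_distance(a: str, b: str, sub_cost: int = 1) -> int:
    """Dynamic-programming edit distance.

    sub_cost=1 gives the classic Levenshtein distance;
    sub_cost=2 makes a substitution as expensive as delete + insert.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else sub_cost
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """(len(a) + len(b) - distance) / (len(a) + len(b)) with sub_cost=2.

    Assumed to mirror the ratio computed when python-Levenshtein is installed.
    """
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return (total - edit_distance(a, b, sub_cost=2)) / total


def levenshtein_similarity(a: str, b: str) -> float:
    """1 - Levenshtein distance / max length: one common 'edit similarity'."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - edit_distance(a, b) / longest


if __name__ == "__main__":
    a, b = "1234abc", "abc1x2x3x4"
    print("difflib ratio:         ", difflib_ratio(a, b))
    print("indel ratio:           ", indel_ratio(a, b))
    print("Levenshtein similarity:", levenshtein_similarity(a, b))
```

For this pair, difflib locks onto the block "abc" and finds no further matches (ratio 6/17), while the cost-2 edit distance aligns "1234" instead (ratio 8/17), so the two backends disagree. Neither value is a similarity derived from the classic Levenshtein distance, which is what levenshtein_similarity computes.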


cornzz added the question label on Jan 9, 2025
