Actions: EleutherAI/lm-evaluation-harness

Unit Tests

2,844 workflow runs

add llama3 tasks
Unit Tests #4072: Pull request #2556 synchronize by baberabb
January 21, 2025 17:27 7m 18s llama
add llama3 tasks
Unit Tests #4071: Pull request #2556 synchronize by baberabb
January 21, 2025 17:22 7m 9s llama
Fix max_tokens handling in vllm_vlms.py (#2637)
Unit Tests #4070: Commit 370e2f9 pushed by baberabb
January 21, 2025 16:55 6m 59s main
aggregate by group (total and categories) (#2643)
Unit Tests #4069: Commit b2c090c pushed by baberabb
January 21, 2025 16:48 7m 37s main
revise mbpp prompt (#2645)
Unit Tests #4068: Commit ed9c6fc pushed by baberabb
January 21, 2025 16:46 7m 33s main
revise mbpp prompt
Unit Tests #4067: Pull request #2645 opened by bzantium
January 21, 2025 04:56 6m 59s feature/#2644
aggregate by group (total and categories)
Unit Tests #4066: Pull request #2643 opened by bzantium
January 21, 2025 01:22 7m 30s feature/#2640
aggregate by group (total and categories)
Unit Tests #4065: Pull request #2642 opened by bzantium
January 21, 2025 01:17 7m 26s feature/#2640
Easily evaluate models steered by SAEs
Unit Tests #4064: Pull request #2641 opened by AMindToThink
January 21, 2025 01:05 Action required AMindToThink:sae_steered
MMLU Pro Plus
Unit Tests #4063: Pull request #2366 synchronize by baberabb
January 20, 2025 21:35 6m 51s asgsaeid:mmlu-pro-plus
fixed mmlu generative response extraction (#2503)
Unit Tests #4062: Commit 12b6eeb pushed by baberabb
January 20, 2025 21:33 7m 17s main
fix tmlu tmlu_taiwan_specific_tasks tag (#2420)
Unit Tests #4061: Commit 8814407 pushed by baberabb
January 20, 2025 21:16 7m 9s main
Update KorMedMCQA: ver 2.0 (#2540)
Unit Tests #4060: Commit ff2c49f pushed by baberabb
January 20, 2025 21:05 7m 36s main
Unit Tests
Unit Tests #4059: by baberabb
January 20, 2025 21:04 6m 53s main
fixed mmlu generative response extraction
Unit Tests #4058: Pull request #2503 synchronize by baberabb
January 20, 2025 21:03 6m 8s RawthiL:mmlu_generative_fix
fixed mmlu generative response extraction
Unit Tests #4057: Pull request #2503 synchronize by baberabb
January 20, 2025 20:58 6m 22s RawthiL:mmlu_generative_fix
fixed mmlu generative response extraction
Unit Tests #4056: Pull request #2503 synchronize by baberabb
January 20, 2025 20:57 6m 33s RawthiL:mmlu_generative_fix
fixed mmlu generative response extraction
Unit Tests #4055: Pull request #2503 synchronize by baberabb
January 20, 2025 20:52 6m 4s RawthiL:mmlu_generative_fix
fixed mmlu generative response extraction
Unit Tests #4054: Pull request #2503 synchronize by baberabb
January 20, 2025 20:48 5m 43s RawthiL:mmlu_generative_fix
New arabicmmlu (#2541)
Unit Tests #4053: Commit 6dac8c6 pushed by baberabb
January 20, 2025 20:46 6m 31s main
fix multiple input chat template
Unit Tests #4052: Pull request #2576 synchronize by baberabb
January 20, 2025 20:43 5m 59s multiple_input
add hrm8k benchmark for both Korean and English (#2627)
Unit Tests #4051: Commit a5c344c pushed by baberabb
January 20, 2025 20:38 6m 33s main
add hrm8k benchmark for both Korean and English
Unit Tests #4050: Pull request #2627 synchronize by baberabb
January 20, 2025 20:30 6m 5s feature/#2623
Fix max_tokens handling in vllm_vlms.py
Unit Tests #4049: Pull request #2637 synchronize by baberabb
January 20, 2025 18:16 6m 1s jkaniecki:Fix_vllm_vlms_max_tokens