Actions: EleutherAI/lm-evaluation-harness

Unit Tests

2,842 workflow runs

separate category for global_mmlu (#2652)
Unit Tests #4096: Commit 5c006ed pushed by baberabb
January 24, 2025 16:00 7m 28s main
Add loncxt tasks
Unit Tests #4094: Pull request #2629 synchronize by baberabb
January 23, 2025 18:34 7m 28s longcxt
fix multiple input chat tempalte
Unit Tests #4093: Pull request #2576 synchronize by baberabb
January 23, 2025 15:59 7m 37s multiple_input
Add Moral Stories
Unit Tests #4092: Pull request #2653 opened by upunaprosk
January 23, 2025 14:31 7m 19s upunaprosk:moral_stories
Easily evaluate models steered by SAEs
Unit Tests #4091: Pull request #2641 synchronize by AMindToThink
January 23, 2025 03:50 Action required AMindToThink:sae_steered
separate category for global_mmlu
Unit Tests #4090: Pull request #2652 opened by bzantium
January 23, 2025 02:06 7m 5s feature/#2649
Add loncxt tasks
Unit Tests #4089: Pull request #2629 synchronize by baberabb
January 23, 2025 00:53 6m 56s longcxt
Add loncxt tasks
Unit Tests #4088: Pull request #2629 synchronize by baberabb
January 22, 2025 23:03 6m 55s longcxt
Add loncxt tasks
Unit Tests #4087: Pull request #2629 synchronize by baberabb
January 22, 2025 22:44 6m 40s longcxt
Add loncxt tasks
Unit Tests #4086: Pull request #2629 synchronize by baberabb
January 22, 2025 22:25 6m 52s longcxt
add TransformerLens example
Unit Tests #4085: Pull request #2651 opened by nickypro
January 22, 2025 17:55 7m 16s nickypro:patch-1
humaneval instruct
Unit Tests #4084: Pull request #2650 opened by baberabb
January 22, 2025 16:49 7m 2s humaneval_instruct
Easily evaluate models steered by SAEs
Unit Tests #4082: Pull request #2641 synchronize by AMindToThink
January 22, 2025 07:04 Action required AMindToThink:sae_steered
add llama3 tasks
Unit Tests #4081: Pull request #2556 synchronize by baberabb
January 22, 2025 00:16 6m 52s llama
add llama3 tasks
Unit Tests #4080: Pull request #2556 synchronize by baberabb
January 21, 2025 23:44 6m 45s llama
add llama3 tasks
Unit Tests #4079: Pull request #2556 synchronize by baberabb
January 21, 2025 23:38 6m 59s llama
add llama3 tasks
Unit Tests #4078: Pull request #2556 synchronize by baberabb
January 21, 2025 22:18 6m 57s llama
add llama3 tasks
Unit Tests #4077: Pull request #2556 synchronize by baberabb
January 21, 2025 22:08 7m 9s llama
add llama3 tasks
Unit Tests #4076: Pull request #2556 synchronize by baberabb
January 21, 2025 22:06 7m 44s llama
add llama3 tasks
Unit Tests #4075: Pull request #2556 synchronize by baberabb
January 21, 2025 22:06 7m 38s llama
add llama3 tasks
Unit Tests #4074: Pull request #2556 synchronize by baberabb
January 21, 2025 22:00 7m 31s llama
Easily evaluate models steered by SAEs
Unit Tests #4073: Pull request #2641 synchronize by AMindToThink
January 21, 2025 20:57 Action required AMindToThink:sae_steered
add llama3 tasks
Unit Tests #4072: Pull request #2556 synchronize by baberabb
January 21, 2025 17:27 7m 18s llama
add llama3 tasks
Unit Tests #4071: Pull request #2556 synchronize by baberabb
January 21, 2025 17:22 7m 9s llama