chore: tp eval
zhudotexe committed Dec 11, 2024
1 parent 5eec0a4 commit edfe2ad
Showing 13 changed files with 939 additions and 221 deletions.
2 changes: 0 additions & 2 deletions README.md
@@ -195,8 +195,6 @@ This will output a `score.json` file in the output path with the final scores.
python bench_travelplanner.py <full|root-fc|baseline|small-leaf|small-all|small-baseline>
```

Note: This benchmark does not test the `short-ctx` systems since this benchmark doesn't have a long-context requirement.

**Evaluate**

```shell

Large diffs are not rendered by default.

90 changes: 45 additions & 45 deletions experiments/travelplanner/openai/baseline/results_for_tp_eval.jsonl

Large diffs are not rendered by default.

32 changes: 16 additions & 16 deletions experiments/travelplanner/openai/full/results_for_tp_eval.jsonl

Large diffs are not rendered by default.

108 changes: 54 additions & 54 deletions experiments/travelplanner/openai/root-fc/results_for_tp_eval.jsonl

Large diffs are not rendered by default.

