
Revive Batch processing - Claude and GPT #40

Open · lisafast opened this issue Dec 19, 2024 · 3 comments

@lisafast (Owner)

A fundamental part of our evaluation system is to run batches of questions.
I did have batch processing working for Sonnet but haven't tried again since ContextService was added, so I assume some changes will be needed. Note that batch calls add an evaluation tag so that the AI doesn't ask clarifying questions (the systemPrompt in base.js tells it not to ask clarifying questions when the evaluation tag is present).
I was never able to get batch processing working for ChatGPT, although the files are there. Ideally it should work for both.
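For context, the tag check presumably looks something like the sketch below; EVAL_TAG and buildSystemPrompt are illustrative names, not the actual base.js code:

```js
// Minimal sketch of gating clarifying questions on the evaluation tag.
// EVAL_TAG and buildSystemPrompt are hypothetical names, not the real code.
const EVAL_TAG = '<evaluation>';

function buildSystemPrompt(basePrompt, userMessage) {
  if (userMessage.includes(EVAL_TAG)) {
    // Batch/evaluation runs: answer directly, never ask for clarification.
    return `${basePrompt}\nThis is an automated evaluation. Do not ask clarifying questions; answer with your best interpretation.`;
  }
  return basePrompt;
}
```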

  1. With the tool changes to the API, and with the Context Service added, we need to get batch processing running again for Claude. Batches are started from the admin page by loading a file like the ones attached below (use one you've downloaded and cleaned from the Feedback viewer, or any CSV file with a column labelled 'Problem Details' containing the questions, plus an optional URL column with a referring URL). An admin code is required to enable file upload (a temporary fix for testing). https://docs.anthropic.com/en/docs/build-with-claude/message-batches

  2. Then get it running for GPT: https://platform.openai.com/docs/guides/batch (a rough sketch of both flows follows below).

Top40-findability-FR.csv
Top40-findability.csv
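For reference, a minimal sketch of what kicking off a Claude batch from one of these CSVs might look like, using the official @anthropic-ai/sdk and csv-parse. The model name, column handling, and system prompt here are assumptions for illustration, not the project's actual code:

```js
import fs from 'node:fs';
import Anthropic from '@anthropic-ai/sdk';
import { parse } from 'csv-parse/sync';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Rows need a 'Problem Details' column; a referring-URL column is optional.
const rows = parse(fs.readFileSync('Top40-findability.csv'), { columns: true });

const batch = await client.messages.batches.create({
  requests: rows.map((row, i) => ({
    custom_id: `row-${i}`, // used later to match results back to input rows
    params: {
      model: 'claude-3-5-sonnet-latest', // assumed model name
      max_tokens: 1024,
      system: 'Answer directly. <evaluation>', // placeholder; the real prompt lives in base.js
      messages: [{
        role: 'user',
        content: row.URL
          ? `${row['Problem Details']}\nReferring URL: ${row.URL}`
          : row['Problem Details'],
      }],
    },
  })),
});
console.log('Anthropic batch id:', batch.id);
```

The OpenAI flow is different: requests go into a JSONL file, which is uploaded with purpose 'batch', and the batch is then created from the file id (reusing rows from the sketch above):

```js
import fs from 'node:fs';
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// One JSON object per line; custom_id carries the row index through.
const lines = rows.map((row, i) => JSON.stringify({
  custom_id: `row-${i}`,
  method: 'POST',
  url: '/v1/chat/completions',
  body: {
    model: 'gpt-4o', // assumed model name
    messages: [{ role: 'user', content: row['Problem Details'] }],
  },
}));
fs.writeFileSync('batch.jsonl', lines.join('\n'));

const file = await openai.files.create({
  file: fs.createReadStream('batch.jsonl'),
  purpose: 'batch',
});
const job = await openai.batches.create({
  input_file_id: file.id,
  endpoint: '/v1/chat/completions',
  completion_window: '24h',
});
console.log('OpenAI batch id:', job.id);
```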

@ryanhyma (Collaborator)

Hi Lisa,

I have a couple of questions and comments.

We can go one of two directions with this:

  1. We could make the ContextService a tool for the main LLM agent, with the first step always being to fetch the context. This would work with batch processing for both Claude and OpenAI; they both support tools AFAIK.

  2. The other option would be to batch-process all the contexts for the questions first, output a CSV that has the Problem Details and the Context, then feed that into a second batch process that generates the answers to the questions (see the sketch after this list).
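To make option 2 concrete, here is a sketch of the stage-1-to-stage-2 hand-off, assuming stage 1 returns a Map from custom_id to retrieved context text; all names are illustrative, not real project code:

```js
// Build the intermediate rows (Problem Details + Context) from stage 1 output.
// contextById is an assumed Map of custom_id -> context text.
function buildStageTwoInput(rows, contextById) {
  return rows.map((row, i) => ({
    'Problem Details': row['Problem Details'],
    'Context': contextById.get(`ctx-${i}`) ?? '',
  }));
}

// Each stage-2 request pairs the question with its retrieved context.
function buildStageTwoRequests(withContext) {
  return withContext.map((row, i) => ({
    custom_id: `ans-${i}`,
    prompt: `Context:\n${row.Context}\n\nQuestion:\n${row['Problem Details']}`,
  }));
}
```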

Let me know.

I just pushed a branch that lets you run evaluations (without the batch box checked), and I've integrated that process with the context service. It seems to be working well. Do you view the results in the chat logs? Or, as I think you mention in another issue, do you want them logged to the database?

@lisafast (Owner, Author)

I'd rather keep the context service separate, so we can decide if and when it's useful and keep better control over it. We could even have a failover where, if it takes too long, we abandon it (a rough sketch below). At the moment, it IS helping load departmental results, but it also seems to have degraded some top task results.
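The failover could be as simple as racing the lookup against a timer. A sketch, assuming ContextService exposes a promise-returning getContext; the 5-second cutoff is arbitrary:

```js
// Race the context lookup against a timeout and fall back to "no context"
// rather than blocking the answer. Names and the cutoff are illustrative.
async function getContextWithFailover(contextService, question, ms = 5000) {
  const timeout = new Promise((resolve) => setTimeout(() => resolve(null), ms));
  try {
    return await Promise.race([contextService.getContext(question), timeout]);
  } catch {
    return null; // a ContextService error also falls back to no context
  }
}
```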

Note the sample files attached above with the top 40 questions.

I can only view batch jobs (which I believe are distinct from evaluations in some LLM systems) right now by signing in and getting them from the Anthropic Console. I've attached the sample you created on Dec 23, saved as plain .txt because I couldn't attach JSON to this post.

Yes, issue #41 is about logging them to the database, which is very much needed for the evaluation process.

One thing that confuses me about the evaluation file is that the question itself (from the Problem Details column in the input file) is not in the output, and neither is the referring URL (if provided). Both were included previously in the batch output. Ideally it should also include the evaluation tag so that batch output can be distinguished from user output (one approach is sketched below).

msgbatch_01X6PuvLdDrjzmfMHJpA57H7_results.txt
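One way to carry those fields through: since the batch APIs echo back each request's custom_id, keep a map from custom_id to the input row and merge when processing the results file. A sketch; the field names and result shape are assumptions:

```js
// Merge batch results back with input rows by custom_id so the output keeps
// the question, referring URL, and evaluation tag. Field names are assumed.
function mergeResults(rows, results) {
  const inputById = new Map(rows.map((row, i) => [`row-${i}`, row]));
  return results.map((r) => ({
    'Problem Details': inputById.get(r.custom_id)?.['Problem Details'] ?? '',
    'URL': inputById.get(r.custom_id)?.URL ?? '',
    'Evaluation': true, // distinguishes batch output from user output
    'Answer': r.answer, // assumed shape of the parsed results file
  }));
}
```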

@ryanhyma (Collaborator)

okay, I'll update the process.
