
Revive Batch processing - Claude and GPT #40

Open · lisafast opened this issue Dec 19, 2024 · 3 comments

@lisafast (Owner)

A fundamental part of our evaluation system is to run batches of questions.
I did have batch processing working for Sonnet but haven't tried again since ContextService was added, so I assume some changes will be needed. Note that batch calls add an evaluation tag so that the AI doesn't ask clarifying questions (the systemPrompt in base.js tells it not to ask clarifying questions when the evaluation tag is present).
I was never able to get batch processing working for ChatGPT, although the files are there. Ideally it should work for both.
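For context, the tag check presumably looks something like the sketch below; EVAL_TAG and buildSystemPrompt are illustrative names, not the actual base.js code:

```js
// Minimal sketch of gating clarifying questions on the evaluation tag.
// EVAL_TAG and buildSystemPrompt are hypothetical names, not the real code.
const EVAL_TAG = '<evaluation>';

function buildSystemPrompt(basePrompt, userMessage) {
  if (userMessage.includes(EVAL_TAG)) {
    // Batch/evaluation runs: answer directly, never ask for clarification.
    return `${basePrompt}\nThis is an automated evaluation. Do not ask clarifying questions; answer with your best interpretation.`;
  }
  return basePrompt;
}
```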

  1. With the tool changes to the API, and with the Context Service added, we need to get batch processing running again for Claude. Batches are started from the admin page by loading a file like the ones attached below (use one you've downloaded and cleaned from the Feedback viewer, or any CSV file with a column labelled 'Problem Details' containing the questions, plus an optional URL column with a referring URL). An admin code is required to enable file upload (a temporary fix for testing). https://docs.anthropic.com/en/docs/build-with-claude/message-batches

  2. Then get it running for GPT: https://platform.openai.com/docs/guides/batch (a rough sketch of both flows follows below).

Top40-findability-FR.csv
Top40-findability.csv
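For reference, a minimal sketch of what kicking off a Claude batch from one of these CSVs might look like, using the official @anthropic-ai/sdk and csv-parse. The model name, column handling, and system prompt here are assumptions for illustration, not the project's actual code:

```js
import fs from 'node:fs';
import Anthropic from '@anthropic-ai/sdk';
import { parse } from 'csv-parse/sync';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Rows need a 'Problem Details' column; a referring-URL column is optional.
const rows = parse(fs.readFileSync('Top40-findability.csv'), { columns: true });

const batch = await client.messages.batches.create({
  requests: rows.map((row, i) => ({
    custom_id: `row-${i}`, // used later to match results back to input rows
    params: {
      model: 'claude-3-5-sonnet-latest', // assumed model name
      max_tokens: 1024,
      system: 'Answer directly. <evaluation>', // placeholder; the real prompt lives in base.js
      messages: [{
        role: 'user',
        content: row.URL
          ? `${row['Problem Details']}\nReferring URL: ${row.URL}`
          : row['Problem Details'],
      }],
    },
  })),
});
console.log('Anthropic batch id:', batch.id);
```

The OpenAI flow is different: requests go into a JSONL file, which is uploaded with purpose 'batch', and the batch is then created from the file id (reusing rows from the sketch above):

```js
import fs from 'node:fs';
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// One JSON object per line; custom_id carries the row index through.
const lines = rows.map((row, i) => JSON.stringify({
  custom_id: `row-${i}`,
  method: 'POST',
  url: '/v1/chat/completions',
  body: {
    model: 'gpt-4o', // assumed model name
    messages: [{ role: 'user', content: row['Problem Details'] }],
  },
}));
fs.writeFileSync('batch.jsonl', lines.join('\n'));

const file = await openai.files.create({
  file: fs.createReadStream('batch.jsonl'),
  purpose: 'batch',
});
const job = await openai.batches.create({
  input_file_id: file.id,
  endpoint: '/v1/chat/completions',
  completion_window: '24h',
});
console.log('OpenAI batch id:', job.id);
```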

@ryanhyma (Collaborator)

Hi Lisa,

I have a couple of questions and comments.

We can go one of two directions with this:

  1. We could make the ContextService a tool for the main LLM agent, with the first step always being to fetch the context. This would work with batch processing for both Claude and OpenAI; they both support tools AFAIK.

  2. The other option would be to batch-process all the contexts for the questions first, output a CSV that has the Problem Details and the Context, then feed that into a second batch process that generates the answers to the questions (see the sketch after this list).
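To make option 2 concrete, here is a sketch of the stage-1-to-stage-2 hand-off, assuming stage 1 returns a Map from custom_id to retrieved context text; all names are illustrative, not real project code:

```js
// Build the intermediate rows (Problem Details + Context) from stage 1 output.
// contextById is an assumed Map of custom_id -> context text.
function buildStageTwoInput(rows, contextById) {
  return rows.map((row, i) => ({
    'Problem Details': row['Problem Details'],
    'Context': contextById.get(`ctx-${i}`) ?? '',
  }));
}

// Each stage-2 request pairs the question with its retrieved context.
function buildStageTwoRequests(withContext) {
  return withContext.map((row, i) => ({
    custom_id: `ans-${i}`,
    prompt: `Context:\n${row.Context}\n\nQuestion:\n${row['Problem Details']}`,
  }));
}
```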

Let me know.

I just pushed a branch that lets you run evaluations (without the batch box checked), and I've integrated that process with the context service. It seems to be working well. Do you view the results in the chat logs? Or, as I think you mention in another issue, do you want them logged to the database?

@lisafast (Owner, Author)

I'd rather keep the context service separate, so we can decide if and when it's useful and keep better control over it. We could even have a failover where, if it takes too long, we abandon it (a rough sketch below). At the moment, it IS helping load departmental results, but it also seems to have degraded some top task results.
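The failover could be as simple as racing the lookup against a timer. A sketch, assuming ContextService exposes a promise-returning getContext; the 5-second cutoff is arbitrary:

```js
// Race the context lookup against a timeout and fall back to "no context"
// rather than blocking the answer. Names and the cutoff are illustrative.
async function getContextWithFailover(contextService, question, ms = 5000) {
  const timeout = new Promise((resolve) => setTimeout(() => resolve(null), ms));
  try {
    return await Promise.race([contextService.getContext(question), timeout]);
  } catch {
    return null; // a ContextService error also falls back to no context
  }
}
```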

Note the sample files attached above with the top 40 questions.

I can only view batch jobs (which I believe are distinct from evaluations in some LLM systems) right now by signing in and getting them from the Anthropic Console. I've attached the sample you created on Dec 23, saved as plain .txt because I couldn't attach JSON to this post.

Yes, issue #41 is about logging them to the database, which is very much needed for the evaluation process.

One thing that confuses me about the evaluation file is that the question itself (from the Problem Details column in the input file) is not in the output, and neither is the referring URL (if provided). Both were included previously in the batch output. Ideally it should also include the evaluation tag so that batch output can be distinguished from user output (one approach is sketched below).

msgbatch_01X6PuvLdDrjzmfMHJpA57H7_results.txt
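One way to carry those fields through: since the batch APIs echo back each request's custom_id, keep a map from custom_id to the input row and merge when processing the results file. A sketch; the field names and result shape are assumptions:

```js
// Merge batch results back with input rows by custom_id so the output keeps
// the question, referring URL, and evaluation tag. Field names are assumed.
function mergeResults(rows, results) {
  const inputById = new Map(rows.map((row, i) => [`row-${i}`, row]));
  return results.map((r) => ({
    'Problem Details': inputById.get(r.custom_id)?.['Problem Details'] ?? '',
    'URL': inputById.get(r.custom_id)?.URL ?? '',
    'Evaluation': true, // distinguishes batch output from user output
    'Answer': r.answer, // assumed shape of the parsed results file
  }));
}
```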

@ryanhyma (Collaborator)

okay, I'll update the process.
