Upgrade openhands-aci to 0.1.7 #6123
base: main
@mamoodi seems the eval job failed again :( any idea?
Aha, I was just asking about eval https://github.com/All-Hands-AI/openhands-aci/pull/45/files#r1905796005 I'm curious if it shows anything.
Hey team. Let me get back to this. You can't run on a fork unfortunately (forks don't have permission to access secrets).
Triggered eval on 30 instances.
When running via the UI I see the hidden count is shown, but looking into the output of openai__claude-3-5-sonnet-20241022-1736272164.5043015.json Here's the repo at the base commit, which includes hidden dirs e.g.
@mamoodi in the evaluation job did we run a
@ryanhoangt Just a quick thought: I see here that the hidden message is added as a second element in the list, not a continuation of the first string. Is that necessary? Maybe it should be part of a single string; otherwise it seems we lose it somewhere along the way, where the code assumes there can be only one element (maybe in the agent?)
@enyst Can you elaborate a bit, maybe with an example? I'm not sure I understand your concern 😅 Here's what the output looks like, which makes sense to me actually:
No worries, it's not a concern, it was just a guess as to why it may "lose" the second string along the way. Looks like it was a bad guess. It seems you found the actual issue!
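For anyone reading along later, here is a minimal sketch of the kind of failure that guess described (it turned out not to be the cause here). Everything in it is hypothetical: the function names, the placeholder strings, and the list shape are illustrative assumptions, not code or output from openhands-aci or the agent.

```python
from typing import List


def first_only(parts: List[str]) -> str:
    """Hypothetical consumer that assumes the list has exactly one element."""
    return parts[0] if parts else ""


def join_all(parts: List[str]) -> str:
    """Hypothetical consumer that keeps every element of the list."""
    return "\n".join(parts)


# Placeholder observation: the main listing plus a separate note about
# hidden items, stored as a second list element (assumed shape).
observation = [
    "<listing of visible files>",
    "<note: some hidden items were excluded>",
]

# A consumer that reads only the first element silently drops the note.
print("hidden" in first_only(observation))  # False
# Joining all elements preserves it.
print("hidden" in join_all(observation))  # True
```

If some consumer did behave like `first_only`, emitting a single concatenated string would sidestep the problem, which is what the suggestion above amounted to.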
Ran an eval on the 30 instances above locally; the result looks reasonable (baseline got 13/30). CC @xingyaoww
@ryanhoangt is this the result AFTER we fixed the ordering issue?
No, the ordering fix doesn't go into this release. This only contains your fix.
Can we bring in the ordering fix too? We can directly bump this to 0.1.8
End-user friendly description of the problem this fixes or functionality that this introduces
Give a summary of what the PR does, explaining any non-trivial design decisions
This PR is to:
CC: @xingyaoww
Link of any specific issues this addresses