WorkArena ‐ Human Evaluation Study
Welcome to the WorkArena human evaluation study.
Thank you for agreeing to participate in our study. Before continuing, please fill out:
- The consent form
- The demographics survey
If you have been provided with a preconfigured laptop, skip this step.
Open a new terminal. Ask the session lead for the password and substitute it for PASSWORD_HERE in the following command. Then run the command to set up the environment:
export SNOW_INSTANCE_URL="https://dev253956.service-now.com/" && \
export SNOW_INSTANCE_UNAME="admin" && \
export SNOW_INSTANCE_PWD='PASSWORD_HERE' && \
mkdir workarena-human-eval && cd workarena-human-eval && \
if [[ $(uname -m) == 'arm64' ]]; then \
curl -Ls https://micro.mamba.pm/api/micromamba/osx-arm64/latest | tar -xvj bin/micromamba; \
else \
curl -Ls https://micro.mamba.pm/api/micromamba/osx-64/latest | tar -xvj bin/micromamba; \
fi && \
export MAMBA_ROOT_PREFIX=$(pwd)/micromamba && \
eval "$(./bin/micromamba shell hook -s posix)" && \
micromamba env create -n "human-eval" python=3.12 --channel conda-forge -y && \
micromamba activate human-eval && \
pip install --upgrade browsergym-core==0.3.4 && \
pip install --upgrade https://alexdrouin.com/workarena.zip && \
playwright install && \
export PATH="$PATH:$MAMBA_ROOT_PREFIX/envs/human-eval/bin" && \
echo "Installation complete."
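Before moving on, it can help to confirm that the three SNOW_* variables from the command above are set in the current terminal and that the PASSWORD_HERE placeholder was actually replaced. The following is a minimal sketch, not part of the WorkArena tooling; the function name check_snow_env is hypothetical.

```shell
# Hypothetical helper (not part of WorkArena): checks the variables exported
# by the setup command. Prints one line per problem; returns 1 if any is found.
check_snow_env() {
  problems=0
  for name in SNOW_INSTANCE_URL SNOW_INSTANCE_UNAME SNOW_INSTANCE_PWD; do
    # Look up the variable whose name is stored in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      problems=1
    fi
  done
  # The placeholder must have been replaced with the real password.
  if [ "${SNOW_INSTANCE_PWD:-}" = "PASSWORD_HERE" ]; then
    echo "placeholder not replaced: SNOW_INSTANCE_PWD"
    problems=1
  fi
  return "$problems"
}

check_snow_env || echo "Fix the variables listed above before continuing."
```

If a variable is reported, re-export it in the same terminal (for example, the SNOW_INSTANCE_PWD line with the real password) rather than rerunning the whole setup command.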
Then, run the following command to verify that everything works:
workarena-human-eval --email YOUR_EMAIL --log test.log --curriculum random
If you see something like this, it works. You can kill the program for now (CTRL-C).
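If the shell instead reports that workarena-human-eval cannot be found, the environment's bin directory is probably not on PATH in the current terminal (for example, because a new terminal was opened after installation). Below is a small sketch for checking this; require_cmd is a hypothetical helper name, and command -v is the standard POSIX way to look up a command.

```shell
# Hypothetical helper: succeeds and prints the resolved path if the command
# is available in the current shell, otherwise prints a message and fails.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $(command -v "$1")"
  else
    echo "not found: $1"
    return 1
  fi
}

require_cmd workarena-human-eval || echo "Activate the human-eval environment in this terminal and try again."
```

In a fresh terminal, this typically means changing into the workarena-human-eval directory and repeating the three SNOW_* exports, the export MAMBA_ROOT_PREFIX line, the micromamba shell hook line, and micromamba activate human-eval from the setup command.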
Please listen to the following presentation, which is meant to introduce everyone to our project, the ServiceNow platform, and our evaluation console.
workarena_human_eval.mp4
Now, you will spend 15 minutes experimenting with a real ServiceNow instance. Feel free to navigate the menus, create entries in lists, fill out forms, delete records, etc.
- Navigate to: https://researchuicopilotdemo.service-now.com/
- Username: admin
- Password: Ask the session lead
Now that you've undergone our standardized training, you may start solving tasks from the benchmark.
*If you are participating in the May 30, 2024 session, please get your curriculum file here.
Rules
Please help us standardize the study by following these rules:
- You may use a note-taking app to keep track of any information you need
- You can use any tool you want to make numerical calculations (e.g., calculator, python interpreter, etc.)
- You may use multiple tabs, but please use the provided button (+) to open them
- We will log the time taken to solve the tasks, so please stay as focused as possible. If you need a break, kill the program and restart it when you are ready.
Run the following command to start the evaluation tool:
workarena-human-eval --email YOUR_EMAIL --log ~/LASTNAME.log --curriculum YOUR_CURRICULUM_FILE
Important Notes:
- YOUR_CURRICULUM_FILE should point to the curriculum file provided by the session lead. This file contains the list of tasks assigned to you.
- Your progress is checkpointed. You may kill the program at any time and restart from where you left off.
- It is normal for some tasks to take a long time to load. This is due to database preparation.
- If, for any reason, the page does not load properly or appears to have crashed, please restart the program.
If you get to this point, thank you so much! We are very grateful for your contribution and will make sure to acknowledge it in the paper. Please send the LASTNAME.log file from your home directory to [email protected]. You are done!
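Before sending, a quick check that the log file exists and is not empty can avoid a second round trip. This is a sketch that assumes the tool was started with --log ~/LASTNAME.log as shown above; check_log is a hypothetical helper name.

```shell
# Hypothetical helper: verifies that a log file exists and is non-empty.
check_log() {
  if [ ! -f "$1" ]; then
    echo "no such file: $1"
    return 1
  elif [ ! -s "$1" ]; then
    echo "file is empty: $1"
    return 1
  fi
  echo "ok: $1 ($(wc -c < "$1" | tr -d ' ') bytes)"
}

# Replace LASTNAME with the name used when starting the evaluation tool.
check_log "$HOME/LASTNAME.log" || echo "Check the --log path you used."
```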