WorkArena ‐ Human Evaluation Study

Welcome to the WorkArena human evaluation study.

Step 1: Fill out the consent form and survey

Thank you for agreeing to participate in our study. Before continuing, please fill out:

The consent form
The demographics survey

Step 2: Set up your environment

If you are being provided a preconfigured laptop, skip this step.

Open a new terminal. Ask the session lead for the password and substitute it for PASSWORD_HERE in the following command. Then run the command to set up the environment:

export SNOW_INSTANCE_URL="https://dev253956.service-now.com/" && \
export SNOW_INSTANCE_UNAME="admin" && \
export SNOW_INSTANCE_PWD='PASSWORD_HERE' && \
mkdir workarena-human-eval && cd workarena-human-eval && \
if [[ $(uname -m) == 'arm64' ]]; then \
  curl -Ls https://micro.mamba.pm/api/micromamba/osx-arm64/latest | tar -xvj bin/micromamba; \
else \
  curl -Ls https://micro.mamba.pm/api/micromamba/osx-64/latest | tar -xvj bin/micromamba; \
fi && \
export MAMBA_ROOT_PREFIX=$(pwd)/micromamba && \
eval "$(./bin/micromamba shell hook -s posix)" && \
micromamba env create -n "human-eval" python=3.12 --channel conda-forge -y && \
micromamba activate human-eval && \
pip install --upgrade browsergym-core==0.3.4 && \
pip install --upgrade https://alexdrouin.com/workarena.zip && \
playwright install && \
export PATH="$PATH:$MAMBA_ROOT_PREFIX/envs/human-eval/bin" && \
echo "Installation complete."

Then, run the following command to verify that everything works:

workarena-human-eval --email YOUR_EMAIL --log test.log --curriculum random

If you see something like this, it works. You can kill the program for now (CTRL-C).
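If the command fails to start, one thing worth checking is whether the exports from the setup command are still present in the current terminal, since they do not carry over to a new one. The following is a minimal sketch, not part of the official tooling: check_snow_env is a hypothetical helper name, and it only assumes the three variable names and the PASSWORD_HERE placeholder used in the setup command above.

```shell
# Hypothetical helper (not part of WorkArena): reports every required variable
# that is unset, empty, or still set to the PASSWORD_HERE placeholder.
check_snow_env() {
  missing=""
  for name in SNOW_INSTANCE_URL SNOW_INSTANCE_UNAME SNOW_INSTANCE_PWD; do
    # Portable indirect lookup of the variable whose name is in $name
    eval "value=\${$name:-}"
    if [ -z "$value" ] || [ "$value" = "PASSWORD_HERE" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "Missing or placeholder:$missing"
    return 1
  fi
  echo "Environment variables look set."
}

check_snow_env || echo "Re-run the exports from the setup command in this terminal."
# Also confirm that the evaluation tool is on the PATH
command -v workarena-human-eval >/dev/null || echo "workarena-human-eval not found; is the human-eval environment active?"
```

This check does not contact the ServiceNow instance; it only inspects the local shell.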

Step 3: Training: Introductory presentation

Please listen to the following presentation, which is meant to introduce everyone to our project, the ServiceNow platform, and our evaluation console.

workarena_human_eval.mp4

Step 4: Training: Self-exploration in a real instance

Now, you will spend 15 minutes experimenting with a real ServiceNow instance. Feel free to navigate the menus, create entries in lists, fill out forms, delete records, etc.

Navigate to: https://researchuicopilotdemo.service-now.com/
Username: admin
Password: Ask the session lead

Step 5: Human evaluation

Now that you've undergone our standardized training, you may start solving tasks from the benchmark.

If you are participating in the May 30, 2024 session, please get your curriculum file here.

Rules:

Please help us standardize the study by following these rules:

You may use a note-taking app to keep track of any information you need.
You may use any tool you want for numerical calculations (e.g., a calculator or a Python interpreter).
You may use multiple tabs, but please use the provided button (+) to open them.
We will log the time taken to solve the tasks, so please remain as focused as possible. If you need a break, kill the program and restart it when ready.

Run the following command to start the evaluation tool:

workarena-human-eval --email YOUR_EMAIL --log ~/LASTNAME.log --curriculum YOUR_CURRICULUM_FILE

Important Notes:

YOUR_CURRICULUM_FILE should point to the curriculum file provided by the session lead. It contains the list of tasks assigned to you.
Your progress is checkpointed, so you may kill the program at any time and restart from where you left off.
Some tasks take a long time to load. This is normal and is due to database preparation.
If, for any reason, the page does not load properly or appears to have crashed, please restart the program.
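To avoid starting a timed session with a mistyped path, it can help to check the curriculum file and log location before launching. The sketch below assumes the curriculum is a regular file on disk, as described in the notes above. preflight is a hypothetical helper, not part of the evaluation tool, and the placeholders are the same as in the command above.

```shell
# Hypothetical helper (not part of WorkArena): succeeds only if the curriculum
# file is readable and the directory that will hold the log is writable.
preflight() {
  curriculum="$1"
  log="$2"
  if [ ! -r "$curriculum" ]; then
    echo "Curriculum file not found or unreadable: $curriculum"
    return 1
  fi
  if [ ! -w "$(dirname "$log")" ]; then
    echo "Cannot write log in: $(dirname "$log")"
    return 1
  fi
  echo "OK"
}

# Substitute your own values for the placeholders before running.
if preflight YOUR_CURRICULUM_FILE ~/LASTNAME.log; then
  workarena-human-eval --email YOUR_EMAIL --log ~/LASTNAME.log --curriculum YOUR_CURRICULUM_FILE
fi
```

The evaluation tool is launched only when both checks pass; otherwise the helper prints which path to fix.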

Step 6: Hand in your results

If you get to this point, thank you so much! We are very grateful for your contribution and will make sure to acknowledge it in the paper. Please send the LASTNAME.log file from your home directory to [email protected]. You are done!
