SafeArena is a benchmark for agent safety

Project description

SafeArena

🤗Dataset	📄Paper (TBA)	🌐Website
🏆Leaderboard	📦Environments	⏏️Submission (TBA)

SafeArena: Evaluating the Safety of Autonomous Web Agents
Ada Defne Tur*, Nicholas Meade*, Xing Han Lù*, Alejandra Zambrano†, Arkil Patel†,
Esin Durmus, Spandana Gella, Karolina Stańczak, Siva Reddy
*Equal contribution, †Core Technical Contribution

Installation

First, clone the repository and create a virtual environment with Python 3.10 or later:

git clone https://github.com/McGill-NLP/safearena.git

cd safearena/
python -m venv venv
source venv/bin/activate

Then, install the required packages:

# install the exact dependencies to reproduce the experiments
pip install -r requirements.txt

# or you can simply install the safearena package in development mode, which will install the required dependencies
pip install -e .

# Install the Playwright browser binaries
playwright install

Task splits download

First, request access to the SafeArena dataset on the Hugging Face Hub. Once you have access, you can log in using the huggingface_hub CLI:

pip install huggingface-hub
huggingface-cli login

Then, download the task splits from the dataset repository using the hf_hub_download function in Python:

from huggingface_hub import hf_hub_download

# Download the safe.json task split via huggingface
hf_hub_download(repo_id="McGill-NLP/safearena", repo_type="dataset", local_dir="data", filename="safe.json")
# Download the harm.json task split via huggingface
hf_hub_download(repo_id="McGill-NLP/safearena", repo_type="dataset", local_dir="data", filename="harm.json")

The required task splits are now in the data/ directory, relative to your current working directory.
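A quick sanity check after downloading could look like the sketch below. The helper name summarize_split is hypothetical, and the schema of the split files is not documented here, so the sketch only assumes each file is valid JSON whose top level is a list or object, and reports its type and length.

```python
import json
from pathlib import Path


def summarize_split(path):
    """Load a downloaded task split and return (top-level type name, number of entries).

    Assumption: the file is plain JSON with a list or dict at the top level;
    the structure of individual tasks is not inspected.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    return type(data).__name__, len(data)


if __name__ == "__main__":
    for name in ("safe.json", "harm.json"):
        path = Path("data") / name
        if path.exists():
            kind, size = summarize_split(path)
            print(f"{path}: {kind} with {size} entries")
        else:
            print(f"{path}: missing, re-run hf_hub_download")
```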

Experiments

API Keys and Base URLs as Environment Variables

First, set the API keys and base URLs as environment variables for each service you want to use:

export OPENAI_ORG_ID="your-openai-org-id"

# API keys
export OPENAI_API_KEY="your-openai-api-key"
export TOGETHER_API_KEY="your-together-api-key"
export VLLM_API_KEY="your-vllm-api-key"
export OPENROUTER_API_KEY="your-openrouter-api-key"

export VLLM_BASE_URL="https://vllm.mcgill-nlp.com"
export TOGETHER_BASE_URL="https://api.together.xyz/v1"
export OPENROUTER_BASE_URL="https://openrouter.ai/api/v1"

OPENAI_ORG_ID is the organization ID you use for the OpenAI API; you can find it in the OpenAI dashboard. Together and vLLM serve the Llama and Qwen backbones, while OpenRouter serves Claude. You only need to set the API keys and base URLs for the services you actually use.
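Before launching, it can help to check that the variables for your chosen service are set. This is a minimal sketch: the service labels ("openai", "together", "vllm", "openrouter") and the helper missing_vars are hypothetical and are not values accepted by SafeArena's scripts, and treating OPENAI_ORG_ID as required for OpenAI is an assumption.

```python
import os

# Variables per service, taken from the exports above.
# The dictionary keys are hypothetical labels used only in this sketch.
REQUIRED_VARS = {
    "openai": ["OPENAI_API_KEY", "OPENAI_ORG_ID"],
    "together": ["TOGETHER_API_KEY", "TOGETHER_BASE_URL"],
    "vllm": ["VLLM_API_KEY", "VLLM_BASE_URL"],
    "openrouter": ["OPENROUTER_API_KEY", "OPENROUTER_BASE_URL"],
}


def missing_vars(service, env=None):
    """Return the variables required by `service` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS[service] if not env.get(name)]


if __name__ == "__main__":
    for service in REQUIRED_VARS:
        missing = missing_vars(service)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{service}: {status}")
```

Passing an explicit dictionary as env makes the check easy to test without touching the real process environment.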

Manually setting up environment variables

To select the task, set the environment variable SAFEARENA_TASK to one of the following:

# if you want to run the safe task on human data...
export SAFEARENA_TASK="safe"
# ... or if you want to run the harmful task on human data...
export SAFEARENA_TASK="harm"

You also need to specify the suffix and the domain name:

export DOMAIN_NAME="your-domain.com"
export SUFFIX="aa-1"

Then, export the WebArena environment variables for the sites you want to use:

export WA_HOMEPAGE="https://sa-homepage-${SUFFIX}.${DOMAIN_NAME}"
export WA_SHOPPING="https://sa-shopping-${SUFFIX}.${DOMAIN_NAME}/"
export WA_SHOPPING_ADMIN="https://sa-shopping-admin-${SUFFIX}.${DOMAIN_NAME}/admin"
export WA_REDDIT="https://sa-forum-${SUFFIX}.${DOMAIN_NAME}"
export WA_GITLAB="https://sa-gitlab-${SUFFIX}.${DOMAIN_NAME}"
export WA_FULL_RESET="https://sa-reset-${SUFFIX}.${DOMAIN_NAME}"
# These are not functional sites; they are defined only for compatibility with BrowserGym
export WA_WIKIPEDIA="https://sa-wikipedia-${SUFFIX}.${DOMAIN_NAME}/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing"
export WA_MAP="https://sa-map-${SUFFIX}.${DOMAIN_NAME}"

Note that these URLs differ from WebArena's: they point to Docker containers specific to SafeArena, NOT the WebArena ones. Do not use URLs from your WebArena containers, if you have them, except for Wikipedia and the homepage. WA_MAP is exported only because BrowserGym requires it; SafeArena does not use it.
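If you prefer to generate or double-check these values from Python, the sketch below reproduces the URL pattern of the exports above. The function name safearena_urls is hypothetical; it only formats strings and does not contact the sites.

```python
def safearena_urls(suffix, domain_name):
    """Build the WA_* values following the export pattern shown above."""

    def host(name):
        # e.g. host("forum") -> "https://sa-forum-aa-1.your-domain.com"
        return f"https://sa-{name}-{suffix}.{domain_name}"

    return {
        "WA_HOMEPAGE": host("homepage"),
        "WA_SHOPPING": host("shopping") + "/",
        "WA_SHOPPING_ADMIN": host("shopping-admin") + "/admin",
        "WA_REDDIT": host("forum"),
        "WA_GITLAB": host("gitlab"),
        "WA_FULL_RESET": host("reset"),
        # Placeholders required by BrowserGym; SafeArena does not use them.
        "WA_WIKIPEDIA": host("wikipedia")
        + "/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing",
        "WA_MAP": host("map"),
    }


if __name__ == "__main__":
    # Print shell export lines that can be pasted into a terminal.
    for key, value in safearena_urls("aa-1", "your-domain.com").items():
        print(f'export {key}="{value}"')
```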

Note: You can optionally export SAFEARENA_DATA_DIR to specify the directory where the task data is stored. It defaults to ./data.
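The sketch below shows how SAFEARENA_TASK and SAFEARENA_DATA_DIR could combine to locate a split file. It assumes the file is named after the task (safe.json or harm.json, as downloaded earlier); the helper resolve_task_file is hypothetical and is not necessarily how SafeArena resolves the path internally.

```python
import os
from pathlib import Path

VALID_TASKS = ("safe", "harm")


def resolve_task_file(env=None):
    """Return the expected path of the split selected by SAFEARENA_TASK.

    Assumption: the split is stored as <SAFEARENA_DATA_DIR>/<task>.json,
    with SAFEARENA_DATA_DIR defaulting to ./data.
    """
    env = os.environ if env is None else env
    task = env.get("SAFEARENA_TASK")
    if task not in VALID_TASKS:
        raise ValueError(f"SAFEARENA_TASK must be one of {VALID_TASKS}, got {task!r}")
    data_dir = Path(env.get("SAFEARENA_DATA_DIR", "./data"))
    return data_dir / f"{task}.json"
```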

Using pre-defined environment variables

Alternatively, you can source one of the pre-defined environment variable files in vars/:

# the suffix indicates the user and the instance number
# for example, if you are user aa and you want to run on instance 1:
export DOMAIN_NAME="your-domain.com"
export SUFFIX="aa-1"

# if you want to run the "safe" task based on the SUFFIX:
source vars/safe-cf.sh

# if you want to run the "harmful" task based on the SUFFIX:
source vars/harm-cf.sh

Launching experiments

To run an experiment, use the scripts/launch_experiment.py script. For example, to launch an experiment with the GPT-4o-mini backbone on the harmful task, using your domain and suffix:

export DOMAIN_NAME="your-domain.com"
export SUFFIX="aa-1"

source vars/harm-cf.sh
python scripts/launch_experiment.py --backbone gpt-4o-mini

To continue an existing experiment, use the --relaunch flag. You can set the root AgentLab results directory with the AGENTLAB_EXP_ROOT environment variable:

export AGENTLAB_EXP_ROOT="/path/to/agentlab/results"  # by default, it will be "~/agentlab_results"

# relaunch an experiment
python scripts/launch_experiment.py --backbone gpt-4o-mini --relaunch "<name_of_experiment>"
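To find the name to pass to --relaunch, you could list the entries under the results root. This sketch assumes each experiment is stored as its own subdirectory of AGENTLAB_EXP_ROOT and that the directory name is what --relaunch expects; both are assumptions, and the helpers experiment_root and list_experiments are hypothetical.

```python
import os
from pathlib import Path


def experiment_root(env=None):
    """Resolve AGENTLAB_EXP_ROOT, falling back to ~/agentlab_results as described above."""
    env = os.environ if env is None else env
    return Path(env.get("AGENTLAB_EXP_ROOT", "~/agentlab_results")).expanduser()


def list_experiments(root):
    """Return subdirectory names of `root`, most recently modified first.

    Assumption: one subdirectory per experiment. Returns [] if `root` does not exist.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    dirs = [p for p in root.iterdir() if p.is_dir()]
    dirs.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [p.name for p in dirs]


if __name__ == "__main__":
    for name in list_experiments(experiment_root()):
        print(name)
```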

To run tasks in parallel, you can use Ray:

python scripts/launch_experiment.py --backbone gpt-4o-mini --parallel ray -n 4
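If you drive several runs from Python (for example, one per backbone), a small wrapper can assemble the command from the flags shown above. This is a sketch: build_launch_cmd and launch are hypothetical helpers, they use only --backbone, --relaunch, --parallel and -n as documented here, and the meaning of -n beyond the Ray example is not specified in this README.

```python
import subprocess
import sys


def build_launch_cmd(backbone, relaunch=None, parallel=None, n=None):
    """Assemble a scripts/launch_experiment.py command using only the flags shown above."""
    # sys.executable runs the script with the same interpreter (e.g. the active venv).
    cmd = [sys.executable, "scripts/launch_experiment.py", "--backbone", backbone]
    if relaunch is not None:
        cmd += ["--relaunch", relaunch]
    if parallel is not None:
        cmd += ["--parallel", parallel]
    if n is not None:
        cmd += ["-n", str(n)]  # value passed to -n, as in the Ray example above
    return cmd


def launch(**kwargs):
    """Run the command; SAFEARENA_TASK, WA_* and API keys are inherited from this process."""
    subprocess.run(build_launch_cmd(**kwargs), check=True)
```

Because the child process inherits the current environment, source vars/harm-cf.sh (or export the variables) in the shell before starting Python, e.g. then call launch(backbone="gpt-4o-mini", parallel="ray", n=4).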

Reviewing experiments with agent-xray

To visualize the agent's behavior, you can use the agent_xray.py tool, derived from AgentLab:

python apps/agent_xray.py --results_dir "<path_to_results_dir>" --port "<port>"

Project details

Release history

This version

0.1.0

Feb 26, 2025

Download files


Source Distribution

safearena-0.1.0.tar.gz (12.2 kB)

Uploaded Feb 26, 2025 Source

Built Distribution


safearena-0.1.0-py3-none-any.whl (11.0 kB)

Uploaded Feb 26, 2025 Python 3

File details

Details for the file safearena-0.1.0.tar.gz.

File metadata

Download URL: safearena-0.1.0.tar.gz
Upload date: Feb 26, 2025
Size: 12.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.0

File hashes

Hashes for safearena-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`db518deeb4a98907d4a6d400746d49cb6e892cf01b3be5ddbbaf07fa71f482df`
MD5	`8d401fa499a2497365340de07f2ff586`
BLAKE2b-256	`d6d2784167dd0acf2780a564965b745e38a243c92e1624c87766b6b826238623`


File details

Details for the file safearena-0.1.0-py3-none-any.whl.

File metadata

Download URL: safearena-0.1.0-py3-none-any.whl
Upload date: Feb 26, 2025
Size: 11.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.0

File hashes

Hashes for safearena-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`499cf4efbb0868f710e7c4dd8121d8ba6b7d5ca12de1d5760929899581a0c2fc`
MD5	`0fd01fcea2395496cc122c4714c21441`
BLAKE2b-256	`7d4f13e4611ce8981bbe0ed10dc13dfdd055918b97f32109629562466e743098`
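To check a downloaded distribution against the published digests, you can compute its SHA256 locally. The sketch below uses Python's standard hashlib module and the SHA256 values copied from the tables above; the helper names sha256_of and verify are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the tables above.
EXPECTED_SHA256 = {
    "safearena-0.1.0.tar.gz": "db518deeb4a98907d4a6d400746d49cb6e892cf01b3be5ddbbaf07fa71f482df",
    "safearena-0.1.0-py3-none-any.whl": "499cf4efbb0868f710e7c4dd8121d8ba6b7d5ca12de1d5760929899581a0c2fc",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's SHA256 matches the published digest for its file name."""
    return sha256_of(path) == EXPECTED_SHA256[Path(path).name]
```

pip can enforce the same check at install time through its hash-checking mode (a requirements line with --hash=sha256:...).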

