
Main package for developing agents and experiments


AgentLab is a framework for developing and evaluating agents on a variety of benchmarks supported by BrowserGym. This includes:

  • WebArena
  • WorkArena.L1, L2, L3
  • VisualWebArena (coming soon...)
  • MiniWoB

The framework enables the design of rich hyperparameter spaces and the launch of parallel experiments using ablation studies or random searches. It also provides agent_xray, a visualization tool for inspecting experiment results through a custom gradio interface.

Install agentlab

This repo is intended for testing and developing new agents, hence we clone and install using the -e flag.

git clone git@github.com:ServiceNow/AgentLab.git
cd AgentLab
pip install -e .

Set Environment Variables

export AGENTLAB_EXP_ROOT=<root directory of experiment results>  # defaults to $HOME/agentlab_results
export OPENAI_API_KEY=<your openai api key> # if openai models are used
export HUGGINGFACEHUB_API_TOKEN=<your huggingfacehub api token> # if huggingface models are used
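
As a quick sanity check before launching anything, the variables above can be resolved from Python. This is a minimal sketch and not part of AgentLab: the helper names resolve_exp_root and missing_keys are hypothetical, and the fallback path simply mirrors the default mentioned in the comment above.

```python
import os
from pathlib import Path


def resolve_exp_root(env=os.environ):
    """Return AGENTLAB_EXP_ROOT, or $HOME/agentlab_results when it is unset."""
    default = Path.home() / "agentlab_results"
    return Path(env.get("AGENTLAB_EXP_ROOT") or default)


def missing_keys(env=os.environ, use_openai=True, use_huggingface=False):
    """List the API-key variables that are needed but not set."""
    needed = []
    if use_openai:
        needed.append("OPENAI_API_KEY")
    if use_huggingface:
        needed.append("HUGGINGFACEHUB_API_TOKEN")
    return [name for name in needed if not env.get(name)]


if __name__ == "__main__":
    print("results directory:", resolve_exp_root())
    print("missing variables:", missing_keys())
```

Passing a plain dict instead of os.environ makes it easy to try the helpers without touching the real environment.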

Use an assistant to work for you (at your own cost and risk)

agentlab-assistant --start_url https://www.google.com

Prepare Benchmarks

Depending on which benchmark you use, there are some prerequisites:

MiniWoB
export MINIWOB_URL="file://$HOME/dev/miniwob-plusplus/miniwob/html/miniwob/"
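
The value above is a file:// URL pointing at a local checkout, and it ends with a slash. As an illustration, the following sketch builds such a URL from an absolute POSIX path. The helper name miniwob_url and the /home/me/... path are hypothetical; adjust the path to wherever your copy of miniwob-plusplus lives.

```python
from urllib.parse import quote


def miniwob_url(html_dir):
    """Build a file:// URL with a trailing slash from an absolute POSIX path."""
    if not html_dir.startswith("/"):
        raise ValueError("MINIWOB_URL needs an absolute path")
    # quote() percent-encodes characters such as spaces but keeps "/" intact.
    return "file://" + quote(html_dir.rstrip("/")) + "/"


print(miniwob_url("/home/me/dev/miniwob-plusplus/miniwob/html/miniwob/"))
# prints file:///home/me/dev/miniwob-plusplus/miniwob/html/miniwob/
```

The three slashes after "file:" are expected: two belong to the scheme separator and the third starts the absolute path.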
WorkArena

See the detailed instructions in the WorkArena GitHub repository.

At a glance:

  1. Sign in and request a Washington instance.

  2. Once the instance is ready, you should see <your instance URL> and <your-instance-password>.

  3. Add these to your .bashrc (or .zshrc) and source it (note: make sure that all variables are in single quotes unless you happen to have a password with a single quote in it)

    export SNOW_INSTANCE_URL='https://<your-instance-number>.service-now.com/'
    export SNOW_INSTANCE_UNAME='admin'
    export SNOW_INSTANCE_PWD='<your-instance-password>'
    
  4. Finally, run these commands:

    pip install browsergym-workarena
    playwright install
    workarena-install
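
The quoting note in step 3 matters because an unquoted password containing characters such as $ or ! can be altered by the shell. If you would rather not quote by hand, Python's standard shlex.quote produces a shell-safe form, including for values that contain a single quote. This is only a sketch: export_line is a hypothetical helper and the passwords are made-up examples.

```python
import shlex


def export_line(name, value):
    """Format `export NAME=value` with shell-safe quoting of the value."""
    return f"export {name}={shlex.quote(value)}"


print(export_line("SNOW_INSTANCE_PWD", "pa$$word!"))
# prints export SNOW_INSTANCE_PWD='pa$$word!'
print(export_line("SNOW_INSTANCE_PWD", "it's-a-secret"))
# prints export SNOW_INSTANCE_PWD='it'"'"'s-a-secret'
```

The second line shows the usual workaround for an embedded single quote: close the quoted string, add the quote inside double quotes, then reopen the single-quoted string.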
    
WebArena on AWS TODO
WebArena on Azure TODO

Launch experiments

Create your agent or import an existing one:

from agentlab.agents.generic_agent.agent_configs import AGENT_4o

Run the agent on a benchmark:

study_name, exp_args_list = run_agents_on_benchmark(AGENT_4o, benchmark)
study_dir = make_study_dir(RESULTS_DIR, study_name)
run_experiments(n_jobs, exp_args_list, study_dir)

Use main.py to launch experiments with a variety of options. It works like a lazy CLI that is more convenient than a real one: comment and uncomment the lines you need, or modify them at will (but don't push those changes to the repo).

Debugging

For debugging, run experiments using n_jobs=1 and use VSCode debug mode. This will allow you to stop on breakpoints. To prevent the debugger from stopping on errors when running multiple experiments directly in VSCode, set enable_debug = False in ExpArgs.

Parallel jobs

Running one agent on one task corresponds to one job. When conducting ablation studies or random searches on hundreds of tasks with multiple seeds, this can lead to more than 10,000 jobs. It is thus crucial to execute them in parallel. An agent spends most of its time waiting for the LLM server to return results or for the web server to update the page, so each job uses little CPU. Hence, you can run 10-50 jobs in parallel on a single computer, depending on how much RAM is available.
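
To make the arithmetic and the waiting argument concrete, here is a small, self-contained sketch. It is not how AgentLab schedules jobs: it uses a standard-library thread pool, a sleep as a stand-in for waiting on a server, and hypothetical names (make_jobs, run_job, run_all). The counts of agents, tasks and seeds are invented for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def make_jobs(agents, tasks, seeds):
    """One job per (agent, task, seed) combination."""
    return list(product(agents, tasks, seeds))


def run_job(job):
    """Stand-in for a real run: sleeping mimics waiting on an LLM or web server."""
    time.sleep(0.01)
    return job


def run_all(jobs, n_jobs):
    """Run jobs concurrently with at most n_jobs in flight at once."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(run_job, jobs))


if __name__ == "__main__":
    jobs = make_jobs(agents=range(5), tasks=range(200), seeds=range(10))
    print(len(jobs))  # 5 * 200 * 10 = 10000
    start = time.perf_counter()
    run_all(jobs[:100], n_jobs=20)
    print(f"100 jobs took {time.perf_counter() - start:.2f}s")
```

Because each simulated job mostly waits, 20 workers finish 100 jobs in roughly the time of 5 sequential ones. Real agent runs also hold a browser in memory, which is why RAM rather than CPU tends to set the limit.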

AgentXray

While your experiments are running, you can inspect the results using:

agentlab-xray

You will be able to select the recent experiments in the directory AGENTLAB_EXP_ROOT and visualize the results in a gradio interface.

In the following order, select:

  • The experiment you want to visualize
  • The agent if there is more than one
  • The task
  • The seed

Once this is selected, you can see the trace of your agent on the given task. Click on the profiling image to select a step and observe the action taken by the agent.

Implement a new Agent

Get inspiration from the MostBasicAgent in agentlab/agents/most_basic_agent/most_basic_agent.py.

Create a new directory in agentlab/agents/ with the name of your agent.

Misc

If you want to download HF models more quickly:

pip install hf-transfer
pip install torch
export HF_HUB_ENABLE_HF_TRANSFER=1
