
Agent-as-Annotators: Structured Distillation of Web Agent Capabilities


This repository contains the code for the A3 framework, which uses LLMs to systematically generate synthetic web agent training data by decomposing the annotation process into three roles: Task Designer, Annotator, and Supervisor.
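The three-role decomposition can be pictured as a simple loop. This is a hypothetical illustration, not the package's code. In A3 the roles are played by LLMs; here each role is a plain Python callable so the control flow is visible. The names run_roles and Episode, the idea that personas seed the Task Designer, and the assumption that the Supervisor filters out rejected attempts are all assumptions made for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Episode:
    """One synthetic example: a task intent plus the actions recorded for it."""
    task: str
    actions: list[str] = field(default_factory=list)
    accepted: bool = False


def run_roles(
    design_task: Callable[[str], str],
    annotate: Callable[[str], list[str]],
    supervise: Callable[[str, list[str]], bool],
    personas: list[str],
) -> list[Episode]:
    """Hypothetical Task Designer -> Annotator -> Supervisor loop.

    Each callable stands in for an LLM-backed role (an assumption of this
    sketch); only episodes the Supervisor accepts are returned.
    """
    kept: list[Episode] = []
    for persona in personas:
        task = design_task(persona)    # Task Designer: propose a task intent
        actions = annotate(task)       # Annotator: attempt the task, record actions
        if supervise(task, actions):   # Supervisor: accept or reject the attempt
            kept.append(Episode(task, actions, accepted=True))
    return kept


# Toy usage with stub roles instead of real LLM calls.
episodes = run_roles(
    design_task=lambda persona: f"As {persona}, find the cheapest laptop",
    annotate=lambda task: ["click('search')", "fill('laptop')"] if "student" in task else [],
    supervise=lambda task, actions: len(actions) > 0,
    personas=["a student", "a sysadmin"],
)
print([e.task for e in episodes])  # ['As a student, find the cheapest laptop']
```

Keeping the roles as separate callables mirrors the point of the decomposition: each role can be prompted, swapped, or audited independently.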

Installation

pip install agent-as-annotators

Or install from source:

git clone https://github.com/McGill-NLP/agent-as-annotators.git
cd agent-as-annotators
pip install -e .
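To confirm that either install route worked, the standard library's importlib.metadata can report the installed version. The distribution name agent-as-annotators is taken from the pip command above; the helper name installed_version is hypothetical.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "agent-as-annotators") -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"agent-as-annotators {found}" if found else "agent-as-annotators is not installed")
```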

Quick Start: Evaluation

1. Serve a model with vLLM

vllm serve --config configs/vllm/Qwen3.5-9B.yaml
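Before starting an evaluation, it can help to check that the vLLM server is up. vLLM serves an OpenAI-compatible API, by default on port 8000, which lists served models at /v1/models. This sketch assumes the config above keeps the default host and port; the helper names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in payload.get("data", [])]


def served_models(base_url: str = "http://localhost:8000/v1", timeout: float = 5.0) -> list[str]:
    """Ask a running vLLM server which models it serves; return [] if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return parse_model_ids(json.load(resp))
    except (OSError, ValueError):  # connection errors, timeouts, malformed JSON
        return []


if __name__ == "__main__":
    print(served_models() or "no vLLM server reachable at localhost:8000")
```

An empty list means the server is not reachable yet (large models can take a while to load); otherwise compare the reported ids with the model the config is meant to serve.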

2. Run evaluation

a3-eval --benchmark webarena_test --model A3-qwen3.5-9b
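When comparing several models or benchmarks, a thin Python wrapper can queue a3-eval runs. This is a hypothetical convenience script that uses only the two flags shown above. Benchmark ids other than webarena_test are not listed in this README, so the usage sticks to that one.

```python
from __future__ import annotations

import shlex
import subprocess


def eval_command(benchmark: str, model: str) -> list[str]:
    """Build the a3-eval invocation for one benchmark/model pair."""
    return ["a3-eval", "--benchmark", benchmark, "--model", model]


def run_evals(pairs: list[tuple[str, str]], dry_run: bool = True) -> list[int]:
    """Run a3-eval for each pair in sequence; in dry-run mode only print commands."""
    codes = []
    for benchmark, model in pairs:
        cmd = eval_command(benchmark, model)
        print("$", shlex.join(cmd))
        codes.append(0 if dry_run else subprocess.run(cmd).returncode)
    return codes


if __name__ == "__main__":
    run_evals([("webarena_test", "A3-qwen3.5-9b")], dry_run=True)
```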

Pipeline: Generating A3-Synth

The A3 pipeline generates synthetic training data in 5 steps:

Step 1: Create personas

python scripts/create_personas.py

Step 2: Generate task intents (via exploration)

a3-explore
python scripts/generate_task_intents.py

Step 3: Create A3-Synth task configs

python scripts/create_synth_configs.py

Step 4: Collect trajectories

a3-synth --benchmark a3_synth --model gemini-3-pro

Step 5: Convert to training data

python scripts/convert_trajectories_to_json.py
python scripts/generate_rft_data.py
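The five steps can be chained with a small driver script. This is a sketch under two assumptions: it is run from the repository root, and each step consumes the previous step's outputs, so the steps must run in order. The start_step parameter for resuming partway through is a hypothetical convenience. The commands themselves are exactly the ones listed above.

```python
from __future__ import annotations

import shlex
import subprocess
import sys

PY = sys.executable  # run the scripts with the same interpreter as this driver

# (step name, commands) in pipeline order, using the commands listed above.
PIPELINE: list[tuple[str, list[list[str]]]] = [
    ("Create personas", [[PY, "scripts/create_personas.py"]]),
    ("Generate task intents", [["a3-explore"], [PY, "scripts/generate_task_intents.py"]]),
    ("Create A3-Synth task configs", [[PY, "scripts/create_synth_configs.py"]]),
    ("Collect trajectories", [["a3-synth", "--benchmark", "a3_synth", "--model", "gemini-3-pro"]]),
    ("Convert to training data", [
        [PY, "scripts/convert_trajectories_to_json.py"],
        [PY, "scripts/generate_rft_data.py"],
    ]),
]


def run_pipeline(start_step: int = 1, dry_run: bool = False) -> int:
    """Run steps start_step..5 in order and stop at the first failing command.

    Returns 0 on success, otherwise the failing command's exit code.
    """
    for number, (name, commands) in enumerate(PIPELINE, start=1):
        if number < start_step:
            continue  # skip earlier steps, e.g. when resuming
        print(f"Step {number}: {name}")
        for cmd in commands:
            print("  $", shlex.join(cmd))
            if dry_run:
                continue
            code = subprocess.run(cmd).returncode
            if code != 0:
                return code
    return 0


if __name__ == "__main__":
    run_pipeline(dry_run=True)  # set dry_run=False to actually execute
```

Stopping at the first non-zero exit code avoids feeding a later step the partial output of a failed one.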

Training

a3-train --config configs/train/qwen3.5-9b.json

Training uses supervised fine-tuning (SFT) with FSDP (Fully Sharded Data Parallel) to spread training across multiple GPUs. See configs/train/ for hyperparameters and configs/accelerate/ for the FSDP configuration.
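For readers new to SFT, a common way to build a training example is to concatenate prompt and target tokens and mask the prompt out of the loss. This README does not say whether a3-train masks prompts this way; the function name and token ids below are hypothetical. The value -100 is the default ignore_index of PyTorch's CrossEntropyLoss.

```python
from __future__ import annotations

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_sft_example(prompt_ids: list[int], response_ids: list[int]) -> dict[str, list[int]]:
    """Concatenate prompt and target tokens, supervising only the target.

    Prompt positions get IGNORE_INDEX, so the loss is computed only on the
    response (e.g. the agent's action), not on the observation/prompt.
    """
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return {
        "input_ids": input_ids,
        "labels": labels,
        "attention_mask": [1] * len(input_ids),
    }


example = build_sft_example([101, 7592, 102], [2054, 2003])
print(example["labels"])  # [-100, -100, -100, 2054, 2003]
```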

CLI Commands

Command            Description
a3-eval            Run evaluation on WebArena, VisualWebArena, WorkArena, and MiniWoB
a3-synth           Run trajectory collection for A3-Synth
a3-explore         Run environment exploration
a3-train           Fine-tune a model with SFT
a3-screen-utils    Screen session management utilities
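After installation, these commands should be on your PATH. A quick way to check is the standard library's shutil.which; the helper name missing_commands is hypothetical, and the command list is copied from the table above.

```python
from __future__ import annotations

import shutil

A3_COMMANDS = ["a3-eval", "a3-synth", "a3-explore", "a3-train", "a3-screen-utils"]


def missing_commands(commands: list[str] = A3_COMMANDS) -> list[str]:
    """Return the CLI commands that cannot be found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    missing = missing_commands()
    print("all A3 commands available" if not missing else f"missing: {', '.join(missing)}")
```

If commands are missing, the usual cause is that the environment where the package was installed is not the active one.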

Project Structure

agent-as-annotators/
  agent_as_annotators/       # Core package
    cli/                     # CLI entry points (eval, synth, explore, train)
    modeling.py              # Agent model wrapper (vLLM, Gemini, OpenAI)
    prompts/                 # All prompt templates
    judge/                   # Inverted evaluation protocol (Judge module)
    benchmarks/a3_synth/     # A3-Synth benchmark registration
    exploration/             # Exploration task registration
    utils/                   # Utilities
    configs/a3_synth/        # A3-Synth task configurations
  configs/
    model_configs.json       # Model registry
    train/                   # Training hyperparameters
    vllm/                    # vLLM serving configs
    accelerate/              # FSDP configs
  scripts/                   # Data pipeline scripts
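The schema of configs/model_configs.json is not documented here. If, as the name "Model registry" suggests, it is a JSON object keyed by model name (such as the A3-qwen3.5-9b passed to a3-eval), a lookup helper might look like this. Both the schema and the function names are assumptions.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_registry(path: str | Path = "configs/model_configs.json") -> dict:
    """Read the model registry from disk."""
    return json.loads(Path(path).read_text())


def lookup_model(registry: dict, name: str) -> dict:
    """Return the registry entry for name, with a helpful error if absent.

    ASSUMPTION: the registry is a JSON object keyed by model name; the real
    schema of configs/model_configs.json is not documented in this README.
    """
    try:
        return registry[name]
    except KeyError:
        known = ", ".join(sorted(registry)) or "(none)"
        raise KeyError(f"unknown model {name!r}; registered: {known}") from None
```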

Download files

Source Distribution

agent_as_annotators-0.1.0.tar.gz (72.7 kB)

Built Distribution

agent_as_annotators-0.1.0-py3-none-any.whl (83.2 kB)

File details

Details for the file agent_as_annotators-0.1.0.tar.gz.

File metadata

  • Download URL: agent_as_annotators-0.1.0.tar.gz
  • Upload date:
  • Size: 72.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agent_as_annotators-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        0dc96976826a2f4f77efa9215846592d0e85828b35e0dc6fb099f7d1316cda5f
MD5           063de4070a28ab473647f7a68b798f26
BLAKE2b-256   c9adc4d75e07dc02944d763981d20c66c94a349b840bda0a5fbc71304346fcc7

File details

Details for the file agent_as_annotators-0.1.0-py3-none-any.whl.

File hashes

Hashes for agent_as_annotators-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        ebaf6fe8a7805b332109aa17b51faffefe9a50257ffd15ad758300c13002d6ff
MD5           71fb46d70bec334521e009b0138645c0
BLAKE2b-256   ce45b56b98b2dd096007f8d77f7d82b397a7573ab9f4b7d4f7c8d6fb8b060668
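A downloaded release file can be checked against the SHA256 digests published above using hashlib. The digests are copied from the tables; the helper names sha256_of and verify are hypothetical.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# SHA256 digests published for the 0.1.0 release files.
EXPECTED_SHA256 = {
    "agent_as_annotators-0.1.0.tar.gz": "0dc96976826a2f4f77efa9215846592d0e85828b35e0dc6fb099f7d1316cda5f",
    "agent_as_annotators-0.1.0-py3-none-any.whl": "ebaf6fe8a7805b332109aa17b51faffefe9a50257ffd15ad758300c13002d6ff",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Return True only if the file is a known release file with a matching digest."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

Alternatively, pip's hash-checking mode (--require-hashes with --hash=sha256:... entries in a requirements file) can enforce these digests at install time.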
