Agent-as-Annotators: Structured Distillation of Web Agent Capabilities
Agent-as-Annotators (A3)
| 💾 Code | 📄 Paper | 🌐 Website |
|---|---|---|
| 🤗 Dataset | 🤖 Model | 📦 PyPI |
Structured Distillation of Web Agent Capabilities Enables Generalization
Xing Han Lù, Siva Reddy
This repository contains the code for the A3 framework, which uses LLMs to systematically generate synthetic web agent training data by decomposing the annotation process into three roles: Task Designer, Annotator, and Supervisor.
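To make the three-role decomposition concrete, here is a minimal sketch of how the roles could be chained. It is an illustration, not the package's implementation: the `Trajectory` class, the role signatures, the `distill` function, and the toy roles are all hypothetical. It also assumes that the Task Designer proposes tasks, the Annotator attempts them, and the Supervisor decides which attempts to keep.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Trajectory:
    """One attempt at a task: the task text plus the actions taken."""
    task: str
    actions: List[str] = field(default_factory=list)


# Hypothetical role signatures; the real package may structure these differently.
TaskDesigner = Callable[[str], Iterable[str]]  # persona -> task intents
Annotator = Callable[[str], Trajectory]        # task intent -> trajectory
Supervisor = Callable[[Trajectory], bool]      # trajectory -> keep or discard


def distill(persona: str, design: TaskDesigner, annotate: Annotator,
            supervise: Supervisor) -> List[Trajectory]:
    """Run the three roles in sequence and keep only approved trajectories."""
    kept = []
    for task in design(persona):
        trajectory = annotate(task)
        if supervise(trajectory):
            kept.append(trajectory)
    return kept


# Toy stand-ins for the LLM-backed roles, for illustration only.
def toy_designer(persona: str) -> List[str]:
    return [f"{persona}: find the cheapest laptop", f"{persona}: do nothing"]


def toy_annotator(task: str) -> Trajectory:
    actions = [] if task.endswith("do nothing") else ["click(search)", "type(laptop)"]
    return Trajectory(task=task, actions=actions)


def toy_supervisor(trajectory: Trajectory) -> bool:
    # Reject attempts that took no actions.
    return len(trajectory.actions) > 0


approved = distill("student", toy_designer, toy_annotator, toy_supervisor)
```

In practice each role would be backed by an LLM call; passing the roles in as plain callables keeps the data flow visible and easy to test.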
Installation
pip install agent-as-annotators
Or install from source:
git clone https://github.com/McGill-NLP/agent-as-annotators.git
cd agent-as-annotators
pip install -e .
Quick Start: Evaluation
1. Serve a model with vLLM
vllm serve --config configs/vllm/Qwen3.5-9B.yaml
2. Run evaluation
a3-eval --benchmark webarena_test --model A3-qwen3.5-9b
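Evaluation can only start once the model server is accepting requests, so it can help to poll the server first. The sketch below assumes vLLM's OpenAI-compatible server is reachable at its default `localhost:8000` and exposes `/v1/models`; the YAML config may set a different host or port. The helper names (`models_url`, `fetch_json`, `wait_for_models`) are hypothetical and not part of this package.

```python
import json
import time
import urllib.request
from typing import Callable, List


def models_url(host: str = "localhost", port: int = 8000) -> str:
    """Build the model-listing URL (host and port are assumptions)."""
    return f"http://{host}:{port}/v1/models"


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_models(url: str, fetch: Callable[[str], dict] = fetch_json,
                    retries: int = 30, delay: float = 2.0) -> List[str]:
    """Poll until the server answers, then return the served model ids."""
    last_error = None
    for _ in range(retries):
        try:
            payload = fetch(url)
            return [entry["id"] for entry in payload.get("data", [])]
        except OSError as error:  # urllib.error.URLError subclasses OSError
            last_error = error
            time.sleep(delay)
    raise TimeoutError(f"no response from {url}") from last_error
```

Calling `wait_for_models(models_url())` before `a3-eval` returns the list of served model ids once the server is up, or raises `TimeoutError` if it never responds.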
Pipeline: Generating A3-Synth
The A3 pipeline generates synthetic training data in 5 steps:
Step 1: Create personas
python scripts/create_personas.py
Step 2: Generate task intents (via exploration)
a3-explore
python scripts/generate_task_intents.py
Step 3: Create A3-Synth task configs
python scripts/create_synth_configs.py
Step 4: Collect trajectories
a3-synth --benchmark a3_synth --model gemini-3-pro
Step 5: Convert to training data
python scripts/convert_trajectories_to_json.py
python scripts/generate_rft_data.py
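The five steps above can be chained in a small driver script. This is a sketch under stated assumptions: the commands are copied verbatim from the steps, each is assumed to need no further arguments and to be run from the repository root, and the `PIPELINE_STEPS` list and `run_pipeline` function are hypothetical helpers, not part of the package. It defaults to a dry run that only lists the commands.

```python
import shlex
import subprocess
from typing import List

# Commands copied from the five pipeline steps, in order.
PIPELINE_STEPS = [
    ("Create personas", ["python scripts/create_personas.py"]),
    ("Generate task intents", ["a3-explore",
                               "python scripts/generate_task_intents.py"]),
    ("Create A3-Synth task configs", ["python scripts/create_synth_configs.py"]),
    ("Collect trajectories", ["a3-synth --benchmark a3_synth --model gemini-3-pro"]),
    ("Convert to training data", ["python scripts/convert_trajectories_to_json.py",
                                  "python scripts/generate_rft_data.py"]),
]


def run_pipeline(start_step: int = 1, dry_run: bool = True) -> List[str]:
    """Run steps from start_step onward; return the commands run (or planned)."""
    executed = []
    for number, (title, commands) in enumerate(PIPELINE_STEPS, start=1):
        if number < start_step:
            continue  # allows resuming after a failed step
        print(f"Step {number}: {title}")
        for command in commands:
            executed.append(command)
            if not dry_run:
                # check=True stops the pipeline at the first failing command
                subprocess.run(shlex.split(command), check=True)
    return executed


# Dry run: list what would be executed when resuming from step 4.
planned = run_pipeline(start_step=4)
```

Passing `dry_run=False` executes the commands for real; `start_step` lets a run resume from a later step without repeating the earlier ones.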
Training
a3-train --config configs/train/qwen3.5-9b.json
Training uses supervised fine-tuning (SFT) with Fully Sharded Data Parallel (FSDP) for multi-GPU parallelism. See configs/train/ for hyperparameters and configs/accelerate/ for the FSDP configuration.
CLI Commands
| Command | Description |
|---|---|
| a3-eval | Run evaluation on WebArena, VisualWebArena, WorkArena, and MiniWoB |
| a3-synth | Run trajectory collection for A3-Synth |
| a3-explore | Run environment exploration |
| a3-train | Fine-tune a model with SFT |
| a3-screen-utils | Screen session management utilities |
Project Structure
agent-as-annotators/
    agent_as_annotators/          # Core package
        cli/                      # CLI entry points (eval, synth, explore, train)
        modeling.py               # Agent model wrapper (vLLM, Gemini, OpenAI)
        prompts/                  # All prompt templates
        judge/                    # Inverted evaluation protocol (Judge module)
        benchmarks/a3_synth/      # A3-Synth benchmark registration
        exploration/              # Exploration task registration
        utils/                    # Utilities
        configs/a3_synth/         # A3-Synth task configurations
    configs/
        model_configs.json        # Model registry
        train/                    # Training hyperparameters
        vllm/                     # vLLM serving configs
        accelerate/               # FSDP configs
    scripts/                      # Data pipeline scripts