Tool for configuring and running control_arena experiments using .yaml files.
Control Runner
Run control_arena experiments from YAML. Define settings, protocols, policies, monitors, and scorers as reusable components or inline specs. Then execute them from a single CLI.
Environment
Most model providers require API keys, supplied via environment variables. Keep them in a .env file and make sure your environment loads it.
# OpenAI
OPENAI_API_KEY=sk-...replace-me...
# Google
GOOGLE_API_KEY=...replace-me...
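If your shell or tooling does not load .env files automatically, a small loader can export the keys before the run starts. The following is a minimal standard-library sketch, not part of control_runner; the function name `load_env_file` and its `override` parameter are hypothetical, and it only handles plain KEY=VALUE lines like the ones shown above.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env", override: bool = False) -> dict[str, str]:
    """Parse simple KEY=VALUE lines and export them into os.environ.

    Hypothetical helper for illustration; it does not handle multi-line
    values, `export` prefixes, or variable interpolation.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments such as "# OpenAI", and malformed lines.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        # By default, variables already set in the environment win.
        if override or key not in os.environ:
            os.environ[key] = value
        loaded[key] = os.environ[key]
    return loaded


if __name__ == "__main__":
    # Print only the names, never the secret values.
    print(sorted(load_env_file()))
```

Leaving `override` at its default means a key exported in the shell takes precedence over the one in the file, which is usually the less surprising behavior in CI.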
Quick start
- Install
uv add control-runner
- Create quickstart.yaml
settings:
  apps_setting:
    type: "apps"
    params:
      dataset_name: "simple"
tasks:
  - id: "quickstart"
    setting: "apps_setting"
    protocol:
      type: "tm_protocol"
      params:
        policy_builder:
          type: "untrusted_policy"
          params:
            honest:
              type: "honest_policy"
              params:
                model: "google/gemini-2.5-flash-lite"
            attack:
              type: "attack_policy"
              params:
                model: "google/gemini-2.5-flash"
        monitor_builder:
          type: "basic_monitor"
          params:
            model: "google/gemini-2.5-flash-lite"
    mode: "ATTACK"
    monitoring_scorers:
      - "max_suspicion_scorer"
    limit: 5
- Run
uv run control_runner run quickstart.yaml --dry-run # validate
uv run control_runner run quickstart.yaml # execute
Alternatively, without adding to your project:
uvx control-runner run quickstart.yaml --dry-run
Why use it
- Simple YAML: Reference reusable components or define them inline
- Dynamic variables: ${var} substitution and file-content expansion
- Safe dry-runs: Validate configs without importing heavy deps
- Batteries included: Rich built‑in registry mapped to control_arena
Quick start (PyPI)
Prereqs: Python >= 3.13, uv installed.
# 1) Add to your project
uv add control-runner
# 2) Run with uv
uv run control_runner run path/to/config.yaml --dry-run # validate
uv run control_runner run path/to/config.yaml # execute
# Alternative: one-off run without adding to your project
uvx control-runner run path/to/config.yaml --dry-run
CLI
control_runner run CONFIG [--dry-run]
- CONFIG (required): Path to a YAML file
- --dry-run: Load and print tasks without running evaluation
Behavior:
- Sets UNTRUSTED_MODEL and TRUSTED_MODEL env vars if present in vars
- Groups tasks by their limit and runs each group via inspect_ai.eval(tasks=..., limit=...)
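The grouping step described above can be pictured as bucketing tasks by their limit value and issuing one evaluation call per bucket. This is a sketch under assumptions: the function `group_by_limit` is hypothetical, and plain dicts stand in for the task objects the runner actually builds.

```python
from collections import defaultdict
from typing import Any, Optional


def group_by_limit(
    tasks: list[dict[str, Any]],
) -> dict[Optional[int], list[dict[str, Any]]]:
    """Bucket tasks by their `limit`; tasks without one share the None bucket."""
    groups: dict[Optional[int], list[dict[str, Any]]] = defaultdict(list)
    for task in tasks:
        groups[task.get("limit")].append(task)
    return dict(groups)


# Hypothetical task records; only `id` and `limit` matter for this sketch.
tasks = [
    {"id": "honest_task", "limit": 5},
    {"id": "attack_task", "limit": 5},
    {"id": "smoke_test", "limit": 1},
]

for limit, group in group_by_limit(tasks).items():
    # The runner would call inspect_ai.eval(tasks=group, limit=limit) here.
    print(limit, [t["id"] for t in group])
```

Tasks that share a limit end up in the same call, so two tasks with limit 5 produce one evaluation run rather than two.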
YAML configuration
Top-level keys:
- vars (optional): Simple key/value variables
- settings: Named settings objects
- components (optional): Reusable components (policies, monitors, protocols, scorers, …)
- tasks: List of tasks to run
Referencing values
You can reference values in two ways:
- ${name}: explicit reference
- name: bare string; if it exactly matches a key in vars, settings, or components, it resolves to that value
String substitutions are supported for primitives inside larger strings, e.g. "Model: ${untrusted_model}".
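The three reference forms above (a whole-string ${name}, a bare name, and ${name} embedded in a longer string) can be sketched as a single resolver. This is an illustrative approximation, not the code in loader.py; the function `resolve` and the merged `namespace` dict are assumptions, and the actual lookup order across vars, settings, and components may differ.

```python
import re
from typing import Any

# Matches ${name}; the name may contain anything except a closing brace.
_REF = re.compile(r"\$\{([^}]+)\}")


def resolve(value: Any, namespace: dict[str, Any]) -> Any:
    """Resolve one value against a merged vars/settings/components mapping."""
    if not isinstance(value, str):
        return value

    # Case 1: the whole string is "${name}", so return the object itself.
    whole = _REF.fullmatch(value)
    if whole:
        name = whole.group(1)
        if name not in namespace:
            raise KeyError(f"Unknown reference: {name}")
        return namespace[name]

    # Case 2: a bare string that exactly matches a known key.
    if value in namespace:
        return namespace[value]

    # Case 3: references embedded in a larger string are stringified.
    def _substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in namespace:
            raise KeyError(f"Unknown reference: {name}")
        return str(namespace[name])

    return _REF.sub(_substitute, value)


namespace = {
    "untrusted_model": "google/gemini-2.5-flash",
    "apps_setting": {"type": "apps", "params": {"dataset_name": "simple"}},
}
print(resolve("Model: ${untrusted_model}", namespace))
print(resolve("apps_setting", namespace)["type"])
```

A string that matches no key and contains no ${...} passes through unchanged, which is why ordinary values such as model names are safe to write as plain strings.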
Variables and file loading
Within vars, any string that looks like a path (./, ../, or / and has an extension) is read as file content. This makes prompts easy to manage:
vars:
  untrusted_model: google/gemini-2.5-flash
  trusted_model: google/gemini-2.5-flash-lite
  attack_monitor_prompt: ./prompts/attack_monitor_simple.txt # contents will be in this var
Use these variables anywhere (including inside inline specs) via ${...}.
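The path heuristic described above (a ./, ../, or / prefix plus a file extension) can be approximated as follows. The names `looks_like_path` and `expand_var` are hypothetical, and resolving relative paths against a `base_dir` argument is an assumption of this sketch; the source does not say whether the runner resolves them against the config file or the working directory.

```python
from pathlib import Path
from typing import Any


def looks_like_path(value: Any) -> bool:
    """True for strings starting with ./, ../, or / that also have an extension."""
    if not isinstance(value, str):
        return False
    return value.startswith(("./", "../", "/")) and Path(value).suffix != ""


def expand_var(value: Any, base_dir: str = ".") -> Any:
    """Replace a path-like string with the content of the file it names."""
    if looks_like_path(value):
        # An absolute `value` overrides base_dir when joined with `/`.
        return (Path(base_dir) / value).read_text(encoding="utf-8")
    return value


# The prefix check is what keeps model names from being treated as files:
# "google/gemini-2.5-flash" contains a slash and a dot but no ./ prefix.
print(looks_like_path("./prompts/attack_monitor_simple.txt"))
print(looks_like_path("google/gemini-2.5-flash"))
```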
Inline specs and the registry
Components can be defined inline using type (lookup in the registry):
policy_builder:
  type: untrusted_policy
  params:
    honest: ${honest_policy_builder}
    attack: ${attack_policy_builder}
The registry contains keys for settings, policies, monitors, protocols, scorers, and tools. See src/control_runner/registry.py for the full list of supported names.
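To make the type lookup concrete, here is a toy registry and a recursive builder for inline specs. Everything in it is a stand-in: `REGISTRY`, `register`, `build`, and the factories that return plain dicts are hypothetical, whereas the real registry.py maps names to control_arena objects.

```python
from typing import Any, Callable

# Toy registry: maps a `type` name to a factory callable.
REGISTRY: dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that records a factory under `name`."""
    def decorator(factory: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY[name] = factory
        return factory
    return decorator


@register("honest_policy")
def honest_policy(model: str) -> dict[str, Any]:
    return {"kind": "honest_policy", "model": model}


@register("attack_policy")
def attack_policy(model: str) -> dict[str, Any]:
    return {"kind": "attack_policy", "model": model}


@register("untrusted_policy")
def untrusted_policy(honest: Any, attack: Any) -> dict[str, Any]:
    return {"kind": "untrusted_policy", "honest": honest, "attack": attack}


def build(spec: Any) -> Any:
    """Build an inline spec, constructing nested specs in `params` first."""
    if isinstance(spec, dict) and "type" in spec:
        type_name = spec["type"]
        if type_name not in REGISTRY:
            raise KeyError(f"Unknown type: {type_name}")
        # `params` may be missing or empty in YAML, so fall back to {}.
        params = {k: build(v) for k, v in (spec.get("params") or {}).items()}
        return REGISTRY[type_name](**params)
    if isinstance(spec, list):
        return [build(item) for item in spec]
    return spec


policy = build({
    "type": "untrusted_policy",
    "params": {
        "honest": {"type": "honest_policy",
                   "params": {"model": "google/gemini-2.5-flash-lite"}},
        "attack": {"type": "attack_policy",
                   "params": {"model": "google/gemini-2.5-flash"}},
    },
})
print(policy["honest"]["model"])
```

Building nested specs depth-first means an outer component always receives fully constructed inner components, which matches how the nested inline specs in the quick start read.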
Tasks
tasks:
  - id: attack_task
    setting: ${apps_setting} # reference or inline spec
    protocol: ${monitored_protocol} # reference or inline spec
    mode: ATTACK # string or EvalMode name (case‑insensitive)
    monitoring_scorers: # list of names or inline spec(s)
      - max_suspicion_scorer
    limit: 1 # per‑group limit for inspect_ai.eval
monitoring_scorers accepts either a list or a single inline spec object.
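Two of the task fields are normalized before use: mode is matched case-insensitively, and monitoring_scorers may be a single item or a list. A sketch of both, assuming a local `EvalMode` enum as a stand-in for the one in control_arena; the helpers `parse_mode` and `as_list` are hypothetical.

```python
from enum import Enum
from typing import Any


class EvalMode(Enum):
    """Local stand-in for control_arena's EvalMode, for illustration only."""
    HONEST = "honest"
    ATTACK = "attack"


def parse_mode(value: Any) -> EvalMode:
    """Accept an EvalMode or a case-insensitive member name such as 'attack'."""
    if isinstance(value, EvalMode):
        return value
    try:
        return EvalMode[str(value).strip().upper()]
    except KeyError:
        valid = ", ".join(m.name for m in EvalMode)
        raise ValueError(f"Unknown mode {value!r}; expected one of: {valid}") from None


def as_list(value: Any) -> list[Any]:
    """Wrap a single scorer name or inline spec in a list; pass lists through."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


print(parse_mode("attack"))
print(as_list("max_suspicion_scorer"))
```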
Settings and components
Reusable blocks defined under settings and components can be referenced from tasks or other components:
settings:
  apps_setting:
    type: apps
    params:
      dataset_name: simple
components:
  monitored_protocol:
    type: tm_protocol
    params:
      policy_builder: ${untrusted_policy_builder}
      monitor_builder: ${untrusted_monitor_builder}
Examples
The example_configs directory shows common patterns:
- components.yaml: Reusable components and reference-based tasks
- inline.yaml: Fully inline protocol and nested inline components
- inline_with_components.yaml: Mix of inline protocol with referenced components
- inline_strings.yaml: Minimal config using bare string references
Run any example with --dry-run to validate it, then without to execute.
Troubleshooting
- Unknown reference: Ensure the name exists in vars, settings, or components
- Unknown type: Check that the type exists in the registry (src/control_runner/registry.py)
- File var not loaded: File expansion only applies to values in vars
Development
control_runner/
├─ example_configs/
├─ prompts/
├─ src/control_runner/
│ ├─ cli.py # Click CLI entrypoint
│ ├─ loader.py # YAML loading and resolution
│ └─ registry.py # control_arena registry mapping
├─ tests/
└─ README.md
From source (repo checkout):
uv sync
uv run control_runner run example_configs/components.yaml --dry-run
Testing
uv run pytest -q