Control Runner
Run control_arena experiments from YAML. Define settings, protocols, policies, monitors, and scorers as reusable components or inline specs. Then execute them from a single CLI.
Why use it
- Simple YAML: Reference reusable components or define them inline
- Dynamic variables: ${var} substitution and file-content expansion
- Safe dry-runs: Validate configs without importing heavy deps
- Batteries included: Rich built‑in registry mapped to control_arena
Quick start (PyPI)
Prereqs: Python >= 3.13, uv installed.
# 1) Add to your project
uv add control-runner
# 2) Run with uv
uv run control_runner run path/to/config.yaml --dry-run # validate
uv run control_runner run path/to/config.yaml # execute
# Optional: include a custom module
uv run control_runner run path/to/config.yaml custom_ext.py --dry-run
# Alternative: one-off run without adding to your project
uvx control-runner run path/to/config.yaml --dry-run
CLI
control_runner run CONFIG [CUSTOM_PY] [--dry-run]
- CONFIG (required): Path to a YAML file
- CUSTOM_PY (optional): Path to a Python module for extensions (see below)
- --dry-run: Load and print tasks without running evaluation
Behavior:
- Sets UNTRUSTED_MODEL and TRUSTED_MODEL env vars if present in vars
- Groups tasks by their limit and runs each group via inspect_ai.eval(tasks=..., limit=...)
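The grouping step above can be pictured roughly as follows. This is a minimal sketch rather than the package's actual code: group_by_limit is a hypothetical helper, and plain dicts stand in for the task objects that the real runner hands to inspect_ai.eval.

```python
from collections import defaultdict


def group_by_limit(tasks):
    """Group task dicts by their 'limit' value (hypothetical helper)."""
    groups = defaultdict(list)
    for task in tasks:
        # Tasks that omit 'limit' end up together under the key None.
        groups[task.get("limit")].append(task)
    return dict(groups)


tasks = [
    {"id": "honest_task", "limit": 1},
    {"id": "attack_task", "limit": 1},
    {"id": "full_run", "limit": 50},
]
groups = group_by_limit(tasks)
for limit, group in groups.items():
    # The real runner would call inspect_ai.eval(tasks=group, limit=limit) here.
    print(limit, [task["id"] for task in group])
```

Tasks that share a limit end up in the same eval call, so a config whose tasks all use one limit runs as a single group.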
YAML configuration
Top-level keys:
- vars (optional): Simple key/value variables
- settings: Named settings objects
- components (optional): Reusable components (policies, monitors, protocols, scorers, …)
- tasks: List of tasks to run
Referencing values
You can reference values in two ways:
- ${name}: explicit reference
- name: bare string; if it exactly matches a key in vars, settings, or components, it resolves to that value
String substitutions are supported for primitives inside larger strings, e.g. "Model: ${untrusted_model}".
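The three resolution rules (explicit reference, bare-string match, substitution inside a larger string) can be sketched in a few lines. This is an illustration only: resolve and the flat scope dict are hypothetical, and merging vars, settings, and components into one mapping is an assumption about how names are looked up.

```python
import re

# Matches ${name} where name is made of letters, digits, or underscores.
_REF = re.compile(r"\$\{(\w+)\}")


def resolve(value, scope):
    """Resolve a value against a flat name -> value mapping (hypothetical)."""
    if not isinstance(value, str):
        return value
    whole = _REF.fullmatch(value)
    if whole:
        # "${name}" on its own returns the referenced object unchanged.
        return scope[whole.group(1)]
    if value in scope:
        # A bare string that exactly matches a known key resolves too.
        return scope[value]
    # Otherwise substitute any references embedded in a larger string.
    return _REF.sub(lambda m: str(scope[m.group(1)]), value)


scope = {
    "untrusted_model": "google/gemini-2.5-flash",
    "apps_setting": {"type": "apps"},
}
print(resolve("${apps_setting}", scope))
print(resolve("apps_setting", scope))
print(resolve("Model: ${untrusted_model}", scope))
```

In this sketch an unknown name raises KeyError, which plays the role of the "Unknown reference" error described under Troubleshooting.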
Variables and file loading
Within vars, any string that looks like a path (it starts with ./, ../, or / and has a file extension) is read from disk and replaced with the file's contents. This makes prompts easy to manage:
vars:
  untrusted_model: google/gemini-2.5-flash
  trusted_model: google/gemini-2.5-flash-lite
  attack_monitor_prompt: ./prompts/attack_monitor_simple.txt # contents will be in this var
Use these variables anywhere (including inside inline specs) via ${...}.
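The path heuristic can be approximated as below. This is a sketch under stated assumptions: looks_like_path and expand_var are hypothetical names, and resolving relative paths against a base_dir argument is a guess, since the text does not say whether paths are relative to the config file or to the working directory.

```python
from pathlib import Path


def looks_like_path(value):
    """Heuristic from the text: a path-like prefix plus a file extension."""
    if not isinstance(value, str):
        return False
    has_prefix = value.startswith(("./", "../", "/"))
    return has_prefix and Path(value).suffix != ""


def expand_var(value, base_dir="."):
    """Return file contents for path-like strings, else the value unchanged."""
    if looks_like_path(value):
        # base_dir is an assumption; absolute paths ignore it.
        return (Path(base_dir) / value).read_text(encoding="utf-8")
    return value


print(looks_like_path("./prompts/attack_monitor_simple.txt"))
print(looks_like_path("google/gemini-2.5-flash"))
```

A model name such as google/gemini-2.5-flash contains a dot but has no ./, ../, or / prefix, so it stays a plain string.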
Inline specs and the registry
Components can be defined inline using either type (lookup in the registry) or import (direct Python import path):
policy_builder:
  type: untrusted_policy
  params:
    honest: ${honest_policy_builder}
    attack: ${attack_policy_builder}

protocol:
  import: control_arena.protocol.dtt_protocol:dtt_protocol
  params:
    untrusted_policy_builder: ${policy_builder}
    monitor_builder: ${monitor_builder}
The registry contains keys for settings, policies, monitors, protocols, scorers, and tools. See src/control_runner/registry.py for the full list of supported names.
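The difference between type and import can be illustrated with a small sketch. It is not the package's implementation: load_callable and build are hypothetical, calling the target with **params is an assumption, and the registry here holds standard-library stand-ins so the example runs without control_arena installed.

```python
import importlib


def load_callable(spec, registry):
    """Return the callable named by an inline spec (hypothetical helper)."""
    if "import" in spec:
        # "package.module:attribute" -> import the module, fetch the attribute.
        module_name, _, attr = spec["import"].partition(":")
        return getattr(importlib.import_module(module_name), attr)
    # Otherwise look the short name up in the registry mapping.
    return registry[spec["type"]]


def build(spec, registry):
    """Instantiate a spec by calling its target with the given params."""
    return load_callable(spec, registry)(**spec.get("params", {}))


# Stand-in registry; the real one maps names to control_arena objects.
registry = {"complex": complex}

a = build({"type": "complex", "params": {"real": 1, "imag": 2}}, registry)
b = build(
    {"import": "fractions:Fraction", "params": {"numerator": 3, "denominator": 4}},
    registry,
)
print(a, b)
```

The import form is useful for objects the registry does not know about, at the cost of spelling out the full module path.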
Tasks
tasks:
  - id: attack_task
    setting: ${apps_setting} # reference or inline spec
    protocol: ${monitored_protocol} # reference or inline spec
    mode: ATTACK # string or EvalMode name (case‑insensitive)
    monitoring_scorers: # list of names or inline spec(s)
      - max_suspicion_scorer
    limit: 1 # per‑group limit for inspect_ai.eval
monitoring_scorers accepts either a list or a single inline spec object.
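Two of the task fields are normalized before use: mode is matched case-insensitively, and monitoring_scorers may be a single object or a list. A rough sketch of that normalization follows. The EvalMode enum below is a local stand-in so the example is self-contained, and parse_mode and as_list are hypothetical helper names.

```python
from enum import Enum


class EvalMode(Enum):
    """Local stand-in for control_arena's EvalMode (assumed members)."""

    HONEST = "honest"
    ATTACK = "attack"


def parse_mode(value):
    """Accept an EvalMode or its name in any letter case."""
    if isinstance(value, EvalMode):
        return value
    return EvalMode[str(value).upper()]


def as_list(value):
    """Wrap a single scorer spec in a list; pass lists through."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


print(parse_mode("attack"))
print(as_list({"type": "max_suspicion_scorer"}))
print(as_list(["max_suspicion_scorer"]))
```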
Settings and components
Reusable blocks defined under settings and components can be referenced from tasks or other components:
settings:
  apps_setting:
    type: apps
    params:
      dataset_name: simple

components:
  monitored_protocol:
    type: tm_protocol
    params:
      policy_builder: ${untrusted_policy_builder}
      monitor_builder: ${untrusted_monitor_builder}
Using a custom Python module
Pass a path to a Python file as the second CLI argument. If it defines register(resolver), it will be called at load time so you can extend behavior (e.g., pre‑register objects, mutate vars):
# custom_ext.py
def register(resolver):
    # Example: inject a computed variable
    resolver.vars["run_id"] = "local-dev"
Run with:
uv run python -m control_runner.cli run example_configs/components.yaml custom_ext.py --dry-run
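For intuition, loading a module from a file path and calling its optional register hook can be done with importlib from the standard library. The sketch below shows one way to do it. load_extension and the Resolver stand-in are hypothetical names, and the package's own loader may differ.

```python
import importlib.util
from pathlib import Path


class Resolver:
    """Minimal stand-in exposing only the vars mapping used above."""

    def __init__(self):
        self.vars = {}


def load_extension(path, resolver):
    """Import a Python file by path and call register(resolver) if defined."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    register = getattr(module, "register", None)
    if callable(register):
        # The hook is optional; modules without it are simply imported.
        register(resolver)
    return module
```

Calling load_extension("custom_ext.py", Resolver()) with the file shown earlier would leave "run_id" set in the resolver's vars.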
Examples
The example_configs directory shows common patterns:
- components.yaml: Reusable components and reference-based tasks
- inline.yaml: Fully inline protocol and nested inline components
- inline_with_components.yaml: Mix of inline protocol with referenced components
- inline_strings.yaml: Minimal config using bare string references
Run any example with --dry-run to validate it, then without to execute.
Troubleshooting
- Unknown reference: Ensure the name exists in vars, settings, or components
- Unknown type: Check that the type exists in the registry (src/control_runner/registry.py)
- File var not loaded: File expansion only applies to values in vars
Development
control_runner/
├─ example_configs/
├─ prompts/
├─ src/control_runner/
│ ├─ cli.py # Click CLI (recommended entrypoint)
│ ├─ loader.py # YAML loading and resolution
│ └─ registry.py # control_arena registry mapping
├─ tests/
└─ main.py # Argparse legacy entrypoint
From source (repo checkout):
uv sync
uv run control_runner run example_configs/components.yaml --dry-run
Testing
uv run pytest -q