control-runner

Tool for configuring and running control_arena experiments using .yaml files.

Control Runner

Run control_arena experiments from YAML. Define settings, protocols, policies, monitors, and scorers as reusable components or inline specs. Then execute them from a single CLI.

Environment

Most model providers require API keys supplied through environment variables. Keep them in a .env file and make sure your environment loads it.

# OpenAI
OPENAI_API_KEY=sk-...replace-me...

# Google
GOOGLE_API_KEY=...replace-me...
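If nothing in your setup loads the .env file automatically, a small loader can be run before starting an experiment. The following is a minimal standard-library sketch, not part of control-runner; the function name load_env_file is hypothetical, and it assumes simple KEY=VALUE lines like the ones above.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse KEY=VALUE lines from a .env file and export them to os.environ.

    Hypothetical helper for illustration; not provided by control-runner.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments such as "# OpenAI", and malformed lines.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        # Values already set in the real environment take precedence.
        os.environ.setdefault(key, value)
        loaded[key] = os.environ[key]
    return loaded
```

Calling load_env_file() at the top of a wrapper script makes the keys visible to any provider client started afterwards in the same process.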

Quick start

Install

uv add control-runner

Create quickstart.yaml

settings:
  apps_setting:
    type: "apps"
    params:
      dataset_name: "simple"

tasks:
  - id: "quickstart"
    setting: "apps_setting"
    protocol:
      type: "tm_protocol"
      params:
        policy_builder:
          type: "untrusted_policy"
          params:
            honest:
              type: "honest_policy"
              params:
                model: "google/gemini-2.5-flash-lite"
            attack:
              type: "attack_policy"
              params:
                model: "google/gemini-2.5-flash"
        monitor_builder:
          type: "basic_monitor"
          params:
            model: "google/gemini-2.5-flash-lite"
    mode: "ATTACK"
    monitoring_scorers:
      - "max_suspicion_scorer"
    limit: 5

uv run control_runner run quickstart.yaml --dry-run  # validate
uv run control_runner run quickstart.yaml            # execute

Alternatively, without adding to your project:

uvx control-runner run quickstart.yaml --dry-run

Why use it

Simple YAML: Reference reusable components or define them inline
Dynamic variables: ${var} substitution and file-content expansion
Safe dry-runs: Validate configs without importing heavy deps
Batteries included: Rich built‑in registry mapped to control_arena

Quick start (PyPI)

Prerequisites: Python >= 3.13 and uv installed.

# 1) Add to your project
uv add control-runner

# 2) Run with uv
uv run control_runner run path/to/config.yaml --dry-run  # validate
uv run control_runner run path/to/config.yaml            # execute

# Alternative: one-off run without adding to your project
uvx control-runner run path/to/config.yaml --dry-run

CLI

control_runner run CONFIG [--dry-run]

CONFIG (required): Path to a YAML file
--dry-run: Load and print the tasks without running the evaluation

Behavior:

Sets the UNTRUSTED_MODEL and TRUSTED_MODEL environment variables if the corresponding entries are present in vars
Groups tasks by their limit and runs each group via inspect_ai.eval(tasks=..., limit=...)
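The grouping step can be pictured roughly as follows. This is a sketch of the idea only, assuming tasks are plain dicts with an optional limit key; group_tasks_by_limit is a hypothetical name, and the actual call into inspect_ai is left as a comment.

```python
from collections import defaultdict
from typing import Any


def group_tasks_by_limit(tasks: list[dict[str, Any]]) -> dict[Any, list[dict[str, Any]]]:
    """Bucket tasks by their limit so each bucket can be evaluated in one call."""
    groups: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for task in tasks:
        # Tasks without a limit share the None bucket.
        groups[task.get("limit")].append(task)
    return dict(groups)


example_tasks = [
    {"id": "a", "limit": 5},
    {"id": "b", "limit": 1},
    {"id": "c", "limit": 5},
]
for limit, group in group_tasks_by_limit(example_tasks).items():
    # The CLI is described as running each group via
    # inspect_ai.eval(tasks=..., limit=...); here we only print the grouping.
    print(limit, [task["id"] for task in group])
```

Grouping this way means tasks that share a limit are submitted together, while tasks with different limits end up in separate evaluation runs.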

YAML configuration

Top-level keys:

vars (optional): Simple key/value variables
settings: Named settings objects
components (optional): Reusable components (policies, monitors, protocols, scorers, …)
tasks: List of tasks to run

Referencing values

You can reference values in two ways:

${name}: explicit reference
name: bare string; if it exactly matches a key in vars, settings, or components, it resolves to that value

String substitutions are supported for primitives inside larger strings, e.g. "Model: ${untrusted_model}".
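To make the three cases concrete, here is a rough sketch of how such a resolver could behave. It is an illustration under assumptions, not the code in loader.py: the resolve function, the merged namespace dict, and the lack of full cycle detection are all simplifications introduced here.

```python
import re
from typing import Any

# Matches ${name} where name is made of letters, digits, and underscores.
REF_PATTERN = re.compile(r"\$\{(\w+)\}")


def resolve(value: Any, namespace: dict[str, Any]) -> Any:
    """Resolve references against a merged vars/settings/components namespace."""
    if isinstance(value, dict):
        return {key: resolve(item, namespace) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve(item, namespace) for item in value]
    if not isinstance(value, str):
        return value

    # Case 1: the whole string is "${name}", so return the referenced object.
    full = REF_PATTERN.fullmatch(value)
    if full:
        name = full.group(1)
        if name not in namespace:
            raise KeyError(f"Unknown reference: {name}")
        return resolve(namespace[name], namespace)

    # Case 2: a bare string that exactly matches a known key.
    if value in namespace:
        target = namespace[value]
        # Guard against a key whose value is its own name.
        return target if target == value else resolve(target, namespace)

    # Case 3: references embedded in a larger string are replaced as text.
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in namespace:
            raise KeyError(f"Unknown reference: {name}")
        return str(namespace[name])

    return REF_PATTERN.sub(substitute, value)


namespace = {
    "untrusted_model": "google/gemini-2.5-flash",
    "apps_setting": {"type": "apps", "params": {"dataset_name": "simple"}},
}
print(resolve("Model: ${untrusted_model}", namespace))
print(resolve("apps_setting", namespace))
```

Note that a whole-string reference returns the referenced object itself (for example a dict), while an embedded reference is always converted to text.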

Variables and file loading

Within vars, any string that looks like a path (it starts with ./, ../, or / and has a file extension) is replaced by the content of that file. This makes prompts easy to manage:

vars:
  untrusted_model: google/gemini-2.5-flash
  trusted_model: google/gemini-2.5-flash-lite
  attack_monitor_prompt: ./prompts/attack_monitor_simple.txt  # contents will be in this var

Use these variables anywhere (including inside inline specs) via ${...}.
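The "looks like a path" rule can be approximated as below. This is a sketch of the heuristic as described here, not the loader's actual implementation; looks_like_path, expand_vars, and the base_dir parameter are hypothetical names.

```python
from pathlib import Path, PurePosixPath


def looks_like_path(value: object) -> bool:
    """True for strings that start with ./, ../, or / and end in an extension."""
    if not isinstance(value, str):
        return False
    has_prefix = value.startswith(("./", "../", "/"))
    return has_prefix and PurePosixPath(value).suffix != ""


def expand_vars(raw_vars: dict, base_dir: str = ".") -> dict:
    """Replace path-like values with the text content of the file they name."""
    expanded = {}
    for key, value in raw_vars.items():
        if looks_like_path(value):
            expanded[key] = (Path(base_dir) / value).read_text(encoding="utf-8")
        else:
            expanded[key] = value
    return expanded
```

Under this rule a model name such as google/gemini-2.5-flash is left alone because it does not start with ./, ../, or /, while ./prompts/attack_monitor_simple.txt is read from disk.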

Inline specs and the registry

Components can be defined inline using type, which is looked up in the registry:

policy_builder:
  type: untrusted_policy
  params:
    honest: ${honest_policy_builder}
    attack: ${attack_policy_builder}

The registry contains keys for settings, policies, monitors, protocols, scorers, and tools. See src/control_runner/registry.py for the full list of supported names.
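Conceptually, an inline spec is turned into an object by looking up its type and passing params to the matching builder. The sketch below illustrates that pattern with a stand-in registry; the builder functions and REGISTRY contents are hypothetical and return plain dicts rather than real control_arena objects.

```python
from typing import Any, Callable


# Stand-in builders for illustration only; the real registry maps names
# to control_arena components.
def make_honest_policy(model: str) -> dict:
    return {"kind": "honest_policy", "model": model}


def make_attack_policy(model: str) -> dict:
    return {"kind": "attack_policy", "model": model}


def make_untrusted_policy(honest: dict, attack: dict) -> dict:
    return {"kind": "untrusted_policy", "honest": honest, "attack": attack}


REGISTRY: dict[str, Callable[..., Any]] = {
    "honest_policy": make_honest_policy,
    "attack_policy": make_attack_policy,
    "untrusted_policy": make_untrusted_policy,
}


def build(spec: Any) -> Any:
    """Recursively build an inline spec of the form {type: ..., params: {...}}."""
    if isinstance(spec, dict) and "type" in spec:
        type_name = spec["type"]
        if type_name not in REGISTRY:
            raise KeyError(f"Unknown type: {type_name}")
        # Nested inline specs inside params are built first.
        params = {key: build(item) for key, item in (spec.get("params") or {}).items()}
        return REGISTRY[type_name](**params)
    return spec


policy = build({
    "type": "untrusted_policy",
    "params": {
        "honest": {"type": "honest_policy", "params": {"model": "google/gemini-2.5-flash-lite"}},
        "attack": {"type": "attack_policy", "params": {"model": "google/gemini-2.5-flash"}},
    },
})
print(policy["kind"])
```

Building params before calling the outer builder is what allows inline specs to nest to any depth.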

Tasks

tasks:
  - id: attack_task
    setting: ${apps_setting}          # reference or inline spec
    protocol: ${monitored_protocol}   # reference or inline spec
    mode: ATTACK                      # string or EvalMode name (case‑insensitive)
    monitoring_scorers:               # list of names or inline spec(s)
      - max_suspicion_scorer
    limit: 1                          # per‑group limit for inspect_ai.eval

monitoring_scorers accepts either a list or a single inline spec object.
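The two normalizations mentioned for tasks (a case-insensitive mode and a list-or-single monitoring_scorers value) could look roughly like this. The EvalMode enum below is a local stand-in defined only for the example, and parse_mode and normalize_scorers are hypothetical helper names.

```python
from enum import Enum
from typing import Any


class EvalMode(Enum):
    """Local stand-in for the evaluation mode enum, for illustration only."""
    HONEST = "honest"
    ATTACK = "attack"


def parse_mode(value: Any) -> EvalMode:
    """Accept an EvalMode or a case-insensitive name such as 'attack'."""
    if isinstance(value, EvalMode):
        return value
    try:
        return EvalMode[str(value).upper()]
    except KeyError:
        raise ValueError(f"Unknown mode: {value!r}") from None


def normalize_scorers(value: Any) -> list:
    """Wrap a single name or inline spec in a list; pass lists through."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


print(parse_mode("attack"))
print(normalize_scorers("max_suspicion_scorer"))
```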

Settings and components

Reusable blocks defined under settings and components can be referenced from tasks or other components:

settings:
  apps_setting:
    type: apps
    params:
      dataset_name: simple

components:
  monitored_protocol:
    type: tm_protocol
    params:
      policy_builder: ${untrusted_policy_builder}
      monitor_builder: ${untrusted_monitor_builder}

Examples

The example_configs directory shows common patterns:

components.yaml: Reusable components and reference-based tasks
inline.yaml: Fully inline protocol and nested inline components
inline_with_components.yaml: Mix of inline protocol with referenced components
inline_strings.yaml: Minimal config using bare string references

Run any example with --dry-run to validate it, then without to execute.

Troubleshooting

Unknown reference: Ensure the name exists in vars, settings, or components
Unknown type: Check that the type exists in the registry (src/control_runner/registry.py)
File var not loaded: File expansion only applies to values in vars

Development

control_runner/
├─ example_configs/
├─ prompts/
├─ src/control_runner/
│  ├─ cli.py         # Click CLI entrypoint
│  ├─ loader.py      # YAML loading and resolution
│  └─ registry.py    # control_arena registry mapping
├─ tests/
└─ README.md

From source (repo checkout):

uv sync
uv run control_runner run example_configs/components.yaml --dry-run

Testing

uv run pytest -q

Release history

0.1.1 (this version): Aug 12, 2025
0.1.0: Aug 12, 2025

