Inspect Flow is a workflow stack built on Inspect AI that enables research organizations to run AI evaluations at scale
Inspect Flow
Workflow orchestration for Inspect AI that enables you to run evaluations at scale with repeatability and maintainability.
Why Inspect Flow?
As evaluation workflows grow in complexity—running multiple tasks across different models with varying parameters—managing these experiments becomes challenging. Inspect Flow addresses this by providing:
- Declarative Configuration: Define complex evaluations with tasks, models, and parameters in type-safe schemas
- Repeatable & Shareable: Encapsulated definitions of tasks, models, configurations, and Python dependencies ensure experiments can be reliably repeated and shared
- Powerful Defaults: Define defaults once and reuse them everywhere with automatic inheritance
- Parameter Sweeping: Matrix patterns for systematic exploration across tasks, models, and hyperparameters
Inspect Flow is designed for researchers and engineers running systematic AI evaluations who need to scale beyond ad-hoc scripts.
Getting Started
Prerequisites
Before using Inspect Flow, you should:
- Be familiar with Inspect AI
- Have an existing Inspect evaluation, or plan to use one from inspect-evals
Installation
pip install inspect-flow
Optional: VS Code extension
You can also install the Inspect AI VS Code Extension, which includes features for viewing evaluation log files.
Basic Example
FlowSpec is the main entrypoint for defining evaluation runs. At its core, it takes a list of tasks to run. Here's a simple example that runs two evaluations:
from inspect_flow import FlowSpec, FlowTask

FlowSpec(
    log_dir="logs",
    tasks=[
        FlowTask(
            name="inspect_evals/gpqa_diamond",
            model="openai/gpt-4o",
        ),
        FlowTask(
            name="inspect_evals/mmlu_0_shot",
            model="openai/gpt-4o",
        ),
    ],
)
To run the evaluations, save the spec to a file (config.py in this example) and run the following command in your shell. Flow creates a virtual environment for the run and installs its dependencies. Task and model dependencies (such as the inspect-evals and openai Python packages) are inferred and installed automatically.
flow run config.py
This will run both tasks and display progress in your terminal.
Python API
You can run evaluations from Python instead of the command line.
from inspect_flow import FlowSpec, FlowTask
from inspect_flow.api import run

spec = FlowSpec(
    log_dir="logs",
    tasks=[
        FlowTask(
            name="inspect_evals/gpqa_diamond",
            model="openai/gpt-4o",
        ),
        FlowTask(
            name="inspect_evals/mmlu_0_shot",
            model="openai/gpt-4o",
        ),
    ],
)

run(spec=spec)
Matrix Functions
Often you'll want to evaluate multiple tasks across multiple models. Rather than manually defining every combination, use tasks_matrix to generate all task-model pairs:
from inspect_flow import FlowSpec, tasks_matrix

FlowSpec(
    log_dir="logs",
    tasks=tasks_matrix(
        task=[
            "inspect_evals/gpqa_diamond",
            "inspect_evals/mmlu_0_shot",
        ],
        model=[
            "openai/gpt-5",
            "openai/gpt-5-mini",
        ],
    ),
)
To check that the generated config is the one you intend to run, preview the expanded config first:
flow config matrix.py
This command outputs the expanded configuration showing all 4 task-model combinations (2 tasks × 2 models).
log_dir: logs
dependencies:
- inspect-evals
tasks:
- name: inspect_evals/gpqa_diamond
  model:
    name: openai/gpt-5
- name: inspect_evals/gpqa_diamond
  model:
    name: openai/gpt-5-mini
- name: inspect_evals/mmlu_0_shot
  model:
    name: openai/gpt-5
- name: inspect_evals/mmlu_0_shot
  model:
    name: openai/gpt-5-mini
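The expansion shown above behaves like a Cartesian product of the task list and the model list. As a rough mental model only (this is plain Python using itertools, not Inspect Flow's implementation, and the dictionary layout is an assumption chosen to mirror the YAML output), the four pairs can be reproduced like this:

```python
from itertools import product

tasks = ["inspect_evals/gpqa_diamond", "inspect_evals/mmlu_0_shot"]
models = ["openai/gpt-5", "openai/gpt-5-mini"]

# Cartesian product in task-major order: each task is paired with every
# model before moving on to the next task, matching the YAML listing.
pairs = [{"name": task, "model": model} for task, model in product(tasks, models)]

for pair in pairs:
    print(pair["name"], pair["model"])
```

Adding a third model to the list would yield 6 pairs, which is why previewing the expanded config is worthwhile before a large run.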
tasks_matrix and models_matrix can be nested to multiple levels, which enables sophisticated parameter sweeps. For example, to explore different reasoning efforts across models, pass the result of models_matrix as the model argument of tasks_matrix:
from inspect_ai.model import GenerateConfig
from inspect_flow import FlowSpec, models_matrix, tasks_matrix

FlowSpec(
    log_dir="logs",
    tasks=tasks_matrix(
        task=[
            "inspect_evals/gpqa_diamond",
            "inspect_evals/mmlu_0_shot",
        ],
        model=models_matrix(
            model=[
                "openai/gpt-5",
                "openai/gpt-5-mini",
            ],
            config=[
                GenerateConfig(reasoning_effort="minimal"),
                GenerateConfig(reasoning_effort="low"),
                GenerateConfig(reasoning_effort="medium"),
                GenerateConfig(reasoning_effort="high"),
            ],
        ),
    ),
)
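One way to reason about the size of a nested sweep like the one above is to expand the inner matrix first and treat its result as a plain list for the outer one. The sketch below models that with standard-library Python; the dictionary shapes are illustrative assumptions, not the objects Inspect Flow actually builds.

```python
from itertools import product

tasks = ["inspect_evals/gpqa_diamond", "inspect_evals/mmlu_0_shot"]
models = ["openai/gpt-5", "openai/gpt-5-mini"]
efforts = ["minimal", "low", "medium", "high"]

# Inner matrix: 2 models x 4 reasoning efforts = 8 model variants.
model_variants = [
    {"name": model, "config": {"reasoning_effort": effort}}
    for model, effort in product(models, efforts)
]

# Outer matrix: 2 tasks x 8 model variants = 16 evaluations.
evaluations = [
    {"name": task, "model": variant}
    for task, variant in product(tasks, model_variants)
]

print(len(model_variants), len(evaluations))  # 8 16
```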
For even more concise parameter sweeping, use configs_matrix to generate configuration variants. This produces the same 16 evaluations (2 tasks × 2 models × 4 reasoning levels) as above, but with less boilerplate:
from inspect_flow import FlowSpec, configs_matrix, models_matrix, tasks_matrix

FlowSpec(
    log_dir="logs",
    tasks=tasks_matrix(
        task=[
            "inspect_evals/gpqa_diamond",
            "inspect_evals/mmlu_0_shot",
        ],
        model=models_matrix(
            model=[
                "openai/gpt-5",
                "openai/gpt-5-mini",
            ],
            config=configs_matrix(
                reasoning_effort=["minimal", "low", "medium", "high"],
            ),
        ),
    ),
)
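The saving comes from turning keyword lists into one configuration per combination, instead of writing each configuration out by hand. A minimal sketch of that idea follows; expand_configs is a hypothetical stand-in written for illustration, returns plain dictionaries rather than GenerateConfig objects, and is not how configs_matrix is necessarily implemented.

```python
from itertools import product
from typing import Any


def expand_configs(**options: list[Any]) -> list[dict[str, Any]]:
    """Hypothetical helper: build one dict per combination of the keyword lists."""
    keys = list(options)
    # product(*values) yields one tuple per combination, in keyword order.
    return [dict(zip(keys, combo)) for combo in product(*options.values())]


configs = expand_configs(reasoning_effort=["minimal", "low", "medium", "high"])
for config in configs:
    print(config)
```

In this sketch, passing a second keyword list multiplies the number of variants, so two lists of lengths 4 and 3 would produce 12 dictionaries.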
Run evaluations
Before running evaluations, preview the resolved configuration with --dry-run:
flow run matrix.py --dry-run
This creates the virtual environment, installs all dependencies, imports tasks from the registry, applies all defaults, and expands all matrix functions: everything except actually running the evaluations. Use it to verify that dependencies can be installed, that tasks are properly configured, and that the settings are exactly what you expect. Unlike flow config, which only parses the config file, --dry-run performs the full setup process.
To run the config:
flow run matrix.py
This will run all 16 evaluations (2 tasks × 2 models × 4 reasoning levels). When complete, you'll find a link to the logs at the bottom of the task results summary.
To view logs interactively, run:
inspect view --log-dir logs
Learning More
See the following articles to learn more about using Flow:
- Flow Concepts: Flow type system, config structure and basics.
- Defaults: Define defaults once and reuse them everywhere with automatic inheritance.
- Matrixing: Systematic parameter exploration with matrix and with functions.
- Reference: Detailed documentation on the Flow Python API and CLI commands.
Development
To work on Inspect Flow itself, clone the repository and install it with its development dependencies:
git clone https://github.com/meridianlabs-ai/inspect_flow
cd inspect_flow
uv sync
source .venv/bin/activate
Optionally install pre-commit hooks via
make hooks
Run linting, formatting, and tests via
make check
make test