korrel

OSS Python SDK for agent simulation. Define a multi-turn agent test once, run it as a pytest CI gate, export it as a verifiers/OpenEnv RL environment.

Install

pip install korrel

Inside a uv project, add it as a dependency instead:

uv add korrel

Quickstart

The first run needs no provider key and spends nothing. You write a scenario, run it against a scripted stand-in agent, and watch the gate pass or fail offline in well under a second. The Go live section then swaps in a real agent for a live run.

1. Write a scenario (refund_offline.py):

from korrel import Persona, Rubric, Scenario
from korrel.types import Message

# The agent under test, scripted for the offline gate: a plain callable, no
# provider and no key. It returns a fixed assistant reply so you can see the
# gate run with zero spend. Go live (below) replaces this with a real agent.
def scripted_agent(messages, tools):
    return Message(role="assistant", content="Your refund of $49.99 is approved.")

def mentions_refund(completion, info, **kwargs):
    amount = f"{info['amount']:.2f}"
    return 1.0 if any(
        m.role == "assistant" and m.content
        and "refund" in m.content.lower() and amount in m.content
        for m in completion
    ) else 0.0

scenario = Scenario(
    id="refund_offline",
    system="You are a support agent. Approve the refund for order A1001.",
    # Persona is required by the type. At max_turns=1 the simulated user is
    # never called, so this offline run makes no model call and needs no key.
    persona=Persona(goal="Get a refund for order A1001."),
    opening_message="My order A1001 arrived broken and I want a refund.",
    max_turns=1,
    info={"amount": 49.99},
    rubric=Rubric(funcs=[mentions_refund], pass_threshold=0.5),
)

adapter = scripted_agent

2. Run it:

korrel run refund_offline.py

Output on pass:

scenario    : refund_offline
model       : unknown
score       : 1.0000
status      : pass
model calls : 1
stop reason : max_turns
transcript  : .korrel/refund_offline.transcript.json

The CLI exits zero on pass and non-zero on failure. No provider key is set and nothing is billed: the model calls line counts loop invocations, and the one call here is the local scripted adapter, which makes no provider request.

The model line reads unknown because a scripted adapter exposes no provider; a provider-backed adapter prints its model name. The stop reason : max_turns line means the turn budget ended the run, which at max_turns=1 is every run; the line is absent when the simulated user ends the conversation first.

The full conversation transcript is written to .korrel/<scenario-id>.transcript.json. The same file runs as a CI gate under pytest once its name matches the plugin's discovery pattern (*_scenario.py by default); see The pytest CI gate.
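Because a reward function is plain Python, it can be checked in isolation before any run. The sketch below repeats mentions_refund from step 1 and feeds it hand-built transcripts. The msg helper and its SimpleNamespace objects are hypothetical stand-ins for korrel.types.Message, used only so the snippet runs without korrel installed; the function reads nothing but role and content.

```python
from types import SimpleNamespace


def mentions_refund(completion, info, **kwargs):
    # Same reward function as in refund_offline.py.
    amount = f"{info['amount']:.2f}"
    return 1.0 if any(
        m.role == "assistant" and m.content
        and "refund" in m.content.lower() and amount in m.content
        for m in completion
    ) else 0.0


def msg(role, content):
    # Hypothetical stand-in for korrel.types.Message (role and content only).
    return SimpleNamespace(role=role, content=content)


info = {"amount": 49.99}
passing = [
    msg("user", "My order A1001 arrived broken and I want a refund."),
    msg("assistant", "Your refund of $49.99 is approved."),
]
failing = [
    msg("user", "My order A1001 arrived broken and I want a refund."),
    msg("assistant", "Let me look into that for you."),
]

# Only assistant messages count, so the user's own mention of a refund
# does not satisfy the check in the failing transcript.
passing_score = mentions_refund(passing, info)
failing_score = mentions_refund(failing, info)
print(passing_score, failing_score)
```

With pass_threshold=0.5 and a single reward function, the first transcript would pass the gate and the second would fail it.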

Go live

Swap the scripted stand-in for a real agent and let the simulated user drive a multi-turn conversation. This run makes live model calls billed to your own provider key.

Bring your own provider keys. Korrel reads keys from the environment at call time and stores none. The default provider is Claude via the anthropic SDK; set ANTHROPIC_API_KEY. The default model is claude-sonnet-4-6 for both the agent provider and the Persona; override it with AnthropicProvider(model=...) and Persona(..., model=...). OpenAI support is an optional extra (korrel[openai]).

1. Set a key:

export ANTHROPIC_API_KEY=...

2. Write a scenario with a real agent and a mock tool (support_refund.py):

from korrel import MockTool, Persona, Rubric, Scenario, adapter_from_provider
from korrel.providers import AnthropicProvider

def lookup_order(arguments, state):
    state["called"] = True
    orders = {"A1001": {"order_id": "A1001", "amount": 49.99, "refundable": True}}
    return orders.get(arguments.get("order_id", ""), {"error": "not found"})

orders = MockTool(
    name="lookup_order",
    schema={"type": "function", "function": {
        "name": "lookup_order",
        "description": "Look up an order by its id.",
        "parameters": {"type": "object",
                       "properties": {"order_id": {"type": "string"}},
                       "required": ["order_id"]},
    }},
    respond=lookup_order,
)

def confirmed(completion, info, **kwargs):
    amount = f"{info['amount']:.2f}"
    return 1.0 if any(
        m.role == "assistant" and m.content
        and "refund" in m.content.lower() and amount in m.content
        for m in completion
    ) else 0.0

scenario = Scenario(
    id="support_refund",
    system="You are a support agent. Resolve refunds using lookup_order.",
    persona=Persona(goal="Get a refund for order A1001.", behavior="Polite but firm."),
    opening_message="My order A1001 arrived broken and I want a refund.",
    tools=[orders],
    max_turns=3,
    seed=7,
    info={"order_id": "A1001", "amount": 49.99},
    rubric=Rubric(funcs=[confirmed], pass_threshold=0.5),
)

# AnthropicProvider reads ANTHROPIC_API_KEY at call time, not at import time.
adapter = adapter_from_provider(AnthropicProvider())

3. Run it:

korrel run support_refund.py

Output on pass:

scenario    : support_refund
model       : claude-sonnet-4-6
score       : 1.0000
status      : pass
model calls : 6
stop reason : max_turns
transcript  : .korrel/support_refund.transcript.json

Output on failure (exit code 1):

scenario    : support_refund
model       : claude-sonnet-4-6
score       : 0.0000
status      : fail
model calls : 6
stop reason : max_turns
failed      : confirmed
clusters    : confirmed(zero)
transcript  : .korrel/support_refund.transcript.json

CLI flags:

korrel run SCENARIO_PY [--out DIR] [--seed N] [--model NAME]
                       [--scenario-attr NAME] [--adapter-attr NAME]

--out overrides the transcript directory (default .korrel/). --seed overrides the scenario seed. --model overrides the persona model always, and the agent model when the adapter exposes a provider (as adapter_from_provider does); a scripted adapter reports model : unknown. --scenario-attr and --adapter-attr override the module attribute names (defaults: scenario, adapter).

A run where any turn hits max_tool_rounds prints a tool rounds : capped line in the summary.

Run failures surface as a single error: line on stderr with exit code 1, never as a raw traceback. A missing provider key prints the instruction to set it. A mock tool that raises prints error: tool '<name>' raised: <message>, and the partial transcript up to the failing call is still written.

Cost

Korrel is bring-your-own-keys: every model call a run makes is billed to your own provider account. Korrel bills nothing and stores no key; the key is read from the environment at call time. The cost of a run scales with the number of model calls it makes, so the figures below are stated in model calls, not dollars (the price per call depends on your provider and tier).

A run of one scenario makes these calls:

One call to the agent under test at the start of each turn.
One additional agent call for each tool-use round in a turn (running a mock tool is local Python, not a model call).
One call to the user-simulator (the Persona) per turn that continues. The opening message is a fixed string and makes no call, and the final turn makes no user-simulator call.
One call for an LLM judge, if the rubric has one, made once per run at scoring time. A plain reward function is local Python and makes no call.

As a rule of thumb, a scenario of T turns with no tool rounds and no judge is about 2T - 1 calls: T agent calls plus T - 1 user-simulator calls. Each tool round adds one agent call; a judge adds one call.

korrel run prints the measured number of calls the simulation loop made for the run, on the model calls line of the summary. That count covers the agent and user-simulator calls the loop issues. It does not yet include the judge's scoring-time call, which happens inside the rubric after the loop ends.
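The counting rules above can be turned into a small estimator. This is a minimal sketch of the arithmetic only, not part of the korrel API: estimate_model_calls and its parameter names are hypothetical, and it assumes the run plays every turn it is given rather than ending early.

```python
def estimate_model_calls(turns: int, tool_rounds: int = 0, judge: bool = False) -> dict:
    """Estimate model calls for one scenario run (hypothetical helper).

    turns: number of turns the run plays.
    tool_rounds: total tool-use rounds across all turns.
    judge: whether the rubric includes an LLM judge.
    """
    if turns < 1:
        raise ValueError("a run has at least one turn")
    agent_calls = turns + tool_rounds  # one per turn, plus one per tool round
    persona_calls = turns - 1          # no call for the opening message or the final turn
    loop_calls = agent_calls + persona_calls
    return {
        "loop": loop_calls,                         # what the model calls line covers
        "total": loop_calls + (1 if judge else 0),  # judge call is made at scoring time
    }


offline = estimate_model_calls(turns=1)
live = estimate_model_calls(turns=3, tool_rounds=1)
judged = estimate_model_calls(turns=3, judge=True)
print(offline, live, judged)
```

A single turn gives the one call shown in the offline quickstart. Three turns with one tool round give six loop calls, which is consistent with the Go live sample output if that run used exactly one tool round (an assumption; the output does not itemize the calls).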

The pytest CI gate

Name scenario files *_scenario.py and place them in your test tree. The korrel pytest plugin (registered automatically via the pytest11 entry-point, no conftest needed) discovers and runs them:

uv run pytest

A passing scenario produces a green dot. A failing scenario produces a normal pytest failure block showing score, threshold, failed rubric function names, and the transcript path.

Discovery options:

--korrel-glob GLOB: override which file name pattern the plugin picks up (command-line flag)
korrel_glob = *_scenario.py: the corresponding pytest.ini / pyproject.toml option

Each scenario file in the CI gate must expose both a module-level scenario and a module-level adapter. If adapter is absent, the item is reported as an error immediately.

The four objects

Scenario: a code-first test definition. Holds the system prompt, a Persona, the opening message, mock tools, max_turns, max_tool_rounds, a seed, ground-truth info, and a Rubric.
Persona: the LLM-driven user-simulator. Given the conversation so far, it produces the next user message. Defaults to Claude. Scenario.persona is required even for single-turn scenarios; at max_turns=1 it is constructed but never invoked, so the offline quickstart makes no model call and needs no key.
MockTool: a programmable tool. Holds a chat-completions tool schema and a respond callable that takes parsed arguments and a mutable per-run state and returns a result.
Rubric: reward functions plus an optional hardened LLM judge. Reward signatures mirror verifiers: (completion, info, **kwargs) -> float. The judge treats the transcript as data, never as instructions.
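Since a MockTool's respond is an ordinary callable over parsed arguments and a mutable state, it can be exercised directly, with no model and no korrel import. The sketch below reuses lookup_order from Go live; passing a plain dict as the state is an assumption about what Korrel supplies at run time, chosen because the function only uses item assignment.

```python
def lookup_order(arguments, state):
    # Same respond callable as in support_refund.py.
    state["called"] = True
    orders = {"A1001": {"order_id": "A1001", "amount": 49.99, "refundable": True}}
    return orders.get(arguments.get("order_id", ""), {"error": "not found"})


# A plain dict stands in for the per-run state (assumption).
state = {}
hit = lookup_order({"order_id": "A1001"}, state)
miss = lookup_order({"order_id": "Z9999"}, {})
no_id = lookup_order({}, {})

print(hit, state, miss, no_id)
```

The state mutation is what lets a reward function or a later tool call observe that the lookup happened during the run.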

The module convention

A scenario module exposes two module-level names:

scenario: a Scenario instance.
adapter: any callable (messages: list[Message], tools: list[ToolSchema]) -> Message.

Both korrel run and the pytest plugin read these names (overridable via --scenario-attr / --adapter-attr).

The built-in helper adapter_from_provider(provider) wraps any Provider (such as AnthropicProvider) as an adapter. Because AnthropicProvider reads the API key from the environment only at call time, constructing adapter_from_provider(AnthropicProvider()) at module level is import-safe: no key is required to import the module.

Running the Python API directly

from korrel import run_scenario
from refund_offline import adapter, scenario  # the quickstart module

result = run_scenario(scenario, adapter)
print(result.score, result.passed, result.failed_functions)

run_scenario(persona=...) overrides the scenario's persona for the run. The override is not limited to Persona instances: any object exposing next_message(messages) -> Optional[str] works, which is how tests inject a deterministic fake user-simulator with no model calls. The same override is accepted by to_verifiers_env(persona=...) (verifiers extra), build_environment_class(persona=...) (openenv extra), and the load_environment(persona=...) function of an exported verifiers package.
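A deterministic fake user-simulator only needs the next_message(messages) -> Optional[str] method described above. The following is a minimal sketch: ScriptedPersona is a hypothetical class, not something korrel ships, and returning None once the script is exhausted is an assumption about how a persona signals that it has nothing more to say.

```python
from typing import Optional, Sequence


class ScriptedPersona:
    """Hypothetical stand-in for the user-simulator: replays fixed replies."""

    def __init__(self, replies: Sequence[str]):
        self._replies = list(replies)
        self._next = 0

    def next_message(self, messages) -> Optional[str]:
        # Ignores the conversation so far and returns the next scripted reply.
        # Returning None when the script runs out is an assumption.
        if self._next >= len(self._replies):
            return None
        reply = self._replies[self._next]
        self._next += 1
        return reply


fake_user = ScriptedPersona(["It arrived broken. Please refund it.", "Thanks, that works."])
first = fake_user.next_message([])
second = fake_user.next_message([])
third = fake_user.next_message([])
print(first, second, third)
```

An instance like this would be passed as run_scenario(scenario, adapter, persona=fake_user), so a multi-turn scenario can run in tests without any user-simulator model call.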

examples/support_refund.py holds a runnable scenario definition with a real AnthropicProvider adapter.

Export to verifiers

A Korrel scenario can be translated into a verifiers RL training environment. The full mapping is specified in docs/spec/korrel-to-verifiers.md.

Install

verifiers is an optional extra. import korrel works without it.

pip install 'korrel[verifiers]'
# or
uv add 'korrel[verifiers]'

verifiers==0.1.14 requires Python <3.14, so the extra installs only on Python 3.10-3.13.

Python API

from korrel.exporters.verifiers import to_verifiers_env

env = to_verifiers_env(scenario)

to_verifiers_env returns a constructed verifiers environment: a MultiTurnEnv subclass for most scenarios (persona-driven or tool-bearing), or a SingleTurnEnv for a single-exchange scenario with no tools and no persona follow-up. For the support_refund scenario from Go live, the call returns a MultiTurnEnv because the scenario has a mock tool and max_turns=3, so the persona drives follow-up turns.

CLI export

korrel export support_refund.py --to verifiers --out ./support_refund_env

This writes a pip-installable package that verifiers discovers via load_environment:

support_refund_env/
  pyproject.toml
  support_refund.py   # environment module exposing load_environment()
  _scenario.py        # copy of the original scenario source

Install the package in a Python 3.10-3.13 environment and load it:

import verifiers as vf

env = vf.load_environment("support_refund")

vf.load_environment resolves through the verifiers registry, which requires the full verifiers[all] install. The generated support_refund.py also exposes load_environment() directly, which loads under the base verifiers install:

from support_refund import load_environment

env = load_environment()

When --out is omitted, the package is written to .korrel/export/<scenario-id>/.

Concept mapping

| Korrel concept | verifiers target |
| --- | --- |
| `Scenario` (persona or tools or `max_turns > 1`) | `MultiTurnEnv` subclass |
| `Scenario` (single exchange, no tools) | `SingleTurnEnv` |
| `Persona` | user-turn generation inside `env_response` |
| `MockTool` | tool-execution branch of `env_response` |
| Rubric reward functions | `verifiers.Rubric(funcs=..., weights=...)` |
| `Scenario.system` + `opening_message` + `info` | dataset row (`prompt`, `info`) |

The spec holds the full field-level detail.

What survives the translation

The reward-function signature (completion, info, **kwargs) -> float is unchanged. The canonical transcript types in korrel.types are unchanged. The only adaptation at the boundary is converting the completion value from the verifiers message shape to the korrel canonical shape before calling each reward function.

Lossy edges (summarized; see the spec's Lossy edges section for the complete list):

The persona generates user turns with a live model call inside env_response. BYO key, non-deterministic. Offline tests substitute a fake persona.
The judge reward function likewise makes a live model call during scoring.
Korrel aggregates reward functions by mean; verifiers uses a weighted sum. The exporter sets weights to 1/n to reproduce the mean.
Korrel max_turns counts user-to-assistant exchanges; verifiers max_turns counts individual model-response steps. The exporter derives the verifiers step budget from scenario.max_turns and scenario.max_tool_rounds.
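The mean-versus-weighted-sum point in the list above is plain arithmetic: with n reward functions and every weight set to 1/n, the weighted sum equals the mean. The sketch below checks that with made-up scores; mean_as_weighted_sum is a hypothetical helper for illustration, not the exporter's code.

```python
def mean_as_weighted_sum(scores):
    """Return uniform weights 1/n and the weighted sum they produce."""
    n = len(scores)
    weights = [1.0 / n] * n
    weighted = sum(w * s for w, s in zip(weights, scores))
    return weights, weighted


# Illustrative scores from four reward functions (made-up values).
scores = [1.0, 0.0, 0.5, 1.0]
weights, weighted = mean_as_weighted_sum(scores)
mean = sum(scores) / len(scores)

print(weights, weighted, mean)
```

Both aggregates come out to 0.625 here, so a pass_threshold chosen against Korrel's mean keeps its meaning after export.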

Export to OpenEnv

A Korrel scenario can be translated into an OpenEnv environment server. The full mapping is specified in docs/spec/korrel-to-openenv.md.

Install

openenv-core is an optional extra. import korrel works without it.

pip install 'korrel[openenv]'
# or
uv add 'korrel[openenv]'

openenv-core>=0.3.0 declares Requires-Python: >=3.10 with no upper bound, so it installs on Python 3.10 through 3.14, including the repo's default Python 3.14. This contrasts with the verifiers extra, which is capped at Python <3.14.

Python API

from korrel.exporters.openenv import build_environment_class

EnvironmentCls = build_environment_class(scenario, observation_cls, action_cls)

build_environment_class returns a concrete openenv.core.env_server.interfaces.Environment subclass. The caller supplies the author-defined Action and Observation subclasses (generated by the CLI export; see below). The persona keyword argument accepts an override for offline testing without live model calls.

The primary deployment path is the CLI export: the generated package is the artifact that gets deployed to a Hugging Face Space.

CLI export

korrel export support_refund.py --to openenv --out ./support_refund_env

This writes a pip-installable OpenEnv environment package:

support_refund_env/
    __init__.py
    client.py
    models.py
    openenv.yaml
    pyproject.toml
    README.md
    _scenario.py                    # copy of the original scenario source
    server/
        __init__.py
        support_refund_environment.py
        app.py
        Dockerfile
        requirements.txt

When --out is omitted, the package is written to .korrel/export/<scenario-id>/.

Deploy the generated package to a Hugging Face Space:

openenv push --secret ANTHROPIC_API_KEY=<your-key>

The persona and judge make live model calls inside the container. The API key is supplied at runtime via openenv push --secret and is never written into any emitted file.

Concept mapping

| Korrel concept | OpenEnv target |
| --- | --- |
| `Scenario` | `Environment` subclass |
| `scenario.system` + `opening_message` | `reset()` seed observation |
| agent turn | author-defined `Action` subclass |
| environment reply | author-defined `Observation` subclass |
| `MockTool` execution | tool branch of `step()` |
| `Persona.next_message` | persona branch of `step()` |
| `Rubric` aggregate | terminal `observation.reward` (`done=True`) |

The spec holds the full field-level detail.

What survives the translation

The reward-function signature (completion, info, **kwargs) -> float is unchanged. The canonical transcript types in korrel.types are unchanged. The OpenEnv Rubric class (incompatible action/observation signature) is not used; korrel computes reward with its own Rubric.score.

Lossy edges (summarized; see the spec's Lossy edges section for the complete list):

Reward is terminal, not dense. Every intermediate step carries reward=None; the final step (done=True) carries the rubric aggregate.
The persona and judge run server-side inside the container; they make live model calls, which are non-deterministic and require a key at runtime.
Content-shape narrowing: Observation.messages carries dicts; the full canonical Message type is used internally and serialized at the boundary.
Tool-call arguments stay a JSON string (ToolFunction.arguments), matching the chat-completions wire format.

Determinism

Every run takes a seed and records the model and request parameters. The seed pins scenario setup and any sampling Korrel controls. LLM calls are not bit-reproducible; provider nondeterminism is outside the seed.

Telemetry

Korrel includes opt-in telemetry. On opt-in, korrel run sends a single content-scrubbed run event to Korrel's collector. No scenario content, persona text, transcripts, prompts, tool schemas, file paths, model names, or keys are ever collected. The event carries only aggregate counters and version metadata.

What the run event sends (every field, nothing more):

| Field | Description |
| --- | --- |
| `event` | Always `"run"` |
| `schema_version` | Event schema version (currently `"1"`) |
| `korrel_version` | Installed korrel version string |
| `python_version` | CPython version string |
| `scenario_count` | Number of scenarios in the run |
| `total_turns` | Total turns across all scenarios |
| `pass_count` | Number of passing scenarios |
| `fail_count` | Number of failing scenarios |
| `duration_s` | Wall-clock duration in seconds |
| `install_id` | Anonymous, randomly generated UUID (created once, stored locally) |

No key, scenario id, path, persona, transcript, prompt, tool schema, or model name is present in the event.

Opt-outs (any one disables telemetry):

Set KORREL_TELEMETRY=0 (also accepts false, no, off) in the environment.
Set DO_NOT_TRACK=1 in the environment.
Telemetry is automatically off in CI (detected via CI, GITHUB_ACTIONS, TRAVIS, CIRCLECI, GITLAB_CI, JENKINS_URL, BUILDKITE, TF_BUILD, TEAMCITY_VERSION, BITBUCKET_BUILD_NUMBER).
On the first interactive run outside CI, Korrel prompts once for consent and persists the answer. Declining disables telemetry permanently for that install. Non-interactive sessions default to off with no prompt.
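The environment-based opt-outs above can be read as a short precedence check. This is a sketch of the documented rules, not Korrel's implementation: telemetry_blocked is a hypothetical function, and treating KORREL_TELEMETRY values case-insensitively and counting any non-empty CI variable as "in CI" are assumptions.

```python
import os
from typing import Mapping, Optional

# Variable names and accepted values are taken from the list above.
CI_VARS = (
    "CI", "GITHUB_ACTIONS", "TRAVIS", "CIRCLECI", "GITLAB_CI", "JENKINS_URL",
    "BUILDKITE", "TF_BUILD", "TEAMCITY_VERSION", "BITBUCKET_BUILD_NUMBER",
)
OFF_VALUES = {"0", "false", "no", "off"}


def telemetry_blocked(env: Mapping[str, str]) -> Optional[str]:
    """Return the reason telemetry is off, or None if no env opt-out applies."""
    # Assumption: values are compared case-insensitively.
    if env.get("KORREL_TELEMETRY", "").strip().lower() in OFF_VALUES:
        return "KORREL_TELEMETRY"
    if env.get("DO_NOT_TRACK") == "1":
        return "DO_NOT_TRACK"
    # Assumption: any non-empty value of a CI variable counts as running in CI.
    for var in CI_VARS:
        if env.get(var):
            return f"CI ({var})"
    return None


reason_here = telemetry_blocked(os.environ)
print(reason_here)
```

A None result only means no environment opt-out applies; per the consent rule above, an event would still be sent only after an interactive opt-in.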

Where the event goes. Opted-in events are sent to Korrel's public collector. Set KORREL_TELEMETRY_ENDPOINT to redirect them to a self-hosted collector instead. Set KORREL_TELEMETRY_DEBUG=1 to write the event JSON to stderr for inspection instead of sending it. Sending is best-effort with a 3 second timeout; a failure never affects the run.

Consent and the anonymous install id are stored in %APPDATA%\korrel\config.json (Windows) or $XDG_CONFIG_HOME/korrel/config.json / ~/.config/korrel/config.json (Linux/macOS). No key, scenario content, or identifying information is ever written there.

Data model and chat-completions compatibility

The canonical transcript types in korrel.types are provider-neutral and wire-compatible with the OpenAI chat-completions message schema. They are the v0.2 verifiers/OpenEnv export target. Their attribute names and wire shapes are a contract.

Types

Message - one message in a conversation:

Message(
    role="assistant",        # "system" | "user" | "assistant" | "tool"
    content="Your refund...", # text body; None when only tool_calls is set
    tool_calls=[...],        # present on assistant messages that call tools
    tool_call_id="call_1",   # links a role="tool" message to the call it answers
    name=None,               # optional speaker name
)

ToolCall - a single tool invocation inside an assistant message:

ToolCall(
    id="call_1",
    type="function",          # always "function"
    function=ToolFunction(
        name="lookup_order",
        arguments='{"order_id": "A1001"}',  # JSON-encoded string, not a dict
    ),
)

arguments is a JSON-encoded string, matching the chat-completions wire format (OpenAI API reference, tool_calls[].function.arguments). Parse with json.loads() to recover the call arguments.
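As a small illustration of the point above, the arguments string from the ToolCall example decodes to a dict with the standard library. The variable names are illustrative only.

```python
import json

# The arguments value exactly as it appears on the wire: a JSON string.
raw_arguments = '{"order_id": "A1001"}'

# Decode to a dict before handing the values to application code.
arguments = json.loads(raw_arguments)
order_id = arguments["order_id"]

# Encoding goes the other way when building a tool call by hand.
encoded = json.dumps(arguments)

print(order_id, encoded)
```

Indexing the raw string directly (raw_arguments["order_id"]) raises a TypeError, which is the usual symptom of forgetting the decode step.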

ToolSchema - a tool definition passed to an adapter, in chat-completions tool-schema shape:

{"type": "function", "function": {
    "name": "lookup_order",
    "description": "Look up an order by its id.",
    "parameters": {"type": "object",
                   "properties": {"order_id": {"type": "string"}},
                   "required": ["order_id"]},
}}

Chat-completions mapping table

| Canonical field | Chat-completions wire field | Notes |
| --- | --- | --- |
| `Message.role` | `role` | `"system"`, `"user"`, `"assistant"`, `"tool"` |
| `Message.content` | `content` | `None` when only tool calls are present |
| `Message.tool_calls` | `tool_calls` | array of `ToolCall` objects |
| `Message.tool_call_id` | `tool_call_id` | on `role="tool"` messages |
| `ToolCall.id` | `tool_calls[].id` | |
| `ToolCall.type` | `tool_calls[].type` | always `"function"` |
| `ToolCall.function.name` | `tool_calls[].function.name` | |
| `ToolCall.function.arguments` | `tool_calls[].function.arguments` | JSON string, not a dict |
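To make the mapping concrete, the sketch below serializes an assistant tool-call message into the chat-completions dict shape. The three dataclasses are hypothetical stand-ins that mirror the documented attribute names; they are not the classes in korrel.types, and whether korrel implements its types as dataclasses is not stated here. The to_wire helper is likewise illustrative.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


# Stand-in types mirroring the documented field names (not korrel.types).
@dataclass
class ToolFunction:
    name: str
    arguments: str  # JSON-encoded string, not a dict


@dataclass
class ToolCall:
    id: str
    function: ToolFunction
    type: str = "function"


@dataclass
class Message:
    role: str
    content: Optional[str] = None
    tool_calls: Optional[List[ToolCall]] = None
    tool_call_id: Optional[str] = None
    name: Optional[str] = None


def to_wire(message: Message) -> dict:
    # asdict recurses into nested dataclasses and lists. Unset optional
    # fields are dropped; content is kept even when None, matching the
    # "None when only tool calls are present" note in the table.
    return {k: v for k, v in asdict(message).items() if v is not None or k == "content"}


call = ToolCall(id="call_1", function=ToolFunction("lookup_order", '{"order_id": "A1001"}'))
wire = to_wire(Message(role="assistant", tool_calls=[call]))
print(wire)
```

The field names pass through unchanged, which is what the table's one-to-one mapping amounts to in practice.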

The Anthropic provider (AnthropicProvider in korrel.providers) translates between tool_use/tool_result blocks and these canonical types. Nothing in korrel.types depends on the openai package.

License

MIT.

Release history

0.1.3 (this version): Jun 11, 2026
0.1.2: Jun 10, 2026
0.1.1: Jun 10, 2026
0.1.0: Jun 10, 2026

