korrel
OSS Python SDK for agent simulation. Define a multi-turn agent test once, run it as a pytest CI gate, export it as a verifiers/OpenEnv RL environment.
Install
pip install korrel
Inside a uv project, add it as a dependency instead:
uv add korrel
Bring your own provider keys. Korrel reads keys from the environment at call time and stores none. The default provider is Claude via the anthropic SDK; set ANTHROPIC_API_KEY. OpenAI support is an optional extra (korrel[openai]).
Quickstart
The following is a complete path from installation to a passing scenario run.
1. Write a scenario module (support_refund.py):
from korrel import MockTool, Persona, Rubric, Scenario, adapter_from_provider
from korrel.providers import AnthropicProvider
from korrel.types import Message


def lookup_order(arguments, state):
    state["called"] = True
    orders = {"A1001": {"order_id": "A1001", "amount": 49.99, "refundable": True}}
    return orders.get(arguments.get("order_id", ""), {"error": "not found"})


orders = MockTool(
    name="lookup_order",
    schema={"type": "function", "function": {
        "name": "lookup_order",
        "description": "Look up an order by its id.",
        "parameters": {"type": "object",
                       "properties": {"order_id": {"type": "string"}},
                       "required": ["order_id"]},
    }},
    respond=lookup_order,
)


def confirmed(completion, info, **kwargs):
    amount = f"{info['amount']:.2f}"
    return 1.0 if any(
        m.role == "assistant" and m.content
        and "refund" in m.content.lower() and amount in m.content
        for m in completion
    ) else 0.0


scenario = Scenario(
    id="support_refund",
    system="You are a support agent. Resolve refunds using lookup_order.",
    persona=Persona(goal="Get a refund for order A1001.", behavior="Polite but firm."),
    opening_message="My order A1001 arrived broken and I want a refund.",
    tools=[orders],
    max_turns=3,
    seed=7,
    info={"order_id": "A1001", "amount": 49.99},
    rubric=Rubric(funcs=[confirmed], pass_threshold=0.5),
)

# AnthropicProvider reads ANTHROPIC_API_KEY at call time, not at import time.
adapter = adapter_from_provider(AnthropicProvider())
2. Run it:
korrel run support_refund.py
Output on pass:
scenario : support_refund
score : 1.0000
status : pass
model calls : 6
transcript : .korrel/support_refund.transcript.json
Output on failure (exit code 1):
scenario : support_refund
score : 0.0000
status : fail
model calls : 6
failed : confirmed
clusters : confirmed(zero)
transcript : .korrel/support_refund.transcript.json
The CLI exits zero on pass and non-zero on failure. The full conversation transcript is written to .korrel/<scenario-id>.transcript.json.
CLI flags:
korrel run SCENARIO_PY [--out DIR] [--seed N]
[--scenario-attr NAME] [--adapter-attr NAME]
--out overrides the transcript directory (default .korrel/). --seed overrides the scenario seed. --scenario-attr and --adapter-attr override the module attribute names (defaults: scenario, adapter).
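Because a reward function such as confirmed is plain Python, it can be checked offline before any model call is spent. The following is a minimal sketch, not part of Korrel: Msg is a hypothetical stand-in for korrel.types.Message that models only the role and content attributes, and the two sample transcripts are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Msg:
    """Hypothetical stand-in for korrel.types.Message (role and content only)."""
    role: str
    content: Optional[str] = None


def confirmed(completion, info, **kwargs):
    # Same logic as the quickstart reward: 1.0 when any assistant message
    # mentions a refund together with the expected amount, else 0.0.
    amount = f"{info['amount']:.2f}"
    return 1.0 if any(
        m.role == "assistant" and m.content
        and "refund" in m.content.lower() and amount in m.content
        for m in completion
    ) else 0.0


info = {"order_id": "A1001", "amount": 49.99}

# Invented transcripts: one that confirms the refund, one that does not.
good = [Msg("user", "I want a refund."),
        Msg("assistant", "Your refund of $49.99 is on its way.")]
bad = [Msg("user", "I want a refund."),
       Msg("assistant", "I cannot help with that.")]

good_score = confirmed(good, info)
bad_score = confirmed(bad, info)
print(good_score, bad_score)
```

In a real test the messages would be korrel.types.Message instances; the stand-in only keeps the sketch importable without the package installed.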
Cost
Korrel is bring-your-own-keys. Every model call a run makes is billed to your own provider account. Korrel bills nothing and stores no key; the key is read from the environment at call time. The cost of a run scales with the number of model calls it makes, so the figures below are stated in model calls, not dollars (the price per call depends on your provider and tier).
A run of one scenario makes these calls:
- One call to the agent under test at the start of each turn.
- One additional agent call for each tool-use round in a turn (running a mock tool is local Python, not a model call).
- One call to the user-simulator (the Persona) per turn that continues. The opening message is a fixed string and makes no call, and the final turn makes no user-simulator call.
- One call for an LLM judge, if the rubric has one, made once per run at scoring time. A plain reward function is local Python and makes no call.
As a rule of thumb, a scenario of T turns with no tool rounds and no judge is about 2T - 1 calls: T agent calls plus T - 1 user-simulator calls. Each tool round adds one agent call; a judge adds one call.
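The rule of thumb can be written as a small estimator. This is an illustrative sketch, not a Korrel API: the function name estimate_model_calls and its parameters are hypothetical, and it assumes the run uses all of its turns.

```python
def estimate_model_calls(turns: int, tool_rounds: int = 0, has_judge: bool = False) -> int:
    """Estimate the model calls of one run from the rule of thumb.

    Hypothetical helper: assumes the scenario runs for all `turns` turns.
    """
    if turns < 1:
        raise ValueError("turns must be at least 1")
    agent_calls = turns + tool_rounds   # one per turn, plus one per tool-use round
    simulator_calls = turns - 1         # opening message and final turn make no call
    judge_calls = 1 if has_judge else 0  # made once, at scoring time
    return agent_calls + simulator_calls + judge_calls


print(estimate_model_calls(3))                  # 2T - 1 with T = 3
print(estimate_model_calls(3, tool_rounds=1))   # one extra agent call for the tool round
```

With three turns and one tool round the estimate is 6, which is consistent with the model calls line in the quickstart output if that run made exactly one tool round.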
korrel run prints the measured number of calls the simulation loop made for the run, on the model calls line of the summary. That count covers the agent and user-simulator calls the loop issues. It does not yet include the judge's scoring-time call, which happens inside the rubric after the loop ends.
The pytest CI gate
Name scenario files *_scenario.py and place them in your test tree. The korrel pytest plugin (registered automatically via the pytest11 entry-point, no conftest needed) discovers and runs them:
uv run pytest
A passing scenario produces a green dot. A failing scenario produces a normal pytest failure block showing score, threshold, failed rubric function names, and the transcript path.
Discovery options:
- --korrel-glob GLOB: override which file name pattern the plugin picks up (command-line flag)
- korrel_glob = *_scenario.py: the corresponding pytest.ini / pyproject.toml option
Each scenario file in the CI gate must expose both a module-level scenario and a module-level adapter. If adapter is absent, the item is reported as an error immediately.
The four objects
- Scenario: a code-first test definition. Holds the system prompt, a Persona, the opening message, mock tools, max_turns, max_tool_rounds, a seed, ground-truth info, and a Rubric.
- Persona: the LLM-driven user-simulator. Given the conversation so far, it produces the next user message. Defaults to Claude.
- MockTool: a programmable tool. Holds a chat-completions tool schema and a respond callable that takes parsed arguments and a mutable per-run state and returns a result.
- Rubric: reward functions plus an optional hardened LLM judge. Reward signatures mirror verifiers: (completion, info, **kwargs) -> float. The judge treats the transcript as data, never as instructions.
The module convention
A scenario module exposes two module-level names:
- scenario: a Scenario instance.
- adapter: any callable (messages: list[Message], tools: list[ToolSchema]) -> Message.

Both korrel run and the pytest plugin read these names (overridable via --scenario-attr / --adapter-attr).
The built-in helper adapter_from_provider(provider) wraps any Provider (such as AnthropicProvider) as an adapter. Because AnthropicProvider reads the API key from the environment only at call time, constructing adapter_from_provider(AnthropicProvider()) at module level is import-safe: no key is required to import the module.
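Since an adapter is just a callable with the signature above, a scripted one can stand in for a live provider in offline checks. The sketch below is an assumption-laden illustration: make_scripted_adapter is a hypothetical helper, and the local Message dataclass is a stand-in that mirrors the documented field names of korrel.types.Message so the example runs without the package.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Message:
    """Stand-in mirroring the documented korrel.types.Message fields."""
    role: str
    content: Optional[str] = None
    tool_calls: Optional[list] = None
    tool_call_id: Optional[str] = None
    name: Optional[str] = None


def make_scripted_adapter(replies: list[str]):
    """Return an adapter that replays canned assistant replies in order.

    Hypothetical helper: once the script runs out, the last reply is repeated.
    """
    if not replies:
        raise ValueError("replies must not be empty")
    state = {"next": 0}

    def adapter(messages: list[Message], tools: list[dict[str, Any]]) -> Message:
        # The conversation and tool schemas are ignored; only the script matters.
        index = min(state["next"], len(replies) - 1)
        state["next"] += 1
        return Message(role="assistant", content=replies[index])

    return adapter


adapter = make_scripted_adapter(["Let me check that order.",
                                 "Your refund of $49.99 is approved."])
first = adapter([Message("user", "I want a refund.")], [])
second = adapter([], [])
third = adapter([], [])
print(first.content, second.content, third.content, sep="\n")
```

In a real scenario module the Message import would come from korrel.types, and the module-level name adapter would point at the scripted callable.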
Running the Python API directly
from korrel import run_scenario
result = run_scenario(scenario, adapter)
print(result.score, result.passed, result.failed_functions)
examples/support_refund.py holds a runnable scenario definition with a real AnthropicProvider adapter.
Export to verifiers
A Korrel scenario can be translated into a verifiers RL training environment. The full mapping is specified in docs/spec/korrel-to-verifiers.md.
Install
verifiers is an optional extra. import korrel works without it.
pip install 'korrel[verifiers]'
# or
uv add 'korrel[verifiers]'
verifiers==0.1.14 requires Python <3.14, so the extra installs only on Python 3.10-3.13.
Python API
from korrel.exporters.verifiers import to_verifiers_env
env = to_verifiers_env(scenario)
to_verifiers_env returns a constructed verifiers environment: a MultiTurnEnv subclass for most scenarios (persona-driven or tool-bearing), or a SingleTurnEnv for a single-exchange scenario with no tools and no persona follow-up. Using the support-refund scenario from the quickstart, the call returns a MultiTurnEnv because the scenario has both a persona and a mock tool.
CLI export
korrel export support_refund.py --to verifiers --out ./support_refund_env
This writes a pip-installable package that verifiers discovers via load_environment:
support_refund_env/
    pyproject.toml
    support_refund.py    # environment module exposing load_environment()
    _scenario.py         # copy of the original scenario source
Install the package in a Python 3.10-3.13 environment and load it:
import verifiers as vf
env = vf.load_environment("support_refund")
When --out is omitted, the package is written to .korrel/export/<scenario-id>/.
Concept mapping
| Korrel concept | verifiers target |
|---|---|
| Scenario (persona or tools or max_turns > 1) | MultiTurnEnv subclass |
| Scenario (single exchange, no tools) | SingleTurnEnv |
| Persona | user-turn generation inside env_response |
| MockTool | tool-execution branch of env_response |
| Rubric reward functions | verifiers.Rubric(funcs=..., weights=...) |
| Scenario.system + opening_message + info | dataset row (prompt, info) |
The spec holds the full field-level detail.
What survives the translation
The reward-function signature (completion, info, **kwargs) -> float is unchanged. The canonical transcript types in korrel.types are unchanged. The only adaptation at the boundary is converting the completion value from the verifiers message shape to the korrel canonical shape before calling each reward function.
Lossy edges (summarized; see the spec's Lossy edges section for the complete list):
- The persona generates user turns with a live model call inside env_response. BYO key, non-deterministic. Offline tests substitute a fake persona.
- The judge reward function likewise makes a live model call during scoring.
- Korrel aggregates reward functions by mean; verifiers uses a weighted sum. The exporter sets weights to 1/n to reproduce the mean.
- Korrel max_turns counts user-to-assistant exchanges; verifiers max_turns counts individual model-response steps. The exporter derives the verifiers step budget from scenario.max_turns and scenario.max_tool_rounds.
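The mean-versus-weighted-sum point can be checked numerically: a weighted sum with every weight equal to 1/n is the arithmetic mean. The sketch below uses plain Python with hypothetical helper names and made-up scores; it does not call Korrel or verifiers.

```python
import math


def mean_aggregate(scores):
    """Arithmetic mean of the per-function rewards."""
    return sum(scores) / len(scores)


def weighted_sum(scores, weights):
    """Weighted sum of the per-function rewards."""
    return sum(w * s for w, s in zip(weights, scores))


scores = [1.0, 0.0, 0.5, 0.5]                  # made-up reward values
weights = [1 / len(scores)] * len(scores)      # uniform weights 1/n

mean_value = mean_aggregate(scores)
weighted_value = weighted_sum(scores, weights)
print(mean_value, weighted_value)
```

The two values agree up to floating-point rounding, which is why math.isclose is the safer comparison for arbitrary scores.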
Export to OpenEnv
A Korrel scenario can be translated into an OpenEnv environment server. The full mapping is specified in docs/spec/korrel-to-openenv.md.
Install
openenv-core is an optional extra. import korrel works without it.
pip install 'korrel[openenv]'
# or
uv add 'korrel[openenv]'
openenv-core>=0.3.0 declares Requires-Python: >=3.10 with no upper bound, so it installs on Python 3.10 through 3.14, including the repo's default Python 3.14. This contrasts with the verifiers extra, which is capped at Python <3.14.
Python API
from korrel.exporters.openenv import build_environment_class
EnvironmentCls = build_environment_class(scenario, observation_cls, action_cls)
build_environment_class returns a concrete openenv.core.env_server.interfaces.Environment subclass. The caller supplies the author-defined Action and Observation subclasses (generated by the CLI export; see below). The persona keyword argument accepts an override for offline testing without live model calls.
The primary deployment path is the CLI export: the generated package is the artifact that gets deployed to a Hugging Face Space.
CLI export
korrel export support_refund.py --to openenv --out ./support_refund_env
This writes a pip-installable OpenEnv environment package:
support_refund_env/
    __init__.py
    client.py
    models.py
    openenv.yaml
    pyproject.toml
    README.md
    _scenario.py    # copy of the original scenario source
    server/
        __init__.py
        support_refund_environment.py
        app.py
        Dockerfile
        requirements.txt
When --out is omitted, the package is written to .korrel/export/<scenario-id>/.
Deploy the generated package to a Hugging Face Space:
openenv push --secret ANTHROPIC_API_KEY=<your-key>
The persona and judge make live model calls inside the container. The API key is supplied at runtime via openenv push --secret and is never written into any emitted file.
Concept mapping
| Korrel concept | OpenEnv target |
|---|---|
| Scenario | Environment subclass |
| scenario.system + opening_message | reset() seed observation |
| agent turn | author-defined Action subclass |
| environment reply | author-defined Observation subclass |
| MockTool execution | tool branch of step() |
| Persona.next_message | persona branch of step() |
| Rubric aggregate | terminal observation.reward (done=True) |
The spec holds the full field-level detail.
What survives the translation
The reward-function signature (completion, info, **kwargs) -> float is unchanged. The canonical transcript types in korrel.types are unchanged. The OpenEnv Rubric class (incompatible action/observation signature) is not used; korrel computes reward with its own Rubric.score.
Lossy edges (summarized; see the spec's Lossy edges section for the complete list):
- Reward is terminal, not dense. Every intermediate step carries reward=None; the final step (done=True) carries the rubric aggregate.
- The persona and judge run server-side inside the container; they make live model calls, which are non-deterministic and require a key at runtime.
- Content-shape narrowing: Observation.messages carries dicts; the full canonical Message type is used internally and serialized at the boundary.
- Tool-call arguments stay a JSON string (ToolFunction.arguments), matching the chat-completions wire format.
Determinism
Every run takes a seed and records the model and request parameters. The seed pins scenario setup and any sampling Korrel controls. LLM calls are not bit-reproducible; provider nondeterminism is outside the seed.
Telemetry
Korrel includes opt-in telemetry. On opt-in, korrel run sends a single content-scrubbed run event to Korrel's collector. No scenario content, persona text, transcripts, prompts, tool schemas, file paths, model names, or keys are ever collected. The event carries only aggregate counters and version metadata.
What the run event sends (every field, nothing more):
| Field | Description |
|---|---|
| event | Always "run" |
| schema_version | Event schema version (currently "1") |
| korrel_version | Installed korrel version string |
| python_version | CPython version string |
| scenario_count | Number of scenarios in the run |
| total_turns | Total turns across all scenarios |
| pass_count | Number of passing scenarios |
| fail_count | Number of failing scenarios |
| duration_s | Wall-clock duration in seconds |
| install_id | Anonymous, randomly generated UUID (created once, stored locally) |
No key, scenario id, path, persona, transcript, prompt, tool schema, or model name is present in the event.
Opt-outs (any one disables telemetry):
- Set KORREL_TELEMETRY=0 (also accepts false, no, off) in the environment.
- Set DO_NOT_TRACK=1 in the environment.
- Telemetry is automatically off in CI (detected via CI, GITHUB_ACTIONS, TRAVIS, CIRCLECI, GITLAB_CI, JENKINS_URL, BUILDKITE, TF_BUILD, TEAMCITY_VERSION, BITBUCKET_BUILD_NUMBER).
- On the first interactive run outside CI, Korrel prompts once for consent and persists the answer. Declining disables telemetry permanently for that install. Non-interactive sessions default to off with no prompt.
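The environment-based opt-outs can be pictured as a single check. This is a sketch of the documented rules, not Korrel's actual implementation: the function name is hypothetical, and two details are assumptions (values are compared case-insensitively, and a CI variable counts as set when it is present and non-empty).

```python
import os
from typing import Mapping, Optional

# CI marker variables listed in the opt-out rules.
CI_VARS = ("CI", "GITHUB_ACTIONS", "TRAVIS", "CIRCLECI", "GITLAB_CI",
           "JENKINS_URL", "BUILDKITE", "TF_BUILD", "TEAMCITY_VERSION",
           "BITBUCKET_BUILD_NUMBER")
OFF_VALUES = {"0", "false", "no", "off"}


def telemetry_disabled_by_env(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when any environment-based opt-out applies (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: the value is matched case-insensitively after trimming.
    if env.get("KORREL_TELEMETRY", "").strip().lower() in OFF_VALUES:
        return True
    if env.get("DO_NOT_TRACK", "").strip() == "1":
        return True
    # Assumption: a CI variable counts when it is present and non-empty.
    return any(env.get(name) for name in CI_VARS)


print(telemetry_disabled_by_env({"KORREL_TELEMETRY": "off"}))
print(telemetry_disabled_by_env({"GITHUB_ACTIONS": "true"}))
print(telemetry_disabled_by_env({}))
```

A False result here only means no environment opt-out applies; telemetry still stays off until consent has been given interactively.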
Where the event goes. Opted-in events are sent to Korrel's public collector. Set KORREL_TELEMETRY_ENDPOINT to redirect them to a self-hosted collector instead. Set KORREL_TELEMETRY_DEBUG=1 to write the event JSON to stderr for inspection instead of sending it. Sending is best-effort with a 3 second timeout; a failure never affects the run.
Consent and the anonymous install id are stored in %APPDATA%\korrel\config.json (Windows) or $XDG_CONFIG_HOME/korrel/config.json / ~/.config/korrel/config.json (Linux/macOS). No key, scenario content, or identifying information is ever written there.
Data model and chat-completions compatibility
The canonical transcript types in korrel.types are provider-neutral and wire-compatible with the OpenAI chat-completions message schema. They are the v0.2 verifiers/OpenEnv export target. Their attribute names and wire shapes are a contract.
Types
Message - one message in a conversation:
Message(
    role="assistant",           # "system" | "user" | "assistant" | "tool"
    content="Your refund...",   # text body; None when only tool_calls is set
    tool_calls=[...],           # present on assistant messages that call tools
    tool_call_id="call_1",      # links a role="tool" message to the call it answers
    name=None,                  # optional speaker name
)
ToolCall - a single tool invocation inside an assistant message:
ToolCall(
    id="call_1",
    type="function",  # always "function"
    function=ToolFunction(
        name="lookup_order",
        arguments='{"order_id": "A1001"}',  # JSON-encoded string, not a dict
    ),
)
arguments is a JSON-encoded string, matching the chat-completions wire format (OpenAI API reference, tool_calls[].function.arguments). Parse with json.loads() to recover the call arguments.
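A short sketch of that parsing step, using only the standard library. The parse_arguments helper is hypothetical and shows one defensive option (returning an empty dict when the model emits malformed or non-object JSON); Korrel's own handling of malformed arguments is not specified here.

```python
import json


def parse_arguments(raw: str) -> dict:
    """Decode a tool call's JSON-encoded arguments (hypothetical helper).

    Returns an empty dict when the string is not valid JSON or does not
    decode to an object.
    """
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    return value if isinstance(value, dict) else {}


raw_arguments = '{"order_id": "A1001"}'
arguments = parse_arguments(raw_arguments)
order_id = arguments["order_id"]

# Going the other way: a dict must be encoded before it is stored as arguments.
encoded = json.dumps({"order_id": "A1001"})
print(order_id, encoded)
```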
ToolSchema - a tool definition passed to an adapter, in chat-completions tool-schema shape:
{"type": "function", "function": {
    "name": "lookup_order",
    "description": "Look up an order by its id.",
    "parameters": {"type": "object",
                   "properties": {"order_id": {"type": "string"}},
                   "required": ["order_id"]},
}}
Chat-completions mapping table
| Canonical field | Chat-completions wire field | Notes |
|---|---|---|
| Message.role | role | "system", "user", "assistant", "tool" |
| Message.content | content | None when only tool calls are present |
| Message.tool_calls | tool_calls | array of ToolCall objects |
| Message.tool_call_id | tool_call_id | on role="tool" messages |
| ToolCall.id | tool_calls[].id | |
| ToolCall.type | tool_calls[].type | always "function" |
| ToolCall.function.name | tool_calls[].function.name | |
| ToolCall.function.arguments | tool_calls[].function.arguments | JSON string, not a dict |
The Anthropic provider (AnthropicProvider in korrel.providers) translates between tool_use/tool_result blocks and these canonical types. Nothing in korrel.types depends on the openai package.
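To make the shape difference concrete, the sketch below converts between an Anthropic tool_use / tool_result content block and the chat-completions shapes in the table. It is an illustration on plain dicts, not the code inside AnthropicProvider; both function names are hypothetical and the ids are made up.

```python
import json


def tool_use_to_tool_call(block: dict) -> dict:
    """Map an Anthropic tool_use block to a chat-completions tool call."""
    return {
        "id": block["id"],
        "type": "function",
        "function": {
            "name": block["name"],
            # Anthropic carries arguments as an object under "input";
            # chat-completions carries them as a JSON-encoded string.
            "arguments": json.dumps(block["input"]),
        },
    }


def tool_message_to_tool_result(message: dict) -> dict:
    """Map a role="tool" message to an Anthropic tool_result block."""
    return {
        "type": "tool_result",
        "tool_use_id": message["tool_call_id"],
        "content": message["content"],
    }


block = {"type": "tool_use", "id": "toolu_01", "name": "lookup_order",
         "input": {"order_id": "A1001"}}
call = tool_use_to_tool_call(block)

tool_message = {"role": "tool", "tool_call_id": call["id"],
                "content": json.dumps({"order_id": "A1001", "refundable": True})}
result_block = tool_message_to_tool_result(tool_message)
print(call["function"]["arguments"], result_block["tool_use_id"])
```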
License
MIT.