Deterministic CI tests for LLM agent trajectories — record once, replay offline, assert contracts
pytest-agentcontract
Deterministic CI tests for LLM agent trajectories. Record once, replay offline, assert contracts.
Your agent calls lookup_order, then check_eligibility, then process_refund. Every time. That's the contract. Test it like any other interface.
# Record a trajectory (hits real APIs once)
pytest --ac-record
# Replay in CI forever (no network, no API keys, no cost, deterministic)
pytest --ac-replay
tests/scenarios/refund-eligible.agentrun.json
├── turn 0: user → "I want a refund for order 123"
├── turn 1: assistant → lookup_order(order_id="123")
├── turn 2: assistant → check_eligibility(order_id="123")
├── turn 3: assistant → process_refund(order_id="123", amount=49.99)
└── turn 4: assistant → "Your refund of $49.99 has been processed."
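The trajectory above amounts to an ordered list of tool calls, so the core contract can be checked as a plain sequence comparison. A minimal sketch, assuming a hypothetical in-memory layout for turns (the real .agentrun.json schema is not shown here and may differ):

```python
# Hypothetical turn layout for illustration; not the actual cassette schema.
TURNS = [
    {"role": "user", "content": "I want a refund for order 123"},
    {"role": "assistant", "tool": "lookup_order", "args": {"order_id": "123"}},
    {"role": "assistant", "tool": "check_eligibility", "args": {"order_id": "123"}},
    {"role": "assistant", "tool": "process_refund",
     "args": {"order_id": "123", "amount": 49.99}},
    {"role": "assistant", "content": "Your refund of $49.99 has been processed."},
]


def tool_sequence(turns):
    """Return the names of the tools called, in call order."""
    return [t["tool"] for t in turns if "tool" in t]


expected = ["lookup_order", "check_eligibility", "process_refund"]
assert tool_sequence(TURNS) == expected
```

Comparing the whole list (rather than checking membership) is what makes call order part of the contract.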
Install
pip install pytest-agentcontract
With auto-recording interceptors:
pip install pytest-agentcontract[openai] # OpenAI SDK
pip install pytest-agentcontract[anthropic] # Anthropic SDK
pip install pytest-agentcontract[all] # Everything
Framework adapters (LangGraph, LlamaIndex, OpenAI Agents SDK) are included -- no extras needed.
Quick Start
1. Write a test
import pytest

@pytest.mark.agentcontract("refund-eligible")
def test_refund_flow(ac_recorder, ac_mode, ac_replay_engine, ac_check_contract):
    if ac_mode == "record":
        # Runs your real agent, records the trajectory
        run_my_agent(ac_recorder)
    elif ac_mode == "replay":
        # Replays from cassette -- no network, no tokens
        result = ac_replay_engine.run()

    # Check the trajectory against the configured assertions and policies
    contract = ac_check_contract(ac_recorder.run)
    assert contract.passed, contract.failures()
2. Record once
pytest --ac-record -k test_refund_flow
# Creates tests/scenarios/refund-eligible.agentrun.json
3. Replay in CI
pytest --ac-replay
# Deterministic. No API keys. No flakes. Sub-second.
SDK Auto-Recording
Intercept real SDK calls instead of manually building turns:
import openai
from agentcontract.recorder.interceptors import patch_openai

def test_with_real_agent(ac_recorder):
    client = openai.OpenAI()
    unpatch = patch_openai(client, ac_recorder)

    # Every chat.completions.create call is recorded automatically
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Refund order 123"}],
        tools=[...],
    )

    # Restore the original client method
    unpatch()
Works with Anthropic too:
from agentcontract.recorder.interceptors import patch_anthropic
unpatch = patch_anthropic(client, ac_recorder)
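To make the patch/unpatch pattern concrete, here is a minimal sketch of how an interceptor of this kind could work in plain Python. It is not the library's implementation: `patch_method`, `FakeClient`, and the `on_call` callback are hypothetical names used only for illustration, and no real SDK is involved.

```python
import functools


def patch_method(obj, name, on_call):
    """Wrap obj.<name> so each call is reported to on_call; return an undo function."""
    original = getattr(obj, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        result = original(*args, **kwargs)
        on_call(kwargs, result)  # hand the request and response to the recorder
        return result

    setattr(obj, name, wrapper)

    def unpatch():
        setattr(obj, name, original)

    return unpatch


class FakeClient:
    """Stand-in for an SDK client; makes no network calls."""

    def create(self, **kwargs):
        return {"echo": kwargs["messages"][-1]["content"]}


calls = []
client = FakeClient()
unpatch = patch_method(client, "create", lambda kw, res: calls.append((kw, res)))
client.create(messages=[{"role": "user", "content": "Refund order 123"}])
unpatch()
client.create(messages=[{"role": "user", "content": "not recorded"}])
```

Only the first call ends up in `calls`; after `unpatch()` the client behaves as if it had never been wrapped.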
Framework Adapters
Drop-in recording for popular agent frameworks:
# LangGraph
from agentcontract.adapters import record_graph
unpatch = record_graph(compiled_graph, recorder)
result = compiled_graph.invoke({"messages": [("user", "I need a refund")]})
unpatch()
# LlamaIndex
from agentcontract.adapters import record_agent
unpatch = record_agent(agent, recorder)
response = agent.chat("What's the refund policy?")
unpatch()
# OpenAI Agents SDK
from agentcontract.adapters import record_runner
unpatch = record_runner(recorder)
result = Runner.run_sync(agent, "Help with billing")
unpatch()
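If the agent raises between the patch call and `unpatch()`, the patch stays in place for later tests. One way to guard against that is a small context manager around any function that returns an unpatch callable. This is a sketch, not part of the package: `recording` and `fake_patch` are hypothetical names, and `fake_patch` stands in for an adapter such as record_graph or record_runner.

```python
from contextlib import contextmanager


@contextmanager
def recording(patch, *args):
    """Call patch(*args), yield, and always run the returned unpatch function."""
    unpatch = patch(*args)
    try:
        yield
    finally:
        unpatch()


# Hypothetical stand-in for a real adapter; it only logs what happened.
events = []


def fake_patch(label):
    events.append(f"patched {label}")
    return lambda: events.append(f"unpatched {label}")


try:
    with recording(fake_patch, "graph"):
        raise RuntimeError("agent crashed mid-run")
except RuntimeError:
    pass

assert events == ["patched graph", "unpatched graph"]
```

With a real adapter the call would look like `with recording(record_graph, compiled_graph, recorder): ...`, assuming the adapter returns an unpatch callable as in the examples above.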
Configuration
agentcontract.yml in your project root:
version: "1"
scenarios:
  include: ["tests/scenarios/**/*.agentrun.json"]
replay:
  stub_tools: true
defaults:
  assertions:
    - type: contains
      target: final_response
      value: "refund"
    - type: called_with
      target: "tool:process_refund"
      schema:
        order_id: "123"
  policies:
    - name: allowed-tools
      type: tool_allowlist
      tools: [lookup_order, check_eligibility, process_refund]
    - name: confirm-before-refund
      type: requires_confirmation
      tools: [process_refund]
Generate a starter config:
agentcontract init
Assertions
| Type | What It Checks |
|---|---|
| exact | Exact string match |
| contains | Substring present |
| regex | Pattern match |
| json_schema | JSON Schema validation on tool args/results |
| not_called | Tool was NOT invoked |
| called_with | Tool called with specific arguments |
| called_count | Exact invocation count |
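As a rough illustration of what the text and call-count checks in the table boil down to, here is a minimal sketch. The function names and the shape of `tool_calls` are assumptions made for this example; the package's assertion engine may evaluate them differently.

```python
import re


def check_text(kind, text, value):
    """Evaluate a text assertion of the given kind against text."""
    if kind == "exact":
        return text == value
    if kind == "contains":
        return value in text
    if kind == "regex":
        return re.search(value, text) is not None
    raise ValueError(f"unknown assertion type: {kind}")


def called_count(tool_calls, name):
    """Count how many recorded calls went to the named tool."""
    return sum(1 for call in tool_calls if call["name"] == name)


final_response = "Your refund of $49.99 has been processed."
tool_calls = [  # hypothetical record layout
    {"name": "lookup_order", "args": {"order_id": "123"}},
    {"name": "process_refund", "args": {"order_id": "123", "amount": 49.99}},
]

assert check_text("contains", final_response, "refund")
assert check_text("regex", final_response, r"\$\d+\.\d{2}")
assert called_count(tool_calls, "process_refund") == 1
assert called_count(tool_calls, "delete_account") == 0  # the not_called case
```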
Policies
| Policy | What It Enforces |
|---|---|
| tool_allowlist | Only listed tools may be called |
| requires_confirmation | Protected tools must follow user confirmation |
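The two policies can be pictured as simple scans over the recorded turns. The sketch below is an assumption-laden illustration, not the package's logic: the turn layout is hypothetical, and in particular it treats a user turn flagged `confirm=True` as the confirmation, which is one possible reading of "must follow user confirmation".

```python
def allowlist_violations(called_tools, allowed):
    """Return names of called tools that are not on the allowlist."""
    return [name for name in called_tools if name not in allowed]


def unconfirmed_calls(turns, protected):
    """Return protected tool calls not preceded by a user confirmation.

    Assumption: a confirmation is a user turn with confirm=True, and it is
    consumed by the next protected call.
    """
    confirmed = False
    violations = []
    for turn in turns:
        if turn["role"] == "user":
            confirmed = turn.get("confirm", False)
        elif turn.get("tool") in protected:
            if not confirmed:
                violations.append(turn["tool"])
            confirmed = False
    return violations


turns = [
    {"role": "user", "content": "Refund order 123"},
    {"role": "assistant", "tool": "lookup_order"},
    {"role": "assistant", "content": "Refund $49.99 to your card?"},
    {"role": "user", "content": "Yes", "confirm": True},
    {"role": "assistant", "tool": "process_refund"},
]

assert unconfirmed_calls(turns, {"process_refund"}) == []
assert allowlist_violations(["lookup_order", "drop_table"], {"lookup_order"}) == ["drop_table"]
```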
Target Syntax
- final_response -- last assistant message
- turn:N -- specific turn by index
- full_conversation -- all turns concatenated
- tool_call:function_name:arguments -- tool call arguments
- tool_call:function_name:result -- tool call result
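A target string selects the part of the trajectory an assertion runs against. The sketch below shows one way such strings could be resolved. `resolve_target` and the turn layout are hypothetical, and details such as which call is picked when a tool is invoked more than once are assumptions (here: the first).

```python
def resolve_target(target, turns):
    """Resolve a target string against a list of turns (hypothetical layout)."""
    if target == "final_response":
        replies = [t["content"] for t in turns
                   if t["role"] == "assistant" and "content" in t]
        return replies[-1]
    if target == "full_conversation":
        return "\n".join(t["content"] for t in turns if "content" in t)
    kind, _, rest = target.partition(":")
    if kind == "turn":
        return turns[int(rest)]
    if kind == "tool_call":
        name, _, field = rest.partition(":")  # field is "arguments" or "result"
        for t in turns:
            if t.get("tool") == name:
                return t[field]
        raise KeyError(f"tool not called: {name}")
    raise ValueError(f"unknown target: {target}")


turns = [
    {"role": "user", "content": "I want a refund for order 123"},
    {"role": "assistant", "tool": "lookup_order",
     "arguments": {"order_id": "123"}, "result": {"status": "delivered"}},
    {"role": "assistant", "content": "Your refund of $49.99 has been processed."},
]

assert resolve_target("final_response", turns).startswith("Your refund")
assert resolve_target("tool_call:lookup_order:arguments", turns) == {"order_id": "123"}
```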
CLI
agentcontract info cassette.agentrun.json # Cassette summary
agentcontract validate cassette.agentrun.json # Structure check
agentcontract init # Starter config
Why Not VCR / pytest-recording?
VCR records HTTP requests. This records agent decisions.
- VCR: "did the HTTP request match?" -- brittle, breaks on any provider API change
- agentcontract: "did the agent call the right tools with the right args?" -- tests actual behavior
Your agent's contract is: given this input, it calls these tools in this order with these arguments. That's what you want to regression-test, not the HTTP layer underneath.
How It Works
┌──────────┐     ┌──────────┐     ┌────────────────┐
│  pytest  │────▶│ Recorder │────▶│ .agentrun.json │
│ --record │     │          │     │   (cassette)   │
└──────────┘     └──────────┘     └───────┬────────┘
                                          │
┌──────────┐     ┌──────────┐             │
│  pytest  │────▶│  Replay  │◀────────────┘
│ --replay │     │  Engine  │
└──────────┘     └────┬─────┘
                      │
                 ┌────▼─────┐
                 │Assertion │──▶ pass / fail
                 │  Engine  │
                 └──────────┘
- Record: Run your agent against real APIs. The recorder captures every turn, tool call, argument, and result into a .agentrun.json cassette.
- Replay: The replay engine feeds recorded tool results back. No network. No tokens. Deterministic.
- Assert: The assertion engine checks contracts -- tool sequences, argument schemas, response content, policies.
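The replay step can be pictured as a stub that hands back recorded tool results in order instead of running the real tools. The sketch below is a simplified illustration under stated assumptions: `ReplayStub` and the record layout are hypothetical, and whether the real replay engine also rejects mismatched arguments this way is not confirmed by the text above.

```python
class ReplayStub:
    """Serve recorded tool results in order instead of calling real tools."""

    def __init__(self, recorded_calls):
        self._pending = list(recorded_calls)

    def call(self, name, **args):
        if not self._pending:
            raise AssertionError(f"unexpected extra call: {name}")
        expected = self._pending.pop(0)
        if (name, args) != (expected["name"], expected["args"]):
            raise AssertionError(
                f"expected {expected['name']}, got {name} with {args}"
            )
        return expected["result"]


recorded = [  # hypothetical cassette contents
    {"name": "lookup_order", "args": {"order_id": "123"},
     "result": {"total": 49.99}},
    {"name": "process_refund", "args": {"order_id": "123", "amount": 49.99},
     "result": "ok"},
]

stub = ReplayStub(recorded)
order = stub.call("lookup_order", order_id="123")
status = stub.call("process_refund", order_id="123", amount=order["total"])
```

Because every result comes from the recording, the run needs no network access and produces the same outcome each time; a call that deviates from the recording fails immediately.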
License
MIT