Deterministic CI tests for LLM agent trajectories — record once, replay offline, assert contracts

Maintainers

miki.ships

pytest-agentcontract

Deterministic CI tests for LLM agent trajectories. Record once, replay offline, assert contracts.

Your agent calls lookup_order, then check_eligibility, then process_refund. Every time. That's the contract. Test it like any other interface.

# Record a trajectory (hits real APIs once)
pytest --ac-record

# Replay in CI forever (no network, no API keys, no cost, deterministic)
pytest --ac-replay

tests/scenarios/refund-eligible.agentrun.json
├── turn 0: user → "I want a refund for order 123"
├── turn 1: assistant → lookup_order(order_id="123")
├── turn 2: assistant → check_eligibility(order_id="123")
├── turn 3: assistant → process_refund(order_id="123", amount=49.99)
└── turn 4: assistant → "Your refund of $49.99 has been processed."
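
At its core, the contract in the tree above is an ordered list of tool calls. As a rough illustration only, the check can be sketched in plain Python. The `Turn` class and `tool_sequence` helper are hypothetical names invented for this sketch; they are not the plugin's cassette schema or API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Turn:
    """One step of a trajectory (hypothetical shape, not the real cassette schema)."""
    role: str
    content: Optional[str] = None
    tool: Optional[str] = None
    arguments: dict = field(default_factory=dict)


def tool_sequence(turns):
    """Return the names of the tools that were called, in call order."""
    return [t.tool for t in turns if t.tool is not None]


trajectory = [
    Turn("user", content="I want a refund for order 123"),
    Turn("assistant", tool="lookup_order", arguments={"order_id": "123"}),
    Turn("assistant", tool="check_eligibility", arguments={"order_id": "123"}),
    Turn("assistant", tool="process_refund", arguments={"order_id": "123", "amount": 49.99}),
    Turn("assistant", content="Your refund of $49.99 has been processed."),
]

# The contract: these tools, in this order.
expected = ["lookup_order", "check_eligibility", "process_refund"]
assert tool_sequence(trajectory) == expected
```

If the agent skips `check_eligibility` or reorders the calls, the comparison fails, which is the kind of regression the plugin is meant to catch.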

Install

pip install pytest-agentcontract

With auto-recording interceptors:

pip install pytest-agentcontract[openai]      # OpenAI SDK
pip install pytest-agentcontract[anthropic]    # Anthropic SDK
pip install pytest-agentcontract[all]          # Everything

Framework adapters (LangGraph, LlamaIndex, OpenAI Agents SDK) are included -- no extras needed.

Quick Start

1. Write a test

import pytest

@pytest.mark.agentcontract("refund-eligible")
def test_refund_flow(ac_recorder, ac_mode, ac_replay_engine, ac_check_contract):
    if ac_mode == "record":
        # Runs your real agent (run_my_agent is your own entry point), records the trajectory
        run_my_agent(ac_recorder)
    elif ac_mode == "replay":
        # Replays from cassette -- no network, no tokens
        result = ac_replay_engine.run()

    # Check the run against the contract and report any failures
    contract = ac_check_contract(ac_recorder.run)
    assert contract.passed, contract.failures()

2. Record once

pytest --ac-record -k test_refund_flow
# Creates tests/scenarios/refund-eligible.agentrun.json

3. Replay in CI

pytest --ac-replay
# Deterministic. No API keys. No flakes. Sub-second.

SDK Auto-Recording

Intercept real SDK calls instead of manually building turns:

import openai

from agentcontract.recorder.interceptors import patch_openai

def test_with_real_agent(ac_recorder):
    client = openai.OpenAI()
    unpatch = patch_openai(client, ac_recorder)
    try:
        # Every chat.completions.create call is recorded automatically
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Refund order 123"}],
            tools=[...],
        )
    finally:
        # Always undo the patch, even if the call raises
        unpatch()

Works with Anthropic too:

import anthropic

from agentcontract.recorder.interceptors import patch_anthropic

client = anthropic.Anthropic()
unpatch = patch_anthropic(client, ac_recorder)

Framework Adapters

Drop-in recording for popular agent frameworks:

# LangGraph
from agentcontract.adapters import record_graph
unpatch = record_graph(compiled_graph, recorder)
result = compiled_graph.invoke({"messages": [("user", "I need a refund")]})
unpatch()

# LlamaIndex
from agentcontract.adapters import record_agent
unpatch = record_agent(agent, recorder)
response = agent.chat("What's the refund policy?")
unpatch()

# OpenAI Agents SDK
from agents import Runner
from agentcontract.adapters import record_runner
unpatch = record_runner(recorder)
result = Runner.run_sync(agent, "Help with billing")
unpatch()

Configuration

agentcontract.yml in your project root:

version: "1"

scenarios:
  include: ["tests/scenarios/**/*.agentrun.json"]

replay:
  stub_tools: true

defaults:
  assertions:
    - type: contains
      target: final_response
      value: "refund"
    - type: called_with
      target: "tool:process_refund"
      schema:
        order_id: "123"

policies:
  - name: allowed-tools
    type: tool_allowlist
    tools: [lookup_order, check_eligibility, process_refund]

  - name: confirm-before-refund
    type: requires_confirmation
    tools: [process_refund]

Generate a starter config:

agentcontract init

Assertions

Type	What It Checks
`exact`	Exact string match
`contains`	Substring present
`regex`	Pattern match
`json_schema`	JSON Schema validation on tool args/results
`not_called`	Tool was NOT invoked
`called_with`	Tool called with specific arguments
`called_count`	Exact invocation count
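
To make the table concrete, here is a minimal sketch of how most of these assertion types could be evaluated. It is not the plugin's implementation: `check_text` and `check_calls` are hypothetical helpers, `json_schema` is left out, and treating `called_with` as a subset match on arguments is an assumption.

```python
import re


def check_text(kind, value, text):
    """Evaluate a text assertion; `kind` mirrors the type names in the table."""
    if kind == "exact":
        return text == value
    if kind == "contains":
        return value in text
    if kind == "regex":
        return re.search(value, text) is not None
    raise ValueError(f"unsupported assertion type: {kind}")


def check_calls(kind, tool, calls, expected=None):
    """Evaluate a tool-call assertion over a list of (name, arguments) pairs."""
    matching = [args for name, args in calls if name == tool]
    if kind == "not_called":
        return not matching
    if kind == "called_count":
        return len(matching) == expected
    if kind == "called_with":
        # Assumption: passes if any call carries every expected key/value pair.
        return any(all(args.get(k) == v for k, v in expected.items()) for args in matching)
    raise ValueError(f"unsupported assertion type: {kind}")


calls = [
    ("lookup_order", {"order_id": "123"}),
    ("process_refund", {"order_id": "123", "amount": 49.99}),
]
final_response = "Your refund of $49.99 has been processed."

assert check_text("contains", "refund", final_response)
assert check_text("regex", r"\$\d+\.\d{2}", final_response)
assert check_calls("called_with", "process_refund", calls, {"order_id": "123"})
assert check_calls("called_count", "lookup_order", calls, 1)
assert check_calls("not_called", "delete_account", calls)
```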

Policies

Policy	What It Enforces
`tool_allowlist`	Only listed tools may be called
`requires_confirmation`	Protected tools must follow user confirmation
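
The two policies can be pictured as scans over the recorded turns. The sketch below is only an illustration under stated assumptions: the turn dictionaries, the helper names, and the `CONFIRM_WORDS` list are hypothetical, and the README does not say what the plugin actually counts as a user confirmation.

```python
CONFIRM_WORDS = ("yes", "confirm", "go ahead")  # hypothetical confirmation markers


def allowlist_violations(turns, allowed):
    """Return tool names that were called but are not in the allowlist."""
    return [t["tool"] for t in turns if t.get("tool") and t["tool"] not in allowed]


def unconfirmed_calls(turns, protected):
    """Return protected tool calls whose latest preceding user turn was not a confirmation."""
    confirmed = False
    violations = []
    for turn in turns:
        if turn["role"] == "user":
            # Naive substring match, good enough for a sketch.
            text = turn.get("content", "").lower()
            confirmed = any(word in text for word in CONFIRM_WORDS)
        elif turn.get("tool") in protected and not confirmed:
            violations.append(turn["tool"])
    return violations


turns = [
    {"role": "user", "content": "I want a refund for order 123"},
    {"role": "assistant", "tool": "lookup_order"},
    {"role": "assistant", "content": "Refund $49.99 to your card?"},
    {"role": "user", "content": "Yes, go ahead"},
    {"role": "assistant", "tool": "process_refund"},
]
no_confirmation = turns[:2] + [{"role": "assistant", "tool": "process_refund"}]

allowed = {"lookup_order", "check_eligibility", "process_refund"}
assert allowlist_violations(turns, allowed) == []
assert unconfirmed_calls(turns, {"process_refund"}) == []
assert unconfirmed_calls(no_confirmation, {"process_refund"}) == ["process_refund"]
```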

Target Syntax

final_response -- last assistant message
turn:N -- specific turn by index
full_conversation -- all turns concatenated
tool_call:function_name:arguments -- tool call arguments
tool_call:function_name:result -- tool call result
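
One way to read the target strings above is as small selectors over a trajectory. The resolver below is a sketch, not the plugin's parser: the turn dictionaries, the `resolve_target` name, the example tool result, and the choice to return the first matching tool call are all assumptions.

```python
def resolve_target(target, turns):
    """Map a target string onto part of a trajectory (hypothetical turn dicts)."""
    if target == "final_response":
        replies = [t["content"] for t in turns if t["role"] == "assistant" and t.get("content")]
        return replies[-1]
    if target == "full_conversation":
        return "\n".join(t["content"] for t in turns if t.get("content"))
    kind, _, rest = target.partition(":")
    if kind == "turn":
        return turns[int(rest)]
    if kind == "tool_call":
        name, _, part = rest.partition(":")  # part is "arguments" or "result"
        for t in turns:
            if t.get("tool") == name:
                return t[part]  # assumption: first matching call wins
    raise KeyError(f"cannot resolve target: {target}")


turns = [
    {"role": "user", "content": "I want a refund for order 123"},
    {
        "role": "assistant",
        "tool": "lookup_order",
        "arguments": {"order_id": "123"},
        "result": {"status": "delivered"},  # made-up result for illustration
    },
    {"role": "assistant", "content": "Your refund of $49.99 has been processed."},
]

assert resolve_target("final_response", turns) == "Your refund of $49.99 has been processed."
assert resolve_target("turn:0", turns)["role"] == "user"
assert resolve_target("tool_call:lookup_order:arguments", turns) == {"order_id": "123"}
assert resolve_target("tool_call:lookup_order:result", turns) == {"status": "delivered"}
```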

CLI

agentcontract info cassette.agentrun.json       # Cassette summary
agentcontract validate cassette.agentrun.json   # Structure check
agentcontract init                               # Starter config

Why Not VCR / pytest-recording?

VCR records HTTP requests. This records agent decisions.

VCR: "did the HTTP request match?" -- brittle, breaks on any provider API change
agentcontract: "did the agent call the right tools with the right args?" -- tests actual behavior

Your agent's contract is: given this input, it calls these tools in this order with these arguments. That's what you want to regression-test, not the HTTP layer underneath.

How It Works

┌─────────┐     ┌──────────┐     ┌───────────────┐
│  pytest  │────▶│ Recorder │────▶│ .agentrun.json│
│ --record │     │          │     │  (cassette)   │
└─────────┘     └──────────┘     └───────┬───────┘
                                         │
┌─────────┐     ┌──────────┐             │
│  pytest  │────▶│  Replay  │◀────────────┘
│ --replay │     │  Engine  │
└─────────┘     └────┬─────┘
                     │
                ┌────▼─────┐
                │Assertion │──▶ pass / fail
                │ Engine   │
                └──────────┘

Record: Run your agent against real APIs. The recorder captures every turn, tool call, argument, and result into a .agentrun.json cassette.
Replay: The replay engine feeds recorded tool results back. No network. No tokens. Deterministic.
Assert: The assertion engine checks contracts -- tool sequences, argument schemas, response content, policies.
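
The record and replay steps can be sketched in a few lines of standard-library Python. This is a toy model under stated assumptions, not the plugin's recorder or replay engine: the `Recorder` and `Replayer` classes, the JSON layout, and the stand-in `lookup_order` tool are all hypothetical.

```python
import json


class Recorder:
    """Wraps real tool functions and logs every call (hypothetical, not the plugin's recorder)."""

    def __init__(self):
        self.calls = []

    def wrap(self, name, fn):
        def wrapped(**kwargs):
            result = fn(**kwargs)
            self.calls.append({"tool": name, "arguments": kwargs, "result": result})
            return result
        return wrapped


class Replayer:
    """Serves recorded results in order instead of calling the real tools."""

    def __init__(self, calls):
        self._pending = list(calls)

    def call(self, name, **kwargs):
        recorded = self._pending.pop(0)
        if (recorded["tool"], recorded["arguments"]) != (name, kwargs):
            raise AssertionError(f"trajectory diverged at {name}({kwargs})")
        return recorded["result"]


# Record: the stand-in "real" tool runs once and its result is captured.
recorder = Recorder()
lookup_order = recorder.wrap("lookup_order", lambda order_id: {"order_id": order_id, "total": 49.99})
lookup_order(order_id="123")
cassette = json.dumps(recorder.calls)  # what would be written to disk

# Replay: no tool code runs; the stored result is returned.
replayer = Replayer(json.loads(cassette))
assert replayer.call("lookup_order", order_id="123") == {"order_id": "123", "total": 49.99}
```

Because replay only reads stored results, the same inputs always produce the same outputs. Here a diverging call fails loudly instead of reaching a live API.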

License

MIT

Release history

0.1.1 (Feb 18, 2026)

0.1.0 (Feb 18, 2026), this version

Download files

Source Distribution

pytest_agentcontract-0.1.0.tar.gz (278.6 kB)

Uploaded Feb 18, 2026 Source

Built Distribution

pytest_agentcontract-0.1.0-py3-none-any.whl (32.2 kB)

Uploaded Feb 18, 2026 Python 3

File details

Details for the file pytest_agentcontract-0.1.0.tar.gz.

File metadata

Download URL: pytest_agentcontract-0.1.0.tar.gz
Upload date: Feb 18, 2026
Size: 278.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pytest_agentcontract-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`346ca7626b6f98ab300b2d87b2b2a53ffb60a92aa9d5adb78910429dd2427a89`
MD5	`d73679c7b967b938b4faf5ea0c32c2ae`
BLAKE2b-256	`7d3d871069cd5853eeabf2f557925b20031dc0aaca500a789d0832da563118ec`

Provenance

The following attestation bundles were made for pytest_agentcontract-0.1.0.tar.gz:

Publisher: publish.yml on mikiships/pytest-agentcontract

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pytest_agentcontract-0.1.0.tar.gz
- Subject digest: 346ca7626b6f98ab300b2d87b2b2a53ffb60a92aa9d5adb78910429dd2427a89
- Sigstore transparency entry: 962732164
- Sigstore integration time: Feb 18, 2026
Source repository:
- Permalink: mikiships/pytest-agentcontract@cc36da0f9aed83176c8dd713c685af0435a0057b
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/mikiships
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@cc36da0f9aed83176c8dd713c685af0435a0057b
- Trigger Event: release

File details

Details for the file pytest_agentcontract-0.1.0-py3-none-any.whl.

File metadata

Download URL: pytest_agentcontract-0.1.0-py3-none-any.whl
Upload date: Feb 18, 2026
Size: 32.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pytest_agentcontract-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b0ab9a95d76a59f6f24af992ec25e9e700ec52dc5bd63ae878d13db92864d399`
MD5	`b746848f27676910ef02b9d7932a02ad`
BLAKE2b-256	`9a9b655c6c0fef77c8b6773cfe8f399af9ae3aab5f667a40ee7d81482265cb98`

Provenance

The following attestation bundles were made for pytest_agentcontract-0.1.0-py3-none-any.whl:

Publisher: publish.yml on mikiships/pytest-agentcontract

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pytest_agentcontract-0.1.0-py3-none-any.whl
- Subject digest: b0ab9a95d76a59f6f24af992ec25e9e700ec52dc5bd63ae878d13db92864d399
- Sigstore transparency entry: 962732172
- Sigstore integration time: Feb 18, 2026
Source repository:
- Permalink: mikiships/pytest-agentcontract@cc36da0f9aed83176c8dd713c685af0435a0057b
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/mikiships
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@cc36da0f9aed83176c8dd713c685af0435a0057b
- Trigger Event: release
