Mimic

The pytest for AI agents. Record, replay, assert, and diff agent behavior.

Mimic is an open-source library that lets you record an AI agent's behavior, replay it deterministically, assert properties about it, and diff runs across versions. It's the missing testing layer for the agent era.

from openai import OpenAI

from mimic import Mimic, assert_that, replay
from mimic.integrations.openai import tracked_completion

mimic = Mimic()
client = OpenAI()  # reads OPENAI_API_KEY from the environment

@mimic.record("customer-support-agent", model="gpt-4o")
def answer(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
    )
    tracked_completion(resp)  # auto-captures tokens + cost
    return resp.choices[0].message.content

Verified performance

Scenario	Record mode	Replay mode	Savings
5-test multi-step agent suite	360 ms	50 ms	7× faster
1000 CI runs of the same suite	~$2 in LLM cost	$0	100%

Run mimic benchmark --runs 1000 on your own recordings to see your numbers.
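
The figures in the table reduce to simple arithmetic. The sketch below recomputes them from the numbers quoted above; the variable names are illustrative and not part of Mimic.

```python
# Figures quoted in the table above.
record_ms = 360          # 5-test suite, record mode
replay_ms = 50           # 5-test suite, replay mode
record_cost_usd = 2.00   # approximate LLM cost of 1000 recorded CI runs
ci_runs = 1000

speedup = record_ms / replay_ms                # 7.2, reported as 7x in the table
cost_per_run_usd = record_cost_usd / ci_runs   # LLM cost of one recorded run
replay_cost_usd = 0.0                          # replay makes no API calls
savings_pct = 100 * (record_cost_usd - replay_cost_usd) / record_cost_usd

print(f"speedup: {speedup:.1f}x")
print(f"cost per recorded run: ${cost_per_run_usd:.3f}")
print(f"savings in replay mode: {savings_pct:.0f}%")
```

Your own ratio depends on how much of a recorded run is spent waiting on the model, so treat these values as an example rather than a guarantee.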

Why Mimic?

Every team building AI agents hits the same wall:

"I changed a prompt. Did I break anything?" — You don't know.
"I switched from GPT-4 to Claude. Is it 2x more expensive?" — You don't know.
"Did this agent ever call delete_file in production?" — You don't know.
"Why did the agent fail on Tuesday at 3pm?" — You don't know.

Mimic turns those unknowns into testable, replayable, diffable artifacts. Think Sentry recordings + pytest assertions + git blame, purpose-built for LLM agents.

Features

✅ Record any callable — sync or async, LLM calls, tool use, multi-step agents
✅ Replay runs offline with zero API cost, byte-for-byte deterministic
✅ Assert behavioral properties: cost, latency, tool usage, output content
✅ Diff two runs to see exactly what changed
✅ Auto-track LLM costs for OpenAI, Anthropic, Gemini (zero-config)
✅ Multi-step agents with per-step recording, cost, and metadata
✅ Privacy mode (capture_args=False, capture_return=False)
✅ Storage-agnostic — filesystem by default, pluggable for S3/Postgres
✅ Zero LLM vendor lock-in — works with any model
✅ Beautiful CLI — mimic run / list / show / diff / report / benchmark
✅ CI-ready — GitHub Actions template + pre-commit hook included
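
To make the privacy-mode flags concrete, here is a minimal, self-contained sketch of how a recording decorator could honor capture_args and capture_return. It is not Mimic's implementation: the record decorator, the RECORDS list, and the "<redacted>" placeholder are all hypothetical, and only the two flag names come from the feature list above.

```python
import functools

RECORDS = []  # hypothetical in-memory sink standing in for a real store

def record(name, capture_args=True, capture_return=True):
    """Toy recorder: logs each call, redacting whatever the flags exclude."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            RECORDS.append({
                "name": name,
                "args": repr((args, kwargs)) if capture_args else "<redacted>",
                "return": repr(result) if capture_return else "<redacted>",
            })
            return result  # the caller always gets the real value
        return wrapper
    return decorator

@record("lookup", capture_args=False, capture_return=False)
def lookup(email):
    return f"account for {email}"

lookup("jane@example.com")
print(RECORDS[-1])
```

With both flags off, the record still proves the call happened, but no input or output data is written to the store.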

Install

pip install mimic-recording

Or with optional integrations:

pip install "mimic-recording[openai]"
pip install "mimic-recording[anthropic]"

Quick start

mkdir my-agent && cd my-agent
mimic init

This creates a project skeleton:

my-agent/
├── mimic.yaml           # Project config
├── tests/
│   └── test_agent.py    # Your recorded tests
└── .mimic/              # Recorded runs (gitignored by default)

Edit tests/test_agent.py:

from mimic import Mimic, assert_that, replay

mimic = Mimic()

@mimic.record("my-agent", model="gpt-4o")
def answer(question: str) -> str:
    # ... your LLM call here ...
    return "..."

def test_agent():
    answer("hello")
    recorded = replay("my-agent")
    assert_that(recorded).finished_without_errors()
    assert_that(recorded).cost_less_than(usd=0.05)
    assert_that(recorded).did_not_call_tool("delete_database")

Run it:

mimic run tests/                 # records + runs (costs $$)
MIMIC_MODE=replay mimic run tests/  # replays only (free, deterministic)
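
How the mode is resolved internally is not documented here. As a rough sketch, a tool like this could read the MIMIC_MODE variable shown above and fall back to recording when it is unset. The current_mode helper, the set of valid modes, and the "record" default are assumptions.

```python
import os

VALID_MODES = {"record", "replay"}  # assumed set of modes

def current_mode(environ=None):
    """Return the run mode from MIMIC_MODE, defaulting to "record"."""
    environ = os.environ if environ is None else environ
    mode = environ.get("MIMIC_MODE", "record").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported MIMIC_MODE: {mode!r}")
    return mode

print(current_mode({"MIMIC_MODE": "replay"}))
print(current_mode({}))
```

Rejecting unknown values loudly is a deliberate choice in this sketch: a typo such as MIMIC_MODE=repaly should fail rather than silently fall back to paid recording.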

Multi-step agents

For ReAct, multi-agent, or any agent with multiple LLM/tool calls, record each step:

from mimic.integrations.openai import tracked_completion

# `llm` and `web_search` stand in for your own async LLM client and search tool.
@mimic.record("research-agent")
async def research(question: str) -> str:
    # Step 1: plan
    with mimic.step("plan", model="gpt-4o-mini") as s:
        resp = await llm.complete(model="gpt-4o-mini", messages=[...])
        tracked_completion(resp)
        s.metadata["plan_steps"] = 3

    # Step 2: search
    with mimic.step("search") as s:
        results = await web_search(question)
        s.metadata["result_count"] = len(results)

    # Step 3: synthesize
    with mimic.step("synthesize", model="gpt-4o") as s:
        resp = await llm.complete(model="gpt-4o", messages=[...])
        tracked_completion(resp)
        # Assumes an OpenAI-style response object, as in the first example.
        summary = resp.choices[0].message.content

    return summary
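
For intuition about what a per-step recorder has to track, the sketch below implements a stand-alone step context manager with contextlib. It only mirrors the shape of mimic.step from the example above; the Step dataclass, the STEPS list, and the duration_ms and error fields are assumptions, not Mimic internals.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Step:
    name: str
    model: Optional[str] = None
    metadata: dict = field(default_factory=dict)
    duration_ms: float = 0.0
    error: Optional[str] = None

STEPS = []  # hypothetical per-run step log

@contextmanager
def step(name, model=None):
    """Record one step: its metadata, duration, and any error raised."""
    s = Step(name=name, model=model)
    start = time.perf_counter()
    try:
        yield s
    except Exception as exc:
        s.error = repr(exc)
        raise  # never swallow the agent's own failure
    finally:
        s.duration_ms = (time.perf_counter() - start) * 1000
        STEPS.append(s)

with step("plan", model="gpt-4o-mini") as s:
    s.metadata["plan_steps"] = 3

with step("search") as s:
    s.metadata["result_count"] = 0

print([(st.name, st.model, st.metadata) for st in STEPS])
```

Appending in the finally clause means a failed step is still logged, which is what makes it possible to inspect the step that broke.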

Assertions

Every assertion returns self, so checks can be chained fluently. The full set:

assert_that(run).finished_without_errors()
assert_that(run).had_error()                   # inverse
assert_that(run).cost_less_than(usd=0.05)
assert_that(run).completed_under(ms=2000)
assert_that(run).output_contains("substring")
assert_that(run).output_matches(r"regex")
assert_that(run).output_equals(value)
assert_that(run).called_tool("search")
assert_that(run).did_not_call_tool("delete_database")
assert_that(run).called_tools(["search", "synthesize"])
assert_that(run).had_exactly(3)
assert_that(run).had_at_least(2)
assert_that(run).used_model("gpt-4o")
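
Fluent chaining works because each method returns self. A minimal sketch of that pattern follows. The RunAssertions class and the dictionary-shaped run (with error, cost_usd, and tools keys) are hypothetical stand-ins; only the method names are borrowed from the list above.

```python
class RunAssertions:
    """Toy fluent asserter over a dict-shaped run (assumed layout)."""

    def __init__(self, run):
        self.run = run

    def finished_without_errors(self):
        if self.run.get("error") is not None:
            raise AssertionError(f"run failed: {self.run['error']}")
        return self  # returning self is what enables chaining

    def cost_less_than(self, usd):
        cost = self.run["cost_usd"]
        if not cost < usd:
            raise AssertionError(f"cost ${cost:.4f} is not below ${usd:.4f}")
        return self

    def did_not_call_tool(self, name):
        if name in self.run["tools"]:
            raise AssertionError(f"forbidden tool called: {name}")
        return self

def assert_that(run):
    return RunAssertions(run)

run = {"error": None, "cost_usd": 0.012, "tools": ["search"]}
(assert_that(run)
    .finished_without_errors()
    .cost_less_than(usd=0.05)
    .did_not_call_tool("delete_database"))
```

The chain stops at the first failing check, so the error message always points at one specific property.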

How it works

Mimic sits outside your agent code, watching the inputs and outputs of any function you decorate. The first time the function runs, Mimic records the full execution into a content-addressable store. Subsequent test runs use the stored record instead of calling the LLM, making them fast, free, and deterministic.

For multi-step agents, Mimic records each step separately, so you can replay just the broken step without re-running the whole agent.

CI integration

Drop the included .github/workflows/test.yml into your repo. It runs your test suite in replay mode (no LLM cost) and validates that no cost was incurred.

Re-recording is a separate job, triggered manually via workflow_dispatch or on a schedule.

The recording format

Mimic recordings are plain JSON conforming to a documented schema — see RECORDING_FORMAT.md. The format is vendor-neutral: you can build readers, web UIs, or analysis tools without depending on the Mimic library.
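
Because recordings are plain JSON, tooling can be built with the standard library alone. As an illustration, the sketch below diffs two recordings with difflib. The field names (model, cost_usd, output) are invented for the example; the real schema is whatever RECORDING_FORMAT.md specifies.

```python
import difflib
import json

def diff_recordings(old, new):
    """Return a unified diff of two recordings as a list of lines."""
    def render(rec):
        # Sorted keys give a stable, line-oriented rendering.
        return json.dumps(rec, indent=2, sort_keys=True).splitlines()
    return list(difflib.unified_diff(
        render(old), render(new), fromfile="old", tofile="new", lineterm=""
    ))

# Hypothetical recordings; in practice these would be loaded with json.load.
old_run = {"model": "gpt-4o", "cost_usd": 0.012, "output": "Refund issued."}
new_run = {"model": "gpt-4o", "cost_usd": 0.019, "output": "Refund denied."}

for line in diff_recordings(old_run, new_run):
    print(line)
```

Serializing with sorted keys before diffing keeps key order from showing up as a spurious change.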

The $100M thesis

Mimic sits at the intersection of three exploding markets:

AI agent development: 10M+ developers will build agents by 2027.
AI observability: already a $2B+ market, led by vendors such as LangSmith, Helicone, and Langfuse.
AI safety & compliance: every enterprise deploying agents needs guardrails, audit trails, and replay.

The land-and-expand model is proven (Sentry, Supabase, GitLab, Vercel, PostHog): open source core → community growth → enterprise tier with self-hosted, SSO, audit logs, and SOC2.

See BUSINESS_PLAN.md for the full strategy.

Roadmap

v0.1 — Record/replay/assert core
v0.2 — Async + multi-step + OpenAI/Anthropic cost tracking
v0.3 — Web UI for browsing recorded runs
v0.4 — TypeScript SDK
v0.5 — Auto-generated regression tests from production traces
v0.6 — Multi-agent parent/child traces
v1.0 — Enterprise self-hosted edition

Contributing

We love contributions. See CONTRIBUTING.md.

License

MIT — see LICENSE.

Project details

Version 1.0.0, released Jun 13, 2026

Download files

Source Distribution

mimic_recording-1.0.0.tar.gz (32.7 kB), uploaded Jun 13, 2026, Source

Built Distribution

mimic_recording-1.0.0-py3-none-any.whl (26.8 kB), uploaded Jun 13, 2026, Python 3

File details

Details for the file mimic_recording-1.0.0.tar.gz.

File metadata

Download URL: mimic_recording-1.0.0.tar.gz
Upload date: Jun 13, 2026
Size: 32.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for mimic_recording-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`bddb9d84f080114a901b71a30fb096183a751ce537069c91d29ad4377827735d`
MD5	`03f435e5901f635404a233fca162dfb4`
BLAKE2b-256	`e34d9afd04b44633f5312a6b829855717d0777fe2d51c6abe51fab5f7bf39b4d`

File details

Details for the file mimic_recording-1.0.0-py3-none-any.whl.

File metadata

Download URL: mimic_recording-1.0.0-py3-none-any.whl
Upload date: Jun 13, 2026
Size: 26.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for mimic_recording-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`89010f1eea99c0491842ee30eb2fb28b9eeabe9d41434a280dd503b300e1c5a4`
MD5	`3b83f65cc7c078d25ec9dbaacae29049`
BLAKE2b-256	`b8081ddad8c4774e84ddd1b00846200d758a7f7a075d51ed154f712773449264`
