Replay LLM API calls in tests. Zero cost. Zero flakes. Like vcr.py but for LLM SDKs.

cuesheet

Record once. Replay forever. Test LLM-calling code without burning the API.

The problem

If you've ever tried to write tests for code that calls an LLM, you know the drill.

Hitting Anthropic or OpenAI in CI is slow (multiple seconds per call), flaky (rate limits, transient errors, sampling drift), and expensive (every PR run bleeds tokens).
Hand-rolled mocks rot the moment the SDK ships a breaking change, and they never quite match what the real API returns.
Existing HTTP fixtures (vcr.py, respx, pytest-vcr) work, but they don't understand LLM payloads, don't replay streamed responses faithfully, and don't scrub API keys for you.

So most teams settle for one of three bad options: skip the tests, mark them slow and skip them in CI, or write a brittle mock and pray.

What cuesheet does

cuesheet is a test-fixture library for any Python LLM SDK that uses httpx under the hood (Anthropic, OpenAI, Mistral, Gemini, Cohere, Groq, DeepSeek, Together, LiteLLM, and anything else built on the standard transport).

You wrap your test in @cuesheet.cassette(...). The first time it runs, cuesheet hits the real API and saves the request/response pair to a YAML file you commit to your repo. Every run after that replays from the file. Same response, byte-for-byte. No network calls. No flakes. No cost.

import cuesheet

@cuesheet.cassette("tests/cassettes/test_summarizer.yaml")
def test_summarizer():
    from anthropic import Anthropic
    client = Anthropic()

    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=200,
        messages=[{"role": "user", "content": "Summarize: ..."}],
    )

    assert "key point" in response.content[0].text

That's the core of the API. One decorator. One YAML file per test. Drop it in.

Features

Sync and async clients, both supported.
Streaming responses recorded as raw SSE chunks and replayed in order at configurable speed.
One cassette can hold multiple interactions across multiple providers.
YAML format chosen for git-friendly diffs during code review.
Auto-scrubs Anthropic and OpenAI keys, JWTs, bearer tokens, and email addresses before writing to disk.
Composable matchers (method, URL, model, messages, tools, temperature, ...) overridable per-cassette or globally.
pytest plugin: zero-config fixture auto-discovers tests/cassettes/<test_name>.yaml.
Local web UI with live updates as tests record (FastAPI, HTMX, SSE).
Strictly typed: mypy --strict clean on the public surface.
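
To picture the streaming bullet above: the following is a minimal sketch, not cuesheet's implementation. `RecordedChunk`, `replay_chunks`, and the `speed` and `sleep` parameters are hypothetical names chosen for illustration, the payloads are made up, and the sketch assumes each chunk is stored together with the delay observed before it arrived.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class RecordedChunk:
    """One raw SSE chunk plus the delay seen before it (hypothetical type)."""
    delay_s: float
    data: bytes


def replay_chunks(
    chunks: Iterable[RecordedChunk],
    speed: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[bytes]:
    """Yield chunks in recorded order, dividing each recorded delay by speed.

    speed=2.0 replays twice as fast; speed=float("inf") skips all waiting.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    for chunk in chunks:
        wait = chunk.delay_s / speed
        if wait > 0:
            sleep(wait)
        yield chunk.data


# Made-up payloads standing in for a recorded stream.
recorded = [
    RecordedChunk(0.00, b"event: message_start\ndata: {}\n\n"),
    RecordedChunk(0.12, b'event: content_block_delta\ndata: {"text": "Hi"}\n\n'),
    RecordedChunk(0.05, b"event: message_stop\ndata: {}\n\n"),
]

# Replay with no waiting, as a fast test run might.
replayed = list(replay_chunks(recorded, speed=float("inf")))
```

Passing `sleep` as a parameter keeps the sketch testable: a test can record the requested waits instead of actually sleeping.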

Install

pip install cuesheet               # SDK + CLI
pip install "cuesheet[web]"        # also installs the web UI
pip install "cuesheet[all]"        # everything

Python 3.10+.

Common patterns

Decorator (simplest)

@cuesheet.cassette("test_x.yaml")
def test_x():
    ...

Context manager

with cuesheet.cassette("my_run.yaml"):
    response = client.messages.create(...)

pytest fixture (zero-config)

def test_my_agent(cuesheet_cassette):
    # auto-uses tests/cassettes/test_my_agent.yaml
    ...
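
The lookup the fixture performs can be pictured as a simple name-to-path mapping. This is a sketch under assumptions: `default_cassette_path` is a hypothetical helper, not part of cuesheet's API, and the `tests/cassettes` root is taken from the comment above.

```python
from pathlib import Path


def default_cassette_path(test_name: str, root: Path = Path("tests/cassettes")) -> Path:
    """Map a test function name to its cassette file (hypothetical helper)."""
    return root / f"{test_name}.yaml"


path = default_cassette_path("test_my_agent")
print(path.as_posix())  # tests/cassettes/test_my_agent.yaml
```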

CI: forbid recording, fail on missing cassettes

@cuesheet.cassette("test_x.yaml", mode="replay_only")
def test_x():
    ...

Or globally:

CUESHEET_DEFAULT_MODE=replay_only pytest

This is the mode you want in CI. It guarantees no test ever silently records a new cassette against the real API on the build server.
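
One way to avoid relying on every CI job remembering the variable is to set it from `conftest.py`. The sketch below is an assumption-laden example: `apply_ci_default` is a hypothetical helper, it assumes your CI service exports `CI=true` (GitHub Actions and several others do), and it assumes cuesheet reads `CUESHEET_DEFAULT_MODE` when tests run rather than earlier.

```python
import os
from typing import MutableMapping


def apply_ci_default(env: MutableMapping[str, str]) -> None:
    """Default to replay_only under CI without overriding an explicit choice."""
    if env.get("CI", "").lower() in {"1", "true"}:
        # setdefault leaves a mode the developer set by hand untouched.
        env.setdefault("CUESHEET_DEFAULT_MODE", "replay_only")


# In a conftest.py this would run at import time, before tests are collected.
apply_ci_default(os.environ)
```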

Recording modes

Mode	Behavior	When to use
`record_new` (default)	Replay if cassette exists; record and save if missing	Local dev
`record_once`	Record only if file empty; never re-record	First-run fixtures
`record_always`	Always hit the real API; overwrite the cassette	Refresh after API changes
`replay_only`	Never call the network; fail if cassette missing	CI
`bypass`	Ignore cassette entirely	Disable in one place
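
Read row by row, the table amounts to a small decision function. The sketch below is one interpretation, not cuesheet's code: `Action` and `plan` are hypothetical names, and a missing cassette and an empty one are treated as the same case. At whole-file granularity the table does not separate `record_new` from `record_once`, so the sketch handles them identically.

```python
from enum import Enum


class Action(Enum):
    REPLAY = "replay"
    RECORD = "record"
    PASSTHROUGH = "passthrough"
    FAIL = "fail"


def plan(mode: str, cassette_exists: bool) -> Action:
    """Decide what to do for one cassette, following the table above."""
    if mode == "bypass":
        return Action.PASSTHROUGH  # cassette ignored entirely
    if mode == "record_always":
        return Action.RECORD  # always hit the API and overwrite
    if mode == "replay_only":
        return Action.REPLAY if cassette_exists else Action.FAIL
    if mode in ("record_new", "record_once"):
        # Replay when a recording is present, record when it is not.
        return Action.REPLAY if cassette_exists else Action.RECORD
    raise ValueError(f"unknown mode: {mode!r}")


print(plan("replay_only", cassette_exists=False).value)  # fail
```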

Matchers

Two requests match if they're identical on:

HTTP method and URL
Model
Messages list (semantic, order-preserving)
Tools schema
Temperature, max_tokens, and other generation params

Override per cassette:

@cuesheet.cassette("x.yaml", match_on=["method", "url", "model", "messages"])
def test_x():
    ...
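
To see what narrowing `match_on` changes, here is a minimal sketch of field-by-field matching. It is not cuesheet's matcher: `requests_match`, `DEFAULT_FIELDS`, and the flattened dict shape are assumptions made for illustration.

```python
from typing import Any, Mapping, Sequence

# Hypothetical flattened request: method and url alongside the body fields.
Request = Mapping[str, Any]

DEFAULT_FIELDS = ("method", "url", "model", "messages", "tools", "temperature", "max_tokens")


def requests_match(a: Request, b: Request, match_on: Sequence[str] = DEFAULT_FIELDS) -> bool:
    """True when both requests agree on every field named in match_on."""
    return all(a.get(field) == b.get(field) for field in match_on)


recorded = {
    "method": "POST",
    "url": "https://api.anthropic.com/v1/messages",
    "model": "claude-sonnet-4-5",
    "messages": [{"role": "user", "content": "Summarize: ..."}],
    "max_tokens": 200,
}
incoming = {**recorded, "max_tokens": 500}

strict = requests_match(recorded, incoming)
relaxed = requests_match(recorded, incoming, match_on=["method", "url", "model", "messages"])
print(strict, relaxed)  # False True
```

With the full field list, a changed `max_tokens` is a miss; with the narrower list, the same recording still serves the request.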

Or write a custom matcher:

@cuesheet.matcher
def ignore_user_id(req_a, req_b):
    # Compare request bodies while ignoring the per-user "user" field.
    a, b = req_a.body.copy(), req_b.body.copy()
    a.pop("user", None)
    b.pop("user", None)
    return a == b

Secret scrubbing

Cassettes get committed to your repo. Anything you didn't redact will end up on GitHub. cuesheet strips API keys, bearer tokens, JWTs, and email addresses before writing a cassette to disk. Built-in patterns:

Anthropic keys (sk-ant-...)
OpenAI keys (sk-..., sk-proj-...)
Generic bearer tokens
JWTs (eyJ... triplets)
Email addresses (common regex)

Add your own:

cuesheet.add_scrubber(r"INTERNAL-[A-Z0-9]{16}")
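
For a feel of what pattern-based scrubbing does to a payload, here is a rough sketch. The regexes are illustrative guesses at the categories listed above, not cuesheet's actual defaults, and `scrub`, `PATTERNS`, and the `[REDACTED]` placeholder are hypothetical.

```python
import re
from typing import Iterable

# Illustrative patterns only; the real defaults may be stricter or broader.
PATTERNS = [
    re.compile(r"\bsk-ant-[A-Za-z0-9_-]+"),  # Anthropic-style keys
    re.compile(r"\bsk-(?:proj-)?[A-Za-z0-9_-]+"),  # OpenAI-style keys
    re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),  # JWT triplets
    re.compile(r"(?<=bearer )[A-Za-z0-9._~+/-]+=*", re.IGNORECASE),  # bearer tokens
    re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),  # email addresses
]


def scrub(text: str, extra: Iterable["re.Pattern[str]"] = ()) -> str:
    """Replace every match of the built-in and extra patterns with a placeholder."""
    for pattern in [*PATTERNS, *extra]:
        text = pattern.sub("[REDACTED]", text)
    return text


sample = (
    "authorization: Bearer sk-ant-api03-abc123\n"
    "contact: dev@example.com\n"
    "id: INTERNAL-ABCDEFGH12345678"
)
cleaned = scrub(sample, extra=[re.compile(r"INTERNAL-[A-Z0-9]{16}")])
print(cleaned)
```

The `extra` argument plays the role that `cuesheet.add_scrubber` plays above: project-specific patterns run after the built-in ones.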

If you find a secret pattern that should be in the default set, please open a PR.

CLI

cuesheet list                              # all cassettes in cwd
cuesheet inspect tests/cassettes/x.yaml    # pretty-print one cassette
cuesheet stats                             # interaction + size totals
cuesheet scrub tests/cassettes/            # re-apply scrubbers in place
cuesheet web                               # open the local web UI

Web UI

cuesheet web                               # opens http://127.0.0.1:8095

Dark plus ochre, mobile-responsive, zero auth. The dashboard watches the filesystem and pushes change events over SSE, so the index and cassette detail pages update in real time as your tests run in another terminal. The pulsing live pill in the header confirms the watcher is connected. No daemon, no persistence; it just renders the files on disk.

Maturity

cuesheet is at v0.2.0. The public API (cuesheet.cassette, cuesheet.matcher, cuesheet.add_scrubber) is stable; internals may shift between 0.x minors. The interception logic hooks at the httpx transport layer, not at the SDK layer, so it's provider-agnostic by construction. Each SDK has quirks though, so if yours misbehaves, please file an issue with a minimal repro.
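
The transport-layer idea can be sketched without any dependency. httpx lets a client be built with a custom transport (a subclass of `httpx.BaseTransport` passed as `transport=`), but how cuesheet installs its hook is not stated here, so the example below uses plain callables instead. `ReplayingTransport`, the tuple-based request and response shapes, and the example URL are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Request = Tuple[str, str, bytes]  # (method, url, body): stand-in for an HTTP request
Response = Tuple[int, bytes]  # (status, body)
Transport = Callable[[Request], Response]


class ReplayingTransport:
    """Wraps an inner transport; records misses and replays hits (conceptual only)."""

    def __init__(self, inner: Transport) -> None:
        self._inner = inner
        self._cassette: Dict[Request, Response] = {}

    def __call__(self, request: Request) -> Response:
        if request in self._cassette:
            return self._cassette[request]  # replay: no network
        response = self._inner(request)  # first run: use the real transport
        self._cassette[request] = response  # record for next time
        return response


calls: List[Request] = []


def fake_network(request: Request) -> Response:
    """Stands in for the real network and counts how often it is reached."""
    calls.append(request)
    return 200, b'{"ok": true}'


transport = ReplayingTransport(fake_network)
req = ("POST", "https://api.example.com/v1/messages", b'{"model": "m"}')
first = transport(req)
second = transport(req)
print(first == second, len(calls))  # True 1
```

Because the wrapper only sees method, URL, and bytes, nothing in it depends on which SDK built the request, which is the sense in which this layer is provider-agnostic.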

Supported providers

Any Python SDK that calls an LLM provider over httpx works. The providers below are explicitly detected and tagged in the web UI:

Anthropic
OpenAI
Google (Gemini)
Mistral
Cohere
Groq
DeepSeek
Together
LiteLLM (passes through to the underlying provider URL)

If your provider isn't in the list, cuesheet still records and replays it; you just won't get the coloured provider pill in the UI.

Comparison

	vcr.py	pytest-vcr	RESPX	cuesheet
HTTP-level	✅	✅	✅	✅
LLM-payload aware	❌	❌	❌	✅
Streaming response replay	⚠️ partial	⚠️ partial	❌	✅
Provider-agnostic	✅	✅	✅	✅
Auto API-key scrubbing	⚠️ manual	⚠️ manual	❌	✅
pytest plugin	⚠️ manual	✅	❌	✅
Web UI with live updates	❌	❌	❌	✅

License

MIT. Built by George Moustakas in Greece.

Release history

0.2.0 (this version): May 21, 2026
0.1.0: May 21, 2026

Download files

Source Distribution

cuesheet-0.2.0.tar.gz (55.8 kB)

Uploaded May 21, 2026 Source

Built Distribution

cuesheet-0.2.0-py3-none-any.whl (51.9 kB)

Uploaded May 21, 2026 Python 3

File details

Details for the file cuesheet-0.2.0.tar.gz.

File metadata

Download URL: cuesheet-0.2.0.tar.gz
Upload date: May 21, 2026
Size: 55.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for cuesheet-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`e1c9f93e650790235a34522d28bbc66637a16c8b05f2e0bc0884da469eec3815`
MD5	`9a29d4fa943fc7aa9d85819cf5aa873f`
BLAKE2b-256	`c249887cba2edb0e4cb9b2a16954cbe2f8898a3f1dfc2e1cc29a1efd8d57bc45`

File details

Details for the file cuesheet-0.2.0-py3-none-any.whl.

File metadata

Download URL: cuesheet-0.2.0-py3-none-any.whl
Upload date: May 21, 2026
Size: 51.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for cuesheet-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9e05a79433dcb03ee34ebdc88a5ed457ff43fd532e8c5f8798fef04254517724`
MD5	`6f91a4f0d5c919c53a9a599ef67454ae`
BLAKE2b-256	`dd5b197ad5bf87c46454125688c64b098fe902dd46078723cca241fb78861ff7`
