Replay LLM API calls in tests. Zero cost. Zero flakes. Like vcr.py but for LLM SDKs.
The problem
If you've ever tried to write tests for code that calls an LLM, you know the drill.
- Hitting Anthropic or OpenAI in CI is slow (multi-second per call), flaky (rate limits, transient errors, sampling drift), and expensive (every PR run bleeds tokens).
- Hand-rolled mocks rot the moment the SDK ships a breaking change, and they never quite match what the real API returns.
- Existing HTTP fixtures (vcr.py, respx, pytest-vcr) work, but they don't understand LLM payloads, don't replay streamed responses faithfully, and don't scrub API keys for you.
So most teams settle for one of three bad options: skip the tests, mark them slow and skip them in CI, or write a brittle mock and pray.
What cuesheet does
cuesheet is a test-fixture library for any Python LLM SDK that uses httpx under the hood (Anthropic, OpenAI, Mistral, Gemini, Cohere, Groq, DeepSeek, Together, LiteLLM, and anything else built on the standard transport).
You wrap your test in @cuesheet.cassette(...). The first time it runs, cuesheet hits the real API and saves the request/response pair to a YAML file you commit to your repo. Every run after that replays from the file. Same response, byte-for-byte. No network calls. No flakes. No cost.
import cuesheet

@cuesheet.cassette("tests/cassettes/test_summarizer.yaml")
def test_summarizer():
    from anthropic import Anthropic
    client = Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=200,
        messages=[{"role": "user", "content": "Summarize: ..."}],
    )
    assert "key point" in response.content[0].text
That's the whole API. One decorator. One YAML file per test. Drop it in.
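To make the record-then-replay cycle concrete, here is a minimal sketch of the idea using only the standard library. It is not cuesheet's implementation: it stores JSON rather than YAML, matches on exact request equality, and `replay_or_record` and `fake_send` are hypothetical names standing in for the library internals and the real API.

```python
import json
import tempfile
from pathlib import Path


def replay_or_record(cassette: Path, request: dict, send) -> dict:
    """Return the stored response for `request`, recording it on first use."""
    interactions = json.loads(cassette.read_text()) if cassette.exists() else []
    for item in interactions:
        if item["request"] == request:
            return item["response"]  # replay: `send` is never called
    response = send(request)  # record: the only "network" call
    interactions.append({"request": request, "response": response})
    cassette.write_text(json.dumps(interactions, indent=2))
    return response


calls = []


def fake_send(request: dict) -> dict:
    """Stand-in for the real API; counts how often it is reached."""
    calls.append(request)
    return {"text": f"summary #{len(calls)}"}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test_summarizer.json"
    req = {
        "model": "claude-sonnet-4-5",
        "messages": [{"role": "user", "content": "Summarize: ..."}],
    }
    first = replay_or_record(path, req, fake_send)   # records
    second = replay_or_record(path, req, fake_send)  # replays from the file
```

The second call returns the stored response without reaching `fake_send`, which is the property that makes replayed tests deterministic and free.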
Features
- Sync and async clients, both supported.
- Streaming responses recorded as raw SSE chunks and replayed in order at configurable speed.
- One cassette can hold multiple interactions across multiple providers.
- YAML format chosen for git-friendly diffs during code review.
- Auto-scrubs Anthropic and OpenAI keys, JWTs, bearer tokens, and common email regexes before writing to disk.
- Composable matchers (method, URL, model, messages, tools, temperature, ...) overridable per-cassette or globally.
- pytest plugin: zero-config fixture auto-discovers tests/cassettes/<test_name>.yaml.
- Local web UI with live updates as tests record (FastAPI, HTMX, SSE).
- Strictly typed: mypy --strict clean on the public surface.
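The streaming feature above (raw SSE chunks replayed in order at configurable speed) can be pictured roughly as follows. This is a sketch under assumptions, not the library's code: it supposes each recorded chunk is stored with the delay that preceded it, and `replay_chunks`, its `speed` parameter, and the injectable `sleep` are hypothetical.

```python
import time
from typing import Callable, Iterable, Iterator


def replay_chunks(
    recorded: Iterable[tuple[float, bytes]],
    speed: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[bytes]:
    """Yield recorded chunks in their original order.

    Before each chunk, wait its recorded delay divided by `speed`,
    so speed=2.0 replays twice as fast as the original stream.
    """
    for delay, chunk in recorded:
        sleep(delay / speed)
        yield chunk


# Recorded (delay_in_seconds, raw_chunk) pairs; the payloads are made up.
recorded = [
    (0.0, b'data: {"text": "Hel"}\n\n'),
    (0.2, b'data: {"text": "lo"}\n\n'),
    (0.1, b"data: [DONE]\n\n"),
]

waits = []  # capture the waits instead of actually sleeping
chunks = list(replay_chunks(recorded, speed=2.0, sleep=waits.append))
```

Passing a fake `sleep` keeps the example instant while still showing how the delays scale with `speed`.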
Install
pip install cuesheet # SDK + CLI
pip install "cuesheet[web]" # also installs the web UI
pip install "cuesheet[all]" # everything
Python 3.10+.
Common patterns
Decorator (simplest)
@cuesheet.cassette("test_x.yaml")
def test_x():
    ...
Context manager
with cuesheet.cassette("my_run.yaml"):
    response = client.messages.create(...)
pytest fixture (zero-config)
def test_my_agent(cuesheet_cassette):
    # auto-uses tests/cassettes/test_my_agent.yaml
    ...
CI: forbid recording, fail on missing cassettes
@cuesheet.cassette("test_x.yaml", mode="replay_only")
def test_x():
    ...
Or globally:
CUESHEET_DEFAULT_MODE=replay_only pytest
This is the mode you want in CI. It guarantees no test ever silently records a new cassette against the real API on the build server.
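One way to get this behavior without editing every CI job is to set the variable from conftest.py. The sketch below is an assumption-laden example: `pick_mode` is a hypothetical helper, it relies on the CI system exporting a `CI` variable (GitHub Actions does), and it assumes cuesheet reads CUESHEET_DEFAULT_MODE when a cassette is opened rather than at import time.

```python
import os


def pick_mode(environ) -> str:
    """Honor an explicit CUESHEET_DEFAULT_MODE; otherwise replay_only on CI."""
    explicit = environ.get("CUESHEET_DEFAULT_MODE")
    if explicit:
        return explicit
    # Many CI systems export CI=true; locally the variable is usually unset.
    return "replay_only" if environ.get("CI") else "record_new"


# In conftest.py, before any cassette is opened:
os.environ["CUESHEET_DEFAULT_MODE"] = pick_mode(os.environ)
```

An explicit value still wins, so a developer can run `CUESHEET_DEFAULT_MODE=record_always pytest` locally to refresh cassettes.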
Recording modes
| Mode | Behavior | When to use |
|---|---|---|
| record_new (default) | Replay if cassette exists; record and save if missing | Local dev |
| record_once | Record only if file empty; never re-record | First-run fixtures |
| record_always | Always hit the real API; overwrite the cassette | Refresh after API changes |
| replay_only | Never call the network; fail if cassette missing | CI |
| bypass | Ignore cassette entirely | Disable in one place |
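The table can be read as a small decision function. The sketch below is one interpretation, not the library's logic: `decide` is a hypothetical name, and because the table does not spell out how record_new and record_once differ once a cassette already holds data, the sketch treats them alike at the file level.

```python
def decide(mode: str, cassette_has_data: bool) -> str:
    """Map a recording mode and cassette state to an action.

    Returns one of: "replay", "record", "fail", "passthrough".
    """
    if mode == "bypass":
        return "passthrough"  # ignore the cassette entirely
    if mode == "record_always":
        return "record"  # always hit the real API and overwrite
    if mode == "replay_only":
        return "replay" if cassette_has_data else "fail"
    if mode in ("record_new", "record_once"):
        return "replay" if cassette_has_data else "record"
    raise ValueError(f"unknown mode: {mode!r}")


actions = {
    (mode, has_data): decide(mode, has_data)
    for mode in ("record_new", "record_once", "record_always", "replay_only", "bypass")
    for has_data in (False, True)
}
```

The only combination that fails is replay_only with a missing cassette, which is exactly the failure you want to see in CI.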
Matchers
Two requests match if they're identical on:
- HTTP method and URL
- Model
- Messages list (semantic, order-preserving)
- Tools schema
- Temperature, max_tokens, and other generation params
Override per cassette:
@cuesheet.cassette("x.yaml", match_on=["method", "url", "model", "messages"])
def test_x():
    ...
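To see what narrowing match_on changes, here is a rough sketch of field-based matching. It assumes requests can be flattened to dicts keyed by the matcher names listed above; `requests_match` and `DEFAULT_FIELDS` are hypothetical and do not describe how cuesheet represents requests internally.

```python
DEFAULT_FIELDS = (
    "method", "url", "model", "messages", "tools", "temperature", "max_tokens",
)


def requests_match(a: dict, b: dict, match_on=DEFAULT_FIELDS) -> bool:
    """Two requests match when they agree on every field in `match_on`."""
    return all(a.get(field) == b.get(field) for field in match_on)


recorded = {
    "method": "POST",
    "url": "https://api.example.com/v1/messages",  # placeholder URL
    "model": "claude-sonnet-4-5",
    "messages": [{"role": "user", "content": "Summarize: ..."}],
    "temperature": 0.0,
}
incoming = {**recorded, "temperature": 0.7}  # only the temperature differs

strict = requests_match(recorded, incoming)
relaxed = requests_match(
    recorded, incoming, match_on=["method", "url", "model", "messages"]
)
```

With the default fields the changed temperature forces a mismatch; with the narrower list the recorded response would be reused.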
Or write a custom matcher:
@cuesheet.matcher
def ignore_user_id(req_a, req_b):
    a, b = req_a.body.copy(), req_b.body.copy()
    a.pop("user", None)
    b.pop("user", None)
    return a == b
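Because a matcher is a plain function of two requests, its logic can be checked without cuesheet installed. The sketch below repeats the matcher above without the decorator and feeds it SimpleNamespace stand-ins; it assumes, as the example implies, that a request exposes `.body` as a dict.

```python
from types import SimpleNamespace


def ignore_user_id(req_a, req_b):
    # Compare copies so the original request bodies are left untouched.
    a, b = req_a.body.copy(), req_b.body.copy()
    a.pop("user", None)
    b.pop("user", None)
    return a == b


# Stand-in request objects; real ones come from cuesheet.
alice = SimpleNamespace(body={"model": "claude-sonnet-4-5", "user": "alice"})
bob = SimpleNamespace(body={"model": "claude-sonnet-4-5", "user": "bob"})
other = SimpleNamespace(body={"model": "another-model", "user": "alice"})

same_apart_from_user = ignore_user_id(alice, bob)
different_model = ignore_user_id(alice, other)
```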
Secret scrubbing
Cassettes get committed to your repo. Anything you didn't redact will end up on GitHub. cuesheet strips API keys, JWTs, and emails before write. Built-in patterns:
- Anthropic keys (sk-ant-...)
- OpenAI keys (sk-..., sk-proj-...)
- Generic bearer tokens
- JWTs (eyJ... triplets)
- Common email regex
Add your own:
cuesheet.add_scrubber(r"INTERNAL-[A-Z0-9]{16}")
If you find a secret pattern that should be in the default set, please open a PR.
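For a feel of what regex-based scrubbing does to a cassette, here is a standalone sketch. The patterns and placeholder strings are illustrative guesses, not cuesheet's built-in set, and `scrub` is a hypothetical helper; only the INTERNAL-... pattern comes from the example above.

```python
import re

# (pattern, replacement) pairs, applied in order. More specific key
# formats come first so the generic ones do not swallow them.
PATTERNS = [
    (re.compile(r"\bsk-ant-[A-Za-z0-9_-]+"), "<ANTHROPIC_KEY>"),
    (re.compile(r"\bsk-[A-Za-z0-9_-]{8,}"), "<OPENAI_KEY>"),
    (re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"), "<JWT>"),
    (re.compile(r"bearer\s+[A-Za-z0-9._~+/-]+=*", re.IGNORECASE), "Bearer <TOKEN>"),
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "<EMAIL>"),
    (re.compile(r"INTERNAL-[A-Z0-9]{16}"), "<INTERNAL>"),
]


def scrub(text: str) -> str:
    """Replace every match of every pattern with its placeholder."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text


raw = (
    "x-api-key: sk-ant-abc123DEF456\n"
    "Authorization: Bearer opaque.token-value\n"
    "contact: dev@example.com\n"
    "ref: INTERNAL-ABCDEFGH12345678\n"
)
cleaned = scrub(raw)
```

Running `cuesheet scrub` over existing cassettes after adding a pattern re-applies the scrubbers to files recorded before the pattern existed.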
CLI
cuesheet list # all cassettes in cwd
cuesheet inspect tests/cassettes/x.yaml # pretty-print one cassette
cuesheet stats # interaction + size totals
cuesheet scrub tests/cassettes/ # re-apply scrubbers in place
cuesheet web # open the local web UI
Web UI
cuesheet web # opens http://127.0.0.1:8095
Dark plus ochre, mobile-responsive, zero auth. The dashboard watches the filesystem and pushes change events over SSE, so the index and cassette detail pages update in real time as your tests run in another terminal. The pulsing live pill in the header confirms the watcher is connected. No daemon, no persistence; it just renders the files on disk.
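For readers unfamiliar with SSE, a change event on the wire is just a few lines of text. The sketch below formats one according to the server-sent events wire format; the event name `cassette_changed` and the payload shape are hypothetical, since the dashboard's actual event schema is not documented here.

```python
import json


def format_sse(event: str, payload: dict) -> str:
    """Serialize one server-sent event: an event line, a data line, a blank line."""
    data = json.dumps(payload, sort_keys=True)  # single line, so one data: field
    return f"event: {event}\ndata: {data}\n\n"


message = format_sse("cassette_changed", {"path": "tests/cassettes/x.yaml"})
```

The trailing blank line is what tells the browser the event is complete and ready to dispatch.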
Maturity
cuesheet is at v0.1.0. The public API (cuesheet.cassette, cuesheet.matcher, cuesheet.add_scrubber) is stable; internals may shift between 0.x minors. The interception logic hooks at the httpx transport layer, not at the SDK layer, so it's provider-agnostic by construction. Each SDK has quirks though, so if yours misbehaves, please file an issue with a minimal repro.
Supported providers
Any Python SDK that calls an LLM provider over httpx works. The providers below are explicitly detected and tagged in the web UI:
- Anthropic
- OpenAI
- Google (Gemini)
- Mistral
- Cohere
- Groq
- DeepSeek
- Together
- LiteLLM (passes through to the underlying provider URL)
If your provider isn't in the list, cuesheet still records and replays it; you just won't get the coloured provider pill in the UI.
Comparison
| | vcr.py | pytest-vcr | RESPX | cuesheet |
|---|---|---|---|---|
| HTTP-level | ✅ | ✅ | ✅ | ✅ |
| LLM-payload aware | ❌ | ❌ | ❌ | ✅ |
| Streaming response replay | ⚠️ partial | ⚠️ partial | ❌ | ✅ |
| Provider-agnostic | ✅ | ✅ | ✅ | ✅ |
| Auto API-key scrubbing | ⚠️ manual | ⚠️ manual | ❌ | ✅ |
| pytest plugin | ⚠️ manual | ✅ | ❌ | ✅ |
| Web UI with live updates | ❌ | ❌ | ❌ | ✅ |
License
MIT. Built by George Moustakas in Greece.