Test: record an agent run once, replay it forever — deterministic, offline, free. The vcrpy of the agent era.

cendor-cassette

Record an agent run once; replay it forever — deterministic, offline, and free. Unlike vcrpy (HTTP-only), it captures the whole run: every LLM call and tool call, in order.

Agent tests that run in 0.2s with no API key.

pip install cendor-cassette

from openai import OpenAI
from cendor.core import instrument
from cendor import cassette

client = instrument(OpenAI())          # the same instrumented seam used in production

@cassette.use("triage_happy_path.json")   # record first run, replay after (auto mode)
def test_triage():
    # my_agent is your own agent object; it is not part of cendor-cassette
    result = my_agent.run("My card was charged twice")
    assert "refund" in result.tools_called
    assert cassette.semantic_match(result.answer, "offers a refund")
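
To make the record-then-replay idea concrete, here is a toy sketch in plain Python. It is not how cendor-cassette is implemented: the helper `record_or_replay`, the JSON layout, and the `fake_llm` stand-in are all hypothetical, and the real library keys calls by request content and order rather than by a hand-written name.

```python
import json
import tempfile
from pathlib import Path


def record_or_replay(path, key, live_call):
    """Return the recorded value for `key`, or run `live_call` once and record it."""
    cassette_file = Path(path)
    tape = json.loads(cassette_file.read_text()) if cassette_file.exists() else {}
    if key in tape:
        return tape[key]              # replay: the live call is never made
    value = live_call()               # record: run the real call exactly once
    tape[key] = value
    cassette_file.write_text(json.dumps(tape, indent=2))
    return value


calls = []                            # tracks how often the "live" call runs


def fake_llm():
    # Hypothetical stand-in for a paid, non-deterministic model call.
    calls.append("live")
    return "We will issue a refund."


tape_path = Path(tempfile.mkdtemp()) / "demo.json"
first = record_or_replay(tape_path, "triage", fake_llm)    # records
second = record_or_replay(tape_path, "triage", fake_llm)   # replays from disk
```

The second call returns the stored answer without touching `fake_llm`, which is why a replayed test needs no API key or network.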

Highlights

  • Whole-run capture — every LLM and tool call, in order (not just HTTP, like vcrpy).
  • Four modes: auto (record then replay) · record · replay (fail on an unrecorded call) · rerecord (run live, report drift() without overwriting the committed cassette).
  • Decorator or context manager: @cassette.use("run.json") / with cassette.using(...) (handy in pytest fixtures).
  • Meaning-based assertions: semantic_match(actual, expected) (offline lexical default; opt into a free offline local-embedding scorer, a BYO-provider embedder, or an LLM judge). semantic_drift() filters rerecord noise down to real regressions.
  • Pluggable matching + redaction — a normalizer ignores volatile fields; secrets/PII redacted on write, but matching hashes the un-redacted request so redaction never collapses two distinct calls (redact=True|False|callable).
  • Parallel-safe — recording is scoped to the active using()/use() context (a ContextVar), so concurrent blocks never capture each other's calls; cassettes are written atomically. Under pytest-xdist, give each worker its own cassette path (e.g. suffix with PYTEST_XDIST_WORKER) so workers don't race on one file.
  • Faithful replay — dict-response providers (Ollama/Bedrock) replay as dicts and SDK-object providers as attribute objects; stream=True and stream=False calls match their own recordings (cassette format v2; committed v1 cassettes still replay).
  • promote() turns a production JSONL trace into a replayable regression test (LLM and tool calls).
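
The pytest-xdist advice above (one cassette path per worker) could be handled with a small helper like the one below. This is a sketch, not part of cendor-cassette: `worker_cassette_path` is a hypothetical name, and the only external fact it relies on is that pytest-xdist sets the `PYTEST_XDIST_WORKER` environment variable (for example `gw0`) in each worker process.

```python
import os
from pathlib import Path


def worker_cassette_path(base, environ=None):
    """Insert the xdist worker id before the file suffix, if a worker id is set."""
    env = os.environ if environ is None else environ
    worker = env.get("PYTEST_XDIST_WORKER")   # e.g. "gw0"; unset without xdist
    path = Path(base)
    if not worker:
        return str(path)                       # single-process run: path unchanged
    return str(path.with_name(f"{path.stem}.{worker}{path.suffix}"))


# Simulated environments, so the example runs without pytest-xdist installed.
in_worker = worker_cassette_path("cassettes/run.json", {"PYTEST_XDIST_WORKER": "gw1"})
no_worker = worker_cassette_path("cassettes/run.json", {})
```

The result would then be passed wherever a cassette path is expected, for example to cassette.use(...). The trade-off is that each worker records and keeps its own cassette file.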

Semantic matching (opt-in)

semantic_match defaults to lexical_score — offline, deterministic, zero-dependency. For meaning-aware (negation-sensitive) checks, pass a scorer into the existing hook. cassette binds no model and adds no dependency unless you ask for one. Four tiers, hermetic-and-free → meaning-aware-but-costly:

  1. Lexical (default) — lexical_score. Hermetic, deterministic, free, zero-dep.
  2. Local embeddings (recommended) — local_embedding_scorer(), free/offline/deterministic via model2vec static embeddings (numpy-only, no torch, ~8–30 MB). Behind pip install 'cendor-cassette[embeddings]'.
  3. BYO provider embeddings: embedding_scorer(embed_fn) wraps any provider (OpenAI text-embedding-3-small/large, Google gemini-embedding, Cohere embed-v3; Anthropic has no embeddings API → use Voyage). Non-hermetic: a cloud embedder calls the network at score time. openai_embedding_scorer(client, model="text-embedding-3-small") is a thin convenience over an already-built OpenAI-shaped client.
  4. LLM-judge — a scorer that calls your own instrumented client (a documented recipe, never a shipped dependency). Non-hermetic, non-deterministic, costs money.

from cendor import cassette

score = cassette.local_embedding_scorer()                 # free, offline, deterministic
assert cassette.semantic_match(result.answer, "offers a refund", scorer=score)
assert not cassette.semantic_match("we will not offer a refund", "offers a refund", scorer=score)
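
For the bring-your-own-embedder tier, the general pattern of turning an embedding function into a scorer can be sketched as below. Everything here is an assumption for illustration: `cosine_scorer` is a hypothetical helper, the `scorer(actual, expected) -> float` shape is guessed from how `scorer=` is passed above, and whether embedding_scorer uses cosine similarity internally is not stated in this README. The toy embedder only counts a few fixed words, so unlike a real embedding model it cannot detect negation.

```python
import math
from collections import Counter


def cosine_scorer(embed_fn):
    """Build scorer(actual, expected) -> cosine similarity of the two embeddings."""
    def scorer(actual, expected):
        a, b = embed_fn(actual), embed_fn(expected)
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0   # zero vector: treat as no similarity
    return scorer


VOCAB = ["refund", "offer", "charge", "card", "sorry"]


def toy_embed(text):
    # Hypothetical stand-in for a provider embedder: counts of a few fixed words.
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in VOCAB]


score = cosine_scorer(toy_embed)
same_meaning = score("offer refund", "refund offer")    # identical word counts
unrelated = score("card charge", "offer refund")        # no shared words
```

With a real embedder, `embed_fn` would call the provider's API, which is what makes this tier non-hermetic.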

drift() stays byte-exact; at temperature > 0 it flags every run. semantic_drift(threshold=0.8, scorer=None) re-scores each divergence's recorded-vs-live text and keeps only those below the threshold (real regressions, with a score), so cosmetic rewording is ignored. The alternative for byte-stable drift: record/replay at temperature=0.
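
The threshold filtering described above can be pictured with a short sketch. This is not the library's code: `filter_drift` and `similarity` are hypothetical names, the divergences are made-up (recorded, live) pairs, and `difflib.SequenceMatcher` is only a stand-in for whichever scorer you would actually pass.

```python
from difflib import SequenceMatcher


def similarity(recorded, live):
    # Stand-in scorer: character-level similarity ratio in [0, 1].
    return SequenceMatcher(None, recorded, live).ratio()


def filter_drift(divergences, threshold=0.8, scorer=similarity):
    """Keep only divergences whose recorded-vs-live score falls below the threshold."""
    regressions = []
    for recorded, live in divergences:
        score = scorer(recorded, live)
        if score < threshold:
            regressions.append((recorded, live, score))
    return regressions


# Made-up data: one cosmetic change, one real change in meaning.
divergences = [
    ("We will refund the duplicate charge.", "We will refund the duplicate charge!"),
    ("We will refund the duplicate charge.", "Please contact your bank."),
]
regressions = filter_drift(divergences)
```

A byte-exact comparison would flag both pairs; the filtered list keeps only the second one, along with its score.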

Wrap-around, test-time only — records via core's bus, replays via a core interceptor; no second patch, no network.

See docs/cassette.md · CHANGELOG. Part of the Cendor stack — github.com/cendorhq/Cendor. Powered by PowerAI Labs. Apache-2.0; provided "as is", without warranty — use at your own risk (LICENSE §7–8).

Download files

Source Distribution

cendor_cassette-1.0.0.tar.gz (22.1 kB)

Uploaded Source

Built Distribution

cendor_cassette-1.0.0-py3-none-any.whl (17.2 kB)

Uploaded Python 3

File details

Details for the file cendor_cassette-1.0.0.tar.gz.

File metadata

  • Download URL: cendor_cassette-1.0.0.tar.gz
  • Size: 22.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cendor_cassette-1.0.0.tar.gz
Algorithm Hash digest
SHA256 4fc40ce51168614ee8ae85b8a84a247166f3b1872b9649991554c11cd8ec07bb
MD5 5e6c2f6fbbdb4a1317c4d1e2418a240f
BLAKE2b-256 977972a03198af86211d64b8390069d610058602c184af877c9b8effa2c70715

Provenance

The following attestation bundles were made for cendor_cassette-1.0.0.tar.gz:

Publisher: release.yml on cendorhq/Cendor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cendor_cassette-1.0.0-py3-none-any.whl.

File hashes

Hashes for cendor_cassette-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d012ed1f02c2798722b0041500fea2049911650bef9fbda74fb7bd3962aef9ba
MD5 f66750ea845c08f49725f033e5a07283
BLAKE2b-256 01d8e5785d7adb64e270509357fa6a7106bded0dc18de85849973b3c5aeafc34

Provenance

The following attestation bundles were made for cendor_cassette-1.0.0-py3-none-any.whl:

Publisher: release.yml on cendorhq/Cendor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
