Local-first SDK and CLI for RAG and agent reliability tracing, citation checks, and failure diagnosis.

ContextTrace

Debug RAG failures before users find them.

ContextTrace is a local-first Python SDK and CLI for evaluating existing RAG and AI agent systems. It records retrieved chunks, selected context, answer claims, citations, token usage, latency, and agent events, then writes local traces and HTML reports without requiring a hosted dashboard.

Install

pip install contexttrace
contexttrace --version
contexttrace init

Optional integrations:

pip install "contexttrace[langchain]"
pip install "contexttrace[llamaindex]"
pip install "contexttrace[fastapi]"
pip install "contexttrace[langgraph]"
pip install "contexttrace[otel]"
pip install "contexttrace[all]"

Quickstart

contexttrace init
contexttrace demo --dataset refund_policy
contexttrace report --last
contexttrace doctor

By default, traces are stored locally in:

.contexttrace/contexttrace.db

SDK Example

from contexttrace import ContextTrace

# `retriever` and `llm` stand in for your existing RAG components.
ct = ContextTrace(project="support-rag")
query = "What is the refund policy?"

with ct.trace(query=query) as trace:
    # Log everything retrieved, then the subset passed to the model.
    chunks = retriever.search(query)
    trace.log_retrieval(chunks)
    trace.log_context(chunks[:5])

    # Log the generated answer with token usage, then its citations.
    answer = llm.generate(query, chunks[:5])
    trace.log_answer(answer, usage={"total_tokens": 1200})
    trace.log_citations([
        {"claim": "Refunds are available within 30 days.", "source_chunk_id": "chunk_12"}
    ])

    # Evaluate the trace and print the diagnosed failure type.
    result = trace.evaluate()
    print(result["failure"]["failure_type"])

BYO RAG Endpoint

Capture and verify one live response from a running local or hosted RAG API without adding SDK code:

contexttrace capture endpoint \
  --endpoint http://localhost:8000/query \
  --query "What is the refund policy?" \
  --answer-path '$.answer' \
  --contexts-path '$.contexts' \
  --citations-path '$.citations' \
  --out traces/refund_trace.json \
  --verify \
  --report
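
To try the capture command without a real pipeline, a stand-in endpoint can return the three fields that the JSONPath flags above select. The following is a minimal sketch using only the Python standard library. The request shape (a JSON body with a "question" key), the context and citation field names, and the helper names build_response, QueryHandler, and make_server are assumptions for illustration, not something ContextTrace prescribes.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_response(question: str) -> dict:
    """Return a canned response; a real endpoint would retrieve and generate from `question`."""
    return {
        "answer": "Refunds are available within 30 days.",
        "contexts": [
            {"id": "chunk_12", "text": "Refunds are available within 30 days of purchase."}
        ],
        "citations": [
            {"claim": "Refunds are available within 30 days.", "source_chunk_id": "chunk_12"}
        ],
    }


class QueryHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumption: the client sends a JSON body with the query under "question".
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(build_response(payload.get("question", ""))).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8000) -> HTTPServer:
    """Create the server without starting it, so the caller decides when to serve."""
    return HTTPServer(("localhost", port), QueryHandler)
```

Calling make_server().serve_forever() in a separate terminal would expose the stub on localhost:8000 for any path, including /query.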

If you already have a saved endpoint response:

contexttrace capture response response.json \
  --query "What is the refund policy?" \
  --out traces/refund_trace.json \
  --verify \
  --report

Evaluate a dataset through the same endpoint when you are ready to regression test:

contexttrace eval \
  --dataset evals/questions.json \
  --endpoint http://localhost:8000/query \
  --method POST \
  --input-key question \
  --answer-path '$.answer' \
  --contexts-path '$.contexts' \
  --citations-path '$.citations' \
  --fail-on "failure_rate>0.25"
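
The --fail-on "failure_rate>0.25" gate reads as "fail the run when more than a quarter of the cases fail". The sketch below illustrates that reading with a small threshold check. It is not ContextTrace's parser: the gate_trips helper, the set of supported operators, and the metrics dictionary are all hypothetical.

```python
import operator
import re

# Hypothetical operator table; the operators ContextTrace accepts are not documented here.
_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def gate_trips(expression: str, metrics: dict) -> bool:
    """Return True when the metric named in `expression` crosses its threshold."""
    match = re.fullmatch(r"\s*(\w+)\s*(>=|<=|>|<)\s*([0-9]*\.?[0-9]+)\s*", expression)
    if match is None:
        raise ValueError(f"cannot parse gate expression: {expression!r}")
    name, op, threshold = match.groups()
    return _OPS[op](metrics[name], float(threshold))


# Two of four evaluated cases failed, so the failure rate is 0.5.
case_failed = [True, False, True, False]
metrics = {"failure_rate": sum(case_failed) / len(case_failed)}
should_fail = gate_trips("failure_rate>0.25", metrics)
```

In CI, a tripped gate would typically map to a non-zero exit code so the pipeline stops.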

Claim-Level Evidence Verification

Verify a portable RAG trace artifact without a hosted dashboard:

# Demo, inspect, and verify traces
contexttrace verify-demo unsupported_claim --report
contexttrace inspect trace.json
contexttrace qa trace.json --corpus docs/ --report
contexttrace verify trace.json
contexttrace verify trace.json --json
contexttrace verify trace.json --report --out reports/example.html
contexttrace verify trace.json --mode semantic
contexttrace verify trace.json --fail-on unsupported --fail-on citation_mismatch

# Benchmark the verifier
contexttrace verify-benchmark --mode semantic
contexttrace verify-benchmark --mode semantic --report
contexttrace verify-benchmark --case-set external --mode semantic --report

# Compare two traces
contexttrace compare baseline.json current.json
contexttrace compare baseline.json current.json --report
contexttrace compare baseline.json current.json --fail-on new_failure

# Build and run a regression suite
contexttrace suite create traces/*.json --out contexttrace-suite.json
contexttrace suite add contexttrace-suite.json traces/new_failure.json
contexttrace suite list contexttrace-suite.json
contexttrace suite run contexttrace-suite.json --endpoint http://localhost:8000/query --report
contexttrace suite prune contexttrace-suite.json --results .contexttrace/suites/contexttrace-regression-suite_results.json
contexttrace suite report .contexttrace/suites/contexttrace-regression-suite_results.json

# Audit retrieval against a corpus
contexttrace audit trace.json --corpus docs/
contexttrace audit trace.json --corpus docs/ --report
contexttrace audit trace.json --corpus docs/ --fail-on retrieval_miss
contexttrace audit-benchmark --case-set real --mode semantic
contexttrace audit-benchmark --case-set real --mode semantic --report

Each input trace requires query, answer, and contexts, where every context has an id and text. Citations are optional; when present, they are checked to catch cited sources that do not actually support the matched claim.
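
As a rough illustration of those requirements, the sketch below builds a trace as a plain dictionary and checks it for the required fields before serializing it. The top-level keys follow the sentence above; the citation keys are borrowed from the SDK example and are an assumption for the portable format, and missing_fields is a hypothetical helper, not part of ContextTrace.

```python
import json

trace = {
    "query": "What is the refund policy?",
    "answer": "Refunds are available within 30 days.",
    "contexts": [
        {"id": "chunk_12", "text": "Refunds are available within 30 days of purchase."},
    ],
    # Optional. Key names assumed to match the SDK example.
    "citations": [
        {"claim": "Refunds are available within 30 days.", "source_chunk_id": "chunk_12"},
    ],
}


def missing_fields(candidate: dict) -> list:
    """List required fields that are absent from the trace or from any context."""
    problems = [key for key in ("query", "answer", "contexts") if key not in candidate]
    for index, context in enumerate(candidate.get("contexts", [])):
        for key in ("id", "text"):
            if key not in context:
                problems.append(f"contexts[{index}].{key}")
    return problems


# Serialize only after the structural check passes.
serialized = json.dumps(trace, indent=2) if not missing_fields(trace) else None
```

Writing serialized to trace.json would give a file shaped for commands such as contexttrace verify trace.json, assuming the real schema matches this sketch.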

verify-demo uses bundled demo traces, so it works immediately after pip install contexttrace. Available demos include unsupported_claim, partial_support, citation_mismatch, should_abstain, and supported_answer.

Use --mode semantic for local paraphrase-aware matching, and verify-benchmark to inspect bundled precision/recall metrics. The default benchmark contains 32 cases drawn from ContextTrace docs and release artifacts. --case-set external adds public OSS documentation and GitHub issue cases from Qdrant, Chroma, Haystack, and LangChain, while --case-set all runs both packs. --report writes an HTML report that lists the misses to inspect.

Verification output includes evidence span offsets, stable span hashes, multiple supporting spans, typed matched/missing facts, and claim-level root causes so partial support failures are easier to inspect.
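
To make "evidence span offsets" and "stable span hashes" concrete, here is a minimal sketch that locates a supporting span inside a context and fingerprints it. It assumes exact substring matching and SHA-256, which may differ from what ContextTrace does internally; locate_span and its output keys are hypothetical.

```python
import hashlib


def locate_span(context_text: str, span_text: str):
    """Return character offsets and a content hash for the span, or None if it is absent."""
    start = context_text.find(span_text)
    if start == -1:
        return None
    return {
        "start": start,
        "end": start + len(span_text),  # exclusive end offset
        # Hashing the span text keeps the identifier stable across runs.
        "sha256": hashlib.sha256(span_text.encode("utf-8")).hexdigest(),
    }


context = "Policy: Refunds are available within 30 days of purchase."
span = "Refunds are available within 30 days"
hit = locate_span(context, span)
```

Because the hash depends only on the span text, the same evidence keeps the same identifier even if its offsets shift after re-chunking.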

ContextTrace verifies whether each generated claim is actually supported by retrieved evidence. Instead of only showing a trace or a score, it tells you where the evidence chain broke: unsupported claim, citation mismatch, retrieval miss, answer overreach, conflicting context, or should-have-abstained.

Use the capture helper when you have RAG artifacts in memory:

from contexttrace import capture_rag_trace, write_rag_trace

trace = capture_rag_trace(query=question, answer=answer, contexts=retrieved_docs)
write_rag_trace(trace, "trace.json")

Use contexttrace compare baseline.json current.json to diff two portable traces or saved verify --json outputs. It reports support-rate deltas, new unsupported claims, citation regressions, should-abstain flips, and new root causes, with --fail-on gates for CI.
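
The two simplest quantities in that diff, the support-rate delta and the set of new unsupported claims, can be sketched as follows. This is an illustration of the idea, not the compare implementation: the claim-to-verdict dictionaries, the "supported" and "unsupported" labels, and the function names are assumptions.

```python
def support_rate(verdicts: dict) -> float:
    """Fraction of claims whose verdict is "supported"."""
    if not verdicts:
        return 0.0
    return sum(1 for verdict in verdicts.values() if verdict == "supported") / len(verdicts)


def diff_runs(baseline: dict, current: dict) -> dict:
    """Compare two claim-to-verdict mappings."""
    new_unsupported = sorted(
        claim
        for claim, verdict in current.items()
        if verdict == "unsupported" and baseline.get(claim) != "unsupported"
    )
    return {
        "support_rate_delta": support_rate(current) - support_rate(baseline),
        "new_unsupported": new_unsupported,
    }


baseline = {
    "Refunds are available within 30 days.": "supported",
    "Refunds are issued to the original payment method.": "supported",
}
current = {
    "Refunds are available within 30 days.": "supported",
    "Refunds are available within 90 days.": "unsupported",
}
diff = diff_runs(baseline, current)
```

A CI gate in this spirit would fail the build when new_unsupported is non-empty or the delta is negative.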

Use contexttrace suite create, suite add, and suite run to turn saved failures into replayable endpoint tests. Suite runs call your current RAG endpoint with the saved query, verify the new answer, compare it with the baseline trace, and exit non-zero when a saved failure still reproduces or a good case regresses. Use suite list, suite remove, and suite prune to manage the suite as failures are fixed or retired.

Use contexttrace audit trace.json --corpus docs/ to diagnose whether an unsupported claim failed because retrieval missed evidence, reranking buried it, chunking omitted the supporting span, the corpus lacks coverage, or generation overclaimed. Audit output includes failure stages, diagnostic signals, and prioritized next actions.

Use contexttrace audit-benchmark --case-set real --mode semantic to test retrieval-audit labels against bundled public OSS documentation and GitHub issue snippets from Qdrant, Chroma, Haystack, LangChain, and ContextTrace docs.

The v0.7.0 verifier uses local lexical heuristics by default. Claim extraction is rule-based, contradiction detection is conservative, and semantic or LLM-judge support can be added later.
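
For intuition about what a local lexical heuristic can and cannot do, the sketch below scores a claim by the share of its content words that appear in a context. It is a generic token-overlap check under assumed tokenization and stopwords, not the ContextTrace verifier.

```python
import re

# Small illustrative stopword list; a real verifier would likely use a larger one.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and"}


def content_tokens(text: str) -> set:
    """Lowercase alphanumeric tokens with stopwords removed."""
    return {token for token in re.findall(r"[a-z0-9]+", text.lower()) if token not in STOPWORDS}


def lexical_support(claim: str, context: str) -> float:
    """Share of the claim's content tokens that also occur in the context."""
    claim_tokens = content_tokens(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & content_tokens(context)) / len(claim_tokens)


context = "Refunds are available within 30 days of purchase."
exact = lexical_support("Refunds are available within 30 days.", context)
wrong_number = lexical_support("Refunds are available within 90 days.", context)
```

Here exact is 1.0 while wrong_number is still 0.8, even though the 90-day claim is false. Plain overlap is too lenient on its own, which is one reason the typed matched/missing facts in the verification output are useful.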

What It Catches

retrieval_miss
citation_mismatch
unsupported_answer
contradicted_answer
conflicting_sources
should_have_abstained
agent failures such as stale_memory_used and tool_error

Privacy

Local mode is the default. ContextTrace makes no network calls unless you configure an LLM judge provider or evaluate a RAG endpoint you provide.

Release history

0.9.0 (Jun 5, 2026)
0.8.0 (Jun 5, 2026)
0.7.0 (Jun 5, 2026, this version)
0.6.0 (Jun 4, 2026)
0.5.0 (Jun 4, 2026)
0.4.0 (Jun 4, 2026)
0.3.0 (Jun 4, 2026)
0.2.0 (Jun 3, 2026)
0.1.0 (Jun 1, 2026)

Download files

Source Distribution

contexttrace-0.7.0.tar.gz (135.3 kB), uploaded Jun 5, 2026

Built Distribution

contexttrace-0.7.0-py3-none-any.whl (160.0 kB), uploaded Jun 5, 2026, Python 3

File details

Details for the file contexttrace-0.7.0.tar.gz.

File metadata

Download URL: contexttrace-0.7.0.tar.gz
Upload date: Jun 5, 2026
Size: 135.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.8.10

File hashes

Hashes for contexttrace-0.7.0.tar.gz
Algorithm	Hash digest
SHA256	`e4d74fe90ae4fb3389c6f6cd9ad93f42b86fb992833b5f7e359bcad75783e0b5`
MD5	`040b91a0ce92fd5695e2b2cbdec2b24d`
BLAKE2b-256	`08d7e0e50d4343a956e4f763382d7e66ec208144daecbc897d6a2a62cc3bf528`

File details

Details for the file contexttrace-0.7.0-py3-none-any.whl.

File metadata

Download URL: contexttrace-0.7.0-py3-none-any.whl
Upload date: Jun 5, 2026
Size: 160.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.8.10

File hashes

Hashes for contexttrace-0.7.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`80d3082451475f328e06e1974d625465bca5e5c826c8ba60242795076cc7abd5`
MD5	`0ed7e1aa757c2f9617ef073e56fe3153`
BLAKE2b-256	`ef5713b84036051cabf8d3077fff127c964d0fab8dabb2845efac515dd786795`
