Local-first SDK and CLI for RAG and agent reliability tracing, citation checks, and failure diagnosis.

ContextTrace

Debug RAG failures before users find them.

ContextTrace is a local-first Python SDK and CLI for evaluating existing RAG and AI agent systems. It records retrieved chunks, selected context, answer claims, citations, token usage, latency, and agent events, then writes local traces and HTML reports without requiring a hosted dashboard.

Install

pip install contexttrace
contexttrace --version
contexttrace init

Optional integrations:

pip install "contexttrace[langchain]"
pip install "contexttrace[llamaindex]"
pip install "contexttrace[fastapi]"
pip install "contexttrace[langgraph]"
pip install "contexttrace[otel]"
pip install "contexttrace[all]"

Quickstart

contexttrace init
contexttrace demo --dataset refund_policy
contexttrace report --last
contexttrace doctor

By default, traces are stored locally in:

.contexttrace/contexttrace.db
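To inspect the local store directly, here is a minimal sketch that lists its tables. It assumes the file is a SQLite database. The .db extension suggests this, but this page does not document it. The sketch makes no assumption about the schema, and list_trace_tables is a hypothetical helper, not part of the ContextTrace API.

```python
import sqlite3
from pathlib import Path
from typing import List


def list_trace_tables(db_path: str = ".contexttrace/contexttrace.db") -> List[str]:
    """Return table names in the local trace store (assumes it is SQLite)."""
    path = Path(db_path)
    if not path.exists():
        # sqlite3.connect would silently create an empty file, so check first.
        raise FileNotFoundError(f"no trace store at {path}; run `contexttrace init` first")
    # Open read-only through a URI so inspection never modifies the store.
    conn = sqlite3.connect(f"file:{path.as_posix()}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

After running contexttrace demo, calling list_trace_tables() from the project directory shows which tables exist. You can then query them with any SQLite client.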

SDK Example

from contexttrace import ContextTrace

ct = ContextTrace(project="support-rag")
query = "What is the refund policy?"

# `retriever` and `llm` stand for your existing RAG components;
# ContextTrace records what you log from them.
with ct.trace(query=query) as trace:
    chunks = retriever.search(query)
    trace.log_retrieval(chunks)        # everything the retriever returned
    context = chunks[:5]
    trace.log_context(context)         # the subset selected as LLM context

    answer = llm.generate(query, context)
    trace.log_answer(answer, usage={"total_tokens": 1200})
    trace.log_citations([
        {"claim": "Refunds are available within 30 days.", "source_chunk_id": "chunk_12"}
    ])

    # Evaluate the recorded trace and print the diagnosed failure type.
    result = trace.evaluate()
    print(result["failure"]["failure_type"])

BYO RAG Endpoint

Evaluate a running local or hosted RAG API without adding SDK code:

contexttrace eval \
  --dataset evals/questions.json \
  --endpoint http://localhost:8000/query \
  --method POST \
  --input-key question \
  --answer-path '$.answer' \
  --contexts-path '$.contexts' \
  --citations-path '$.citations' \
  --fail-on "failure_rate>0.25"
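The --answer-path, --contexts-path and --citations-path flags take JSONPath-style expressions such as $.answer. The following is a minimal sketch of how simple dotted paths select fields from an endpoint's JSON response. The response body is a hypothetical example, and resolve_simple_path is an illustrative helper. ContextTrace's own path support may be richer (for example, array indexing), and this sketch does not cover that.

```python
import json
from typing import Any


def resolve_simple_path(document: Any, path: str) -> Any:
    """Resolve a dotted JSONPath such as "$.answer" or "$.data.contexts".

    Only the root "$" and dot-separated object keys are handled; filters,
    wildcards, and array indexes are deliberately out of scope.
    """
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path!r}")
    current = document
    for key in filter(None, path[1:].split(".")):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(f"{path!r}: missing key {key!r}")
        current = current[key]
    return current


# Hypothetical body returned by POST http://localhost:8000/query
# for the request {"question": "What is the refund policy?"}.
response_body = json.loads("""
{
  "answer": "Refunds are available within 30 days.",
  "contexts": [{"id": "chunk_12", "text": "Customers may request a refund within 30 days."}],
  "citations": ["chunk_12"]
}
""")

answer = resolve_simple_path(response_body, "$.answer")
contexts = resolve_simple_path(response_body, "$.contexts")
print(answer)         # Refunds are available within 30 days.
print(len(contexts))  # 1
```

If a path fails to resolve against your real response, the endpoint's JSON shape most likely differs from what the flags assume.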

Claim-Level Evidence Verification

Verify a portable RAG trace artifact without a hosted dashboard:

contexttrace verify-demo unsupported_claim --report
contexttrace verify trace.json
contexttrace verify trace.json --json
contexttrace verify trace.json --report --out reports/example.html
contexttrace verify trace.json --mode semantic
contexttrace verify trace.json --fail-on unsupported --fail-on citation_mismatch
contexttrace verify-benchmark --mode semantic
contexttrace verify-benchmark --mode semantic --report
contexttrace verify-benchmark --case-set external --mode semantic --report
contexttrace compare baseline.json current.json
contexttrace compare baseline.json current.json --report
contexttrace compare baseline.json current.json --fail-on new_failure

The input trace must contain query, answer, and contexts, where each context has an id and text. Optional citations are checked to catch cited sources that do not actually support the claim they are attached to.
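The following is a minimal sketch that writes a trace file with the required fields. The field names query, answer, contexts, id, text and citations come from the description above. The per-citation shape (claim plus source_chunk_id) is borrowed from the SDK example and is an assumption for the portable format, so compare it against a bundled verify-demo trace before relying on it.

```python
import json
from pathlib import Path

trace = {
    "query": "What is the refund policy?",
    "answer": "Refunds are available within 30 days.",
    "contexts": [
        {"id": "chunk_12", "text": "Customers may request a refund within 30 days of purchase."},
        {"id": "chunk_40", "text": "Shipping fees are non-refundable."},
    ],
    # Optional. Assumed shape, mirroring the SDK's log_citations() example.
    "citations": [
        {"claim": "Refunds are available within 30 days.", "source_chunk_id": "chunk_12"},
    ],
}

# Basic sanity checks before handing the file to `contexttrace verify`.
assert all({"id", "text"} <= set(ctx) for ctx in trace["contexts"])
known_ids = {ctx["id"] for ctx in trace["contexts"]}
assert all(c["source_chunk_id"] in known_ids for c in trace["citations"])

Path("trace.json").write_text(json.dumps(trace, indent=2), encoding="utf-8")
```

The resulting file can then be passed to contexttrace verify trace.json.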

verify-demo uses bundled demo traces, so it works immediately after pip install contexttrace. Available demos include unsupported_claim, partial_support, citation_mismatch, should_abstain, and supported_answer.

Use --mode semantic for local paraphrase-aware matching, and verify-benchmark to inspect the bundled precision/recall metrics. The default benchmark contains 32 cases drawn from real ContextTrace documentation and release artifacts. --case-set external adds cases built from public OSS documentation and GitHub issues from Qdrant, Chroma, Haystack, and LangChain, and --case-set all runs both packs. --report writes an HTML report that lists the misses for inspection.
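For reading those metrics, here is a sketch of the textbook precision/recall computation for a claim-level detector, treating "flagged as unsupported" as the positive class. The labels are made-up toy data. ContextTrace's exact positive class and aggregation are not documented here, so treat this as the standard definition only. In this framing, the false negatives are the "misses" a report would list.

```python
from typing import List, Tuple


def precision_recall(gold: List[bool], predicted: List[bool]) -> Tuple[float, float]:
    """Precision and recall where True means "claim is unsupported"."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be the same length")
    tp = sum(g and p for g, p in zip(gold, predicted))      # correctly flagged
    fp = sum(p and not g for g, p in zip(gold, predicted))  # false alarms
    fn = sum(g and not p for g, p in zip(gold, predicted))  # misses
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels for five claims (not ContextTrace benchmark data).
gold      = [True, True, False, False, True]
predicted = [True, False, False, True, True]
print(precision_recall(gold, predicted))  # (0.6666666666666666, 0.6666666666666666)
```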

Verification output includes evidence span offsets, stable span hashes, multiple supporting spans, typed matched/missing facts, and claim-level root causes, which make partial-support failures easier to inspect.
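To make "evidence span offsets" and "stable span hashes" concrete, here is a sketch that uses character offsets into a context's text plus a truncated SHA-256 digest of the span. The same evidence therefore yields the same identifier across runs. The EvidenceSpan layout and the context_id:start:end:text hash input are illustrative assumptions, not ContextTrace's actual output format.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvidenceSpan:
    context_id: str
    start: int  # inclusive character offset into the context text
    end: int    # exclusive character offset
    text: str
    span_hash: str


def make_span(context_id: str, context_text: str, evidence: str) -> Optional[EvidenceSpan]:
    """Locate `evidence` inside `context_text` and return a hashed span."""
    start = context_text.find(evidence)
    if start == -1:
        return None
    end = start + len(evidence)
    # Hashing the context id, offsets, and text gives an ID that only
    # changes when the underlying evidence changes.
    digest = hashlib.sha256(
        f"{context_id}:{start}:{end}:{evidence}".encode("utf-8")
    ).hexdigest()[:16]
    return EvidenceSpan(context_id, start, end, evidence, digest)


span = make_span("chunk_12", "Customers may request a refund within 30 days of purchase.", "within 30 days")
print(span.start, span.end)  # 31 45
```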

ContextTrace verifies whether each generated claim is actually supported by retrieved evidence. Instead of only showing a trace or a score, it tells you where the evidence chain broke: unsupported claim, citation mismatch, retrieval miss, answer overreach, conflicting context, or should-have-abstained.

Use contexttrace compare baseline.json current.json to diff two portable traces or saved verify --json outputs. It reports support-rate deltas, new unsupported claims, citation regressions, should-abstain flips, and new root causes, with --fail-on gates for CI.
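The following sketch shows the kind of comparison compare describes: a support-rate delta and newly unsupported claims between a baseline run and a current run. The input shape (a dict mapping claim text to "supported" or "unsupported") is a simplifying assumption, not the format of verify --json. The should_fail flag only approximates a --fail-on new_failure gate, whose exact semantics are not documented here.

```python
from typing import Dict, List


def support_rate(statuses: Dict[str, str]) -> float:
    """Fraction of claims labeled "supported" (0.0 for an empty run)."""
    if not statuses:
        return 0.0
    return sum(s == "supported" for s in statuses.values()) / len(statuses)


def compare_runs(baseline: Dict[str, str], current: Dict[str, str]) -> dict:
    # Claims that are unsupported now but were not unsupported before.
    new_unsupported: List[str] = sorted(
        claim
        for claim, status in current.items()
        if status == "unsupported" and baseline.get(claim) != "unsupported"
    )
    return {
        "support_rate_delta": support_rate(current) - support_rate(baseline),
        "new_unsupported": new_unsupported,
        "should_fail": bool(new_unsupported),  # approximate CI gate
    }


baseline = {"Refunds take 30 days.": "supported", "Shipping is free.": "supported"}
current = {"Refunds take 30 days.": "supported", "Shipping is free.": "unsupported"}
report = compare_runs(baseline, current)
print(report["support_rate_delta"], report["new_unsupported"])  # -0.5 ['Shipping is free.']
```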

The v0.4.0 verifier uses local lexical heuristics by default; --mode semantic switches to local paraphrase-aware matching. Claim extraction is rule-based, contradiction detection is conservative, and LLM-judge support can be added later.
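To show what a local lexical heuristic means in general, here is a sketch that scores a claim by the share of its content words that also appear in a context, with an arbitrary 0.7 threshold. This is a generic token-overlap check, not ContextTrace's verifier. The stopword list and threshold are assumptions. The second example shows why paraphrases call for a semantic mode: a reworded claim shares no content words with the evidence.

```python
import re
from typing import Set

# Tiny illustrative stopword list; a real verifier would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "for", "in", "within", "and", "or"}


def content_tokens(text: str) -> Set[str]:
    """Lowercase word tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def lexical_support(claim: str, context: str, threshold: float = 0.7) -> bool:
    """True if at least `threshold` of the claim's content words occur in the context."""
    claim_tokens = content_tokens(claim)
    if not claim_tokens:
        return False
    overlap = len(claim_tokens & content_tokens(context)) / len(claim_tokens)
    return overlap >= threshold


context = "Customers may request a refund within 30 days of purchase."
print(lexical_support("Customers may request a refund within 30 days.", context))  # True
print(lexical_support("Refunds are allowed for one month.", context))              # False
```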

What It Catches

  • retrieval_miss
  • citation_mismatch
  • unsupported_answer
  • contradicted_answer
  • conflicting_sources
  • should_have_abstained
  • agent failures such as stale_memory_used and tool_error
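A natural reading of the contexttrace eval gate --fail-on "failure_rate>0.25" is that it trips when more than a quarter of evaluated questions end in one of the failure types above. Here is a sketch of that tally. It assumes each result carries a failure_type (as in the SDK's result["failure"]["failure_type"]) and that the value is None for a passing answer. Both the pass representation and the exact rate definition are assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def failure_summary(failure_types: List[Optional[str]]) -> Tuple[float, Dict[str, int]]:
    """Return (failure_rate, counts per failure type); None means the case passed."""
    failures = [ft for ft in failure_types if ft is not None]
    rate = len(failures) / len(failure_types) if failure_types else 0.0
    return rate, dict(Counter(failures))


results = [None, "retrieval_miss", None, "citation_mismatch", None, "retrieval_miss", None, None]
rate, counts = failure_summary(results)
print(rate, counts)                  # 0.375 {'retrieval_miss': 2, 'citation_mismatch': 1}
print("gate tripped:", rate > 0.25)  # gate tripped: True
```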

Privacy

Local mode is the default. ContextTrace makes no network calls unless you configure an LLM judge provider or evaluate a RAG endpoint you provide.

Download files

Source Distribution

contexttrace-0.4.0.tar.gz (96.3 kB)

Uploaded Source

Built Distribution

contexttrace-0.4.0-py3-none-any.whl (115.6 kB)

Uploaded Python 3

File details

Details for the file contexttrace-0.4.0.tar.gz.

File metadata

  • Download URL: contexttrace-0.4.0.tar.gz
  • Size: 96.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.10

File hashes

Hashes for contexttrace-0.4.0.tar.gz
Algorithm Hash digest
SHA256 4953e4c46b1b3931626439283090f0b91a4ee5f12d738d2875f304802e8f2a18
MD5 025f305d879ed63ead541fd2bc307bb8
BLAKE2b-256 11bb60931f0559b20415d8b8eadebb02233ec00d0f2150b7a3afcd99c6b29adb

File details

Details for the file contexttrace-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: contexttrace-0.4.0-py3-none-any.whl
  • Size: 115.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.10

File hashes

Hashes for contexttrace-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3a646f8580b13e54d80facf39366b7e543e2e32ed1609d1eac04d57889b001f8
MD5 d7c23481263345b5f5e999b3fd5aef01
BLAKE2b-256 417f3238f2edc18c9008801afb59d5278230ae1668c5adb6c3149bee5295c9e7
