Vector lineage tracking for RAG pipelines.

VecTrace

Wrong RAG answer? Trace it to the exact vector, chunk, and source document.

vectrace is a CLI for debugging a single wrong RAG answer: use it when one answer is wrong and you need to find the exact source document and chunk that caused it.

Tagline:

  • vectrace — trace where your RAG answers come from

Start Here

If you only try one thing, run this:

vectrace ask-trace \
  --db ./vectrace.db \
  --collection support_kb \
  --question "Can I get a refund after 90 days?" \
  --final-answer "Yes, refunds are allowed." \
  --top-k 3 \
  --output ./ask-trace.html \
  --json-output ./ask-trace.json

This runs retrieval, links each result to its source data, and writes a trace report in two formats:

  • ask-trace.html (shareable report)
  • ask-trace.json (machine-readable payload)

Output Preview

[Image: Ask Trace Report screenshot]

What You Get

VecTrace links these layers in one place:

  • Retrieval context: question, answer, rank, score, metadata
  • Retrieval mode: exact (from your retriever telemetry) or bootstrap (best-effort lexical fallback)
  • Vector provenance: vector ID, embedding model, run info
  • Source evidence: chunk ID/index/snippet and source document path/version
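To make the four layers concrete, here is a minimal Python sketch of how one traced hit could be modeled as a chain from retrieval event to source document. The class names, field names, and sample values (policies/refunds.md, example-embedding-model, run-1) are hypothetical illustrations of the layers listed above, not VecTrace's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class SourceDocument:  # hypothetical: source evidence layer
    path: str
    version: str


@dataclass
class Chunk:  # hypothetical: chunk ID/index/snippet
    chunk_id: str
    index: int
    snippet: str
    document: SourceDocument


@dataclass
class VectorRecord:  # hypothetical: vector provenance layer
    vector_id: str
    embedding_model: str
    run_id: str
    chunk: Chunk


@dataclass
class RetrievalHit:  # hypothetical: retrieval context + mode
    question: str
    answer: str
    rank: int
    score: float
    trace_mode: str  # "exact" or "bootstrap"
    vector: VectorRecord
    metadata: dict = field(default_factory=dict)

    def lineage(self) -> str:
        """Render the hit as a one-line vector -> chunk -> document chain."""
        chunk = self.vector.chunk
        doc = chunk.document
        return (
            f"rank {self.rank} ({self.trace_mode}, score={self.score:.2f}): "
            f"{self.vector.vector_id} -> {chunk.chunk_id}#{chunk.index} -> "
            f"{doc.path}@{doc.version}"
        )


doc = SourceDocument(path="policies/refunds.md", version="v3")
chunk = Chunk("chunk_7", 2, "Refunds are only allowed within 30 days...", doc)
vec = VectorRecord("vec_101", "example-embedding-model", "run-1", chunk)
hit = RetrievalHit("Can I get a refund after 90 days?", "Yes, refunds are allowed.",
                   1, 0.87, "exact", vec)
print(hit.lineage())
```

Keeping each layer as its own record mirrors the idea that one wrong answer can be walked back hop by hop, rather than stored as one flat row.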

Incident Example

Scenario:

  • Question: Can I get a refund after 90 days?
  • Answer: Yes, refunds are allowed.
  • Policy evidence: Refunds are only allowed within 30 days for eligible defects.

Run:

vectrace ask-trace \
  --db ./vectrace.db \
  --collection support_kb \
  --question "Can I get a refund after 90 days?" \
  --final-answer "Yes, refunds are allowed." \
  --top-k 3 \
  --output ./incident-refund.html \
  --json-output ./incident-refund.json

Check these fields in the output:

  • evidence.support_status (expected: unsupported)
  • evidence.support_reason (explains why)
  • evidence.chunk_text and evidence.source_path (the exact source text and the file it came from)
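If you want to gate on this check in a script or CI job, a minimal Python sketch follows. It assumes the JSON payload has a top-level evidence object containing the fields named above; the script name check_evidence.py is hypothetical, and the sketch only relies on the documented value unsupported.

```python
import json
import sys
from pathlib import Path
from typing import Tuple


def check_evidence(payload: dict) -> Tuple[bool, str]:
    """Return (is_unsupported, summary) for an ask-trace JSON payload.

    Assumes a top-level "evidence" object with support_status,
    support_reason, chunk_text and source_path (layout is an assumption).
    """
    evidence = payload.get("evidence", {})
    status = evidence.get("support_status", "unknown")
    summary = (
        f"status={status}\n"
        f"reason={evidence.get('support_reason', '')}\n"
        f"source={evidence.get('source_path', '')}\n"
        f"chunk={evidence.get('chunk_text', '')}"
    )
    return status == "unsupported", summary


if __name__ == "__main__" and len(sys.argv) > 1:
    data = json.loads(Path(sys.argv[1]).read_text(encoding="utf-8"))
    unsupported, text = check_evidence(data)
    print(text)
    # Non-zero exit lets a CI step fail when the answer is not backed by evidence
    sys.exit(1 if unsupported else 0)
```

Usage: python check_evidence.py ./incident-refund.json prints the evidence summary and exits with status 1 when the answer is flagged unsupported.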

Core Commands

  • vectrace ask-trace ... when you are starting from a question and have no vector ID yet.
  • vectrace trace ... when you already have a vector ID.
  • vectrace report ... when you want an HTML report for a known vector ID.

Quickstart

python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install setuptools wheel
python3 -m pip install -e . --no-build-isolation

vectrace init --db ./vectrace.db
vectrace onboard --db ./vectrace.db --output ./trace-demo.html
vectrace ask-trace \
  --db ./vectrace.db \
  --collection support_kb \
  --question "Can I get a refund after 90 days?" \
  --top-k 3 \
  --output ./ask-trace.html \
  --json-output ./ask-trace.json

Advanced / Integration

Use these when integrating with your existing retriever/serving pipeline.

Auto-record from a Qdrant search (TrackedQdrant.search_with_tracking)

Skip the CLI for live RAG requests: wrap your Qdrant client, and each search call records its own retrieval events:

from connectors.qdrant import TrackedQdrant

# `embed` is a placeholder for your own embedding function.
with TrackedQdrant(qdrant_url="http://localhost:6333", db_path="./vectrace.db") as rag:
    hits, query_id = rag.search_with_tracking(
        collection_name="support_kb",
        query_text="Can I get a refund after 90 days?",
        query_vector=embed("Can I get a refund after 90 days?"),
        limit=3,
        final_answer="Yes, refunds are allowed.",  # optional; can attach later
    )
    # ... pass `hits` to your LLM as usual

# Then debug the recorded answer with the CLI:
#   vectrace report-qa --db ./vectrace.db --collection support_kb \
#     --question "Can I get a refund after 90 days?" --output ./trace.html

Lineage writes are best-effort: if SQLite fails, the search still returns and the failure is logged to stderr. The returned query_id lets you correlate later commands or attach a final answer to the same retrieval event group.
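The best-effort behavior described above is a common pattern: run the search, try the bookkeeping write, and never let a bookkeeping failure break the request. A minimal standard-library sketch of that pattern follows; the function name and the single retrieval_events table are hypothetical and are not TrackedQdrant's internals.

```python
import sqlite3
import sys
import uuid
from typing import Callable, List, Sequence, Tuple

Hit = Tuple[str, float]  # (vector_id, score); hypothetical hit shape


def search_with_best_effort_lineage(
    search: Callable[[], Sequence[Hit]],
    db_path: str,
    query_text: str,
) -> Tuple[List[Hit], str]:
    """Run `search`, then try to log each hit; lineage failures only warn."""
    hits = list(search())  # results are computed first and always returned
    query_id = uuid.uuid4().hex
    try:
        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "CREATE TABLE IF NOT EXISTS retrieval_events ("
                    "query_id TEXT, query_text TEXT, vector_id TEXT, "
                    "rank INTEGER, score REAL)"
                )
                conn.executemany(
                    "INSERT INTO retrieval_events VALUES (?, ?, ?, ?, ?)",
                    [(query_id, query_text, vid, rank, score)
                     for rank, (vid, score) in enumerate(hits, start=1)],
                )
        finally:
            conn.close()
    except sqlite3.Error as exc:
        # Same idea as the documented behavior: log to stderr, keep serving results
        print(f"lineage write failed: {exc}", file=sys.stderr)
    return hits, query_id
```

Returning the query_id even when the write fails keeps the caller's code path identical in both cases; only the trace DB is missing the event.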

Record real retrieval telemetry (record-retrieval)

To record the exact rank, score, and vector ID that your retriever returned:

vectrace record-retrieval \
  --db ./vectrace.db \
  --collection support_kb \
  --vector-id vec_101 \
  --query-text "Can I get a refund after 90 days?" \
  --final-answer "Yes, refunds are allowed." \
  --rank 1 \
  --score 0.87 \
  --evidence-text "Refunds are only allowed within 30 days for eligible defects." \
  --metadata-json '{"request_id":"req-123","session_id":"s-1"}'
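If your app already has a ranked hit list, you can emit one record-retrieval call per hit. The Python sketch below only builds the argument lists, using the flags shown above; the helper name and the (vector_id, score, evidence_text) hit shape are hypothetical, and actually running the commands (for example with subprocess.run(args, check=True)) assumes vectrace is installed and on your PATH.

```python
import json
from typing import Iterable, List, Optional, Tuple


def record_retrieval_commands(
    db: str,
    collection: str,
    query_text: str,
    hits: Iterable[Tuple[str, float, str]],  # (vector_id, score, evidence_text)
    final_answer: Optional[str] = None,
    metadata: Optional[dict] = None,
) -> List[List[str]]:
    """Build one `vectrace record-retrieval` argv per hit, ranked from 1."""
    commands = []
    for rank, (vector_id, score, evidence_text) in enumerate(hits, start=1):
        args = [
            "vectrace", "record-retrieval",
            "--db", db,
            "--collection", collection,
            "--vector-id", vector_id,
            "--query-text", query_text,
            "--rank", str(rank),
            "--score", str(score),
            "--evidence-text", evidence_text,
        ]
        if final_answer is not None:
            args += ["--final-answer", final_answer]
        if metadata:
            # json.dumps avoids hand-quoting JSON inside a shell string
            args += ["--metadata-json", json.dumps(metadata)]
        commands.append(args)
    return commands
```

Passing argument lists (not a single shell string) sidesteps quoting problems with questions or snippets that contain quotes.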

Bootstrap from question+answer (record-qa)

If retriever telemetry was not logged, create best-effort retrieval events from stored chunk previews:

vectrace record-qa \
  --db ./vectrace.db \
  --collection support_kb \
  --question "Can I get a refund after 90 days?" \
  --final-answer "Yes, refunds are allowed." \
  --top-k 3
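Bootstrap mode is described only as a best-effort lexical fallback over stored chunk previews. To illustrate what such a fallback can look like, here is a simple Jaccard token-overlap ranker in Python; it is an assumption for explanation, not VecTrace's actual scoring, and the chunk IDs and previews are made up.

```python
import re
from typing import Dict, List, Set, Tuple

TOKEN = re.compile(r"[a-z0-9]+")


def tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens; punctuation is ignored."""
    return set(TOKEN.findall(text.lower()))


def bootstrap_rank(
    question: str, answer: str, previews: Dict[str, str], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank chunk previews by Jaccard overlap with question + answer tokens."""
    query = tokens(question) | tokens(answer)
    scored = []
    for chunk_id, preview in previews.items():
        words = tokens(preview)
        union = query | words
        score = len(query & words) / len(union) if union else 0.0
        scored.append((chunk_id, score))
    # Highest score first; ties broken by chunk ID for a stable order
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


previews = {  # hypothetical stored chunk previews
    "chunk_refund": "Refunds are only allowed within 30 days for eligible defects.",
    "chunk_shipping": "Orders ship within 2 business days.",
    "chunk_warranty": "Warranty covers defects for one year.",
}
ranked = bootstrap_rank("Can I get a refund after 90 days?",
                        "Yes, refunds are allowed.", previews)
```

Because matching is purely lexical ("refund" and "refunds" are different tokens here), results like these are a starting point for investigation, which is why telemetry from record-retrieval is preferable when available.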

Query recorded events (trace-qa, report-qa)

vectrace trace-qa --db ./vectrace.db --question "Can I get a refund after 90 days?" --answer "Yes, refunds are allowed." --collection support_kb --format json
vectrace report-qa --db ./vectrace.db --question "Can I get a refund after 90 days?" --answer "Yes, refunds are allowed." --collection support_kb --output ./qa-trace.html --json-output ./qa-trace.json

Command Reference

Core

  • vectrace ask-trace --db ./vectrace.db --collection <name> --question "<query>" [--final-answer "<answer>"] [--top-k 3] [--match-index 1] --output ask-trace.html [--json-output ask-trace.json] [--redact-preview] [--redact-retrieval] [--metadata-json '{"k":"v"}'] [--format text|json]
  • vectrace trace --db ./vectrace.db --vector-id <id> [--collection <name>] [--format text|json] [--plain] [--redact-preview] [--redact-retrieval] [--include-retrieval]
  • vectrace report --db ./vectrace.db --vector-id <id> [--collection <name>] --output trace.html [--redact-preview] [--redact-retrieval] [--include-retrieval]

Advanced

  • vectrace init --db ./vectrace.db
  • vectrace onboard --db ./vectrace.db --output trace-demo.html
  • vectrace seed-demo --db ./vectrace.db --collection support_kb --vectors 200 --docs 20
  • vectrace connect --qdrant-url http://localhost:6333 --qdrant-collection support_kb
  • vectrace record-retrieval --db ./vectrace.db --collection <name> --vector-id <id> --query-text "<query>" [--final-answer "<answer>"] [--rank <n>] [--score <s>] [--evidence-text "<snippet>"] [--metadata-json '{"k":"v"}']
  • vectrace record-qa --db ./vectrace.db --collection <name> --question "<query>" --final-answer "<answer>" [--top-k 3] [--metadata-json '{"k":"v"}']
  • vectrace trace-qa --db ./vectrace.db --question "<query>" [--answer "<answer>"] [--collection <name>] [--top-k 3] [--format text|json] [--redact-preview] [--redact-retrieval]
  • vectrace report-qa --db ./vectrace.db --question "<query>" [--answer "<answer>"] [--collection <name>] [--top-k 5] [--match-index 1] --output qa-trace.html [--json-output qa-trace.json] [--redact-preview] [--redact-retrieval]

Output Shape (JSON)

The JSON output of trace --format json --include-retrieval and of trace-qa --format json includes:

  • retrieval: question, answer, rank, score, metadata
  • retrieval.trace_mode: exact or bootstrap
  • trace/lineage: vector/chunk/document chain
  • evidence: explicit snippet fields (chunk_text, source_path, etc.)
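Because bootstrap results are only a lexical guess, it can help to surface the trace mode before trusting a trace. A small Python sketch, assuming the payload exposes it as retrieval.trace_mode at the top level; adjust the lookup if your payload nests it differently.

```python
def describe_trace_mode(payload: dict) -> str:
    """Explain how much to trust a trace based on retrieval.trace_mode."""
    mode = payload.get("retrieval", {}).get("trace_mode")
    if mode == "exact":
        return "exact: rank/score/vector IDs came from recorded retriever telemetry"
    if mode == "bootstrap":
        return "bootstrap: best-effort lexical match; confirm against real telemetry"
    # Missing or unexpected values are reported rather than guessed
    return f"unknown trace mode: {mode!r}"
```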

Privacy note:

  • --redact-retrieval redacts retrieval question/answer fields (including common metadata duplicates) and hides assessment text/details that could echo the original query.
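To illustrate the idea behind --redact-retrieval, the Python sketch below masks the question and answer fields plus any metadata values that duplicate them. The key names (question, answer, metadata) follow the JSON shape listed above, but the [REDACTED] placeholder and the exact-match duplicate rule are assumptions, not VecTrace's implementation.

```python
import copy

REDACTED = "[REDACTED]"  # hypothetical placeholder string


def redact_retrieval(retrieval: dict) -> dict:
    """Return a copy with question/answer masked, including metadata duplicates."""
    out = copy.deepcopy(retrieval)  # never mutate the caller's payload
    sensitive = {v for v in (out.get("question"), out.get("answer")) if v}
    for key in ("question", "answer"):
        if out.get(key):
            out[key] = REDACTED
    metadata = out.get("metadata") or {}
    for key, value in metadata.items():
        # Mask any metadata value that repeats the original question or answer
        if isinstance(value, str) and value in sensitive:
            metadata[key] = REDACTED
    return out
```

Exact-match detection catches the common case where an app copies the query into metadata verbatim; paraphrased or partial copies would need a broader rule.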

ask-trace --format json emits the selected match and its query_id in a single payload.

Architecture

flowchart LR
    A["RAG Ingest Pipeline"] --> B["VecTrace Tracker"]
    B --> C["SQLite Trace DB"]
    A --> D["Qdrant (vector values)"]
    E["vectrace ask-trace/trace/report/trace-qa/report-qa"] --> C
    E --> F["JSON/Text Output"]
    E --> G["HTML Report"]
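In this flowchart, vector values live in Qdrant while lineage lives in the SQLite trace DB. A minimal Python sketch of such a lineage store, and of the join that walks a vector ID back to its source document, is below. The table names, column names, and sample rows are hypothetical; VecTrace's actual schema is not documented here.

```python
import sqlite3

SCHEMA = """
CREATE TABLE documents (doc_id TEXT PRIMARY KEY, source_path TEXT, version TEXT);
CREATE TABLE chunks (
    chunk_id TEXT PRIMARY KEY,
    doc_id TEXT REFERENCES documents(doc_id),
    chunk_index INTEGER,
    chunk_text TEXT
);
CREATE TABLE vectors (
    vector_id TEXT PRIMARY KEY,
    chunk_id TEXT REFERENCES chunks(chunk_id),
    collection TEXT,
    embedding_model TEXT
);
"""

LINEAGE_SQL = """
SELECT v.vector_id, v.embedding_model, c.chunk_id, c.chunk_index, c.chunk_text,
       d.source_path, d.version
FROM vectors v
JOIN chunks c ON c.chunk_id = v.chunk_id
JOIN documents d ON d.doc_id = c.doc_id
WHERE v.vector_id = ? AND v.collection = ?
"""


def trace_vector(conn: sqlite3.Connection, vector_id: str, collection: str):
    """Return the vector -> chunk -> document row for one vector, or None."""
    return conn.execute(LINEAGE_SQL, (vector_id, collection)).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO documents VALUES (?, ?, ?)",
             ("doc_1", "policies/refunds.md", "v3"))
conn.execute("INSERT INTO chunks VALUES (?, ?, ?, ?)",
             ("chunk_7", "doc_1", 2,
              "Refunds are only allowed within 30 days for eligible defects."))
conn.execute("INSERT INTO vectors VALUES (?, ?, ?, ?)",
             ("vec_101", "chunk_7", "support_kb", "example-model"))
print(trace_vector(conn, "vec_101", "support_kb"))
```

Keeping only IDs and provenance in SQLite means the trace DB stays small and can be queried offline, without a connection to the vector store.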

Qdrant Integration

Install optional dependency:

python3 -m pip install qdrant-client

Use connectors.qdrant.TrackedQdrant to upsert vectors and record trace metadata in one step.

Demo Assets

  • Report screenshot: docs/assets/report-screenshot.svg
  • JSON sample: docs/examples/ask-trace-sample.json
  • Terminal tape: demo/vectrace-demo.tape
  • Terminal GIF (generate):
brew install vhs
./scripts/make_terminal_demo_gif.sh

Release / CI

  • CI: .github/workflows/ci.yml
  • Release publish: .github/workflows/release.yml
  • PyPI checklist: docs/PYPI_RELEASE.md
  • Build script: scripts/build_dist.sh

Development Tests

python3 -m unittest discover -s tests -v
# or, without activating the virtualenv:
.venv/bin/python -m unittest discover -s tests -v

Download files

Source Distribution

vectrace-0.1.4.tar.gz (44.4 kB)

Uploaded Source

Built Distribution

vectrace-0.1.4-py3-none-any.whl (32.2 kB)

Uploaded Python 3

File details

Details for the file vectrace-0.1.4.tar.gz.

File metadata

  • Download URL: vectrace-0.1.4.tar.gz
  • Size: 44.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for vectrace-0.1.4.tar.gz:

  • SHA256: 69fcca1fd074a1fd5b053244f6cea9f44b54873aefe1e6efef7e14cb25b0caea
  • MD5: 5ca8c11fae3007d897d6acc1342fdf89
  • BLAKE2b-256: 9794df3286a1d5ae041779670416d6d520deb0340c274e0348721cdae9957340
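To check a downloaded file against the published SHA256 digest before installing it, a short standard-library Python sketch follows. The file path is whatever you downloaded; pip can also enforce digests itself through its hash-checking mode (--require-hashes).

```python
import hashlib
from pathlib import Path

# Published SHA256 for vectrace-0.1.4.tar.gz (from the hash list above)
EXPECTED_SHA256 = "69fcca1fd074a1fd5b053244f6cea9f44b54873aefe1e6efef7e14cb25b0caea"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Stream the file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True when the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

Usage: verify("vectrace-0.1.4.tar.gz") returns True only if the local file matches the published digest.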

Provenance

The following attestation bundles were made for vectrace-0.1.4.tar.gz:

Publisher: release.yml on kraftaa/vectrace

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file vectrace-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: vectrace-0.1.4-py3-none-any.whl
  • Size: 32.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for vectrace-0.1.4-py3-none-any.whl:

  • SHA256: 34ef57c62fe2c7c36981aac02322315355d2fc981f44173dbdd776b47a9cdba9
  • MD5: d330216d23caae50b3be618986663065
  • BLAKE2b-256: fad8a7be794e7d1b7355960cfc703977b82f5ff3169e4d1b2b9c65099f1813d6

Provenance

The following attestation bundles were made for vectrace-0.1.4-py3-none-any.whl:

Publisher: release.yml on kraftaa/vectrace

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.