RAGLens CLI for debugging retrieval behavior in RAG systems

RAGLens

RAGLens is a CLI to debug retrieval behavior in RAG systems.

MVP loop:

  1. explain: inspect one bad query
  2. simulate: run many queries
  3. fix: get the first change to try

Scope: retrieval diagnostics only.
RAGLens does not do answer grading, hallucination detection, prompt evaluation, or agent tracing.

Quick Start

# from repo root
cargo run -- explain inputs/docs --query "refund after 90 days"
cargo run -- simulate inputs/docs --queries inputs/queries.txt
cargo run -- fix inputs/docs --queries inputs/queries.txt

Use a richer sample corpus:

cargo run -- explain inputs/examples/ecommerce/docs --query "refund after 90 days"
cargo run -- simulate inputs/examples/ecommerce/docs --queries inputs/examples/ecommerce/queries.txt
cargo run -- fix inputs/examples/ecommerce/docs --queries inputs/examples/ecommerce/queries.txt

Install from a local checkout:

cargo install --path .
raglens --help

Install with pip (prebuilt wheels; no Rust toolchain required):

pip install raglens-cli
raglens --help

Primary Commands

explain

Explain why the top documents/chunks ranked where they did for a single query.

raglens explain ./docs --query "refund after 90 days"

Outputs:

  • top-ranked chunks/docs
  • score breakdown (semantic + lexical components)
  • quick signal for why rank #1 won
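The score breakdown reports a semantic and a lexical component. How RAGLens computes and weights them is not specified here, so the following is a minimal sketch that assumes a weighted sum of cosine similarity and query-term overlap. It uses a hypothetical `alpha` weight and hand-made toy vectors, and shows how a per-component breakdown can reveal why rank #1 won.

```python
import math
import re


def tokenize(text):
    # Lowercase alphanumeric tokens; a simplification of a real tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two equal-length vectors (0.0 if either is all zeros).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexical_overlap(query, chunk):
    # Fraction of distinct query terms that also occur in the chunk.
    q_terms = set(tokenize(query))
    return len(q_terms & set(tokenize(chunk))) / len(q_terms) if q_terms else 0.0


def score_breakdown(query, chunk, q_vec, c_vec, alpha=0.5):
    # Hypothetical hybrid score: alpha * semantic + (1 - alpha) * lexical.
    semantic = cosine(q_vec, c_vec)
    lexical = lexical_overlap(query, chunk)
    return {"semantic": semantic, "lexical": lexical,
            "total": alpha * semantic + (1 - alpha) * lexical}


# Toy corpus with hand-made 2-d "embeddings" (illustrative only).
chunks = {
    "refund_policy.md#0": ("Refunds are issued within 90 days of purchase.", [0.9, 0.1]),
    "shipping.md#2": ("Orders ship within 3 business days.", [0.2, 0.8]),
}
query, q_vec = "refund after 90 days", [1.0, 0.0]

ranked = sorted(
    ((cid, score_breakdown(query, text, q_vec, vec)) for cid, (text, vec) in chunks.items()),
    key=lambda item: item[1]["total"],
    reverse=True,
)
for cid, parts in ranked:
    print(f"{cid}: total={parts['total']:.3f} "
          f"semantic={parts['semantic']:.3f} lexical={parts['lexical']:.3f}")
```

In this toy data `refund_policy.md#0` wins on both components. A chunk that won on lexical overlap alone would instead point to keyword matching rather than meaning.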

Optional artifacts:

raglens explain ./docs --query "refund after 90 days" \
  --json-out artifacts/explain.json \
  --html-out artifacts/explain.html

simulate

Simulate retrieval over a query set.

raglens simulate ./docs --queries ./queries.txt

Outputs:

  • top-1 document frequency
  • low-similarity query count
  • no-match query count
  • dominant-document warning
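Each of the four simulate signals can be derived from the top-ranked hit of each query. The following is a minimal sketch that assumes per-query results are lists of `(doc_id, score)` pairs. The `low_sim` (0.3) and `dominance` (0.4) thresholds are hypothetical, not RAGLens defaults.

```python
from collections import Counter


def summarize(results, low_sim=0.3, dominance=0.4):
    """results maps query -> list of (doc_id, score), best first; [] means no match."""
    top1 = Counter()
    low_similarity = 0
    no_match = 0
    for query, hits in results.items():
        if not hits:
            no_match += 1
            continue
        doc, score = hits[0]
        top1[doc] += 1
        if score < low_sim:
            low_similarity += 1
    # Dominance is measured over queries that returned at least one hit.
    answered = sum(top1.values())
    warning = None
    if answered:
        doc, count = top1.most_common(1)[0]
        share = count / answered
        if share >= dominance:
            warning = f"{doc} dominates {share:.0%} of top-1 results"
    return {"top1_frequency": dict(top1), "low_similarity": low_similarity,
            "no_match": no_match, "dominant_warning": warning}


results = {
    "refund after 90 days": [("refund_policy.md", 0.82)],
    "return damaged item": [("refund_policy.md", 0.41)],
    "track my order": [("shipping.md", 0.25)],
    "gift card balance": [],
}
summary = summarize(results)
print(summary)
```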

fix

Rules-based diagnostic advisor.
It does not mutate files or auto-run agents.

raglens fix ./docs --queries ./queries.txt

Outputs:

  • detected issue
  • likely causes
  • first fix to try
  • rerun command

Example:

Issue: refund_policy.md dominates 48% of top-1 results

Likely causes:
- chunk size too large for mixed-topic content
- duplicate/repeated chunk language boosts one document

Try first: reduce chunk_size from 400 to 200
Then rerun: raglens simulate <docs> --queries queries.txt
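Because fix is rules-based, each suggestion comes from a condition on the simulate signals. The sketch below shows one hypothetical rule that reproduces the shape of the example report. The 40% threshold and the "halve chunk_size" step are assumptions, not RAGLens's actual rule table.

```python
def advise_dominance(doc, share, chunk_size, threshold=0.4):
    """Hypothetical rule: if one document wins too many top-1 slots, suggest smaller chunks."""
    if share < threshold:
        return None  # rule does not fire
    return "\n".join([
        f"Issue: {doc} dominates {share:.0%} of top-1 results",
        "",
        "Likely causes:",
        "- chunk size too large for mixed-topic content",
        "- duplicate/repeated chunk language boosts one document",
        "",
        f"Try first: reduce chunk_size from {chunk_size} to {chunk_size // 2}",
        "Then rerun: raglens simulate <docs> --queries queries.txt",
    ])


report = advise_dominance("refund_policy.md", 0.48, 400)
print(report)
```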

diff

Compare two run artifacts and explain why behavior changed.

raglens diff --baseline run_a.json --current run_b.json

Optional machine output:

raglens diff --baseline run_a.json --current run_b.json --format json

Outputs:

  • whether the answer changed
  • whether retrieval changed (added/removed/common docs)
  • likely root cause classification
  • confidence level
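The retrieval part of diff reduces to set operations on document ids. The following is a minimal sketch that assumes a hypothetical artifact schema: an `answer` string plus a `retrieved_docs` list of objects that carry an `id` field. The artifacts that save-run actually writes may use different fields, and the sketch does not reproduce diff's root-cause or confidence logic.

```python
def retrieval_delta(baseline, current):
    # Hypothetical schema: {"answer": str, "retrieved_docs": [{"id": str, ...}, ...]}
    base_ids = {d["id"] for d in baseline.get("retrieved_docs", [])}
    curr_ids = {d["id"] for d in current.get("retrieved_docs", [])}
    return {
        "answer_changed": baseline.get("answer") != current.get("answer"),
        "added": sorted(curr_ids - base_ids),
        "removed": sorted(base_ids - curr_ids),
        "common": sorted(base_ids & curr_ids),
    }


run_a = {"answer": "Revenue increased due to US growth",
         "retrieved_docs": [{"id": "q3_report.md"}, {"id": "us_sales.md"}]}
run_b = {"answer": "Revenue increased due to EU growth",
         "retrieved_docs": [{"id": "q3_report.md"}, {"id": "eu_sales.md"}]}
delta = retrieval_delta(run_a, run_b)
print(delta)
```

With real artifact files, load each one with `json.load` before comparing.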

Example artifacts:

raglens diff \
  --baseline inputs/examples/run_diff/robocall_run_a.json \
  --current inputs/examples/run_diff/robocall_run_b.json

More scenarios (equivalent, contradictory, score-shift-only, more-specific) are described in inputs/examples/run_diff/README.md.

save-run

Save a single run artifact (JSON) from your app's outputs so it can be diffed later.

raglens save-run \
  --out artifacts/runs/2026-04-13T10-20-00_run.json \
  --question "Why did revenue increase?" \
  --answer "Revenue increased due to US growth" \
  --retrieved-docs artifacts/runs/retrieved_docs.json \
  --model gpt-4.1 \
  --top-k 5

--retrieved-docs accepts:

  • a JSON array of retrieved docs
  • or an object containing a retrieved_docs array
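If your app already holds its retrieved documents in memory, you can write retrieved_docs.json in either accepted shape. The following sketch assumes per-document fields `id`, `score`, and `text`. These field names are hypothetical, so check which fields your RAGLens version reads. The sketch also defines a hypothetical `normalize_retrieved_docs` helper that mirrors the array-or-object rule above.

```python
import json
from pathlib import Path


def normalize_retrieved_docs(payload):
    """Accept a bare list of docs or an object with a 'retrieved_docs' list."""
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict) and isinstance(payload.get("retrieved_docs"), list):
        return payload["retrieved_docs"]
    raise ValueError("expected a JSON array or an object with a 'retrieved_docs' array")


# Hypothetical per-document fields; adjust to what your RAGLens version expects.
docs = [
    {"id": "q3_report.md", "score": 0.81, "text": "Revenue grew in the US segment."},
    {"id": "eu_sales.md", "score": 0.64, "text": "EU revenue was flat."},
]

out = Path("artifacts/runs")
out.mkdir(parents=True, exist_ok=True)
# Shape 1: a JSON array of retrieved docs.
(out / "retrieved_docs.json").write_text(json.dumps(docs, indent=2))
# Shape 2: an object containing a retrieved_docs array.
(out / "retrieved_docs_obj.json").write_text(json.dumps({"retrieved_docs": docs}, indent=2))
```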

mcp-import

Convert an MCP/agent trace JSON into a valid run artifact for diff, so you don't have to hand-write JSON that matches the artifact schema.

raglens mcp-import \
  --in ./trace.json \
  --out artifacts/runs/run_a.json

If your trace stores these fields at custom paths, point to them with JSON pointers:

raglens mcp-import \
  --in ./trace.json \
  --out artifacts/runs/run_a.json \
  --question-pointer /payload/q \
  --answer-pointer /payload/final \
  --docs-pointer /payload/ctx/hits
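Pointer values such as /payload/ctx/hits look like standard JSON Pointers (RFC 6901). Assuming mcp-import follows that syntax, the sketch below resolves pointers against a made-up trace so you can check your paths before importing.

```python
def resolve_pointer(doc, pointer):
    """Resolve an RFC 6901 JSON Pointer such as '/payload/ctx/hits'."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON pointer: {pointer!r}")
    node = doc
    for raw in pointer[1:].split("/"):
        # Unescape per RFC 6901: '~1' -> '/', then '~0' -> '~'.
        token = raw.replace("~1", "/").replace("~0", "~")
        # Array indices are decimal; the '-' (past-the-end) token is not handled here.
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


# Made-up trace shaped to match the example flags.
trace = {"payload": {"q": "Why did revenue increase?",
                     "final": "Revenue increased due to US growth",
                     "ctx": {"hits": [{"id": "q3_report.md"}]}}}
for flag, ptr in [("--question-pointer", "/payload/q"),
                  ("--answer-pointer", "/payload/final"),
                  ("--docs-pointer", "/payload/ctx/hits")]:
    print(flag, ptr, "->", resolve_pointer(trace, ptr))
```

A `KeyError` or `IndexError` here means the path does not exist in your trace.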

Then compare runs:

raglens diff --baseline artifacts/runs/run_a.json --current artifacts/runs/run_b.json

Inputs

Recommended MVP inputs:

  • docs: .md, .txt
  • queries: plain text, one query per line

Supported (advanced) query formats:

  • YAML with a queries: key
  • tab-separated: id<TAB>query<TAB>expect_doc1,expect_doc2
  • plain text query files can include blank lines and # comment lines (ignored)
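A line-based query file can be parsed in a few lines. The following is a hypothetical parser, not RAGLens's own, and it makes three assumptions:

  • any line containing a tab uses the TSV format
  • the expected-docs column is optional
  • ids for plain-text lines come from their line numbers

YAML is left out because it needs a third-party parser.

```python
def parse_query_lines(lines):
    """Parse plain-text or tab-separated query lines into dicts (hypothetical helper)."""
    queries = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank lines and '#' comment lines are ignored
        if "\t" in line:
            # id<TAB>query<TAB>expect_doc1,expect_doc2 (expect column optional here)
            parts = line.split("\t")
            qid, text = parts[0].strip(), parts[1].strip()
            expect = ([d.strip() for d in parts[2].split(",") if d.strip()]
                      if len(parts) > 2 else [])
        else:
            # Plain text: one query per line; the id is derived from the line number.
            qid, text, expect = f"line{lineno}", line.strip(), []
        queries.append({"id": qid, "query": text, "expect_docs": expect})
    return queries


sample = [
    "# refund questions\n",
    "refund after 90 days\n",
    "\n",
    "r2\treturn a damaged item\trefund_policy.md,returns.md\n",
]
parsed = parse_query_lines(sample)
for q in parsed:
    print(q)
```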

Deterministic by Default

  • default embedder: local deterministic null embedder
  • deterministic chunking and ranking pipeline
  • consistent outputs for same corpus + queries + config
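Determinism means the same text always maps to the same vector. RAGLens's null embedder is not described further here, so the following only illustrates the property using a different, hypothetical technique (feature hashing). It uses hashlib rather than Python's built-in `hash()`, because `hash()` is salted per process for strings and would break reproducibility across runs.

```python
import hashlib
import math


def hashed_embedding(text, dim=64):
    """Hypothetical deterministic embedder: feature hashing of lowercase tokens."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # sha256 is stable across processes and machines.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


query = "refund after 90 days"
print(hashed_embedding(query) == hashed_embedding(query))  # True
```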

Artifacts

All commands support --json-out. explain also supports --html-out.

You can also use --artifacts-dir to write standard report files.

Real-World Use

Run on your own corpus:

raglens simulate ./docs --queries ./queries.txt --artifacts-dir ./artifacts
raglens fix ./docs --queries ./queries.txt

If you want a simple wrapper:

scripts/run-audit.sh ./docs ./queries.txt ./artifacts

Use real web docs as input (optional):

scripts/import-web-docs.sh ./inputs/public_urls.txt ./inputs/docs_web
cargo run -- simulate ./inputs/docs_web --queries ./inputs/queries.txt

Notes:

  • imported files are saved as plain .txt with a Source: header
  • imported pages that are mostly one long line are still split safely (sentence/token-based) during chunking
  • keep only pages you are allowed to store/use in your environment
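For imported pages that arrive as one long line, a sentence-first splitter with a word-window fallback keeps chunks bounded. The following is a minimal sketch. The `max_words` limit and the regex sentence boundary are assumptions, not RAGLens's chunking rules.

```python
import re


def split_long_line(text, max_words=50):
    """Split one long line into chunks of at most max_words words, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sentence in sentences:
        words = sentence.split()
        # A single oversized sentence is cut into fixed word windows (token-based fallback).
        while len(words) > max_words:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        # Start a new chunk if this sentence would overflow the current one.
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


page = "Refunds take 5 days. " * 30  # 30 four-word sentences on one line
print([len(c.split()) for c in split_long_line(page)])
```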

Advanced / Experimental

RAGLens includes additional advanced commands for deeper workflows (comparison, optimization, etc.). They are intentionally hidden from default help to keep the MVP interface focused.

answer-audit is an experimental, deterministic answer checker that uses a CSV file as its truth layer:

raglens answer-audit \
  --data ./inputs/examples/answer_audit/sales.csv \
  --group-by region,channel \
  --metric revenue \
  --period-col period \
  --baseline old \
  --current new \
  --question "Why did revenue increase?" \
  --answer "Revenue increased due to EU growth"
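answer-audit checks an answer against the data itself. A natural first step is the per-group change between the baseline and current periods. The following is a minimal sketch that uses the column names from the example above and a small made-up CSV; the numbers are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def group_deltas(rows, group_by, metric, period_col, baseline, current):
    """Return {group_tuple: current - baseline} for the metric, summed per group."""
    totals = defaultdict(lambda: {baseline: 0.0, current: 0.0})
    for row in rows:
        period = row[period_col]
        if period in (baseline, current):
            key = tuple(row[col] for col in group_by)
            totals[key][period] += float(row[metric])
    return {key: vals[current] - vals[baseline] for key, vals in totals.items()}


# Made-up sample data using the example's column names.
SAMPLE = """region,channel,period,revenue
US,Direct,old,100
US,Direct,new,160
EU,Direct,old,80
EU,Direct,new,82
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
deltas = group_deltas(rows, ["region", "channel"], "revenue", "period", "old", "new")
total = sum(deltas.values())
for key, delta in sorted(deltas.items(), key=lambda kv: -kv[1]):
    share = delta / total if total else 0.0
    print(key, f"{delta:+.0f}", f"{share:.0%} of total change")
```

In this toy data US/Direct accounts for nearly all of the increase, so an answer that credits EU growth would deserve scrutiny.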

Quick start for an unfamiliar dataset (schema inferred automatically):

raglens answer-audit \
  --data ./my_data.csv \
  --auto \
  --answer "Revenue increased because EU grew"

--auto infers:

  • metric column
  • period column
  • baseline/current period values
  • group-by columns
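The exact --auto heuristics are not documented here. The following rough sketch shows how such inference could work, using these hypothetical rules:

  • a column whose name contains a hint such as "period" or "month" is the period column
  • the first remaining all-numeric column is the metric
  • the remaining non-numeric columns become group-by columns
  • the earliest and latest period values, sorted as strings (which works for ISO dates), become baseline and current

```python
import csv
import io

PERIOD_HINTS = ("period", "date", "month", "week", "quarter", "year")


def is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def infer_schema(rows):
    """Hypothetical --auto-style inference over csv.DictReader rows."""
    columns = list(rows[0].keys())
    period_col = next((c for c in columns if any(h in c.lower() for h in PERIOD_HINTS)), None)
    numeric = [c for c in columns if c != period_col and all(is_number(r[c]) for r in rows)]
    metric = numeric[0] if numeric else None
    group_by = [c for c in columns if c != period_col and c not in numeric]
    periods = sorted({r[period_col] for r in rows}) if period_col else []
    return {
        "metric": metric,
        "period_col": period_col,
        "baseline": periods[0] if periods else None,
        "current": periods[-1] if periods else None,
        "group_by": group_by,
    }


SAMPLE = """month,region,channel,revenue
2024-01,US,Direct,100
2024-02,US,Direct,160
2024-01,EU,Partner,80
2024-02,EU,Partner,82
"""
schema = infer_schema(list(csv.DictReader(io.StringIO(SAMPLE))))
print(schema)
```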

Optional period bucketing:

raglens answer-audit \
  --data ./my_data.csv \
  --auto \
  --period-granularity month \
  --answer "Revenue increased because EU grew"

  • --period-granularity raw|month|week (default: raw)
  • month and week require parseable, date-like period values
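Month and week bucketing can be sketched with the standard library, assuming ISO-8601 dates (YYYY-MM-DD); RAGLens may accept other date-like formats. Weeks here use ISO week numbering, so a date near New Year can belong to the previous ISO year.

```python
from datetime import date


def bucket_period(value, granularity="raw"):
    """Map a period value to a bucket label for raw, month, or week granularity."""
    if granularity == "raw":
        return value
    d = date.fromisoformat(value)  # raises ValueError for values that aren't ISO dates
    if granularity == "month":
        return f"{d.year:04d}-{d.month:02d}"
    if granularity == "week":
        iso_year, iso_week, _ = d.isocalendar()
        return f"{iso_year:04d}-W{iso_week:02d}"
    raise ValueError(f"unknown granularity: {granularity}")


for g in ("raw", "month", "week"):
    print(g, bucket_period("2024-03-15", g))
```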

More answer-audit examples:

# expected verdict: SUPPORTED
raglens answer-audit \
  --data ./inputs/examples/answer_audit/sales_supported.csv \
  --group-by region,channel \
  --metric revenue \
  --period-col period \
  --baseline old \
  --current new \
  --answer "Revenue increased due to strong US Direct growth"

# expected verdict: RISKY (mentions weak contributor)
raglens answer-audit \
  --data ./inputs/examples/answer_audit/sales_risky.csv \
  --group-by region,channel \
  --metric revenue \
  --period-col period \
  --baseline old \
  --current new \
  --answer "Revenue increased due to US Direct and LATAM growth"
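The examples above expect SUPPORTED and RISKY verdicts. One way to derive such verdicts from group contributions is to flag an answer as RISKY when it credits a group that carries only a small share of the total change. The following is a minimal sketch. The 20% `min_share` threshold, the word-based matching, and the contribution numbers are all hypothetical, because the contents of the example CSVs are not shown here.

```python
import re


def words_of(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def classify_answer(answer, contributions, min_share=0.2):
    """Hypothetical rule: RISKY if the answer credits a group with a small share of the change."""
    total = sum(contributions.values())
    answer_words = words_of(answer)
    # A group counts as mentioned if all words of its label appear in the answer.
    mentioned = [label for label in contributions if words_of(label) <= answer_words]
    if not mentioned or total <= 0:
        return None  # this sketch gives no verdict
    weak = [label for label in mentioned if contributions[label] / total < min_share]
    return "RISKY" if weak else "SUPPORTED"


# Made-up per-region contributions to a revenue increase.
contributions = {"US": 52.0, "EU": 6.0, "LATAM": 2.0}
print(classify_answer("Revenue increased due to strong US Direct growth", contributions))
print(classify_answer("Revenue increased due to US Direct and LATAM growth", contributions))
```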

Non-Goals

  • Full RAG framework
  • Answer quality evaluator
  • Hallucination detector
  • Autonomous tuning agent

License

MIT

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • raglens_cli-0.1.3-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.0 MB): Python 3, manylinux glibc 2.17+, x86-64
  • raglens_cli-0.1.3-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (6.6 MB): Python 3, manylinux glibc 2.17+, ARM64
  • raglens_cli-0.1.3-py3-none-macosx_11_0_arm64.whl (6.5 MB): Python 3, macOS 11.0+, ARM64
  • raglens_cli-0.1.3-py3-none-macosx_10_12_x86_64.whl (6.8 MB): Python 3, macOS 10.12+, x86-64

File details

Details for the file raglens_cli-0.1.3-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for raglens_cli-0.1.3-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 5d1560671715c78bfbbe96404324e90a4761977bb65e9c8852b7824128edd542
MD5 b8d8740327ed148a4cd35735afd4c9ec
BLAKE2b-256 2e0b98ca9a7fac4e0496182baf2b54cab5d7dfe3ebb5a93331e9b36c98b07f9a

Provenance

The following attestation bundles were made for raglens_cli-0.1.3-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on kraftaa/raglens

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Details for the file raglens_cli-0.1.3-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for raglens_cli-0.1.3-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 60f5441284a4a972e2aaf52a2fe8487eecadba857049bd8e3cd7b7d01d186ca4
MD5 98dd7929ef3d2b5b9873623700631ea7
BLAKE2b-256 f4c78bb40ad1b6d7f7958f32d34ba27b58e60f3d08ee6c6004e1147006c771c6

Provenance

The following attestation bundles were made for raglens_cli-0.1.3-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: release.yml on kraftaa/raglens

Details for the file raglens_cli-0.1.3-py3-none-macosx_11_0_arm64.whl.

File hashes

Hashes for raglens_cli-0.1.3-py3-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ff7aaa45190393f261dc86480faa142a4d926e15bc196fd189b06b9a593c70ed
MD5 85a7aa22879f561387acab4043789aa7
BLAKE2b-256 13c2e9ab69535d7a01060f7c84109bf4f51f0073af8687646e249adc664933d3

Provenance

The following attestation bundles were made for raglens_cli-0.1.3-py3-none-macosx_11_0_arm64.whl:

Publisher: release.yml on kraftaa/raglens

Details for the file raglens_cli-0.1.3-py3-none-macosx_10_12_x86_64.whl.

File hashes

Hashes for raglens_cli-0.1.3-py3-none-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 a259070b5e4faf30144bf9d08e13523e7fa1aea0aeed0a30e0a95c22e766d308
MD5 905eeec5366f8b9f69578348b5267091
BLAKE2b-256 d492af4a5cc26b485add5158588ae6bcf74374ed9e3b97364a36f824a264efca

Provenance

The following attestation bundles were made for raglens_cli-0.1.3-py3-none-macosx_10_12_x86_64.whl:

Publisher: release.yml on kraftaa/raglens
