rag-audit

CLI + library to audit and benchmark RAG pipelines. Detects hallucinations, measures retrieval quality, compares chunking strategies, and generates structured reports.

Installation

pip install rag-audit

Or with uv:

uv add rag-audit

Quickstart

1. Create a pipeline config file (pipeline.json):

{
  "pipeline_id": "my-pipeline",
  "question": "What is the capital of France?",
  "answer": "Paris is the capital of France.",
  "contexts": [
    "Paris is the capital and largest city of France.",
    "France is a country in Western Europe."
  ],
  "relevant": [
    "Paris is the capital and largest city of France."
  ],
  "k": 2,
  "llm": {
    "provider": "openai",
    "model": "gpt-4o-mini"
  }
}

2. Run the audit:

export OPENAI_API_KEY=sk-...
rag-audit run pipeline.json -o result.json

3. Generate a report:

# Markdown (default)
rag-audit report result.json

# JSON
rag-audit report result.json --format json

Config reference

Field          Type                      Description
pipeline_id    string                    Identifier for the pipeline being audited
question       string                    The question posed to the RAG pipeline
answer         string                    The answer generated by the pipeline
contexts       string[]                  Retrieved chunks, in rank order
relevant       string[]                  Ground-truth relevant chunks (for retrieval metrics)
k              int                       Number of top chunks to evaluate (default: 5)
llm.provider   "openai" | "anthropic"    LLM provider for the faithfulness judge
llm.model      string                    Model name (e.g. "gpt-4o-mini", "claude-3-5-haiku-20241022")
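
Before running an audit, it can help to sanity-check a config against the field list above. The following is a minimal sketch using only the standard library. `check_config` is a hypothetical helper, not part of rag-audit, and treating every field except `k` as required is an assumption, since the reference does not say which fields are optional.

```python
# Field names and types come from the config reference; the checker itself
# is a hypothetical helper and not part of the rag-audit API.
REQUIRED_FIELDS = {
    "pipeline_id": str,
    "question": str,
    "answer": str,
    "contexts": list,
    "relevant": list,
}
ALLOWED_PROVIDERS = {"openai", "anthropic"}
DEFAULT_K = 5  # documented default for "k"


def check_config(raw: dict) -> list[str]:
    """Return a list of problems found; an empty list means the config looks OK."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in raw:
            problems.append(f"missing field: {name}")
        elif not isinstance(raw[name], expected):
            problems.append(f"{name} should be of type {expected.__name__}")

    # bool is a subclass of int in Python, so reject it explicitly.
    k = raw.get("k", DEFAULT_K)
    if isinstance(k, bool) or not isinstance(k, int) or k < 1:
        problems.append("k should be a positive integer")

    llm = raw.get("llm")
    if not isinstance(llm, dict):
        problems.append("llm should be an object with provider and model")
    else:
        if llm.get("provider") not in ALLOWED_PROVIDERS:
            problems.append("llm.provider should be 'openai' or 'anthropic'")
        if not isinstance(llm.get("model"), str):
            problems.append("llm.model should be a string")
    return problems


quickstart = {
    "pipeline_id": "my-pipeline",
    "question": "What is the capital of France?",
    "answer": "Paris is the capital of France.",
    "contexts": ["Paris is the capital and largest city of France."],
    "relevant": ["Paris is the capital and largest city of France."],
    "k": 1,
    "llm": {"provider": "openai", "model": "gpt-4o-mini"},
}
print(check_config(quickstart))  # prints []
```

To check a file on disk, load it with `json.load` and pass the resulting dict to `check_config`.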

Metrics

Retrieval

Metric        Description
Precision@k   Fraction of the top-k retrieved chunks that are relevant
Recall@k      Fraction of all relevant chunks that appear in the top-k
MRR           Mean Reciprocal Rank: the reciprocal of the rank of the first relevant chunk (1.0 when it ranks first)
Faithfulness

Metric    Description
Score     Value from 0.0 to 1.0 indicating how well the answer is grounded in the retrieved contexts
Verdict   FAITHFUL if score ≥ threshold (default 0.5), otherwise HALLUCINATION
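
The verdict rule can be sketched in a few lines. `verdict` is a hypothetical function written only to show that the threshold comparison is inclusive. How rag-audit derives the score itself (the LLM judge) is not covered here.

```python
def verdict(score: float, threshold: float = 0.5) -> str:
    """Map a faithfulness score in [0.0, 1.0] to a verdict label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    # A score exactly equal to the threshold counts as faithful (score >= threshold).
    return "FAITHFUL" if score >= threshold else "HALLUCINATION"


print(verdict(0.5))   # prints FAITHFUL
print(verdict(0.49))  # prints HALLUCINATION
```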

Python API

Audit a pipeline

from rag_audit.core.config import PipelineConfig, LLMConfig
from rag_audit.core.runner import AuditRunner
from rag_audit.report.renderer import ReportRenderer

config = PipelineConfig(
    pipeline_id="my-pipeline",
    question="What is the capital of France?",
    answer="Paris is the capital of France.",
    contexts=["Paris is the capital and largest city of France."],
    relevant=["Paris is the capital and largest city of France."],
    k=1,
    llm=LLMConfig(provider="openai", model="gpt-4o-mini"),
)

report = AuditRunner(config).run()
print(ReportRenderer().to_markdown(report))

Compare chunking strategies

from langchain_openai import OpenAIEmbeddings
from rag_audit.chunker import ChunkingEvaluator, FixedSizeChunker, RecursiveChunker, SemanticChunker

embeddings = OpenAIEmbeddings()
evaluator = ChunkingEvaluator(embeddings)

report = evaluator.evaluate(
    "Your long document text here...",
    {
        "fixed": FixedSizeChunker(chunk_size=500, overlap=50),
        "recursive": RecursiveChunker(chunk_size=500),
        "semantic": SemanticChunker(embeddings, similarity_threshold=0.8),
    },
)

print(f"Best strategy: {report.best_strategy}")
for s in report.strategies:
    print(f"  {s.strategy}: avg_cohesion={s.avg_cohesion:.3f}, chunks={s.chunk_count}")
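
For readers unfamiliar with the `chunk_size` and `overlap` parameters, the sketch below shows one common way fixed-size chunking with overlap works. It is not the code behind `FixedSizeChunker`. It assumes sizes are counted in characters (the library may count tokens instead), and `fixed_size_chunks` is a hypothetical name.

```python
def fixed_size_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk made only of overlap is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


print(fixed_size_chunks("abcdefghij", chunk_size=4, overlap=1))
# prints ['abcd', 'defg', 'ghij']
```

Each chunk starts with the last `overlap` characters of the previous one, which keeps a sentence that straddles a boundary visible in both chunks.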

Use a vectorstore adapter

from rag_audit.adapters import ChromaDBAdapter

adapter = ChromaDBAdapter("my-collection")
adapter.add(ids=["doc1"], texts=["Paris is in France."], embeddings=[[...]])
results = adapter.query(embedding=[...], k=1)

Roadmap

  • CLI (rag-audit run, rag-audit report)
  • Hallucination detection (LLM-as-judge)
  • Retrieval metrics (Precision@k, Recall@k, MRR)
  • Structured audit reports (JSON + Markdown)
  • Chunking strategy benchmark (fixed-size vs recursive vs semantic)
  • Vectorstore adapters (ChromaDB — Pinecone and Qdrant coming soon)
  • Documentation (GitHub Pages)
  • PyPI release

Development

# Install dependencies
uv sync --group dev

# Run tests
uv run pytest

# Lint + format
uv run ruff check src/ tests/
uv run ruff format src/ tests/

# Type check
uv run mypy src/rag_audit

# Build docs locally
uv sync --group docs
uv run mkdocs serve
