rag-audit

CLI + library to audit and benchmark RAG pipelines. Detects hallucinations, measures retrieval quality, compares chunking strategies, and generates structured reports.

Documentation

Installation

pip install rag-audit

Or with uv:

uv add rag-audit

Quickstart

1. Create a pipeline config file (pipeline.json):

{
  "pipeline_id": "my-pipeline",
  "question": "What is the capital of France?",
  "answer": "Paris is the capital of France.",
  "contexts": [
    "Paris is the capital and largest city of France.",
    "France is a country in Western Europe."
  ],
  "relevant": [
    "Paris is the capital and largest city of France."
  ],
  "k": 2,
  "llm": {
    "provider": "openai",
    "model": "gpt-4o-mini"
  }
}

2. Run the audit:

export OPENAI_API_KEY=sk-...
rag-audit run pipeline.json -o result.json

3. Generate a report:

# Markdown (default)
rag-audit report result.json

# JSON
rag-audit report result.json --format json

Config reference

Field         Type                     Description
pipeline_id   string                   Identifier for the pipeline being audited
question      string                   The question posed to the RAG pipeline
answer        string                   The answer generated by the pipeline
contexts      string[]                 Retrieved chunks, in rank order
relevant      string[]                 Ground-truth relevant chunks (for retrieval metrics)
k             int                      Number of top chunks to evaluate (default: 5)
llm.provider  "openai" | "anthropic"   LLM provider for the faithfulness judge
llm.model     string                   Model name (e.g. "gpt-4o-mini", "claude-3-5-haiku-20241022")
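
To make the field list concrete, here is a minimal standard-library sketch that parses a config of this shape and applies the documented default of k = 5. The names `PipelineSettings`, `LLMSettings`, and `parse_pipeline_config` are hypothetical and are not part of rag-audit. Treating every field except k as required is also an assumption, since the reference only documents a default for k.

```python
import json
from dataclasses import dataclass

# Providers listed in the config reference.
ALLOWED_PROVIDERS = {"openai", "anthropic"}


@dataclass
class LLMSettings:  # hypothetical name, not rag-audit's LLMConfig
    provider: str
    model: str


@dataclass
class PipelineSettings:  # hypothetical name, not rag-audit's PipelineConfig
    pipeline_id: str
    question: str
    answer: str
    contexts: list[str]
    relevant: list[str]
    llm: LLMSettings
    k: int = 5  # documented default


def parse_pipeline_config(raw: str) -> PipelineSettings:
    """Parse a JSON string shaped like pipeline.json into typed settings."""
    data = json.loads(raw)
    llm = data["llm"]
    if llm["provider"] not in ALLOWED_PROVIDERS:
        raise ValueError(f"unsupported llm.provider: {llm['provider']!r}")
    return PipelineSettings(
        pipeline_id=data["pipeline_id"],
        question=data["question"],
        answer=data["answer"],
        contexts=list(data["contexts"]),
        relevant=list(data["relevant"]),
        llm=LLMSettings(provider=llm["provider"], model=llm["model"]),
        k=int(data.get("k", 5)),  # fall back to the default when k is omitted
    )


sample = json.dumps(
    {
        "pipeline_id": "my-pipeline",
        "question": "What is the capital of France?",
        "answer": "Paris is the capital of France.",
        "contexts": ["Paris is the capital and largest city of France."],
        "relevant": ["Paris is the capital and largest city of France."],
        "llm": {"provider": "openai", "model": "gpt-4o-mini"},
    }
)
settings = parse_pipeline_config(sample)
print(settings.pipeline_id, settings.k)
```

Because the sample omits k, the parsed settings fall back to 5.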

Metrics

Retrieval

Metric        Description
Precision@k   Fraction of the top-k retrieved chunks that are relevant
Recall@k      Fraction of all relevant chunks that appear in the top-k
MRR           Mean Reciprocal Rank: the reciprocal of the rank of the first relevant chunk (1.0 when it is ranked first)
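
The definitions above can be sketched in a few lines of plain Python. This follows the textbook formulas and is not rag-audit's own implementation. The function names are hypothetical. Matching chunks by exact string equality, dividing precision by k even when fewer than k chunks were retrieved, and searching the whole ranked list for the first relevant chunk are all assumptions. For a single question, MRR reduces to the reciprocal rank computed here.

```python
from typing import Sequence


def precision_at_k(retrieved: Sequence[str], relevant: Sequence[str], k: int) -> float:
    """Fraction of the top-k retrieved chunks that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant_set = set(relevant)
    hits = sum(chunk in relevant_set for chunk in retrieved[:k])
    return hits / k  # assumption: divide by k, not by the number retrieved


def recall_at_k(retrieved: Sequence[str], relevant: Sequence[str], k: int) -> float:
    """Fraction of all relevant chunks that appear in the top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(relevant_set & set(retrieved[:k])) / len(relevant_set)


def reciprocal_rank(retrieved: Sequence[str], relevant: Sequence[str]) -> float:
    """1 / rank of the first relevant chunk, or 0.0 if none is retrieved."""
    relevant_set = set(relevant)
    for rank, chunk in enumerate(retrieved, start=1):
        if chunk in relevant_set:
            return 1.0 / rank
    return 0.0


# Values taken from the Quickstart config.
contexts = [
    "Paris is the capital and largest city of France.",
    "France is a country in Western Europe.",
]
relevant = ["Paris is the capital and largest city of France."]

print(precision_at_k(contexts, relevant, k=2))  # 0.5: one of two chunks is relevant
print(recall_at_k(contexts, relevant, k=2))     # 1.0: the only relevant chunk is retrieved
print(reciprocal_rank(contexts, relevant))      # 1.0: the first chunk is relevant
```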

Faithfulness

Metric    Description
Score     Value from 0.0 to 1.0: how well the answer is grounded in the retrieved contexts
Verdict   FAITHFUL if score ≥ threshold (default 0.5), otherwise HALLUCINATION
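
The verdict rule is a single comparison. The sketch below uses a hypothetical helper name, `verdict`, and only illustrates the thresholding step. How the judge produces the score, and how the threshold is configured in rag-audit, is not covered here.

```python
def verdict(score: float, threshold: float = 0.5) -> str:
    """Map a faithfulness score in [0.0, 1.0] to a verdict label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    # A score exactly at the threshold counts as faithful (score >= threshold).
    return "FAITHFUL" if score >= threshold else "HALLUCINATION"


print(verdict(0.5))                  # FAITHFUL
print(verdict(0.49))                 # HALLUCINATION
print(verdict(0.7, threshold=0.8))   # HALLUCINATION
```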

Python API

Audit a pipeline

from rag_audit.core.config import PipelineConfig, LLMConfig
from rag_audit.core.runner import AuditRunner
from rag_audit.report.renderer import ReportRenderer

config = PipelineConfig(
    pipeline_id="my-pipeline",
    question="What is the capital of France?",
    answer="Paris is the capital of France.",
    contexts=["Paris is the capital and largest city of France."],
    relevant=["Paris is the capital and largest city of France."],
    k=1,
    llm=LLMConfig(provider="openai", model="gpt-4o-mini"),
)

report = AuditRunner(config).run()
print(ReportRenderer().to_markdown(report))

Compare chunking strategies

from langchain_openai import OpenAIEmbeddings
from rag_audit.chunker import ChunkingEvaluator, FixedSizeChunker, RecursiveChunker, SemanticChunker

embeddings = OpenAIEmbeddings()
evaluator = ChunkingEvaluator(embeddings)

report = evaluator.evaluate(
    "Your long document text here...",
    {
        "fixed": FixedSizeChunker(chunk_size=500, overlap=50),
        "recursive": RecursiveChunker(chunk_size=500),
        "semantic": SemanticChunker(embeddings, similarity_threshold=0.8),
    },
)

print(f"Best strategy: {report.best_strategy}")
for s in report.strategies:
    print(f"  {s.strategy}: avg_cohesion={s.avg_cohesion:.3f}, chunks={s.chunk_count}")
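
To show what the `chunk_size` and `overlap` parameters mean, here is a minimal character-based sketch of fixed-size chunking with overlap. The function `fixed_size_chunks` is hypothetical and is not how `FixedSizeChunker` is implemented. Measuring size in characters rather than tokens is also an assumption.

```python
def fixed_size_chunks(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap  # how far each window advances
    # Stop early enough that the last chunk is never made only of overlap.
    last_start = max(len(text) - overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


document = "".join(str(i % 10) for i in range(1200))
chunks = fixed_size_chunks(document, chunk_size=500, overlap=50)

print([len(c) for c in chunks])           # [500, 500, 300]
print(chunks[0][-50:] == chunks[1][:50])  # True: neighbouring chunks share 50 characters
```

With a step of 450 characters, a 1200-character document produces windows starting at 0, 450, and 900.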

Use a vectorstore adapter

from rag_audit.adapters import ChromaDBAdapter

adapter = ChromaDBAdapter("my-collection")
adapter.add(ids=["doc1"], texts=["Paris is in France."], embeddings=[[...]])
results = adapter.query(embedding=[...], k=1)
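
For readers who want to see what an add/query pair does conceptually, here is a toy in-memory stand-in that mirrors the call shape above. Everything in it is an assumption made for illustration: the class name `InMemoryStore`, the `(id, text, score)` return shape, and the use of cosine similarity for ranking. The distance metric ChromaDB actually uses depends on how the collection is configured, and the return type of `ChromaDBAdapter.query` is not documented here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryStore:  # hypothetical stand-in, not a rag-audit class
    def __init__(self) -> None:
        self._rows: list[tuple[str, str, list[float]]] = []

    def add(self, ids: Sequence[str], texts: Sequence[str],
            embeddings: Sequence[Sequence[float]]) -> None:
        """Store parallel lists of ids, texts, and embedding vectors."""
        if not (len(ids) == len(texts) == len(embeddings)):
            raise ValueError("ids, texts and embeddings must have the same length")
        for doc_id, text, vector in zip(ids, texts, embeddings):
            self._rows.append((doc_id, text, list(vector)))

    def query(self, embedding: Sequence[float], k: int = 1) -> list[tuple[str, str, float]]:
        """Return the k stored items most similar to the query embedding."""
        scored = [
            (doc_id, text, cosine_similarity(embedding, vector))
            for doc_id, text, vector in self._rows
        ]
        scored.sort(key=lambda row: row[2], reverse=True)  # highest similarity first
        return scored[:k]


store = InMemoryStore()
store.add(
    ids=["doc1", "doc2"],
    texts=["Paris is in France.", "Berlin is in Germany."],
    embeddings=[[1.0, 0.0], [0.0, 1.0]],  # toy 2-d vectors, not real embeddings
)
top = store.query(embedding=[0.9, 0.1], k=1)
print(top[0][0])  # doc1
```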

Roadmap

  • CLI (rag-audit run, rag-audit report)
  • Hallucination detection (LLM-as-judge)
  • Retrieval metrics (Precision@k, Recall@k, MRR)
  • Structured audit reports (JSON + Markdown)
  • Chunking strategy benchmark (fixed-size vs recursive vs semantic)
  • Vectorstore adapters (ChromaDB — Pinecone and Qdrant coming soon)
  • Documentation (GitHub Pages)
  • PyPI release

Development

# Install dependencies
uv sync --group dev

# Run tests
uv run pytest

# Lint + format
uv run ruff check src/ tests/
uv run ruff format src/ tests/

# Type check
uv run mypy src/rag_audit

# Build docs locally
uv sync --group docs
uv run mkdocs serve
