CLI + library to audit and benchmark RAG pipelines

rag-audit

CLI + library to audit and benchmark RAG pipelines. Detects hallucinations, measures retrieval quality, compares chunking strategies, and generates structured reports.

Installation

pip install rag-audit

Or with uv:

uv add rag-audit

Quickstart

1. Create a pipeline config file (pipeline.json):

{
  "pipeline_id": "my-pipeline",
  "question": "What is the capital of France?",
  "answer": "Paris is the capital of France.",
  "contexts": [
    "Paris is the capital and largest city of France.",
    "France is a country in Western Europe."
  ],
  "relevant": [
    "Paris is the capital and largest city of France."
  ],
  "k": 2,
  "llm": {
    "provider": "openai",
    "model": "gpt-4o-mini"
  }
}

2. Run the audit:

export OPENAI_API_KEY=sk-...
rag-audit run pipeline.json -o result.json

3. Generate a report:

# Markdown (default)
rag-audit report result.json

# JSON
rag-audit report result.json --format json

Config reference

| Field | Type | Description |
| --- | --- | --- |
| `pipeline_id` | `string` | Identifier for the pipeline being audited |
| `question` | `string` | The question posed to the RAG pipeline |
| `answer` | `string` | The answer generated by the pipeline |
| `contexts` | `string[]` | Retrieved chunks, in rank order |
| `relevant` | `string[]` | Ground-truth relevant chunks (for retrieval metrics) |
| `k` | `int` | Number of top chunks to evaluate (default: `5`) |
| `llm.provider` | `"openai"` \| `"anthropic"` | LLM provider for the faithfulness judge |
| `llm.model` | `string` | Model name (e.g. `"gpt-4o-mini"`, `"claude-3-5-haiku-20241022"`) |
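To make the field rules above concrete, here is a minimal sketch of loading and checking a config before handing it to the tool. It is not part of rag-audit: `PipelineSettings`, `LLMSettings`, and `parse_pipeline_config` are hypothetical names, and the checks only mirror what the table states (the two providers and the default `k` of `5`).

```python
import json
from dataclasses import dataclass

# Providers listed in the config reference table.
ALLOWED_PROVIDERS = {"openai", "anthropic"}


@dataclass
class LLMSettings:  # hypothetical helper, not the library's LLMConfig
    provider: str
    model: str


@dataclass
class PipelineSettings:  # hypothetical helper, not the library's PipelineConfig
    pipeline_id: str
    question: str
    answer: str
    contexts: list[str]
    relevant: list[str]
    llm: LLMSettings
    k: int = 5  # default taken from the table


def parse_pipeline_config(raw: str) -> PipelineSettings:
    """Parse a pipeline.json string and apply the documented defaults."""
    data = json.loads(raw)
    llm = data["llm"]
    if llm["provider"] not in ALLOWED_PROVIDERS:
        raise ValueError(f"unsupported llm.provider: {llm['provider']!r}")
    k = data.get("k", 5)
    if not isinstance(k, int) or k < 1:
        raise ValueError("k must be a positive integer")
    return PipelineSettings(
        pipeline_id=data["pipeline_id"],
        question=data["question"],
        answer=data["answer"],
        contexts=list(data["contexts"]),
        relevant=list(data["relevant"]),
        llm=LLMSettings(provider=llm["provider"], model=llm["model"]),
        k=k,
    )
```

Checking the file this way before calling `rag-audit run` surfaces a missing field or a misspelled provider without spending an LLM call.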

Metrics

Retrieval

| Metric | Description |
| --- | --- |
| Precision@k | Fraction of the top-k retrieved chunks that are relevant |
| Recall@k | Fraction of all relevant chunks that appear in the top-k |
| MRR | Mean Reciprocal Rank: the reciprocal of the rank at which the first relevant chunk appears (higher is better) |
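The definitions above can be sketched in a few lines of plain Python. This is an illustration of the standard formulas, not rag-audit's own implementation; the function names are hypothetical, chunks are matched by exact string equality, and Precision@k divides by the number of chunks actually inspected, which is an assumption for the case where fewer than `k` chunks were retrieved.

```python
def precision_at_k(retrieved: list[str], relevant: list[str], k: int) -> float:
    """Fraction of the top-k retrieved chunks that are relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    relevant_set = set(relevant)
    return sum(1 for chunk in top_k if chunk in relevant_set) / len(top_k)


def recall_at_k(retrieved: list[str], relevant: list[str], k: int) -> float:
    """Fraction of all relevant chunks that appear in the top-k."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(relevant_set & set(retrieved[:k])) / len(relevant_set)


def reciprocal_rank(retrieved: list[str], relevant: list[str]) -> float:
    """1 / rank of the first relevant chunk, or 0.0 if none is retrieved."""
    relevant_set = set(relevant)
    for rank, chunk in enumerate(retrieved, start=1):
        if chunk in relevant_set:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries: list[tuple[list[str], list[str]]]) -> float:
    """Average reciprocal rank over (retrieved, relevant) pairs."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(r, g) for r, g in queries) / len(queries)


# The Quickstart example: two retrieved chunks, one of them relevant, k = 2.
contexts = [
    "Paris is the capital and largest city of France.",
    "France is a country in Western Europe.",
]
relevant = ["Paris is the capital and largest city of France."]

print(precision_at_k(contexts, relevant, k=2))  # 0.5
print(recall_at_k(contexts, relevant, k=2))     # 1.0
print(reciprocal_rank(contexts, relevant))      # 1.0
```

With a single question per config, MRR reduces to the reciprocal rank of that one query.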

Faithfulness

| Metric | Description |
| --- | --- |
| Score | Value in the range 0.0–1.0 indicating how well the answer is grounded in the retrieved contexts |
| Verdict | `FAITHFUL` if score ≥ threshold (default `0.5`), otherwise `HALLUCINATION` |
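The verdict rule is a plain threshold comparison. A minimal sketch, assuming the score has already been produced by the LLM judge; `faithfulness_verdict` is a hypothetical name, not a function exported by rag-audit:

```python
def faithfulness_verdict(score: float, threshold: float = 0.5) -> str:
    """Map a faithfulness score in [0.0, 1.0] to a verdict label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    # A score exactly at the threshold counts as faithful (score >= threshold).
    return "FAITHFUL" if score >= threshold else "HALLUCINATION"


print(faithfulness_verdict(0.5))   # FAITHFUL
print(faithfulness_verdict(0.49))  # HALLUCINATION
```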

Python API

Audit a pipeline

from rag_audit.core.config import PipelineConfig, LLMConfig
from rag_audit.core.runner import AuditRunner
from rag_audit.report.renderer import ReportRenderer

config = PipelineConfig(
    pipeline_id="my-pipeline",
    question="What is the capital of France?",
    answer="Paris is the capital of France.",
    contexts=["Paris is the capital and largest city of France."],
    relevant=["Paris is the capital and largest city of France."],
    k=1,
    llm=LLMConfig(provider="openai", model="gpt-4o-mini"),
)

report = AuditRunner(config).run()
print(ReportRenderer().to_markdown(report))

Compare chunking strategies

from langchain_openai import OpenAIEmbeddings
from rag_audit.chunker import ChunkingEvaluator, FixedSizeChunker, RecursiveChunker, SemanticChunker

embeddings = OpenAIEmbeddings()
evaluator = ChunkingEvaluator(embeddings)

report = evaluator.evaluate(
    "Your long document text here...",
    {
        "fixed": FixedSizeChunker(chunk_size=500, overlap=50),
        "recursive": RecursiveChunker(chunk_size=500),
        "semantic": SemanticChunker(embeddings, similarity_threshold=0.8),
    },
)

print(f"Best strategy: {report.best_strategy}")
for s in report.strategies:
    print(f"  {s.strategy}: avg_cohesion={s.avg_cohesion:.3f}, chunks={s.chunk_count}")
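For intuition about what the `chunk_size` and `overlap` parameters control, here is a standalone sketch of fixed-size chunking with overlap. It is not the code behind `FixedSizeChunker`: the function name is hypothetical, and sizes are counted in characters, which is an assumption (the library may count tokens or something else).

```python
def fixed_size_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(fixed_size_chunks("abcdefghij", chunk_size=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

A larger overlap keeps more shared context between neighbouring chunks at the cost of producing more chunks for the same document.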

Use a vectorstore adapter

from rag_audit.adapters import ChromaDBAdapter

adapter = ChromaDBAdapter("my-collection")
adapter.add(ids=["doc1"], texts=["Paris is in France."], embeddings=[[...]])
results = adapter.query(embedding=[...], k=1)

Roadmap

CLI (rag-audit run, rag-audit report)
Hallucination detection (LLM-as-judge)
Retrieval metrics (Precision@k, Recall@k, MRR)
Structured audit reports (JSON + Markdown)
Chunking strategy benchmark (fixed-size vs recursive vs semantic)
Vectorstore adapters (ChromaDB — Pinecone and Qdrant coming soon)
Documentation (GitHub Pages)
PyPI release

Development

# Install dependencies
uv sync --group dev

# Run tests
uv run pytest

# Lint + format
uv run ruff check src/ tests/
uv run ruff format src/ tests/

# Type check
uv run mypy src/rag_audit

# Build docs locally
uv sync --group docs
uv run mkdocs serve

Release history

0.1.1 (this version): Jun 5, 2026
0.1.0: Jun 5, 2026

Download files

Source distribution: rag_audit-0.1.1.tar.gz (252.9 kB), uploaded Jun 5, 2026
Built distribution: rag_audit-0.1.1-py3-none-any.whl (15.5 kB), uploaded Jun 5, 2026, Python 3

File details

Details for the file rag_audit-0.1.1.tar.gz.

File metadata

Download URL: rag_audit-0.1.1.tar.gz
Upload date: Jun 5, 2026
Size: 252.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.19 {"installer":{"name":"uv","version":"0.11.19","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for rag_audit-0.1.1.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `0bbafd16cc8b783f3f8282d6d4ab9d16a827322167be70b1691f341db422afde` |
| MD5 | `34ba9bd0a9979ad1d2c4b4be7fecf579` |
| BLAKE2b-256 | `1ad015fa03cdd56db0a418999c6b1b61be679b1fab54ed49fe44862f614ce771` |

File details

Details for the file rag_audit-0.1.1-py3-none-any.whl.

File metadata

Download URL: rag_audit-0.1.1-py3-none-any.whl
Upload date: Jun 5, 2026
Size: 15.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.19 {"installer":{"name":"uv","version":"0.11.19","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for rag_audit-0.1.1-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `6b164fa070d0ef3aac0eb4826d55ff7b3c4fb51dc623e8ce0636186c399da098` |
| MD5 | `40059e9821f7a9ddf5c5744c332f4e99` |
| BLAKE2b-256 | `4614bf383f4fed08d68c23feaa3d91a6ff33763c69741c69ec650aba6ba6bf70` |
