rag-audit
CLI + library to audit and benchmark RAG pipelines. Detects hallucinations, measures retrieval quality, compares chunking strategies, and generates structured reports.
Installation
pip install rag-audit
Or with uv:
uv add rag-audit
Quickstart
1. Create a pipeline config file (pipeline.json):
{
  "pipeline_id": "my-pipeline",
  "question": "What is the capital of France?",
  "answer": "Paris is the capital of France.",
  "contexts": [
    "Paris is the capital and largest city of France.",
    "France is a country in Western Europe."
  ],
  "relevant": [
    "Paris is the capital and largest city of France."
  ],
  "k": 2,
  "llm": {
    "provider": "openai",
    "model": "gpt-4o-mini"
  }
}
2. Run the audit:
export OPENAI_API_KEY=sk-...
rag-audit run pipeline.json -o result.json
3. Generate a report:
# Markdown (default)
rag-audit report result.json
# JSON
rag-audit report result.json --format json
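When the question, answer, and retrieved chunks come out of a running pipeline, the config file can be generated instead of written by hand. The following is a minimal sketch using only the standard library; `write_pipeline_config` is a hypothetical helper (not part of rag-audit), and the field names are taken from the example config in step 1.

```python
import json
from pathlib import Path


def write_pipeline_config(path, question, answer, contexts, relevant, k=None):
    """Write a rag-audit style config file and return the dict that was written.

    Hypothetical helper: the keys mirror the pipeline.json example above.
    """
    config = {
        "pipeline_id": "my-pipeline",
        "question": question,
        "answer": answer,
        "contexts": list(contexts),  # retrieved chunks, in rank order
        "relevant": list(relevant),  # ground-truth relevant chunks
        # Assumption: evaluate every retrieved chunk unless k is given.
        "k": k if k is not None else len(contexts),
        "llm": {"provider": "openai", "model": "gpt-4o-mini"},
    }
    Path(path).write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Calling `write_pipeline_config("pipeline.json", question, answer, contexts, relevant)` produces a file that can then be passed to `rag-audit run`.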
Config reference
| Field | Type | Description |
|---|---|---|
| `pipeline_id` | `string` | Identifier for the pipeline being audited |
| `question` | `string` | The question posed to the RAG pipeline |
| `answer` | `string` | The answer generated by the pipeline |
| `contexts` | `string[]` | Retrieved chunks, in rank order |
| `relevant` | `string[]` | Ground-truth relevant chunks (for retrieval metrics) |
| `k` | `int` | Number of top chunks to evaluate (default: 5) |
| `llm.provider` | `"openai" \| "anthropic"` | LLM provider for the faithfulness judge |
| `llm.model` | `string` | Model name (e.g. "gpt-4o-mini", "claude-3-5-haiku-20241022") |
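To catch typos before spending an LLM call, a config can be sanity-checked against the table above. This is only a sketch under stated assumptions: `check_config` is a hypothetical function, not rag-audit's own validation, and because the table does not say which fields are required, it checks types only for fields that are present and fills in the documented default for `k`.

```python
ALLOWED_PROVIDERS = {"openai", "anthropic"}  # values listed in the table above


def check_config(raw):
    """Return a copy of `raw` with k defaulted to 5, raising on type mismatches."""
    config = dict(raw)
    config.setdefault("k", 5)  # documented default

    for field in ("pipeline_id", "question", "answer"):
        if field in config and not isinstance(config[field], str):
            raise TypeError(f"{field} must be a string")

    for field in ("contexts", "relevant"):
        value = config.get(field, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise TypeError(f"{field} must be a list of strings")

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(config["k"], bool) or not isinstance(config["k"], int):
        raise TypeError("k must be an int")

    provider = config.get("llm", {}).get("provider")
    if provider is not None and provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"unsupported llm.provider: {provider!r}")

    return config
```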
Metrics
Retrieval
| Metric | Description |
|---|---|
| Precision@k | Fraction of the top-k retrieved chunks that are relevant |
| Recall@k | Fraction of all relevant chunks that appear in the top-k |
| MRR | Mean Reciprocal Rank — how high the first relevant chunk ranks |
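The three retrieval metrics can be sketched in a few lines of plain Python. This follows the textbook definitions and is not necessarily how rag-audit computes them: matching chunks by exact string equality and dividing precision by the number of chunks actually retrieved (rather than by `k`) are both assumptions. For a single question, MRR reduces to the reciprocal rank shown here.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved chunks that are relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    relevant_set = set(relevant)
    return sum(1 for chunk in top_k if chunk in relevant_set) / len(top_k)


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant chunks that appear in the top-k."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(relevant_set & set(retrieved[:k])) / len(relevant_set)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant chunk, or 0.0 if none was retrieved."""
    relevant_set = set(relevant)
    for rank, chunk in enumerate(retrieved, start=1):
        if chunk in relevant_set:
            return 1.0 / rank
    return 0.0


contexts = [
    "Paris is the capital and largest city of France.",
    "France is a country in Western Europe.",
]
relevant = ["Paris is the capital and largest city of France."]

print(precision_at_k(contexts, relevant, k=2))  # 0.5
print(recall_at_k(contexts, relevant, k=2))     # 1.0
print(reciprocal_rank(contexts, relevant))      # 1.0
```

With the Quickstart data, one of the two retrieved chunks is relevant, the only relevant chunk is retrieved, and it sits at rank 1.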
Faithfulness
| Metric | Description |
|---|---|
| Score | 0.0–1.0 — how well the answer is grounded in the retrieved contexts |
| Verdict | FAITHFUL if score ≥ threshold (default 0.5), otherwise HALLUCINATION |
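The verdict rule in the table is a simple threshold on the score. A minimal sketch, assuming the score has already been produced by the LLM judge; `faithfulness_verdict` is a hypothetical name, not the library's API:

```python
def faithfulness_verdict(score, threshold=0.5):
    """Map a 0.0-1.0 faithfulness score to the verdict described above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    # The comparison is inclusive: a score equal to the threshold is FAITHFUL.
    return "FAITHFUL" if score >= threshold else "HALLUCINATION"
```

Note that a score exactly at the threshold counts as FAITHFUL, since the rule is "score ≥ threshold".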
Python API
Audit a pipeline
from rag_audit.core.config import PipelineConfig, LLMConfig
from rag_audit.core.runner import AuditRunner
from rag_audit.report.renderer import ReportRenderer
config = PipelineConfig(
    pipeline_id="my-pipeline",
    question="What is the capital of France?",
    answer="Paris is the capital of France.",
    contexts=["Paris is the capital and largest city of France."],
    relevant=["Paris is the capital and largest city of France."],
    k=1,
    llm=LLMConfig(provider="openai", model="gpt-4o-mini"),
)
report = AuditRunner(config).run()
print(ReportRenderer().to_markdown(report))
Compare chunking strategies
from langchain_openai import OpenAIEmbeddings
from rag_audit.chunker import ChunkingEvaluator, FixedSizeChunker, RecursiveChunker, SemanticChunker
embeddings = OpenAIEmbeddings()
evaluator = ChunkingEvaluator(embeddings)
report = evaluator.evaluate(
    "Your long document text here...",
    {
        "fixed": FixedSizeChunker(chunk_size=500, overlap=50),
        "recursive": RecursiveChunker(chunk_size=500),
        "semantic": SemanticChunker(embeddings, similarity_threshold=0.8),
    },
)
print(f"Best strategy: {report.best_strategy}")
for s in report.strategies:
    print(f" {s.strategy}: avg_cohesion={s.avg_cohesion:.3f}, chunks={s.chunk_count}")
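To make the `chunk_size` and `overlap` parameters concrete, here is a sketch of fixed-size chunking with overlap. It is an illustration of the general technique, not the source of `FixedSizeChunker`: splitting on characters (rather than tokens) and the function name `fixed_size_chunks` are assumptions.

```python
def fixed_size_chunks(text, chunk_size=500, overlap=50):
    """Split text into chunks of up to chunk_size characters.

    Consecutive chunks share `overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks


print(fixed_size_chunks("abcdefghij", chunk_size=5, overlap=2))
# ['abcde', 'defgh', 'ghij']
```

Each chunk starts `chunk_size - overlap` characters after the previous one, so text that falls on a boundary appears in both neighbouring chunks.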
Use a vectorstore adapter
from rag_audit.adapters import ChromaDBAdapter
adapter = ChromaDBAdapter("my-collection")
adapter.add(ids=["doc1"], texts=["Paris is in France."], embeddings=[[...]])
results = adapter.query(embedding=[...], k=1)
Roadmap
- CLI (`rag-audit run`, `rag-audit report`)
- Hallucination detection (LLM-as-judge)
- Retrieval metrics (Precision@k, Recall@k, MRR)
- Structured audit reports (JSON + Markdown)
- Chunking strategy benchmark (fixed-size vs recursive vs semantic)
- Vectorstore adapters (ChromaDB — Pinecone and Qdrant coming soon)
- Documentation (GitHub Pages)
- PyPI release
Development
# Install dependencies
uv sync --group dev
# Run tests
uv run pytest
# Lint + format
uv run ruff check src/ tests/
uv run ruff format src/ tests/
# Type check
uv run mypy src/rag_audit
# Build docs locally
uv sync --group docs
uv run mkdocs serve