Document-grounded Q&A with sentence-level citations and faithfulness verification

Maintainers

firish

verifiable-rag

Document-grounded Q&A with sentence-level citations, NLI verification, and calibrated refusal.

Status: pre-alpha · v0.5 launch sprint · interfaces are still subject to change

📚 Full documentation at firish.github.io/rag-rack — quickstart, concept guides, how-to recipes, API reference, benchmark reports.

What this is

A Python library for building RAG pipelines that:

Produce sentence-level citations — every generated sentence traces back to exact source spans (doc_id, page, char_start, char_end).
Verify every claim via NLI against its cited span before returning it.
Refuse when uncertain — calibrated abstention with a user-tunable strictness slider, not a "say I don't know" prompt.
Are fully auditable — inspect retrieval scores, reranker decisions, per-claim NLI results, and a self-contained HTML report per query.
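
To make the citation tuple concrete, here is a minimal sketch of how a span such as (doc_id, page, char_start, char_end) can be resolved back to source text. `SourceSpan` and `resolve` are hypothetical names used for illustration, not the library's classes, and the sketch assumes offsets are relative to the page text with an exclusive end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceSpan:
    """Hypothetical stand-in for a citation target; not the library's class."""
    doc_id: str
    page: int
    char_start: int  # assumed inclusive, relative to the page text
    char_end: int    # assumed exclusive


def resolve(span: SourceSpan, pages: dict[tuple[str, int], str]) -> str:
    """Return the exact text a span points at, validating the offsets first."""
    text = pages[(span.doc_id, span.page)]
    if not 0 <= span.char_start < span.char_end <= len(text):
        raise ValueError(f"span out of range: {span}")
    return text[span.char_start:span.char_end]


pages = {("paper1", 3): "Penicillin inhibits cell wall synthesis. It was discovered in 1928."}
span = SourceSpan("paper1", 3, 0, 40)
print(resolve(span, pages))
```

Storing offsets rather than copied text means a citation can always be checked against the original document, which is what makes the audit trail verifiable.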

One benchmark result drives the design: on RAGTruth (the canonical 2,700-example RAG hallucination benchmark), a dual NLI ensemble of two small open-source models (HHEM-2.1-open + MiniCheck-Flan-T5-Large) comes within 0.002 AUROC of Claude Sonnet 4.6 used as a judge (0.844 vs 0.846) at ~250× lower per-call cost. The full result is in benchmarks/PUBLISHED_ragtruth.md.
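
For readers unfamiliar with the metric, the sketch below shows how AUROC can be computed for a single verifier and for an ensemble of two. It runs on toy data, not on RAGTruth, and averaging the two scores is an assumed combination rule; how the library actually combines HHEM and MiniCheck is described in the benchmark report, not here.

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ensemble(scores_a: list[float], scores_b: list[float]) -> list[float]:
    """Average two verifiers' support scores (assumed combination rule)."""
    return [(a + b) / 2 for a, b in zip(scores_a, scores_b)]


# Toy data: label 1 = claim supported by its source, 0 = hallucinated.
labels = [1, 1, 1, 0, 0]
model_a = [0.9, 0.4, 0.8, 0.5, 0.1]
model_b = [0.7, 0.8, 0.3, 0.2, 0.5]

combined = ensemble(model_a, model_b)
print(auroc(labels, model_a), auroc(labels, model_b), auroc(labels, combined))
```

On this toy data each model misranks one pair on its own, while the average ranks every supported claim above every hallucinated one. That is the intuition behind pairing two verifiers whose errors do not coincide.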

Quickstart

The bundled demo document ships with the package. No setup required beyond an LLM API key:

```python
import verifiable_rag
from verifiable_rag.demo import sample_paper_path

answer = verifiable_rag.ask(
    "What is the mechanism of action of penicillin?",
    docs=sample_paper_path(),
)
print(answer.text)
```

Or as a one-liner from the shell:

```shell
export ANTHROPIC_API_KEY=...
python -c "import verifiable_rag; from verifiable_rag.demo import sample_paper_path; \
           print(verifiable_rag.ask('Who discovered penicillin?', docs=sample_paper_path()).text)"
```

For an actual production setup, point docs= at your own PDFs and pick a preset:

```python
import verifiable_rag

answer = verifiable_rag.ask(
    "What did the authors find?",
    docs=["paper1.pdf", "paper2.pdf"],
    preset="hybrid_balanced",                  # RECOMMENDED: Cohere + Dual NLI + Haiku
    output_html="audit.html",                   # optional: write the HTML audit report
)
print(answer.text)

# Programmatic access to the audit trail:
for sentence in answer.unsupported_sentences:  # sentences the verifier flagged
    print(f"⚠ unsupported: {sentence.text}")

# Or emit a structured audit dump for logging / metrics.
# metrics_client is a placeholder for your own logging or metrics client.
metrics_client.emit(answer.audit_trail())
```
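
Since `metrics_client` above is a placeholder, here is a minimal standard-library sketch of one way to persist the audit dump as JSON lines. It assumes only what the README states, namely that `audit_trail()` returns a JSON-serializable dict. `emit_audit` is a hypothetical helper, and the keys in `fake_audit` are made up for the demo.

```python
import io
import json
import time
from typing import Any, TextIO


def emit_audit(audit: dict[str, Any], stream: TextIO) -> None:
    """Append one audit record as a single JSON line, stamped with the write time."""
    record = {"logged_at": time.time(), "audit": audit}
    stream.write(json.dumps(record, ensure_ascii=False) + "\n")


# Stand-in for answer.audit_trail(); the keys below are invented for illustration.
fake_audit = {"question": "Who discovered penicillin?", "refused": False}

buffer = io.StringIO()  # in production this could be an open log file
emit_audit(fake_audit, buffer)
print(buffer.getvalue().strip())
```

One record per line keeps the log appendable and easy to load into a dataframe or a log aggregator later.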

See examples/ for runnable demos covering the headline UX patterns. The full quickstart walks through each step in detail.

Presets

Five named presets cover most use cases. Switch via preset="..." or call the factories directly:

Preset	Components	Required keys	When to use
`local_minimal`	BGE + PyMuPDF + Haiku, no verifier	`ANTHROPIC_API_KEY`	Hobbyist / quickest start
`local_verified`	`local_minimal` + BGE rerank + HHEM NLI	`ANTHROPIC_API_KEY`	Local with verification
`hybrid_balanced`	Docling + Cohere + Dual NLI + constrained Haiku	`ANTHROPIC_API_KEY` + `COHERE_API_KEY`	Default; the published baseline
`hybrid_strict`	Same as `hybrid_balanced`, refuse below faithfulness 0.7	Same as `hybrid_balanced`	Higher-trust use cases
`hybrid_paranoid`	Sonnet generator, refuse below faithfulness 0.9	Same as `hybrid_balanced`	Compliance / high-trust
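
The refusal thresholds in the table can be read as a simple gate on a faithfulness score. The sketch below is an illustration under stated assumptions, not the library's implementation: it assumes faithfulness is the minimum per-sentence NLI score (the library's actual aggregate may differ), and `faithfulness` and `should_refuse` are hypothetical names.

```python
# Thresholds taken from the preset table; no threshold is listed for
# hybrid_balanced, so it is omitted here.
REFUSAL_THRESHOLDS = {"hybrid_strict": 0.7, "hybrid_paranoid": 0.9}


def faithfulness(sentence_scores: list[float]) -> float:
    """Assumed aggregate: an answer is only as faithful as its weakest sentence."""
    if not sentence_scores:
        return 0.0  # nothing was verified, so treat the answer as unsupported
    return min(sentence_scores)


def should_refuse(sentence_scores: list[float], preset: str) -> bool:
    """Refuse when the aggregate score falls below the preset's threshold."""
    return faithfulness(sentence_scores) < REFUSAL_THRESHOLDS[preset]


scores = [0.95, 0.82, 0.91]
print(should_refuse(scores, "hybrid_strict"))    # False: 0.82 >= 0.7
print(should_refuse(scores, "hybrid_paranoid"))  # True: 0.82 < 0.9
```

The same answer passes under one preset and is refused under another, which is the practical meaning of a tunable strictness setting.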

For mix-and-match outside the presets, use verifiable_rag.build_pipeline(...) or load a YAML config (see examples/pipeline.yaml, Pipeline.from_yaml(), and the YAML config guide).

Architecture

```
PDF/DOCX → Parser → Document model → Chunker → Indexer
                                                  ↓
Answer ← Abstention ← Verifier ← Generator ← Retriever + Reranker
```

Every step preserves character-level spans. Every generated sentence carries (supporting_sentence_ids, confidence) linked to exact source locations. Citation granularity is decoupled from chunk granularity by design.
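
One way to picture the decoupling of citation granularity from chunk granularity: a chunk carries its start offset in the document, and sentences inside it are located by adding their local offsets to that start. The sketch below uses a deliberately naive regex splitter, which will mis-split abbreviations such as "e.g.". `sentence_spans` is a hypothetical helper, not part of the library.

```python
import re

# Naive rule for illustration: a sentence ends at ., ! or ? followed by
# whitespace, or at the end of the text.
_SENTENCE = re.compile(r"\S.*?(?:[.!?](?=\s)|$)", re.DOTALL)


def sentence_spans(chunk_text: str, chunk_start: int) -> list[tuple[int, int, str]]:
    """Split a chunk into sentences, returning document-level (start, end, text)."""
    spans = []
    for m in _SENTENCE.finditer(chunk_text):
        spans.append((chunk_start + m.start(), chunk_start + m.end(), m.group()))
    return spans


document = "Intro text. Penicillin kills bacteria. It was found in 1928."
chunk_start = 12                      # the chunk begins after "Intro text. "
chunk = document[chunk_start:]

spans = sentence_spans(chunk, chunk_start)
for start, end, text in spans:
    # Each span indexes into the full document, not into the chunk.
    print(start, end, repr(document[start:end]))
```

Because every span is expressed in document coordinates, chunk boundaries can change (for example, after re-chunking for a different retriever) without invalidating existing citations.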

Audit trail

Every Answer exposes its full audit trail:

```python
answer = verifiable_rag.ask(question, docs=...)

answer.text                      # final answer string
answer.sentences                 # list of CitedSentence with supporting_sentence_ids
answer.verification_results      # per-sentence NLI checks
answer.retrieved_chunks          # the reranked passages the generator saw

# Convenience accessors:
answer.supported_sentences       # list[CitedSentence] (passed verification)
answer.unsupported_sentences     # list[CitedSentence] (verifier flagged)
answer.verification_for(idx)     # VerificationResult | None for a sentence index
answer.cited_sentence_ids        # frozenset of all source IDs cited
answer.min_nli_score             # worst-case sentence (the bottleneck)
answer.audit_trail()             # JSON-serializable dict for logging / metrics

# Or render the full audit as a self-contained HTML page:
answer.to_html()                  # returns HTML string
# or pass output_html="report.html" to verifiable_rag.ask()
```

The HTML report includes the query, the answer with per-sentence verification color coding, the faithfulness components, per-sentence NLI scores, and every reranked passage with its retrieval score — citations are anchored links into the passage list.
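
As a rough picture of what per-sentence color coding in a self-contained page involves, here is a minimal standard-library sketch. It is not the library's renderer: `render_sentences` is a hypothetical function, the input is a plain list of (text, supported) pairs rather than `CitedSentence` objects, and the colors are arbitrary.

```python
from html import escape


def render_sentences(sentences: list[tuple[str, bool]]) -> str:
    """Render (text, supported) pairs as one HTML page with inline styles."""
    parts = []
    for text, supported in sentences:
        css = "supported" if supported else "unsupported"
        # escape() keeps source text such as "<" from being parsed as markup.
        parts.append(f'<span class="{css}">{escape(text)}</span>')
    style = "<style>.supported{background:#d4edda}.unsupported{background:#f8d7da}</style>"
    body = " ".join(parts)
    return f"<!doctype html><html><head>{style}</head><body><p>{body}</p></body></html>"


page = render_sentences([("A < B holds.", True), ("This part is made up.", False)])
print(page)
```

Inlining the stylesheet keeps the report a single file that can be attached to a ticket or archived alongside the query log.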

Installation

```shell
pip install verifiable-rag                          # core, no heavy deps
pip install "verifiable-rag[docling,bge,lancedb]"   # parser + embedder + index
pip install "verifiable-rag[hhem,minicheck]"        # NLI verifiers (adds torch + transformers)
pip install "verifiable-rag[litellm]"               # LLM-judge verifier
pip install "verifiable-rag[yaml]"                  # YAML config loader
pip install "verifiable-rag[all]"                   # everything
```
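
To confirm which optional dependencies made it into an environment, a quick check with `importlib` can help. The mapping from extras to import names below is an assumption based on the comments above (for example, that the `yaml` extra provides a `yaml` module); adjust it to match what the extras actually install.

```python
from importlib.util import find_spec

# Import names are assumptions about what each extra pulls in.
OPTIONAL_MODULES = {
    "hhem / minicheck": ["torch", "transformers"],
    "litellm": ["litellm"],
    "yaml": ["yaml"],
}


def missing_modules(groups: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each extra to the modules that cannot be found in this environment."""
    return {
        extra: [name for name in modules if find_spec(name) is None]
        for extra, modules in groups.items()
    }


for extra, missing in missing_modules(OPTIONAL_MODULES).items():
    status = "ok" if not missing else "missing " + ", ".join(missing)
    print(f"{extra}: {status}")
```

`find_spec` only locates a module without importing it, so the check stays fast even when heavy packages such as torch are present.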

First-run model downloads

Verifier model weights are not bundled in the wheel. They are downloaded lazily from the Hugging Face Hub on first use and cached in ~/.cache/huggingface/hub/, where they stay until you delete them.

Verifier	Model	Size
`HHEMVerifier`	`vectara/hallucination_evaluation_model`	~600 MB
`MiniCheckVerifier`	`lytang/MiniCheck-Flan-T5-Large`	~770 MB
`LLMJudgeVerifier`	(hosted API, no local model)	0
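
If a lazy download on the first query is undesirable (for example in a container build or an offline deployment), the weights can be fetched ahead of time. The sketch below uses `snapshot_download` from the `huggingface_hub` package with the repo IDs from the table above. `prefetch` is a hypothetical helper, and it is an assumption that the library resolves its models from the same default cache.

```python
from typing import Callable, Iterable, Optional

# Repo IDs copied from the table above.
VERIFIER_REPOS = [
    "vectara/hallucination_evaluation_model",
    "lytang/MiniCheck-Flan-T5-Large",
]


def prefetch(
    repo_ids: Iterable[str],
    download: Optional[Callable[..., str]] = None,
) -> list[str]:
    """Download each repo into the local Hugging Face cache and return the paths."""
    if download is None:
        # Imported lazily so this module loads without huggingface_hub installed.
        from huggingface_hub import snapshot_download
        download = snapshot_download
    return [download(repo_id=repo_id) for repo_id in repo_ids]


# Run once, e.g. in a Docker build step (needs network access):
# prefetch(VERIFIER_REPOS)
```

Setting the `HF_HOME` environment variable before running this moves the cache to another location, which is useful when the home directory is small or read-only.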

Published benchmark results

Benchmark	Headline	Report	Blog post
ALCE (Princeton citation quality)	Constrained decoding beats prompted by +4–7 F1 under dual-LLM-judge cross-validation	report	post
RAGTruth (hallucination detection)	Dual NLI ensemble matches a Sonnet judge at ~1/250 of the per-call cost (AUROC 0.844 vs 0.846)	report	post
LitQA2 (biomedical scientific Q&A)	Constrained decoding lifts MC; contextual retrieval is a null result on saturated retrieval	report	post

Roadmap

Phase	Milestone	Status
0–1	Repo skeleton, data model, baseline pipeline	✅ done
2	Eval harness + BENCHMARKS.md	✅ done
3	Sentence-level citations (prompted / constrained / SAFE)	✅ done
4	Faithfulness verification + calibrated refusal (v0.4)	✅ done
5	Hardening, mkdocs docs, Gradio demo on HF Spaces (v0.5)	in progress
6	Launch — PyPI release + Show HN	pending

Contributing

See CLAUDE.md for architecture decisions, hard rules, and contribution conventions. Methodology critiques on the published benchmarks are especially welcome — eval rigor is the whole moat, and the only way to find the holes is to invite people to look for them.

License

MIT

Release history

Version	Released
0.5.2	May 31, 2026
0.5.1	May 31, 2026
0.5.0 (this version)	May 31, 2026

Download files

Source Distribution

verifiable_rag-0.5.0.tar.gz (146.7 kB)

Uploaded May 31, 2026 Source

Built Distribution

verifiable_rag-0.5.0-py3-none-any.whl (177.0 kB)

Uploaded May 31, 2026 Python 3

File details

Details for the file verifiable_rag-0.5.0.tar.gz.

File metadata

Download URL: verifiable_rag-0.5.0.tar.gz
Upload date: May 31, 2026
Size: 146.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for verifiable_rag-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`3c0712d80341eb88233647cc351c22a10ed7cbf73ffa4256e6f8f6a245c97386`
MD5	`2c3f7779e9c62cc4c807d079b70e6bb2`
BLAKE2b-256	`94ddeb46fbf1b17a74fb9e2261d0e44241db4967523f40c424e3bf90d75d0810`

Provenance

The following attestation bundles were made for verifiable_rag-0.5.0.tar.gz:

Publisher: publish.yml on firish/rag-rack

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: verifiable_rag-0.5.0.tar.gz
- Subject digest: 3c0712d80341eb88233647cc351c22a10ed7cbf73ffa4256e6f8f6a245c97386
- Sigstore transparency entry: 1679516268
- Sigstore integration time: May 31, 2026
Source repository:
- Permalink: firish/rag-rack@6733def1a561e8ff7b776259a8fe1078ae3b33d1
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/firish
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6733def1a561e8ff7b776259a8fe1078ae3b33d1
- Trigger Event: push

File details

Details for the file verifiable_rag-0.5.0-py3-none-any.whl.

File metadata

Download URL: verifiable_rag-0.5.0-py3-none-any.whl
Upload date: May 31, 2026
Size: 177.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for verifiable_rag-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`27d5627579b325307222b32dd9fc8efea414f245cad6173354c0ba6bf1715660`
MD5	`46c726d72e3f58b832a5576da076a60c`
BLAKE2b-256	`a35bbec47d88c3dfb9f16290e94148c6198791ec268364d122b3029cf9b91f06`

Provenance

The following attestation bundles were made for verifiable_rag-0.5.0-py3-none-any.whl:

Publisher: publish.yml on firish/rag-rack

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: verifiable_rag-0.5.0-py3-none-any.whl
- Subject digest: 27d5627579b325307222b32dd9fc8efea414f245cad6173354c0ba6bf1715660
- Sigstore transparency entry: 1679516721
- Sigstore integration time: May 31, 2026
Source repository:
- Permalink: firish/rag-rack@6733def1a561e8ff7b776259a8fe1078ae3b33d1
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/firish
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6733def1a561e8ff7b776259a8fe1078ae3b33d1
- Trigger Event: push
