halluguard
Reverse RAG hallucination detector. No LLM judge. Vendor-neutral. Real-time.
The problem
Every LLM hallucinates. The standard mitigation is RAG: retrieve relevant documents, feed them as context, ask the LLM to answer "based on context." This reduces hallucination but does not detect when one slips through. After generation, you don't know which sentences are supported by the context and which the model invented.
The current detection options all have downsides:
- LLM-as-judge — recursive, slow, expensive, and the judge can hallucinate too.
- Constrained generation — fragile, breaks on any prompt change.
- Post-hoc fact-checking — manual, batch, doesn't scale.
- Vendor-locked tools — tied to a specific provider's safety stack.
The idea
Reverse RAG. Standard RAG goes query → retrieve → answer. Halluguard goes answer → retrieve → flag.
After the LLM produces an answer:
- Split the answer into atomic claims (sentence-level by default, semantic-unit option later).
- For each claim, retrieve the top-k chunks from the source corpus that could support it (same encoder you use for normal RAG).
- Score the alignment between the claim and each retrieved chunk (cosine + lightweight cross-encoder verification).
- Below threshold → `HALLUCINATION_FLAG`. Above threshold → `SUPPORTED`, with citation IDs.
No LLM is invoked at any stage. The retriever is the same model your RAG pipeline already runs.
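A minimal sketch of that loop, assuming a sentence-transformers bi-encoder; the helper name and the naive sentence splitter below are placeholders, not the actual halluguard internals (those live in segment.py, retriever.py, and verifier.py):
```python
# Illustrative reverse-RAG loop: claim -> retrieve -> score -> flag.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def check_answer(answer: str, corpus_chunks: list[str],
                 threshold: float = 0.55, top_k: int = 5):
    chunk_emb = encoder.encode(corpus_chunks, convert_to_tensor=True)
    results = []
    # Placeholder segmenter: naive sentence split stands in for segment.py.
    for claim in (s.strip() for s in answer.split(".") if s.strip()):
        claim_emb = encoder.encode(claim, convert_to_tensor=True)
        scores = util.cos_sim(claim_emb, chunk_emb)[0]            # cosine vs. every chunk
        top = scores.topk(min(top_k, len(corpus_chunks)))
        best = float(top.values[0])
        status = "SUPPORTED" if best >= threshold else "HALLUCINATION_FLAG"
        results.append((claim, status, best, top.indices.tolist()))  # indices = citation ids
    return results
```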
Why this is new
Most "hallucination detection" libraries either:
- Run an LLM-as-judge (expensive recursion, judge itself hallucinates), or
- Compute embedding similarity of the whole answer to the whole context (loses sentence-level granularity), or
- Require fine-tuning a custom classifier per domain.
Halluguard works at sentence granularity, uses the retrieval stack you already have, and adds zero LLM cost per check.
Architecture (planned)
```
halluguard/
    __init__.py
    segment.py        # split answer into atomic claims
    retriever.py      # bi-encoder retrieval over corpus chunks
    verifier.py       # claim ↔ chunk alignment scoring
    guard.py          # high-level Guard.check(answer, context) -> Report
    report.py         # SupportReport dataclass + to_html / to_markdown
    cli.py            # halluguard check answer.txt --corpus docs/
benchmarks/
    truthfulqa_eval.py   # measure flag precision/recall on known datasets
    ragtruth_eval.py
tests/
    test_segment.py
    test_retriever.py
    test_verifier.py
    test_end_to_end.py
README.md
LICENSE
pyproject.toml
.github/workflows/
```
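An illustrative shape for the report objects, inferred from the usage example in the next section; `ClaimResult` and the `citations` field are hypothetical names, the other fields mirror the example below.
```python
# Hypothetical sketch of what report.py might expose. Only SupportReport.claims and the
# claim fields used in the README example (text, status, support_score, vote_str) are
# grounded in this document; everything else is illustrative.
from dataclasses import dataclass, field

@dataclass
class ClaimResult:                 # hypothetical name
    text: str                      # the atomic claim (one sentence by default)
    status: str                    # "SUPPORTED" or "HALLUCINATION_FLAG"
    support_score: float           # best claim-to-chunk alignment score
    vote_str: str                  # e.g. "3/5": chunks clearing the NLI gate out of top-k
    citations: list[int] = field(default_factory=list)  # hypothetical: supporting chunk ids

@dataclass
class SupportReport:
    claims: list[ClaimResult]
```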
High-level API
```python
from halluguard import Guard
from halluguard.verifier import NLIVerifier
from sentence_transformers import SentenceTransformer

# Build the index once. With the NLI verifier on, claims pass only when both
# the cosine gate and the entailment gate clear.
guard = Guard.from_documents(
    documents=["..."],
    encoder=SentenceTransformer("all-MiniLM-L6-v2"),
    verifier=NLIVerifier(),   # optional, lifts recall meaningfully
    threshold=0.55,           # cosine gate
    entail_threshold=0.5,     # NLI gate
    min_entail_votes=1,       # raise to 2 for "agreement of two" policy
)

# `question` is optional but recommended when the answer was generated
# in response to a known question — it shapes the NLI premise.
report = guard.check(
    answer="The user prefers PostgreSQL because it has better JSON support.",
    question="What database does the user prefer?",
)

for claim in report.claims:
    print(claim.text, claim.status, claim.support_score, claim.vote_str)
# "The user prefers PostgreSQL" SUPPORTED 0.91 3/5
# "...because it has better JSON support" HALLUCINATION_FLAG 0.34 0/5
```
Knobs that matter
- `threshold` (cosine gate, default 0.55): lower = more permissive, higher = stricter. HaluEval QA optimum landed near 0.70 in our sweeps.
- `entail_threshold` (NLI gate, default 0.5): independent of cosine. Tightening it improves precision when paraphrased hallucinations sneak past cosine.
- `min_entail_votes` (default 1): how many of the top-K chunks must clear `entail_threshold` before a claim is SUPPORTED. `1` = legacy max-only policy. Set to `2` or `3` for multi-evidence agreement — trades recall for precision. v0.3 ablation pending.
- `question=` on `Guard.check()`: when supplied, the NLI premise becomes `"Question: {q}\nContext: {chunk}"`. Useful when answers were generated to answer a specific question; without it, NLI can fail to align answer-from-context patterns.
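A sketch of how these knobs could combine into a per-claim decision, assuming per-chunk cosine and entailment scores have already been computed; this is not the literal guard.py code.
```python
# Per-claim decision: cosine gate AND enough entailment votes among retrieved chunks.
def decide(cosine_scores: list[float],
           entail_probs: list[float],
           threshold: float = 0.55,
           entail_threshold: float = 0.5,
           min_entail_votes: int = 1) -> str:
    # Gate 1: at least one retrieved chunk must clear the cosine threshold.
    passes_cosine = max(cosine_scores, default=0.0) >= threshold
    # Gate 2: count chunks the NLI cross-encoder says entail the claim.
    votes = sum(p >= entail_threshold for p in entail_probs)
    if passes_cosine and votes >= min_entail_votes:
        return "SUPPORTED"
    return "HALLUCINATION_FLAG"

decide([0.82, 0.41], [0.91, 0.08])                        # "SUPPORTED" (1 vote, max-only)
decide([0.82, 0.41], [0.91, 0.08], min_entail_votes=2)    # "HALLUCINATION_FLAG"
```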
Daemon mode (Guard.from_daemon)
For deployments where you don't want each Python process to load its own SentenceTransformer (claimcheck + halluguard + a third service would otherwise each pay the same MiniLM cost), point Guard at a long-lived `adaptmem serve` process:
```python
from halluguard import Guard
from halluguard.verifier import NLIVerifier

# Daemon must be running: `adaptmem serve --port 7800`
guard = Guard.from_daemon(
    documents=["..."],
    daemon_url="http://127.0.0.1:7800",
    verifier=NLIVerifier(),
)
report = guard.check("an answer", question="...")
```
`Guard.from_daemon` calls `/healthz` first so a misconfigured URL fails loudly at construction time. The retriever and NLI verifier still run locally; only the encoder hop crosses HTTP. For a Unix-socket daemon (zero TCP overhead), pass `daemon_url="http+unix:///tmp/adaptmem.sock"`.
CLI
```bash
# Single-file corpus, one-shot check
halluguard answer.txt --corpus-file knowledge.txt --nli --question "What did the user say?"

# Directory of .txt files as corpus, max-only policy (legacy)
halluguard answer.txt --corpus ./docs/ --nli

# Stricter: require 2-of-5 chunks to entail before SUPPORTED
halluguard answer.txt --corpus ./docs/ --nli --min-votes 2 --entail-threshold 0.6

# Inline corpus string, no directory needed
halluguard answer.txt --corpus-text "PostgreSQL has strong JSON support and ACID guarantees." --nli
```
The CLI exits 0 if no claim was flagged, 1 if any claim was flagged — easy to chain into a CI step that gates on hallucination rate.
What halluguard is NOT
- Not an LLM proxy. It does not generate or rewrite text.
- Not a fact-checker against the open web. The "ground truth" is your corpus.
- Not a guarantee. Some hallucinations align lexically with unrelated chunks; halluguard reduces but does not eliminate false negatives. The threshold is tunable per use case.
Benchmarks
Synthetic suite (benchmarks/synthetic_eval.py)
6 examples, 18 claims, 7 hallucinations covering polarity flips ("after render" → "before render"), false attribution (LSM-trees in Postgres), and unsupported additions (pet llamas on Mars).
| Pipeline | Precision | Recall | F1 |
|---|---|---|---|
| Bi-encoder only (cosine threshold) | 0.80 | 0.57 | 0.667 |
| Bi-encoder + NLI verifier | 0.78 | 1.00 | 0.875 |
Synthetic data is curated to be diagnostic — F1=0.875 is a "the pipeline isn't broken" signal, not a generalisation claim.
Real benchmark — HaluEval QA (benchmarks/ragtruth_eval.py)
We tried the public RAGTruth mirrors first (wandb/RAGTruth, flagrant/RAGTruth, ParticleMedia/RAGTruth, TIGER-Lab/RAGTruth); none is accessible on the HF Hub, so we fell back to pminervini/HaluEval (qa config, 10 000 items). Each item is expanded into two cases: (knowledge, right_answer) → gold-truthful, (knowledge, hallucinated_answer) → gold-hallucinated. The prediction is "hallucinated" iff any claim in the answer is flagged. Evaluation is item-level (HaluEval QA has no per-span labels; RAGTruth would, but it's locked).
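A sketch of that expansion and the item-level prediction rule, assuming the field names of the pminervini/HaluEval qa config (knowledge, question, right_answer, hallucinated_answer); the split name should be checked against the dataset card.
```python
# Sketch of the item -> two-cases expansion used for HaluEval QA evaluation.
from datasets import load_dataset

halueval = load_dataset("pminervini/HaluEval", "qa")   # split name varies; see dataset card

def expand(item):
    # One HaluEval item yields a gold-truthful and a gold-hallucinated case
    # over the same single-chunk corpus.
    for answer, gold in ((item["right_answer"], False), (item["hallucinated_answer"], True)):
        yield {
            "corpus": [item["knowledge"]],
            "question": item["question"],
            "answer": answer,
            "gold_hallucinated": gold,
        }

def predict(guard, case):
    # Item-level rule: predict "hallucinated" iff any claim in the answer is flagged.
    report = guard.check(case["answer"], question=case["question"])
    return any(c.status == "HALLUCINATION_FLAG" for c in report.claims)
```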
| Pipeline | n cases | Threshold | Precision | Recall | F1 |
|---|---|---|---|---|---|
| Bi-encoder only | 200 | 0.70 | 0.451 | 0.790 | 0.575 |
| Bi-encoder + NLI | 200 | 0.70 | 0.490 | 0.960 | 0.649 |
| Bi-encoder + NLI + question-aware premise | 100 | 0.70 | 0.489 | 0.920 | 0.639 |
| Same, but bi-encoder swapped to adaptmem FT-300 (domain-tuned) | 100 | 0.70 | 0.489 | 0.920 | 0.639 |
Latest run is benchmarks/results_ragtruth_q_v1.json — same Guard, but the NLI premise is now f"Question: {q}\nContext: {chunk}" instead of chunk alone. F1 is within noise of the older 200-case run (0.639 vs 0.649; ±1 case ≈ ±1pt on a 100-case sample). Recall held above 0.90; precision did not move much. Question-aware framing did not collapse the false-positive ceiling on its own — the bi-encoder still surfaces enough lexically-similar but semantically-irrelevant chunks that NLI rules them in.
Encoder ablation (results_ragtruth_ft300_v1.json): the hypothesis was that swapping the generic all-MiniLM-L6-v2 bi-encoder for adaptmem's domain-tuned FT-300 model would lift precision (lexically-similar-but-irrelevant chunks score lower → fewer FPs at the cosine gate). On HaluEval QA the result was identical at the best threshold (0.489 / 0.920 / 0.639). Probably because HaluEval QA gives a single 1-chunk corpus per case; the tuned encoder's disambiguation advantage requires larger corpora. Re-test on RAGTruth public mirror (when accessible) for a fair multi-chunk evaluation.
Vote ablation (results_ragtruth_vote_v1.json): swept min_entail_votes ∈ {1, 2, 3} at threshold 0.70 expecting precision to rise as recall fell. Instead min_votes ≥ 2 collapsed the Guard into a flag-everything classifier (TP=50, FP=50, FN=0, TN=0 → F1=0.667). The cause is the same single-chunk-per-case corpus structure: max achievable entail_votes is 1, so requiring 2+ flags every claim. The apparent F1 lift is the trivial all-positive baseline on a balanced set, not a genuine precision improvement. The policy is structurally unmeasurable on HaluEval QA and needs a multi-chunk benchmark to evaluate honestly.
The honest takeaway: halluguard catches almost every hallucination on a real benchmark, but at a high false-positive rate. For real-world deployment you want it as a flag-for-review layer, not an auto-reject gate. The min_entail_votes knob (require N of top-K to entail, not just one) is the lever for trading recall down for precision, but as the vote ablation above shows, that trade-off cannot be measured honestly on HaluEval QA's single-chunk cases and needs a multi-chunk benchmark.
Default NLI model: cross-encoder/nli-deberta-v3-base (~440MB, lazy-loaded). Replaceable with any HuggingFace cross-encoder NLI checkpoint.
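For reference, roughly how a cross-encoder NLI checkpoint scores a claim against a chunk; this is a sketch, not the NLIVerifier implementation, and the assumed label order ([contradiction, entailment, neutral]) should be confirmed against the model card.
```python
# Sketch: scoring one (premise, hypothesis) pair with the default NLI cross-encoder.
from scipy.special import softmax
from sentence_transformers import CrossEncoder

nli = CrossEncoder("cross-encoder/nli-deberta-v3-base")  # ~440 MB, downloaded on first use

premise = "Question: What database does the user prefer?\nContext: The user prefers PostgreSQL."
hypothesis = "The user prefers PostgreSQL because it has better JSON support."

logits = nli.predict([(premise, hypothesis)])[0]  # raw logits over the three NLI labels
entail_prob = float(softmax(logits)[1])           # probability mass on "entailment" (assumed index)
print(entail_prob)                                # compared against entail_threshold
```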
Run yourself:
```bash
pip install -e ".[bench]"

# Apple silicon: pass --device cpu to bypass the MPS deadlock that
# silently exits long NLI sweeps.
python benchmarks/ragtruth_eval.py --n-examples 50 --nli --device cpu
```
Larger benchmarks (full RAGTruth once a public mirror appears, FActScore) coming next.
Status
v0.2 — bi-encoder + NLI verifier wired up, synthetic bench passing, real HaluEval QA bench wired in with honest numbers. RAGTruth (when reachable) and CI workflows next.
License
MIT.