# claimcheck
Domain-tuned retrieval + zero-LLM claim verification, in one pipeline.
claimcheck glues two siblings — adaptmem (domain-adapted bi-encoder retrieval) and halluguard (reverse-RAG hallucination detection) — into a single API:
```python
from claimcheck import Pipeline

pipeline = Pipeline.from_corpus(
    documents=["..."],
    labelled_queries=[{"query": "...", "relevant_ids": [...]}],
    train=True,       # fine-tune the retriever on the labelled set
    enable_nli=True,  # add NLI verification on top of cosine retrieval
)

verdict = pipeline.check(
    answer="The user prefers PostgreSQL because it has better JSON support.",
    question="What database does the user prefer?",
)
print(verdict.trust_score)     # 0.84
print(verdict.flagged_claims)  # ["...because it has better JSON support"]
```
## What it is
A thin orchestration layer over the two siblings:
- adaptmem trains a domain-adapted bi-encoder on your corpus + labelled queries.
- halluguard wraps the trained encoder in a Guard with NLI verification and surfaces per-claim and per-response trust scores.
The same Pipeline object can be saved + reloaded as a unit, so a downstream service has one model directory to manage.
## Why one package
adaptmem and halluguard are independently useful:
- adaptmem alone is a retrieval-quality lift (any domain).
- halluguard alone is a verification layer (any encoder).
But the most common deployment shape pairs them — domain-tuned retrieval for the cosine gate, claim-level NLI on top. claimcheck saves you the wiring.
## What it is NOT
- Not a wrapper around any LLM. Both siblings are explicitly LLM-free.
- Not a vector database. Bring your own; claimcheck is the encoder + verifier layer.
- Not a replacement for either sibling. If you only need adaptmem (no verification) or only halluguard (with a generic encoder), use them directly.
## Daemon mode (`Pipeline.from_daemon`)
For deployments where you'd rather not load a SentenceTransformer in every Python process (claimcheck + halluguard + a third service each paying the same model cost), point claimcheck at a long-lived `adaptmem serve` process:
```python
from claimcheck import Pipeline

# Daemon must be running: `adaptmem serve --port 7800`
pipeline = Pipeline.from_daemon(
    documents=[...],
    daemon_url="http://127.0.0.1:7800",
    enable_nli=True,  # NLI verifier still runs in-process
)
verdict = pipeline.check("an answer", question="...")
```
Only the encoder hop crosses HTTP; cosine search and NLI verification stay local. `pipeline.save()` is not supported for daemon-backed pipelines (the model lives in the daemon). `Pipeline.from_daemon` calls `/healthz` first, so a misconfigured URL fails loudly at construction time, not deep inside the first `.check()`.
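The fail-fast health check is a generic pattern worth copying in your own services. A minimal sketch of it (not claimcheck's internals — `probe_healthz` and its timeout are illustrative assumptions) looks like:

```python
import urllib.error
import urllib.request


def probe_healthz(daemon_url: str, timeout: float = 2.0) -> bool:
    """Return True iff the daemon answers /healthz with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{daemon_url}/healthz", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


# Probe at construction time; a misconfigured URL surfaces here,
# not deep inside the first check() call.
ok = probe_healthz("http://127.0.0.1:7800")
# if not ok: raise a loud, descriptive error before serving traffic
```

The point of probing eagerly is that a dead daemon produces one obvious error at startup instead of a confusing failure on the first real request.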
## How it compares to LLM-as-judge tools
The closest commercial / open-source category is "LLM-as-judge" — a separate large-model call grades each claim. claimcheck is the no-LLM-judge branch: a deterministic NLI cross-encoder plus a retrieval-augmented gate. The tradeoffs are real and shape what you should use it for.
| Feature | claimcheck | LLM-as-judge (Patronus, Galileo, CleanLab, Guardrails) |
|---|---|---|
| Judge model | NLI cross-encoder (≈90M params, local) | LLM call (GPT-4 / Claude / open-source 7-70B) |
| Cost per claim | $0 (local CPU/GPU) | $0.001-0.05 (API token cost) |
| Latency per claim (CPU) | 50-200ms | 500-3000ms (network + LLM inference) |
| Determinism | yes — same input → same score | partial — depends on model temperature, version, drift |
| Vendor lock-in | none | judge model API, often a single provider |
| Audit trail | claim → cited chunk → entail/contradict score | claim → judge prompt + judge response (opaque reasoning) |
| Domain tuning | yes — retriever fine-tuned on your corpus (adaptmem) | usually no — judge is generic |
| Customising the judge | swap any HuggingFace cross-encoder | retrain or fine-tune the LLM (rarely practical) |
| Streaming | yes — sentence-by-sentence verdict (check_stream) | yes for some, but each judge call is heavier |
| Privacy | data stays local | claims and context sent to judge provider |
| Best at | budget-bound CI/middleware, per-domain accuracy, audit | general-purpose judgement, "did the model do something obviously bad" |
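To make the cost row concrete, here is a back-of-envelope calculation using the table's own per-claim figures (the daily volume is an illustrative assumption, not a benchmark):

```python
claims_per_day = 1_000_000  # e.g. every chatbot turn checked in-line

# Table figures: local NLI judge is $0 per claim in API cost;
# an LLM judge costs roughly $0.001-$0.05 per claim in tokens.
llm_low, llm_high = 0.001, 0.05

daily_low = claims_per_day * llm_low    # 1,000.0
daily_high = claims_per_day * llm_high  # 50,000.0

print(f"LLM-as-judge: ${daily_low:,.0f}-${daily_high:,.0f}/day")
print("claimcheck:   $0/day in API cost, plus local CPU/GPU time")
```

At that volume the API bill alone spans roughly $1,000-$50,000 per day, which is why the per-claim cost column dominates the decision for high-throughput middleware.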
When claimcheck wins:
- High-throughput middleware where per-claim cost matters (every chatbot turn checked).
- Privacy-bound deployments (medical, legal, internal tools) where claims can't leave the perimeter.
- Domain-specific RAG where a tuned retriever beats a generic LLM judge that doesn't know your jargon.
- Streaming UX where users see the verdict as the LLM types.
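The streaming shape in the last bullet can be sketched independently of claimcheck's API. Here the sentence splitter and the `score_sentence` stub stand in for the real retriever + NLI judge — both are placeholders, not claimcheck code:

```python
import re
from typing import Callable, Iterator, Tuple


def check_stream_sketch(
    answer: str,
    score_sentence: Callable[[str], float],
) -> Iterator[Tuple[str, float]]:
    """Yield (sentence, trust_score) pairs as sentences become available,
    so a UI can flag claims while the LLM is still typing."""
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if sentence:
            yield sentence, score_sentence(sentence)


# Stub judge: a real pipeline would retrieve evidence and run NLI here.
verdicts = list(check_stream_sketch(
    "The user prefers PostgreSQL. It has better JSON support.",
    score_sentence=lambda s: 0.9 if "PostgreSQL" in s else 0.4,
))
# verdicts[0] -> ("The user prefers PostgreSQL.", 0.9)
```

Because each sentence is scored as soon as it is complete, the UI can paint a verdict badge next to each sentence in near real time instead of waiting for the full answer.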
When LLM-as-judge wins:
- Open-ended quality assessment ("is this answer helpful, safe, polite?") that isn't really a hallucination check.
- Few-shot domains with no labelled training queries to fine-tune the retriever.
- One-off audits where a $0.05 model call is cheaper than building infrastructure.
The two are complementary, not exclusive. A reasonable production stack runs claimcheck in-line on every response (cheap, deterministic, blocks the worst), and an LLM judge in a sampled audit (expensive, broader, catches subtler issues).
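That complementary stack can be sketched as a simple router. Both judges are stubbed below, and the 1% audit rate is an illustrative choice, not a recommendation:

```python
import random


def inline_check(answer: str) -> float:
    """Stub for the cheap, deterministic per-response check."""
    return 0.84


def llm_judge_audit(answer: str) -> None:
    """Stub for the expensive sampled audit (an API call in practice)."""
    print(f"queued for LLM audit: {answer!r}")


def handle_response(answer: str, audit_rate: float = 0.01) -> float:
    score = inline_check(answer)       # every response, in-line
    if random.random() < audit_rate:   # ~1% also get the LLM judge
        llm_judge_audit(answer)
    return score


score = handle_response("The user prefers PostgreSQL.")
```

The inline check gates every response at zero marginal cost, while the sampled audit bounds LLM-judge spend to a small, predictable fraction of traffic.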
## Status
v0.1.1 shipped. Public API decided (`Pipeline.from_corpus`, `check`, `check_stream`, `check(profile=True)`, `save`/`load`), 6 unit tests passing, `mypy --strict` clean, CI matrix on 3.10/3.11/3.12. The two siblings (adaptmem v0.4 shipped, halluguard v0.2-ext shipped) are mature enough to compose; this repo just wires them.
Pre-PyPI: install via local editable installs until the siblings publish:

```shell
pip install -e ../adaptmem ../halluguard ../claimcheck
```
## License
MIT.
## File details

Details for the file claimcheck-0.1.0.tar.gz.

### File metadata

- Download URL: claimcheck-0.1.0.tar.gz
- Size: 11.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `25b33b5d3a9ca879699c630b9aa38349de9e47f0f2e1b11f2ab096483d0edb88` |
| MD5 | `7b1689548d8930f888e8e90070bbca1b` |
| BLAKE2b-256 | `a8bc165d7b77f0b482a56ef389ab20ecf76f3382bce0ffc396eafa58ffbcaa1f` |
## File details

Details for the file claimcheck-0.1.0-py3-none-any.whl.

### File metadata

- Download URL: claimcheck-0.1.0-py3-none-any.whl
- Size: 9.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `362a9adf129f68b2514981e631f0fa4a74b7f5e440f0cd4313b9daacaf6d9f9d` |
| MD5 | `764bde2f8ef8bf42ff5245ad23be01fd` |
| BLAKE2b-256 | `d0bc7bfa95c99e65577c355178a61a852b5ad1619eaa6e4c270fd332e7410952` |