Domain-tuned retrieval + zero-LLM claim verification in one pipeline.

claimcheck

claimcheck glues two siblings — adaptmem (domain-adapted bi-encoder retrieval) and halluguard (reverse-RAG hallucination detection) — into a single API:

from claimcheck import Pipeline

pipeline = Pipeline.from_corpus(
    documents=["..."],
    labelled_queries=[{"query": "...", "relevant_ids": [...]}],
    train=True,        # fine-tune the retriever on the labelled set
    enable_nli=True,   # add NLI verification on top of cosine retrieval
)

verdict = pipeline.check(
    answer="The user prefers PostgreSQL because it has better JSON support.",
    question="What database does the user prefer?",
)

print(verdict.trust_score)         # 0.84
print(verdict.flagged_claims)       # ["...because it has better JSON support"]
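One way a caller might act on the two fields printed above is sketched here. The `Verdict` dataclass is a stand-in that carries only the two attributes shown in the example (the real object may expose more), and both `decide` and the 0.7 threshold are assumptions made for illustration, not part of claimcheck.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    """Stand-in exposing only the two attributes printed in the example above."""

    trust_score: float
    flagged_claims: list[str] = field(default_factory=list)


def decide(verdict: Verdict, min_trust: float = 0.7) -> str:
    """Map a verdict to an action: 'pass', 'annotate', or 'block'.

    min_trust is an arbitrary illustrative threshold, not a claimcheck default.
    """
    if verdict.trust_score < min_trust:
        return "block"
    if verdict.flagged_claims:
        # Trusted overall, but surface the doubtful claims to the user.
        return "annotate"
    return "pass"


action = decide(Verdict(0.84, ["...because it has better JSON support"]))
print(action)
```

Keeping the policy outside the pipeline means the threshold can be tuned per endpoint without retraining anything.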

What it is

A thin orchestration layer over the two siblings:

adaptmem trains a domain-adapted bi-encoder on your corpus + labelled queries.
halluguard wraps the trained encoder in a Guard with NLI verification and surfaces per-claim and per-response trust scores.

The same Pipeline object can be saved + reloaded as a unit, so a downstream service has one model directory to manage.

Why one package

adaptmem and halluguard are independently useful:

adaptmem alone is a retrieval-quality lift (any domain).
halluguard alone is a verification layer (any encoder).

But the most common deployment shape pairs them — domain-tuned retrieval for the cosine gate, claim-level NLI on top. claimcheck saves you the wiring.
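The two-stage shape described above (a cosine gate first, claim-level NLI second) can be sketched with toy data. This is not how adaptmem or halluguard are implemented: the 2-d vectors stand in for encoder output, `toy_nli` is a hypothetical placeholder for an NLI cross-encoder, and the 0.5 gate threshold is an arbitrary assumption.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify_claim(
    claim_vec: Vector,
    claim_text: str,
    chunks: Sequence[tuple[str, Vector]],
    nli: Callable[[str, str], float],
    gate: float = 0.5,  # assumed threshold, not a claimcheck default
) -> float:
    """Return a support score in [0, 1]; 0.0 if no chunk passes the cosine gate."""
    # Stage 1: cosine gate. Pick the chunk closest to the claim.
    best_text, best_sim = "", -1.0
    for text, vec in chunks:
        sim = cosine(claim_vec, vec)
        if sim > best_sim:
            best_text, best_sim = text, sim
    if best_sim < gate:
        return 0.0  # nothing in the corpus is close enough to support the claim
    # Stage 2: NLI on top. Score whether the retrieved chunk entails the claim.
    return nli(best_text, claim_text)


def toy_nli(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model: share of hypothesis tokens in the premise."""
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / len(h) if h else 0.0


chunks = [
    ("the user prefers postgresql", (1.0, 0.0)),
    ("the office is in berlin", (0.0, 1.0)),
]
supported = verify_claim((0.9, 0.1), "the user prefers postgresql", chunks, toy_nli)
unsupported = verify_claim((-1.0, -1.0), "the user owns a boat", chunks, toy_nli)
print(supported, unsupported)
```

The gate keeps the more expensive second stage from running on claims that have no plausible supporting chunk at all.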

What it is NOT

Not a wrapper around any LLM. Both siblings are explicitly LLM-free.
Not a vector database. Bring your own; claimcheck is the encoder + verifier layer.
Not a replacement for either sibling. If you only need adaptmem (no verification) or only halluguard (with a generic encoder), use them directly.

Daemon mode (`Pipeline.from_daemon`)

For deployments where you'd rather not load a SentenceTransformer in every Python process (claimcheck + halluguard + a third service each paying the same model cost), point claimcheck at a long-lived adaptmem serve process:

from claimcheck import Pipeline

# Daemon must be running: `adaptmem serve --port 7800`
pipeline = Pipeline.from_daemon(
    documents=[...],
    daemon_url="http://127.0.0.1:7800",
    enable_nli=True,   # NLI verifier still runs in-process
)
verdict = pipeline.check("an answer", question="...")

The encoder hop crosses HTTP; cosine search and NLI verification stay local. pipeline.save() is not supported for daemon-backed pipelines (the model lives in the daemon). Pipeline.from_daemon calls /healthz first so a misconfigured URL fails loudly at construction time, not deep inside the first .check().
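The fail-at-construction behaviour described above follows a common pattern that can be sketched with the standard library alone. Only the `/healthz` path comes from the text above. `DaemonUnavailable`, `probe_health`, and `RemoteEncoderClient` are hypothetical names for this sketch, and it assumes the daemon answers a plain GET with HTTP 200 when healthy.

```python
import urllib.error
import urllib.request


class DaemonUnavailable(RuntimeError):
    """Raised when the health probe fails (hypothetical name for this sketch)."""


def probe_health(base_url: str, timeout: float = 2.0) -> None:
    """GET <base_url>/healthz and raise DaemonUnavailable unless it answers 200."""
    url = base_url.rstrip("/") + "/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                raise DaemonUnavailable(f"{url} returned HTTP {resp.status}")
    except (urllib.error.URLError, OSError) as exc:
        # Covers refused connections, DNS failures, timeouts, and 4xx/5xx replies.
        raise DaemonUnavailable(f"cannot reach {url}: {exc}") from exc


class RemoteEncoderClient:
    """Toy client that validates its URL in the constructor, not on first use."""

    def __init__(self, daemon_url: str) -> None:
        probe_health(daemon_url)  # fail loudly here if the URL is wrong
        self.daemon_url = daemon_url
```

Probing in the constructor turns a typo in the URL into an immediate, clearly labelled error instead of a timeout buried in the first request.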

How it compares to LLM-as-judge tools

The closest commercial / open-source category is "LLM-as-judge", where a separate large-model call grades each claim. claimcheck is the no-LLM-judge branch: a deterministic NLI cross-encoder plus a retrieval-augmented gate. The tradeoffs are real and shape what you should use it for.

| Feature | claimcheck | LLM-as-judge (Patronus, Galileo, CleanLab, Guardrails) |
| --- | --- | --- |
| Judge model | NLI cross-encoder (≈90M params, local) | LLM call (GPT-4 / Claude / open-source 7-70B) |
| Cost per claim | $0 (local CPU/GPU) | $0.001-0.05 (API token cost) |
| Latency per claim (CPU) | 50-200ms | 500-3000ms (network + LLM inference) |
| Determinism | yes: same input → same score | partial: depends on model temperature, version, drift |
| Vendor lock-in | none | judge model API, often a single provider |
| Audit trail | claim → cited chunk → entail/contradict score | claim → judge prompt + judge response (opaque reasoning) |
| Domain tuning | yes: retriever fine-tuned on your corpus (adaptmem) | usually no: judge is generic |
| Customising the judge | swap any HuggingFace cross-encoder | retrain or fine-tune the LLM (rarely practical) |
| Streaming | yes: sentence-by-sentence verdict (`check_stream`) | yes for some, but each judge call is heavier |
| Privacy | data stays local | claims and context sent to judge provider |
| Best at | budget-bound CI/middleware, per-domain accuracy, audit | general-purpose judgement, "did the model do something obviously bad" |

When claimcheck wins:

High-throughput middleware where per-claim cost matters (every chatbot turn checked).
Privacy-bound deployments (medical, legal, internal tools) where claims can't leave the perimeter.
Domain-specific RAG where a tuned retriever beats a generic LLM judge that doesn't know your jargon.
Streaming UX where users see the verdict as the LLM types.
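The streaming case can be illustrated by buffering tokens until a sentence boundary and scoring each finished sentence right away. This is a sketch of the general idea, not the implementation or signature of `check_stream`: `stream_verdicts` and the `score_sentence` callback are hypothetical, and the regex boundary rule is deliberately naive.

```python
import re
from typing import Callable, Iterable, Iterator

# Naive boundary: '.', '!' or '?' followed by whitespace.
# Abbreviations such as "e.g. " would be split incorrectly.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def stream_verdicts(
    tokens: Iterable[str],
    score_sentence: Callable[[str], float],
) -> Iterator[tuple[str, float]]:
    """Yield (sentence, score) as soon as each sentence in the stream is complete."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence, score_sentence(sentence)
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    tail = buffer.strip()
    if tail:
        yield tail, score_sentence(tail)


tokens = ["The user ", "prefers PostgreSQL. ", "It has ", "better JSON support."]
verdicts = list(stream_verdicts(tokens, lambda s: 1.0 if "PostgreSQL" in s else 0.0))
for sentence, score in verdicts:
    print(f"{score:.1f}  {sentence}")
```

Because the function is a generator, the first verdict is available before the LLM has finished producing the second sentence.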

When LLM-as-judge wins:

Open-ended quality assessment ("is this answer helpful, safe, polite?") that isn't really a hallucination check.
Few-shot domains with no labelled training queries to fine-tune the retriever.
One-off audits where a $0.05 model call is cheaper than building infrastructure.

The two are complementary, not exclusive. A reasonable production stack runs claimcheck in-line on every response (cheap, deterministic, blocks the worst), and an LLM judge in a sampled audit (expensive, broader, catches subtler issues).
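The inline-plus-sampled-audit split can be made concrete with a small sketch. Hash-based sampling is one common way to pick the audited subset reproducibly. The function names, the 5% rate, and the 100,000 claims/day volume are illustrative assumptions; the per-claim prices are the low and high ends of the range in the table above.

```python
import hashlib


def in_audit_sample(response_id: str, rate: float) -> bool:
    """Deterministically select roughly `rate` of responses by hashing their ID."""
    digest = hashlib.sha256(response_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < rate


def daily_judge_cost(claims_per_day: int, rate: float, cost_per_claim: float) -> float:
    """Expected LLM-judge spend when only a sampled fraction of claims is audited."""
    return claims_per_day * rate * cost_per_claim


# Illustrative numbers only: 100,000 claims/day with 5% sent to the LLM judge.
low = daily_judge_cost(100_000, 0.05, 0.001)
high = daily_judge_cost(100_000, 0.05, 0.05)
print(f"sampled audit:      ${low:.2f} to ${high:.2f} per day")
print(f"judging everything: ${100_000 * 0.001:.2f} to ${100_000 * 0.05:.2f} per day")
```

Hashing the response ID rather than calling a random number generator means the same response is always in or out of the sample, which keeps audits reproducible.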

Status

v0.1.1 shipped. Public API decided (Pipeline.from_corpus, check, check_stream, check(profile=True), save/load), 6 unit tests passing, mypy --strict clean, CI matrix on 3.10/3.11/3.12. The two siblings (adaptmem v0.4-shipped, halluguard v0.2-ext-shipped) are mature enough to compose; this repo just wires them.

Pre-PyPI: install via local editable until siblings publish.

pip install -e ../adaptmem -e ../halluguard -e ../claimcheck

License

MIT.

Release history

This version: 0.1.0 (Apr 27, 2026)

Download files

Source Distribution

claimcheck-0.1.0.tar.gz (11.3 kB)

Uploaded Apr 27, 2026 Source

Built Distribution

claimcheck-0.1.0-py3-none-any.whl (9.2 kB)

Uploaded Apr 27, 2026 Python 3

File details

Details for the file claimcheck-0.1.0.tar.gz.

File metadata

Download URL: claimcheck-0.1.0.tar.gz
Upload date: Apr 27, 2026
Size: 11.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for claimcheck-0.1.0.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `25b33b5d3a9ca879699c630b9aa38349de9e47f0f2e1b11f2ab096483d0edb88` |
| MD5 | `7b1689548d8930f888e8e90070bbca1b` |
| BLAKE2b-256 | `a8bc165d7b77f0b482a56ef389ab20ecf76f3382bce0ffc396eafa58ffbcaa1f` |

File details

Details for the file claimcheck-0.1.0-py3-none-any.whl.

File metadata

Download URL: claimcheck-0.1.0-py3-none-any.whl
Upload date: Apr 27, 2026
Size: 9.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for claimcheck-0.1.0-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `362a9adf129f68b2514981e631f0fa4a74b7f5e440f0cd4313b9daacaf6d9f9d` |
| MD5 | `764bde2f8ef8bf42ff5245ad23be01fd` |
| BLAKE2b-256 | `d0bc7bfa95c99e65577c355178a61a852b5ad1619eaa6e4c270fd332e7410952` |
