truthcheck
Open-world fact verification for AI claims, the web-search complement to halluguard.
Status: v0.1, working. The pipeline ships with an Exa search backend, an NLI verifier (with a lexical fallback when sentence-transformers is not installed), a SQLite cache, and an atomic claim splitter. Sibling to adaptmem, halluguard, and claimcheck.
The problem this solves
halluguard answers: "Is this claim supported by the documents I gave you?"
That's enough when you control the corpus (your shop's catalog, your company's internal docs, your codebase). It is not enough when:
- An LLM cites a figure ("Türkiye nüfusu 85 milyon", Turkish for "Türkiye's population is 85 million").
- An LLM dates an event ("Bitcoin halving was in May 2024").
- An LLM names a person ("Alice Novak is the lead developer of Project X").
- An LLM repeats a recent news fact ("OpenAI released o4-mini in March 2026").
Halluguard can't answer because the ground truth lives on the open web,
not in the user's corpus. That's truthcheck's job.
Design constraints
- Stay composable. Truthcheck is a sibling, not a replacement.
  - `halluguard.Guard.check(answer)` → corpus-grounded verdict
  - `truthcheck.WebFactChecker.check(claim)` → open-web verdict
  - Caller decides which to invoke (or both, in series).
- Never silently dilute halluguard's positioning. Halluguard says "no LLM, no internet, deterministic." Truthcheck explicitly says "yes LLM (probably), yes internet, probabilistic." Honest naming.
- Backend-agnostic. Brave Search, Exa, Bing, DuckDuckGo, your internal corporate Confluence + Notion, anything that returns ranked snippets should plug in.
- Cost-aware. Web search APIs cost money. Truthcheck must:
  - tell the caller a USD estimate per claim before issuing requests
  - cache aggressively (claim text → result, TTL configurable)
  - support `dry_run=True` to preview without API spend.
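The backend-agnostic and cost-aware constraints above can be sketched together as a small protocol. Note that `Snippet`, `SearchBackend`, and `cost_estimate_usd` are illustrative names for this sketch, not truthcheck's actual interface:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Snippet:
    """One ranked search result (illustrative shape)."""
    url: str
    text: str
    score: float


class SearchBackend(Protocol):
    """Anything that returns ranked snippets can plug in."""
    def search(self, query: str, n: int) -> list[Snippet]: ...
    def cost_estimate_usd(self, n_queries: int) -> float: ...


class CannedBackend:
    """Toy backend: flat per-request price, canned results."""
    def __init__(self, price_per_request: float = 0.0005):
        self.price = price_per_request

    def search(self, query: str, n: int) -> list[Snippet]:
        return [Snippet("https://example.org", query, 1.0)][:n]

    def cost_estimate_usd(self, n_queries: int) -> float:
        return n_queries * self.price


backend: SearchBackend = CannedBackend()
# The caller sees the estimate before any request is issued.
estimate = backend.cost_estimate_usd(3)
```

With this shape, a `dry_run=True` path only ever calls `cost_estimate_usd`, never `search`.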
Sketch of the API
```python
import os

from truthcheck import WebFactChecker

checker = WebFactChecker(
    backend="exa",  # default; "brave" also supported
    api_key=os.environ["EXA_API_KEY"],
    trusted_domains=["wikipedia.org", "*.gov", "*.edu"],
    cache_dir="~/.cache/truthcheck",
)

verdict = checker.check(
    claim="Türkiye nüfusu 85 milyon",
    n_sources=5,
)

# Verdict {
#   status: SUPPORTED | UNSUPPORTED | CONTRADICTED | INCONCLUSIVE,
#   confidence: 0.0 to 1.0,
#   sources: [
#     Source(url="https://www.worldometers.info/...", snippet="...", score=0.91),
#     Source(url="https://en.wikipedia.org/wiki/Demographics_of_Turkey", ...),
#     ...
#   ],
#   atomic_claims: ["country: Türkiye", "metric: population", "value: 85 million"],
#   cost_usd: 0.0007,
#   cache_hit: False,
# }
```
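A caller typically collapses the four statuses into an action. A minimal sketch, with a stand-in `Verdict` dataclass since the real class ships inside truthcheck, and a `triage` helper that is purely illustrative:

```python
from dataclasses import dataclass
from enum import Enum, auto


class Status(Enum):
    SUPPORTED = auto()
    UNSUPPORTED = auto()
    CONTRADICTED = auto()
    INCONCLUSIVE = auto()


@dataclass
class Verdict:
    """Stand-in for truthcheck's Verdict; field names mirror the sketch above."""
    status: Status
    confidence: float
    cost_usd: float = 0.0
    cache_hit: bool = False


def triage(v: Verdict, threshold: float = 0.8) -> str:
    """Collapse a verdict into an action for the calling app."""
    if v.status is Status.SUPPORTED and v.confidence >= threshold:
        return "accept"
    if v.status is Status.CONTRADICTED and v.confidence >= threshold:
        return "flag"
    # UNSUPPORTED, INCONCLUSIVE, or low confidence: send to a human.
    return "review"


accepted = triage(Verdict(Status.SUPPORTED, 0.92))
escalated = triage(Verdict(Status.INCONCLUSIVE, 0.95))
```

The threshold is a caller-side policy knob, which is why truthcheck reports confidence rather than making the accept/reject decision itself.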
v0.1 decisions (closed)
- Default backend: Exa (Brave's free tier was removed)
- Splitter: regex-based, deterministic; spaCy/LLM upgrade planned for v0.2
- Verifier: NLI cross-encoder, with a lexical fallback when sentence-transformers is absent
- Cache: SQLite under `~/.cache/truthcheck`
- Contradiction: return INCONCLUSIVE and surface all sources
- Recency: an `as_of` timestamp stamped on every verdict
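The lexical fallback named above could look roughly like this: score a claim against a snippet by the fraction of claim tokens the snippet contains. This is a sketch of the idea only; the shipped fallback may tokenize or score differently:

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercase word tokens; crude but deterministic."""
    return set(re.findall(r"\w+", text.lower()))


def lexical_support(claim: str, snippet: str) -> float:
    """Fraction of claim tokens that appear in the snippet.
    Used only when no NLI model is installed (illustrative)."""
    claim_toks = _tokens(claim)
    if not claim_toks:
        return 0.0
    return len(claim_toks & _tokens(snippet)) / len(claim_toks)


score = lexical_support(
    "Bitcoin halving was in May 2024",
    "The fourth Bitcoin halving occurred in April 2024.",
)
```

The deterministic fallback keeps the package usable without heavyweight model downloads, at the cost of missing paraphrase and negation that a cross-encoder would catch (here it scores the contradicting snippet fairly high).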
Open for v0.2
- Turkish / multilingual NLI model
- spaCy or a small LLM for compound-claim splitting
- DDG / SearXNG backend (no API key)
- Redis cache backend
Composition with the cluster
```
answer + corpus → halluguard.Guard.check()
        │
        ▼
   SUPPORTED? ── yes ──→ trust = high, done
        │
        no (claim isn't in the corpus)
        │
        ▼
answer claims → truthcheck.WebFactChecker.check()
        │
        ▼
   open-web verdict
```
Bigger picture: the cluster gives the consumer a "belge → halluguard, dünya → truthcheck" ("document → halluguard, world → truthcheck") pipeline, so closed-world and open-world claims can both be verified through one call site (a future helper in claimcheck).
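The flow in the diagram can be sketched as a single function. The halluguard and truthcheck objects are stubbed here, and the `supported` attribute and `check` signatures are assumptions for illustration, not the libraries' real APIs:

```python
def verify(answer: str, corpus_guard, web_checker) -> dict:
    """Closed-world check first; fall through to the open web."""
    corpus_verdict = corpus_guard.check(answer)
    if getattr(corpus_verdict, "supported", False):
        return {"trust": "high", "via": "halluguard"}
    # Claim isn't settled by the corpus: escalate to web verification.
    return {
        "trust": "open-web",
        "via": "truthcheck",
        "verdict": web_checker.check(claim=answer),
    }


class StubGuard:
    """Stand-in for halluguard.Guard (hypothetical shape)."""
    def check(self, answer):
        return type("V", (), {"supported": False})()


class StubChecker:
    """Stand-in for truthcheck.WebFactChecker (hypothetical shape)."""
    def check(self, claim):
        return {"status": "INCONCLUSIVE"}


result = verify("Bitcoin halving was in May 2024", StubGuard(), StubChecker())
```

The caller-side branching keeps the two packages composable without either one importing the other, matching the "sibling, not a replacement" constraint.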
What this repo is NOT
- Not a replacement for halluguard. Halluguard handles the case where you have a corpus. Don't use truthcheck where halluguard fits.
- Not a search engine. It's a verification layer that uses search engines as a substrate. Bring your own backend.
- Not a fact-database. It doesn't ship knowledge graphs. Every verdict is computed at request time against live sources.
- Not a guarantee. Open-world fact-checking is an active research area; FEVER state-of-the-art is around 75% F1. Truthcheck reports confidence, never asserts truth.
License
MIT
Install
```
pip install "truthcheck[brave]"   # Brave backend
pip install "truthcheck[nli]"     # NLI verifier (sentence-transformers)
```
Set the `EXA_API_KEY` or `BRAVE_API_KEY` environment variable before use.
This is a draft. Atakan to review, sharpen the open questions, and decide whether to push public + commit to the v0.1 milestone.
File details
Details for the file nakata_truthcheck-0.1.0.tar.gz.
- Size: 16.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4

| Algorithm | Hash digest |
|---|---|
| SHA256 | 0f1278db1bb4fd6ef62be26897a734ab4902f828fa1923036bb48d5bdc8273a2 |
| MD5 | fc79fd79044ea8870bf90c58644128a6 |
| BLAKE2b-256 | 82870bd1bc94f5b898b9396007da39fccd97384b0cff80df36cafade9f3c44df |
File details
Details for the file nakata_truthcheck-0.1.0-py3-none-any.whl.
- Size: 16.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4

| Algorithm | Hash digest |
|---|---|
| SHA256 | 0d9ddebac601dddea81438aefc3f759c608b70e2044d9c9a882aa614edecfbd6 |
| MD5 | ab18413af59d29324c81293ab9549c7c |
| BLAKE2b-256 | c2b8a25f742da4a1d74732f54a2f79163dc298b1c40becc2c8ff41f39f4bbbf6 |