A bibliography health checker for LaTeX projects.
Project description
VeraCite
A lightweight, auditable tool for checking the accuracy and conformity of BibTeX/biblatex bibliographies in scientific articles — a deterministic check against hallucinated and mangled citations.
VeraCite improves the veracity of the bibliographic record in scientific papers. Where BibTeX is notoriously tolerant of imperfect entries, VeraCite surfaces errors for fast human verification and AI-tool integration, helping bibliographic records better satisfy the FAIR principles (persistent identifiers, shared standards, accurate metadata). Because every check is a rule or a comparison against an authoritative record — never a language model guessing — it is exactly the kind of ground-truth gate an AI writing assistant needs: it confirms, against Crossref/arXiv and friends, that a reference is real, correctly identified, and accurately transcribed, catching the fabricated DOI, the invented paper, and the subtly wrong year or author that LLMs introduce.
VeraCite is for authors, publishers, and AI assistants who want to vet a
bibliography before publication. It checks a .bib file along three levels:
- Syntax — does it conform to the BibTeX/biblatex datamodel?
- Semantics — is each entry consistent with the authoritative online record (Crossref, arXiv, INSPIRE-HEP, OpenAlex, Open Library)?
- Context — (with
--tex) is each work genuinely cited, and cited appropriately, in the manuscript?
It produces both a human-readable report and a machine-readable JSON record, each with clear descriptions of every issue and an overall 0–100 integrity score.
VeraCite never modifies your bibliography or your LaTeX — it only flags
issues, with the offending line and (where possible) a suggested fix, for an author
to inspect and correct. Every finding carries a stable rule category
and, for online checks, a verify: link, so the report is auditable rather than
a black box.
Why VeraCite
A bibliography is easy to get wrong and tedious to check by hand: a wrong year, a mistyped DOI, a page number that doesn't match the published article, a preprint that has since appeared in a journal, or a misplaced citation that points to the wrong work. These slip through because BibTeX accepts them without complaint — and checking each entry against the real record is slow and error prone. The same errors, plus outright fabricated references, now arrive in bulk from LLM-assisted drafting, where a confident-looking citation may name a paper that does not exist or attach a real DOI to the wrong work. VeraCite does that checking for you — deterministically, against the source of record — and is built to be:
- Simple to run — one small Python program you run from the command line. No account, no website, no setup; it works out of the box and needs no extra software installed.
- Trustworthy — it doesn't guess. Every issue it reports comes from an explicit rule or a comparison against an authoritative record, so you can see exactly why each was flagged — which is also what makes it a sound check on an AI assistant's output rather than another source of guesses. The optional AI relevance check is off by default.
- Standards-based — it checks your entries against the official BibTeX/biblatex rules, standard journal-name abbreviations, and validated identifiers (DOI, arXiv, ISBN, ISSN, ORCID).
- Private by default — built to help you fix your own bibliography before submission. Unless you opt in, it never reads your manuscript and sends nothing to any AI service, so it is safe to run on confidential drafts.
Auditable by design
VeraCite's checks are not arbitrary or hidden in model weights. Every rule is a small, deterministic piece of Python or generated data that an author, publisher or developer can read, correct, and extend. The four places to look:
| What | Where | How to inspect / extend |
|---|---|---|
| Static checks (the rule registry) | veracite/rules.py |
Each check is a function decorated @rule (per entry) or @file_rule (whole file) and appended to a registry the engine iterates. Add a check by writing one function; the module docstring marks it "the part meant to be read and edited." |
| Structural validity (legal & mandatory fields) | veracite/data/biblatex_datamodel.json, loaded by veracite/datamodel.py |
Generated from biblatex's own blx-dm.def by tools/gen_datamodel.py — not a hand-kept blocklist. Regenerate when biblatex updates. |
| Severity, grouping & descriptions (what's an error vs. a note, the syntax/semantic/context bucket, and the catalog text) | resolve_severity(), CATEGORY_GROUP, and CATEGORY_DOC in veracite/report.py; defaults in DEFAULT_SETTINGS["severity"] (veracite/config.py) |
Every finding carries a stable string category. List the whole catalog with --list-rules; re-rank any category to error/warning/note via the severity block in a settings file (see Configuration) — no code change needed. |
| Integrity score (the 0–100 roll-up) | integrity() in veracite/verify.py |
A transparent weighted formula over explicit counts — 0.50·verification + 0.20·PID + 0.15·DOI + 0.15·(1 − defects) — not a model output. |
Start from the catalog — the complete list of every finding category VeraCite can emit, with its default severity, group, what supersedes it, and a one-line description:
python -m veracite --list-rules # human-readable table (the audit sheet)
python -m veracite --list-rules json # same, machine-readable
category severity group superseded by description
------------------------ -------- -------- ------------- ----------------------------------------
duplicate error syntax - duplicate citation key or DOI ...
metadata_mismatch warning semantic - author/title/year/vol/pages/journal differ
preprint_superseded warning context - a published version now exists
title_case note semantic record layer title looks miscased (mostly UPPERCASE)
...
Getting Started
Point it at a .bib file; it reports structural, stylistic, and record-level
problems for a human to read and a script to parse. Add --tex to also check how
the bibliography is cited.
python -m veracite --bib refs.bib # check every entry; reads no .tex
python -m veracite --bib refs.bib --tex paper/ # check only cited entries
python -m veracite --bib refs.bib --offline # static checks only (no network)
python -m veracite --bib refs.bib --tex paper/ --llm # + LLM relevance sweep
python -m veracite --bib refs.bib --skipnotes # warnings and errors only
python -m veracite --bib refs.bib --sort severity # global triage list, errors first
python -m veracite --bib refs.bib --json report.json
python -m veracite --list-rules # the rule catalog / audit sheet
Installed (pip install .) it also exposes a veracite command.
--bib FILE selects the bibliography; if omitted it is auto-discovered under the
cwd. VeraCite runs in one of two modes:
- bibliography-only (no
--tex): every entry is checked. No.texfile is ever read — the default run never touches your manuscript, so it is safe on confidential drafts. - citations (
--tex PATH, a file or directory, repeatable): only the entries cited by those sources are resolved online and (with--llm) rated; uncited entries are noted and skipped. A cited key with no.bibentry is an error.
.tex is read only when you ask for it with --tex; there is no silent
auto-discovery. Online checks are on by default; --offline makes the run fully
offline. The LLM relevance sweep is off unless --llm is given, and --llm
requires --tex (it needs the citation context). Every layer runs per entry
in bibliography order, so the report is a single list in .bib order — each
entry's findings printed once, followed by a file-level group and the summary.
Exit status is non-zero when any error is found, so it can gate CI.
Message types
The three levels mean different things and call for different action:
[ERROR]— must fix. A structural/syntax error that stops BibTeX from parsing (unbalanced braces, a missing=, an unknown entry type, a dropped reference); a duplicate; a retraction; a dead DOI; an id that resolves to a different paper (first author and title both differ); or an LLM-flagged clearly-wrong paper.[WARN]— investigate. A discrepancy between the record and the bib that may or may not be wrong: an author/title/given-name/year/volume/pages field differs from the id-resolved record, a non-standard journal abbreviation, a preprint with a published version, a linked erratum, or an LLM relevance ≤3. Open theverify:link and decide.[note]— stylistic, or filtered by biblatex anyway: casing, brace-protection, dashes, a name month, an invalid-for-biblatex field, an abbreviated given name, or a registry-parity suggestion. Hide with--skipnotes(still counted in the summary).
Example output
Findings are grouped into one block per bibliography entry, in .bib order. A
block opens with a header line that identifies the record and the verification status,
then lists each finding indented beneath it (in severity order):
[ 8/83] amo2009 @article line 96 VERIFIED (confidence 0.75); https://doi.org/10.1038/nature07640
[WARN] metadata_mismatch (line 98): [crossref] year differs: bib=2009, record=2010
[note] style (line 101): month 'may' is a name; biblatex will not sort it (suggested: 'may' -> '5')
The header carries everything that identifies and verifies the record (an optional
[i/N] progress counter, the key, @type, line, status, and confidence with a
verify: link); a clean VERIFIED entry with no other findings prints no block at
all. Each finding line follows one fixed shape:
[SEVERITY] category (line N): message (suggested: 'current' -> 'fixed')
[SEVERITY]is[ERROR]/[WARN]/[note].categoryis the stable rule code (see--list-rules); every finding has one — none falls back to a bare layer name.(line N)is the offending field's line in the.bib.- a fixable finding carries the advisory edit inline as
current -> suggested. - a message never wraps: any embedded newline is folded, so one finding = one line.
What it checks
Checks run in layers, syntax first.
-
Syntax — structural validity, so a file BibTeX cannot parse is never reported as healthy. Unbalanced braces, a stray extra
}, a field missing its=, an unknown entry type, a duplicate field, a file-level brace imbalance, and a cited key with no entry are each errors. The parser recovers at the next@entry{, so one broken entry does not hide the others.@stringabbreviations (both{…}and(…)delimited) and#concatenation are expanded, so ajournal = prbmacro is checked by its full value, not the bare macro name. -
Static (offline) — a rule registry (
rules.py); add a check by writing a function and decorating it@rule/@file_rule. Covers missing fields; biblatex field validity derived from the standard datamodel (see below); title casing/brace-protection; trailing periods;and others; arXiv-id consistency; page/dash/numpages sanity; encoding; DOI format; duplicate keys/DOIs; and file-wide consistency. Uncited entries are noted. -
Record (online) — resolve each entry by DOI (Crossref) or arXiv id and flag disagreement with the record. The authoritative record is the canonical reference: VeraCite never rewrites your
.bib, but each flagged field carries a suggested edit that conforms the bib to the record (e.g.year (suggested: '2009' -> '2010')), so the fix direction is always toward the registry — unless the record itself is clearly broken. Severity follows render-impact: a field that changes the rendered citation (title, author, year, journal, volume, issue, pages) is a warning; a purely stylistic difference (an abbreviated given name, casing) is a note. None is a wrong-paper claim — name folding handles suffixes (Jr/III), particles, collaborations, and abbreviated given names so these don't misfire, and findings show the original, readable names. A journal name matches the record when it is a known abbreviation (a small curated physics table inveracite/data/) or a valid ISO-4 abbreviation (period-insensitive, soPhys. Rev. BandPhys Rev Bboth matchPhysical Review B); only a genuinely non-standard journal string warns. The one identity error is when the first author and the title both differ strongly: the id likely resolves to a different paper (a copy-pasted DOI). Averify:link is printed for every entry with an online finding. -
Status (online) — retraction (via OpenAlex / Retraction Watch), linked errata/corrections/comments/replies, and preprints with a published version.
-
Cross-source (online) — when more than one authoritative source resolves an entry (Crossref, INSPIRE-HEP for physics, arXiv, Open Library for books), their records are compared against each other. A data difference (year, volume, issue, pages, or a genuinely different journal) is a warning (
source_conflict) naming both sources. Purely stylistic differences — title casing, or a full journal title vs its ISO-4 abbreviation — are not flagged, since both forms are valid. This surfaces stale or corrupted registry metadata the single-source comparison cannot see. -
Verification (online) — each entry gets one of three statuses with a confidence (0–1, a deterministic function of which sources agreed, not a model output). VERIFIED — the id resolved and the first author and title match; confidence reflects corroboration: 1.0 (clean match across ≥2 sources), ~0.95 (clean single source), 0.75 (a field disagrees), 0.70 (sources disagree, or only arXiv confirms). UNVERIFIED — could not confirm: no identifier, no record returned, or a DOI that did not resolve (also an error). MISMATCH — it resolved but the record's identity disagrees (the id may point at a different paper). If an entry carries no identifier at all (no DOI and no arXiv id), VeraCite searches for one — first Crossref (title + first author, corroborated by journal or ±1-year), then, failing that, arXiv by title (title + first-author surname; common for ML/physics works cited by venue only). On a strong match it verifies the entry and reports the identifier to add; when an arXiv hit links a published DOI (its
<arxiv:doi>), that DOI is preferred and suggested instead of the bare preprint id. This search is a last resort: an entry that already carries a DOI or arXiv id is resolved against that and the search never runs. A post-2005 article with no findable identifier is flagged; pre-2005 work is not penalized; arXiv ids and ISBNs count as PIDs. -
Integrity score (online) — a summary roll-up: counts of verified (and how many carry a caveat), unverified, mismatch, DOI coverage over eligible (post-2005) articles, PID coverage, and a 0–100 integrity score — a transparent weighted blend of verification rate (50%), PID coverage (20%), DOI coverage (15%), and freedom from integrity defects (15%). Printed beneath the verdict.
-
LLM (optional,
--llm, needs--tex) — for each cited entry, a language model rates relevance (1–5) from the abstract and the surrounding sentences, and flags a clear wrong paper. For a grouped citation (\cite{a,b,c}) it also sees the co-cited references and drops a low-relevance (≤3) odd-one-out a further point, surfacing an inappropriate citation hidden in a list of relevant ones. A wrong-paper flag is an error; relevance ≤3 a warning; 4–5 leaves a[llm] context OK N/5note. Because an LLM call costs tokens, every rated citation always shows exactly one line in the report (clean pass, weak, wrong paper, or rating-unavailable) rather than vanishing silently; the clean-pass note is hidden by--skipnoteslike any other note. Findings are worded as tentative, abstract-only opinions to verify, never authoritative judgements. The provider is pluggable (llm.py), but for now the only supported backend is Claude Code (theclaudeCLI, using your existing login), and it defaults to Claude Haiku for token efficiency — fast and inexpensive for a per-citation rating. Privacy:--llmsends those cited sentences to the provider, so it is off by default and prints a warning — do not use it on a confidential manuscript.
Identifier formats (DOI, arXiv, ISBN, ISSN, ORCID) are checked offline
by their check digits. An entry with a structural syntax error is reported, and
the rest of its checks (record, status, cross-source, LLM) are skipped until it
parses cleanly — comparing a garbled parse against a record only yields false
mismatches. When --tex is given, a multi-key \cite{} group that is not in
chronological order gets an advisory note (some bibliography styles cite the
earliest work first); it is never an error, since grouped-citation order is a
style choice, not a standard.
Machine-readable report (--json)
--json FILE writes the report as NDJSON (newline-delimited JSON): one
self-contained JSON record per line. Most lines are one bibliography entry,
keyed by its citation key and carrying everything about it — which phases have
been computed (see Checkpointing), its
verification status/confidence, the verify link, its identifiers, the
matched canonical_record, the sources that resolved it, and its issues (that
entry's findings). Two reserved records close the file: "<file>" (file-level
findings — duplicates, brace balance, dropped cited keys) and "<summary>" (the
integrity roll-up):
{"key": "amo2009", "phases": {"offline": true, "online": true, "llm": false},
"status": "VERIFIED", "confidence": 1.0, "verify": "https://doi.org/10.1038/nphys1364",
"identifiers": {"doi": "10.1038/nphys1364", "arxiv": null, "isbn": null},
"sources": ["crossref", "inspire"], "canonical_record": {"title": "...", "year": 2009},
"issues": []}
{"key": "<file>", "issues": []}
{"key": "<summary>", "veracite_version": "0.1.1",
"summary": {"checked": 152, "verified": 151, "verified_with_caveat": 8,
"unverified": 1, "mismatch": 0, "doi_coverage": 0.94, "pid_coverage": 0.97,
"integrity_score": 97}}
Read it line by line (for line in open(f): json.loads(line)); the "<summary>"
record holds the metrics and the veracite_version that produced the report (so a
saved or shared report is traceable to the exact tool revision — the version is also
printed on the terminal BIBLIOGRAPHY HEALTH line). Every other non-reserved record
is one reference. Under
--offline there is no online verification, so the "<summary>" record carries the
offline mode and finding counts with a null score ({"mode": "offline", "integrity_score": null, ...}) and each entry appears with phases.offline = true,
the rest false, and a null status/canonical_record — enough for a later online
run to resume it, never a fabricated score.
The NDJSON shape is what makes checkpointing cheap and crash-safe: a finished entry is one appended line, so an interrupted run leaves every prior line intact and loadable (see below).
Using VeraCite as a verification step for an AI assistant
VeraCite is deliberately read-only: it never edits your .bib or .tex. That
is what lets it serve as an independent verification gate in a
human-supervised AI editing loop — the checker has to be separate from whatever is
doing the writing, including an LLM. Applying the suggested edits is left to a
supervised tool (e.g. an AI assistant the author is driving), so the deterministic
checker and the judgement-applying editor stay cleanly separated. The NDJSON report
is the integration surface, designed to be consumed by a program, not just read:
- Every finding is grounded, not generated. A
metadata_mismatch,dead_doi,id_resolves_wrong_record, or anUNVERIFIEDstatus comes from a rule or a comparison against Crossref/arXiv/INSPIRE/OpenAlex, each with averify:link the agent can check independently. Theconfidenceis a deterministic function of which sources agreed — not a model output — so an agent can trust it to gate its own edits without compounding hallucination. - Findings route by
group, not by learning every category. Each issue carries agroupofsyntax/semantic/context:syntaxis the written form (safe, mechanical fixes);semanticis metadata that should be reconciled against the source of record before editing;contextneeds judgement. An agent can hold three policies instead of ~25 categories. - Fixable findings carry a structured
suggestedpatch —{"field": ..., "from": ..., "to": ...}, separated from the prosemessage— so a tool can apply an edit as data rather than parsing English. The record is the canonical reference, sotois the value that conforms the bib to it. - The catch is the point. A hallucinated reference surfaces as
UNVERIFIEDwith no findable identifier; a real DOI on the wrong paper asid_resolves_wrong_record(statusMISMATCH); a corrupted DOI/ISBN/arXiv id fails its offline check digit; a subtly-wrong year/venue/author as ametadata_mismatchwith the registry value to adopt. These are exactly the failure modes LLM-drafted bibliographies introduce.
Schema stability: the entry-record fields (status, confidence, phases,
identifiers, canonical_record, sources, issues) and each issue's
severity / group / category / suggested shape are the supported contract;
--list-rules json enumerates the full category vocabulary, and the
veracite_version on the "<summary>" record pins the producing revision so a
consumer can detect a contract change.
Checkpointing and phased resume
For a large bibliography an online run can take a long time (a few paced network
calls per entry), so a crash partway through should not throw the work away. When
you pass --json report.ndjson, VeraCite appends each entry's record as it
finishes — an O(1) write, so checkpointing after every entry stays cheap even at
10k references and a crash loses at most the entry in flight. It can then resume
from that file:
python -m veracite --bib refs.bib --offline --json report.ndjson # phase 1: fast, no network
python -m veracite --bib refs.bib --json report.ndjson # phase 2: resume, resolve online
python -m veracite --bib refs.bib --tex p/ --json report.ndjson --llm # phase 3: add LLM ratings
Point VeraCite at an existing report and it loads it, replays the work already
saved, and runs each entry only for the checks it does not yet have — so a job
can be built up in phases or simply restarted after an interruption. A re-run
appends a fresh record per entry (the last line for a key wins on load); at the
end of a clean run the file is compacted once — rewritten atomically with one
line per key in bibliography order. A partial line from a crash mid-write is simply
skipped on load. It prints a NOTE that it is resuming; choose a different --json
filename to run from scratch. The update rule per entry:
- offline (the static/syntax checks) always re-runs — it is cheap and needs no network.
- the online layer runs only for entries not already resolved online; an already-resolved entry is reused (its record, status and findings), no network.
--llmrates only entries not already rated. Because the rating needs the work's abstract — an LLM input that is not persisted — rating an entry also re-runs its online layer; an entry already rated is reused, spending no tokens.
VeraCite also warns up front when a run looks expensive: a bibliography of 200+
entries run online without --json prints a recommendation to add it (so the run
is saved and resumable), and --llm prints how many entries it will rate (it uses
LLM tokens). Both are warnings only — the run proceeds, so scripts and CI are
unaffected.
Configuration
VeraCite runs with no configuration. Optional settings are read from the first
of ./veracite.json, ~/.config/veracite/settings.json, ~/.veracite.json, or
a --settings FILE path. None is shipped, so the tool carries no personal data.
Recognized keys (all optional):
{
"contact_email": "you@example.org",
"llm_provider": "claude",
"llm_models": {"claude": "claude-haiku-4-5-20251001"},
"document_context": "a paper on <your topic>",
"protected_terms": ["Rydberg", "Yb", "Pulser"],
"severity": {"preprint_superseded": "error", "biblatex_validity": "note"},
"request_delay": 0.2,
"request_timeout": 20,
"endpoints": {"crossref_work": "https://api.crossref.org/works/{doi}"}
}
contact_emailis added to the User-Agent (Crossref/OpenAlex "polite pool"); may also be set withVERACITE_CONTACT_EMAIL.llm_providerselects the--llmbackend. For now the only supported provider isclaude(Claude Code, via theclaudeCLI and your existing login).llm_modelspins the model used per provider. The default is Claude Haiku (claude-haiku-4-5-20251001) — chosen for token efficiency, ample for a per-citation relevance rating. It is a specific, pinned id for reproducible ratings; if that model is ever retired,--llmwill reportrating unavailable: claude CLI failed (model '...')— setllm_modelsto a current id to fix it, no code change needed. Point it at a larger model (e.g. Sonnet) for tougher calls.severityre-ranks any finding category toerror/warning/note.protected_termsis the project's must-stay-capitalized title terms.request_delay/request_timeoutset API pacing;--delay/--timeoutoverride them. Pacing is per service and time-based: each external service has a minimum interval (request_delay, default 0.2 s; arXiv is paced at 3 s) and a request waits only the remainder of that interval — time already spent on other services or the rest of the pipeline counts, and a service whose interval has elapsed proceeds immediately. So an entry resolved by Crossref never pays an arXiv delay, and arXiv's slow limit spaces out across many entries rather than blocking each one. Only a real outbound request ever waits.endpointsrepoints the external API URLs if a service moves.
Layout
veracite/ package: config, parser, normalize, datamodel, report,
rules, record, llm, cli
tools/ gen_datamodel.py (regenerates the datamodel JSON)
tests/ pytest suite + .bib fixtures
Requirements
- Python 3.8+. Uses
requestsif present, else the stdliburllib. - Network (for the online layers):
api.crossref.org,export.arxiv.org,api.openalex.org,api.semanticscholar.org,inspirehep.net(physics),openlibrary.org/googleapis.com(ISBN). All optional and degrade gracefully — a source that fails to respond is reported as "could not retrieve", never a crash, and--offlineskips them all. - For
--llmwith the default provider: theclaudeCLI onPATH, logged in (runclaudeonce and sign in; it needs a Claude account).--llmprobes the provider before the run and, if it is missing or not logged in, stops up front with how to fix it rather than failing per entry. Everything except--llmworks with no account.
Known limitations
VeraCite compares against registry metadata; errors in free text or in fields no registry encodes are out of reach. Correction/erratum and published-version coverage is best-effort. "No problem found" means no problem in the checkable fields, not that every field was verified.
Tests
pip install pytest
python -m pytest
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file veracite-0.1.1.tar.gz.
File metadata
- Download URL: veracite-0.1.1.tar.gz
- Upload date:
- Size: 171.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b5c8756d3c71e7ccdf65d0b41650b2450afb8f1f4d50ab4ef91c3601e4553d36
|
|
| MD5 |
304d9cfa420bed99ae9c66d6062df574
|
|
| BLAKE2b-256 |
7d66dd147dbc9d8adcc301783ea8a9149123e2b2fa159b86ce4ce00fb409cb1a
|
File details
Details for the file veracite-0.1.1-py3-none-any.whl.
File metadata
- Download URL: veracite-0.1.1-py3-none-any.whl
- Upload date:
- Size: 132.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9ea762435536e3326ec6ce60e6f97d1b5281aa0fcda6800069d1b85dd449ad1d
|
|
| MD5 |
cb14cc1b22a9c6b53f6afa1b38e81f92
|
|
| BLAKE2b-256 |
73265d2f2739701f637fd86a74759afaa6c9928a896e2c561a8e91d01f64771c
|