Citation verification tool: existence, URL liveness, and content relevance checks
CiteSentry
Citation verification tool: check whether references actually exist, whether their URLs are live, and whether the content is relevant to the citation.
What it does
Three checks per reference:
- Existence — resolves against OpenAlex, Crossref, Semantic Scholar, arXiv, and domain-specific databases (PubMed for biomedical, DBLP for CS)
- URL liveness — HTTP HEAD/GET check; classifies 2xx/4xx/timeout/bot-protection
- Content relevance — LLM-backed check comparing fetched content to the cited title/topic (requires DEEPSEEK_API_KEY for CLI use)
Verdicts: VERIFIED, METADATA_MISMATCH, DEAD_URL, CONTENT_DRIFT, NOT_FOUND, UNRESOLVABLE.
NOT_FOUND means "could not verify — likely fabricated, needs manual review." Never "fake."
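The URL liveness classes listed above can be illustrated with a small, self-contained sketch. It is not CiteSentry's actual code: the function name `classify_liveness`, the `"other"` label, and the bot-protection heuristic (a 403/429/503 answered by a Cloudflare front end) are assumptions made for the example.

```python
from typing import Mapping, Optional

# Labels mirror the classes named in the README; "other" is an added catch-all.
OK = "2xx"
CLIENT_ERROR = "4xx"
TIMEOUT = "timeout"
BOT_PROTECTION = "bot-protection"
OTHER = "other"


def classify_liveness(
    status: Optional[int],
    headers: Optional[Mapping[str, str]] = None,
) -> str:
    """Map an HTTP outcome to a liveness class.

    status=None stands for a request that timed out before any response.
    Header names are expected in lower case.
    """
    if status is None:
        return TIMEOUT
    if 200 <= status < 300:
        return OK
    server = (headers or {}).get("server", "").lower()
    # Assumption: a block served by a CDN front end is bot protection,
    # not evidence that the page is gone.
    if status in (403, 429, 503) and "cloudflare" in server:
        return BOT_PROTECTION
    if 400 <= status < 500:
        return CLIENT_ERROR
    return OTHER
```

Separating bot protection from plain 4xx matters because a blocked request says nothing about whether the cited page still exists, so it should not be treated as a dead link.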
Install
pip install citesentry # basic install
pip install "citesentry[cli-llm]" # + DeepSeek for relevance checks
For development:
git clone https://github.com/mkassaf/CiteSentry
cd CiteSentry
pip install -e ".[dev]"
CLI usage
# Check a BibTeX file
citesentry check refs.bib
# Check a RIS/CSL-JSON/NBIB/plaintext file
citesentry check refs.ris
citesentry check refs.json
# Read from stdin
cat refs.txt | citesentry check -
# Single ad-hoc reference
citesentry check-one "Vaswani et al. (2017). Attention is all you need. NeurIPS."
# Output formats: table (default), json, md
citesentry check refs.bib --format json
citesentry check refs.bib --format md > report.md
# Skip checks
citesentry check refs.bib --no-llm # skip relevance (no API key needed)
citesentry check refs.bib --no-url # skip URL liveness
# Domain adapters (auto by default)
citesentry check refs.bib --domain pubmed # force PubMed only
citesentry check refs.bib --domain none # disable domain adapters
# Override plaintext style detection
citesentry check refs.txt --style ieee
Exit code is non-zero if any reference is NOT_FOUND or DEAD_URL (useful in CI).
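For pipelines that post-process the verdicts themselves, the exit-code rule can be mirrored in a few lines. This is a sketch only: the helper name `exit_code` is hypothetical, and the value 1 is an assumption, since the README states only that the code is non-zero.

```python
from typing import Iterable

# Verdicts that the README says produce a non-zero exit code.
FAILING_VERDICTS = {"NOT_FOUND", "DEAD_URL"}


def exit_code(verdicts: Iterable[str]) -> int:
    """Return 1 if any verdict is failing, else 0.

    The exact non-zero value is an assumption made for this example.
    """
    return 1 if any(v in FAILING_VERDICTS for v in verdicts) else 0
```

Under this rule, verdicts such as METADATA_MISMATCH or CONTENT_DRIFT are reported but do not fail the build.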
MCP server (Claude Desktop / Claude Code)
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "citesentry": {
      "command": "citesentry-mcp",
      "env": {
        "CITESENTRY_MAILTO": "you@example.com",
        "DEEPSEEK_API_KEY": "sk-..."
      }
    }
  }
}
Or with uvx (no prior install needed):
{
  "mcpServers": {
    "citesentry": {
      "command": "uvx",
      "args": ["--from", "citesentry", "citesentry-mcp"],
      "env": { "CITESENTRY_MAILTO": "you@example.com" }
    }
  }
}
MCP tools exposed:
- verify_reference(reference, check_url, check_relevance) — single reference
- verify_reference_list(references, format, check_url, check_relevance) — batch
- check_url_alive(url) — standalone URL check
Claude Code (CLI)
Register the server once:
claude mcp add citesentry \
  -e CITESENTRY_MAILTO=you@example.com \
  -- uvx --from citesentry citesentry-mcp
Then in any Claude Code session, ask naturally:
"Use citesentry to verify this reference: Vaswani et al. (2017). Attention is all you need. NeurIPS."
"Check whether all the references in refs.bib are real."
"Is https://arxiv.org/abs/1706.03762 still live?"
Any MCP-compatible agent (Python example)
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch the CiteSentry MCP server as a subprocess over stdio.
server = StdioServerParameters(
    command="uvx",
    args=["--from", "citesentry", "citesentry-mcp"],
    env={"CITESENTRY_MAILTO": "you@example.com"},
)


async def main():
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            # The MCP handshake must complete before any tool call.
            await session.initialize()
            result = await session.call_tool(
                "verify_reference",
                {"reference": "Vaswani et al. (2017). Attention is all you need. NeurIPS."},
            )
            # The first content item carries the result as text.
            print(result.content[0].text)


asyncio.run(main())
Environment variables
| Variable | Default | Description |
|---|---|---|
| CITESENTRY_MAILTO | citesentry@example.com | Polite email for OpenAlex/Crossref API |
| DEEPSEEK_API_KEY | (none) | Required for relevance checks in CLI |
| DEEPSEEK_BASE_URL | https://api.deepseek.com/v1 | OpenAI-compatible endpoint |
| DEEPSEEK_MODEL | deepseek-chat | Model for relevance judgments |
Supported input formats
- BibTeX (.bib) — via bibtexparser
- RIS (.ris) — via rispy; covers Zotero, Mendeley, EndNote, Web of Science
- CSL JSON (.json) — Zotero exports
- PubMed NBIB (.nbib)
- DOI list (.txt with one DOI per line)
- Plaintext reference sections — IEEE, APA, Vancouver, MLA, Chicago; auto-detected
- PDF (.pdf) — extracts reference section text via pdfminer.six
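As a rough illustration of how the extensions above map to formats, the sketch below guesses a format from a file name. How CiteSentry actually dispatches is not documented here, so the function name `guess_format` and the format labels are hypothetical.

```python
from pathlib import Path

# Extension-to-format table taken from the list above; labels are illustrative.
FORMATS = {
    ".bib": "bibtex",
    ".ris": "ris",
    ".json": "csl-json",
    ".nbib": "nbib",
    ".pdf": "pdf",
}


def guess_format(path: str) -> str:
    """Guess the input format from the file extension.

    A .txt file may hold a DOI list or a plaintext reference section, so it is
    reported as ambiguous; telling the two apart needs a look at the content.
    """
    if path == "-":
        return "stdin"
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        return "doi-list-or-plaintext"
    return FORMATS.get(suffix, "unknown")
```

For example, `guess_format("refs.bib")` yields `"bibtex"`, while `guess_format("refs.txt")` stays ambiguous until the content is inspected.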
Caching
Results are cached in a SQLite database (~/.cache/citesentry/cache.db). Pass --no-cache to bypass.
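To make the caching idea concrete, here is a minimal sketch of a SQLite-backed result cache. The table layout, the key derivation, and the function names are assumptions for illustration and are not taken from CiteSentry's code; the example uses an in-memory database rather than the real cache file.

```python
import hashlib
import json
import sqlite3
from typing import Optional


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a cache database; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "key TEXT PRIMARY KEY, payload TEXT NOT NULL)"
    )
    return conn


def cache_key(reference: str) -> str:
    """Hash the lower-cased, whitespace-normalized reference string."""
    normalized = " ".join(reference.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cache_get(conn: sqlite3.Connection, reference: str) -> Optional[dict]:
    """Return the cached result for a reference, or None on a miss."""
    row = conn.execute(
        "SELECT payload FROM results WHERE key = ?", (cache_key(reference),)
    ).fetchone()
    return json.loads(row[0]) if row else None


def cache_put(conn: sqlite3.Connection, reference: str, result: dict) -> None:
    """Store or overwrite the result for a reference."""
    conn.execute(
        "INSERT OR REPLACE INTO results (key, payload) VALUES (?, ?)",
        (cache_key(reference), json.dumps(result)),
    )
    conn.commit()
```

Normalizing the reference before hashing means that differences in case or spacing do not trigger a fresh lookup against the upstream APIs.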