Citation verification tool: existence, URL liveness, and content relevance checks

Project description

CiteSentry

Citation verification tool: check whether references actually exist, whether their URLs are live, and whether the content is relevant to the citation.

What it does

Three checks per reference:

Existence — resolves against OpenAlex, Crossref, Semantic Scholar, arXiv, and domain-specific databases (PubMed for biomedical, DBLP for CS)
URL liveness — HTTP HEAD/GET check; classifies 2xx/4xx/timeout/bot-protection
Content relevance — LLM-backed check comparing fetched content to the cited title/topic (requires DEEPSEEK_API_KEY for CLI use)
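
The URL liveness categories above can be illustrated with a small classifier. This is a minimal sketch, not CiteSentry's implementation: the function name `classify_status`, the label strings, and the choice to treat 403/429 as possible bot protection are all assumptions made for illustration.

```python
from typing import Optional

def classify_status(status: Optional[int]) -> str:
    """Map an HTTP status code (or None for a timeout) to a coarse label.

    Hypothetical labels; CiteSentry's internal names may differ.
    """
    if status is None:
        return "timeout"  # no response arrived in time
    if 200 <= status < 300:
        return "live"
    if status in (403, 429):
        # Assumption: many sites answer automated clients with 403/429,
        # so these are flagged for manual review rather than called dead.
        return "bot-protection"
    if 400 <= status < 600:
        return "dead"
    return "unknown"  # 1xx/3xx are left unclassified in this sketch

for code in (200, 404, 403, None):
    print(code, classify_status(code))
```

Keeping the classification separate from the network call makes the rule easy to test without touching the network.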

Verdicts

| Verdict | Meaning | Action |
| --- | --- | --- |
| `VERIFIED` | Paper found in a scholarly database with matching title, authors, year, and DOI | None; the citation is good |
| `METADATA_MISMATCH` | Paper found, but a field in your citation differs from the database record (commonly a truncated or wrong DOI) | Correct the mismatched field; the paper itself is real |
| `DEAD_URL` | Paper exists but one or more cited URLs return 4xx/5xx or time out | Update or remove the URL |
| `CONTENT_DRIFT` | Paper exists and URL is live, but fetched content doesn't match what the citation claims | Review whether you are citing the right paper |
| `NOT_FOUND` | Could not verify in any database; the paper may be fabricated, obscure, or not yet indexed | Manual verification recommended; see note below |
| `UNRESOLVABLE` | Verification could not be attempted: the citation is missing too many fields (no title, no DOI, no authors) or the existence check errored | Add missing fields (year, DOI, venue) and re-run |

NOT_FOUND is not "fake"

NOT_FOUND means the tool could not confirm the paper in the databases it queries (OpenAlex, Crossref, Semantic Scholar, arXiv, PubMed, DBLP). Common legitimate reasons:

Recent publications — papers from the past 6–12 months are often not yet indexed, especially conference proceedings
Preprints — papers only on institutional repositories or not yet on arXiv
Truncated or missing DOI — without a DOI, title search may not find the paper
Obscure venues — proceedings from smaller conferences may not be in major databases

A high NOT_FOUND rate (30–40% or more) in a survey of 2025–2026 literature is normal and expected.

Expected verification rates by publication year

| Publication year | Typical verification rate |
| --- | --- |
| ≤ 2023 | 85–100% |
| 2024 | 60–85% |
| 2025 | 30–60% |
| 2026 | 10–30% |

Rates are lower for recent years due to database indexing lag, not citation quality.
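
To compare your own results against the table above, the per-year rate can be computed from the JSON output. This is a minimal sketch, assuming the report is a list of result objects carrying `overall_verdict` and `reference.year`. Those field names appear in the JSON example in this README, but the list-of-objects shape and the helper name `verification_rate_by_year` are assumptions.

```python
from collections import defaultdict

def verification_rate_by_year(results):
    """Return {year: fraction of references whose verdict is VERIFIED}."""
    total = defaultdict(int)
    verified = defaultdict(int)
    for r in results:
        year = (r.get("reference") or {}).get("year")
        if year is None:
            continue  # a reference without a year cannot be bucketed
        total[year] += 1
        if r.get("overall_verdict") == "VERIFIED":
            verified[year] += 1
    return {y: verified[y] / total[y] for y in sorted(total)}

sample = [
    {"overall_verdict": "VERIFIED", "reference": {"year": 2023}},
    {"overall_verdict": "VERIFIED", "reference": {"year": 2025}},
    {"overall_verdict": "NOT_FOUND", "reference": {"year": 2025}},
]
print(verification_rate_by_year(sample))  # {2023: 1.0, 2025: 0.5}
```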

Install

pip install citesentry                 # basic install
pip install "citesentry[cli-llm]"      # + DeepSeek for relevance checks

For development:

git clone https://github.com/mkassaf/CiteSentry
cd CiteSentry
pip install -e ".[dev]"

CLI usage

# Check a PDF paper — extracts and verifies the references section automatically
citesentry check paper.pdf --no-llm
citesentry check paper.pdf --no-llm --format md > report.md

# Check a BibTeX file
citesentry check refs.bib

# Check a RIS/CSL-JSON/NBIB/plaintext file
citesentry check refs.ris
citesentry check refs.json

# Read from stdin
cat refs.txt | citesentry check -

# Single ad-hoc reference
citesentry check-one "Vaswani et al. (2017). Attention is all you need. NeurIPS."

# Output formats: table (default), json, md
citesentry check refs.bib --format json
citesentry check refs.bib --format md > report.md

# Skip checks
citesentry check refs.bib --no-llm       # skip relevance (no API key needed)
citesentry check refs.bib --no-url       # skip URL liveness

# Domain adapters (auto by default)
citesentry check refs.bib --domain pubmed   # force PubMed only
citesentry check refs.bib --domain none     # disable domain adapters

# Override plaintext style detection
citesentry check refs.txt --style ieee

Exit code is non-zero if any reference is NOT_FOUND or DEAD_URL (useful in CI).
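
The exit-code rule above can be mirrored when post-processing a saved JSON report, for example to print a summary in a CI log. This is a minimal sketch, assuming the report is a JSON array of objects with an `overall_verdict` field. The array shape, the helper name `summarize`, and the specific status value 1 are assumptions; the README only states that the exit code is non-zero.

```python
import json
from collections import Counter

FAILING = {"NOT_FOUND", "DEAD_URL"}  # verdicts the README says fail the run

def summarize(report_json: str):
    """Count verdicts and derive a CI-style exit status from a JSON report."""
    results = json.loads(report_json)
    counts = Counter(r["overall_verdict"] for r in results)
    # Assumption: 1 stands in for "non-zero"; the real CLI may use another value.
    exit_status = 1 if any(counts[v] for v in FAILING) else 0
    return counts, exit_status

report = json.dumps([
    {"overall_verdict": "VERIFIED"},
    {"overall_verdict": "DEAD_URL"},
])
counts, status = summarize(report)
print(dict(counts), status)
```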

PDF input — known limitation

pdfminer.six works well for single-column PDFs. Two-column papers (most IEEE/ACM conference papers) often come out jumbled because text from the two columns is interleaved, which breaks reference parsing. If very few references are parsed or titles look garbled, extract the references section manually first:

# Copy-paste the references section into a text file, then:
citesentry check refs.txt --no-llm
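
If you prefer to script the extraction instead of copy-pasting, one rough approach is to dump the PDF text and keep everything after the last "References" heading. This is a minimal sketch, assuming pdfminer.six is installed. `extract_text` is pdfminer.six's high-level API, while `references_section`, `pdf_references`, and the heading heuristic are hypothetical helpers. Column interleaving in two-column PDFs can still garble the result.

```python
import re

# A line consisting only of "References" or "Bibliography" (any case).
HEADING = re.compile(r"^[ \t]*(references|bibliography)[ \t]*$", re.I | re.M)

def references_section(text: str) -> str:
    """Return the text after the last References/Bibliography heading line."""
    matches = list(HEADING.finditer(text))
    if not matches:
        return ""  # no heading found; fall back to manual copy-paste
    return text[matches[-1].end():].strip()

def pdf_references(path: str) -> str:
    # Imported lazily so references_section works without pdfminer.six.
    from pdfminer.high_level import extract_text
    return references_section(extract_text(path))

demo = "Intro\nReferences\n[1] A. Author. Title. 2020.\n"
print(references_section(demo))
```

Writing the returned string to `refs.txt` gives a file that can be checked as shown above.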

MCP server (Claude Desktop / Claude Code)

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "citesentry": {
      "command": "citesentry-mcp",
      "env": {
        "CITESENTRY_MAILTO": "you@example.com",
        "DEEPSEEK_API_KEY": "sk-..."
      }
    }
  }
}

Or with uvx (no prior install needed):

{
  "mcpServers": {
    "citesentry": {
      "command": "uvx",
      "args": ["--from", "citesentry", "citesentry-mcp"],
      "env": { "CITESENTRY_MAILTO": "you@example.com" }
    }
  }
}

MCP tools exposed:

verify_reference(reference, check_url, check_relevance) — single reference
verify_reference_list(references, format, check_url, check_relevance) — batch
check_url_alive(url) — standalone URL check

Claude Code (CLI)

claude mcp add citesentry \
  -e CITESENTRY_MAILTO=you@example.com \
  -- uvx --from citesentry citesentry-mcp

Then in any Claude Code session, ask naturally:

"Use citesentry to verify this reference: Vaswani et al. (2017). Attention is all you need. NeurIPS."

"Check whether all the references in refs.bib are real."

"Is https://arxiv.org/abs/1706.03762 still live?"

Any MCP-compatible agent (Python example)

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(
    command="uvx",
    args=["--from", "citesentry", "citesentry-mcp"],
    env={"CITESENTRY_MAILTO": "you@example.com"},
)

async def main():
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            result = await session.call_tool(
                "verify_reference",
                {"reference": "Vaswani et al. (2017). Attention is all you need. NeurIPS."},
            )
            print(result.content[0].text)

asyncio.run(main())

Environment variables

| Variable | Default | Description |
| --- | --- | --- |
| `CITESENTRY_MAILTO` | `citesentry@example.com` | Contact email sent with OpenAlex/Crossref API requests (polite-pool etiquette) |
| `DEEPSEEK_API_KEY` | (none) | Required for relevance checks in the CLI |
| `DEEPSEEK_BASE_URL` | `https://api.deepseek.com/v1` | OpenAI-compatible endpoint |
| `DEEPSEEK_MODEL` | `deepseek-chat` | Model for relevance judgments |

Supported input formats

BibTeX (.bib) — via bibtexparser
RIS (.ris) — via rispy; covers Zotero, Mendeley, EndNote, Web of Science
CSL JSON (.json) — Zotero exports
PubMed NBIB (.nbib)
DOI list (.txt with one DOI per line)
Plaintext reference sections — IEEE, APA, Vancouver, MLA, Chicago; auto-detected
PDF (.pdf) — extracts reference section text via pdfminer.six
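
The DOI-list format is easy to produce from existing notes. This is a minimal sketch that pulls DOIs out of free text and prints them one per line. The regular expression is a commonly used pattern for modern DOIs and will miss some legacy forms, and the helper name `extract_dois` is hypothetical.

```python
import re

DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

def extract_dois(text: str) -> list[str]:
    """Return unique DOIs in order of first appearance."""
    seen = []
    for match in DOI_RE.finditer(text):
        # Drop trailing sentence punctuation; this can also clip a DOI
        # that legitimately ends in ")" or ";", which is rare.
        doi = match.group(0).rstrip(".,;)")
        if doi not in seen:
            seen.append(doi)
    return seen

notes = (
    "See doi:10.1109/ICSE55347.2025.00638, and "
    "https://doi.org/10.1109/ICSE55347.2025.00638."
)
print("\n".join(extract_dois(notes)))
```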

Reference enrichment

When a citation is incomplete (missing year, DOI, or venue) but the tool finds a matching paper in a database, the result includes an enriched field with the complete metadata sourced from the database. This is visible in JSON output:

{
  "overall_verdict": "VERIFIED",
  "reference": { "title": "SOEN-101: ...", "year": null, "doi": null },
  "enriched":  { "title": "SOEN-101: ...", "year": 2025, "doi": "10.1109/ICSE55347.2025.00638", "venue": "ICSE" }
}

Use this to correct incomplete citations in your bibliography without manual searching.
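
Applying the enriched metadata can be automated. This is a minimal sketch, assuming each result has `reference` and `enriched` objects as in the JSON above. It fills only fields that are missing (null) in the original citation and never overwrites what you wrote. The helper name `apply_enrichment` is hypothetical.

```python
def apply_enrichment(result: dict) -> dict:
    """Return a copy of result['reference'] with gaps filled from result['enriched']."""
    merged = dict(result.get("reference") or {})
    for field, value in (result.get("enriched") or {}).items():
        if merged.get(field) is None and value is not None:
            merged[field] = value
    return merged

result = {
    "overall_verdict": "VERIFIED",
    "reference": {"title": "SOEN-101: ...", "year": None, "doi": None},
    "enriched": {"title": "SOEN-101: ...", "year": 2025,
                 "doi": "10.1109/ICSE55347.2025.00638", "venue": "ICSE"},
}
print(apply_enrichment(result))
```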

Caching

Results are cached in a SQLite database (~/.cache/citesentry/cache.db). Pass --no-cache to bypass.
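
For readers unfamiliar with the pattern, a SQLite-backed result cache can be as small as a single key-value table. This is a minimal sketch of the general technique only. CiteSentry's actual schema is not documented here, so the class name `ResultCache`, the table name, and the column names are invented for illustration.

```python
import json
import sqlite3

class ResultCache:
    """Tiny key-value cache; values are stored as JSON text."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS results "
            "(key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def get(self, key: str):
        row = self.db.execute(
            "SELECT value FROM results WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, key: str, value) -> None:
        self.db.execute(
            "INSERT OR REPLACE INTO results (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        self.db.commit()

cache = ResultCache()  # in-memory for the demo; pass a file path to persist
cache.put("10.1109/ICSE55347.2025.00638", {"overall_verdict": "VERIFIED"})
print(cache.get("10.1109/ICSE55347.2025.00638"))
print(cache.get("missing"))
```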

Project details

Release history

0.3.8 (Jun 1, 2026)
0.3.7 (Jun 1, 2026)
0.3.6 (Jun 1, 2026)
0.3.5 (Jun 1, 2026)
0.3.4 (Jun 1, 2026)
0.3.3 (Jun 1, 2026), this version
0.3.2 (Jun 1, 2026)
0.3.1 (Jun 1, 2026)
0.3.0 (Jun 1, 2026)
0.2.7 (Jun 1, 2026)
0.2.5 (May 31, 2026)
0.2.0 (May 31, 2026)
0.1.1 (May 31, 2026)

Download files

Source Distribution

citesentry-0.3.3.tar.gz (164.6 kB)

Uploaded Jun 1, 2026 Source

Built Distribution

citesentry-0.3.3-py3-none-any.whl (50.6 kB)

Uploaded Jun 1, 2026 Python 3

File details

Details for the file citesentry-0.3.3.tar.gz.

File metadata

Download URL: citesentry-0.3.3.tar.gz
Upload date: Jun 1, 2026
Size: 164.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for citesentry-0.3.3.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `cf4dd592099a81774a7fe3e2b418489e8cad787877afda5abe49e48514879d28` |
| MD5 | `cdba453a7a5cfcfc5a66a16a94959da4` |
| BLAKE2b-256 | `34f0c91c243ebff33537efa185096a299d8541b0aa21f2dc785f333336f67ab4` |

Provenance

The following attestation bundles were made for citesentry-0.3.3.tar.gz:

Publisher: publish.yml on mkassaf/CiteSentry

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: citesentry-0.3.3.tar.gz
- Subject digest: cf4dd592099a81774a7fe3e2b418489e8cad787877afda5abe49e48514879d28
- Sigstore transparency entry: 1690880584
- Sigstore integration time: Jun 1, 2026
Source repository:
- Permalink: mkassaf/CiteSentry@2172c7584d239f868f7d4422d51da89f6def4c04
- Branch / Tag: refs/tags/v0.3.3
- Owner: https://github.com/mkassaf
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2172c7584d239f868f7d4422d51da89f6def4c04
- Trigger Event: push

File details

Details for the file citesentry-0.3.3-py3-none-any.whl.

File metadata

Download URL: citesentry-0.3.3-py3-none-any.whl
Upload date: Jun 1, 2026
Size: 50.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for citesentry-0.3.3-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `2849430224724ec2a61d46a8d8eae28871b583930d23c5ad094036aeeb42c7fe` |
| MD5 | `830fcdb37cc321b1601865cd653b39df` |
| BLAKE2b-256 | `09998cee511ed937ee1d6a785f3e044091e073f369c81c5f0736b14294127995` |

Provenance

The following attestation bundles were made for citesentry-0.3.3-py3-none-any.whl:

Publisher: publish.yml on mkassaf/CiteSentry

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: citesentry-0.3.3-py3-none-any.whl
- Subject digest: 2849430224724ec2a61d46a8d8eae28871b583930d23c5ad094036aeeb42c7fe
- Sigstore transparency entry: 1690880596
- Sigstore integration time: Jun 1, 2026
Source repository:
- Permalink: mkassaf/CiteSentry@2172c7584d239f868f7d4422d51da89f6def4c04
- Branch / Tag: refs/tags/v0.3.3
- Owner: https://github.com/mkassaf
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2172c7584d239f868f7d4422d51da89f6def4c04
- Trigger Event: push
