MCP server for verifying academic citations via Semantic Scholar, OpenAlex, and CrossRef

mcp-refchecker

An MCP server that lets Claude verify academic citations in real time against Semantic Scholar, OpenAlex, and Crossref — catching hallucinated or incorrect references before they end up in your work.

Built on top of academic-refchecker (MIT).

Tool

verify_citation — verifies that a cited paper exists and that its metadata (title, authors, year, venue) matches what was cited.

Parameter   Type       Required   Description
title       string     yes        Title of the cited paper
authors     string[]   no         List of author names
year        integer    no         Publication year
doi         string     no         DOI (e.g. 10.1145/12345)
arxiv_id    string     no         arXiv ID (e.g. 2301.00001)
url         string     no         Direct URL to the paper
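
For readers calling the tool from their own MCP client rather than from Claude, the sketch below shows how the parameters above could be packed into a standard MCP tools/call request. The helper name build_verify_request and its validation rules are illustrative assumptions, not part of mcp-refchecker; only the tool name and parameter names come from the table.

```python
import json


def build_verify_request(title, authors=None, year=None, doi=None,
                         arxiv_id=None, url=None, request_id=1):
    """Build a JSON-RPC 2.0 'tools/call' request for verify_citation.

    Hypothetical helper: mcp-refchecker does not ship this function.
    """
    if not isinstance(title, str) or not title.strip():
        raise ValueError("title is required and must be a non-empty string")
    if year is not None and not isinstance(year, int):
        raise TypeError("year must be an integer")

    optional = {
        "authors": list(authors) if authors is not None else None,
        "year": year,
        "doi": doi,
        "arxiv_id": arxiv_id,
        "url": url,
    }
    # Only send the fields the caller actually provided.
    arguments = {"title": title}
    arguments.update({k: v for k, v in optional.items() if v is not None})

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "verify_citation", "arguments": arguments},
    }


request = build_verify_request(
    "Attention Is All You Need",
    authors=["Ashish Vaswani", "Noam Shazeer"],
    year=2017,
)
print(json.dumps(request, indent=2))
```

Omitting unset fields rather than sending nulls keeps the request aligned with the "Required" column: only title is mandatory.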

Returns JSON:

{
  "verified": true,
  "url": "https://...",
  "matched_paper": {
    "title": "...",
    "authors": [...],
    "year": 2023,
    "venue": "..."
  },
  "possible_match": null,
  "errors": null,
  "warnings": null,
  "info": null
}

Result fields

  • verified — true if the paper was found and all provided metadata (year, authors, venue) matches. false if there is a real metadata conflict or the paper could not be found.
  • matched_paper — the authoritative metadata from the verification source.
  • possible_match — a Crossref fallback match when the exact title was not found but a close variant was (see "Fuzzy fallback" below).
  • errors — hard errors that block verification (wrong year, wrong authors, paper not found).
  • warnings — soft warnings that don't block verification (arXiv v1 vs v2 differences, arXiv preprint vs published venue, incomplete input metadata).
  • info — informational suggestions (e.g., "reference could include arXiv URL").
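
As a rough illustration of how a caller might read these fields, the sketch below turns a result into a one-line status. It assumes that errors and warnings are either null or lists, which the example output suggests but does not confirm; the function name summarize is hypothetical.

```python
import json


def summarize(result):
    """Return a one-line, human-readable status for a verify_citation result.

    Assumes 'errors' and 'warnings' are either null or lists.
    """
    errors = result.get("errors") or []
    warnings = result.get("warnings") or []

    if result.get("verified"):
        status = "verified"
    elif result.get("possible_match"):
        # The exact title was not found, but the Crossref fallback
        # returned a close variant.
        status = "unverified, possible match found"
    else:
        status = "unverified"

    return f"{status} ({len(errors)} error(s), {len(warnings)} warning(s))"


raw = """
{
  "verified": false,
  "url": null,
  "matched_paper": null,
  "possible_match": {"title": "Attention Is All You Need"},
  "errors": ["paper not found"],
  "warnings": ["possible match found via fuzzy fallback"],
  "info": null
}
"""
print(summarize(json.loads(raw)))
```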

What counts as an error vs a warning

academic-refchecker returns a flat list of issues with some inconsistency (year mismatches get marked as warnings while author mismatches get marked as errors). This wrapper normalises the output:

  • Promoted to hard errors: plain year/author/venue mismatches where the cited metadata actually differs from reality. These block verified.
  • Demoted to warnings: "missing field" errors when the paper was found but the user didn't provide that field in the first place. Missing input metadata is not evidence of a hallucinated citation.
  • Kept as warnings: arXiv version differences (v1 vs v2), preprint-vs-published venue notes.
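
The three rules above can be sketched as a small classification function. This is not the wrapper's actual code: the issue dictionaries, the kind labels, and the normalise function are all invented here for illustration, since the structure of academic-refchecker's issue list is not described in this document.

```python
# Hypothetical issue kinds; the real labels used by academic-refchecker
# may differ.
MISMATCH_KINDS = {"year_mismatch", "author_mismatch", "venue_mismatch"}
SOFT_KINDS = {"arxiv_version", "preprint_vs_published"}


def normalise(issues, provided_fields):
    """Split a flat issue list into (errors, warnings).

    issues: list of dicts with a 'kind', and optionally 'field' and 'severity'.
    provided_fields: set of field names the user supplied in the citation.
    """
    errors, warnings = [], []
    for issue in issues:
        kind = issue["kind"]
        if kind in MISMATCH_KINDS:
            # Cited metadata really differs: always a hard error.
            errors.append(issue)
        elif kind == "missing_field" and issue.get("field") not in provided_fields:
            # The user never supplied this field, so it is not evidence
            # of a hallucinated citation.
            warnings.append(issue)
        elif kind in SOFT_KINDS:
            warnings.append(issue)
        else:
            # Anything else keeps the severity it was reported with.
            target = errors if issue.get("severity") == "error" else warnings
            target.append(issue)
    return errors, warnings


issues = [
    {"kind": "year_mismatch", "severity": "warning"},
    {"kind": "missing_field", "field": "venue", "severity": "error"},
    {"kind": "arxiv_version", "severity": "warning"},
]
errors, warnings = normalise(issues, provided_fields={"title", "year"})
verified = not errors
print(verified, len(errors), len(warnings))
```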

Fuzzy fallback and its limitations

When academic-refchecker reports that a paper could not be verified, this wrapper makes a secondary query to Crossref and compares each returned title with the cited title using fuzzywuzzy's fuzz.ratio. If a candidate with ≥ 85% similarity is found, it's returned as possible_match with a warning.
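
The scoring step can be sketched as follows. To keep the example free of third-party dependencies it uses difflib.SequenceMatcher from the standard library as a stand-in for fuzzywuzzy, so exact scores may differ from what the wrapper computes. The function names and the lowercasing step are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity on a 0-100 scale (difflib stand-in)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pick_possible_match(cited_title, candidate_titles, threshold=85.0):
    """Return (title, score) for the best candidate at or above threshold.

    Returns None when no candidate is similar enough, or when the
    candidate list is empty.
    """
    best_title, best_score = None, 0.0
    for candidate in candidate_titles:
        score = similarity(cited_title, candidate)
        if score > best_score:
            best_title, best_score = candidate, score
    if best_title is not None and best_score >= threshold:
        return best_title, best_score
    return None


candidates = [
    "Attention Is All You Need",
    "Deep Residual Learning for Image Recognition",
]
print(pick_possible_match("Attention is all you need.", candidates))
print(pick_possible_match("A completely unrelated title", candidates))
```

Note that the function can only score titles it is given: if the Crossref query returns no relevant candidates, nothing is matched regardless of the threshold.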

What the fuzzy fallback catches:

  • Stylistic title variations (case differences, punctuation, word order)
  • Minor rewording
  • Titles where refchecker's strict comparison rejected an otherwise valid match

What the fuzzy fallback does NOT catch:

  • Real typos in distinctive title words (e.g., "Atention Is All You Need")
  • Heavily mangled titles

This is a fundamental limitation of free academic search APIs. Crossref, OpenAlex, and Semantic Scholar all do keyword/token-based search — as soon as a distinctive word is misspelled, it simply isn't in the search index, and the real paper won't appear in results regardless of how you post-process them. Catching real typos would require semantic embeddings from a paid API (OpenAI, Voyage, etc.) or a full-text fuzzy search engine, neither of which is exposed by free scholarly data sources.
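
A toy model makes the retrieval problem concrete. The index below is a deliberate oversimplification (real services rank results by relevance rather than requiring every token to match), and every name in it is invented for this illustration. It shows only the basic effect: a misspelled distinctive token has no entry in the index, so the real paper is never returned for post-processing.

```python
import re

# Common words that carry little signal in a title search (toy list).
STOPWORDS = {"a", "all", "and", "for", "in", "is", "of", "the", "to", "you"}


def tokens(text):
    """Lowercase alphanumeric tokens of a title, minus stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def build_index(titles):
    """Map each distinctive token to the set of titles containing it."""
    index = {}
    for title in titles:
        for token in tokens(title):
            index.setdefault(token, set()).add(title)
    return index


def search(index, query):
    """Return titles containing every distinctive token of the query."""
    wanted = tokens(query)
    if not wanted:
        return set()
    return set.intersection(*(index.get(token, set()) for token in wanted))


index = build_index([
    "Attention Is All You Need",
    "Deep Residual Learning for Image Recognition",
])
print(search(index, "attention is all you need"))  # the real paper is retrieved
print(search(index, "Atention Is All You Need"))   # 'atention' is not indexed
```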

If you suspect a typo but verify_citation returns unverified, the best workaround is to rewrite the title in the most canonical form you can and try again.

Installation

pip install mcp-refchecker

Or from source:

git clone https://github.com/JonasBaath/mcp-refchecker
cd mcp-refchecker
pip install .

Configuration

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "refchecker": {
      "command": "mcp-refchecker"
    }
  }
}

Optional environment variables

  • SEMANTIC_SCHOLAR_API_KEY — a Semantic Scholar API key. Applying for one gives higher rate limits on refchecker's primary verification path.
  • CROSSREF_MAILTO — your contact email, used to opt into Crossref's polite pool for more reliable fuzzy fallback access.
  • MCP_REFCHECKER_DEBUG — set to any non-empty value to print debug logging from the fuzzy fallback path to stderr.
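
To show what opting into the polite pool means in practice, here is a minimal sketch of building a Crossref query URL with the contact email attached as the mailto parameter. Whether the wrapper uses this exact endpoint and the query.bibliographic parameter is an assumption; the function crossref_query_url is hypothetical and performs no network request.

```python
import os
from urllib.parse import urlencode


def crossref_query_url(title, rows=5, env=None):
    """Build a Crossref works query URL for a title search.

    Adds 'mailto' only when CROSSREF_MAILTO is set, which is how a
    client identifies itself for Crossref's polite pool.
    """
    env = os.environ if env is None else env
    params = {"query.bibliographic": title, "rows": rows}
    mailto = env.get("CROSSREF_MAILTO")
    if mailto:
        params["mailto"] = mailto
    return "https://api.crossref.org/works?" + urlencode(params)


print(crossref_query_url(
    "Attention Is All You Need",
    env={"CROSSREF_MAILTO": "you@example.com"},
))
```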

Example with all optional settings:

{
  "mcpServers": {
    "refchecker": {
      "command": "mcp-refchecker",
      "env": {
        "SEMANTIC_SCHOLAR_API_KEY": "your-key-here",
        "CROSSREF_MAILTO": "you@example.com"
      }
    }
  }
}
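
If claude_desktop_config.json already lists other servers, the entry has to be merged in rather than pasted over the file. The sketch below does that merge on an in-memory dictionary; the helper name add_refchecker_server is hypothetical, and reading and writing the file itself is left out because its location depends on the platform.

```python
import json


def add_refchecker_server(config, env=None):
    """Return a copy of config with the refchecker server entry added.

    Existing entries under 'mcpServers' are preserved; an existing
    'refchecker' entry is replaced.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    entry = {"command": "mcp-refchecker"}
    if env:
        entry["env"] = dict(env)
    servers["refchecker"] = entry
    merged["mcpServers"] = servers
    return merged


existing = {"mcpServers": {"other-server": {"command": "other-command"}}}
updated = add_refchecker_server(
    existing,
    env={"CROSSREF_MAILTO": "you@example.com"},
)
print(json.dumps(updated, indent=2))
```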

License

MIT — © Jonas Bååth. Built on academic-refchecker (MIT).
