FastMCP server for academic paper search, metadata, PDFs, citations, and BibTeX
mcsci-hub
An MCP server that gives LLMs access to academic literature. Search papers, retrieve metadata, download PDFs, traverse citation graphs, and generate BibTeX -- all through a unified interface backed by four academic data sources.
Built with FastMCP.
search "CRISPR delivery" --> get_paper(doi) --> fetch_pdf(doi) --> [mcp-pdf] --> analyze
How It Works
mcsci-hub aggregates data from multiple academic APIs in parallel, merging results into a single coherent response:
| Source | What it provides | Access |
|---|---|---|
| CrossRef | Authoritative metadata -- title, authors, journal, volume, pages | Free (polite pool) |
| OpenAlex | Abstracts, open access URLs, semantic search | Free (polite pool) |
| Semantic Scholar | Citation counts, influential citations, fields of study, BibTeX | Free (optional API key) |
| Sci-Hub | Paywalled PDF access via mirror scraping | Fallback only |
The strategy: always try open access first. Sci-Hub is a last resort. When any source fails, the others still return what they can (graceful degradation).
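The ordering described above can be sketched as a simple fallback chain. This is a minimal illustration, not the project's code: `fetch_with_fallback`, `open_access`, and `scihub_mirror` are hypothetical names, and the stand-in fetchers return canned values instead of making HTTP requests.

```python
from typing import Callable, Optional

# Hypothetical fetcher type: takes a DOI, returns PDF bytes or None if unavailable.
Fetcher = Callable[[str], Optional[bytes]]


def fetch_with_fallback(doi: str, fetchers: list[tuple[str, Fetcher]]) -> tuple[str, bytes]:
    """Try each (name, fetcher) pair in order and return the first hit."""
    errors: list[str] = []
    for name, fetch in fetchers:
        try:
            data = fetch(doi)
        except Exception as exc:  # a failing source must not abort the chain
            errors.append(f"{name}: {exc}")
            continue
        if data:
            return name, data
        errors.append(f"{name}: no PDF")
    raise LookupError(f"No PDF found for {doi} ({'; '.join(errors)})")


# Stand-in fetchers for illustration only.
def open_access(doi: str) -> Optional[bytes]:
    return None  # pretend no open access copy exists


def scihub_mirror(doi: str) -> Optional[bytes]:
    return b"%PDF-1.7 stub"


# Open access is listed first, so Sci-Hub is only reached when it yields nothing.
source, pdf = fetch_with_fallback(
    "10.1038/nature12373",
    [("open_access", open_access), ("scihub", scihub_mirror)],
)
```

Keeping the sources in an ordered list makes the "last resort" policy a matter of list position rather than nested conditionals.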
Tools
| Tool | Description |
|---|---|
| search_papers(query, ...) | Keyword/topic search via OpenAlex with CrossRef fallback. Filters by year range and open access status. |
| get_paper(doi) | Full metadata from all sources, merged and cached. Returns title, authors, abstract, citation counts, fields of study, OA status. |
| fetch_pdf(doi) | Downloads the PDF: tries OA URL first, then Sci-Hub. Saves locally and hints to use mcp-pdf for parsing. |
| get_citations(doi, max_results) | Forward citations -- papers that cite this one. Includes Semantic Scholar's "influential" flag. |
| get_references(doi, max_results) | Backward references -- papers this one cites. Traces intellectual lineage. |
| get_bibtex(doi) | BibTeX entry from Semantic Scholar, or constructed from CrossRef metadata as fallback. |
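To illustrate the get_bibtex fallback, here is a rough sketch of building a BibTeX entry from a CrossRef work record. The input keys (`title`, `author`, `container-title`, `issued`, `volume`, `page`, `DOI`) follow the CrossRef REST API; the function name `bibtex_from_crossref`, the citation-key scheme, and the sample record are assumptions made for this example, and the project's actual construction may differ.

```python
def bibtex_from_crossref(work: dict) -> str:
    """Build an @article entry from a CrossRef-style work dict (illustrative only)."""
    authors = work.get("author", [])
    names = [
        f"{a.get('family', '')}, {a.get('given', '')}".strip(", ")
        for a in authors
    ]

    # CrossRef stores dates as {"date-parts": [[year, month, day]]}.
    date_parts = work.get("issued", {}).get("date-parts", [[None]])
    year = date_parts[0][0] if date_parts and date_parts[0] else None

    # Assumed key scheme: first author's family name plus year.
    first_family = authors[0].get("family", "unknown") if authors else "unknown"
    key = f"{first_family.lower().replace(' ', '')}{year or ''}"

    fields = {
        "title": (work.get("title") or [""])[0],
        "author": " and ".join(names),
        "journal": (work.get("container-title") or [""])[0],
        "year": str(year) if year else "",
        "volume": work.get("volume", ""),
        "pages": work.get("page", "").replace("-", "--"),
        "doi": work.get("DOI", ""),
    }
    # Skip empty fields so missing metadata does not produce blank entries.
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items() if v)
    return f"@article{{{key},\n{body}\n}}"


# Made-up record for demonstration; not a real paper.
sample = {
    "title": ["An Example Paper"],
    "author": [{"given": "Jane", "family": "Doe"}],
    "container-title": ["Journal of Examples"],
    "issued": {"date-parts": [[2020, 5, 1]]},
    "volume": "7",
    "page": "10-20",
    "DOI": "10.1234/example",
}
entry = bibtex_from_crossref(sample)
print(entry)
```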
Resources
| URI | Description |
|---|---|
| scihub://mirrors | Configured mirror domains and count |
| scihub://config | Full server configuration (cache, timeouts, PDF directory, API key status) |
| doi://{prefix}/{suffix} | Dynamic paper metadata lookup by DOI (e.g. doi://10.1038/nature12373) |
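As a small illustration of the doi:// template, the sketch below splits a URI into its prefix and suffix. The helper name `parse_doi_uri` is hypothetical, and treating everything after the first slash as the suffix is an assumption; how the server's template handles suffixes that contain slashes is not stated here.

```python
def parse_doi_uri(uri: str) -> tuple[str, str]:
    """Split doi://{prefix}/{suffix} into (prefix, suffix)."""
    scheme = "doi://"
    if not uri.startswith(scheme):
        raise ValueError(f"not a doi:// URI: {uri!r}")
    # Split on the first slash only; DOI suffixes may themselves contain slashes.
    prefix, sep, suffix = uri[len(scheme):].partition("/")
    if not sep or not suffix or not prefix.startswith("10."):
        raise ValueError(f"malformed DOI in URI: {uri!r}")
    return prefix, suffix


prefix, suffix = parse_doi_uri("doi://10.1038/nature12373")
doi = f"{prefix}/{suffix}"
```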
Prompts
| Prompt | Description |
|---|---|
| literature_review(topic, depth) | Guided workflow: search, detail top papers, traverse citations, synthesize. depth='deep' adds PDF analysis via mcp-pdf. |
| paper_deep_dive(doi) | Single paper analysis: metadata, PDF, full-text parsing, citation context. |
| extract_findings(doi) | Focused data extraction from a paper's PDF -- tables, results, conclusions. |
Composability with mcp-pdf
fetch_pdf saves PDFs locally and returns the file path. Tool responses and prompts guide the calling LLM to use mcp-pdf for full-text extraction, creating the pipeline:
search --> fetch --> parse --> analyze
Install mcp-pdf alongside mcsci-hub to unlock the full workflow.
Installation
Requires Python 3.11+.
# Clone and install
git clone git@git.supported.systems:MCP/mcsci-hub.git
cd mcsci-hub
uv sync
# Copy and edit environment config
cp .env.example .env
Add to Claude Code
# Local development
claude mcp add mcsci-hub -- uv run --directory /path/to/mcsci-hub mcsci-hub
# From PyPI (once published)
claude mcp add mcsci-hub -- uvx mcsci-hub
Run standalone (stdio transport)
uv run mcsci-hub
Configuration
All settings are managed via environment variables (loaded from .env by pydantic-settings).
| Variable | Default | Description |
|---|---|---|
| SCIHUB_MIRRORS | sci-hub.se,sci-hub.st,sci-hub.ru | Comma-separated mirror domains, tried in order |
| CROSSREF_MAILTO | (empty) | Email for CrossRef polite pool (higher rate limits) |
| OPENALEX_MAILTO | (empty) | Email for OpenAlex polite pool |
| S2_API_KEY | (empty) | Optional Semantic Scholar API key for higher rate limits |
| CACHE_MAX_SIZE | 1000 | Maximum entries in the TTL cache |
| CACHE_TTL | 3600 | Cache entry lifetime in seconds |
| HTTP_TIMEOUT | 30 | Request timeout in seconds |
| PDF_SAVE_DIR | /tmp/mcsci-hub-pdfs | Directory where downloaded PDFs are saved |
See .env.example for a ready-to-use template.
Data Source Strategy
Each field in the aggregated response has a primary and fallback source:
| Field | Primary | Fallback | Notes |
|---|---|---|---|
| Title, Authors | CrossRef | OpenAlex, S2 | CrossRef is the authoritative registry |
| Abstract | OpenAlex | S2 | Decoded from inverted index format |
| Citation count | S2 | OpenAlex | S2 distinguishes influential citations |
| Open Access URL | OpenAlex | S2 | OpenAlex integrates Unpaywall data |
| Paywalled PDF | Sci-Hub | -- | Mirror scraping, fallback only |
| BibTeX | S2 | Constructed from CrossRef | S2 has citationStyles.bibtex |
| Fields of study | S2 | -- | Better classification taxonomy |
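The "inverted index format" noted for abstracts refers to how OpenAlex returns them: a mapping from each word to the positions where it occurs. A minimal decoder might look like the sketch below; the function name `decode_inverted_index` is an assumption, and the sample index is invented.

```python
def decode_inverted_index(index: dict[str, list[int]]) -> str:
    """Rebuild plain text from a word -> positions mapping."""
    # Flatten to (position, word) pairs, then sort by position.
    positions = [(pos, word) for word, places in index.items() for pos in places]
    return " ".join(word for _, word in sorted(positions))


# Invented sample; "the" occurs at positions 0 and 3.
sample_index = {"the": [0, 3], "cache": [1], "stores": [2], "paper": [4]}
abstract = decode_inverted_index(sample_index)
print(abstract)  # the cache stores the paper
```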
All sources are queried in parallel via asyncio.gather with return_exceptions=True -- a failing source never takes the others down, and a slow one delays the response by at most HTTP_TIMEOUT.
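A minimal sketch of that pattern follows. The three coroutines are stand-ins that return canned data or raise, and `aggregate` and the shape of the merged dict are assumptions for illustration; only `asyncio.gather(..., return_exceptions=True)` is taken from the description above.

```python
import asyncio


# Stand-in source clients; real ones would issue HTTP requests.
async def crossref(doi: str) -> dict:
    await asyncio.sleep(0.01)
    return {"title": "Example title"}


async def openalex(doi: str) -> dict:
    raise TimeoutError("openalex timed out")


async def semantic_scholar(doi: str) -> dict:
    return {"citation_count": 42}


async def aggregate(doi: str) -> dict:
    names = ["crossref", "openalex", "semantic_scholar"]
    # With return_exceptions=True, a raised exception is returned in place of
    # that source's result instead of propagating out of gather().
    results = await asyncio.gather(
        crossref(doi),
        openalex(doi),
        semantic_scholar(doi),
        return_exceptions=True,
    )
    merged: dict = {"doi": doi, "errors": {}}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            merged["errors"][name] = str(result)  # record the failure, keep going
        else:
            merged.update(result)
    return merged


paper = asyncio.run(aggregate("10.1234/example"))
```

Note that gather still waits for the slowest call to finish, so per-request timeouts are what keep one slow source from stalling the whole response.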
Project Structure
src/mcsci_hub/
    server.py                # FastMCP app, lifespan, resources, prompts
    config.py                # pydantic-settings env config
    cache.py                 # TTL-aware LRU cache
    models.py                # Pydantic response models
    clients/
        crossref.py          # CrossRef API (polite pool, rate limited)
        openalex.py          # OpenAlex API (abstracts, OA URLs)
        semantic_scholar.py  # S2 API (citations, BibTeX)
        scihub.py            # Sci-Hub scraper (mirror rotation, retry)
    tools/
        search.py            # search_papers
        paper.py             # get_paper + aggregate_paper()
        pdf.py               # fetch_pdf
        citations.py         # get_citations, get_references
        bibtex.py            # get_bibtex
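For readers unfamiliar with the "TTL-aware LRU cache" listed for cache.py, here is one way such a cache can be sketched with the standard library. It is not the project's implementation: the class name `TTLCache`, its methods, and the injectable `clock` parameter are assumptions chosen to keep the example short and testable.

```python
import time
from collections import OrderedDict


class TTLCache:
    """LRU cache whose entries also expire after `ttl` seconds."""

    def __init__(self, max_size: int = 1000, ttl: float = 3600.0, clock=time.monotonic):
        self._max_size = max_size
        self._ttl = ttl
        self._clock = clock
        self._data: OrderedDict[str, tuple[float, object]] = OrderedDict()

    def get(self, key: str, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: str, value) -> None:
        self._data[key] = (self._clock() + self._ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self._max_size:
            self._data.popitem(last=False)  # evict the least recently used entry


# A fake clock makes expiry deterministic for the demonstration.
now = [0.0]
cache = TTLCache(max_size=2, ttl=10, clock=lambda: now[0])
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # touching "a" leaves "b" as the least recently used entry
cache.set("c", 3)  # exceeds max_size, so "b" is evicted
```

The defaults mirror CACHE_MAX_SIZE and CACHE_TTL from the configuration table.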
Development
# Install with dev dependencies
uv sync
# Run tests (55 tests, all mocked with respx + FastMCP in-process transport)
uv run pytest -v
# Run tests with coverage
uv run pytest -v --cov=mcsci_hub
# Lint
uv run ruff check src/ tests/
License
MIT