FastMCP server for Semantic Scholar with OpenAlex enrichment and docling PDF conversion
scholar-mcp
A FastMCP server providing structured academic literature access via Semantic Scholar, with OpenAlex enrichment and optional docling-serve PDF conversion.
Features
- Search & retrieval -- full-text paper search with year, venue, field-of-study, and citation-count filters; single-paper lookup by DOI, S2 ID, arXiv ID, and more; author profile and name search
- Citation graph -- forward citations, backward references, BFS graph traversal up to configurable depth, and shortest-path bridge paper discovery
- Recommendations -- paper recommendations from positive (and optional negative) examples via the S2 recommendation API
- OpenAlex enrichment -- augment paper metadata with open-access URLs, affiliations, funders, concepts, and OA status
- PDF conversion -- download open-access PDFs and convert to Markdown via docling-serve, with optional VLM enrichment for formulas and figures
- Intelligent caching -- SQLite-backed cache with per-table TTLs (30 days for papers/authors, 7 days for citations/references) and identifier aliasing
- Authentication -- bearer token, OIDC (OAuth 2.1), or both simultaneously (multi-auth)
- Multi-transport -- stdio (Claude Desktop), HTTP (streamable-http), and SSE transports
- Linux packages -- `.deb` and `.rpm` packages with systemd service and security hardening
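To make the caching behaviour in the feature list concrete (per-table TTLs plus identifier aliasing), here is a minimal in-memory sketch. Only the TTL values come from this README. The class, its methods, and the table names are hypothetical and do not describe scholar-mcp's actual SQLite schema or internal API.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple

DAY = 86_400  # seconds

# TTL values from the feature list; the table names are hypothetical.
TTL_SECONDS = {
    "papers": 30 * DAY,
    "authors": 30 * DAY,
    "citations": 7 * DAY,
    "references": 7 * DAY,
}


@dataclass
class TtlCache:
    """Hypothetical in-memory stand-in for a cache with per-table TTLs and ID aliases."""

    rows: Dict[Tuple[str, str], Tuple[float, Any]] = field(default_factory=dict)
    aliases: Dict[str, str] = field(default_factory=dict)  # alias -> canonical ID

    def put(self, table: str, key: str, value: Any, now: Optional[float] = None) -> None:
        stored_at = time.time() if now is None else now
        self.rows[(table, key)] = (stored_at, value)

    def add_alias(self, alias: str, canonical: str) -> None:
        self.aliases[alias] = canonical

    def get(self, table: str, key: str, now: Optional[float] = None) -> Optional[Any]:
        now = time.time() if now is None else now
        key = self.aliases.get(key, key)  # resolve an alias (e.g. a DOI) to the canonical ID
        entry = self.rows.get((table, key))
        if entry is None:
            return None
        stored_at, value = entry
        if now - stored_at > TTL_SECONDS[table]:
            del self.rows[(table, key)]  # expired: evict and report a miss
            return None
        return value
```

Aliasing means a paper first fetched by its S2 ID can later be found by its DOI without another API call. The shorter TTL for citation data reflects that citation counts change faster than paper metadata.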
Installation
With uvx (recommended)
```bash
uvx --from pvliesdonk-scholar-mcp scholar-mcp serve
```
With pip
```bash
pip install 'pvliesdonk-scholar-mcp[mcp]'
scholar-mcp serve
```
With Docker
```bash
docker run -v scholar-mcp-data:/data/scholar-mcp \
  ghcr.io/pvliesdonk/scholar-mcp:latest
```
Linux packages
Download .deb or .rpm from the latest release:
```bash
# Debian/Ubuntu
sudo dpkg -i scholar-mcp_*.deb

# RHEL/Fedora
sudo rpm -i scholar-mcp-*.rpm
```
Note: The PyPI package is `pvliesdonk-scholar-mcp`. The CLI command it installs is `scholar-mcp`.
Quick Start
stdio transport (Claude Desktop / MCP clients)
```bash
uvx --from pvliesdonk-scholar-mcp scholar-mcp serve
```
API key optional but recommended: The server works without a Semantic Scholar API key, but unauthenticated requests are limited to ~1 req/s and quickly run into HTTP 429 (Too Many Requests) throttling during multi-step operations such as citation graph traversal. Request a free key to get ~10 req/s.
Claude Desktop configuration (claude_desktop_config.json):
```json
{
  "mcpServers": {
    "scholar": {
      "command": "uvx",
      "args": ["--from", "pvliesdonk-scholar-mcp", "scholar-mcp", "serve"],
      "env": {
        "SCHOLAR_MCP_S2_API_KEY": "your-key"
      }
    }
  }
}
```
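If `claude_desktop_config.json` already lists other MCP servers, the `scholar` entry has to be merged in rather than pasted over the whole file. A small sketch of that merge follows; the helper name `add_scholar_server` is hypothetical, and locating the config file (which differs per OS) is left out.

```python
import json
from typing import Any, Dict, Optional


def add_scholar_server(config: Dict[str, Any], api_key: Optional[str] = None) -> Dict[str, Any]:
    """Return a copy of a Claude Desktop config with a "scholar" MCP server entry added."""
    merged = json.loads(json.dumps(config))  # deep copy so the caller's dict is untouched
    entry: Dict[str, Any] = {
        "command": "uvx",
        "args": ["--from", "pvliesdonk-scholar-mcp", "scholar-mcp", "serve"],
    }
    if api_key:
        # The key is optional; without it the server runs at the unauthenticated rate limit.
        entry["env"] = {"SCHOLAR_MCP_S2_API_KEY": api_key}
    merged.setdefault("mcpServers", {})["scholar"] = entry  # other servers are kept
    return merged


existing = {"mcpServers": {"other": {"command": "other-server"}}}
print(json.dumps(add_scholar_server(existing, "your-key"), indent=2))
```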
HTTP transport
```bash
uvx --from pvliesdonk-scholar-mcp scholar-mcp serve --transport http --port 8000
```
Configuration
All settings are controlled via environment variables with the SCHOLAR_MCP_ prefix.
Core
| Variable | Default | Description |
|---|---|---|
| `SCHOLAR_MCP_S2_API_KEY` | -- | Semantic Scholar API key (free on request); optional but recommended for higher rate limits |
| `SCHOLAR_MCP_READ_ONLY` | `true` | If `true`, write-tagged tools (`fetch_paper_pdf`, `convert_pdf_to_markdown`, `fetch_and_convert`) are hidden |
| `SCHOLAR_MCP_CACHE_DIR` | `/data/scholar-mcp` | Directory for the SQLite cache database and downloaded PDFs |
| `SCHOLAR_MCP_CONTACT_EMAIL` | -- | Included in the OpenAlex User-Agent for polite-pool access (faster rate limits) |
| `SCHOLAR_MCP_LOG_LEVEL` | `INFO` | Logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`) |
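The sketch below shows how the Core variables and their defaults fit together when read from the environment. The `CoreSettings` class and `load_core_settings` function are hypothetical, not scholar-mcp's own settings loader, and the set of strings accepted as "true" is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "SCHOLAR_MCP_"


def parse_flag(value: str) -> bool:
    # Assumption: these spellings count as true; scholar-mcp's own parser may accept others.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class CoreSettings:
    s2_api_key: Optional[str]
    read_only: bool
    cache_dir: str
    contact_email: Optional[str]
    log_level: str


def load_core_settings(env: Mapping[str, str] = os.environ) -> CoreSettings:
    """Read the Core variables, falling back to the defaults listed in the table."""

    def optional_var(name: str) -> Optional[str]:
        return env.get(PREFIX + name)

    def var_with_default(name: str, default: str) -> str:
        return env.get(PREFIX + name, default)

    return CoreSettings(
        s2_api_key=optional_var("S2_API_KEY"),
        read_only=parse_flag(var_with_default("READ_ONLY", "true")),
        cache_dir=var_with_default("CACHE_DIR", "/data/scholar-mcp"),
        contact_email=optional_var("CONTACT_EMAIL"),
        log_level=var_with_default("LOG_LEVEL", "INFO").upper(),
    )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check in isolation.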
PDF Conversion (optional)
| Variable | Default | Description |
|---|---|---|
| `SCHOLAR_MCP_DOCLING_URL` | -- | Base URL of a running docling-serve instance (e.g. `http://localhost:5001`) |
| `SCHOLAR_MCP_VLM_API_URL` | -- | OpenAI-compatible VLM endpoint for formula/figure-enriched PDF conversion |
| `SCHOLAR_MCP_VLM_API_KEY` | -- | API key for the VLM endpoint |
| `SCHOLAR_MCP_VLM_MODEL` | `gpt-4o` | Model name for VLM-enriched conversion |
Authentication (optional)
| Variable | Default | Description |
|---|---|---|
| `SCHOLAR_MCP_BEARER_TOKEN` | -- | Static bearer token for HTTP transport authentication |
| `SCHOLAR_MCP_BASE_URL` | -- | Public base URL, required for OIDC (e.g. `https://mcp.example.com`) |
| `SCHOLAR_MCP_OIDC_CONFIG_URL` | -- | OIDC discovery endpoint URL |
| `SCHOLAR_MCP_OIDC_CLIENT_ID` | -- | OIDC client ID |
| `SCHOLAR_MCP_OIDC_CLIENT_SECRET` | -- | OIDC client secret |
| `SCHOLAR_MCP_OIDC_JWT_SIGNING_KEY` | -- | JWT signing key; required on Linux/Docker for issued tokens to survive restarts (generate with `openssl rand -hex 32`) |
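If `openssl` is not at hand, Python's standard `secrets` module can produce the same kind of value. `token_hex(32)` returns 32 random bytes as 64 hex characters, the same shape as `openssl rand -hex 32`. Using `token_urlsafe` for the static bearer token is our suggestion, not a documented requirement.

```python
import secrets

# 32 random bytes as 64 hex characters, matching the output shape of `openssl rand -hex 32`.
jwt_signing_key = secrets.token_hex(32)

# Assumption: any long random string works as a static bearer token; URL-safe text is convenient.
bearer_token = secrets.token_urlsafe(32)

print(f"SCHOLAR_MCP_OIDC_JWT_SIGNING_KEY={jwt_signing_key}")
print(f"SCHOLAR_MCP_BEARER_TOKEN={bearer_token}")
```

With bearer authentication, HTTP clients then send the token in the standard `Authorization: Bearer <token>` header.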
MCP Tools
Search & Retrieval
| Tool | Description |
|---|---|
| `search_papers` | Full-text search with year, venue, field-of-study, and citation-count filters. Returns up to 100 results with pagination. |
| `get_paper` | Fetch full metadata for a single paper by DOI, S2 ID, arXiv ID, ACM ID, or PubMed ID. |
| `get_author` | Fetch an author profile with publications, or search authors by name. |
Citation Graph
| Tool | Description |
|---|---|
| `get_citations` | Forward citations (papers that cite a given paper) with optional filters. |
| `get_references` | Backward references (papers cited by a given paper). |
| `get_citation_graph` | BFS traversal from seed papers, returning nodes and edges up to a configurable depth. |
| `find_bridge_papers` | Shortest citation path between two papers. |
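The two graph tools rest on breadth-first search: depth-limited expansion from seeds for `get_citation_graph`, and BFS with parent pointers for the shortest path behind `find_bridge_papers`. The sketch below runs both over a hypothetical in-memory adjacency dict instead of live S2 calls. It follows edges in one direction only (citing to cited); whether the real tools also walk edges backwards is not stated here, so treat that as an assumption.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, List[str]]  # paper ID -> IDs of papers it cites (hypothetical data)


def citation_graph(graph: Graph, seeds: List[str], max_depth: int) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Breadth-first expansion from seeds, collecting nodes and (citing, cited) edges."""
    nodes: Set[str] = set(seeds)
    edges: Set[Tuple[str, str]] = set()
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        paper, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for cited in graph.get(paper, []):
            edges.add((paper, cited))
            if cited not in nodes:
                nodes.add(cited)
                queue.append((cited, depth + 1))
    return nodes, edges


def bridge_path(graph: Graph, start: str, goal: str) -> Optional[List[str]]:
    """Shortest path from start to goal along graph edges, or None if unreachable."""
    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        paper = queue.popleft()
        if paper == goal:
            path: List[str] = []
            node: Optional[str] = paper
            while node is not None:  # walk parent pointers back to start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(paper, []):
            if nxt not in parents:
                parents[nxt] = paper
                queue.append(nxt)
    return None
```

Because BFS visits papers in order of hop count, the first time it reaches `goal` is along a path with the fewest citation hops.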
Recommendations
| Tool | Description |
|---|---|
| `recommend_papers` | Paper recommendations from 1 to 5 positive examples and optional negative examples. |
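For orientation, this is roughly the upstream request a recommendation call maps to. We assume the public S2 Recommendations endpoint (a POST with `positivePaperIds` and `negativePaperIds`, keyed by the `x-api-key` header); check the official S2 API documentation before relying on it. It is not necessarily how scholar-mcp issues the call. The builder function is hypothetical and only constructs the request.

```python
import json
import urllib.request
from typing import List, Optional

# Assumed public S2 Recommendations endpoint; verify against the official API docs.
S2_RECOMMENDATIONS_URL = "https://api.semanticscholar.org/recommendations/v1/papers/"


def build_recommendation_request(
    positive: List[str],
    negative: Optional[List[str]] = None,
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for papers similar to `positive`."""
    if not 1 <= len(positive) <= 5:
        # Mirrors the 1-to-5 limit documented for recommend_papers.
        raise ValueError("expected between 1 and 5 positive example papers")
    body = {"positivePaperIds": positive, "negativePaperIds": negative or []}
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["x-api-key"] = api_key  # S2 reads the API key from this header
    return urllib.request.Request(
        S2_RECOMMENDATIONS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Sending it is a matter of `urllib.request.urlopen(request)`. Negative examples steer results away from topics you do not want.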
Utility
| Tool | Description |
|---|---|
| `batch_resolve` | Resolve up to 100 identifiers to full metadata in one call, with OpenAlex fallback. |
| `enrich_paper` | Augment S2 metadata with OpenAlex fields (affiliations, funders, OA status, concepts). |
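A client holding more than 100 identifiers has to split them across several `batch_resolve` calls. The sketch below shows that chunking together with the "primary source first, fallback for misses" pattern the table describes. The resolver callables stand in for Semantic Scholar and OpenAlex lookups and are hypothetical.

```python
from typing import Callable, Dict, Iterator, List

# A resolver takes a batch of identifiers and returns {identifier: metadata} for those it found.
Resolver = Callable[[List[str]], Dict[str, dict]]


def chunked(ids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` identifiers."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def resolve_with_fallback(
    ids: List[str], primary: Resolver, fallback: Resolver, size: int = 100
) -> Dict[str, dict]:
    results: Dict[str, dict] = {}
    for batch in chunked(ids, size):
        found = dict(primary(batch))  # copy so the resolver's return value is never mutated
        missing = [i for i in batch if i not in found]
        if missing:
            found.update(fallback(missing))  # only unresolved IDs go to the fallback
        results.update(found)
    return results  # IDs that neither source knows are simply absent
```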
PDF Conversion (requires docling-serve)
| Tool | Description |
|---|---|
| `fetch_paper_pdf` | Download the open-access PDF for a paper. |
| `convert_pdf_to_markdown` | Convert a local PDF to Markdown via docling-serve. |
| `fetch_and_convert` | Full pipeline: fetch the OA PDF, convert it to Markdown, and return both. |
PDF tools are write-tagged and hidden when `SCHOLAR_MCP_READ_ONLY=true` (the default).
Task Polling
| Tool | Description |
|---|---|
| `get_task_result` | Poll for the result of a background task by ID. |
| `list_tasks` | List all active background tasks. |
Long-running operations (PDF download/conversion) and rate-limited S2 requests return `{"queued": true, "task_id": "..."}` immediately. Use `get_task_result` to poll for the result.
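On the client side, this queued-task contract can be handled with a polling loop like the one below. `call_tool` is a hypothetical adapter around whatever MCP client you use, returning a tool result as a dict. The `task_id` argument name for `get_task_result` and the shape of a finished result are assumptions, which is why the caller supplies the `is_done` predicate.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical adapter: call_tool(name, arguments) -> decoded tool result as a dict.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def call_and_wait(
    call_tool: CallTool,
    name: str,
    arguments: Dict[str, Any],
    is_done: Callable[[Dict[str, Any]], bool],
    interval: float = 2.0,
    timeout: float = 300.0,
) -> Dict[str, Any]:
    """Call a tool; if it returns a queued task, poll get_task_result until done or timeout."""
    result = call_tool(name, arguments)
    if not result.get("queued"):
        return result  # the tool answered directly
    task_id = result["task_id"]
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        polled = call_tool("get_task_result", {"task_id": task_id})  # argument name assumed
        if is_done(polled):
            return polled
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish within {timeout} s")
```

A fixed interval keeps the sketch simple. For long PDF conversions, a growing interval (backoff) reduces the number of polling calls.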
Docker Compose
```yaml
services:
  scholar-mcp:
    image: ghcr.io/pvliesdonk/scholar-mcp:latest
    restart: unless-stopped
    environment:
      SCHOLAR_MCP_S2_API_KEY: "${SCHOLAR_MCP_S2_API_KEY}"
      SCHOLAR_MCP_DOCLING_URL: "http://docling-serve:5001"
      SCHOLAR_MCP_VLM_API_URL: "${VLM_API_URL:-}"
      SCHOLAR_MCP_VLM_API_KEY: "${VLM_API_KEY:-}"
      SCHOLAR_MCP_CACHE_DIR: "/data/scholar-mcp"
      SCHOLAR_MCP_READ_ONLY: "false"
    volumes:
      - scholar-mcp-data:/data/scholar-mcp
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.scholar-mcp.rule=Host(`scholar-mcp.yourdomain.com`)"
    depends_on:
      - docling-serve

  docling-serve:
    image: ghcr.io/ds4sd/docling-serve:latest
    restart: unless-stopped

volumes:
  scholar-mcp-data:
```
Cache Management
```bash
# Show cache statistics (row counts, database size)
scholar-mcp cache stats

# Clear all cached data (preserves identifier aliases)
scholar-mcp cache clear

# Remove entries older than 30 days
scholar-mcp cache clear --older-than 30

# Override cache directory
scholar-mcp cache stats --cache-dir /path/to/cache
```
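Conceptually, an age-based clear is a timestamp comparison in SQLite that leaves identifier aliases alone. The sketch below assumes a hypothetical schema (data tables named `papers`, `authors`, `citations`, `references`, each with a `cached_at` Unix-timestamp column). The real cache layout may differ, so use the CLI above against an actual cache rather than this code.

```python
import sqlite3
import time
from typing import Optional

DAY = 86_400  # seconds

# Hypothetical schema: each data table has a `cached_at` column holding a Unix timestamp.
DATA_TABLES = ("papers", "authors", "citations", "references")


def clear_older_than(conn: sqlite3.Connection, days: int, now: Optional[float] = None) -> int:
    """Delete rows older than `days` from the data tables; alias tables are not touched."""
    cutoff = (time.time() if now is None else now) - days * DAY
    removed = 0
    with conn:  # one transaction: commit on success, roll back on error
        for table in DATA_TABLES:
            # Names come from the fixed tuple above, so interpolation is safe;
            # "references" is an SQL keyword, hence the double quotes.
            cursor = conn.execute(f'DELETE FROM "{table}" WHERE cached_at < ?', (cutoff,))
            removed += cursor.rowcount
    return removed
```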
Development
```bash
# Install with dev and MCP dependencies
uv sync --extra dev --extra mcp

# Run tests
uv run pytest

# Lint and format
uv run ruff check src/ tests/
uv run ruff format src/ tests/

# Type check
uv run mypy src/

# Build docs locally
uv sync --extra docs
uv run mkdocs serve
```
License
MIT
Download files
File details
Details for the file pvliesdonk_scholar_mcp-1.2.2.tar.gz.
File metadata
- Download URL: pvliesdonk_scholar_mcp-1.2.2.tar.gz
- Size: 211.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `8399da76774ec6c14b0d4ede77c4bfd8445774e25bb323a853ae8a35eb4bfc25` |
| MD5 | `4ccaaa14c319064d134416d8c6bdf526` |
| BLAKE2b-256 | `bbe1003a6d651f6844e9ce7fe1c0dd4e294b1fa84e914a2d43b4aa4d4923ff4a` |
Provenance
The following attestation bundles were made for `pvliesdonk_scholar_mcp-1.2.2.tar.gz`:
- Publisher: `release.yml` on pvliesdonk/scholar-mcp
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: `pvliesdonk_scholar_mcp-1.2.2.tar.gz`
- Subject digest: `8399da76774ec6c14b0d4ede77c4bfd8445774e25bb323a853ae8a35eb4bfc25`
- Sigstore transparency entry: 1236852658
- Permalink: pvliesdonk/scholar-mcp@0163922b412233e3c7f94114e85c92b9eb22cf60
- Branch / Tag: refs/heads/main
- Owner: https://github.com/pvliesdonk
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@0163922b412233e3c7f94114e85c92b9eb22cf60
- Trigger Event: workflow_dispatch
File details
Details for the file pvliesdonk_scholar_mcp-1.2.2-py3-none-any.whl.
File metadata
- Download URL: pvliesdonk_scholar_mcp-1.2.2-py3-none-any.whl
- Size: 41.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `76f3ef9726f935b442f207bea200e6e62f2c7d133022191eff59d38aad537733` |
| MD5 | `d7219ece622f8133baa0475affcdae7d` |
| BLAKE2b-256 | `9eb4240aae7e93d608400c62f31c4aa63ebb9d1bdc00024e7e571d78bd77a85e` |
Provenance
The following attestation bundles were made for `pvliesdonk_scholar_mcp-1.2.2-py3-none-any.whl`:
- Publisher: `release.yml` on pvliesdonk/scholar-mcp
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: `pvliesdonk_scholar_mcp-1.2.2-py3-none-any.whl`
- Subject digest: `76f3ef9726f935b442f207bea200e6e62f2c7d133022191eff59d38aad537733`
- Sigstore transparency entry: 1236852690
- Permalink: pvliesdonk/scholar-mcp@0163922b412233e3c7f94114e85c92b9eb22cf60
- Branch / Tag: refs/heads/main
- Owner: https://github.com/pvliesdonk
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@0163922b412233e3c7f94114e85c92b9eb22cf60
- Trigger Event: workflow_dispatch