FastMCP server for Semantic Scholar with OpenAlex enrichment and docling PDF conversion

scholar-mcp

A FastMCP server providing structured academic literature access via Semantic Scholar, with OpenAlex enrichment and optional docling-serve PDF conversion.

Features

  • Search & retrieval -- full-text paper search with year, venue, field-of-study, and citation-count filters; single-paper lookup by DOI, S2 ID, arXiv ID, and more; author profile and name search
  • Citation graph -- forward citations, backward references, BFS graph traversal up to configurable depth, and shortest-path bridge paper discovery
  • Recommendations -- paper recommendations from positive (and optional negative) examples via the S2 recommendation API
  • Citation generation -- format paper metadata as BibTeX, CSL-JSON, or RIS citations with automatic entry type inference, author name parsing, and OpenAlex venue enrichment
  • Book search -- search and fetch book metadata via Open Library (no API key required); papers with an ISBN are automatically enriched with publisher, edition, cover URL, and subject data
  • OpenAlex enrichment -- augment paper metadata with open-access URLs, affiliations, funders, concepts, and OA status
  • Patent search & cross-referencing -- search and retrieve patents via EPO Open Patent Services covering 100+ patent offices, with cited reference extraction, NPL-to-paper resolution via Semantic Scholar, and paper-to-patent citation discovery; EPO credentials are optional -- paper search works without them
  • PDF conversion -- download open-access PDFs and convert to Markdown via docling-serve, with optional VLM enrichment for formulas and figures; automatic fallback to ArXiv, PubMed Central, and Unpaywall when Semantic Scholar has no OA link; direct URL download for PDFs found elsewhere
  • Intelligent caching -- SQLite-backed cache with per-table TTLs (30 days for papers/authors, 7 days for citations/references) and identifier aliasing
  • Authentication -- bearer token, OIDC (OAuth 2.1), or both simultaneously (multi-auth)
  • Multi-transport -- stdio (Claude Desktop), HTTP (streamable-http), and SSE transports
  • Linux packages -- .deb and .rpm packages with systemd service and security hardening
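
To make the caching bullet above concrete, here is a minimal sketch of a SQLite-backed cache with per-table TTLs. It is not the project's actual implementation: the class name, schema, and method signatures are assumptions made for illustration, and only the TTL values (30 days for papers/authors, 7 days for citations/references) come from the feature list.

```python
import json
import sqlite3
import time

DAY = 86400

# Per-table TTLs in seconds, mirroring the values in the feature list.
TTL_SECONDS = {
    "papers": 30 * DAY,
    "authors": 30 * DAY,
    "citations": 7 * DAY,
    "references": 7 * DAY,
}


class TTLCache:
    """Tiny SQLite cache; the schema and names are illustrative only."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        for table in TTL_SECONDS:
            # Table names are quoted because "references" is an SQL keyword.
            self.db.execute(
                f'CREATE TABLE IF NOT EXISTS "{table}" '
                "(key TEXT PRIMARY KEY, value TEXT NOT NULL, stored_at REAL NOT NULL)"
            )

    def put(self, table, key, value, now=None):
        now = time.time() if now is None else now
        self.db.execute(
            f'INSERT OR REPLACE INTO "{table}" VALUES (?, ?, ?)',
            (key, json.dumps(value), now),
        )
        self.db.commit()

    def get(self, table, key, now=None):
        now = time.time() if now is None else now
        row = self.db.execute(
            f'SELECT value, stored_at FROM "{table}" WHERE key = ?', (key,)
        ).fetchone()
        if row is None or now - row[1] > TTL_SECONDS[table]:
            return None  # missing or expired
        return json.loads(row[0])
```

Passing `now` explicitly keeps expiry deterministic in tests; real callers would omit it and rely on the wall clock.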

Installation

With uvx (recommended)

uvx --from pvliesdonk-scholar-mcp scholar-mcp serve

With pip

pip install 'pvliesdonk-scholar-mcp[mcp]'
scholar-mcp serve

With Docker

docker run -v scholar-mcp-data:/data/scholar-mcp \
           ghcr.io/pvliesdonk/scholar-mcp:latest

Linux packages

Download .deb or .rpm from the latest release:

# Debian/Ubuntu
sudo dpkg -i scholar-mcp_*.deb

# RHEL/Fedora
sudo rpm -i scholar-mcp-*.rpm

Note: The PyPI package is pvliesdonk-scholar-mcp. The CLI command installed is scholar-mcp.

Quick Start

stdio transport (Claude Desktop / MCP clients)

uvx --from pvliesdonk-scholar-mcp scholar-mcp serve

API key optional but recommended: The server works without a Semantic Scholar API key, but unauthenticated requests are limited to ~1 req/s and will hit 429 throttles quickly during multi-step operations like citation graph traversal. Request a free key to get ~10 req/s.
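
For clients that call the Semantic Scholar API directly and run into the 429 throttles described above, one common pattern is exponential backoff. The sketch below is generic client-side code, not what scholar-mcp does internally; `fetch` is a hypothetical zero-argument callable returning `(status, body)`, and the delay values are arbitrary.

```python
import random
import time


def call_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it stops returning HTTP 429 or attempts run out.

    fetch is a hypothetical zero-argument callable returning (status, body).
    """
    for attempt in range(max_attempts):
        status, body = fetch()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff with a little jitter: ~1s, ~2s, ~4s, ...
            sleep(base_delay * 2**attempt + random.uniform(0, 0.1))
    # Still throttled after the final attempt; hand the 429 back to the caller.
    return status, body
```

The `sleep` parameter exists only so the delay can be replaced in tests.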

Claude Desktop configuration (claude_desktop_config.json):

{
  "mcpServers": {
    "scholar": {
      "command": "uvx",
      "args": ["--from", "pvliesdonk-scholar-mcp", "scholar-mcp", "serve"],
      "env": {
        "SCHOLAR_MCP_S2_API_KEY": "your-key"
      }
    }
  }
}

HTTP transport

uvx --from pvliesdonk-scholar-mcp scholar-mcp serve --transport http --port 8000

Configuration

All settings are controlled via environment variables with the SCHOLAR_MCP_ prefix.

Core

| Variable | Default | Description |
|---|---|---|
| SCHOLAR_MCP_S2_API_KEY | -- | Semantic Scholar API key (request one); optional but recommended for higher rate limits |
| SCHOLAR_MCP_READ_ONLY | true | If true, write-tagged tools (fetch_paper_pdf, convert_pdf_to_markdown, fetch_and_convert, fetch_pdf_by_url) are hidden |
| SCHOLAR_MCP_CACHE_DIR | /data/scholar-mcp | Directory for the SQLite cache database and downloaded PDFs |
| SCHOLAR_MCP_CONTACT_EMAIL | -- | Included in the OpenAlex User-Agent for polite pool access (faster rate limits); also enables Unpaywall PDF lookups |
| SCHOLAR_MCP_LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) |
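
As an illustration of how prefixed environment variables like these can be read, here is a small sketch. The variable names and defaults come from the table above; the `CoreSettings` class, the `load_core_settings` function, and the set of strings treated as true are assumptions, not the server's real configuration code.

```python
import os
from dataclasses import dataclass
from typing import Optional

PREFIX = "SCHOLAR_MCP_"


@dataclass(frozen=True)
class CoreSettings:
    s2_api_key: Optional[str]
    read_only: bool
    cache_dir: str
    contact_email: Optional[str]
    log_level: str


def load_core_settings(env=None):
    """Build settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    def get(name, default=None):
        return env.get(PREFIX + name, default)

    # Which strings count as "true" is an assumption made for this sketch.
    read_only = get("READ_ONLY", "true").strip().lower() in {"1", "true", "yes"}
    return CoreSettings(
        s2_api_key=get("S2_API_KEY"),
        read_only=read_only,
        cache_dir=get("CACHE_DIR", "/data/scholar-mcp"),
        contact_email=get("CONTACT_EMAIL"),
        log_level=get("LOG_LEVEL", "INFO"),
    )
```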

PDF Conversion (optional)

| Variable | Default | Description |
|---|---|---|
| SCHOLAR_MCP_DOCLING_URL | -- | Base URL of a running docling-serve instance (e.g. http://localhost:5001) |
| SCHOLAR_MCP_VLM_API_URL | -- | OpenAI-compatible VLM endpoint for formula/figure-enriched PDF conversion |
| SCHOLAR_MCP_VLM_API_KEY | -- | API key for the VLM endpoint |
| SCHOLAR_MCP_VLM_MODEL | gpt-4o | Model name for VLM-enriched conversion |

Patent Search (optional)

| Variable | Default | Description |
|---|---|---|
| SCHOLAR_MCP_EPO_CONSUMER_KEY | -- | EPO OPS consumer key (register at developers.epo.org); both key and secret must be set for patent tools to appear |
| SCHOLAR_MCP_EPO_CONSUMER_SECRET | -- | EPO OPS consumer secret |

Authentication (optional)

| Variable | Default | Description |
|---|---|---|
| SCHOLAR_MCP_BEARER_TOKEN | -- | Static bearer token for HTTP transport authentication |
| SCHOLAR_MCP_BASE_URL | -- | Public base URL, required for OIDC (e.g. https://mcp.example.com) |
| SCHOLAR_MCP_OIDC_CONFIG_URL | -- | OIDC discovery endpoint URL |
| SCHOLAR_MCP_OIDC_CLIENT_ID | -- | OIDC client ID |
| SCHOLAR_MCP_OIDC_CLIENT_SECRET | -- | OIDC client secret |
| SCHOLAR_MCP_OIDC_JWT_SIGNING_KEY | -- | JWT signing key; required on Linux/Docker to survive restarts (openssl rand -hex 32) |
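
If openssl is not available, a key of the same shape as `openssl rand -hex 32` (32 random bytes as 64 hex characters) can be produced with Python's standard library. This is only a convenience sketch; the variable name `signing_key` is arbitrary.

```python
import secrets

# 32 random bytes rendered as 64 hex characters, like `openssl rand -hex 32`.
signing_key = secrets.token_hex(32)
print(f"SCHOLAR_MCP_OIDC_JWT_SIGNING_KEY={signing_key}")
```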

MCP Tools

Search & Retrieval

| Tool | Description |
|---|---|
| search_papers | Full-text search with year, venue, field-of-study, and citation-count filters. Returns up to 100 results with pagination. |
| get_paper | Fetch full metadata for a single paper by DOI, S2 ID, arXiv ID, ACM ID, or PubMed ID. |
| get_author | Fetch author profile with publications, or search by name. |

Citation Graph

| Tool | Description |
|---|---|
| get_citations | Forward citations (papers that cite a given paper) with optional filters. |
| get_references | Backward references (papers cited by a given paper). |
| get_citation_graph | BFS traversal from seed papers, returning nodes + edges up to configurable depth. |
| find_bridge_papers | Shortest citation path between two papers. |
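
The two graph operations named above can be sketched in a few lines. This is a generic illustration of depth-limited BFS and unweighted shortest path, not the server's code; `neighbors` is a hypothetical callable mapping a paper ID to the IDs it links to (in practice it would wrap citation or reference lookups), and the toy graph is made up.

```python
from collections import deque


def bfs_graph(seeds, neighbors, max_depth):
    """Return (nodes, edges) reachable from seeds within max_depth hops.

    neighbors is a hypothetical callable: paper ID -> iterable of linked IDs.
    """
    nodes = set(seeds)
    edges = set()
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        paper, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the depth limit
        for other in neighbors(paper):
            edges.add((paper, other))
            if other not in nodes:
                nodes.add(other)
                queue.append((other, depth + 1))
    return nodes, edges


def shortest_path(start, goal, neighbors):
    """Return the shortest list of IDs from start to goal, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        paper = queue.popleft()
        if paper == goal:
            path = []
            while paper is not None:
                path.append(paper)
                paper = parent[paper]
            return path[::-1]
        for other in neighbors(paper):
            if other not in parent:
                parent[other] = paper
                queue.append(other)
    return None


# Toy citation graph (made-up IDs) for trying the functions out.
toy = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
nodes, edges = bfs_graph(["A"], lambda p: toy.get(p, []), max_depth=2)
bridge = shortest_path("A", "E", lambda p: toy.get(p, []))
```

Because BFS explores papers in order of distance from the start, the first time the goal is dequeued its parent chain is a shortest path; the intermediate IDs are the "bridge" papers.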

Recommendations

| Tool | Description |
|---|---|
| recommend_papers | Paper recommendations from 1--5 positive examples and optional negative examples. |

Book Search

| Tool | Description |
|---|---|
| search_books | Search for books by title, author, ISBN, or keywords via Open Library. Returns up to 50 results. |
| get_book | Fetch book metadata by ISBN-10, ISBN-13, Open Library work ID (OL...W), or edition ID (OL...M). |
| recommend_books | Recommend books for a subject via Open Library, sorted by popularity. |

Papers with an ISBN in their externalIds are automatically enriched with book_metadata (publisher, edition, cover URL, subjects, and more) from Open Library when fetched via get_paper, get_citations, get_references, or get_citation_graph.

Utility

| Tool | Description |
|---|---|
| batch_resolve | Resolve up to 100 identifiers to full metadata in one call, with OpenAlex fallback. |
| enrich_paper | Augment S2 metadata with OpenAlex fields (affiliations, funders, OA status, concepts). |

Citation Generation

| Tool | Description |
|---|---|
| generate_citations | Generate BibTeX, CSL-JSON, or RIS citations for up to 100 papers, with automatic entry type inference and optional OpenAlex venue enrichment. |
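
To give a feel for what BibTeX generation with entry type inference involves, here is a deliberately simplified sketch. The input field names (`title`, `authors`, `year`, `venue`, `doi`), the citation-key scheme, and the rule "a venue means @article, otherwise @misc" are all assumptions; the real tool's inference and author parsing are likely more thorough.

```python
def to_bibtex(paper):
    """Render a paper dict as a BibTeX entry (illustrative field names)."""
    # Crude entry type inference, assumed for this sketch only:
    # a venue implies @article, anything else falls back to @misc.
    entry_type = "article" if paper.get("venue") else "misc"
    # Citation key: last word of the first author's name plus the year.
    first_author = paper["authors"][0].split()[-1].lower()
    key = f"{first_author}{paper['year']}"
    fields = {
        "title": paper["title"],
        "author": " and ".join(paper["authors"]),
        "year": str(paper["year"]),
    }
    if paper.get("venue"):
        fields["journal"] = paper["venue"]
    if paper.get("doi"):
        fields["doi"] = paper["doi"]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@{entry_type}{{{key},\n{body}\n}}"
```

The sketch does not escape special LaTeX characters or handle multi-word surnames; both would matter in real use.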

Patent Search (requires EPO OPS credentials)

| Tool | Description |
|---|---|
| search_patents | Search patents across 100+ patent offices via EPO OPS. |
| get_patent | Fetch bibliographic metadata for a single patent by publication number. |

Patent tools are hidden when SCHOLAR_MCP_EPO_CONSUMER_KEY and SCHOLAR_MCP_EPO_CONSUMER_SECRET are not set.

PDF Conversion (requires docling-serve)

| Tool | Description |
|---|---|
| fetch_paper_pdf | Download PDF for a paper (S2 open-access, then ArXiv/PMC/Unpaywall fallback). |
| convert_pdf_to_markdown | Convert a local PDF to Markdown via docling-serve. |
| fetch_and_convert | Full pipeline: fetch PDF (with fallback), convert to Markdown, return both. |
| fetch_pdf_by_url | Download a PDF from any URL and optionally convert to Markdown. |

PDF tools are write-tagged and hidden when SCHOLAR_MCP_READ_ONLY=true (the default).
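
The visibility rules in this README (write-tagged tools hidden in read-only mode, patent tools hidden without both EPO credentials) amount to a filter over tagged tools. The following sketch shows that logic in isolation; the `visible_tools` function and the `"write"` / `"patent"` tag names are hypothetical and do not reflect how the server registers tools.

```python
def visible_tools(tools, env):
    """Return the names of tools that should be exposed.

    tools maps tool name -> set of tags; the "write" and "patent" tags are
    hypothetical labels chosen for this sketch.
    """
    # Read-only defaults to true, matching the documented default.
    read_only = env.get("SCHOLAR_MCP_READ_ONLY", "true").strip().lower() == "true"
    # Both the consumer key and the secret must be set.
    has_epo = bool(
        env.get("SCHOLAR_MCP_EPO_CONSUMER_KEY")
        and env.get("SCHOLAR_MCP_EPO_CONSUMER_SECRET")
    )
    visible = []
    for name, tags in tools.items():
        if read_only and "write" in tags:
            continue
        if not has_epo and "patent" in tags:
            continue
        visible.append(name)
    return visible
```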

Task Polling

| Tool | Description |
|---|---|
| get_task_result | Poll for the result of a background task by ID. |
| list_tasks | List all active background tasks. |

Long-running operations (PDF download/conversion) and rate-limited S2 requests return {"queued": true, "task_id": "..."} immediately. Use get_task_result to poll for the result.
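
A client-side polling loop for the queued responses described above might look like the sketch below. `call_tool` is a hypothetical callable `(tool_name, arguments) -> dict` standing in for whatever MCP client is in use, and two things are assumed rather than documented here: that get_task_result accepts a `task_id` argument, and that it keeps returning a `queued` response while the task is pending.

```python
import time


def wait_for_result(call_tool, response, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll get_task_result until a queued response resolves.

    call_tool is a hypothetical callable: (tool_name, arguments) -> dict.
    """
    if not response.get("queued"):
        return response  # the tool answered immediately
    task_id = response["task_id"]
    waited = 0.0
    while waited < timeout:
        sleep(interval)
        waited += interval
        # The "task_id" argument name is an assumption of this sketch.
        result = call_tool("get_task_result", {"task_id": task_id})
        if not result.get("queued"):
            return result
    raise TimeoutError(f"task {task_id} still queued after {timeout} seconds")
```

Passing every tool response through `wait_for_result` lets calling code treat immediate and queued results the same way.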

Docker Compose

services:
  scholar-mcp:
    image: ghcr.io/pvliesdonk/scholar-mcp:latest
    restart: unless-stopped
    environment:
      SCHOLAR_MCP_S2_API_KEY: "${SCHOLAR_MCP_S2_API_KEY}"
      SCHOLAR_MCP_DOCLING_URL: "http://docling-serve:5001"
      SCHOLAR_MCP_VLM_API_URL: "${VLM_API_URL:-}"
      SCHOLAR_MCP_VLM_API_KEY: "${VLM_API_KEY:-}"
      SCHOLAR_MCP_CACHE_DIR: "/data/scholar-mcp"
      SCHOLAR_MCP_READ_ONLY: "false"
    volumes:
      - scholar-mcp-data:/data/scholar-mcp
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.scholar-mcp.rule=Host(`scholar-mcp.yourdomain.com`)"

  docling-serve:
    image: ghcr.io/ds4sd/docling-serve:latest
    restart: unless-stopped

volumes:
  scholar-mcp-data:

Cache Management

# Show cache statistics (row counts, database size)
scholar-mcp cache stats

# Clear all cached data (preserves identifier aliases)
scholar-mcp cache clear

# Remove entries older than 30 days
scholar-mcp cache clear --older-than 30

# Override cache directory
scholar-mcp cache stats --cache-dir /path/to/cache

Development

# Install with dev and MCP dependencies
uv sync --extra dev --extra mcp

# Run tests
uv run pytest

# Lint and format
uv run ruff check src/ tests/
uv run ruff format src/ tests/

# Type check
uv run mypy src/

# Build docs locally
uv sync --extra docs
uv run mkdocs serve

License

MIT
