
FastMCP server for Semantic Scholar with OpenAlex enrichment and docling PDF conversion


scholar-mcp


A FastMCP server providing structured academic literature access via Semantic Scholar, with OpenAlex enrichment and optional docling-serve PDF conversion.

Features

  • Search & retrieval -- full-text paper search with year, venue, field-of-study, and citation-count filters; single-paper lookup by DOI, S2 ID, arXiv ID, and more; author profile and name search
  • Citation graph -- forward citations, backward references, BFS graph traversal up to a configurable depth, and shortest-path bridge paper discovery
  • Recommendations -- paper recommendations from positive (and optional negative) examples via the S2 recommendation API
  • OpenAlex enrichment -- augment paper metadata with open-access URLs, affiliations, funders, concepts, and OA status
  • PDF conversion -- download open-access PDFs and convert them to Markdown via docling-serve, with optional VLM enrichment for formulas and figures
  • Intelligent caching -- SQLite-backed cache with per-table TTLs (30 days for papers/authors, 7 days for citations/references) and identifier aliasing
  • Authentication -- bearer token, OIDC (OAuth 2.1), or both simultaneously (multi-auth)
  • Multi-transport -- stdio (Claude Desktop), HTTP (streamable-http), and SSE transports
  • Linux packages -- .deb and .rpm packages with systemd service and security hardening

Installation

With uvx (recommended)

uvx --from pvliesdonk-scholar-mcp scholar-mcp serve

With pip

pip install 'pvliesdonk-scholar-mcp[mcp]'
scholar-mcp serve

With Docker

docker run -v scholar-mcp-data:/data/scholar-mcp \
           ghcr.io/pvliesdonk/scholar-mcp:latest

Linux packages

Download the .deb or .rpm package from the latest release:

# Debian/Ubuntu
sudo dpkg -i scholar-mcp_*.deb

# RHEL/Fedora
sudo rpm -i scholar-mcp-*.rpm

Note: The PyPI package is pvliesdonk-scholar-mcp. The CLI command installed is scholar-mcp.

Quick Start

stdio transport (Claude Desktop / MCP clients)

uvx --from pvliesdonk-scholar-mcp scholar-mcp serve

API key optional but recommended: The server works without a Semantic Scholar API key, but unauthenticated requests are limited to ~1 req/s and will hit 429 throttles quickly during multi-step operations like citation graph traversal. Request a free key to get ~10 req/s.
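To see why multi-step operations trip the limit, here is a minimal client-side throttle sketch in Python. It is not scholar-mcp's internal code: the MinIntervalThrottle class, its 1 s spacing (taken from the ~1 req/s figure above) and the 1 s / 2 s / 4 s backoff schedule on HTTP 429 are assumptions for illustration.

```python
import time
from typing import Callable


class MinIntervalThrottle:
    """Hypothetical client-side throttle: keep requests at least min_interval
    seconds apart and back off exponentially when the server returns HTTP 429."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = float("-inf")  # time of the previous request

    def wait(self) -> None:
        # Sleep only as long as needed to respect the minimum spacing.
        delay = self._last + self.min_interval - self._clock()
        if delay > 0:
            self._sleep(delay)
        self._last = self._clock()

    def call(self, request: Callable[[], int], max_retries: int = 3) -> int:
        """Run request() (which returns an HTTP status code), retrying on 429."""
        status = 429
        for attempt in range(max_retries + 1):
            if attempt:
                self._sleep(2 ** (attempt - 1))  # assumed backoff: 1 s, 2 s, 4 s, ...
            self.wait()
            status = request()
            if status != 429:
                break
        return status
```

With an API key the same class would be constructed with min_interval=0.1 to match ~10 req/s. Injecting clock and sleep keeps the logic testable without real waiting.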

Claude Desktop configuration (claude_desktop_config.json):

{
  "mcpServers": {
    "scholar": {
      "command": "uvx",
      "args": ["--from", "pvliesdonk-scholar-mcp", "scholar-mcp", "serve"],
      "env": {
        "SCHOLAR_MCP_S2_API_KEY": "your-key"
      }
    }
  }
}

HTTP transport

uvx --from pvliesdonk-scholar-mcp scholar-mcp serve --transport http --port 8000

Configuration

All settings are controlled via environment variables with the SCHOLAR_MCP_ prefix.
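A minimal sketch of how SCHOLAR_MCP_-prefixed variables could map onto typed settings. The Settings dataclass, the load_settings helper and the truthy spellings accepted for booleans are assumptions, not scholar-mcp's actual loader; the defaults mirror the documented Core defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "SCHOLAR_MCP_"


@dataclass(frozen=True)
class Settings:
    # Illustrative container; defaults copied from the documented Core settings.
    s2_api_key: Optional[str] = None
    read_only: bool = True
    cache_dir: str = "/data/scholar-mcp"
    contact_email: Optional[str] = None
    log_level: str = "INFO"


def _parse_bool(value: str) -> bool:
    # Assumed rule: common truthy spellings mean True, anything else means False.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    def get(name: str) -> Optional[str]:
        return env.get(PREFIX + name)

    defaults = Settings()
    read_only_raw = get("READ_ONLY")
    return Settings(
        s2_api_key=get("S2_API_KEY") or None,
        read_only=defaults.read_only if read_only_raw is None else _parse_bool(read_only_raw),
        cache_dir=get("CACHE_DIR") or defaults.cache_dir,
        contact_email=get("CONTACT_EMAIL") or None,
        log_level=(get("LOG_LEVEL") or defaults.log_level).upper(),
    )
```

Calling load_settings() with no argument reads the real process environment; passing a dict makes it easy to test.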

Core

| Variable | Default | Description |
|---|---|---|
| SCHOLAR_MCP_S2_API_KEY | -- | Semantic Scholar API key; optional but recommended for higher rate limits |
| SCHOLAR_MCP_READ_ONLY | true | If true, write-tagged tools (fetch_paper_pdf, convert_pdf_to_markdown, fetch_and_convert) are hidden |
| SCHOLAR_MCP_CACHE_DIR | /data/scholar-mcp | Directory for the SQLite cache database and downloaded PDFs |
| SCHOLAR_MCP_CONTACT_EMAIL | -- | Contact email included in the OpenAlex User-Agent for polite-pool access (faster rate limits) |
| SCHOLAR_MCP_LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) |
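The contact email matters because OpenAlex routes requests that carry a mailto: address into its polite pool. This sketch only builds the request with the standard library; the openalex_work_request helper and the scholar-mcp-example product token in the User-Agent are hypothetical, and the exact header scholar-mcp sends may differ.

```python
import urllib.parse
import urllib.request
from typing import Optional

OPENALEX_BASE = "https://api.openalex.org"


def openalex_work_request(doi: str, contact_email: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a request for one OpenAlex work, looked up by DOI."""
    # OpenAlex accepts external IDs in the short "doi:<DOI>" form on /works.
    url = f"{OPENALEX_BASE}/works/doi:{urllib.parse.quote(doi)}"
    # "scholar-mcp-example" is a placeholder product token; the "mailto:<address>"
    # part of the User-Agent is what OpenAlex uses for polite-pool routing.
    agent = "scholar-mcp-example"
    if contact_email:
        agent += f" (mailto:{contact_email})"
    return urllib.request.Request(url, headers={"User-Agent": agent})
```

Passing the result to urllib.request.urlopen would send it; the JSON response contains fields such as open_access and authorships, which carry the OA status and affiliation data.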

PDF Conversion (optional)

| Variable | Default | Description |
|---|---|---|
| SCHOLAR_MCP_DOCLING_URL | -- | Base URL of a running docling-serve instance (e.g. http://localhost:5001) |
| SCHOLAR_MCP_VLM_API_URL | -- | OpenAI-compatible VLM endpoint for formula/figure-enriched PDF conversion |
| SCHOLAR_MCP_VLM_API_KEY | -- | API key for the VLM endpoint |
| SCHOLAR_MCP_VLM_MODEL | gpt-4o | Model name for VLM-enriched conversion |

Authentication (optional)

| Variable | Default | Description |
|---|---|---|
| SCHOLAR_MCP_BEARER_TOKEN | -- | Static bearer token for HTTP transport authentication |
| SCHOLAR_MCP_BASE_URL | -- | Public base URL, required for OIDC (e.g. https://mcp.example.com) |
| SCHOLAR_MCP_OIDC_CONFIG_URL | -- | OIDC discovery endpoint URL |
| SCHOLAR_MCP_OIDC_CLIENT_ID | -- | OIDC client ID |
| SCHOLAR_MCP_OIDC_CLIENT_SECRET | -- | OIDC client secret |
| SCHOLAR_MCP_OIDC_JWT_SIGNING_KEY | -- | JWT signing key; required on Linux/Docker so that authentication survives restarts (generate one with openssl rand -hex 32) |
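Two related pieces, sketched with the Python standard library. secrets.token_hex(32) produces the same shape of key as openssl rand -hex 32 (64 hex characters). The bearer_token_ok helper is hypothetical: it assumes clients send the static token in a standard Authorization: Bearer header and shows a constant-time comparison; scholar-mcp's own check is not reproduced here.

```python
import hmac
import secrets
from typing import Mapping


def generate_signing_key() -> str:
    # Equivalent in shape to `openssl rand -hex 32`: 32 random bytes as 64 hex characters.
    return secrets.token_hex(32)


def bearer_token_ok(headers: Mapping[str, str], expected: str) -> bool:
    # Hypothetical check: parse "Authorization: Bearer <token>" and compare the
    # token to the configured value in constant time to avoid timing leaks.
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    return hmac.compare_digest(token.encode(), expected.encode())
```

A real HTTP stack treats header names case-insensitively; the plain dict here is a simplification.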

MCP Tools

Search & Retrieval

| Tool | Description |
|---|---|
| search_papers | Full-text search with year, venue, field-of-study, and citation-count filters. Returns up to 100 results with pagination. |
| get_paper | Fetch full metadata for a single paper by DOI, S2 ID, arXiv ID, ACM ID, or PubMed ID. |
| get_author | Fetch an author profile with publications, or search for authors by name. |
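Under the hood, MCP clients invoke these tools with JSON-RPC 2.0 tools/call requests. Claude Desktop or an MCP SDK normally builds these for you after the initialize handshake; this sketch only shows the message shape. The search_papers argument names (query, limit) are assumptions: the real input schema is available via the tools/list method.

```python
import json
from typing import Any, Dict


def tools_call_message(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    # MCP tool invocations are JSON-RPC 2.0 requests with method "tools/call".
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Hypothetical argument names; check the tool's advertised input schema.
msg = tools_call_message(1, "search_papers", {"query": "graph neural networks", "limit": 10})
```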

Citation Graph

| Tool | Description |
|---|---|
| get_citations | Forward citations (papers that cite a given paper) with optional filters. |
| get_references | Backward references (papers cited by a given paper). |
| get_citation_graph | BFS traversal from seed papers, returning nodes and edges up to a configurable depth. |
| find_bridge_papers | Shortest citation path between two papers. |
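Both graph tools are breadth-first searches at heart. This sketch runs them over an in-memory reference map instead of live S2 calls; the toy data layout, the function names and the choice to follow only reference links (not citations) in bridge_path are assumptions, not scholar-mcp's implementation.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Toy data: paper ID -> IDs of the papers it references (hypothetical layout).
Graph = Dict[str, List[str]]


def citation_graph(refs: Graph, seeds: List[str], max_depth: int) -> Tuple[Set[str], List[Tuple[str, str]]]:
    """BFS outward from seed papers, collecting nodes and (citing, cited) edges."""
    nodes: Set[str] = set(seeds)
    edges: List[Tuple[str, str]] = []
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        paper, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for cited in refs.get(paper, []):
            edges.append((paper, cited))
            if cited not in nodes:
                nodes.add(cited)
                frontier.append((cited, depth + 1))
    return nodes, edges


def bridge_path(refs: Graph, source: str, target: str) -> Optional[List[str]]:
    """Shortest source -> target path along reference links (unweighted BFS)."""
    parents: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        paper = queue.popleft()
        if paper == target:
            # Walk parent pointers back to the source, then reverse.
            path: List[str] = []
            node: Optional[str] = paper
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for cited in refs.get(paper, []):
            if cited not in parents:
                parents[cited] = paper
                queue.append(cited)
    return None
```

Because BFS visits papers in order of hop count, the first time bridge_path reaches the target it has a path with the fewest citation hops.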

Recommendations

| Tool | Description |
|---|---|
| recommend_papers | Paper recommendations from 1-5 positive examples and optional negative examples. |
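For orientation, the S2 Recommendations API takes the positive and negative examples as paper-ID lists in a JSON POST body. This sketch builds such a request without sending it; recommendation_request is a hypothetical helper, the endpoint URL is assumed from the upstream API documentation (check it before relying on it), and the 1-5 bound mirrors this tool's documented limit rather than the upstream API's.

```python
import json
import urllib.request
from typing import Optional, Sequence

# Assumed upstream endpoint for list-based recommendations.
S2_RECS_URL = "https://api.semanticscholar.org/recommendations/v1/papers/"


def recommendation_request(positive: Sequence[str], negative: Sequence[str] = (),
                           api_key: Optional[str] = None) -> urllib.request.Request:
    # Enforce the 1-5 positive examples documented for recommend_papers.
    if not 1 <= len(positive) <= 5:
        raise ValueError("expected 1-5 positive example paper IDs")
    body = json.dumps({
        "positivePaperIds": list(positive),
        "negativePaperIds": list(negative),
    }).encode()
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["x-api-key"] = api_key  # Semantic Scholar's API-key header
    return urllib.request.Request(S2_RECS_URL, data=body, headers=headers, method="POST")
```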

Utility

| Tool | Description |
|---|---|
| batch_resolve | Resolve up to 100 identifiers to full metadata in one call, with OpenAlex fallback. |
| enrich_paper | Augment S2 metadata with OpenAlex fields (affiliations, funders, OA status, concepts). |
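The resolve-then-fall-back pattern behind batch_resolve can be sketched with injected lookup functions. primary and fallback are hypothetical stand-ins for the Semantic Scholar and OpenAlex lookups; de-duplication and the None marker for unresolved IDs are assumptions about sensible behaviour, not documented guarantees.

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, str]
Lookup = Callable[[List[str]], Dict[str, Record]]

MAX_BATCH = 100  # documented batch_resolve limit


def batch_resolve(ids: Iterable[str], primary: Lookup, fallback: Lookup) -> Dict[str, Optional[Record]]:
    """Resolve identifiers with a primary source, then retry the misses against a fallback."""
    id_list = list(dict.fromkeys(ids))  # de-duplicate while keeping input order
    if len(id_list) > MAX_BATCH:
        raise ValueError(f"at most {MAX_BATCH} identifiers per call")
    found = dict(primary(id_list))  # copy so the caller's dict is not mutated
    missing = [i for i in id_list if i not in found]
    if missing:
        found.update(fallback(missing))
    # Identifiers that neither source knows about map to None.
    return {i: found.get(i) for i in id_list}
```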

PDF Conversion (requires docling-serve)

| Tool | Description |
|---|---|
| fetch_paper_pdf | Download open-access PDF for a paper. |
| convert_pdf_to_markdown | Convert a local PDF to Markdown via docling-serve. |
| fetch_and_convert | Full pipeline: fetch OA PDF, convert to Markdown, return both. |

PDF tools are write-tagged and hidden when SCHOLAR_MCP_READ_ONLY=true (the default).
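Conceptually, read-only mode is a tag filter over the tool list. In this sketch the Tool class and the literal tag name "write" are assumptions; it illustrates the idea and is not FastMCP's actual visibility mechanism.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class Tool:
    # Hypothetical stand-in for a registered MCP tool.
    name: str
    tags: FrozenSet[str] = field(default_factory=frozenset)


def visible_tools(tools: List[Tool], read_only: bool) -> List[str]:
    # In read-only mode, hide every tool carrying the (assumed) "write" tag.
    return [t.name for t in tools if not (read_only and "write" in t.tags)]


TOOLS = [
    Tool("search_papers"),
    Tool("get_paper"),
    Tool("fetch_paper_pdf", frozenset({"write"})),
    Tool("convert_pdf_to_markdown", frozenset({"write"})),
    Tool("fetch_and_convert", frozenset({"write"})),
]
```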

Task Polling

| Tool | Description |
|---|---|
| get_task_result | Poll for the result of a background task by ID. |
| list_tasks | List all active background tasks. |

Long-running operations (PDF download/conversion) and rate-limited S2 requests return {"queued": true, "task_id": "..."} immediately. Use get_task_result to poll for the result.
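A client-side polling loop might look like this. It assumes, without confirmation from this README, that get_task_result keeps returning {"queued": true, ...} while the task is pending; poll is a hypothetical callable that invokes get_task_result through your MCP client, and the interval and timeout values are arbitrary.

```python
import time
from typing import Any, Callable, Dict


def wait_for_task(first: Dict[str, Any], poll: Callable[[str], Dict[str, Any]],
                  interval: float = 1.0, timeout: float = 120.0,
                  sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Return the final result, polling while the response reports it is still queued."""
    task_id = first.get("task_id")
    result = first
    waited = 0.0
    while result.get("queued"):
        if waited >= timeout:
            raise TimeoutError(f"task {task_id} still pending after {timeout}s")
        sleep(interval)
        waited += interval
        result = poll(task_id)  # e.g. a get_task_result call (hypothetical wrapper)
    return result
```

A response without "queued" is returned immediately, so the same helper works for tools that answer synchronously.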

Docker Compose

services:
  scholar-mcp:
    image: ghcr.io/pvliesdonk/scholar-mcp:latest
    restart: unless-stopped
    environment:
      SCHOLAR_MCP_S2_API_KEY: "${SCHOLAR_MCP_S2_API_KEY}"
      SCHOLAR_MCP_DOCLING_URL: "http://docling-serve:5001"
      SCHOLAR_MCP_VLM_API_URL: "${VLM_API_URL:-}"
      SCHOLAR_MCP_VLM_API_KEY: "${VLM_API_KEY:-}"
      SCHOLAR_MCP_CACHE_DIR: "/data/scholar-mcp"
      SCHOLAR_MCP_READ_ONLY: "false"
    volumes:
      - scholar-mcp-data:/data/scholar-mcp
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.scholar-mcp.rule=Host(`scholar-mcp.yourdomain.com`)"

  docling-serve:
    image: ghcr.io/ds4sd/docling-serve:latest
    restart: unless-stopped

volumes:
  scholar-mcp-data:

Cache Management

# Show cache statistics (row counts, database size)
scholar-mcp cache stats

# Clear all cached data (preserves identifier aliases)
scholar-mcp cache clear

# Remove entries older than 30 days
scholar-mcp cache clear --older-than 30

# Override cache directory
scholar-mcp cache stats --cache-dir /path/to/cache
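The caching behaviour listed under Features (per-table TTLs and identifier aliasing) can be sketched with the standard sqlite3 module. The schema, the TTLCache class and its method names are invented for illustration, as is checking expiry at read time; only the 30-day and 7-day TTLs and the alias-preserving clear come from this README.

```python
import sqlite3
import time
from typing import Optional

DAY = 86_400  # seconds
# Documented TTLs: 30 days for papers/authors, 7 days for citations/references.
TTLS = {"papers": 30 * DAY, "authors": 30 * DAY, "citations": 7 * DAY, "references": 7 * DAY}


class TTLCache:
    """Illustrative SQLite cache; schema and method names are hypothetical."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS entries ("
            "kind TEXT, ident TEXT, payload TEXT, stored_at REAL, PRIMARY KEY (kind, ident))"
        )
        self.db.execute("CREATE TABLE IF NOT EXISTS aliases (alias_id TEXT PRIMARY KEY, ident TEXT)")
        self.db.commit()

    def put(self, kind: str, ident: str, payload: str, now: Optional[float] = None) -> None:
        stored_at = time.time() if now is None else now
        self.db.execute("INSERT OR REPLACE INTO entries VALUES (?, ?, ?, ?)",
                        (kind, ident, payload, stored_at))
        self.db.commit()

    def add_alias(self, alias_id: str, ident: str) -> None:
        # e.g. map a DOI onto the canonical S2 paper ID.
        self.db.execute("INSERT OR REPLACE INTO aliases VALUES (?, ?)", (alias_id, ident))
        self.db.commit()

    def get(self, kind: str, ident: str, now: Optional[float] = None) -> Optional[str]:
        t = time.time() if now is None else now
        row = self.db.execute("SELECT ident FROM aliases WHERE alias_id = ?", (ident,)).fetchone()
        canonical = row[0] if row else ident
        row = self.db.execute(
            "SELECT payload, stored_at FROM entries WHERE kind = ? AND ident = ?", (kind, canonical)
        ).fetchone()
        if row is None or t - row[1] > TTLS[kind]:
            return None  # missing, or older than this table's TTL
        return row[0]

    def clear_older_than(self, days: int, now: Optional[float] = None) -> int:
        # Same idea as `cache clear --older-than DAYS`: drop old entries, keep aliases.
        t = time.time() if now is None else now
        cur = self.db.execute("DELETE FROM entries WHERE stored_at < ?", (t - days * DAY,))
        self.db.commit()
        return cur.rowcount
```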

Development

# Install with dev and MCP dependencies
uv sync --extra dev --extra mcp

# Run tests
uv run pytest

# Lint and format
uv run ruff check src/ tests/
uv run ruff format src/ tests/

# Type check
uv run mypy src/

# Build docs locally
uv sync --extra docs
uv run mkdocs serve

License

MIT

Download files


Source Distribution

pvliesdonk_scholar_mcp-1.2.2.tar.gz (211.7 kB)

Uploaded Source

Built Distribution


pvliesdonk_scholar_mcp-1.2.2-py3-none-any.whl (41.7 kB)

Uploaded Python 3

File details

Details for the file pvliesdonk_scholar_mcp-1.2.2.tar.gz.

File metadata

  • Download URL: pvliesdonk_scholar_mcp-1.2.2.tar.gz
  • Size: 211.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pvliesdonk_scholar_mcp-1.2.2.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8399da76774ec6c14b0d4ede77c4bfd8445774e25bb323a853ae8a35eb4bfc25 |
| MD5 | 4ccaaa14c319064d134416d8c6bdf526 |
| BLAKE2b-256 | bbe1003a6d651f6844e9ce7fe1c0dd4e294b1fa84e914a2d43b4aa4d4923ff4a |


Provenance

The following attestation bundles were made for pvliesdonk_scholar_mcp-1.2.2.tar.gz:

Publisher: release.yml on pvliesdonk/scholar-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pvliesdonk_scholar_mcp-1.2.2-py3-none-any.whl.

File hashes

Hashes for pvliesdonk_scholar_mcp-1.2.2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 76f3ef9726f935b442f207bea200e6e62f2c7d133022191eff59d38aad537733 |
| MD5 | d7219ece622f8133baa0475affcdae7d |
| BLAKE2b-256 | 9eb4240aae7e93d608400c62f31c4aa63ebb9d1bdc00024e7e571d78bd77a85e |


Provenance

The following attestation bundles were made for pvliesdonk_scholar_mcp-1.2.2-py3-none-any.whl:

Publisher: release.yml on pvliesdonk/scholar-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
