
FastMCP server for Semantic Scholar with OpenAlex enrichment and docling PDF conversion

scholar-mcp

A FastMCP server providing structured academic literature access via Semantic Scholar, with OpenAlex enrichment and optional docling-serve PDF conversion.

Features

  • Search & retrieval -- full-text paper search with year, venue, field-of-study, and citation-count filters; single-paper lookup by DOI, S2 ID, arXiv ID, and more; author profile and name search
  • Citation graph -- forward citations, backward references, BFS graph traversal up to configurable depth, and shortest-path bridge paper discovery
  • Recommendations -- paper recommendations from positive (and optional negative) examples via the S2 recommendation API
  • Citation generation -- format paper metadata as BibTeX, CSL-JSON, or RIS citations with automatic entry type inference, author name parsing, and OpenAlex venue enrichment
  • OpenAlex enrichment -- augment paper metadata with open-access URLs, affiliations, funders, concepts, and OA status
  • PDF conversion -- download open-access PDFs and convert to Markdown via docling-serve, with optional VLM enrichment for formulas and figures
  • Intelligent caching -- SQLite-backed cache with per-table TTLs (30 days for papers/authors, 7 days for citations/references) and identifier aliasing
  • Authentication -- bearer token, OIDC (OAuth 2.1), or both simultaneously (multi-auth)
  • Multi-transport -- stdio (Claude Desktop), HTTP (streamable-http), and SSE transports
  • Linux packages -- .deb and .rpm packages with systemd service and security hardening

Installation

With uvx (recommended)

uvx --from pvliesdonk-scholar-mcp scholar-mcp serve

With pip

pip install 'pvliesdonk-scholar-mcp[mcp]'
scholar-mcp serve

With Docker

docker run -v scholar-mcp-data:/data/scholar-mcp \
           ghcr.io/pvliesdonk/scholar-mcp:latest

Linux packages

Download .deb or .rpm from the latest release:

# Debian/Ubuntu
sudo dpkg -i scholar-mcp_*.deb

# RHEL/Fedora
sudo rpm -i scholar-mcp-*.rpm

Note: The PyPI package is pvliesdonk-scholar-mcp. The CLI command installed is scholar-mcp.

Quick Start

stdio transport (Claude Desktop / MCP clients)

uvx --from pvliesdonk-scholar-mcp scholar-mcp serve

API key optional but recommended: The server works without a Semantic Scholar API key, but unauthenticated requests are limited to ~1 req/s and will hit 429 throttles quickly during multi-step operations like citation graph traversal. Request a free key to get ~10 req/s.
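To see why the key matters for multi-step operations, here is a minimal client-side throttle sketch. It assumes the rates quoted above (about 1 req/s without a key, about 10 req/s with one). The `Throttle` class and `throttle_for` helper are illustrative only and are not part of scholar-mcp.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Start at most `rate` requests per second by spacing calls evenly (illustrative only)."""

    def __init__(
        self,
        rate: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = float("-inf")  # the first call never waits

    def wait(self) -> None:
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)  # wait for the reserved slot
            now = self.next_allowed
        self.next_allowed = now + self.interval  # reserve the following slot


def throttle_for(api_key: Optional[str]) -> Throttle:
    # Assumed rates from the note above: ~10 req/s with a key, ~1 req/s without.
    return Throttle(rate=10.0 if api_key else 1.0)
```

Calling `wait()` before every request means a traversal that needs 60 S2 requests spends roughly a minute waiting without a key, versus about six seconds with one.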

Claude Desktop configuration (claude_desktop_config.json):

{
  "mcpServers": {
    "scholar": {
      "command": "uvx",
      "args": ["--from", "pvliesdonk-scholar-mcp", "scholar-mcp", "serve"],
      "env": {
        "SCHOLAR_MCP_S2_API_KEY": "your-key"
      }
    }
  }
}

HTTP transport

uvx --from pvliesdonk-scholar-mcp scholar-mcp serve --transport http --port 8000

Configuration

All settings are controlled via environment variables with the SCHOLAR_MCP_ prefix.

Core

Variable | Default | Description
SCHOLAR_MCP_S2_API_KEY | (none) | Semantic Scholar API key (request one from Semantic Scholar); optional but recommended for higher rate limits
SCHOLAR_MCP_READ_ONLY | true | If true, write-tagged tools (fetch_paper_pdf, convert_pdf_to_markdown, fetch_and_convert) are hidden
SCHOLAR_MCP_CACHE_DIR | /data/scholar-mcp | Directory for the SQLite cache database and downloaded PDFs
SCHOLAR_MCP_CONTACT_EMAIL | (none) | Contact email included in the OpenAlex User-Agent for polite pool access (faster rate limits)
SCHOLAR_MCP_LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARNING, ERROR)
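A minimal sketch of how the core variables and their defaults could map onto a settings object. Treating empty strings as unset and accepting true/1/yes/on as truthy are assumptions; `CoreSettings` and `load_core_settings` are hypothetical names, not scholar-mcp's actual configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "SCHOLAR_MCP_"


def _flag(value: str) -> bool:
    # Assumption: these spellings count as true; anything else is false.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class CoreSettings:
    s2_api_key: Optional[str]
    read_only: bool
    cache_dir: str
    contact_email: Optional[str]
    log_level: str


def load_core_settings(env: Mapping[str, str] = os.environ) -> CoreSettings:
    def get(name: str) -> Optional[str]:
        value = env.get(PREFIX + name)
        return value if value else None  # treat empty strings as unset

    return CoreSettings(
        s2_api_key=get("S2_API_KEY"),
        read_only=_flag(get("READ_ONLY") or "true"),  # default: true
        cache_dir=get("CACHE_DIR") or "/data/scholar-mcp",
        contact_email=get("CONTACT_EMAIL"),
        log_level=(get("LOG_LEVEL") or "INFO").upper(),
    )
```

For example, `load_core_settings({"SCHOLAR_MCP_READ_ONLY": "false"}).read_only` is `False`, which is the setting needed to expose the PDF tools.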

PDF Conversion (optional)

Variable | Default | Description
SCHOLAR_MCP_DOCLING_URL | (none) | Base URL of a running docling-serve instance (e.g. http://localhost:5001)
SCHOLAR_MCP_VLM_API_URL | (none) | OpenAI-compatible VLM endpoint for formula/figure-enriched PDF conversion
SCHOLAR_MCP_VLM_API_KEY | (none) | API key for the VLM endpoint
SCHOLAR_MCP_VLM_MODEL | gpt-4o | Model name for VLM-enriched conversion

Authentication (optional)

Variable | Default | Description
SCHOLAR_MCP_BEARER_TOKEN | (none) | Static bearer token for HTTP transport authentication
SCHOLAR_MCP_BASE_URL | (none) | Public base URL; required for OIDC (e.g. https://mcp.example.com)
SCHOLAR_MCP_OIDC_CONFIG_URL | (none) | OIDC discovery endpoint URL
SCHOLAR_MCP_OIDC_CLIENT_ID | (none) | OIDC client ID
SCHOLAR_MCP_OIDC_CLIENT_SECRET | (none) | OIDC client secret
SCHOLAR_MCP_OIDC_JWT_SIGNING_KEY | (none) | JWT signing key; required on Linux/Docker for tokens to survive restarts (generate one with openssl rand -hex 32)
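Two small pieces of the authentication setup, sketched with the standard library: generating a signing key of the same shape as `openssl rand -hex 32` output (64 hex characters), and checking an `Authorization: Bearer ...` header against a static token in constant time. This is an illustration only, not the server's actual authentication code; the function names are hypothetical.

```python
import hmac
import secrets
from typing import Optional


def new_signing_key() -> str:
    # 32 random bytes as 64 hex characters, like `openssl rand -hex 32`.
    return secrets.token_hex(32)


def bearer_ok(authorization: Optional[str], expected_token: str) -> bool:
    """True if the header is 'Bearer <token>' and the token matches."""
    if not authorization:
        return False
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return False  # the auth scheme name is case-insensitive
    # compare_digest does not reveal how many leading characters matched.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```

A value from `new_signing_key()` can be stored once in SCHOLAR_MCP_OIDC_JWT_SIGNING_KEY so it stays fixed across restarts.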

MCP Tools

Search & Retrieval

Tool Description
search_papers Full-text search with year, venue, field-of-study, and citation-count filters. Returns up to 100 results with pagination.
get_paper Fetch full metadata for a single paper by DOI, S2 ID, arXiv ID, ACM ID, or PubMed ID.
get_author Fetch author profile with publications, or search by name.

Citation Graph

Tool Description
get_citations Forward citations (papers that cite a given paper) with optional filters.
get_references Backward references (papers cited by a given paper).
get_citation_graph BFS traversal from seed papers, returning nodes + edges up to configurable depth.
find_bridge_papers Shortest citation path between two papers.
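The traversal behind get_citation_graph and find_bridge_papers can be pictured as breadth-first search over citation edges. The sketch below is illustrative only: it walks an in-memory mapping from paper ID to the IDs that paper references, whereas the server fetches neighbours from Semantic Scholar. Edge direction, depth semantics, treating the bridge search as undirected, and the function names are all assumptions.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy data: paper ID -> IDs of the papers it references.
Graph = Dict[str, List[str]]


def citation_graph(refs: Graph, seeds: List[str], max_depth: int) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """BFS from the seeds along reference edges, stopping at max_depth hops."""
    nodes: Set[str] = set(seeds)
    edges: Set[Tuple[str, str]] = set()
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        paper, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for cited in refs.get(paper, []):
            edges.add((paper, cited))  # edge means "paper cites cited"
            if cited not in nodes:
                nodes.add(cited)
                queue.append((cited, depth + 1))
    return nodes, edges


def bridge_path(refs: Graph, start: str, goal: str) -> Optional[List[str]]:
    """Shortest path between two papers, ignoring citation direction."""
    neighbours: Dict[str, Set[str]] = {}
    for paper, cited_list in refs.items():
        for cited in cited_list:
            neighbours.setdefault(paper, set()).add(cited)
            neighbours.setdefault(cited, set()).add(paper)
    parent: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        paper = queue.popleft()
        if paper == goal:
            path: List[str] = []
            node: Optional[str] = paper
            while node is not None:  # walk back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(neighbours.get(paper, ())):  # sorted for deterministic output
            if nxt not in parent:
                parent[nxt] = paper
                queue.append(nxt)
    return None  # no path between the two papers
```

With `refs = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "X": ["E"]}`, `bridge_path(refs, "A", "X")` returns `["A", "B", "D", "E", "X"]`: A and X share no direct link but connect through D and E.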

Recommendations

Tool Description
recommend_papers Paper recommendations from 1 to 5 positive examples and optional negative examples.
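For orientation, this is roughly what a request to the S2 recommendation API looks like, assuming the public endpoint accepts a POST body with `positivePaperIds` and `negativePaperIds`, an API key in the `x-api-key` header, and `limit`/`fields` query parameters; verify against the S2 API documentation. The helper only builds the request (nothing is sent) and is a hypothetical name, not scholar-mcp code.

```python
import json
import urllib.request
from typing import List, Optional

# Assumed public endpoint for list-based recommendations.
RECS_URL = "https://api.semanticscholar.org/recommendations/v1/papers/"


def build_recommendation_request(
    positive: List[str],
    negative: Optional[List[str]] = None,
    api_key: Optional[str] = None,
    limit: int = 20,
) -> urllib.request.Request:
    if not 1 <= len(positive) <= 5:
        raise ValueError("recommend_papers takes 1 to 5 positive examples")
    body = {"positivePaperIds": positive, "negativePaperIds": negative or []}
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["x-api-key"] = api_key  # optional; raises the rate limit
    return urllib.request.Request(
        f"{RECS_URL}?limit={limit}&fields=title,year",
        data=json.dumps(body).encode(),
        headers=headers,
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; the tool itself handles this for you.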

Utility

Tool Description
batch_resolve Resolve up to 100 identifiers to full metadata in one call, with OpenAlex fallback.
enrich_paper Augment S2 metadata with OpenAlex fields (affiliations, funders, OA status, concepts).
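One way to picture batch_resolve's "up to 100 identifiers, with OpenAlex fallback" behaviour: look everything up in the primary source, then send only the misses to the fallback. Both lookup callables are hypothetical stand-ins for the S2 and OpenAlex clients; this is a sketch, not the tool's implementation.

```python
from typing import Callable, Dict, List, Optional

# A lookup takes identifiers and returns metadata only for the ones it could resolve.
Lookup = Callable[[List[str]], Dict[str, dict]]

MAX_BATCH = 100  # per-call limit stated for batch_resolve


def batch_resolve(ids: List[str], primary: Lookup, fallback: Lookup) -> Dict[str, Optional[dict]]:
    """Resolve via the primary source, then retry only the misses via the fallback."""
    if len(ids) > MAX_BATCH:
        raise ValueError(f"at most {MAX_BATCH} identifiers per call")
    found = dict(primary(ids))  # copy so the lookup's own dict is never mutated
    missing = [i for i in ids if i not in found]
    if missing:
        found.update(fallback(missing))
    # Keep the caller's order; identifiers neither source knows map to None.
    return {i: found.get(i) for i in ids}
```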

Citation Generation

Tool Description
generate_citations Generate BibTeX, CSL-JSON, or RIS citations for up to 100 papers, with automatic entry type inference and optional OpenAlex venue enrichment.
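A minimal sketch of the BibTeX path, assuming simple heuristics: a journal name gives @article, a conference-like venue gives @inproceedings, anything else @misc. The citation-key format, the "last token is the family name" author parsing, and the flat input fields are assumptions for illustration, not generate_citations' actual rules.

```python
import re
from typing import List, Optional


def split_name(full: str) -> str:
    """'Jane Q. Doe' -> 'Doe, Jane Q.' (assumes the last token is the family name)."""
    parts = full.split()
    if len(parts) < 2:
        return full
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


def entry_type(journal: Optional[str], venue: Optional[str]) -> str:
    """Hypothetical heuristic for the BibTeX entry type."""
    if journal:
        return "article"
    if venue and re.search(r"conference|proceedings|symposium|workshop", venue, re.IGNORECASE):
        return "inproceedings"
    return "misc"


def to_bibtex(paper: dict) -> str:
    authors: List[str] = paper.get("authors", [])
    year = paper.get("year")
    surname = authors[0].split()[-1] if authors and authors[0].split() else "anon"
    key = re.sub(r"[^a-z0-9]", "", f"{surname}{year or ''}".lower())
    kind = entry_type(paper.get("journal"), paper.get("venue"))
    venue_field = {"article": "journal", "inproceedings": "booktitle"}.get(kind, "howpublished")
    fields = {
        "title": "{" + paper["title"] + "}",  # inner braces stop BibTeX re-casing the title
        "author": " and ".join(split_name(a) for a in authors),
        "year": str(year) if year else "",
        venue_field: paper.get("journal") or paper.get("venue") or "",
        "doi": paper.get("doi") or "",
    }
    # Empty fields are skipped rather than emitted as {}.
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items() if value)
    return f"@{kind}{{{key},\n{body}\n}}"
```

For `{"title": "An Example Paper", "authors": ["Jane Doe", "John Q. Public"], "year": 2021, "venue": "Proceedings of the Example Conference"}` this yields an `@inproceedings{doe2021, ...}` entry with `author = {Doe, Jane and Public, John Q.}` and no `doi` line.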

PDF Conversion (requires docling-serve)

Tool Description
fetch_paper_pdf Download open-access PDF for a paper.
convert_pdf_to_markdown Convert a local PDF to Markdown via docling-serve.
fetch_and_convert Full pipeline: fetch OA PDF, convert to Markdown, return both.

PDF tools are write-tagged and hidden when SCHOLAR_MCP_READ_ONLY=true (the default).

Task Polling

Tool Description
get_task_result Poll for the result of a background task by ID.
list_tasks List all active background tasks.

Long-running operations (PDF download/conversion) and rate-limited S2 requests return {"queued": true, "task_id": "..."} immediately. Use get_task_result to poll for the result.
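A client-side loop for this queued-task pattern might look like the sketch below. `call_tool` is a hypothetical stand-in for however your MCP client invokes a tool; the `task_id` argument name and the `status: "pending"` field in get_task_result's response are assumptions. Only the `{"queued": true, "task_id": "..."}` envelope comes from the description above.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical signature: (tool name, arguments) -> decoded JSON result.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def call_and_wait(
    call_tool: CallTool,
    name: str,
    args: Dict[str, Any],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    result = call_tool(name, args)
    if not result.get("queued"):
        return result  # answered synchronously
    task_id = result["task_id"]
    waited = 0.0  # simple budget based on the sleep intervals
    while waited < timeout:
        # Assumption: the poll result reports "pending" until the task finishes.
        polled = call_tool("get_task_result", {"task_id": task_id})
        if polled.get("status") != "pending":
            return polled
        sleep(interval)
        waited += interval
    raise TimeoutError(f"task {task_id} not finished after {timeout} s")
```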

Docker Compose

services:
  scholar-mcp:
    image: ghcr.io/pvliesdonk/scholar-mcp:latest
    restart: unless-stopped
    environment:
      SCHOLAR_MCP_S2_API_KEY: "${SCHOLAR_MCP_S2_API_KEY}"
      SCHOLAR_MCP_DOCLING_URL: "http://docling-serve:5001"
      SCHOLAR_MCP_VLM_API_URL: "${VLM_API_URL:-}"
      SCHOLAR_MCP_VLM_API_KEY: "${VLM_API_KEY:-}"
      SCHOLAR_MCP_CACHE_DIR: "/data/scholar-mcp"
      SCHOLAR_MCP_READ_ONLY: "false"
    volumes:
      - scholar-mcp-data:/data/scholar-mcp
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.scholar-mcp.rule=Host(`scholar-mcp.yourdomain.com`)"

  docling-serve:
    image: ghcr.io/ds4sd/docling-serve:latest
    restart: unless-stopped

volumes:
  scholar-mcp-data:

Cache Management

# Show cache statistics (row counts, database size)
scholar-mcp cache stats

# Clear all cached data (preserves identifier aliases)
scholar-mcp cache clear

# Remove entries older than 30 days
scholar-mcp cache clear --older-than 30

# Override cache directory
scholar-mcp cache stats --cache-dir /path/to/cache
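The caching behaviour described under Features (per-table TTLs and identifier aliases) and the clearing options above (aliases survive `cache clear`, `--older-than` prunes by age) can be sketched with the standard-library sqlite3 module. The schema, table names, and functions are hypothetical and only illustrate the idea; they are not scholar-mcp's actual cache layout.

```python
import json
import sqlite3
import time
from typing import Optional

DAY = 86_400
# Per-table TTLs from the Features list; "refs" stands in for references (a reserved SQL word).
TTL = {"papers": 30 * DAY, "authors": 30 * DAY, "citations": 7 * DAY, "refs": 7 * DAY}


def _table(name: str) -> str:
    # Table names are interpolated into SQL, so only the known ones are allowed.
    if name not in TTL:
        raise ValueError(f"unknown table {name!r}")
    return name


def _now(now: Optional[float]) -> float:
    return time.time() if now is None else now


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    for table in TTL:
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (id TEXT PRIMARY KEY, data TEXT, stored_at REAL)")
    # Aliases map an alternative identifier (e.g. a DOI) to the canonical paper ID.
    conn.execute("CREATE TABLE IF NOT EXISTS aliases (alias TEXT PRIMARY KEY, canonical TEXT)")
    return conn


def put(conn: sqlite3.Connection, table: str, key: str, value: dict, now: Optional[float] = None) -> None:
    conn.execute(f"INSERT OR REPLACE INTO {_table(table)} VALUES (?, ?, ?)", (key, json.dumps(value), _now(now)))


def add_alias(conn: sqlite3.Connection, alias: str, canonical: str) -> None:
    conn.execute("INSERT OR REPLACE INTO aliases VALUES (?, ?)", (alias, canonical))


def get(conn: sqlite3.Connection, table: str, key: str, now: Optional[float] = None) -> Optional[dict]:
    alias = conn.execute("SELECT canonical FROM aliases WHERE alias = ?", (key,)).fetchone()
    canonical = alias[0] if alias else key
    row = conn.execute(f"SELECT data, stored_at FROM {_table(table)} WHERE id = ?", (canonical,)).fetchone()
    if row is None or _now(now) - row[1] > TTL[table]:
        return None  # miss, or entry older than this table's TTL
    return json.loads(row[0])


def clear(conn: sqlite3.Connection, older_than_days: Optional[float] = None, now: Optional[float] = None) -> None:
    """Delete cached rows (all, or only old ones); the aliases table is kept."""
    for table in TTL:
        if older_than_days is None:
            conn.execute(f"DELETE FROM {table}")
        else:
            cutoff = _now(now) - older_than_days * DAY
            conn.execute(f"DELETE FROM {table} WHERE stored_at < ?", (cutoff,))
```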

Development

# Install with dev and MCP dependencies
uv sync --extra dev --extra mcp

# Run tests
uv run pytest

# Lint and format
uv run ruff check src/ tests/
uv run ruff format src/ tests/

# Type check
uv run mypy src/

# Build docs locally
uv sync --extra docs
uv run mkdocs serve

License

MIT

Download files

Source Distribution

pvliesdonk_scholar_mcp-1.4.0.tar.gz (238.5 kB)

Uploaded Source

Built Distribution

pvliesdonk_scholar_mcp-1.4.0-py3-none-any.whl (51.2 kB)

Uploaded Python 3

File details

Details for the file pvliesdonk_scholar_mcp-1.4.0.tar.gz.

File metadata

  • Download URL: pvliesdonk_scholar_mcp-1.4.0.tar.gz
  • Size: 238.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pvliesdonk_scholar_mcp-1.4.0.tar.gz
Algorithm Hash digest
SHA256 5221641591662b288d16451d8dcc9815043b6b83fc1f02048412f3f3c8b180a8
MD5 166d6452183a0e9a3afe9bf473b087b1
BLAKE2b-256 b0cb7af04102f690ea5024c52963ebff9e24c720000e9d4555fe9559ed75869a

Provenance

The following attestation bundles were made for pvliesdonk_scholar_mcp-1.4.0.tar.gz:

Publisher: release.yml on pvliesdonk/scholar-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pvliesdonk_scholar_mcp-1.4.0-py3-none-any.whl.

File hashes

Hashes for pvliesdonk_scholar_mcp-1.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 85e4a34b0de739fed49f8c1bdd47ff6fef53bd12c0bd4170ea0b9ee78e5844a7
MD5 5d7b5d98b48649ccb33d4b1d775e064b
BLAKE2b-256 b6e65ea384bf203f1eb49c64347d423890706be58ce55c71118b21a2a90534cd

Provenance

The following attestation bundles were made for pvliesdonk_scholar_mcp-1.4.0-py3-none-any.whl:

Publisher: release.yml on pvliesdonk/scholar-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
