MCP Server for multi-provider academic search (Semantic Scholar, Crossref, OpenAlex, PubMed) with regex filtering and statistics

Academic Search MCP Server

An MCP (Model Context Protocol) server for searching, filtering, and exploring academic papers across multiple databases, with citation graph walking.

Providers: Semantic Scholar, Crossref, OpenAlex, PubMed

Features

  • Multi-provider search — unified interface across 4 academic databases
  • Universal regex filtering — regex post-filtering on all providers (not just Semantic Scholar)
  • Rich post-hoc filters — year range, citation count, journal, publication type, open access, author
  • Citation graph walk — random walk on the Semantic Scholar citation graph with backtracking, cross-edge detection, and topic-aware pruning
  • Text similarity — pure-Python TF cosine similarity for topic drift detection and most-similar candidate selection (no heavy dependencies)
  • Stats — per-query metadata on authors, years, and field coverage
  • Normalised output — all providers return data in the same schema

Tools

search_papers

Search papers by keyword with optional regex post-filtering on title and abstract.

Parameters: query, search_type, limit, regex_filter, regex_search_fields, match_mode, year_min, year_max, min_citation_count, max_citation_count, open_access_only, has_pdf, journal, exclude_journals, publication_types, exclude_publication_types, author, has_abstract, provider
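As a rough illustration of what regex post-filtering on title and abstract can look like, here is a minimal sketch. It is not the server's actual code: the function name `regex_post_filter`, the `match_all` flag (a stand-in for whatever `match_mode` really accepts), and the case-insensitive matching are all assumptions made for the example.

```python
import re

def regex_post_filter(papers, pattern, fields=("title", "abstract"), match_all=False):
    """Keep papers whose chosen fields match the pattern.

    Hypothetical helper: match_all=False keeps a paper if any field matches,
    match_all=True requires every listed field to match.
    """
    compiled = re.compile(pattern, re.IGNORECASE)  # case-insensitivity is an assumption
    kept = []
    for paper in papers:
        # Missing or None fields are treated as empty strings.
        hits = [bool(compiled.search(paper.get(field) or "")) for field in fields]
        if (all(hits) if match_all else any(hits)):
            kept.append(paper)
    return kept

# Toy records standing in for normalised provider output.
papers = [
    {"title": "Graph neural networks for chemistry", "abstract": "We study molecules."},
    {"title": "A survey of transformers", "abstract": "Covers graph transformers too."},
    {"title": "Protein folding", "abstract": None},
]
matches = regex_post_filter(papers, r"\bgraph\b")
strict = regex_post_filter(papers, r"\bgraph\b", match_all=True)
```

Because the filter runs on records that have already been fetched, the same function works regardless of which provider produced them.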

search_by_author

Find papers by author name, with the same post-hoc filters as search_papers.

explore_citations

Walk the Semantic Scholar citation graph step by step, starting from a seed paper.

Parameter            Default     Description
seed_paper_id        (required)  S2 paper ID
num_steps            30          Max walk length
max_depth            None        Max hops from seed before forced backtrack
direction_choice     "random"    "forward", "backward", "alternating", or "random"
bias                 "random"    "top_cited", "bottom_cited", "most_similar", or "random"
candidates_per_step  100         Candidates per API call
stop_similarity      None        0.0–1.0 topic drift threshold

Includes the same post-hoc filters as search_papers.
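To make the walk parameters more concrete, the sketch below runs a step-by-step random walk with backtracking over a small in-memory graph. It is a toy under stated assumptions, not the server's implementation: the `walk` function and the adjacency dict are hypothetical, there are no API calls, and `direction_choice`, `bias`, and `stop_similarity` are not modelled. Only `num_steps` and `max_depth` mirror the parameters listed above.

```python
import random

def walk(graph, seed, num_steps=30, max_depth=None, rng=None):
    """Random walk with backtracking over an adjacency dict (hypothetical helper)."""
    rng = rng or random.Random()
    stack = [seed]        # current path from the seed to the current paper
    visited = {seed}
    order = [seed]        # papers in order of first visit
    tree_edges = set()    # edges the walk actually travelled along
    cross_edges = []      # edges that point at a paper reached some other way
    for _ in range(num_steps):
        current = stack[-1]
        at_limit = max_depth is not None and len(stack) - 1 >= max_depth
        fresh = []
        if not at_limit:
            for neighbour in graph.get(current, []):
                edge = (current, neighbour)
                if neighbour not in visited:
                    fresh.append(neighbour)
                elif edge not in tree_edges and edge not in cross_edges:
                    cross_edges.append(edge)
        if fresh:
            nxt = rng.choice(fresh)   # uniform choice; a bias would go here
            tree_edges.add((current, nxt))
            visited.add(nxt)
            order.append(nxt)
            stack.append(nxt)
        elif len(stack) > 1:
            stack.pop()               # dead end or depth limit: backtrack one hop
        else:
            break                     # back at the seed with nothing left to explore
    return order, cross_edges

# Toy citation graph: A cites B and C, which both cite D.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
order, cross_edges = walk(graph, "A", rng=random.Random(0))
shallow_order, _ = walk(graph, "A", max_depth=1, rng=random.Random(0))
```

In this toy graph the walk reaches D through one of B or C, then records the other paper's edge to D as a cross-edge once it backtracks and explores that branch. With `max_depth=1` the walk never leaves the seed's direct neighbours.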

get_paper_stats

Fetch papers and compute statistics: field availability, author counts, publication year distributions.
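The kind of statistics described here can be sketched in a few lines. The function name `paper_stats`, the record fields, and the shape of the returned dict are assumptions for illustration. "Author counts" is read here as the number of papers per author, which may differ from what the tool reports.

```python
from collections import Counter

def paper_stats(papers, fields=("title", "abstract", "year", "authors")):
    """Summarise a list of paper records (hypothetical helper and schema)."""
    total = len(papers)
    # Share of papers in which each field is present and non-empty.
    availability = {
        field: (sum(1 for p in papers if p.get(field)) / total if total else 0.0)
        for field in fields
    }
    # Number of papers per publication year, skipping records without a year.
    years = Counter(p["year"] for p in papers if p.get("year"))
    # Number of papers per author name.
    authors = Counter(a for p in papers for a in (p.get("authors") or []))
    return {
        "total": total,
        "field_availability": availability,
        "year_distribution": dict(years),
        "author_counts": dict(authors),
    }

# Toy records standing in for normalised provider output.
papers = [
    {"title": "T1", "abstract": "x", "year": 2020, "authors": ["Ada", "Bob"]},
    {"title": "T2", "abstract": None, "year": 2020, "authors": ["Ada"]},
    {"title": "T3", "abstract": "y", "year": None, "authors": []},
    {"title": "T4", "abstract": "z", "year": 2023, "authors": ["Cy"]},
]
stats = paper_stats(papers)
```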

build_extended_query

Preview how a regex pattern gets transformed for a given provider's API.

Usage

Via uvx (recommended)

uvx academic-search

Via uv run

uv run --with academic-search academic-search

From source

git clone ...
cd academic-search
uv run academic-search

With Claude Desktop

{
  "mcpServers": {
    "academic-search": {
      "command": "uvx",
      "args": ["academic-search"]
    }
  }
}

Development

uv sync
uv run academic-search

Providers

Provider          Citation graph  Notes
Semantic Scholar  Yes             Full citation/reference graph endpoints, citation counts
Crossref          No              Good DOI coverage, no OA metadata
OpenAlex          No              Best OA metadata, abstracts from inverted index
PubMed            No              NIH literature

Design Notes

  • query and regex_filter are separate — the API query is plain text; regex filtering is a post-hoc step on all providers
  • Citation walk is a true graph walk — one step at a time, not bucket collection. Uses backtracking, cross-edge recording, and per-step filtering
  • Caching — API responses for (paper_id, direction) are cached per walk to avoid redundant fetches during backtracking
  • Text similarity — pure-Python TF cosine similarity (no sentence-transformers); used for most_similar bias and stop_similarity pruning
  • No API key required — all providers have free tiers, but Semantic Scholar rate limits are ~1 req/5s without a key. Set S2_API_KEY for up to 100 req/s
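
For readers unfamiliar with term-frequency (TF) cosine similarity, the sketch below shows one pure-Python way to compute it with only the standard library. The tokenisation rule and the function names `tf_vector` and `cosine_similarity` are assumptions; the package's own tokeniser and weighting may differ.

```python
import math
import re
from collections import Counter

def tf_vector(text):
    """Count lowercase alphanumeric tokens (a simple, assumed tokenisation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between the term-frequency vectors of two texts."""
    va, vb = tf_vector(a), tf_vector(b)
    # Counter returns 0 for tokens that are absent from the other text.
    dot = sum(count * vb[token] for token, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)

same = cosine_similarity("graph neural networks", "Graph Neural Networks")
partial = cosine_similarity("graph networks", "graph neural networks")
unrelated = cosine_similarity("graph neural networks", "protein folding")
```

A walk could compare each candidate's title or abstract against the seed's and, for example, stop once the score falls below `stop_similarity`. Whether the threshold is applied in exactly that direction is an assumption here.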
