MCP Server for multi-provider academic search (Semantic Scholar, Crossref, OpenAlex, PubMed) with regex filtering and statistics

Academic Search MCP Server

An MCP (Model Context Protocol) server for searching, filtering, and exploring academic papers across multiple databases, with citation graph walking.

Providers: Semantic Scholar, Crossref, OpenAlex, PubMed

Features

  • Multi-provider search — unified interface across 4 academic databases
  • Universal regex filtering — regex post-filtering on all providers (not just Semantic Scholar)
  • Rich post-hoc filters — year range, citation count, journal, publication type, open access, author
  • Citation graph walk — random walk on the Semantic Scholar citation graph with backtracking, cross-edge detection, and topic-aware pruning
  • Text similarity — pure-Python TF cosine similarity for topic drift detection and most-similar candidate selection (no heavy dependencies)
  • Stats — per-query metadata on authors, years, and field coverage
  • Normalised output — all providers return data in the same schema
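
The text-similarity feature listed above can be illustrated with a minimal sketch of term-frequency (TF) cosine similarity in pure Python. The function names and the tokenisation rule (lowercase, split on non-alphanumerics) are assumptions made for this illustration, not the package's actual implementation.

```python
import math
import re
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Lowercase the text, split on non-alphanumerics, and count tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def tf_cosine(a: str, b: str) -> float:
    """Cosine similarity between the raw term-frequency vectors of two texts."""
    tf_a, tf_b = term_frequencies(a), term_frequencies(b)
    # A Counter returns 0 for missing tokens, so only shared tokens contribute.
    dot = sum(count * tf_b[token] for token, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as sharing nothing
    return dot / (norm_a * norm_b)
```

Under this sketch, "graph walk" and "graph search" share one of two tokens each and score about 0.5, while texts with no tokens in common score 0.0. A threshold such as stop_similarity could be compared against a score of this kind.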

Tools

search_papers

Search papers by keyword with optional regex post-filtering on title and abstract.

Parameters: query, search_type, limit, regex_filter, regex_search_fields, match_mode, year_min, year_max, min_citation_count, max_citation_count, open_access_only, has_pdf, journal, exclude_journals, publication_types, exclude_publication_types, author, has_abstract, provider
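
As a rough illustration of regex post-filtering on title and abstract, the sketch below filters a list of already-fetched paper records. The function name, the record layout (plain dicts with "title" and "abstract" keys), and the require_all_fields flag are hypothetical stand-ins; the accepted values of match_mode are not documented here, so this should not be read as the tool's real behaviour.

```python
import re
from typing import Iterable


def regex_post_filter(
    papers: Iterable[dict],
    pattern: str,
    fields: tuple[str, ...] = ("title", "abstract"),
    require_all_fields: bool = False,  # hypothetical stand-in for match_mode
) -> list[dict]:
    """Keep papers whose selected fields match the pattern (case-insensitive)."""
    compiled = re.compile(pattern, re.IGNORECASE)
    combine = all if require_all_fields else any
    kept = []
    for paper in papers:
        # Missing or None fields are treated as empty strings.
        hits = [bool(compiled.search(paper.get(field) or "")) for field in fields]
        if combine(hits):
            kept.append(paper)
    return kept


papers = [
    {"title": "Graph neural networks", "abstract": None},
    {"title": "Protein folding", "abstract": "A graph-based approach"},
    {"title": "Unrelated", "abstract": "Nothing relevant"},
]
matches = regex_post_filter(papers, r"graph")  # keeps the first two records
```

Because the filter runs on records after they are fetched, the same function works regardless of which provider returned them.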

search_by_author

Find papers by author name, with the same post-hoc filters as search_papers.

explore_citations

Walk the Semantic Scholar citation graph step by step, starting from a seed paper.

Parameter            Default     Description
seed_paper_id        (required)  S2 paper ID
num_steps            30          Max walk length
max_depth            None        Max hops from seed before forced backtrack
direction_choice     "random"    "forward", "backward", "alternating", or "random"
bias                 "random"    "top_cited", "bottom_cited", "most_similar", or "random"
candidates_per_step  100         Candidates per API call
stop_similarity      None        0.0–1.0 topic drift threshold

Includes the same post-hoc filters as search_papers.
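
To make the idea of a step-by-step walk with backtracking concrete, here is a minimal sketch over an in-memory adjacency dict. It is an assumption-laden toy: the real tool fetches neighbours from the Semantic Scholar API and applies direction, bias, and filters, none of which are modelled here. The function name, the uniform random choice, and the reading of "cross edge" as an edge to an already-visited paper that the walk did not traverse are all assumptions.

```python
import random


def random_walk(graph, seed, num_steps=30, max_depth=None, rng=None):
    """Random walk with backtracking over a dict of node -> list of neighbours.

    Returns (visited, cross_edges): nodes in first-visit order, and edges that
    point at an already-visited node without having been walked.
    """
    rng = rng or random.Random()
    stack = [seed]      # current route from the seed; depth is len(stack) - 1
    visited = [seed]
    seen = {seed}
    tree_edges = set()  # edges the walk actually traversed
    cross_edges = set()
    for _ in range(num_steps):
        current = stack[-1]
        depth = len(stack) - 1
        fresh = []
        # At the depth limit neighbours are not inspected, forcing a backtrack.
        if max_depth is None or depth < max_depth:
            for neighbour in graph.get(current, []):
                if neighbour not in seen:
                    fresh.append(neighbour)
                elif (current, neighbour) not in tree_edges:
                    cross_edges.add((current, neighbour))
        if fresh:
            nxt = rng.choice(fresh)
            tree_edges.add((current, nxt))
            seen.add(nxt)
            visited.append(nxt)
            stack.append(nxt)
        elif len(stack) > 1:
            stack.pop()  # dead end or depth limit: go back one hop
        else:
            break        # back at the seed with nothing left to explore
    return visited, cross_edges


toy_graph = {"A": ["B"], "B": ["C"], "C": ["A"]}
path, cross = random_walk(toy_graph, "A")
```

On the three-node cycle above the walk visits A, B, C in order and records the edge from C back to A as a cross edge, then backtracks to the seed and stops.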

get_paper_stats

Fetch papers and compute statistics: field availability, author counts, publication year distributions.
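
The kind of statistics described above can be sketched with collections.Counter. The record layout (dicts with "title", "abstract", "year", and an "authors" list of names) and the output keys are assumptions for illustration only; the tool's actual schema and output format are not shown in this README.

```python
from collections import Counter


def paper_stats(papers: list[dict]) -> dict:
    """Compute field availability, year distribution, and per-author counts."""
    total = len(papers)
    fields = sorted({key for paper in papers for key in paper})
    # Share of papers in which each field is present and non-empty.
    availability = {
        field: sum(1 for p in papers if p.get(field) not in (None, "", [])) / total
        for field in fields
    }
    years = Counter(p["year"] for p in papers if p.get("year") is not None)
    authors = Counter(name for p in papers for name in p.get("authors") or [])
    return {
        "total": total,
        "field_availability": availability,
        "years": dict(years),
        "authors": dict(authors),
    }


sample = [
    {"title": "A", "abstract": "x", "year": 2020, "authors": ["Ada", "Bob"]},
    {"title": "B", "abstract": None, "year": 2021, "authors": ["Ada"]},
]
stats = paper_stats(sample)
```

For the two sample records, the abstract field is available in half of the papers and "Ada" appears as an author twice.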

build_extended_query

Preview how a regex pattern gets transformed for a given provider's API.

Usage

Via uvx (recommended)

uvx academic-search

Via uv run

uv run --with academic-search academic-search

From source

git clone ...
cd academic-search
uv run academic-search

With Claude Desktop

{
  "mcpServers": {
    "academic-search": {
      "command": "uvx",
      "args": ["academic-search"]
    }
  }
}

Development

uv sync
uv run academic-search

Providers

Provider          Citation graph  Notes
Semantic Scholar  Yes             Full citation/reference graph endpoints, citation counts
Crossref          No              Good DOI coverage, no OA metadata
OpenAlex          No              Best OA metadata, abstracts from inverted index
PubMed            No              NIH literature

Design Notes

  • query and regex_filter are separate — the API query is plain text; regex filtering is a post-hoc step on all providers
  • Citation walk is a true graph walk — one step at a time, not bucket collection. Uses backtracking, cross-edge recording, and per-step filtering
  • Caching — API responses for (paper_id, direction) are cached per walk to avoid redundant fetches during backtracking
  • Text similarity — pure-Python TF cosine similarity (no sentence-transformers); used for most_similar bias and stop_similarity pruning
  • No API key required — all providers have free tiers, but Semantic Scholar rate limits are ~1 req/5s without a key. Set S2_API_KEY for up to 100 req/s
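
The caching note above can be pictured as a small per-walk memo keyed by (paper_id, direction). The sketch below is hypothetical: the class name, the fetch callback signature, and the call counter are invented for illustration and do not describe the package's internals.

```python
from typing import Callable


class WalkCache:
    """Per-walk memo of neighbour lists keyed by (paper_id, direction)."""

    def __init__(self, fetch: Callable[[str, str], list[str]]):
        self._fetch = fetch
        self._store: dict[tuple[str, str], list[str]] = {}
        self.api_calls = 0  # counts real fetches, for demonstration only

    def neighbours(self, paper_id: str, direction: str) -> list[str]:
        key = (paper_id, direction)
        if key not in self._store:
            self.api_calls += 1
            self._store[key] = self._fetch(paper_id, direction)
        return self._store[key]


def fake_fetch(paper_id: str, direction: str) -> list[str]:
    """Stand-in for an API call; returns made-up neighbour IDs."""
    return [f"{paper_id}-{direction}-{i}" for i in range(2)]


cache = WalkCache(fake_fetch)
first = cache.neighbours("p1", "forward")
again = cache.neighbours("p1", "forward")   # served from the cache
other = cache.neighbours("p1", "backward")  # different key, so a new fetch
```

Creating one cache object per walk keeps entries from leaking between walks, and a backtrack to a previously expanded paper costs no additional request.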
