MCP Server for multi-provider academic search (Semantic Scholar, Crossref, OpenAlex, PubMed) with regex filtering and statistics
Academic Search MCP Server
An MCP (Model Context Protocol) server for searching, filtering, and exploring academic papers across multiple databases, with citation graph walking.
Providers: Semantic Scholar, Crossref, OpenAlex, PubMed
Features
- Multi-provider search — unified interface across 4 academic databases
- Universal regex filtering — regex post-filtering on all providers (not just Semantic Scholar)
- Rich post-hoc filters — year range, citation count, journal, publication type, open access, author
- Citation graph walk — random walk on the Semantic Scholar citation graph with backtracking, cross-edge detection, and topic-aware pruning
- Text similarity — pure-Python TF cosine similarity for topic drift detection and most-similar candidate selection (no heavy dependencies)
- Stats — per-query metadata on authors, years, and field coverage
- Normalised output — all providers return data in the same schema
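The text-similarity feature above can be illustrated with a minimal sketch of term-frequency cosine similarity using only the standard library. The function names and the tokenisation rule (lowercased alphanumeric runs) are assumptions made for illustration; the package's own tokeniser and weighting may differ.

```python
import math
import re
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Lowercase the text, split it into alphanumeric tokens, and count them."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def tf_cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the raw term-frequency vectors of two texts."""
    tf_a, tf_b = term_frequencies(a), term_frequencies(b)
    # The dot product only needs the terms that both texts share.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as dissimilar to everything
    return dot / (norm_a * norm_b)
```

A score near 1.0 means the two texts use largely the same vocabulary in similar proportions; a score of 0.0 means they share no tokens, which is the kind of signal a topic drift threshold could act on.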
Tools
search_papers
Search papers by keyword with optional regex post-filtering on title and abstract.
Parameters: query, search_type, limit, regex_filter, regex_search_fields, match_mode, year_min, year_max, min_citation_count, max_citation_count, open_access_only, has_pdf, journal, exclude_journals, publication_types, exclude_publication_types, author, has_abstract, provider
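As a rough illustration of regex post-filtering, the sketch below applies a pattern to already-fetched papers. It assumes papers are plain dicts with `title` and `abstract` keys, and it guesses that `match_mode` chooses between "any field matches" and "all fields match"; the server's actual semantics for `match_mode` may be different.

```python
import re


def regex_post_filter(papers, pattern, fields=("title", "abstract"), match_mode="any"):
    """Keep papers whose selected text fields match the pattern.

    match_mode="any" keeps a paper if at least one field matches;
    match_mode="all" requires every field to match (assumed semantics).
    """
    compiled = re.compile(pattern, re.IGNORECASE)
    combine = any if match_mode == "any" else all
    kept = []
    for paper in papers:
        # Missing or None fields are treated as empty strings.
        hits = [bool(compiled.search(paper.get(f) or "")) for f in fields]
        if combine(hits):
            kept.append(paper)
    return kept
```

Because the filter runs on the normalised records rather than in the provider's query language, the same pattern behaves identically whichever provider supplied the results.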
search_by_author
Find papers by author name. Accepts the same post-hoc filters as search_papers.
explore_citations
Walk the Semantic Scholar citation graph step by step, starting from a seed paper.
| Parameter | Default | Description |
|---|---|---|
| `seed_paper_id` | (required) | S2 paper ID |
| `num_steps` | 30 | Max walk length |
| `max_depth` | None | Max hops from seed before forced backtrack |
| `direction_choice` | "random" | "forward", "backward", "alternating", or "random" |
| `bias` | "random" | "top_cited", "bottom_cited", "most_similar", or "random" |
| `candidates_per_step` | 100 | Candidates per API call |
| `stop_similarity` | None | 0.0–1.0 topic drift threshold |
Includes the same post-hoc filters as search_papers.
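To make the walk behaviour concrete, here is a simplified sketch of a random walk with backtracking and cross-edge recording on an in-memory graph. It is not the server's implementation: the `graph` dict stands in for the Semantic Scholar API, `random_walk` and its `rng` argument are hypothetical names, a backtrack is counted as one step, and direction, bias, and similarity pruning are left out.

```python
import random


def random_walk(graph, seed_id, num_steps=30, max_depth=None, rng=None):
    """Walk `graph` (dict: node -> list of neighbours) starting at `seed_id`.

    Returns the visited nodes in discovery order and the set of cross edges,
    i.e. edges from the current node to a node that was reached another way.
    """
    rng = rng or random.Random()
    path = [seed_id]        # stack of nodes from the seed to the current node
    visited = [seed_id]
    seen = {seed_id}
    tree_edges = set()      # edges the walk actually traversed
    cross_edges = set()
    for _ in range(num_steps):
        if not path:
            break           # backtracked past the seed: nothing left to explore
        current = path[-1]
        depth = len(path) - 1  # hops from the seed
        neighbours = graph.get(current, [])
        for n in neighbours:
            traversed = (current, n) in tree_edges or (n, current) in tree_edges
            if n in seen and not traversed:
                cross_edges.add((current, n))
        fresh = [n for n in neighbours if n not in seen]
        if not fresh or (max_depth is not None and depth >= max_depth):
            path.pop()      # dead end or depth limit: backtrack one hop
            continue
        nxt = rng.choice(fresh)
        tree_edges.add((current, nxt))
        path.append(nxt)
        visited.append(nxt)
        seen.add(nxt)
    return visited, cross_edges
```

On a small diamond-shaped graph such as `{"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}`, the walk reaches `D` through one branch, backtracks to `A`, and records the other branch's edge into `D` as a cross edge.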
get_paper_stats
Fetch papers and compute statistics: field availability, author counts, publication year distributions.
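The kind of summary described here could be computed along the following lines. This is a sketch over plain dicts; the field names (`title`, `abstract`, `year`, `authors`) and the shape of the returned dict are assumptions, not the tool's documented output.

```python
from collections import Counter


def paper_stats(papers, fields=("title", "abstract", "year", "authors")):
    """Summarise field availability, papers per year, and papers per author."""
    total = len(papers)
    # Fraction of papers in which each field is present and non-empty.
    availability = {
        f: (sum(1 for p in papers if p.get(f)) / total if total else 0.0)
        for f in fields
    }
    year_counts = Counter(p["year"] for p in papers if p.get("year"))
    author_counts = Counter(
        name for p in papers for name in (p.get("authors") or [])
    )
    return {
        "total": total,
        "field_availability": availability,
        "papers_per_year": dict(year_counts),
        "papers_per_author": dict(author_counts),
    }
```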
build_extended_query
Preview how a regex pattern gets transformed for a given provider's API.
Usage
Via uvx (recommended)
uvx academic-search
Via uv run
uv run --with academic-search academic-search
From source
git clone ...
cd academic-search
uv run academic-search
With Claude Desktop
{
  "mcpServers": {
    "academic-search": {
      "command": "uvx",
      "args": ["academic-search"]
    }
  }
}
Development
uv sync
uv run academic-search
Providers
| Provider | Citation graph | Notes |
|---|---|---|
| Semantic Scholar | Yes | Full citation/reference graph endpoints, citation counts |
| Crossref | No | Good DOI coverage, no OA metadata |
| OpenAlex | No | Best OA metadata, abstracts from inverted index |
| PubMed | No | NIH literature |
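The OpenAlex note refers to the fact that OpenAlex returns abstracts as an inverted index (each word mapped to the positions where it occurs) rather than as plain text. A minimal sketch of turning that structure back into a string follows; the function name is hypothetical and the package may handle edge cases differently.

```python
def abstract_from_inverted_index(inverted_index):
    """Rebuild plain text from a mapping of word -> list of word positions."""
    if not inverted_index:
        return None  # no abstract available for this work
    positioned = [
        (pos, word)
        for word, positions in inverted_index.items()
        for pos in positions
    ]
    positioned.sort()  # order words by their position in the abstract
    return " ".join(word for _, word in positioned)
```

For example, `{"Deep": [0], "learning": [1, 3], "for": [2]}` is rebuilt as `"Deep learning for learning"`.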
Design Notes
- `query` and `regex_filter` are separate: the API query is plain text; regex filtering is a post-hoc step on all providers
- Citation walk is a true graph walk: one step at a time, not bucket collection. Uses backtracking, cross-edge recording, and per-step filtering
- Caching: API responses for `(paper_id, direction)` are cached per walk to avoid redundant fetches during backtracking
- Text similarity: pure-Python TF cosine similarity (no sentence-transformers); used for `most_similar` bias and `stop_similarity` pruning
- No API key required: all providers have free tiers, but Semantic Scholar rate limits are ~1 req/5s without a key. Set `S2_API_KEY` for up to 100 req/s
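The per-walk caching idea can be sketched as a small wrapper keyed by `(paper_id, direction)`. The `WalkCache` class, its `fetch` callback, and the `api_calls` counter are hypothetical names used only to illustrate the pattern; they are not taken from the package.

```python
class WalkCache:
    """Memoise neighbour lookups for the lifetime of a single walk."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable (paper_id, direction) -> list of papers
        self._store = {}
        self.api_calls = 0    # how many lookups actually reached the API

    def neighbours(self, paper_id, direction):
        key = (paper_id, direction)
        if key not in self._store:
            self._store[key] = self._fetch(paper_id, direction)
            self.api_calls += 1
        return self._store[key]
```

Creating one cache per walk means that returning to a node during backtracking costs no extra request, which matters most under the unauthenticated rate limit.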