An MCP server exposing arXiv research tools (search, abstracts, author lookup, trending) to LLM agents.
arXiv Research MCP Server
Give any LLM agent a research librarian for arXiv.
Search 2.4M+ papers, pull full abstracts, track a researcher's latest work, and surface what a field is publishing right now, all over the Model Context Protocol.
Demo
Demo GIF coming soon: a 30-second walkthrough of an agent searching arXiv and reading an abstract through these tools.
Why this server
Large language models are great at reasoning about papers but have no live access to the literature. This server closes that gap with four focused, read-only tools that an agent can call to discover, read, and monitor research on arXiv, with output shaped specifically for an LLM's context window.
- Agent-first tool design: every tool carries a detailed docstring the host shows to the model, so it knows when and how to call each one.
- Structured, validated output: each tool returns a typed Pydantic model, surfaced as MCP `structuredContent` (not just a blob of text).
- Context-aware verbosity: `concise` mode (default) trims abstracts and caps author lists; `detailed` returns everything. You never blow the window by accident.
- Honest by design: `trending_topics` refuses to fake popularity metrics arXiv doesn't expose, and says so in every response.
- Actually verified: ruff + pyright `--strict` + an in-process smoke test and a real end-to-end stdio MCP client test, all green against the live API.
The four tools
| Tool | What it's for | Parameters (defaults) |
|---|---|---|
| search_papers | Keyword discovery across all of arXiv. Supports field prefixes (ti:, au:, abs:, cat:) and boolean AND / OR / ANDNOT. | query, max_results=10, sort_by="relevance", response_format="concise" |
| get_abstract | Full record for one paper by ID: untruncated abstract, every author, all categories, DOI / journal ref / comment, PDF + abstract URLs. | arxiv_id |
| find_by_author | A researcher's most recent papers, newest first. | author_name, max_results=10, response_format="concise" |
| trending_topics | Recent submissions in a category within a time window, plus the sub-topics that dominate them. | category, days=7, max_results=10, response_format="concise" |
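The field prefixes and boolean operators accepted by search_papers can be composed programmatically before the tool is called. The following is a minimal sketch, assuming a hypothetical helper named `build_query` that is not part of this package; it only assembles the query string and never contacts arXiv.

```python
def build_query(*, title=None, author=None, abstract=None,
                category=None, exclude_category=None):
    """Assemble an arXiv query string from optional field filters.

    Hypothetical helper for illustration; not part of arxiv-research-mcp.
    """
    def quote(value):
        # Multi-word values are wrapped in double quotes so they match as a phrase.
        return f'"{value}"' if " " in value else value

    clauses = []
    if title:
        clauses.append(f"ti:{quote(title)}")
    if author:
        clauses.append(f"au:{quote(author)}")
    if abstract:
        clauses.append(f"abs:{quote(abstract)}")
    if category:
        clauses.append(f"cat:{category}")
    if not clauses:
        raise ValueError(
            "Provide at least one of title, author, abstract, or category."
        )

    query = " AND ".join(clauses)
    if exclude_category:
        # ANDNOT removes papers that carry the excluded category.
        query += f" ANDNOT cat:{exclude_category}"
    return query


print(build_query(title="diffusion", category="cs.CV"))
```

The resulting string would be passed as the `query` argument of search_papers; the other parameters keep their defaults unless overridden.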
Shared conventions
- `response_format`: "concise" (default) shortens the abstract to ~280 chars and caps the author list at 8 names; `abstract_truncated` and `author_count` always tell the agent what was elided. "detailed" returns full text and all authors.
- `sort_by` (search only): "relevance", "newest", or "last_updated".
- Safety caps (auto-applied, and reported back in a `note` field): `max_results` is clamped to 50, `trending_topics` scans at most 200 recent papers and honors a window of 1 to 90 days.
- `arxiv_id` is forgiving: it accepts bare (2401.01234), versioned (2401.01234v2), legacy (math.GT/0309136), and full-URL forms.
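The concise/detailed convention and the max_results cap could be implemented roughly as follows. This is a sketch under stated assumptions, not the server's actual code: the names `shape`, `ShapedPaper`, and `clamp_max_results` are hypothetical, the trailing "..." marker and the lower bound of 1 are assumptions, and only the limits themselves (about 280 characters, 8 authors, 50 results) come from the conventions above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ABSTRACT_LIMIT = 280   # approximate concise-mode abstract length
AUTHOR_LIMIT = 8       # concise-mode author cap
MAX_RESULTS_CAP = 50   # safety cap on max_results


@dataclass
class ShapedPaper:
    """Hypothetical output record; field names mirror the ones in the text."""
    authors: List[str]
    author_count: int          # always the full count, even when the list is capped
    abstract: str
    abstract_truncated: bool


def shape(authors: List[str], abstract: str,
          response_format: str = "concise") -> ShapedPaper:
    if response_format not in ("concise", "detailed"):
        raise ValueError('response_format must be "concise" or "detailed".')
    if response_format == "detailed":
        return ShapedPaper(list(authors), len(authors), abstract, False)

    truncated = len(abstract) > ABSTRACT_LIMIT
    # Assumption: a truncated abstract ends with "..." as a visible marker.
    short = (abstract[:ABSTRACT_LIMIT].rstrip() + "...") if truncated else abstract
    return ShapedPaper(authors[:AUTHOR_LIMIT], len(authors), short, truncated)


def clamp_max_results(requested: int) -> Tuple[int, Optional[str]]:
    """Clamp to 1..50 and return a note only when the value was adjusted."""
    clamped = max(1, min(requested, MAX_RESULTS_CAP))
    if clamped == requested:
        return clamped, None
    return clamped, f"max_results clamped from {requested} to {clamped}."
```

Reporting `author_count` and `abstract_truncated` alongside the shortened fields is what lets an agent decide whether a follow-up call in detailed mode is worth the extra context.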
A deliberate note on "trending"
The arXiv API exposes no citation, download, or view counts, so genuine popularity cannot be measured.
`trending_topics` therefore defines "trending" as recency of submission within the window, and ranks the sub-categories those recent papers co-occur in. Every response restates this in its `note` field so the agent never overclaims. Honesty over vanity metrics.
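The recency-plus-co-occurrence definition above can be illustrated with a small, offline sketch. It assumes each paper is already available as a `(published, categories)` pair with a timezone-aware `published` datetime; the function name `rank_subtopics` and that input shape are hypothetical and are not taken from the server's code.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def rank_subtopics(papers, category, days=7, now=None):
    """Count co-occurring categories among recent papers in `category`.

    papers: iterable of (published, categories) pairs, where `published`
            is a timezone-aware datetime and `categories` is a list of codes.
    Returns (number_of_recent_papers, [(sub_category, count), ...]).
    """
    days = max(1, min(days, 90))              # honor the 1 to 90 day window
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)

    counts = Counter()
    recent = 0
    for published, categories in papers:
        if published < cutoff or category not in categories:
            continue                          # too old, or outside the category
        recent += 1
        # Count every other category listed on the same paper, once per paper.
        counts.update(c for c in set(categories) if c != category)
    return recent, counts.most_common()
```

Nothing in this ranking measures popularity; it only reports which categories appear together on recently submitted papers, which is the limit of what the API data supports.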
Quick start
Install from PyPI:
pip install arxiv-research-mcp
...then point your MCP client at the arxiv-research-mcp command (see Connect it to an MCP host).
Or install from source
git clone https://github.com/JananiV07/arxiv-mcp-server.git
cd arxiv-mcp-server
python -m venv .venv
# Windows (PowerShell):
.venv\Scripts\Activate.ps1
# macOS / Linux:
source .venv/bin/activate
pip install -r requirements.txt
python src/server.py
Requires Python 3.10+. Runtime deps are just `mcp[cli]` and `arxiv`. The PyPI package is named `arxiv-research-mcp` (the name `arxiv-mcp-server` was already taken by an unrelated project).
Run it directly (it speaks MCP over stdio, so normally a host launches it):
python src/server.py
Connect it to an MCP host
Configure your client
Add an entry to your client's MCP config file (for example, Claude Desktop uses
claude_desktop_config.json; other clients expose an equivalent).
If you installed from PyPI (pip install arxiv-research-mcp), just reference
the installed command:
{
  "mcpServers": {
    "arxiv-research": {
      "command": "arxiv-research-mcp"
    }
  }
}
If you installed from source, point at the Python interpreter from your virtual environment:
{
  "mcpServers": {
    "arxiv-research": {
      "command": "/absolute/path/to/arxiv-mcp-server/.venv/bin/python",
      "args": ["/absolute/path/to/arxiv-mcp-server/src/server.py"]
    }
  }
}
On Windows (from source), use the .exe and forward slashes, e.g. C:/path/to/arxiv-mcp-server/.venv/Scripts/python.exe.
Restart the host, and the four tools appear under the arxiv-research server.
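For source installs, the interpreter path differs between Windows and macOS/Linux, so the config entry can be generated rather than typed by hand. The sketch below assumes the repository layout shown in this README; the function name `source_install_entry` is hypothetical, and the output still has to be merged into your client's config file manually.

```python
import json
import sys
from pathlib import Path


def source_install_entry(repo_dir, windows=None):
    """Build the mcpServers entry for a from-source install.

    repo_dir: path to the cloned arxiv-mcp-server checkout.
    windows:  force the Windows venv layout; defaults to the current platform.
    """
    repo = Path(repo_dir)
    if windows is None:
        windows = sys.platform.startswith("win")
    # The venv keeps its interpreter in Scripts/ on Windows and bin/ elsewhere.
    python = repo / ".venv" / ("Scripts/python.exe" if windows else "bin/python")
    return {
        "mcpServers": {
            "arxiv-research": {
                # as_posix() emits forward slashes on every platform.
                "command": python.as_posix(),
                "args": [(repo / "src" / "server.py").as_posix()],
            }
        }
    }


entry = source_install_entry("/absolute/path/to/arxiv-mcp-server", windows=False)
print(json.dumps(entry, indent=2))
```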
Try it with the MCP Inspector
npx @modelcontextprotocol/inspector python src/server.py
What an agent can do with it
Once connected, natural-language requests map cleanly onto the tools:
| You ask... | The agent calls... |
|---|---|
| "Find recent papers on diffusion models for video." | search_papers("ti:diffusion AND cat:cs.CV", sort_by="newest") |
| "Summarize 'Attention Is All You Need'." | get_abstract("1706.03762") |
| "What has Yoshua Bengio published lately?" | find_by_author("Yoshua Bengio") |
| "What's hot in machine learning this week?" | trending_topics("cs.LG", days=7) |
Example output (get_abstract, abridged)
{
  "arxiv_id": "1706.03762v7",
  "title": "Attention Is All You Need",
  "authors": ["Ashish Vaswani", "Noam Shazeer", "..."],
  "author_count": 8,
  "published": "2017-06-12",
  "updated": "2023-08-02",
  "primary_category": "cs.CL",
  "categories": ["cs.CL", "cs.LG"],
  "abstract": "The dominant sequence transduction models ...",
  "abstract_truncated": false,
  "abstract_url": "http://arxiv.org/abs/1706.03762v7",
  "pdf_url": "https://arxiv.org/pdf/1706.03762v7"
}
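get_abstract accepts bare, versioned, legacy, and full-URL identifiers. One way such forgiving input could be reduced to a plain ID is sketched below. The function name `normalize_arxiv_id` and both regular expressions are assumptions written for illustration, not the server's actual implementation; they cover only the four forms listed in this README.

```python
import re

_ID_PATTERN = re.compile(
    r"""
    (?:
        \d{4}\.\d{4,5}                  # new-style ID, e.g. 2401.01234
      | [a-z-]+(?:\.[A-Z]{2})?/\d{7}    # legacy ID, e.g. math.GT/0309136
    )
    (?:v\d+)?                           # optional version suffix, e.g. v2
    """,
    re.VERBOSE,
)
_URL_PREFIX = re.compile(
    r"^https?://(?:www\.|export\.)?arxiv\.org/(?:abs|pdf)/", re.IGNORECASE
)


def normalize_arxiv_id(raw: str) -> str:
    """Strip URL wrapping and validate; any version suffix is kept as given."""
    candidate = _URL_PREFIX.sub("", raw.strip())
    if candidate.lower().endswith(".pdf"):
        candidate = candidate[:-4]      # PDF links may carry a .pdf extension
    if _ID_PATTERN.fullmatch(candidate):
        return candidate
    raise ValueError(
        f"Unrecognized arXiv ID {raw!r}. Use a form such as '2401.01234', "
        "'2401.01234v2', 'math.GT/0309136', or a full arxiv.org URL."
    )


print(normalize_arxiv_id("https://arxiv.org/abs/1706.03762v7"))
```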
Architecture & design choices
arxiv-mcp-server/
├── src/
│   └── server.py          # FastMCP server: 4 tools + Pydantic models + helpers
├── scripts/
│   ├── smoke_test.py      # in-process tests (import the tool fns directly)
│   └── client_test.py     # end-to-end test over the real stdio MCP protocol
├── pyproject.toml         # packaging + ruff + pyright config
├── requirements.txt       # runtime deps
└── README.md
- FastMCP registers each tool via `@mcp.tool()`; type hints + `pydantic.Field` descriptions become the JSON input schema the host advertises to the model.
- Typed output models (`Paper`, `SearchResults`, `AuthorResults`, `TopicCount`, `TrendingResults`) give the host structured, machine-readable results.
- Read-only annotations: all four tools set `readOnlyHint=True` / `destructiveHint=False`, so hosts can treat them as safe to call freely.
- One shared `arxiv.Client` with a polite delay + retries, respecting arXiv's fair-use guidance; its chatty INFO logging is silenced so stdout stays a clean MCP channel.
- Actionable errors: bad input or a failed request raises a `ValueError` whose message tells the agent how to fix the call (correct ID format, valid category code, query-prefix syntax, ...).
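The last two points, quiet logging and actionable errors, can be sketched with the standard library alone. This is an illustration under assumptions rather than the server's code: it assumes the arxiv package logs under the logger name "arxiv", and the `validate_category` helper and its pattern are hypothetical.

```python
import logging
import re

# Assumption: the arxiv package emits its INFO messages via the "arxiv" logger.
# Raising the level to WARNING keeps routine request logs out of the output.
logging.getLogger("arxiv").setLevel(logging.WARNING)

# Hypothetical shape check: an archive such as "cs" or "hep-th", optionally
# followed by a subject class such as ".LG" or ".stat-mech".
_CATEGORY = re.compile(r"[a-z-]+(?:\.[A-Za-z-]+)?")


def validate_category(category: str) -> str:
    """Return the cleaned category code, or raise an error the agent can act on."""
    cleaned = category.strip()
    if not _CATEGORY.fullmatch(cleaned):
        raise ValueError(
            f"Invalid arXiv category {category!r}. Use a code such as 'cs.LG', "
            "'math.GT', or 'hep-th' (an archive name, optionally followed by "
            "'.SUBJECT'), not a free-text topic."
        )
    return cleaned
```

The error message names concrete valid values so that a model reading it can correct the argument and retry without extra guidance.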
Development & testing
pip install -e ".[dev]" # ruff + pyright
ruff check . # lint
pyright # type check (strict on our own code)
python scripts/smoke_test.py # in-process checks vs the live arXiv API
python scripts/client_test.py # full stdio MCP protocol round-trip
Two complementary test layers:
- `smoke_test.py` imports the tool functions directly: fast feedback on tool logic, the concise/detailed split, `max_results` / `days` clamping, missing-field handling, and error paths.
- `client_test.py` is a true MCP client: it spawns `src/server.py` as a subprocess and exercises `initialize -> list_tools -> call_tool` over stdio, the same path any MCP host uses. This is what proves the server works as an MCP server: input schemas, `structuredContent`, tool annotations, and protocol-level error reporting (`isError`).
Requirements
- Python 3.10+
- `mcp[cli]`: the MCP Python SDK (FastMCP)
- `arxiv`: Python wrapper for the arXiv API
- Network access to `export.arxiv.org`
Acknowledgements
- Paper data from the arXiv API. Thank you to arXiv for the open API; please use it within their Terms of Use.
- Built on the Model Context Protocol.
arXiv is a trademark of Cornell University. This project is an independent, unofficial integration and is not affiliated with or endorsed by arXiv.
License
Released under the MIT License; see LICENSE.