
An MCP server exposing arXiv research tools (search, abstracts, author lookup, trending) to LLM agents.


📚 arXiv Research MCP Server

Give any LLM agent a research librarian for arXiv.

Search 2.4M+ papers, pull full abstracts, track a researcher's latest work, and surface what a field is publishing right now, all over the Model Context Protocol.


🎬 Demo

▶️ Demo GIF coming soon: a 30-second walkthrough of an agent searching arXiv and reading an abstract through these tools.


✨ Why this server

Large language models are good at reasoning about papers but have no live access to the literature. This server closes that gap with four focused, read-only tools that an agent can call to discover, read, and monitor research on arXiv, with output shaped specifically for an LLM's context window.

  • 🧠 Agent-first tool design: every tool carries a detailed docstring that the host shows to the model, so it knows when and how to call each one.
  • 📦 Structured, validated output: each tool returns a typed Pydantic model, surfaced as MCP structuredContent (not just a blob of text).
  • 🎚️ Context-aware verbosity: concise mode (the default) trims abstracts and caps author lists; detailed mode returns everything, so the context window is never blown by accident.
  • 🛟 Honest by design: trending_topics refuses to fake popularity metrics that arXiv doesn't expose, and says so in every response.
  • ✅ Actually verified: ruff, pyright in strict mode, an in-process smoke test, and a real end-to-end stdio MCP client test, all green against the live API.

🧰 The four tools

| Tool | What it's for | Parameters (defaults) |
| --- | --- | --- |
| 🔍 search_papers | Keyword discovery across all of arXiv. Supports field prefixes (ti:, au:, abs:, cat:) and boolean AND / OR / ANDNOT. | query, max_results=10, sort_by="relevance", response_format="concise" |
| 📄 get_abstract | Full record for one paper by ID: untruncated abstract, every author, all categories, DOI / journal ref / comment, PDF + abstract URLs. | arxiv_id |
| 👤 find_by_author | A researcher's most recent papers, newest first. | author_name, max_results=10, response_format="concise" |
| 📈 trending_topics | Recent submissions in a category within a time window, plus the sub-topics that dominate them. | category, days=7, max_results=10, response_format="concise" |
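
The field prefixes and boolean operators accepted by search_papers can be composed programmatically. The following is a minimal sketch, not part of this package: build_query and its keyword names are hypothetical helpers introduced for illustration, and the only assumption about arXiv is the documented prefix syntax (ti:, au:, abs:, cat:) with AND / ANDNOT and double-quoted phrases.

```python
from __future__ import annotations


def _term(prefix: str, value: str) -> str:
    """Return one field-prefixed term, quoting multi-word values as a phrase."""
    value = value.strip()
    if " " in value:
        value = f'"{value}"'
    return f"{prefix}:{value}"


def build_query(
    *,
    title: str | None = None,
    author: str | None = None,
    abstract: str | None = None,
    category: str | None = None,
    exclude_category: str | None = None,
) -> str:
    """Hypothetical helper: join the supplied fields with AND, then ANDNOT."""
    fields = [("ti", title), ("au", author), ("abs", abstract), ("cat", category)]
    terms = [_term(prefix, value) for prefix, value in fields if value]
    if not terms:
        raise ValueError(
            "Provide at least one of title, author, abstract, or category."
        )
    query = " AND ".join(terms)
    if exclude_category:
        query += f" ANDNOT {_term('cat', exclude_category)}"
    return query
```

The resulting string would be passed as the query argument of search_papers, for example build_query(title="diffusion", category="cs.CV").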

Shared conventions

  • response_format: "concise" (default) shortens the abstract to ~280 chars and caps the author list at 8 names; abstract_truncated and author_count always tell the agent what was elided. "detailed" returns full text and all authors.
  • sort_by (search only): "relevance", "newest", or "last_updated".
  • Safety caps (auto-applied, and reported back in a note field): max_results is clamped to 50; trending_topics scans at most 200 recent papers and honors a window of 1–90 days.
  • arxiv_id is forgiving: it accepts bare (2401.01234), versioned (2401.01234v2), legacy (math.GT/0309136), and full-URL forms.
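
The concise/detailed split and the max_results cap described above could be implemented along the following lines. This is a sketch under stated assumptions rather than the server's actual code: ShapedPaper, shape_paper, and clamp_max_results are hypothetical names, the exact 280-character cut and the "..." suffix are assumptions (the text only says "~280 chars"), and the lower bound of 1 is also assumed.

```python
from __future__ import annotations

from dataclasses import dataclass

ABSTRACT_LIMIT = 280   # assumption: the README only says "~280 chars"
AUTHOR_LIMIT = 8
MAX_RESULTS_CAP = 50


@dataclass
class ShapedPaper:
    authors: list[str]
    author_count: int          # always the full count, even when names are capped
    abstract: str
    abstract_truncated: bool


def shape_paper(
    authors: list[str], abstract: str, response_format: str = "concise"
) -> ShapedPaper:
    """Trim a record for the context window, reporting what was elided."""
    if response_format not in ("concise", "detailed"):
        raise ValueError('response_format must be "concise" or "detailed".')
    if response_format == "detailed":
        return ShapedPaper(list(authors), len(authors), abstract, False)
    truncated = len(abstract) > ABSTRACT_LIMIT
    short = (abstract[:ABSTRACT_LIMIT].rstrip() + "...") if truncated else abstract
    return ShapedPaper(authors[:AUTHOR_LIMIT], len(authors), short, truncated)


def clamp_max_results(requested: int) -> tuple[int, str | None]:
    """Clamp to the cap and return a note only when the value changed."""
    clamped = max(1, min(requested, MAX_RESULTS_CAP))
    if clamped == requested:
        return clamped, None
    return clamped, f"max_results clamped from {requested} to {clamped}."
```

Returning the note alongside the clamped value keeps the adjustment visible to the agent instead of silently changing the request.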

A deliberate note on "trending"

The arXiv API exposes no citation, download, or view counts, so genuine popularity cannot be measured. trending_topics therefore defines "trending" as recency of submission within the window, and ranks the sub-categories that co-occur on those recent papers. Every response restates this in its note field so the agent never overclaims. Honesty over vanity metrics.
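
To make the recency-based definition concrete, here is a small sketch of the idea using only the standard library. It is an illustration, not the server's implementation: rank_subtopics is a hypothetical function, and papers are modelled as plain (published, categories) tuples rather than the package's own types.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime, timedelta, timezone


def rank_subtopics(
    papers: list[tuple[datetime, list[str]]],
    category: str,
    days: int = 7,
    now: datetime | None = None,
) -> tuple[int, list[tuple[str, int]]]:
    """Count papers in `category` submitted within the window, and rank the
    other categories listed on those same papers."""
    now = now or datetime.now(timezone.utc)
    days = max(1, min(days, 90))          # window of 1 to 90 days
    cutoff = now - timedelta(days=days)
    counts: Counter[str] = Counter()
    recent = 0
    for published, categories in papers:
        if published < cutoff or category not in categories:
            continue
        recent += 1
        counts.update(c for c in categories if c != category)
    return recent, counts.most_common()
```

Nothing here measures popularity: the ranking reflects only how often each category appears alongside the requested one among recent submissions.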


🚀 Quick start

Install from PyPI:

pip install arxiv-research-mcp

Then point your MCP client at the arxiv-research-mcp command (see Connect it to an MCP host).

Or install from source:

git clone https://github.com/JananiV07/arxiv-mcp-server.git
cd arxiv-mcp-server

python -m venv .venv
# Windows (PowerShell):
.venv\Scripts\Activate.ps1
# macOS / Linux:
source .venv/bin/activate

pip install -r requirements.txt
python src/server.py

Requires Python 3.10+. Runtime deps are just mcp[cli] and arxiv. The PyPI package is named arxiv-research-mcp (the name arxiv-mcp-server was already taken by an unrelated project).

Run it directly (it speaks MCP over stdio, so normally a host launches it):

python src/server.py

🔌 Connect it to an MCP host

Configure your client

Add an entry to your client's MCP config file (for example, Claude Desktop uses claude_desktop_config.json; other clients expose an equivalent).

If you installed from PyPI (pip install arxiv-research-mcp), just reference the installed command:

{
  "mcpServers": {
    "arxiv-research": {
      "command": "arxiv-research-mcp"
    }
  }
}

If you installed from source, point at the Python interpreter from your virtual environment:

{
  "mcpServers": {
    "arxiv-research": {
      "command": "/absolute/path/to/arxiv-mcp-server/.venv/bin/python",
      "args": ["/absolute/path/to/arxiv-mcp-server/src/server.py"]
    }
  }
}

On Windows (from source), use the .exe and forward slashes, e.g. C:/path/to/arxiv-mcp-server/.venv/Scripts/python.exe.

Restart the host, and the four tools appear under the arxiv-research server.
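
If hand-editing absolute paths is error-prone, the source-install entry can be generated instead. The sketch below is a convenience illustration only: source_install_entry is a hypothetical helper, and it assumes the virtual environment lives in .venv at the repository root, as in the install steps above.

```python
import json
from pathlib import PurePosixPath, PureWindowsPath


def source_install_entry(repo_root: str, windows: bool = False) -> dict:
    """Build the mcpServers entry for a from-source install.

    Paths are emitted with forward slashes on every platform.
    """
    root = PureWindowsPath(repo_root) if windows else PurePosixPath(repo_root)
    bin_dir = "Scripts" if windows else "bin"
    exe = "python.exe" if windows else "python"
    python = root / ".venv" / bin_dir / exe
    server = root / "src" / "server.py"
    return {
        "mcpServers": {
            "arxiv-research": {
                "command": python.as_posix(),
                "args": [server.as_posix()],
            }
        }
    }


if __name__ == "__main__":
    entry = source_install_entry("/absolute/path/to/arxiv-mcp-server")
    print(json.dumps(entry, indent=2))
```

Paste the printed JSON into the client's MCP config file, merging it with any servers already listed there.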

Try it with the MCP Inspector

npx @modelcontextprotocol/inspector python src/server.py

💬 What an agent can do with it

Once connected, natural-language requests map cleanly onto the tools:

| You ask… | The agent calls… |
| --- | --- |
| "Find recent papers on diffusion models for video." | search_papers("ti:diffusion AND cat:cs.CV", sort_by="newest") |
| "Summarize 'Attention Is All You Need'." | get_abstract("1706.03762") |
| "What has Yoshua Bengio published lately?" | find_by_author("Yoshua Bengio") |
| "What's hot in machine learning this week?" | trending_topics("cs.LG", days=7) |

Example output (get_abstract, abridged)

{
  "arxiv_id": "1706.03762v7",
  "title": "Attention Is All You Need",
  "authors": ["Ashish Vaswani", "Noam Shazeer", "..."],
  "author_count": 8,
  "published": "2017-06-12",
  "updated": "2023-08-02",
  "primary_category": "cs.CL",
  "categories": ["cs.CL", "cs.LG"],
  "abstract": "The dominant sequence transduction models ...",
  "abstract_truncated": false,
  "abstract_url": "http://arxiv.org/abs/1706.03762v7",
  "pdf_url": "https://arxiv.org/pdf/1706.03762v7"
}

🏗️ Architecture & design choices

arxiv-mcp-server/
├── src/
│   └── server.py          # FastMCP server: 4 tools + Pydantic models + helpers
├── scripts/
│   ├── smoke_test.py      # in-process tests (import the tool fns directly)
│   └── client_test.py     # end-to-end test over the real stdio MCP protocol
├── pyproject.toml         # packaging + ruff + pyright config
├── requirements.txt       # runtime deps
└── README.md

  • FastMCP registers each tool via @mcp.tool(); type hints + pydantic.Field descriptions become the JSON input schema the host advertises to the model.
  • Typed output models (Paper, SearchResults, AuthorResults, TopicCount, TrendingResults) give the host structured, machine-readable results.
  • Read-only annotations: all four tools set readOnlyHint=True / destructiveHint=False, so hosts can treat them as safe to call freely.
  • One shared arxiv.Client with a polite delay + retries, respecting arXiv's fair-use guidance; its chatty INFO logging is silenced so stdout stays a clean MCP channel.
  • Actionable errors: bad input or a failed request raises a ValueError whose message tells the agent how to fix the call (correct ID format, valid category code, query-prefix syntax, …).
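
As an illustration of the actionable-error pattern applied to the forgiving arxiv_id input, the sketch below accepts the bare, versioned, legacy, and URL forms and raises a ValueError that explains how to fix the call. It is an assumption-laden example, not the code in src/server.py: normalize_arxiv_id and the regular expressions are hypothetical and may be stricter or looser than the real validation.

```python
import re

# New-style IDs such as 2401.01234 or 2401.01234v2.
_NEW_STYLE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
# Legacy IDs such as math.GT/0309136 or hep-th/9901001v1.
_LEGACY = re.compile(r"^[a-z-]+(\.[A-Z]{2})?/\d{7}(v\d+)?$")
# Leading part of an abstract or PDF URL.
_URL_PREFIX = re.compile(
    r"^https?://(www\.|export\.)?arxiv\.org/(abs|pdf)/", re.IGNORECASE
)


def normalize_arxiv_id(raw: str) -> str:
    """Reduce any accepted form to a plain ID, or explain what went wrong."""
    candidate = _URL_PREFIX.sub("", raw.strip())
    if candidate.lower().endswith(".pdf"):
        candidate = candidate[:-4]
    if _NEW_STYLE.match(candidate) or _LEGACY.match(candidate):
        return candidate
    raise ValueError(
        f"Unrecognized arXiv ID {raw!r}. Use a bare ID such as '2401.01234', "
        "a versioned ID such as '2401.01234v2', a legacy ID such as "
        "'math.GT/0309136', or a full arxiv.org URL."
    )
```

Because the message lists the accepted formats, an agent that receives the error has what it needs to retry with a corrected argument.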

🧪 Development & testing

pip install -e ".[dev]"          # ruff + pyright

ruff check .                     # lint
pyright                          # type check (strict on our own code)
python scripts/smoke_test.py     # in-process checks vs the live arXiv API
python scripts/client_test.py    # full stdio MCP protocol round-trip

Two complementary test layers:

  • smoke_test.py imports the tool functions directly, giving fast feedback on tool logic, the concise/detailed split, max_results/days clamping, missing-field handling, and error paths.
  • client_test.py is a true MCP client: it spawns src/server.py as a subprocess and exercises initialize → list_tools → call_tool over stdio, the same path any MCP host uses. This is what proves the server works as an MCP server: input schemas, structuredContent, tool annotations, and protocol-level error reporting (isError).

📋 Requirements

  • Python 3.10+
  • mcp[cli]: the MCP Python SDK (FastMCP)
  • arxiv: Python wrapper for the arXiv API
  • Network access to export.arxiv.org

🙏 Acknowledgements

arXiv is a trademark of Cornell University. This project is an independent, unofficial integration and is not affiliated with or endorsed by arXiv.


📄 License

Released under the MIT License; see LICENSE.

Built for the agentic era, so your LLM can read the literature, not just guess about it.
