MCP service for arXiv paper search using arxiv_query_fluent

ArXiv Query MCP Server

The ArXiv Query MCP Server is a Model Context Protocol (MCP) implementation that lets AI assistants search for, download, and extract text from academic papers on arXiv.

Features

Comprehensive Search Options: Search arXiv papers by ID, author, category, title, abstract, or date range
Paper Downloads: Download papers as PDF files with automatic caching
Text Extraction: Convert downloaded PDFs to text using the Mistral OCR API or local processing with PyPDF2
Rate Limiting: Configurable per-minute, per-day, and minimum-interval limits to respect arXiv API usage policies

Installation

Prerequisites

Docker (for the Docker installation)
Python 3.9+ (for the local installation)
pip or uv package manager (for the local installation)

Docker Installation (Recommended)

# Clone the repository
git clone https://github.com/yourusername/mcp-arxiv-query.git
cd mcp-arxiv-query

# Build the Docker image
docker build -t mcp-arxiv-query .

# Test the server
echo '{"jsonrpc":"2.0","id":1,"method":"list_tools","params":{}}' | \
  docker run --rm -i mcp-arxiv-query
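The one-line test above sends a single request. Under the MCP specification, a client normally performs an `initialize` handshake first and lists tools with the `tools/list` method; whether this server also accepts a bare `list_tools` request is not confirmed here. The sketch below builds the three newline-delimited JSON-RPC messages of a minimal session. The protocol version string, the client name, and the function name `handshake_messages` are assumptions made for illustration.

```python
import json


def handshake_messages(protocol_version="2024-11-05", client_name="manual-test"):
    """Build newline-delimited JSON-RPC messages for a minimal MCP session."""
    messages = [
        # 1. Ask the server to initialize and report its capabilities.
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": protocol_version,  # assumed version string
                "capabilities": {},
                "clientInfo": {"name": client_name, "version": "0.0.0"},
            },
        },
        # 2. Notification (no id) telling the server the client is ready.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        # 3. Request the list of available tools.
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": {}},
    ]
    return "".join(json.dumps(message) + "\n" for message in messages)


if __name__ == "__main__":
    print(handshake_messages(), end="")
```

Saved under a name of your choice (for example handshake.py), the output can be piped into the container: python handshake.py | docker run --rm -i mcp-arxiv-query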

Local Installation

# Clone the repository
git clone https://github.com/yourusername/mcp-arxiv-query.git
cd mcp-arxiv-query

# Install dependencies with uv (recommended)
uv pip install .

# Or install with pip
pip install .

# Run the server
python -m mcp_arxiv_query

Usage with Claude Desktop

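Add the ArXiv Query MCP server to your Claude Desktop configuration file (claude_desktop_config.json). Claude Desktop launches the command directly rather than through a shell, so $HOME in the examples below may not be expanded; if the volume mount fails, replace it with the absolute path to your Downloads directory.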

Basic Configuration

{
  "mcpServers": {
    "arxiv-query": {
      "command": "docker",
      "args": [
        "run",
        "--rm",
        "-i",
        "-v",
        "$HOME/Downloads:/app/Downloads",
        "mcp-arxiv-query"
      ]
    }
  }
}

Advanced Configuration with OCR Support

{
  "mcpServers": {
    "arxiv-query": {
      "command": "docker",
      "args": [
        "run",
        "--rm",
        "-i",
        "-e", "MISTRAL_OCR_API_KEY=your_api_key_here",
        "-e", "ARXIV_MAX_CALLS_PER_MINUTE=30",
        "-e", "ARXIV_MAX_CALLS_PER_DAY=2000",
        "-e", "LOG_LEVEL=INFO",
        "-v",
        "$HOME/Downloads:/app/Downloads",
        "mcp-arxiv-query"
      ]
    }
  }
}
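Claude Desktop reads server entries from the `mcpServers` key and starts the command without a shell, so one way to avoid relying on `$HOME` expansion is to generate the entry with the host path already resolved. The following is a minimal sketch of that idea; the helper name `build_config` and its parameters are hypothetical and not part of this project.

```python
import json
from pathlib import Path


def build_config(download_dir=None, env=None, image="mcp-arxiv-query"):
    """Return a Claude Desktop config dict with an absolute host download path."""
    if download_dir is None:
        host_dir = Path.home() / "Downloads"
    else:
        host_dir = Path(download_dir).expanduser()

    args = ["run", "--rm", "-i"]
    # Each environment variable becomes a "-e KEY=value" pair.
    for key, value in (env or {}).items():
        args += ["-e", f"{key}={value}"]
    # Mount the host directory at the container's default download location.
    args += ["-v", f"{host_dir}:/app/Downloads", image]

    return {"mcpServers": {"arxiv-query": {"command": "docker", "args": args}}}


if __name__ == "__main__":
    config = build_config(env={"LOG_LEVEL": "INFO"})
    print(json.dumps(config, indent=2))
```

The printed JSON can be merged into the configuration file by hand; the script does not write any file itself.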

Environment Variables

Variable	Description	Default
`DOWNLOAD_DIR`	Directory for PDF downloads	/app/Downloads
`MISTRAL_OCR_API_KEY`	API key for Mistral OCR (optional)	None
`ARXIV_MAX_CALLS_PER_MINUTE`	Maximum arXiv API calls per minute	30
`ARXIV_MAX_CALLS_PER_DAY`	Maximum arXiv API calls per day	2000
`ARXIV_MIN_INTERVAL_SECONDS`	Minimum time between API calls in seconds	1.0
`LOG_LEVEL`	Logging level (DEBUG, INFO, WARNING, ERROR)	INFO
`LOG_FORMAT`	Set to "json" for JSON-formatted logs	standard
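To make the interaction of the three rate-limit variables concrete, here is a minimal sketch of how they could be read and combined. It is not the project's rate_limiter.py; the class names `RateLimitConfig` and `SlidingWindowLimiter` are hypothetical, and only the variable names and defaults come from the table above.

```python
import os
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class RateLimitConfig:
    max_per_minute: int = 30
    max_per_day: int = 2000
    min_interval: float = 1.0

    @classmethod
    def from_env(cls, env=None):
        """Read the limits from the environment, using the documented defaults."""
        env = os.environ if env is None else env
        return cls(
            max_per_minute=int(env.get("ARXIV_MAX_CALLS_PER_MINUTE", 30)),
            max_per_day=int(env.get("ARXIV_MAX_CALLS_PER_DAY", 2000)),
            min_interval=float(env.get("ARXIV_MIN_INTERVAL_SECONDS", 1.0)),
        )


class SlidingWindowLimiter:
    """Tracks call timestamps and reports how long the next call must wait."""

    def __init__(self, config, clock=time.monotonic):
        self.config = config
        self.clock = clock
        self.calls = deque()  # timestamps of calls within the last 24 hours

    def wait_time(self):
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget calls older than one day.
        while self.calls and now - self.calls[0] >= 86400:
            self.calls.popleft()
        waits = [0.0]
        if self.calls:
            # Minimum spacing between consecutive calls.
            waits.append(self.calls[-1] + self.config.min_interval - now)
        recent = [t for t in self.calls if now - t < 60]
        if len(recent) >= self.config.max_per_minute:
            # Wait until the oldest call leaves the one-minute window.
            waits.append(recent[0] + 60 - now)
        if len(self.calls) >= self.config.max_per_day:
            # Wait until the oldest call leaves the one-day window.
            waits.append(self.calls[0] + 86400 - now)
        return max(waits)

    def record(self):
        """Register that a call was made now."""
        self.calls.append(self.clock())
```

A caller would sleep for `wait_time()` seconds, make the arXiv request, and then call `record()`. The strictest of the three limits always decides the wait.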

Claude Integration

This MCP server is designed to be used with Claude. For a seamless experience, we recommend adding the following instructions to your Claude preferences:

When I type "@aq <query>", please use the arxiv-query tools to search for academic papers related to my query.
For example, "@aq Capturing Semantic Flow of ML-based Systems" means search for relevant papers on this topic.

When I type "@ax <arxiv-id>", please: 
1. Download the paper using the download_paper tool
2. Extract its text content using the pdf_to_text tool
3. Be ready to answer my questions about the paper

Examples: "@ax 2503.13415" or "@ax 2503.13415v2"

Usage Examples

Here are some examples of how to interact with Claude using the ArXiv Query tools:

Search for Papers

User: @aq transformer architecture in NLP

Claude: Searching for papers about "transformer architecture in NLP"...
[Claude would use search_arxiv tool and show results]

I found several relevant papers on transformer architecture in NLP:
1. "Attention Is All You Need" by Vaswani et al.
2. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" by Devlin et al.
3. ...

Category-Specific Search

User: @aq category:cs.CL language models

Claude: Searching for language model papers in the Computation and Language category...
[Claude would use search_by_category tool]

Here are the most recent papers about language models in the cs.CL category:
1. ...
2. ...

Author Search

User: @aq author:Hinton recent work

Claude: Searching for recent papers by Geoffrey Hinton...
[Claude would use search_by_author tool]

Here are Geoffrey Hinton's recent publications:
1. ...
2. ...

Download and Analyze a Paper

User: @ax 2203.02155

Claude: I'll download and process the paper with ID 2203.02155 for you.
[Claude would download the PDF and extract text]

I've downloaded and processed the paper "Training language models to follow instructions with human feedback" by Ouyang et al.

This paper introduces InstructGPT, which is trained to follow human instructions. The key findings are:
1. ...
2. ...

Would you like me to analyze any specific section in more detail?

Date Range Search

User: @aq diffusion models date:2023-01-01 to:2023-06-30

Claude: Searching for papers about diffusion models published between January and June 2023...
[Claude would use search_by_date_range tool]

I found 15 papers about diffusion models published in the first half of 2023:
1. ...
2. ...
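For reference, the raw arXiv API expresses a date range with a `submittedDate:[YYYYMMDDHHMM TO YYYYMMDDHHMM]` clause (times in GMT). How the server and arxiv_query_fluent build their queries is not shown in this document, so the following is only a sketch of the underlying query syntax; the function names are hypothetical.

```python
from datetime import date


def submitted_date_clause(start: date, end: date) -> str:
    """Format an inclusive date range in raw arXiv API query syntax."""
    if end < start:
        raise ValueError("end date must not be earlier than start date")
    # Cover the whole of the first and last day: 00:00 through 23:59.
    return f"submittedDate:[{start:%Y%m%d}0000 TO {end:%Y%m%d}2359]"


def build_query(terms: str, start: date, end: date) -> str:
    """Combine a phrase search over all fields with a date-range clause."""
    return f'all:"{terms}" AND {submitted_date_clause(start, end)}'


if __name__ == "__main__":
    print(build_query("diffusion models", date(2023, 1, 1), date(2023, 6, 30)))
```

The resulting string would be passed as the `search_query` parameter of the arXiv API after URL encoding.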

Multiple Questions About a Paper

User: @ax 2303.08774

Claude: I'll download and process the paper with ID 2303.08774 for you.
[Claude would download and process the paper]

I've downloaded and processed the paper "GPT-4 Technical Report" by OpenAI.

User: What methodology did they use for evaluation?

Claude: In the GPT-4 Technical Report, the evaluation methodology includes:
1. Benchmark testing across multiple domains including...
2. ...

These examples show how the recommended "@aq" and "@ax" shortcuts map onto the ArXiv Query tools during a conversation with Claude.

Available Tools

The server provides the following tools:

search_arxiv - Flexible search interface with multiple parameters
download_paper - Download papers as PDF files
search_by_category - Search papers by arXiv category
search_by_author - Search papers by author name
search_by_id - Search for a specific paper by ID
search_by_date_range - Search papers within a date range
pdf_to_text - Convert PDF files to text
get_rate_limiter_stats - View API usage statistics
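In MCP, a client invokes a tool by sending a `tools/call` request whose `params` carry the tool name and an `arguments` object. The sketch below builds such a request for the tools listed above. The argument schemas of these tools are not documented here, so the `paper_id` argument in the usage line is purely hypothetical; query the server with `tools/list` to see the real schemas.

```python
import json

# Tool names taken from the list above.
KNOWN_TOOLS = {
    "search_arxiv",
    "download_paper",
    "search_by_category",
    "search_by_author",
    "search_by_id",
    "search_by_date_range",
    "pdf_to_text",
    "get_rate_limiter_stats",
}


def build_tool_call(name, arguments=None, request_id=1):
    """Build an MCP tools/call JSON-RPC request as a dict."""
    if name not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


if __name__ == "__main__":
    # "paper_id" is an assumed argument name, not a documented one.
    request = build_tool_call("search_by_id", {"paper_id": "2303.08774"})
    print(json.dumps(request))
```

Rejecting unknown names on the client side catches typos before a request is sent; the server remains the authority on which tools exist.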

Mistral OCR API Support

The service can use the Mistral OCR API for PDF text extraction, which can be more accurate than plain text-layer extractors such as PyPDF2, especially for academic papers with complex layouts.

To enable:

Obtain an API key from Mistral AI (https://console.mistral.ai/)
Set the API key as the environment variable MISTRAL_OCR_API_KEY

Key features:

Intelligent Processing: The system automatically extracts arXiv IDs from filenames and prioritizes using arXiv PDF URLs for processing, eliminating local file transfers
Fallback Options: If an arXiv ID cannot be identified, the system processes the local PDF file
Automatic Degradation: If Mistral OCR API fails, the system automatically falls back to PyPDF2

Notes:

When using URL mode, the system relies on arXiv's public PDF URL format
When using local file mode, PDF size must be less than 20MB
The program uses the official mistral-ocr-latest model
Without setting MISTRAL_OCR_API_KEY, the system automatically uses PyPDF2 for local processing
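The decision order described above (no key means local extraction, an arXiv ID in the filename means URL mode, otherwise local-file OCR, and any OCR failure falls back to local extraction) can be sketched as follows. This is not the project's pdf_utils.py: the three callables are hypothetical stand-ins for the Mistral URL call, the Mistral file call, and the PyPDF2 extractor, injected so that no real API signature is assumed.

```python
import os
import re

# New-style arXiv ID with optional version suffix, e.g. 2303.08774 or 2303.08774v2.
ARXIV_ID = re.compile(r"\d{4}\.\d{4,5}(?:v\d+)?")


def extract_text(pdf_path, api_key, ocr_from_url, ocr_from_file, local_extract):
    """Pick an extraction route following the documented fallback order."""
    if not api_key:
        # No MISTRAL_OCR_API_KEY: go straight to local processing.
        return local_extract(pdf_path)

    match = ARXIV_ID.search(os.path.basename(pdf_path))
    try:
        if match:
            # URL mode: let the OCR service fetch arXiv's public PDF directly.
            return ocr_from_url(f"https://arxiv.org/pdf/{match.group(0)}")
        # No arXiv ID in the filename: send the local file instead.
        return ocr_from_file(pdf_path)
    except Exception:
        # Any OCR failure degrades to local extraction.
        return local_extract(pdf_path)
```

Passing the extractors in as functions keeps the routing logic testable without network access or an API key.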

Troubleshooting

PDF Download Issues

If you encounter problems downloading PDFs:

Ensure your download directory has appropriate permissions
Verify Docker volume mounting is correct
Run the build_and_test.sh script to test download functionality
Check logs for detailed error messages

Common issues:

File Not Found: Ensure the arXiv ID format is correct, e.g., "2303.08774"
Cannot Write File: Check download directory permissions, ensure the container user has write access
Docker Mount Issues: Ensure the -v parameter is correct, format should be -v host_path:/app/Downloads
Mistral API Errors: Check if the API key is correct and if the PDF file exceeds the size limit (20MB)
arXiv ID Extraction Problems: Ensure the PDF file is named with a standard arXiv ID, e.g., "2303.08774.pdf"
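When diagnosing the ID-related issues above, it can help to check an ID or filename before involving the server. The sketch below validates new-style arXiv IDs (YYMM.NNNNN with an optional version suffix); old-style IDs such as hep-th/9901001 are deliberately not handled. The names `ArxivId`, `parse_arxiv_id`, and `id_from_filename` are hypothetical helpers, not part of this project.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional


class ArxivId(NamedTuple):
    base: str               # e.g. "2303.08774"
    version: Optional[int]  # e.g. 2 for "v2", None if absent


# Two-digit year, month 01-12, a dot, 4 or 5 digits, optional "vN".
_NEW_STYLE = re.compile(
    r"(?P<yymm>\d{2}(?:0[1-9]|1[0-2]))\.(?P<num>\d{4,5})(?:v(?P<ver>[1-9]\d*))?"
)


def parse_arxiv_id(text: str) -> Optional[ArxivId]:
    """Return the parsed ID, or None if text is not a new-style arXiv ID."""
    match = _NEW_STYLE.fullmatch(text.strip())
    if match is None:
        return None
    version = match.group("ver")
    base = f"{match.group('yymm')}.{match.group('num')}"
    return ArxivId(base, int(version) if version else None)


def id_from_filename(filename: str) -> Optional[ArxivId]:
    """Parse the ID from a PDF filename such as '2303.08774.pdf'."""
    name = Path(filename).name
    if name.lower().endswith(".pdf"):
        name = name[:-4]  # strip only the .pdf suffix, keep the dot in the ID
    return parse_arxiv_id(name)
```

A `None` result for a downloaded file suggests renaming it to the standard form before running text extraction in URL mode.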

Manual Download Testing

You can manually test the PDF download functionality with the following command:

docker run --rm -i \
  -v "$HOME/Downloads:/app/Downloads" \
  mcp-arxiv-query python -c "from mcp_arxiv_query.downloader import ArxivDownloader; downloader = ArxivDownloader('/app/Downloads'); result = downloader.download_paper('2303.08774'); print(result)"

Development

This service is built on the MCP protocol and the arxiv_query_fluent Python library.

Directory Structure

src/mcp_arxiv_query/: Source code
- __init__.py: Package initialization
- __main__.py: Entry point
- server.py: MCP server implementation
- pdf_utils.py: PDF text extraction tools
- arxiv_service.py: arXiv API service wrapper
- rate_limiter.py: API rate limiting
- tools.py: Tool definitions
- logger.py: Logging configuration

License

This project is licensed under the MIT License.

Release details

Version 0.1.0, released Mar 23, 2025.

Source distribution: mcp_arxiv_query-0.1.0.tar.gz (6.0 kB), uploaded Mar 23, 2025 via twine/6.1.0 CPython/3.12.7. Trusted Publishing was not used.

Hashes for mcp_arxiv_query-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`2c58f9e478729c1623e297cbc30f0df1960ecbc40dc313b8e710a3c9bfbd25ec`
MD5	`f2c97ece99366d30dcc60e608c61bb96`
BLAKE2b-256	`6469741b6138803d4134aa907592b08a4dd1a013ec0c98f82a1211816dd6b503`
