A Python package for arXiv paper access with CLI and MCP server support
deepxiv-sdk
Agent-first academic paper interface for CLI, MCP, and Python. deepxiv gives OpenClaw, Claude Code, Codex, and other coding agents a fast, structured way to search papers, inspect metadata, read only the right sections, and reason over open-access literature without wasting tokens.
- 📚 API Documentation: https://data.rag.ac.cn/api/docs
- 🎥 Demo Video:
- 📄 Technical Report:
- 📖 中文文档: README.zh.md
Why deepxiv for agents?
| Feature | deepxiv | Standard arXiv API |
|---|---|---|
| Hybrid Search (BM25 + Vector) | ✅ | ❌ |
| AI-Generated Summaries (TLDR) | ✅ | ❌ |
| Section-by-Section Access | ✅ | ❌ |
| GitHub Link Extraction | ✅ | ❌ |
| MCP Protocol Support | ✅ | ❌ |
| Biomedical Papers (PMC) | ✅ | ❌ |
| Agent-Oriented CLI | ✅ | ❌ |
| Free Daily Requests | 10,000 | ∞* |
*The arXiv API has no daily request cap, but it enforces strict rate limiting.
Core Features
- 🔍 Hybrid Search: BM25 + vector search for better retrieval quality
- 📄 Section-Based Access: load only the sections an agent actually needs
- ✨ Brief Views: title, TLDR, keywords, citations, PDF, and GitHub link when available
- 💻 Three Interfaces: CLI / MCP Server / Python SDK
- 🤖 Agent-Friendly by Default: works well inside OpenClaw, Claude Code, Codex, and similar agent loops
- 📚 PMC Support: access biomedical literature alongside arXiv
- 🔥 Trending + Social Impact: discover papers getting attention online
Agent Integration
deepxiv is designed to be the paper interface layer for coding and research agents.
- Codex: install the CLI skill and let Codex call `deepxiv search`, `deepxiv paper`, and `deepxiv pmc` directly
- Claude Code: load the same CLI skill or use the MCP server for tool-based access
- OpenClaw: use the CLI as a stable shell interface, or wire the MCP server into your agent runtime
- Other agents: use the CLI for predictable terminal workflows, the MCP server for tool calling, or the Python SDK for direct integration
The key design goal is simple: give agents a comprehensive and token-efficient academic paper interface instead of forcing them to scrape raw PDFs or overfetch entire papers.
🌐 Open Access Literature Support
Current Support
- ✅ arXiv - Computer Science, Physics, Math, and more
- ✅ PubMed Central (PMC) - Biomedical and life sciences
Coming Soon (Roadmap)
- 🔄 bioRxiv - Preprints in biology
- 🔄 medRxiv - Preprints in medicine
- 🔄 Other OA Sources - Additional open access repositories
- 🔄 Full OA Literature Coverage - Comprehensive open access ecosystem
Why OA Literature? By focusing on open access papers, deepxiv ensures that researchers and AI systems have unrestricted access to knowledge without subscription barriers.
Quick Start
1. Installation
# Basic install (Reader + CLI)
pip install deepxiv-sdk
# Full install (MCP + Agent)
pip install "deepxiv-sdk[all]"  # quoted so shells such as zsh do not expand the brackets
2. First Use
On first use, deepxiv automatically registers a free token and saves it to ~/.env:
deepxiv search "agent memory" --limit 5
3. CLI Usage
The CLI is the fastest way to plug deepxiv into agent workflows.
# Search papers
deepxiv search "transformer" --limit 10
# Quick paper understanding
deepxiv paper 2409.05591 --brief
# Paper structure and targeted reading
deepxiv paper 2409.05591 --head
deepxiv paper 2409.05591 --section Introduction
deepxiv paper 2409.05591 --preview
deepxiv paper 2409.05591
# Social/trending signals
deepxiv paper 2409.05591 --popularity
deepxiv trending --days 14 --limit 10
# Biomedical papers
deepxiv pmc PMC544940 --head
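When an agent runtime is itself written in Python, the commands above can be invoked as an argument list rather than a shell string, which avoids quoting problems with multi-word queries. The following is a minimal sketch; `deepxiv_argv` and `run_deepxiv` are hypothetical helper names, not part of deepxiv, and the only flags assumed are the ones shown in the examples above.

```python
import shutil
import subprocess
from typing import List


def deepxiv_argv(*args: str) -> List[str]:
    """Build the argument list for a deepxiv CLI call (no shell quoting needed)."""
    return ["deepxiv", *args]


def run_deepxiv(*args: str, timeout: float = 60.0) -> str:
    """Run the CLI and return its stdout; raises if the CLI is missing or fails."""
    if shutil.which("deepxiv") is None:
        raise FileNotFoundError("deepxiv CLI not found on PATH; try `pip install deepxiv-sdk`")
    proc = subprocess.run(
        deepxiv_argv(*args),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # non-zero exit codes raise CalledProcessError
    )
    return proc.stdout
```

For example, `run_deepxiv("paper", "2409.05591", "--brief")` would run the same command as the brief example above, provided the CLI is installed.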
4. Use with OpenClaw, Claude Code, and Codex
Codex skill
mkdir -p "$CODEX_HOME/skills"
ln -s "$(pwd)/skills/deepxiv-cli" "$CODEX_HOME/skills/deepxiv-cli"
The included skill teaches agents when to use:
- `deepxiv search` for literature discovery
- `deepxiv paper --brief` for quick filtering
- `deepxiv paper --section` for focused reading
- `deepxiv pmc` for biomedical papers
- `deepxiv agent` for deeper multi-turn reasoning
Claude Code / OpenClaw / custom agents
If your framework supports reusable operating instructions, load skills/deepxiv-cli/SKILL.md directly. This gives agents a clean command selection guide instead of relying on ad hoc shell usage.
5. MCP Server
Use MCP when you want tool-based integration rather than shell execution.
Add the deepxiv server entry to the Claude Desktop MCP config file for your platform:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "deepxiv": {
      "command": "deepxiv",
      "args": ["serve"],
      "env": {
        "DEEPXIV_TOKEN": "your_token_here"
      }
    }
  }
}
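If the config file already lists other MCP servers, the entry above has to be merged in rather than pasted over the file. The sketch below shows one way to do that, assuming only the three file locations and the JSON entry given above; `config_path`, `add_deepxiv_server`, and `install` are hypothetical helpers, not something deepxiv ships.

```python
import json
import os
import sys
from pathlib import Path
from typing import Optional


def config_path(platform: str = sys.platform) -> Path:
    """Return the Claude Desktop config location for the platforms listed above."""
    name = "claude_desktop_config.json"
    if platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "Claude" / name
    if platform.startswith("win"):
        return Path(os.environ.get("APPDATA", "")) / "Claude" / name
    return Path.home() / ".config" / "Claude" / name


def add_deepxiv_server(config: dict, token: str) -> dict:
    """Return a copy of `config` with the deepxiv entry added; other servers are kept."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["deepxiv"] = {
        "command": "deepxiv",
        "args": ["serve"],
        "env": {"DEEPXIV_TOKEN": token},
    }
    merged["mcpServers"] = servers
    return merged


def install(token: str, path: Optional[Path] = None) -> Path:
    """Read the existing config (if any), merge the entry, and write it back."""
    path = path or config_path()
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_deepxiv_server(existing, token), indent=2), encoding="utf-8")
    return path
```

Keeping the merge step as a pure function makes it easy to check before touching the real file.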
Available MCP tools:
| Tool | Description |
|---|---|
| `search_papers` | Search arXiv papers |
| `get_paper_brief` | Quick summary |
| `get_paper_metadata` | Full metadata |
| `get_paper_section` | Read specific section |
| `get_full_paper` | Complete paper |
| `get_paper_preview` | Paper preview |
| `get_pmc_metadata` | PMC paper metadata |
| `get_pmc_full` | Complete PMC paper |
6. Python Usage
from deepxiv_sdk import Reader
reader = Reader()
# Search papers
results = reader.search("agent memory", size=5)
for paper in results.get("results", []):
    print(f"{paper['title']} ({paper['arxiv_id']})")
# Get paper info
brief = reader.brief("2409.05591")
print(f"Title: {brief['title']}")
print(f"TLDR: {brief.get('tldr', 'N/A')}")
print(f"GitHub: {brief.get('github_url', 'N/A')}")
# Read specific section
intro = reader.section("2409.05591", "Introduction")
print(intro[:500])
# Get trending papers (no token required)
trending = reader.trending(days=7, limit=5)
for paper in trending['papers']:
    print(f"#{paper['rank']}: {paper['arxiv_id']}")
    print(f" Views: {paper['stats']['total_views']}")
# Get social impact metrics (requires token)
reader_with_token = Reader(token="your_token_here")
impact = reader_with_token.social_impact("2409.05591")
if impact:
    print(f"Views: {impact['total_views']}")
    print(f"Tweets: {impact['total_tweets']}")
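The calls above can be combined into a token-conscious reading pattern: fetch the brief first, then pull only the named sections until a character budget is used up. This is a sketch rather than an SDK feature. `read_within_budget` and its `max_chars` parameter are hypothetical, and it relies only on `brief()` returning a dict with `title` and `tldr` and `section()` returning a string, as in the usage shown above.

```python
from typing import Any, Iterable


def read_within_budget(reader: Any, arxiv_id: str, sections: Iterable[str],
                       max_chars: int = 4000) -> dict:
    """Collect the brief plus as many requested sections as fit in `max_chars`."""
    brief = reader.brief(arxiv_id)
    out = {"title": brief["title"], "tldr": brief.get("tldr", ""), "sections": {}}
    remaining = max_chars
    for name in sections:
        if remaining <= 0:
            break  # budget exhausted: skip the remaining sections entirely
        text = reader.section(arxiv_id, name)
        out["sections"][name] = text[:remaining]  # truncate the last section if needed
        remaining -= len(out["sections"][name])
    return out
```

With a real client this would be called as `read_within_budget(Reader(), "2409.05591", ["Introduction"])`. Any object exposing `brief` and `section` works, which also makes the function easy to test offline.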
Complete API Reference
Search and Query
reader.search(query, size=10, search_mode="hybrid", categories=None, min_citation=None)
reader.head(arxiv_id) # Paper metadata and sections overview
reader.brief(arxiv_id) # Quick summary (title, TLDR, keywords, citations, GitHub URL)
reader.section(arxiv_id, section) # Read specific section
reader.raw(arxiv_id) # Full paper
reader.preview(arxiv_id) # Paper preview (~10k characters)
reader.json(arxiv_id) # Complete structured JSON
PMC (Biomedical Papers)
reader.pmc_head(pmc_id) # PMC paper metadata
reader.pmc_full(pmc_id) # Complete PMC paper JSON
Agent (Optional)
from deepxiv_sdk import Agent
agent = Agent(api_key="your_openai_key", model="gpt-4")
answer = agent.query("What are the latest papers about agent memory?")
print(answer)
Token Management
deepxiv supports 4 ways to configure tokens:
1. Auto-registration (Recommended) - A token is created and saved automatically on first use
deepxiv search "agent"
2. Using config command
deepxiv config --token YOUR_TOKEN
3. Environment variable
export DEEPXIV_TOKEN="your_token"
4. Command-line option
deepxiv paper 2409.05591 --token YOUR_TOKEN
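The source does not say which of these wins when several are set. The sketch below shows one plausible precedence (explicit option, then the `DEEPXIV_TOKEN` environment variable, then the saved file). Both the ordering and the `KEY=value` layout of `~/.env` are assumptions, and `resolve_token` is a hypothetical helper, not the SDK's own logic.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_token(cli_token: Optional[str] = None,
                  env_file: Optional[Path] = None) -> Optional[str]:
    """Return the first token found: explicit value, environment, then env file."""
    if cli_token:
        return cli_token
    env_token = os.environ.get("DEEPXIV_TOKEN")
    if env_token:
        return env_token
    env_file = env_file or Path.home() / ".env"
    if env_file.exists():
        for line in env_file.read_text(encoding="utf-8").splitlines():
            # Assumed format: DEEPXIV_TOKEN=value, optionally quoted.
            key, sep, value = line.strip().partition("=")
            if sep and key.strip() == "DEEPXIV_TOKEN":
                return value.strip().strip('"').strip("'") or None
    return None
```

A result of `None` would correspond to the case where auto-registration has not happened yet.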
Increase daily limit: Default is 10,000 requests/day. For higher limits, email your name, email, and phone to tommy@chien.io.
Free Test Papers
These papers can be accessed without a token:
arXiv: 2409.05591, 2504.21776
PMC: PMC544940, PMC514704
Agent Usage (Optional)
The built-in ReAct agent can automatically search papers, read content, and perform multi-turn reasoning:
from deepxiv_sdk import Agent
agent = Agent(
    api_key="your_deepseek_key",
    base_url="https://api.deepseek.com/v1",
    model="deepseek-chat",
)
answer = agent.query("Compare key ideas in transformers and attention mechanisms")
print(answer)
Or via CLI:
deepxiv agent config # Configure LLM API
deepxiv agent query "What are the latest papers about agent memory?" --verbose
Error Handling
deepxiv provides specific exception types:
from deepxiv_sdk import (
    Reader,
    AuthenticationError,  # 401 - Invalid or expired token
    RateLimitError,       # 429 - Daily limit reached
    NotFoundError,        # 404 - Paper not found
    ServerError,          # 5xx - Server error
    APIError,             # Other API errors
)

reader = Reader()
try:
    paper = reader.brief("2409.05591")
except AuthenticationError:
    print("Please update your token")
except RateLimitError:
    print("Daily limit reached")
except NotFoundError:
    print("Paper not found")
except ServerError:
    print("Server error, try again later")
except APIError as e:
    print(f"API error: {e}")
Troubleshooting
Q: Do I need a token to use deepxiv? A: Not to get started. Some papers are free to access without one. Search and some other content require a token, but one is created automatically on first use.
Q: What's the maximum number of search results?
A: 100 per request. Use offset parameter for pagination.
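Collecting more than 100 results therefore means looping over pages. The sketch below takes the search function as an argument so it can be tried offline. It assumes that function accepts `size` and `offset` keyword arguments and returns a dict with a `results` list, as in the Python usage above. Note that `offset` does not appear in the documented `reader.search` signature, so check its exact name before relying on it. `iter_results` is a hypothetical helper.

```python
from typing import Callable, Iterator


def iter_results(search: Callable[..., dict], query: str, total: int,
                 page_size: int = 100) -> Iterator[dict]:
    """Yield up to `total` results, requesting at most `page_size` per call."""
    offset = 0
    while offset < total:
        size = min(page_size, total - offset)
        page = search(query, size=size, offset=offset).get("results", [])
        if not page:
            break  # the server has no more results
        yield from page
        offset += len(page)
```

With a real client, and if the keyword names match, this could be called as `list(iter_results(reader.search, "agent memory", total=250))`.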
Q: How do I handle timeouts? A: Reader retries automatically (up to 3 times) with exponential backoff. You can customize the timeout and the retry count:
reader = Reader(timeout=120, max_retries=5)
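For readers unfamiliar with the term, exponential backoff means doubling the wait after each failed attempt. The sketch below illustrates the general technique only. It is not the SDK's implementation, and the base delay, the doubling factor, and the choice to retry on `TimeoutError` are all assumptions; `retry_with_backoff` is a hypothetical helper.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(fn: Callable[[], T], max_retries: int = 3,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fn`, retrying on TimeoutError with delays of base_delay * 2**attempt."""
    attempt = 0
    while True:
        try:
            return fn()
        except TimeoutError:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.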
Q: Can I cache paper content? A: Yes. After fetching content with Reader, store it locally in a database or on the file system.
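A file-system cache can be as small as one JSON file per request. The following is a minimal sketch, assuming the cached values are JSON-serializable (dicts or strings, as the Reader calls above appear to return); `cached_fetch` and the key naming are hypothetical, and there is no expiry logic.

```python
import json
from pathlib import Path
from typing import Any, Callable


def cached_fetch(cache_dir: Path, key: str, fetch: Callable[[], Any]) -> Any:
    """Return the cached JSON value for `key`, calling `fetch` only on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Replace characters that are unsafe in file names; keeps IDs like 2409.05591 intact.
    safe = "".join(c if c.isalnum() or c in "._-" else "_" for c in key)
    path = cache_dir / f"{safe}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    value = fetch()
    path.write_text(json.dumps(value, ensure_ascii=False), encoding="utf-8")
    return value
```

For example, `cached_fetch(Path(".deepxiv_cache"), "brief:2409.05591", lambda: reader.brief("2409.05591"))` would call the API once and serve later calls from disk.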
Q: Which LLMs does the agent support? A: Any OpenAI-compatible API (OpenAI, DeepSeek, OpenRouter, local Ollama, etc.).
Examples
See examples/ directory:
- `quickstart.py` - 5-minute quick start
- `example_reader.py` - Basic Reader usage
- `example_agent.py` - Agent usage
- `example_advanced.py` - Advanced patterns
- `example_error_handling.py` - Error handling examples
License
MIT License - see LICENSE file
Support
- 🐛 GitHub Issues: https://github.com/qhjqhj00/deepxiv_sdk/issues
- 📚 API Documentation: https://data.rag.ac.cn/api/docs
- 📧 Higher Limits: Email your name, email, and phone to tommy@chien.io