A Python package for arXiv paper access with CLI and MCP server support
deepxiv-sdk
DeepXiv is an agent-first paper search and progressive reading tool.
Install it with pip, start using it immediately, and let the CLI auto-register a token on first use. No extra setup is required before your first query.
- 📚 API Documentation: https://data.rag.ac.cn/api/docs
- 🎥 Demo Video:
- 📄 Technical Report:
- 📖 Chinese documentation: README.zh.md
What DeepXiv Does
DeepXiv is built around two core workflows that matter for agents:
- Search + Progressive Content Access
- Trending + Popularity signals
Instead of blindly loading full papers, DeepXiv lets agents read in layers, based on token budget and task value.
Quick Start
pip install deepxiv-sdk
On first use, deepxiv automatically registers a free token and saves it to ~/.env:
deepxiv search "agentic memory" --limit 5
If you want the full stack including MCP and the built-in research agent:
pip install "deepxiv-sdk[all]"
CLI-First Workflow
The CLI is the primary interface. DeepXiv is designed so agents can work like researchers: search first, judge quickly, then read only the most valuable parts.
deepxiv search "agentic memory" --limit 5
deepxiv paper 2603.21489 --brief
deepxiv paper 2603.21489 --head
deepxiv paper 2603.21489 --section Analysis
Three commands matter most for progressive reading:
- --brief: decide whether a paper is worth deeper reading
- --head: inspect structure, sections, and token distribution
- --section: read only the most valuable parts, such as Introduction, Method, or Experiments
This is the core DeepXiv idea: agents should not load full papers unless they truly need them.
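The layered reading idea can be sketched as a small helper. This is a minimal sketch, not DeepXiv's own implementation: `read_in_layers`, `is_relevant`, and `count_tokens` are hypothetical names, and the helper only assumes a reader object that exposes the `brief` and `section` methods listed in the API reference. Return values are treated as opaque because their shape is not documented here.

```python
from typing import Any, Callable, Dict, List


def read_in_layers(
    reader: Any,
    arxiv_id: str,
    wanted_sections: List[str],
    token_budget: int,
    is_relevant: Callable[[Any], bool],
    count_tokens: Callable[[Any], int],
) -> Dict[str, Any]:
    """Read the brief first, then sections until the token budget is spent.

    `reader` is any object with `brief(arxiv_id)` and
    `section(arxiv_id, name)` methods. The callbacks are hypothetical
    hooks supplied by the caller.
    """
    brief = reader.brief(arxiv_id)
    result: Dict[str, Any] = {"brief": brief, "sections": {}, "tokens_used": 0}

    # Layer 1: stop early if the brief says the paper is not worth reading.
    if not is_relevant(brief):
        return result

    # Layer 2: walk sections in priority order and keep only those that
    # still fit in the remaining budget.
    for name in wanted_sections:
        content = reader.section(arxiv_id, name)
        cost = count_tokens(content)
        if result["tokens_used"] + cost > token_budget:
            continue
        result["sections"][name] = content
        result["tokens_used"] += cost
    return result
```

The sketch fetches a section before it knows the cost. The token distribution reported by `--head` could be used to skip that fetch, but its output format is not specified here, so the sketch does not rely on it.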
CLI Features
1. Paper Search and Reading
deepxiv search "transformer" --limit 10
deepxiv paper 2409.05591 --brief
deepxiv paper 2409.05591 --head
deepxiv paper 2409.05591 --section Introduction
deepxiv paper 2409.05591
2. Trending and Popularity
Research is not only about what exists, but what is worth reading now.
deepxiv trending --days 7 --limit 30
deepxiv paper 2409.05591 --popularity
- trending finds the hottest recent papers from social signals
- --popularity gives paper-level propagation metrics such as views, tweets, likes, and replies
3. Web Search
deepxiv wsearch "karpathy"
deepxiv wsearch "karpathy" --json
Notes:
- deepxiv wsearch calls the DeepXiv web search endpoint
- each wsearch request costs 20 units of the daily limit
- a registered token gets a limit of 10,000 per day, so this is roughly 500 web searches per day
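The quota arithmetic in these notes can be checked with a few lines of Python. This is only an illustration of the numbers stated above; `web_searches_remaining` is a hypothetical helper and is not part of the SDK.

```python
DAILY_LIMIT = 10_000  # daily limit for a registered token (from the notes above)
WSEARCH_COST = 20     # limit consumed by one wsearch request (from the notes above)


def web_searches_remaining(limit_used: int) -> int:
    """Return how many more wsearch calls fit in today's remaining limit."""
    remaining = max(DAILY_LIMIT - limit_used, 0)
    return remaining // WSEARCH_COST


print(web_searches_remaining(0))      # 500
print(web_searches_remaining(9_990))  # 0
```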
4. Semantic Scholar Metadata by ID
deepxiv sc 258001
deepxiv sc 258001 --json
deepxiv sc fetches metadata using a Semantic Scholar paper ID.
Notes:
- this is useful when your workflow already has Semantic Scholar IDs
- DeepXiv will soon provide a Semantic Scholar search service that returns Semantic Scholar IDs directly
5. Biomedical Papers
deepxiv pmc PMC544940 --head
deepxiv pmc PMC544940
Example Agent Workflows
Workflow 1: Review recent hot papers
deepxiv trending --days 7 --limit 30 --json
Then an agent can:
- run --brief for each paper
- run --head for the most promising ones
- read only key sections
- produce a report without manually opening dozens of papers
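An agent script could drive the first step of this workflow by calling the CLI and parsing its output. The sketch below assumes that `--json` makes the command print a single JSON document to stdout; the schema of that document is not documented here, so the parsed value is returned untouched. `trending_argv` and `run_json` are hypothetical helper names.

```python
import json
import subprocess
from typing import Any, Callable, List, Optional


def trending_argv(days: int, limit: int) -> List[str]:
    """Build the argument list for the trending command shown above."""
    return ["deepxiv", "trending", "--days", str(days), "--limit", str(limit), "--json"]


def _run(argv: List[str]) -> str:
    # check=True raises CalledProcessError if the CLI exits with an error.
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return completed.stdout


def run_json(argv: List[str], runner: Optional[Callable[[List[str]], str]] = None) -> Any:
    """Run a CLI command and parse its stdout as JSON (assumed output format)."""
    stdout = (runner or _run)(argv)
    return json.loads(stdout)
```

Calling `run_json(trending_argv(7, 30))` would run the command from the workflow. The optional `runner` argument exists so the parsing step can be tested without the CLI installed.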
Workflow 2: Enter a new research topic
deepxiv search "agentic memory" --date-from 2026-03-01 --limit 100 --format json
Then an agent can:
- batch-brief the results
- prioritize papers with GitHub links
- inspect experiments via --head
- read Experiments / Results
- turn datasets, metrics, and scores into a baseline table
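The prioritization step can be sketched as a stable sort. The brief is documented to include a GitHub URL, but the dictionary shape and the key name `github_url` used below are assumptions for illustration; adjust them to the actual response.

```python
from typing import Any, Dict, List


def prioritize_with_code(
    briefs: List[Dict[str, Any]], key: str = "github_url"
) -> List[Dict[str, Any]]:
    """Move papers that have a GitHub link to the front.

    `key` is a hypothetical field name. sorted() is stable, so papers keep
    their original search order within each group.
    """
    return sorted(briefs, key=lambda brief: not brief.get(key))
```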
Built-in Deep Research Agent
If you do not want to compose the workflow manually, the CLI already includes a built-in research agent.
pip install "deepxiv-sdk[all]"
deepxiv agent config
deepxiv agent query "What are the latest papers about agent memory?" --verbose
If you already have your own agent stack, you can also just plug in the DeepXiv CLI skill and keep your own orchestration.
Agent Integration
DeepXiv is designed to work well inside Codex, Claude Code, OpenClaw, and similar agent runtimes.
MCP Server
Add the deepxiv server to the Claude Desktop MCP config file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "deepxiv": {
      "command": "deepxiv",
      "args": ["serve"],
      "env": {
        "DEEPXIV_TOKEN": "your_token_here"
      }
    }
  }
}
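If the config file already lists other MCP servers, the deepxiv entry should be merged in rather than pasted over them. The following is a minimal sketch of such a merge using only the standard library; `add_deepxiv_server` is a hypothetical helper, and the entry it writes mirrors the JSON above.

```python
import json
from pathlib import Path
from typing import Any, Dict


def add_deepxiv_server(config_path: Path, token: str) -> Dict[str, Any]:
    """Merge the deepxiv MCP entry into a Claude Desktop config file."""
    config: Dict[str, Any] = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)

    # Keep any servers that are already configured.
    servers = config.setdefault("mcpServers", {})
    servers["deepxiv"] = {
        "command": "deepxiv",
        "args": ["serve"],
        "env": {"DEEPXIV_TOKEN": token},
    }

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

On macOS, for example, this could be called with `Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"`.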
CLI Skill
mkdir -p $CODEX_HOME/skills
ln -s "$(pwd)/skills/deepxiv-cli" $CODEX_HOME/skills/deepxiv-cli
For frameworks without native skill support, load skills/deepxiv-cli/SKILL.md as operating instructions.
Python Usage
from deepxiv_sdk import Reader
reader = Reader()
results = reader.search("agent memory", size=5)
brief = reader.brief("2409.05591")
head = reader.head("2409.05591")
intro = reader.section("2409.05591", "Introduction")
web = reader.websearch("karpathy")
sc_meta = reader.semantic_scholar("258001")
Roadmap
DeepXiv is moving toward an academic paper data interface at 100M+ scale.
The roadmap is:
- Full arXiv coverage with T+1 automatic updates
- anyXiv coverage, including bioRxiv, medRxiv, and similar repositories
- Full open-access literature coverage
The metadata backbone will increasingly rely on Semantic Scholar metadata as the base layer, while continuously expanding coverage and enrichment quality.
Current Coverage
- ✅ arXiv - current primary source
- ✅ PubMed Central (PMC) - biomedical and life sciences
- 🔄 Semantic Scholar metadata integration - expanding as the metadata foundation
DeepXiv focuses on open-access literature so agents can work on unrestricted paper data instead of getting blocked by subscription barriers.
Complete API Reference
Search and Query
reader.search(query, size=10, search_mode="hybrid", categories=None, min_citation=None)
reader.websearch(query) # Web search (20 limit per request)
reader.semantic_scholar(sc_id) # Metadata lookup by Semantic Scholar ID
reader.head(arxiv_id) # Paper metadata and sections overview
reader.brief(arxiv_id) # Quick summary (title, TLDR, keywords, citations, GitHub URL)
reader.section(arxiv_id, section) # Read specific section
reader.raw(arxiv_id) # Full paper
reader.preview(arxiv_id) # Paper preview (~10k characters)
reader.json(arxiv_id) # Complete structured JSON
PMC (Biomedical Papers)
reader.pmc_head(pmc_id) # PMC paper metadata
reader.pmc_full(pmc_id) # Complete PMC paper JSON
Agent (Optional)
from deepxiv_sdk import Agent
agent = Agent(api_key="your_openai_key", model="gpt-4")
answer = agent.query("What are the latest papers about agent memory?")
print(answer)
Token Management
deepxiv supports 4 ways to configure tokens:
1. Auto-registration (Recommended) - A token is created and saved automatically on first use
deepxiv search "agent"
2. Using config command
deepxiv config --token YOUR_TOKEN
3. Environment variable
export DEEPXIV_TOKEN="your_token"
4. Command-line option
deepxiv paper 2409.05591 --token YOUR_TOKEN
Increase daily limit: Default is 10,000 requests/day. For higher limits, email your name, email, and phone to tommy@chien.io.
Free Test Papers
These papers can be accessed without a token:
arXiv: 2409.05591, 2504.21776
PMC: PMC544940, PMC514704
MCP Tools
Available tools when using MCP Server:
| Tool | Description |
|---|---|
| search_papers | Search arXiv papers |
| get_paper_brief | Quick summary |
| get_paper_metadata | Full metadata |
| get_paper_section | Read specific section |
| get_full_paper | Complete paper |
| get_paper_preview | Paper preview |
| get_pmc_metadata | PMC paper metadata |
| get_pmc_full | Complete PMC paper |
Agent Usage (Optional)
The built-in ReAct agent can automatically search papers, read content, and perform multi-turn reasoning:
from deepxiv_sdk import Agent
agent = Agent(
    api_key="your_deepseek_key",
    base_url="https://api.deepseek.com/v1",
    model="deepseek-chat"
)
answer = agent.query("Compare key ideas in transformers and attention mechanisms")
print(answer)
Or via CLI:
deepxiv agent config # Configure LLM API
deepxiv agent query "What are the latest papers about agent memory?" --verbose
Error Handling
deepxiv provides specific exception types:
from deepxiv_sdk import (
    Reader,
    AuthenticationError,  # 401 - Invalid or expired token
    RateLimitError,       # 429 - Daily limit reached
    NotFoundError,        # 404 - Paper not found
    ServerError,          # 5xx - Server error
    APIError              # Other API errors
)

reader = Reader()

try:
    paper = reader.brief("2409.05591")
except AuthenticationError:
    print("Please update your token")
except RateLimitError:
    print("Daily limit reached")
except NotFoundError:
    print("Paper not found")
except ServerError:
    print("Server error, try again later")
except APIError as e:
    print(f"API error: {e}")
Troubleshooting
Q: Do I need a token to use DeepXiv?
A: Not for everything. Some papers are free to access. Search and some content require a token, but one is created automatically on first use.
Q: What is the maximum number of search results?
A: 100 per request. Use the offset parameter for pagination.
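Pagination with an offset can be sketched as a loop that requests pages of at most 100 results. The helper below is hypothetical and deliberately does not call the SDK directly: `search_page(offset, size)` is a caller-supplied function that is assumed to return one page as a list. Whether `reader.search` accepts `offset` as a keyword argument and returns a list is an assumption to verify against the API documentation.

```python
from typing import Any, Callable, List

MAX_PAGE_SIZE = 100  # per-request maximum stated in the answer above


def search_all(search_page: Callable[[int, int], List[Any]], total: int) -> List[Any]:
    """Collect up to `total` results by requesting successive pages.

    `search_page(offset, size)` is a hypothetical callable, for example
    lambda offset, size: reader.search("agentic memory", size=size, offset=offset)
    (the `offset` keyword is an assumption).
    """
    results: List[Any] = []
    while len(results) < total:
        size = min(MAX_PAGE_SIZE, total - len(results))
        page = search_page(len(results), size)
        if not page:
            break  # no more results available
        results.extend(page)
    return results[:total]
```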
Q: How do I handle timeouts?
A: Reader automatically retries (up to 3 times) with exponential backoff. You can customize this:
reader = Reader(timeout=120, max_retries=5)
Q: Can I cache paper content?
A: Yes. After fetching content with Reader, cache it locally in a database or on the file system.
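A file-system cache of this kind can be sketched in a few lines. This is one possible approach under stated assumptions, not a feature of the SDK: `cached_fetch` is a hypothetical helper, and it assumes the fetched value is JSON-serializable.

```python
import json
from pathlib import Path
from typing import Any, Callable


def cached_fetch(cache_dir: Path, key: str, fetch: Callable[[], Any]) -> Any:
    """Return the cached value for `key`, calling `fetch` only on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Keys such as "2409.05591/Introduction" contain "/", so make them file-safe.
    cache_file = cache_dir / (key.replace("/", "_") + ".json")
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    value = fetch()
    cache_file.write_text(json.dumps(value), encoding="utf-8")
    return value
```

A call could look like `cached_fetch(Path("cache"), "2409.05591/Introduction", lambda: reader.section("2409.05591", "Introduction"))`, so that repeated reads of the same section do not count against the daily limit.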
Q: Which LLMs does the agent support?
A: Any OpenAI-compatible API (OpenAI, DeepSeek, OpenRouter, local Ollama, etc.).
Examples
See the examples/ directory:
- quickstart.py - 5-minute quick start
- example_reader.py - Basic Reader usage
- example_agent.py - Agent usage
- example_advanced.py - Advanced patterns
- example_error_handling.py - Error handling examples
License
MIT License - see LICENSE file
Support
- 🐛 GitHub Issues: https://github.com/qhjqhj00/deepxiv_sdk/issues
- 📚 API Documentation: https://data.rag.ac.cn/api/docs
- 📧 Higher Limits: Email your name, email, and phone to tommy@chien.io