A Python package for arXiv paper access with CLI and MCP server support

deepxiv-sdk

High-quality academic paper data interface designed for LLM applications. Provides hybrid search, intelligent summaries, section-by-section access, and built-in reasoning agents.

Why Choose deepxiv?

Feature deepxiv Standard arXiv API
Hybrid Search (BM25 + Vector)
AI-Generated Summaries (TLDR)
Section-by-Section Access
MCP Protocol Support
Built-in Reasoning Agent
Biomedical Papers (PMC)
Free Daily Requests 10,000 ∞*

*The arXiv API has no daily request cap, but it enforces strict rate limiting.

Core Features

  • 🔍 Hybrid Search: BM25 + Vector search for better quality results
  • 📄 Section-Based Access: Load only what you need, save tokens
  • 📚 PMC Support: Full access to biomedical literature
  • 💻 Three-Layer Interface: CLI / Python SDK / MCP Server
  • 🤖 Built-in Agent: ReAct framework with multi-turn reasoning
  • 🔌 Flexible LLM Support: Compatible with OpenAI, DeepSeek, OpenRouter, etc.
  • Smart Summaries: AI-generated paper abstracts and keywords
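
Hybrid search combines a lexical BM25 ranking with a vector-similarity ranking. This README does not say how deepxiv fuses the two, so the sketch below uses reciprocal rank fusion (RRF), a common general-purpose technique, purely as an illustration. The function name `reciprocal_rank_fusion`, the constant `k=60`, and the sample IDs are assumptions and are not part of deepxiv-sdk.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Illustrative only; not deepxiv's actual method.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_ranking = ["paper_a", "paper_b", "paper_c"]    # hypothetical lexical ranking
vector_ranking = ["paper_b", "paper_d", "paper_a"]  # hypothetical embedding ranking
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)  # → ['paper_b', 'paper_a', 'paper_d', 'paper_c']
```

Documents that rank well in both lists rise to the top, which is the usual motivation for pairing keyword and embedding retrieval.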

🌐 Open Access Literature Support

Current Support

  • arXiv - Computer Science, Physics, Math, and more
  • PubMed Central (PMC) - Biomedical and life sciences

Coming Soon (Roadmap)

  • 🔄 bioRxiv - Preprints in biology
  • 🔄 medRxiv - Preprints in medicine
  • 🔄 Other OA Sources - Additional open access repositories
  • 🔄 Full OA Literature Coverage - Comprehensive open access ecosystem

Why OA Literature? By focusing on open access papers, deepxiv ensures that researchers and AI systems have unrestricted access to knowledge without subscription barriers.

Quick Start

1. Installation

# Basic install (Reader + CLI)
pip install deepxiv-sdk

# Full install (MCP + Agent)
pip install "deepxiv-sdk[all]"   # quotes keep shells such as zsh from expanding the brackets

2. First Use

On first use, deepxiv automatically registers a free token and saves it to ~/.env:

deepxiv search "agent memory" --limit 5

3. Python Usage

from deepxiv_sdk import Reader

reader = Reader()

# Search papers
results = reader.search("agent memory", size=5)
for paper in results.get("results", []):
    print(f"{paper['title']} ({paper['arxiv_id']})")

# Get paper info
brief = reader.brief("2409.05591")
print(f"Title: {brief['title']}")
print(f"TLDR: {brief.get('tldr', 'N/A')}")

# Read specific section
intro = reader.section("2409.05591", "Introduction")
print(intro[:500])

4. CLI Usage

# Search papers
deepxiv search "transformer" --limit 10

# Get paper info
deepxiv paper 2409.05591 --brief          # Quick overview
deepxiv paper 2409.05591 --head           # Metadata
deepxiv paper 2409.05591 --section intro  # Specific section
deepxiv paper 2409.05591                  # Full paper

# Get PMC papers
deepxiv pmc PMC544940 --head

# Show current token
deepxiv token

5. Use in Claude Desktop (MCP Server)

Add the deepxiv entry shown below to the Claude Desktop MCP config file for your platform:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

Windows: %APPDATA%\Claude\claude_desktop_config.json

Linux: ~/.config/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "deepxiv": {
      "command": "deepxiv",
      "args": ["serve"],
      "env": {
        "DEEPXIV_TOKEN": "your_token_here"
      }
    }
  }
}
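
If the config file already lists other MCP servers, pasting the JSON over it would drop them. A minimal sketch of merging the entry with Python's standard library follows; the helper name `add_deepxiv_server` is hypothetical, and the entry it writes mirrors the JSON above.

```python
import json
from pathlib import Path


def add_deepxiv_server(config_path, token):
    """Add a deepxiv entry under "mcpServers" without dropping other servers."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["deepxiv"] = {
        "command": "deepxiv",
        "args": ["serve"],
        "env": {"DEEPXIV_TOKEN": token},
    }
    # Create the parent directory on a fresh install, then write the file back.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Call it with the path for your platform from the list above, then restart Claude Desktop so the new server is picked up.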

6. Agent Skill (Optional)

deepxiv also provides a reusable Agent Skill for LLM frameworks:

# View the skill definition
cat skills/deepxiv-cli/SKILL.md

# Use with Codex or other agentic LLM frameworks
# Copy or symlink to your skills directory:
mkdir -p "$CODEX_HOME/skills"
ln -s "$(pwd)/skills/deepxiv-cli" "$CODEX_HOME/skills/deepxiv-cli"

The skill teaches agents when to use:

  • deepxiv search - Find papers
  • deepxiv paper - Read papers
  • deepxiv pmc - Access biomedical literature
  • deepxiv agent - Use the reasoning agent
  • deepxiv token - Manage tokens

For frameworks without native skill support, you can load skills/deepxiv-cli/SKILL.md as a system prompt or as operating instructions.

Complete API Reference

Search and Query

reader.search(query, size=10, search_mode="hybrid", categories=None, min_citation=None)
reader.head(arxiv_id)              # Paper metadata and sections overview
reader.brief(arxiv_id)             # Quick summary (title, TLDR, keywords, citations)
reader.section(arxiv_id, section)  # Read specific section
reader.raw(arxiv_id)               # Full paper
reader.preview(arxiv_id)           # Paper preview (~10k characters)
reader.json(arxiv_id)              # Complete structured JSON

PMC (Biomedical Papers)

reader.pmc_head(pmc_id)            # PMC paper metadata
reader.pmc_full(pmc_id)            # Complete PMC paper JSON

Agent (Optional)

from deepxiv_sdk import Agent

agent = Agent(api_key="your_openai_key", model="gpt-4")
answer = agent.query("What are the latest papers about agent memory?")
print(answer)

Token Management

deepxiv supports four ways to configure a token:

1. Auto-registration (Recommended) - Automatically creates and saves a token on first use

deepxiv search "agent"

2. Using config command

deepxiv config --token YOUR_TOKEN

3. Environment variable

export DEEPXIV_TOKEN="your_token"

4. Command-line option

deepxiv paper 2409.05591 --token YOUR_TOKEN
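
This README does not state which source wins when more than one of the four is set. The sketch below assumes a conventional order (explicit value, then the `DEEPXIV_TOKEN` environment variable, then the saved file) to show how such a lookup could work in your own scripts. The helper `resolve_token` is hypothetical, and so is the assumption that the saved file holds a `DEEPXIV_TOKEN=...` line.

```python
import os
from pathlib import Path


def resolve_token(explicit=None, env=None, env_file=None):
    """Return the first token found: explicit argument, environment, then file.

    The precedence and the file format are assumptions, not documented behavior.
    """
    env = os.environ if env is None else env
    if explicit:
        return explicit
    if env.get("DEEPXIV_TOKEN"):
        return env["DEEPXIV_TOKEN"]
    path = Path(env_file) if env_file else Path.home() / ".env"
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            key, sep, value = line.partition("=")
            if sep and key.strip() == "DEEPXIV_TOKEN":
                # Drop optional surrounding quotes from the stored value.
                return value.strip().strip('"').strip("'")
    return None
```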

Increase daily limit: Default is 10,000 requests/day. For higher limits, email your name, email, and phone to tommy@chien.io.

Free Test Papers

These papers can be accessed without a token:

arXiv: 2409.05591, 2504.21776

PMC: PMC544940, PMC514704

MCP Tools

Available tools when using MCP Server:

Tool | Description
search_papers | Search arXiv papers
get_paper_brief | Quick summary
get_paper_metadata | Full metadata
get_paper_section | Read specific section
get_full_paper | Complete paper
get_paper_preview | Paper preview
get_pmc_metadata | PMC paper metadata
get_pmc_full | Complete PMC paper

Agent Usage (Optional)

The built-in ReAct agent can automatically search papers, read content, and perform multi-turn reasoning:

from deepxiv_sdk import Agent

agent = Agent(
    api_key="your_deepseek_key",
    base_url="https://api.deepseek.com/v1",
    model="deepseek-chat"
)

answer = agent.query("Compare key ideas in transformers and attention mechanisms")
print(answer)

Or via CLI:

deepxiv agent config  # Configure LLM API
deepxiv agent query "What are the latest papers about agent memory?" --verbose
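
For readers unfamiliar with ReAct, the general pattern alternates a model step (thought plus action) with a tool call whose result is fed back as an observation. The sketch below illustrates that loop with a scripted stand-in for the LLM so it runs offline. It is not deepxiv's implementation; `react_loop`, `fake_llm`, and the `Action: name[arg]` text format are all assumptions made for the example.

```python
import re


def react_loop(llm, tools, question, max_turns=5):
    """Alternate model steps and tool calls until the model emits a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_turns):
        step = llm(transcript)
        transcript += step + "\n"
        final = re.search(r"Final Answer:\s*(.*)", step, re.S)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(\w+)\[(.*?)\]", step)
        if action is None:
            transcript += "Observation: no valid action found\n"
            continue
        name, arg = action.groups()
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool {name!r}"
        transcript += f"Observation: {observation}\n"
    return None  # no final answer within max_turns


def fake_llm(transcript):
    """Scripted stand-in for an LLM: search once, then answer."""
    if "Observation:" not in transcript:
        return "Thought: I should search first.\nAction: search[agent memory]"
    return "Thought: I have enough.\nFinal Answer: found 1 paper"


tools = {"search": lambda q: f"1 result for {q!r}"}  # hypothetical tool
print(react_loop(fake_llm, tools, "What is new in agent memory?"))  # → found 1 paper
```

The `max_turns` cap matters in practice: without it, a model that never emits a final answer would loop indefinitely.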

Error Handling

deepxiv provides specific exception types:

from deepxiv_sdk import (
    Reader,
    AuthenticationError,  # 401 - Invalid or expired token
    RateLimitError,       # 429 - Daily limit reached
    NotFoundError,        # 404 - Paper not found
    ServerError,          # 5xx - Server error
    APIError,             # Other API errors
)

reader = Reader()

try:
    paper = reader.brief("2409.05591")
except AuthenticationError:
    print("Please update your token")
except RateLimitError:
    print("Daily limit reached")
except NotFoundError:
    print("Paper not found")
except ServerError:
    print("Server error, try again later")
except APIError as e:
    # Catch-all for any other API error
    print(f"API error: {e}")
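
The comments in the import list pair each exception with an HTTP status. As a quick reference, the sketch below encodes that mapping as a plain function that returns the exception name as a string. The helper `exception_name_for_status` is hypothetical, and treating every other 4xx status as `APIError` is an assumption based on the "Other API errors" comment.

```python
def exception_name_for_status(status):
    """Return the name of the documented exception for an HTTP status, or None."""
    documented = {
        401: "AuthenticationError",  # invalid or expired token
        404: "NotFoundError",        # paper not found
        429: "RateLimitError",       # daily limit reached
    }
    if status in documented:
        return documented[status]
    if 500 <= status <= 599:
        return "ServerError"
    if status >= 400:
        return "APIError"  # assumed catch-all for remaining error statuses
    return None  # success or redirect: no exception expected


for code in (200, 401, 404, 429, 503):
    print(code, exception_name_for_status(code))
```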

Troubleshooting

Q: Do I need a token to use deepxiv? A: Not for everything. Some papers are free to access without one. Search and some content require a token, but one is created automatically on first use.

Q: What is the maximum number of search results? A: 100 per request. Use the offset parameter for pagination.
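
Offset-based pagination can be sketched as a generator that keeps requesting pages until one comes back short. The example runs against an offline stand-in; `iter_all_results` and `fake_search_page` are hypothetical names. With deepxiv, `search_page` would wrap `reader.search` and return the "results" list, on the assumption that the keyword is spelled `offset` as the answer above suggests.

```python
def iter_all_results(search_page, query, page_size=100, max_results=None):
    """Yield results page by page until a short or empty page is returned."""
    offset = 0
    yielded = 0
    while True:
        page = search_page(query, size=page_size, offset=offset)
        for item in page:
            if max_results is not None and yielded >= max_results:
                return
            yield item
            yielded += 1
        if len(page) < page_size:
            return  # last page reached
        offset += len(page)


def fake_search_page(query, size, offset):
    """Offline stand-in for a search backend holding 250 fake results."""
    corpus = [f"{query}-{i}" for i in range(250)]
    return corpus[offset:offset + size]


results = list(iter_all_results(fake_search_page, "demo"))
print(len(results))  # → 250
```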

Q: How do I handle timeouts? A: Reader automatically retries (up to 3 times) with exponential backoff. You can customize the timeout and retry count:

reader = Reader(timeout=120, max_retries=5)
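
For context, exponential backoff doubles the wait after each failed attempt. The sketch below shows the general idea with a generic wrapper. It is not Reader's internal code; the name `retry_with_backoff`, the 1-second base delay, and the choice of retryable exceptions are assumptions.

```python
import time


def retry_with_backoff(func, max_retries=3, base_delay=1.0, sleep=time.sleep,
                       retry_on=(TimeoutError, ConnectionError)):
    """Call func(); on a retryable error wait base_delay * 2**attempt and retry."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * 2 ** attempt)


# Demo: a call that times out twice, then succeeds. Delays are recorded
# in a list instead of actually sleeping.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


delays = []
print(retry_with_backoff(flaky, sleep=delays.append), delays)  # → ok [1.0, 2.0]
```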

Q: Can I cache paper content? A: Yes. After fetching content with Reader, store it locally in a database or on the file system.
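
A file-system cache can be as small as the sketch below, which stores each JSON-serializable response under a hashed key and calls the fetch function only on a miss. The helper `cached_fetch` and the key format are hypothetical; deepxiv-sdk does not ship this function.

```python
import hashlib
import json
from pathlib import Path


def cached_fetch(cache_dir, key, fetch):
    """Return the cached JSON value for key, calling fetch() only on a miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the key so any string (IDs, queries) maps to a safe file name.
    name = hashlib.sha256(key.encode("utf-8")).hexdigest() + ".json"
    path = cache_dir / name
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    value = fetch()
    path.write_text(json.dumps(value), encoding="utf-8")
    return value
```

With a Reader instance, a call might look like `cached_fetch(cache_dir, "brief:2409.05591", lambda: reader.brief("2409.05591"))`, assuming the response is JSON-serializable.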

Q: Which LLMs does the agent support? A: Any OpenAI-compatible API (OpenAI, DeepSeek, OpenRouter, local Ollama, etc.).

Examples

See the examples/ directory:

  • quickstart.py - 5-minute quick start
  • example_reader.py - Basic Reader usage
  • example_agent.py - Agent usage
  • example_advanced.py - Advanced patterns
  • example_error_handling.py - Error handling examples

License

MIT License - see LICENSE file
