A Python package for arXiv paper access with CLI and MCP server support

deepxiv-sdk

Agent-first academic paper interface for CLI, MCP, and Python. deepxiv gives OpenClaw, Claude Code, Codex, and other coding agents a fast, structured way to search papers, inspect metadata, read only the right sections, and reason over open-access literature without wasting tokens.

Why deepxiv for agents?

Feature                          deepxiv    Standard arXiv API
Hybrid Search (BM25 + Vector)    ✅         ❌
AI-Generated Summaries (TLDR)    ✅         ❌
Section-by-Section Access        ✅         ❌
GitHub Link Extraction           ✅         ❌
MCP Protocol Support             ✅         ❌
Biomedical Papers (PMC)          ✅         ❌
Agent-Oriented CLI               ✅         ❌
Free Daily Requests              10,000     ∞*

*The arXiv API has no daily request cap, but it enforces strict rate limiting.

Core Features

  • 🔍 Hybrid Search: BM25 + vector search for better retrieval quality
  • 📄 Section-Based Access: load only the sections an agent actually needs
  • Brief Views: title, TLDR, keywords, citations, PDF, and GitHub link when available
  • 💻 Three Interfaces: CLI / MCP Server / Python SDK
  • 🤖 Agent-Friendly by Default: works well inside OpenClaw, Claude Code, Codex, and similar agent loops
  • 📚 PMC Support: access biomedical literature alongside arXiv
  • 🔥 Trending + Social Impact: discover papers getting attention online

Agent Integration

deepxiv is designed to be the paper interface layer for coding and research agents.

  • Codex: install the CLI skill and let Codex call deepxiv search, deepxiv paper, and deepxiv pmc directly
  • Claude Code: load the same CLI skill or use the MCP server for tool-based access
  • OpenClaw: use the CLI as a stable shell interface, or wire the MCP server into your agent runtime
  • Other agents: use the CLI for predictable terminal workflows, the MCP server for tool calling, or the Python SDK for direct integration

The key design goal is simple: give agents a comprehensive and token-efficient academic paper interface instead of forcing them to scrape raw PDFs or overfetch entire papers.

🌐 Open Access Literature Support

Current Support

  • arXiv - Computer Science, Physics, Math, and more
  • PubMed Central (PMC) - Biomedical and life sciences

Coming Soon (Roadmap)

  • 🔄 bioRxiv - Preprints in biology
  • 🔄 medRxiv - Preprints in medicine
  • 🔄 Other OA Sources - Additional open access repositories
  • 🔄 Full OA Literature Coverage - Comprehensive open access ecosystem

Why OA Literature? By focusing on open access papers, deepxiv ensures that researchers and AI systems have unrestricted access to knowledge without subscription barriers.

Quick Start

1. Installation

# Basic install (Reader + CLI)
pip install deepxiv-sdk

# Full install (MCP + Agent)
pip install "deepxiv-sdk[all]"

2. First Use

On first use, deepxiv automatically registers a free token and saves it to ~/.env:

deepxiv search "agent memory" --limit 5

3. CLI Usage

The CLI is the fastest way to plug deepxiv into agent workflows.

# Search papers
deepxiv search "transformer" --limit 10

# Quick paper understanding
deepxiv paper 2409.05591 --brief

# Paper structure and targeted reading
deepxiv paper 2409.05591 --head
deepxiv paper 2409.05591 --section Introduction
deepxiv paper 2409.05591 --preview
deepxiv paper 2409.05591

# Social/trending signals
deepxiv paper 2409.05591 --popularity
deepxiv trending --days 14 --limit 10

# Biomedical papers
deepxiv pmc PMC544940 --head

4. Use with OpenClaw, Claude Code, and Codex

Codex skill

mkdir -p "$CODEX_HOME/skills"
ln -s "$(pwd)/skills/deepxiv-cli" "$CODEX_HOME/skills/deepxiv-cli"

The included skill teaches agents when to use:

  • deepxiv search for literature discovery
  • deepxiv paper --brief for quick filtering
  • deepxiv paper --section for focused reading
  • deepxiv pmc for biomedical papers
  • deepxiv agent for deeper multi-turn reasoning

Claude Code / OpenClaw / custom agents

If your framework supports reusable operating instructions, load skills/deepxiv-cli/SKILL.md directly. This gives agents a clean command selection guide instead of relying on ad hoc shell usage.

5. MCP Server

Use MCP when you want tool-based integration rather than shell execution.

Add to Claude Desktop MCP config file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

Windows: %APPDATA%\Claude\claude_desktop_config.json

Linux: ~/.config/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "deepxiv": {
      "command": "deepxiv",
      "args": ["serve"],
      "env": {
        "DEEPXIV_TOKEN": "your_token_here"
      }
    }
  }
}
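
If the config file already lists other MCP servers, the deepxiv entry has to be merged in rather than pasted over them. The sketch below shows one way to do that with the standard `json` module. The function name `add_deepxiv_server` is hypothetical, and the entry it writes mirrors the JSON above.

```python
import json


def add_deepxiv_server(config, token):
    """Return a copy of a Claude Desktop config with the deepxiv entry added."""
    merged = dict(config)
    # Copy the existing server table so other entries are preserved
    servers = dict(merged.get("mcpServers", {}))
    servers["deepxiv"] = {
        "command": "deepxiv",
        "args": ["serve"],
        "env": {"DEEPXIV_TOKEN": token},
    }
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # "other" is a placeholder for a server that is already configured
    existing = {"mcpServers": {"other": {"command": "other-server"}}}
    print(json.dumps(add_deepxiv_server(existing, "your_token_here"), indent=2))
```

To apply it to a real file, load the config with `json.load`, pass it through the function, and write the result back with `json.dump`.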

Available MCP tools:

Tool                  Description
search_papers         Search arXiv papers
get_paper_brief       Quick summary
get_paper_metadata    Full metadata
get_paper_section     Read specific section
get_full_paper        Complete paper
get_paper_preview     Paper preview
get_pmc_metadata      PMC paper metadata
get_pmc_full          Complete PMC paper

6. Python Usage

from deepxiv_sdk import Reader

reader = Reader()

# Search papers
results = reader.search("agent memory", size=5)
for paper in results.get("results", []):
    print(f"{paper['title']} ({paper['arxiv_id']})")

# Get paper info
brief = reader.brief("2409.05591")
print(f"Title: {brief['title']}")
print(f"TLDR: {brief.get('tldr', 'N/A')}")
print(f"GitHub: {brief.get('github_url', 'N/A')}")

# Read specific section
intro = reader.section("2409.05591", "Introduction")
print(intro[:500])

# Get trending papers (no token required)
trending = reader.trending(days=7, limit=5)
for paper in trending['papers']:
    print(f"#{paper['rank']}: {paper['arxiv_id']}")
    print(f"  Views: {paper['stats']['total_views']}")

# Get social impact metrics (requires token)
reader_with_token = Reader(token="your_token_here")
impact = reader_with_token.social_impact("2409.05591")
if impact:
    print(f"Views: {impact['total_views']}")
    print(f"Tweets: {impact['total_tweets']}")
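
The calls above combine naturally into a token-saving triage step: search first, then use the brief view to filter before reading any section. The sketch below assumes the response shapes used in the snippet above (`results`, `arxiv_id`, `title`, `tldr`). `shortlist` and `StubReader` are hypothetical names, and the stub with its placeholder IDs exists only so the example runs offline.

```python
def shortlist(reader, query, keyword, size=5):
    """Search, then keep papers whose TLDR mentions `keyword` (hypothetical helper)."""
    hits = reader.search(query, size=size).get("results", [])
    kept = []
    for paper in hits:
        brief = reader.brief(paper["arxiv_id"])
        tldr = brief.get("tldr") or ""  # a missing or None TLDR counts as no match
        if keyword.lower() in tldr.lower():
            kept.append((paper["arxiv_id"], brief["title"]))
    return kept


class StubReader:
    """Offline stand-in exposing the two methods the helper needs."""

    def search(self, query, size=10):
        results = [{"arxiv_id": "0000.00001"}, {"arxiv_id": "0000.00002"}]
        return {"results": results[:size]}

    def brief(self, arxiv_id):
        data = {
            "0000.00001": {"title": "Paper A", "tldr": "A memory module for agents."},
            "0000.00002": {"title": "Paper B", "tldr": None},
        }
        return data[arxiv_id]


if __name__ == "__main__":
    print(shortlist(StubReader(), "agent memory", "memory"))
```

With the real client, `shortlist(Reader(), "agent memory", "memory")` would follow the same path, provided the responses have the assumed shape.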

Complete API Reference

Search and Query

reader.search(query, size=10, search_mode="hybrid", categories=None, min_citation=None)
reader.head(arxiv_id)              # Paper metadata and sections overview
reader.brief(arxiv_id)             # Quick summary (title, TLDR, keywords, citations, GitHub URL)
reader.section(arxiv_id, section)  # Read specific section
reader.raw(arxiv_id)               # Full paper
reader.preview(arxiv_id)           # Paper preview (~10k characters)
reader.json(arxiv_id)              # Complete structured JSON

PMC (Biomedical Papers)

reader.pmc_head(pmc_id)            # PMC paper metadata
reader.pmc_full(pmc_id)            # Complete PMC paper JSON

Agent (Optional)

from deepxiv_sdk import Agent

agent = Agent(api_key="your_openai_key", model="gpt-4")
answer = agent.query("What are the latest papers about agent memory?")
print(answer)

Token Management

deepxiv supports four ways to configure a token:

1. Auto-registration (Recommended) - Automatically creates and saves a token on first use

deepxiv search "agent"

2. Using config command

deepxiv config --token YOUR_TOKEN

3. Environment variable

export DEEPXIV_TOKEN="your_token"

4. Command-line option

deepxiv paper 2409.05591 --token YOUR_TOKEN
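
A wrapper script that accepts its own token option may need to decide which source wins. The sketch below shows one plausible order: an explicit value first, then the `DEEPXIV_TOKEN` environment variable. That order is an assumption made for illustration, since the precedence deepxiv itself applies is not stated here, and `resolve_token` is a hypothetical helper.

```python
import os


def resolve_token(cli_token=None, environ=None):
    """Pick a token: explicit value first, then DEEPXIV_TOKEN (assumed order)."""
    environ = os.environ if environ is None else environ
    if cli_token:
        return cli_token
    # Returns None when nothing is configured; the caller decides what to do next
    return environ.get("DEEPXIV_TOKEN") or None


if __name__ == "__main__":
    print(resolve_token("from-cli", {"DEEPXIV_TOKEN": "from-env"}))
    print(resolve_token(None, {"DEEPXIV_TOKEN": "from-env"}))
```

Passing `environ` as a parameter keeps the function testable without touching the real process environment.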

Increase daily limit: The default is 10,000 requests/day. For a higher limit, email your name, email address, and phone number to tommy@chien.io.

Free Test Papers

These papers can be accessed without a token:

arXiv: 2409.05591, 2504.21776
PMC: PMC544940, PMC514704

Agent Usage (Optional)

The built-in ReAct agent can automatically search papers, read content, and perform multi-turn reasoning:

from deepxiv_sdk import Agent

agent = Agent(
    api_key="your_deepseek_key",
    base_url="https://api.deepseek.com/v1",
    model="deepseek-chat"
)

answer = agent.query("Compare key ideas in transformers and attention mechanisms")
print(answer)

Or via CLI:

deepxiv agent config  # Configure LLM API
deepxiv agent query "What are the latest papers about agent memory?" --verbose

Error Handling

deepxiv provides specific exception types:

from deepxiv_sdk import (
    Reader,
    AuthenticationError,  # 401 - Invalid or expired token
    RateLimitError,       # 429 - Daily limit reached
    NotFoundError,        # 404 - Paper not found
    ServerError,          # 5xx - Server error
    APIError              # Other API errors
)

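reader = Reader()

try:
    paper = reader.brief("2409.05591")
except AuthenticationError:
    print("Please update your token")
except RateLimitError:
    print("Daily limit reached")
except NotFoundError:
    print("Paper not found")
except ServerError:
    # Listed before APIError so server failures get their own message
    print("Server error, try again later")
except APIError as e:
    print(f"API error: {e}")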

Troubleshooting

Q: Do I need a token to use deepxiv? A: Not for everything. Some papers are free to access. Search and some content require a token, which is created automatically on first use.

Q: What's the maximum number of search results? A: 100 per request. Use the offset parameter for pagination.
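
A pagination loop over that limit could look like the sketch below. It assumes the search callable accepts an `offset` keyword alongside `size` and returns a dict with a `results` list. `search_all` and `fake_search` are hypothetical names, and the fake exists only so the example runs offline.

```python
def search_all(search, query, wanted, page_size=100):
    """Collect up to `wanted` results by stepping `offset` (assumed keyword)."""
    collected = []
    offset = 0
    while len(collected) < wanted:
        size = min(page_size, wanted - len(collected))
        page = search(query, size=size, offset=offset).get("results", [])
        if not page:
            break  # the server has no more results
        collected.extend(page)
        offset += len(page)
    return collected


def fake_search(query, size=10, offset=0):
    """Offline stand-in that serves 250 numbered results."""
    items = [{"rank": i} for i in range(250)]
    return {"results": items[offset:offset + size]}


if __name__ == "__main__":
    print(len(search_all(fake_search, "transformer", wanted=230)))
```

With the real client this would be called as `search_all(reader.search, "transformer", wanted=230)`, provided `reader.search` takes `offset` as the answer above suggests.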

Q: How do I handle timeouts? A: Reader retries automatically (up to 3 times) with exponential backoff. You can customize the timeout and retry count:

reader = Reader(timeout=120, max_retries=5)
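
For readers unfamiliar with the term, exponential backoff means the wait doubles after each failed attempt. The sketch below is a generic illustration of the pattern, not the library's internal code. The function name, the one-second base delay, and the catch-all `except` are assumptions.

```python
import time


def retry_with_backoff(func, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); after each failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted, surface the last error
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky():
        # Fails twice, then succeeds
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise TimeoutError("simulated timeout")
        return "ok"

    waits = []
    print(retry_with_backoff(flaky, sleep=waits.append), waits)
```

Injecting `sleep` lets the demo record the delays instead of actually waiting. A production version would catch only timeout and server errors rather than every exception.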

Q: Can I cache paper content? A: Yes. Once you have fetched content with Reader, you can store it locally in a database or on the file system.
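
A file-system cache can be as small as the sketch below, which stores each result as a JSON file and calls the fetch function only on a miss. The helper name `cached` and the key format are hypothetical, and it assumes the fetched content is JSON-serializable.

```python
import json
import tempfile
from pathlib import Path


def cached(cache_dir, key, fetch):
    """Return content for `key`, calling fetch() only when no cache file exists."""
    path = Path(cache_dir) / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    value = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(value), encoding="utf-8")
    return value


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        calls = []

        def fetch():
            # Stand-in for a network call such as reader.brief(...)
            calls.append(1)
            return {"title": "example"}

        cached(tmp, "2409.05591-brief", fetch)
        cached(tmp, "2409.05591-brief", fetch)
        print(len(calls))  # fetch ran once; the second call read the file
```

With the real client, the fetch argument could be `lambda: reader.brief("2409.05591")`. Published papers rarely change, so a cache without expiry is a reasonable starting point.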

Q: Which LLMs does the agent support? A: Any OpenAI-compatible API (OpenAI, DeepSeek, OpenRouter, local Ollama, etc.).

Examples

See the examples/ directory:

  • quickstart.py - 5-minute quick start
  • example_reader.py - Basic Reader usage
  • example_agent.py - Agent usage
  • example_advanced.py - Advanced patterns
  • example_error_handling.py - Error handling examples

License

MIT License - see LICENSE file
