Skip to main content

A Python package for arXiv paper access with CLI and MCP server support

Project description

deepxiv-sdk

A Python SDK for accessing arXiv papers with CLI and MCP server support.

🎮 Try the live demo: https://1stauthor.com/

📚 API Documentation: https://data.rag.ac.cn/api/docs

Features

  • 🔍 Paper Search: Search for arXiv papers using hybrid search (BM25 + Vector)
  • 📄 Paper Access: Retrieve paper metadata, sections, and full content
  • 🏥 PMC Support: Access PubMed Central biomedical literature
  • 💻 CLI: Command-line interface for quick access
  • 🔌 MCP Server: Model Context Protocol server for Claude Desktop integration
  • 🤖 Intelligent Agent: ReAct-based agent for intelligent paper analysis
  • 🔌 Flexible LLM Support: Compatible with OpenAI, DeepSeek, OpenRouter, and other OpenAI-compatible APIs

Installation

# Basic install (Reader + CLI)
pip install deepxiv-sdk

# With MCP server support
pip install deepxiv-sdk[mcp]

# With Agent support (includes OpenAI SDK)
pip install deepxiv-sdk[agent]

# Full install (all features)
pip install deepxiv-sdk[all]

Note: Agent requires openai>=1.0.0 for LLM calls. Install with [agent] or [all] extras.

Quick Start

Step 1: Get Your Free API Token

Visit https://data.rag.ac.cn/register to get your free API token (10000 requests/day).

Step 2: Configure Your Token

# Interactive configuration (saves to ~/.env)
deepxiv config

# Or provide token directly
deepxiv config --token YOUR_TOKEN

# The CLI will automatically load token from ~/.env

CLI Usage

# Show help
deepxiv help

# Get paper in different formats
deepxiv paper 2409.05591                    # Full markdown
deepxiv paper 2409.05591 --head             # Metadata (JSON)
deepxiv paper 2409.05591 --brief            # Brief info (title, TLDR, keywords)
deepxiv paper 2409.05591 --raw              # Raw markdown
deepxiv paper 2409.05591 --preview          # Preview (~10k chars)
deepxiv paper 2409.05591 --section intro    # Specific section

# Search papers
deepxiv search "agent memory" --limit 5
deepxiv search "transformer" --mode bm25 --format json
deepxiv search "LLM" --categories cs.AI,cs.CL --min-citations 100

# Get PMC papers
deepxiv pmc PMC544940                       # Full JSON
deepxiv pmc PMC544940 --head                # Metadata only
deepxiv pmc PMC514704                       # Another example

# Intelligent Agent (requires agent installation)
deepxiv agent config                        # Configure LLM API first
deepxiv agent query "What are the latest papers about agent memory?"
# show reasoning process
deepxiv agent query "Latest HLE scores" --max-turn 10 --verbose 

# Start MCP server
deepxiv serve

Agent Configuration:

  • Config is saved to ~/.deepxiv_agent_config.json
  • Supports environment variables: DEEPXIV_AGENT_API_KEY, DEEPXIV_AGENT_BASE_URL, DEEPXIV_AGENT_MODEL
  • Compatible with OpenAI, DeepSeek, OpenRouter, and other OpenAI-compatible APIs

Python API

from deepxiv_sdk import Reader

# Initialize the reader
reader = Reader(token="your_api_token")  # or Reader() for free papers

# Search for papers
results = reader.search("agent memory", size=10)
for paper in results['results']:
    print(f"{paper['title']} - {paper['arxiv_id']}")

# Get paper metadata
head = reader.head("2409.05591")
print(f"Title: {head['title']}")

# Get brief info (quick summary)
brief = reader.brief("2409.05591")
print(f"Title: {brief['title']}")
print(f"TLDR: {brief.get('tldr', 'N/A')}")
print(f"Citations: {brief.get('citations', 0)}")

# Read a section (case-insensitive)
intro = reader.section("2409.05591", "Introduction")
print(intro)

# Get full paper
content = reader.raw("2409.05591")

# Access PMC papers
pmc_head = reader.pmc_head("PMC544940")
print(f"PMC Title: {pmc_head['title']}")

pmc_full = reader.pmc_json("PMC544940")
print(f"PMC Content: {len(str(pmc_full))} chars")

Agent Usage

The intelligent agent can search papers, read content, and answer questions using ReAct reasoning.

Python API

import os
from deepxiv_sdk import Reader, Agent

reader = Reader(token="your_api_token")
agent = Agent(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4",
    reader=reader,
    max_llm_calls=20,  # Maximum reasoning turns
    print_process=True  # Show reasoning steps
)

answer = agent.query("What are the latest papers about agent memory?")
print(answer)

# For DeepSeek or other APIs
agent = Agent(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com",
    model="deepseek-chat",
    reader=reader
)

MCP Server Setup

For Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "deepxiv": {
      "command": "deepxiv",
      "args": ["serve"],
      "env": {
        "DEEPXIV_TOKEN": "your_token_here"
      }
    }
  }
}

Available MCP Tools

Tool Description
search_papers Search arXiv with hybrid search
get_paper_brief Get brief info (title, TLDR, keywords, citations)
get_paper_metadata Get paper metadata and section TLDRs
get_paper_section Read a specific section
get_full_paper Get complete paper content
get_paper_preview Get preview (~10k chars)
get_pmc_metadata Get PMC paper metadata
get_pmc_full Get complete PMC paper in JSON

API Token

  • Get Your Free Token: https://data.rag.ac.cn/register
  • Daily Limit: 1000 free requests per day
  • Test Papers:
    • arXiv: 2409.05591 and 2504.21776 are available without authentication
    • PMC: PMC544940 and PMC514704 are available without authentication

Token Configuration (3 Ways)

1. Using config command (Recommended)

deepxiv config
# Saves to ~/.env and automatically loads on every command

2. Environment Variable

export DEEPXIV_TOKEN="your_token_here"
# Add to ~/.bashrc or ~/.zshrc for persistence

3. Command-line Option

deepxiv paper 2512.02556 --token "your_token_here"
# Useful for one-time usage or multiple tokens

The CLI automatically loads tokens from:

  1. Command-line --token option (highest priority)
  2. DEEPXIV_TOKEN environment variable
  3. .env file in current directory
  4. ~/.env file in home directory (lowest priority)

API Reference

Reader Methods

arXiv Methods

  • search(query, size=10, search_mode="hybrid", ...): Search for papers
  • head(arxiv_id): Get paper metadata and structure
  • brief(arxiv_id): Get brief info (title, TLDR, keywords, citations)
  • section(arxiv_id, section_name): Get a specific section (case-insensitive)
  • raw(arxiv_id): Get full paper in markdown
  • preview(arxiv_id): Get paper preview (~10k chars)
  • json(arxiv_id): Get complete structured JSON
  • markdown(arxiv_id): Get HTML view URL

PMC Methods

  • pmc_head(pmc_id): Get PMC paper metadata
  • pmc_json(pmc_id): Get complete PMC paper in JSON

Agent Methods

  • query(question, reset_papers=False): Query the agent with a question
  • get_loaded_papers(): Get information about loaded papers
  • reset_papers(): Clear all loaded papers from context
  • add_paper(arxiv_id): Manually add a paper to context

Agent Tools

The agent has access to the following tools:

Tool Description
search_papers Search arXiv papers with filters
load_paper Load paper metadata and structure
read_section Read a specific section
get_full_paper Get complete paper content
get_paper_preview Get paper preview (~10k chars)
quick_preview Batch preview multiple papers (brief info only)

Examples

See the examples directory:

  • example_reader.py: Basic Reader usage
  • example_agent.py: Agent usage
  • example_advanced.py: Advanced patterns
  • quickstart.py: Quick start guide

License

MIT License - see LICENSE file for details.

Support

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

deepxiv_sdk-0.1.1.tar.gz (32.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

deepxiv_sdk-0.1.1-py3-none-any.whl (33.7 kB view details)

Uploaded Python 3

File details

Details for the file deepxiv_sdk-0.1.1.tar.gz.

File metadata

  • Download URL: deepxiv_sdk-0.1.1.tar.gz
  • Upload date:
  • Size: 32.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.14

File hashes

Hashes for deepxiv_sdk-0.1.1.tar.gz
Algorithm Hash digest
SHA256 73ded89e4e2c25eff10d44c5cca88a8ffda735a4bdd60eb82536a43c51a0f645
MD5 39a106ada124ce74ebd6d193b5f852cf
BLAKE2b-256 b28952e502123f96e9335726c3f3c717a7fdda140fa1818ed784ef7dc0928545

See more details on using hashes here.

File details

Details for the file deepxiv_sdk-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: deepxiv_sdk-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 33.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.14

File hashes

Hashes for deepxiv_sdk-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 5bb700c6822f304c77ae9c63158d0311346802fadd0541495f3fc9bdc819f0ec
MD5 432f394f440bb9d3d6b70290962a8fd1
BLAKE2b-256 7c47dd8a8a02359089f3e12ec2254e987e933c80613222040cbdd49ad9812677

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page