
deepxiv-sdk

A Python SDK for accessing arXiv papers with CLI and MCP server support.

🎮 Try the live demo: https://1stauthor.com/

📚 API Documentation: https://data.rag.ac.cn/api/docs

Features

  • 🔍 Paper Search: Search for arXiv papers using hybrid search (BM25 + Vector)
  • 📄 Paper Access: Retrieve paper metadata, sections, and full content
  • 🏥 PMC Support: Access PubMed Central biomedical literature
  • 💻 CLI: Command-line interface for quick access
  • 🔌 MCP Server: Model Context Protocol server for Claude Desktop integration
  • 🤖 Intelligent Agent: ReAct-based agent that searches and reads papers to answer research questions
  • 🔌 Flexible LLM Support: Compatible with OpenAI, DeepSeek, OpenRouter, and other OpenAI-compatible APIs
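The feature list says search combines BM25 with vector search but does not say how the two rankings are merged. Reciprocal rank fusion (RRF) is one common way to do this, so the sketch below shows that technique as an illustration only. It is an assumption, not the service's documented algorithm, and `reciprocal_rank_fusion` and `k=60` are illustrative choices.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs into one list using RRF.

    Illustrative only: the deepxiv service's actual fusion method is not
    documented here.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); a larger k flattens the
            # gap between high and low ranks.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["A", "B", "C"]    # placeholder IDs, not real search results
vector_hits = ["B", "D", "A"]
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
```

A document ranked well by both retrievers ("B" here) rises above one that only a single retriever ranks first, which is the usual motivation for hybrid search.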

Installation

# Basic install (Reader + CLI)
pip install deepxiv-sdk

# With MCP server support (quote the extras so shells such as zsh do not expand the brackets)
pip install "deepxiv-sdk[mcp]"

# With Agent support
pip install "deepxiv-sdk[agent]"

# Full install (all features)
pip install "deepxiv-sdk[all]"

Quick Start

Step 1: Get Your Free API Token

Visit https://data.rag.ac.cn/register to get your free API token (1000 requests/day).

Step 2: Configure Your Token

# Interactive configuration (saves to ~/.env)
deepxiv config

# Or provide token directly
deepxiv config --token YOUR_TOKEN

# The CLI then loads the token from ~/.env automatically

CLI Usage

# Show help
deepxiv help

# Get paper in different formats
deepxiv paper 2409.05591                    # Full markdown
deepxiv paper 2409.05591 --head             # Metadata (JSON)
deepxiv paper 2409.05591 --brief            # Brief info (title, TLDR, keywords)
deepxiv paper 2409.05591 --raw              # Raw markdown
deepxiv paper 2409.05591 --preview          # Preview (~10k chars)
deepxiv paper 2409.05591 --section intro    # Specific section

# Search papers
deepxiv search "agent memory" --limit 5
deepxiv search "transformer" --mode bm25 --format json
deepxiv search "LLM" --categories cs.AI,cs.CL --min-citations 100

# Get PMC papers
deepxiv pmc PMC544940                       # Full JSON
deepxiv pmc PMC544940 --head                # Metadata only
deepxiv pmc PMC514704                       # Another example

# Start MCP server
deepxiv serve

Python API

from deepxiv_sdk import Reader

# Initialize the reader
reader = Reader(token="your_api_token")  # or Reader() for the test papers that need no token

# Search for papers
results = reader.search("agent memory", size=10)
for paper in results['results']:
    print(f"{paper['title']} - {paper['arxiv_id']}")

# Get paper metadata
head = reader.head("2409.05591")
print(f"Title: {head['title']}")

# Get brief info (quick summary)
brief = reader.brief("2409.05591")
print(f"Title: {brief['title']}")
print(f"TLDR: {brief.get('tldr', 'N/A')}")
print(f"Citations: {brief.get('citations', 0)}")

# Read a section (section names are matched case-insensitively)
intro = reader.section("2409.05591", "Introduction")
print(intro)

# Get the full paper as markdown
content = reader.raw("2409.05591")

# Access PMC papers
pmc_head = reader.pmc_head("PMC544940")
print(f"PMC Title: {pmc_head['title']}")

pmc_full = reader.pmc_json("PMC544940")
print(f"PMC Content: {len(str(pmc_full))} chars")
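The calls above can be chained into a small triage step: search, then fetch each hit's brief and keep only well-cited papers. This is a sketch that assumes the result and brief shapes shown in the example above (`results["results"]` items with `arxiv_id` and `title`; briefs with optional `citations` and `tldr`). `shortlist` is a hypothetical helper, not part of deepxiv-sdk.

```python
def shortlist(reader, query, min_citations=0, size=10):
    """Search, then keep hits whose brief reports >= min_citations.

    `reader` is any object whose search() and brief() behave like
    deepxiv_sdk.Reader in the README example (assumed shapes).
    """
    results = reader.search(query, size=size)
    picked = []
    for paper in results.get("results", []):
        # One extra request per hit, which counts against the daily limit.
        brief = reader.brief(paper["arxiv_id"])
        if brief.get("citations", 0) >= min_citations:
            picked.append({
                "arxiv_id": paper["arxiv_id"],
                "title": paper["title"],
                "tldr": brief.get("tldr", "N/A"),
            })
    return picked
```

With the SDK installed, it would be called as `shortlist(Reader(token="your_api_token"), "agent memory", min_citations=50)`. Because each hit costs an extra request, a server-side filter such as the CLI's `--min-citations` is cheaper when it is available.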

Agent Usage

import os
from deepxiv_sdk import Reader, Agent

reader = Reader(token="your_api_token")
agent = Agent(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4",
    reader=reader,
    print_process=True
)

answer = agent.query("What are the latest papers about agent memory?")
print(answer)

MCP Server Setup

For Claude Desktop

On macOS, add the following to ~/Library/Application Support/Claude/claude_desktop_config.json (on Windows, the file is %APPDATA%\Claude\claude_desktop_config.json):

{
  "mcpServers": {
    "deepxiv": {
      "command": "deepxiv",
      "args": ["serve"],
      "env": {
        "DEEPXIV_TOKEN": "your_token_here"
      }
    }
  }
}
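If the config file already lists other MCP servers, pasting the snippet by hand can clobber them. The sketch below merges the `deepxiv` entry into an existing file instead, keeping everything else intact. `add_deepxiv_server` is a hypothetical helper, and the path you pass in is whichever config location applies to your OS.

```python
import json
from pathlib import Path


def add_deepxiv_server(config_path, token):
    """Insert or update the "deepxiv" entry in a Claude Desktop config file."""
    path = Path(config_path).expanduser()
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    # Start from the existing config so other MCP servers are preserved.
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["deepxiv"] = {
        "command": "deepxiv",
        "args": ["serve"],
        "env": {"DEEPXIV_TOKEN": token},
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `add_deepxiv_server("~/Library/Application Support/Claude/claude_desktop_config.json", "your_token_here")`. Restart Claude Desktop afterwards so it picks up the change.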

Available MCP Tools

Tool                 Description
search_papers        Search arXiv with hybrid search
get_paper_brief      Get brief info (title, TLDR, keywords, citations)
get_paper_metadata   Get paper metadata and section TLDRs
get_paper_section    Read a specific section
get_full_paper       Get complete paper content
get_paper_preview    Get preview (~10k chars)
get_pmc_metadata     Get PMC paper metadata
get_pmc_full         Get complete PMC paper in JSON

API Token

  • Get Your Free Token: https://data.rag.ac.cn/register
  • Daily Limit: 1000 free requests per day
  • Test Papers:
    • arXiv: 2409.05591 and 2504.21776 are available without authentication
    • PMC: PMC544940 and PMC514704 are available without authentication
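To stay inside the 1000-requests-per-day quota in long-running scripts, a client-side counter can refuse calls before the server does. This sketch assumes the quota resets at midnight UTC, which the README does not state. `DailyBudget` is a hypothetical helper, not part of deepxiv-sdk.

```python
import datetime


class DailyBudget:
    """Client-side counter that refuses calls once a daily quota is used up.

    Assumption: the quota resets at midnight UTC.
    """

    def __init__(self, limit=1000, today=None):
        self.limit = limit
        # `today` is injectable so the reset logic can be tested.
        self._today = today or (lambda: datetime.datetime.now(datetime.timezone.utc).date())
        self._day = self._today()
        self.used = 0

    def spend(self, n=1):
        """Record n requests and return how many remain today."""
        day = self._today()
        if day != self._day:  # a new day: start counting from zero again
            self._day, self.used = day, 0
        if self.used + n > self.limit:
            raise RuntimeError(f"daily budget of {self.limit} requests exhausted")
        self.used += n
        return self.limit - self.used
```

Call `budget.spend()` just before each `Reader` call so that a runaway loop fails locally instead of burning the day's quota.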

Token Configuration (3 Ways)

1. Using config command (Recommended)

deepxiv config
# Saves the token to ~/.env; it is then loaded automatically on every command

2. Environment Variable

export DEEPXIV_TOKEN="your_token_here"
# Add to ~/.bashrc or ~/.zshrc for persistence

3. Command-line Option

deepxiv paper 2512.02556 --token "your_token_here"
# Useful for one-off calls or for switching between multiple tokens

The CLI looks for a token in the following order and uses the first one it finds:

  1. Command-line --token option (highest priority)
  2. DEEPXIV_TOKEN environment variable
  3. .env file in current directory
  4. ~/.env file in home directory (lowest priority)
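Your own scripts can pick up the same token the CLI would use by applying that lookup order in plain Python. This sketch assumes the `.env` files store the token under the key `DEEPXIV_TOKEN`, the same name as the environment variable. `resolve_token` and `read_env_file` are hypothetical helpers, not part of deepxiv-sdk.

```python
import os
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines from a .env file (comments and blanks skipped)."""
    values = {}
    p = Path(path).expanduser()
    if not p.is_file():
        return values
    for line in p.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_token(cli_token=None, environ=None, cwd=".", home="~"):
    """Return the first token found, following the documented priority order."""
    environ = os.environ if environ is None else environ
    candidates = [
        cli_token,                                                             # 1. --token
        environ.get("DEEPXIV_TOKEN"),                                          # 2. environment variable
        read_env_file(Path(cwd) / ".env").get("DEEPXIV_TOKEN"),                # 3. ./.env
        read_env_file(Path(home).expanduser() / ".env").get("DEEPXIV_TOKEN"),  # 4. ~/.env
    ]
    return next((t for t in candidates if t), None)
```

The result can be passed straight to the SDK, for example `Reader(token=resolve_token())`.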

API Reference

Reader Methods

arXiv Methods

  • search(query, size=10, search_mode="hybrid", ...): Search for papers
  • head(arxiv_id): Get paper metadata and structure
  • brief(arxiv_id): Get brief info (title, TLDR, keywords, citations)
  • section(arxiv_id, section_name): Get a specific section (case-insensitive)
  • raw(arxiv_id): Get full paper in markdown
  • preview(arxiv_id): Get paper preview (~10k chars)
  • json(arxiv_id): Get complete structured JSON
  • markdown(arxiv_id): Get HTML view URL

PMC Methods

  • pmc_head(pmc_id): Get PMC paper metadata
  • pmc_json(pmc_id): Get complete PMC paper in JSON
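arXiv and PMC identifiers go through different `Reader` methods. When IDs of both kinds arrive mixed, a small dispatcher keyed on the "PMC" prefix can pick the right method. This sketch assumes every PMC ID starts with "PMC", as in the examples above. `fetch` is a hypothetical helper that calls only the methods listed in this section.

```python
def fetch(reader, paper_id, full=False):
    """Route a paper ID to the matching Reader method.

    PMC IDs use pmc_head()/pmc_json(); anything else is treated as an
    arXiv ID and uses head()/raw().
    """
    paper_id = paper_id.strip()
    if paper_id.upper().startswith("PMC"):
        return reader.pmc_json(paper_id) if full else reader.pmc_head(paper_id)
    return reader.raw(paper_id) if full else reader.head(paper_id)
```

For example, `fetch(reader, "PMC544940")` returns PMC metadata, and `fetch(reader, "2409.05591", full=True)` returns the arXiv paper as markdown.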

Agent Methods

  • query(question, reset_papers=False): Query the agent
  • get_loaded_papers(): Get loaded papers info
  • reset_papers(): Reset all loaded papers
  • add_paper(arxiv_id): Add a paper to context

Examples

See the examples directory in the project repository:

  • example_reader.py: Basic Reader usage
  • example_agent.py: Agent usage
  • example_advanced.py: Advanced patterns
  • quickstart.py: Quick start guide

License

MIT License. See the LICENSE file for details.



Download files

Source Distribution

deepxiv_sdk-0.1.0.tar.gz (29.1 kB)

Built Distribution

deepxiv_sdk-0.1.0-py3-none-any.whl (30.7 kB)

File details

Details for the file deepxiv_sdk-0.1.0.tar.gz.

File metadata

  • Download URL: deepxiv_sdk-0.1.0.tar.gz
  • Size: 29.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.14

File hashes

Hashes for deepxiv_sdk-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       b208573d112a40cb124a555c913e49013caca99ac87b508e5373cd53a27592cd
MD5          9eea6fad4bc76e387cfd608646f32ba5
BLAKE2b-256  a5852fefecb861db3287507b46a4d5130b8fde8ec156b23f2e5d31da1383a5c4


File details

Details for the file deepxiv_sdk-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: deepxiv_sdk-0.1.0-py3-none-any.whl
  • Size: 30.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.14

File hashes

Hashes for deepxiv_sdk-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       2baa98a3bc7456b856bd2e6a767e28edd937ed699a656dc8b27ca1cad25579bb
MD5          23baad938e05cb3b170220db75eeb528
BLAKE2b-256  2f4294f52f31e21f5f950f9430e96ef3dde57d8769d751c9046efb9de1c07bfe

