MCP server for intelligent paper/PDF management with RAG capabilities

Maintainer: odedf

Paper Intelligence MCP Server

A local MCP (Model Context Protocol) server for intelligent paper/PDF management with RAG capabilities.

Features

PDF to Markdown: Convert PDFs using Marker with high accuracy
Header Indexing: Extract document structure into searchable JSON
Semantic Search: RAG-powered search using LlamaIndex + ChromaDB + HuggingFace embeddings
Hybrid Search: Combined grep (text/regex) + semantic search
GPU Acceleration: MPS (Apple Silicon) and CUDA support
Self-contained: Each paper gets its own directory with all data
Version Tracking: Metadata tracks which version processed each paper

Installation

Option 1: Install from PyPI (Recommended)

# Install with pip
pip install paper-intelligence

# Or run directly with uvx (no install needed)
uvx paper-intelligence

Option 2: Install from GitHub

# Install directly from GitHub (no clone needed)
pip install "paper-intelligence @ git+https://github.com/Strand-AI/paper-intelligence.git"

Option 3: Local Development

git clone https://github.com/Strand-AI/paper-intelligence.git
cd paper-intelligence

# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install in development mode
pip install -e ".[dev]"

# Run the server
python -m paper_intelligence.server

MCP Client Configuration

Claude Desktop

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS or %APPDATA%\Claude\claude_desktop_config.json on Windows):

Using uvx (recommended):

{
  "mcpServers": {
    "paper-intelligence": {
      "command": "uvx",
      "args": ["paper-intelligence"]
    }
  }
}

Using local install:

{
  "mcpServers": {
    "paper-intelligence": {
      "command": "/path/to/paper-intelligence/.venv/bin/python",
      "args": ["-m", "paper_intelligence.server"]
    }
  }
}
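If you prefer to script the config change, the sketch below merges the uvx entry shown above into an existing config file without disturbing other servers. It is a convenience helper written for this README, not part of paper-intelligence; `merge_server_entry` and `update_config_file` are hypothetical names.

```python
import copy
import json
from pathlib import Path
from typing import Optional


def merge_server_entry(config: dict, command: str = "uvx",
                       args: Optional[list[str]] = None) -> dict:
    """Return a copy of `config` with a paper-intelligence entry under "mcpServers".

    Other servers and top-level keys are kept; an existing paper-intelligence
    entry is replaced. The input dict is not modified.
    """
    merged = copy.deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers["paper-intelligence"] = {
        "command": command,
        "args": ["paper-intelligence"] if args is None else list(args),
    }
    return merged


def update_config_file(path: Path, **entry) -> dict:
    """Read the JSON config at `path` (if it exists), merge the entry, write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    merged = merge_server_entry(config, **entry)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merged, indent=2) + "\n")
    return merged
```

On macOS, for example: `update_config_file(Path.home() / "Library" / "Application Support" / "Claude" / "claude_desktop_config.json")`. For the local-install variant, pass `command="/path/to/paper-intelligence/.venv/bin/python", args=["-m", "paper_intelligence.server"]`. Restart Claude Desktop afterwards so it reloads the config.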

Claude Code

Add to your Claude Code config (~/.claude.json):

Using uvx (recommended):

{
  "mcpServers": {
    "paper-intelligence": {
      "type": "stdio",
      "command": "uvx",
      "args": ["paper-intelligence"]
    }
  }
}

Using local install:

{
  "mcpServers": {
    "paper-intelligence": {
      "type": "stdio",
      "command": "/path/to/paper-intelligence/.venv/bin/python",
      "args": ["-m", "paper_intelligence.server"],
      "cwd": "/path/to/paper-intelligence"
    }
  }
}

Output Structure

Processing ~/Downloads/paper.pdf creates ~/Downloads/paper/ next to the PDF:

paper/
├── paper.md        # Converted markdown
├── metadata.json   # Processing version and info
├── index.json      # Header hierarchy (for search context)
├── chroma/         # Embeddings database
└── images/         # Extracted images (if any)
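The layout above can be expressed as a small path helper, for example to locate a paper's files from a script. This sketch assumes the default case (no explicit `output_dir`): the output directory is the PDF path minus its `.pdf` suffix, and the Markdown file is named after the PDF stem, generalizing the `paper.pdf` example. `expected_layout` is a hypothetical name, not a function shipped by the package.

```python
from pathlib import Path


def expected_layout(pdf_path: str) -> dict[str, Path]:
    """Map each documented artifact to its expected location for a given PDF."""
    pdf = Path(pdf_path).expanduser()
    out_dir = pdf.with_suffix("")  # ~/Downloads/paper.pdf -> ~/Downloads/paper
    return {
        "output_dir": out_dir,
        "markdown": out_dir / f"{pdf.stem}.md",   # converted markdown
        "metadata": out_dir / "metadata.json",    # processing version and info
        "index": out_dir / "index.json",          # header hierarchy
        "chroma": out_dir / "chroma",             # embeddings database
        "images": out_dir / "images",             # extracted images, if any
    }
```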

MCP Tools

`process_paper`

Full pipeline: Convert PDF, index headers, and create embeddings.

process_paper(
    pdf_path="~/Downloads/paper.pdf",
    use_llm=False,      # Set True for enhanced accuracy
    chunk_size=512,
    chunk_overlap=50
)
# Returns: output_dir, markdown_path, images_dir (if images extracted), image_count

`convert_pdf`

Convert a PDF file to Markdown.

convert_pdf(
    pdf_path="~/Downloads/paper.pdf",
    output_dir=None,  # Defaults to ~/Downloads/paper/
    use_llm=False
)
# Returns: markdown_path, images_dir (if images extracted), image_count

`index_markdown`

Extract header hierarchy into searchable JSON.

index_markdown(
    markdown_path="~/Downloads/paper/paper.md"
)
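For intuition about what index_markdown stores, here is a minimal sketch that builds a header hierarchy from Markdown ATX headings (`#` to `######`), skipping lines inside fenced code. The real index.json schema isn't documented here; the `level`/`title`/`line`/`path` fields and the `index_headers` name are assumptions for illustration.

```python
import re

# ATX heading, e.g. "## Methods" or "## Methods ##"; setext headings are ignored here.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)(?:\s+#+)?\s*$")


def index_headers(markdown: str) -> list[dict]:
    """Return one entry per heading with its level, line number and breadcrumb path."""
    stack: list[tuple[int, str]] = []  # (level, title) of the enclosing sections
    entries = []
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # "#" inside code fences is not a heading
            continue
        match = None if in_fence else HEADING.match(line)
        if not match:
            continue
        level, title = len(match.group(1)), match.group(2)
        # Close sibling and deeper sections before opening this one.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, title))
        entries.append({
            "level": level,
            "title": title,
            "line": lineno,
            "path": " > ".join(t for _, t in stack),
        })
    return entries
```

Serializing the result with `json.dumps(index_headers(text), indent=2)` gives a file in the spirit of index.json, with breadcrumbs such as "Methods > Data Collection".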

`embed_document`

Create embeddings for semantic search.

embed_document(
    markdown_path="~/Downloads/paper/paper.md",
    chunk_size=512,
    chunk_overlap=50
)
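chunk_size and chunk_overlap control how the Markdown is cut up before embedding: each chunk holds up to chunk_size units and repeats the last chunk_overlap units of the previous chunk, so text near a boundary appears in both neighbouring chunks. LlamaIndex's splitters count tokens with a tokenizer and prefer sentence boundaries; the sketch below is a simplified sliding window over pre-split tokens (e.g. `text.split()`) to show the arithmetic, not necessarily what the server does. `chunk_tokens` is a hypothetical name.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 512,
                 chunk_overlap: int = 50) -> list[list[str]]:
    """Split tokens into windows of chunk_size that overlap by chunk_overlap tokens."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks
```

With the defaults, consecutive chunks start 462 tokens apart (512 - 50).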

`search`

Unified search with grep and/or RAG.

search(
    query="transformer attention mechanism",
    paper_dirs=["~/Downloads/paper1", "~/Downloads/paper2"],
    mode="hybrid",  # "grep", "rag", or "hybrid"
    top_k=5
)
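This README doesn't specify how hybrid mode merges grep and RAG hits. One common technique for combining ranked lists is reciprocal rank fusion (RRF), sketched below with the conventional constant k=60; treat it as an illustration of hybrid ranking, not as the server's implementation. Result identifiers are plain strings here (for example, hypothetical chunk IDs).

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], top_k: int = 5,
                           k: int = 60) -> list[str]:
    """Fuse several best-first result lists into one ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); items found by both modes add up.
            scores[item] += 1.0 / (k + rank)
    # sorted() is stable and dicts keep insertion order, so ties keep first appearance.
    return sorted(scores, key=lambda item: -scores[item])[:top_k]
```

For example, with grep hits `["a", "b", "c"]` and RAG hits `["b", "d"]`, item `b` ranks first because both modes found it.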

`get_paper_info`

Check processing status of a paper directory.

get_paper_info("~/Downloads/paper")
# Returns: has_markdown, has_index, has_embeddings, has_images,
#          images_dir, image_files, image_count,
#          version info, metadata
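The has_* flags returned by get_paper_info map naturally onto the directory layout described under Output Structure. Below is a rough, hypothetical stand-in using only the standard library; the image-suffix list and the "non-empty chroma/ means embeddings exist" rule are assumptions, and the real tool also reports version information.

```python
from pathlib import Path

# Assumed set of image formats; adjust to whatever the converter actually writes.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def paper_status(paper_dir: str) -> dict:
    """Report which of the documented artifacts exist in a processed paper directory."""
    root = Path(paper_dir).expanduser()
    images_dir = root / "images"
    image_files = []
    if images_dir.is_dir():
        image_files = sorted(
            p.name for p in images_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
        )
    chroma_dir = root / "chroma"
    return {
        "has_markdown": any(root.glob("*.md")),
        "has_index": (root / "index.json").is_file(),
        # Treat a missing or empty chroma/ directory as "no embeddings yet".
        "has_embeddings": chroma_dir.is_dir() and any(chroma_dir.iterdir()),
        "has_images": bool(image_files),
        "images_dir": str(images_dir) if images_dir.is_dir() else None,
        "image_files": image_files,
        "image_count": len(image_files),
    }
```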

Extracted Images

When PDFs contain images (figures, diagrams, etc.), they are automatically extracted to an images/ subdirectory. The agent using this MCP server can:

Check get_paper_info() to see if images exist and get the images_dir path
Access individual image files listed in image_files
Reference images from the converted markdown (images are linked in the .md file)
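Because the extracted images are linked from the converted Markdown, an agent or script can recover them by scanning for inline image syntax. This sketch assumes standard `![alt](path)` links with paths relative to the paper directory; reference-style links and HTML `<img>` tags are not handled, and `image_links` is a hypothetical helper.

```python
import re

# Inline Markdown image: ![alt](target) or ![alt](target "optional title")
IMAGE_LINK = re.compile(r'!\[([^\]]*)\]\(\s*<?([^)\s>]+)>?(?:\s+"[^"]*")?\s*\)')


def image_links(markdown: str) -> list[tuple[str, str]]:
    """Return (alt_text, target) pairs for inline image links, in document order."""
    return IMAGE_LINK.findall(markdown)
```

Resolving each target against the paper directory, e.g. `[paper_dir / target for _, target in image_links(text)]`, gives paths that can be checked against image_files from get_paper_info().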

Version Compatibility

Each processed paper directory includes a metadata.json file tracking:

paper_intelligence_version: Version used for processing
processed_at: Timestamp of processing
source_pdf: Original PDF filename
steps_completed: Which processing steps were run

When accessing papers, get_paper_info() checks version compatibility and warns if re-processing might be beneficial.
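The metadata fields listed above are enough to decide whether a paper may be stale. The exact rule get_paper_info applies isn't documented here, so the sketch below uses a hypothetical policy: flag papers processed by an older release or missing a required step. The step names ("convert", "index", "embed") are assumptions.

```python
import json
from pathlib import Path


def parse_version(v: str) -> tuple[int, ...]:
    """'0.4.1' -> (0, 4, 1); assumes plain numeric release versions."""
    return tuple(int(part) for part in v.split("."))


def needs_reprocessing(metadata: dict, current_version: str,
                       required_steps: set[str]) -> list[str]:
    """Return human-readable reasons why re-processing might be worthwhile."""
    reasons = []
    processed_with = metadata.get("paper_intelligence_version")
    if processed_with is None:
        reasons.append("no paper_intelligence_version recorded")
    elif parse_version(processed_with) < parse_version(current_version):
        reasons.append(f"processed with {processed_with}, current is {current_version}")
    missing = required_steps - set(metadata.get("steps_completed", []))
    if missing:
        reasons.append("missing steps: " + ", ".join(sorted(missing)))
    return reasons


def load_metadata(paper_dir: str) -> dict:
    """Read <paper_dir>/metadata.json."""
    return json.loads((Path(paper_dir).expanduser() / "metadata.json").read_text())
```

An empty list means nothing to flag. For versions with pre-release tags, `packaging.version.Version` is a more robust comparison than `parse_version`.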

How Search Uses index.json

The index.json file stores the header hierarchy extracted from the markdown. When you search:

Grep search: Uses index.json to provide header context for matches (e.g., "Methods > Data Collection")
RAG search: Returns semantic matches from the embedded chunks

The index enables fast header lookups without re-parsing the markdown on each search.
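The header-context lookup for grep matches can be sketched as follows: scan the Markdown line by line and, for each hit, binary-search the sorted header line numbers for the closest header at or above it. This assumes each index.json entry records the header's line number and its breadcrumb path (the real schema may differ); matching is case-insensitive here by choice, and `grep_with_context` is a hypothetical name.

```python
import bisect
import re


def grep_with_context(markdown: str, headers: list[dict], pattern: str) -> list[dict]:
    """Regex-search the markdown and label each hit with its enclosing header path.

    `headers` is a list of {"line": int, "path": str} entries sorted by line.
    """
    header_lines = [h["line"] for h in headers]
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if regex.search(line):
            # Index of the last header at or before this line (-1 if none).
            i = bisect.bisect_right(header_lines, lineno) - 1
            hits.append({
                "line": lineno,
                "text": line.strip(),
                "context": headers[i]["path"] if i >= 0 else "",
            })
    return hits
```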

Technical Stack

MCP: Official Python SDK with FastMCP
PDF Conversion: marker-pdf
Embeddings: LlamaIndex + HuggingFace (BAAI/bge-small-en-v1.5)
Vector Store: ChromaDB (persistent, local per-paper)
GPU: PyTorch with MPS (Apple Silicon) or CUDA support

Development

pip install -e ".[dev]"
pytest

License

MIT

Release history

0.4.1: May 7, 2026
0.4.0: Apr 15, 2026
0.3.0: Jan 3, 2026
0.2.0: Jan 2, 2026
0.1.1: Dec 31, 2025
0.1.0 (this version): Dec 31, 2025

Download files


Source Distribution

paper_intelligence-0.1.0.tar.gz (19.3 kB)

Uploaded Dec 31, 2025 Source

Built Distribution


paper_intelligence-0.1.0-py3-none-any.whl (23.3 kB)

Uploaded Dec 31, 2025 Python 3

File details

Details for the file paper_intelligence-0.1.0.tar.gz.

File metadata

Download URL: paper_intelligence-0.1.0.tar.gz
Upload date: Dec 31, 2025
Size: 19.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for paper_intelligence-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`1dbd46adea31f28f37ba7e99f1c982f9eda2a1cae3a021977c7cb116c5cca070`
MD5	`172a655b25a83881db30c24a9ba5c45b`
BLAKE2b-256	`d8e89cac2e1f1de4ed656d000ff2620455cc89d7f95d3634298d3268eb215b35`


Provenance

The following attestation bundles were made for paper_intelligence-0.1.0.tar.gz:

Publisher: publish.yml on Strand-AI/paper-intelligence

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: paper_intelligence-0.1.0.tar.gz
- Subject digest: 1dbd46adea31f28f37ba7e99f1c982f9eda2a1cae3a021977c7cb116c5cca070
- Sigstore transparency entry: 786060047
- Sigstore integration time: Dec 31, 2025
Source repository:
- Permalink: Strand-AI/paper-intelligence@2eede5e92e31b8440cc8e54d48f54858373aa6b7
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Strand-AI
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2eede5e92e31b8440cc8e54d48f54858373aa6b7
- Trigger Event: push

File details

Details for the file paper_intelligence-0.1.0-py3-none-any.whl.

File metadata

Download URL: paper_intelligence-0.1.0-py3-none-any.whl
Upload date: Dec 31, 2025
Size: 23.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for paper_intelligence-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c19f321971a1fe0b5ab6627e4f59b9d74214d15b2ed1dee612852a72aee562ce`
MD5	`9a45a4a202d66b08ec9788711269686d`
BLAKE2b-256	`5ae3307f9142b701e428658f042d32d5846e98d8ae0fa3600037cea19f539067`


Provenance

The following attestation bundles were made for paper_intelligence-0.1.0-py3-none-any.whl:

Publisher: publish.yml on Strand-AI/paper-intelligence

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: paper_intelligence-0.1.0-py3-none-any.whl
- Subject digest: c19f321971a1fe0b5ab6627e4f59b9d74214d15b2ed1dee612852a72aee562ce
- Sigstore transparency entry: 786060052
- Sigstore integration time: Dec 31, 2025
Source repository:
- Permalink: Strand-AI/paper-intelligence@2eede5e92e31b8440cc8e54d48f54858373aa6b7
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Strand-AI
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2eede5e92e31b8440cc8e54d48f54858373aa6b7
- Trigger Event: push
