MCP server enabling AI agents to efficiently search and understand PDF documents

Maintainers

odedf

Paper Intelligence MCP Server

A local MCP (Model Context Protocol) server that converts papers and other PDFs to Markdown, indexes their structure, and makes them searchable with retrieval-augmented generation (RAG).

Quick Start

Claude Code CLI:

claude mcp add paper-intelligence -- uvx paper-intelligence

VS Code:

code --add-mcp '{"name":"paper-intelligence","command":"uvx","args":["paper-intelligence"]}'

Features

PDF to Markdown: Convert PDFs using Marker with high accuracy
Header Indexing: Extract document structure into searchable JSON
Semantic Search: RAG-powered search using LlamaIndex + ChromaDB + HuggingFace embeddings
Hybrid Search: Combined grep (text/regex) + semantic search
GPU Acceleration: MPS (Apple Silicon) and CUDA support
Self-contained: Each paper gets its own directory with all data
Version Tracking: Metadata tracks which version processed each paper

Installation

Option 1: Install from PyPI (Recommended)

# Install with pip
pip install paper-intelligence

# Or run directly with uvx (no install needed)
uvx paper-intelligence

Option 2: Install from GitHub

# Install directly from GitHub (no clone needed)
pip install "paper-intelligence @ git+https://github.com/Strand-AI/paper-intelligence.git"

Option 3: Local Development

git clone https://github.com/Strand-AI/paper-intelligence.git
cd paper-intelligence

# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install in development mode
pip install -e ".[dev]"

# Run the server
python -m paper_intelligence.server

MCP Client Configuration

Claude Code CLI

The easiest way to add the server:

claude mcp add paper-intelligence -- uvx paper-intelligence

Verify installation:

claude mcp list

Claude Desktop

Add to your Claude Desktop config:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "paper-intelligence": {
      "command": "uvx",
      "args": ["paper-intelligence"]
    }
  }
}
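
If you would rather not edit the file by hand, the entry above can be merged in with a short script. This is a minimal sketch and not part of paper-intelligence: the helper name `add_server` is hypothetical, and it assumes the config file is plain JSON with an optional top-level `mcpServers` object.

```python
import json
from pathlib import Path


def add_server(config_path: Path, name: str, command: str, args: list[str]) -> dict:
    """Merge one MCP server entry into a JSON config, keeping existing entries."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8").strip()
        if text:  # tolerate an empty file
            config = json.loads(text)
    # Create the "mcpServers" object if it is missing, then add or overwrite our entry.
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": args}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `add_server(path, "paper-intelligence", "uvx", ["paper-intelligence"])` with one of the paths listed above should leave any other configured servers untouched. Back up the file first, since the script rewrites it.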

VS Code

One-liner install:

code --add-mcp '{"name":"paper-intelligence","command":"uvx","args":["paper-intelligence"]}'

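Or manually add the block below to your User Settings (JSON). In a workspace .vscode/mcp.json file, put the "servers" object at the top level, without the outer "mcp" key: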

{
  "mcp": {
    "servers": {
      "paper-intelligence": {
        "command": "uvx",
        "args": ["paper-intelligence"]
      }
    }
  }
}

Cursor

1. Go to Settings → MCP → Add new MCP Server
2. Select the command type
3. Enter: uvx paper-intelligence

Or add to your Cursor MCP config:

{
  "mcpServers": {
    "paper-intelligence": {
      "command": "uvx",
      "args": ["paper-intelligence"]
    }
  }
}

Windsurf

Add to your Windsurf MCP configuration:

{
  "mcpServers": {
    "paper-intelligence": {
      "command": "uvx",
      "args": ["paper-intelligence"]
    }
  }
}

Output Structure

Processing ~/Downloads/paper.pdf creates a sibling directory, ~/Downloads/paper/:

paper/
├── paper.md        # Converted markdown
├── metadata.json   # Processing version and info
├── index.json      # Header hierarchy (for search context)
├── chroma/         # Embeddings database
└── images/         # Extracted images (if any)
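
The layout above can be expressed as a small path helper, which is handy when a script needs to know where an artifact should be before calling the server. This is a sketch under stated assumptions: `expected_layout` and `missing_artifacts` are hypothetical names, not functions exported by paper-intelligence, and the Markdown file is assumed to be named after the PDF stem as in the example tree.

```python
from pathlib import Path


def expected_layout(pdf_path: str) -> dict[str, Path]:
    """Map each artifact name to the path where the layout above would place it."""
    pdf = Path(pdf_path).expanduser()
    out_dir = pdf.with_suffix("")  # ~/Downloads/paper.pdf -> ~/Downloads/paper
    return {
        "output_dir": out_dir,
        "markdown": out_dir / f"{pdf.stem}.md",
        "metadata": out_dir / "metadata.json",
        "index": out_dir / "index.json",
        "chroma": out_dir / "chroma",
        "images": out_dir / "images",
    }


def missing_artifacts(pdf_path: str) -> list[str]:
    """List artifacts not on disk yet; images/ is optional, so it is not checked."""
    layout = expected_layout(pdf_path)
    required = ("markdown", "metadata", "index", "chroma")
    return [name for name in required if not layout[name].exists()]
```

An empty list from `missing_artifacts("~/Downloads/paper.pdf")` suggests the full pipeline has already run for that paper.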

MCP Tools

`process_paper`

Full pipeline: Convert PDF, index headers, and create embeddings.

process_paper(
    pdf_path="~/Downloads/paper.pdf",
    use_llm=False,      # Set True for enhanced accuracy
    chunk_size=512,
    chunk_overlap=50
)
# Returns: output_dir, markdown_path, images_dir (if images extracted), image_count

`convert_pdf`

Convert a PDF file to Markdown.

convert_pdf(
    pdf_path="~/Downloads/paper.pdf",
    output_dir=None,  # Defaults to ~/Downloads/paper/
    use_llm=False
)
# Returns: markdown_path, images_dir (if images extracted), image_count

`index_markdown`

Extract header hierarchy into searchable JSON.

index_markdown(
    markdown_path="~/Downloads/paper/paper.md"
)
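
The schema of index.json is not documented here, so the following is only an illustration of what extracting a header hierarchy can look like. The function name `build_header_index` and the fields `level`, `title`, `line`, and `path` are assumptions made for this sketch, not the server's actual output format.

```python
import re

HEADER_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def build_header_index(markdown: str) -> list[dict]:
    """Return one entry per ATX header with its level, line number, and breadcrumb path."""
    entries = []
    stack = []  # (level, title) pairs for the headers enclosing the current position
    in_fence = False
    for line_no, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # '#' lines inside fenced code are not headers
            continue
        if in_fence:
            continue
        match = HEADER_RE.match(line)
        if not match:
            continue
        level, title = len(match.group(1)), match.group(2)
        # Drop headers at the same or a deeper level before pushing this one.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, title))
        entries.append({
            "level": level,
            "title": title,
            "line": line_no,
            "path": " > ".join(t for _, t in stack),
        })
    return entries
```

The `path` breadcrumb is what makes a later lookup cheap: a match on any line can be labelled with its enclosing section without re-parsing the Markdown.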

`embed_document`

Create embeddings for semantic search.

embed_document(
    markdown_path="~/Downloads/paper/paper.md",
    chunk_size=512,
    chunk_overlap=50
)
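
To see how chunk_size and chunk_overlap interact, here is a simplified sliding-window splitter. It is a sketch, not the server's implementation: it counts whitespace-separated words rather than model tokens, and `chunk_words` is a hypothetical name. The real splitting happens inside the server's LlamaIndex pipeline, whose details are not described in this README.

```python
def chunk_words(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size words, each sharing chunk_overlap words with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

With the defaults, each window advances by 462 words, so a larger overlap yields more chunks and more redundancy, while a smaller one risks cutting a passage in half at a chunk boundary.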

`search`

Unified search with grep and/or RAG.

search(
    query="transformer attention mechanism",
    paper_dirs=["~/Downloads/paper1", "~/Downloads/paper2"],
    mode="hybrid",  # "grep", "rag", or "hybrid"
    top_k=5
)

`get_paper_info`

Check processing status of a paper directory.

get_paper_info("~/Downloads/paper")
# Returns: has_markdown, has_index, has_embeddings, has_images,
#          images_dir, image_files, image_count,
#          version info, metadata

Extracted Images

When PDFs contain images (figures, diagrams, etc.), they are automatically extracted to an images/ subdirectory. The agent using this MCP server can:

Check get_paper_info() to see if images exist and get the images_dir path
Access individual image files listed in image_files
Reference images from the converted markdown (images are linked in the .md file)

Version Compatibility

Each processed paper directory includes a metadata.json file tracking:

paper_intelligence_version: Version used for processing
processed_at: Timestamp of processing
source_pdf: Original PDF filename
steps_completed: Which processing steps were run

When accessing papers, get_paper_info() checks version compatibility and warns if re-processing might be beneficial.

How Search Uses index.json

The index.json file stores the header hierarchy extracted from the markdown. When you search:

Grep search: Uses index.json to provide header context for matches (e.g., "Methods > Data Collection")
RAG search: Returns semantic matches from the embedded chunks

The index enables fast header lookups without re-parsing the markdown on each search.
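
The grep side of this can be sketched as follows. The code assumes a header list sorted by line number where each entry has a 1-based `line` and a `path` breadcrumb; that schema and the name `grep_with_context` are assumptions for illustration, not the documented format of index.json.

```python
import bisect
import re


def grep_with_context(markdown: str, pattern: str, headers: list[dict]) -> list[dict]:
    """Find regex matches line by line and label each with the nearest preceding header."""
    header_lines = [h["line"] for h in headers]  # must be sorted ascending
    regex = re.compile(pattern, re.IGNORECASE)
    results = []
    for line_no, line in enumerate(markdown.splitlines(), start=1):
        if not regex.search(line):
            continue
        # Index of the last header whose line number is <= the matching line.
        pos = bisect.bisect_right(header_lines, line_no) - 1
        context = headers[pos]["path"] if pos >= 0 else ""
        results.append({"line": line_no, "text": line.strip(), "context": context})
    return results
```

Because the header positions are precomputed, each match costs one binary search instead of a scan back through the document.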

Technical Stack

MCP: Official Python SDK with FastMCP
PDF Conversion: marker-pdf
Embeddings: LlamaIndex + HuggingFace (BAAI/bge-small-en-v1.5)
Vector Store: ChromaDB (persistent, local per-paper)
GPU: PyTorch with MPS (Apple Silicon) or CUDA support
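
Device selection with PyTorch usually comes down to a few availability checks. The sketch below is not taken from the server's code; `pick_device` is a hypothetical helper, and the preference order (CUDA, then MPS, then CPU) is an assumption.

```python
def pick_device() -> str:
    """Prefer CUDA, then Apple's MPS backend, and fall back to the CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"  # without PyTorch there is nothing to accelerate
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no MPS backend, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

Running `print(pick_device())` in the same environment as the server is a quick way to check which accelerator PyTorch can see.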

Development

pip install -e ".[dev]"

# Run unit tests (fast)
pytest tests/test_markdown_parser.py

# Run integration tests (slow, requires ML models)
pytest tests/test_integration.py -v

To use your local development version with MCP clients, replace uvx paper-intelligence with:

python -m paper_intelligence.server

Debugging

Use the MCP Inspector to debug the server:

npx @modelcontextprotocol/inspector uvx paper-intelligence

Troubleshooting

Server not starting?

Ensure Python 3.11+ is installed
Try uvx paper-intelligence directly to see error messages
Check that all dependencies installed correctly

Windows encoding issues? Add an "env" entry to the server block in your config, alongside "command" and "args":

"env": {
  "PYTHONIOENCODING": "utf-8"
}

Claude Desktop not detecting changes? Claude Desktop only reads configuration on startup. Fully restart the app after config changes.

License

MIT

Release history

0.4.1: May 7, 2026
0.4.0: Apr 15, 2026
0.3.0: Jan 3, 2026
0.2.0: Jan 2, 2026
0.1.1: Dec 31, 2025 (this version)
0.1.0: Dec 31, 2025

Download files

Source Distribution

paper_intelligence-0.1.1.tar.gz (735.8 kB), uploaded Dec 31, 2025, Source

Built Distribution

paper_intelligence-0.1.1-py3-none-any.whl (24.1 kB), uploaded Dec 31, 2025, Python 3

File details

Details for the file paper_intelligence-0.1.1.tar.gz.

File metadata

Download URL: paper_intelligence-0.1.1.tar.gz
Upload date: Dec 31, 2025
Size: 735.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for paper_intelligence-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`27098a811f8681992a706a464d667bb43cd82ad63a5a95e706c45567d507c1b5`
MD5	`85f40a55272e5859902fff4ae0c55658`
BLAKE2b-256	`746df42ed7921613bbf76b45ddc9ae22a9059cb1fe879250134473affc14ace7`


Provenance

The following attestation bundles were made for paper_intelligence-0.1.1.tar.gz:

Publisher: publish.yml on Strand-AI/paper-intelligence

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: paper_intelligence-0.1.1.tar.gz
- Subject digest: 27098a811f8681992a706a464d667bb43cd82ad63a5a95e706c45567d507c1b5
- Sigstore transparency entry: 786248480
- Sigstore integration time: Dec 31, 2025
Source repository:
- Permalink: Strand-AI/paper-intelligence@f684f331b3f45530a9ae8c69c38e7bf645d32bdc
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/Strand-AI
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f684f331b3f45530a9ae8c69c38e7bf645d32bdc
- Trigger Event: push

File details

Details for the file paper_intelligence-0.1.1-py3-none-any.whl.

File metadata

Download URL: paper_intelligence-0.1.1-py3-none-any.whl
Upload date: Dec 31, 2025
Size: 24.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for paper_intelligence-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`44c60bba70951c27d286c5fadecdf1716a984ccd1ae94bced86c963c68d2e91f`
MD5	`2df239433450f268e5e2414330cfb5a8`
BLAKE2b-256	`ef065aa2a5acd410e258f85f675b14ac578b4493b0ee99e44c98c69da505c006`


Provenance

The following attestation bundles were made for paper_intelligence-0.1.1-py3-none-any.whl:

Publisher: publish.yml on Strand-AI/paper-intelligence

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: paper_intelligence-0.1.1-py3-none-any.whl
- Subject digest: 44c60bba70951c27d286c5fadecdf1716a984ccd1ae94bced86c963c68d2e91f
- Sigstore transparency entry: 786248486
- Sigstore integration time: Dec 31, 2025
Source repository:
- Permalink: Strand-AI/paper-intelligence@f684f331b3f45530a9ae8c69c38e7bf645d32bdc
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/Strand-AI
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f684f331b3f45530a9ae8c69c38e7bf645d32bdc
- Trigger Event: push
