pdf2mcp

██████╗ ██████╗ ███████╗██████╗ ███╗   ███╗ ██████╗██████╗
██╔══██╗██╔══██╗██╔════╝╚════██╗████╗ ████║██╔════╝██╔══██╗
██████╔╝██║  ██║█████╗   █████╔╝██╔████╔██║██║     ██████╔╝
██╔═══╝ ██║  ██║██╔══╝  ██╔═══╝ ██║╚██╔╝██║██║     ██╔═══╝
██║     ██████╔╝██║     ███████╗██║ ╚═╝ ██║╚██████╗██║
╚═╝     ╚═════╝ ╚═╝     ╚══════╝╚═╝     ╚═╝ ╚═════╝╚═╝

License: MIT. Requires Python 3.10+.

Turn any PDF folder into a searchable MCP server with semantic search.

Installation

From PyPI (recommended)

pip install pdf2mcp

Or with uv:

uv tool install pdf2mcp

From source

git clone https://github.com/iSamBa/pdf2mcp.git
uv tool install ./pdf2mcp

To update after pulling new changes:

uv tool install --force ./pdf2mcp

Verify

pdf2mcp --version

Quick Start

# 1. Scaffold a project (creates docs/ and .env)
pdf2mcp init ./my-project
cd my-project

# 2. Add your PDFs to docs/ and set OPENAI_API_KEY in .env

# 3. Ingest
pdf2mcp ingest

# 4. Start the server
pdf2mcp serve

# 5. Get config snippets for your MCP client
pdf2mcp config

Architecture

pdf2mcp separates server and client concerns:

  • Server (pdf2mcp serve) — runs independently, handles PDF ingestion, embedding, and search. Configured via PDF2MCP_* environment variables.
  • Client (Claude Code, Cursor, VS Code, etc.) — connects to a running server over HTTP. Only needs the server URL.

The default transport is streamable-http. The server listens on http://127.0.0.1:8000/mcp and shuts down gracefully on SIGINT/SIGTERM.
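Because the client only needs the server URL, a quick pre-flight check can confirm that something is listening before you point a client at it. The sketch below is not part of pdf2mcp; `endpoint_reachable` is a hypothetical helper that only tests TCP reachability of the host and port in the URL, and it does not speak the MCP protocol.

```python
import socket
from urllib.parse import urlsplit


def endpoint_reachable(url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    host = parts.hostname or "127.0.0.1"
    # Fall back to the scheme's conventional port when the URL omits one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unresolvable.
        return False
```

Calling `endpoint_reachable("http://127.0.0.1:8000/mcp")` while `pdf2mcp serve` is running with default settings should return `True`; a `False` result usually means the server is not started or is bound to a different host or port.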

Commands

| Command | Description |
| --- | --- |
| pdf2mcp init [dir] | Scaffold a working directory with docs/ and .env |
| pdf2mcp ingest | Parse PDFs, chunk, embed, and store in vector DB |
| pdf2mcp serve | Start the MCP server (HTTP by default) |
| pdf2mcp config | Print ready-to-paste config for MCP clients |
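
To make the "chunk" step of ingestion concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an illustration only, not pdf2mcp's actual chunker: `chunk_tokens` is a hypothetical name, and the tokens are assumed to be an already-tokenized list (a real pipeline would use the embedding model's tokenizer). The defaults mirror PDF2MCP_CHUNK_SIZE and PDF2MCP_CHUNK_OVERLAP.

```python
def chunk_tokens(
    tokens: list[str], chunk_size: int = 500, overlap: int = 50
) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far each window advances
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input,
        # so no trailing chunk is made up purely of overlap.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

With the defaults, each window advances by 450 tokens, so a 1200-token document yields three chunks, the last one shorter than the others.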

Common Flags

# Override docs directory
pdf2mcp ingest --docs-dir ./my-pdfs
pdf2mcp serve --docs-dir ./my-pdfs

# Use stdio transport (for clients that spawn the server)
pdf2mcp serve --transport stdio

# Custom host/port
pdf2mcp serve --host 0.0.0.0 --port 9000

# Custom server name
pdf2mcp serve --name my-docs

# Config for a specific client
pdf2mcp config --client cursor
pdf2mcp config --client claude-desktop --transport stdio

Client Configuration

pdf2mcp config generates ready-to-paste JSON for all supported clients. The default is HTTP — clients just need the server URL:

{
  "mcpServers": {
    "pdf-docs": {
      "type": "http",
      "url": "http://127.0.0.1:8000/mcp"
    }
  }
}

| Client | Config File | Top-level Key | HTTP Support |
| --- | --- | --- | --- |
| Claude Code | .mcp.json | mcpServers | Yes |
| Claude Desktop | claude_desktop_config.json | mcpServers | No (stdio only) |
| Cursor | .cursor/mcp.json | mcpServers | Yes |
| VS Code / Copilot | .vscode/mcp.json | servers | Yes |
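
The differences in the table come down to the top-level key and whether HTTP is available. The following sketch encodes those rules; it is not how `pdf2mcp config` is implemented. The function name `http_config` and the client identifiers `claude-code` and `vscode` are assumptions made for illustration (only `cursor` and `claude-desktop` appear as `--client` values in this README).

```python
# Top-level key each client expects, taken from the table above.
TOP_LEVEL_KEY = {
    "claude-code": "mcpServers",     # identifier is an assumption
    "claude-desktop": "mcpServers",
    "cursor": "mcpServers",
    "vscode": "servers",             # identifier is an assumption
}

# Clients the table lists as supporting HTTP.
HTTP_CAPABLE = {"claude-code", "cursor", "vscode"}


def http_config(
    client: str,
    url: str = "http://127.0.0.1:8000/mcp",
    name: str = "pdf-docs",
) -> dict:
    """Build an HTTP client config dict for the given client."""
    if client not in TOP_LEVEL_KEY:
        raise ValueError(f"unknown client: {client}")
    if client not in HTTP_CAPABLE:
        raise ValueError(f"{client} is stdio only; use --transport stdio")
    return {TOP_LEVEL_KEY[client]: {name: {"type": "http", "url": url}}}
```

Passing the result to `json.dumps(..., indent=2)` gives a snippet in the same shape as the JSON shown above, with `servers` instead of `mcpServers` for VS Code.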

Use --transport stdio for clients that need to spawn the server process (e.g., Claude Desktop):

{
  "mcpServers": {
    "pdf-docs": {
      "command": "uv",
      "args": ["run", "pdf2mcp", "serve", "--transport", "stdio"]
    }
  }
}
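
If the client's config file already lists other servers, pasting a whole snippet over it would drop them. A small merge step avoids that. This is a sketch under stated assumptions: `merge_server` and `update_config_file` are hypothetical helpers, not pdf2mcp commands, and `entry` stands for whatever server entry `pdf2mcp config` printed for your client.

```python
import copy
import json
from pathlib import Path


def merge_server(
    config: dict, name: str, entry: dict, top_key: str = "mcpServers"
) -> dict:
    """Return a copy of config with one server entry added or replaced."""
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    merged.setdefault(top_key, {})[name] = entry
    return merged


def update_config_file(
    path: Path, name: str, entry: dict, top_key: str = "mcpServers"
) -> None:
    """Merge one server entry into a JSON config file, creating it if absent."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = merge_server(config, name, entry, top_key)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
```

For VS Code, pass `top_key="servers"` to match the key that client expects.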

Environment Variables

Server settings (PDF2MCP_*)

These configure the server process. OPENAI_API_KEY is the only variable here without the PDF2MCP_ prefix. MCP clients that connect over HTTP never need any of them.

| Variable | Default | Description |
| --- | --- | --- |
| OPENAI_API_KEY | (required) | OpenAI API key for embeddings |
| PDF2MCP_OPENAI_BASE_URL | https://api.openai.com/v1 | OpenAI API base URL (for Azure, local proxies, or compatible providers) |
| PDF2MCP_DOCS_DIR | docs | Directory containing PDF files |
| PDF2MCP_DATA_DIR | data | Directory for vector database |
| PDF2MCP_EMBEDDING_MODEL | text-embedding-3-small | OpenAI embedding model |
| PDF2MCP_CHUNK_SIZE | 500 | Target chunk size in tokens |
| PDF2MCP_CHUNK_OVERLAP | 50 | Overlap between chunks in tokens |
| PDF2MCP_DEFAULT_NUM_RESULTS | 5 | Default search results count |
| PDF2MCP_SERVER_NAME | pdf-docs | MCP server name |
| PDF2MCP_SERVER_TRANSPORT | streamable-http | Transport protocol |
| PDF2MCP_SERVER_HOST | 127.0.0.1 | Host to bind to |
| PDF2MCP_SERVER_PORT | 8000 | Port to bind to |
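
The table can be read as "documented default unless the environment overrides it". The sketch below shows that resolution rule in plain Python. It is not pdf2mcp's own settings code: the `ServerSettings` class and `from_env` method are hypothetical, and OPENAI_API_KEY is left out because it has no prefix and no default.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerSettings:
    # Defaults copied from the table above.
    openai_base_url: str = "https://api.openai.com/v1"
    docs_dir: str = "docs"
    data_dir: str = "data"
    embedding_model: str = "text-embedding-3-small"
    chunk_size: int = 500
    chunk_overlap: int = 50
    default_num_results: int = 5
    server_name: str = "pdf-docs"
    server_transport: str = "streamable-http"
    server_host: str = "127.0.0.1"
    server_port: int = 8000

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "ServerSettings":
        """Resolve each field from PDF2MCP_<FIELD_NAME>, else keep the default."""
        source = os.environ if env is None else env
        overrides = {}
        for f in fields(cls):
            raw = source.get(f"PDF2MCP_{f.name.upper()}")
            if raw is not None:
                # Integer-valued settings arrive as strings in the environment.
                overrides[f.name] = int(raw) if isinstance(f.default, int) else raw
        return cls(**overrides)
```

For example, `ServerSettings.from_env({"PDF2MCP_SERVER_PORT": "9000"})` keeps every default except the port.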

Client settings (PDF2MCP_CLIENT_*)

These configure how a client connects to the server. No secrets needed.

| Variable | Default | Description |
| --- | --- | --- |
| PDF2MCP_CLIENT_SERVER_NAME | pdf-docs | Server name in client config |
| PDF2MCP_CLIENT_SERVER_URL | http://127.0.0.1:8000/mcp | Server URL |
| PDF2MCP_CLIENT_TRANSPORT | streamable-http | Transport protocol |

MCP Tools

The server exposes six tools:

| Tool | Description |
| --- | --- |
| search_docs(query) | Semantic search across all ingested PDFs |
| search_in_doc(query, filename) | Semantic search scoped to a single document |
| list_docs() | List all ingested documents with chunk counts |
| get_sections(filename) | Get section headings for a specific document |
| read_page(filename, page) | Read the full content of a specific page |
| read_section(filename, section_title) | Read the full content of a named section |
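
For readers new to semantic search, the idea behind search_docs and search_in_doc can be sketched as ranking stored chunk embeddings by similarity to the query embedding, optionally restricted to one file. This is a conceptual illustration only: the `Chunk` class and `search` function are hypothetical, cosine similarity is an assumed metric, and the README does not say how pdf2mcp's vector database scores results.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    filename: str
    text: str
    vector: list[float]  # embedding of `text`


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(
    query_vec: list[float],
    chunks: list[Chunk],
    k: int = 5,
    filename: Optional[str] = None,
) -> list[Chunk]:
    """Return the k chunks most similar to query_vec, optionally from one file."""
    candidates = [c for c in chunks if filename is None or c.filename == filename]
    ranked = sorted(candidates, key=lambda c: cosine(query_vec, c.vector), reverse=True)
    return ranked[:k]
```

Leaving `filename` unset corresponds to searching across all documents, while setting it corresponds to the single-document variant. The default `k` mirrors PDF2MCP_DEFAULT_NUM_RESULTS.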

Typical workflow

  1. list_docs — discover available documents
  2. get_sections — browse a document's structure
  3. read_section or read_page — read specific content
  4. search_docs or search_in_doc — find information by query

Development

git clone https://github.com/iSamBa/pdf2mcp.git
cd pdf2mcp
uv sync --all-extras
uv run pytest
uv run ruff check src/
uv run mypy src/

License

MIT
