MCP server for extracting text, images, tables, links, annotations, and metadata from PDF files

pdf-reader-mcp

A feature-rich MCP server that lets LLM clients read and analyze PDF files.

Features

Tool	Description
`get_pdf_info`	Read document metadata, page count, file size, and encryption status
`read_pdf_as_text`	Extract text from selected pages
`read_pdf_as_images`	Render selected pages as base64-encoded images
`get_pdf_outline`	Read the bookmark and outline (table of contents) structure
`search_pdf_text`	Search text and return matches per page with surrounding context
`extract_pdf_tables`	Extract tables as structured data when they can be detected
`extract_pdf_images`	Extract images embedded in the PDF
`get_pdf_page_info`	Inspect a single page's dimensions, text, images, and links
`extract_pdf_links`	Extract external URLs and internal page jumps
`get_pdf_annotations`	Read comments, highlights, and other annotations
`get_pdf_text_stats`	Compute text, line, and paragraph counts and the likelihood that the PDF is scanned
`compare_pdf_pages`	Compare the text similarity of two pages
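Each tool above is invoked through a standard MCP `tools/call` request. Below is a minimal Python sketch of that JSON-RPC 2.0 message using only the standard library. The helper name `build_tool_call` and the argument name `file_path` are hypothetical: this README does not document the tools' input schemas, so a client should read the real ones from the server's `tools/list` response.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a single line of JSON-RPC 2.0."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # Compact separators keep the message on one line; the stdio transport
    # delimits messages with newlines, so none may appear inside a message.
    return json.dumps(message, separators=(",", ":"), ensure_ascii=False)


# Hypothetical argument name: check the tool's inputSchema from tools/list.
request = build_tool_call(1, "get_pdf_info", {"file_path": "/tmp/report.pdf"})
print(request)
```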

Why this project

Many LLM workflows need more than raw text extraction: they also need structure such as outlines, tables, images, annotations, and links.

This server provides a unified MCP interface for:

text-heavy PDFs
scanned or layout-sensitive PDFs
table and image extraction
metadata and structure inspection
annotation and link analysis

Installation

Prerequisites

Python 3.10+
uv or another Python environment manager

Install uv:

curl -LsSf https://astral.sh/uv/install.sh | sh

Windows PowerShell:

irm https://astral.sh/uv/install.ps1 | iex

Install from PyPI

With the package published on PyPI, you can run it directly with uvx:

uvx pdf-insight-mcp

Alternatively, install the pdf-insight-mcp package first and then run its pdf-reader-mcp command:

python -m pip install pdf-insight-mcp
pdf-reader-mcp

Local development setup

uv sync

Run the server

uv run pdf-reader-mcp
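When an MCP client launches the server this way, it exchanges JSON-RPC messages with the process over stdin/stdout. The sketch below shows the opening messages a client writes, framed as newline-delimited JSON as the MCP stdio transport expects. The protocol revision string, the client name, and the helper names are assumptions for illustration, and a real client must wait for the `initialize` response before sending the remaining messages.

```python
import json
from typing import Any

# Assumption: one published MCP protocol revision; a real client sends the
# newest revision it supports, and the server may reply with another one.
PROTOCOL_VERSION = "2024-11-05"


def frame(message: dict[str, Any]) -> bytes:
    """Encode one JSON-RPC message for the stdio transport (one line each)."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def handshake_messages(client_name: str = "example-client") -> list[bytes]:
    """Messages a client writes to the server's stdin before listing tools."""
    return [
        frame({
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": PROTOCOL_VERSION,
                "capabilities": {},
                "clientInfo": {"name": client_name, "version": "0.0.1"},
            },
        }),
        # A notification has no "id", so the server sends no response to it.
        frame({"jsonrpc": "2.0", "method": "notifications/initialized"}),
        frame({"jsonrpc": "2.0", "id": 2, "method": "tools/list"}),
    ]


for line in handshake_messages():
    print(line.decode("utf-8"), end="")
```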

Configure in an MCP client

Example configuration using the published PyPI package:

{
  "mcpServers": {
    "pdf-reader": {
      "command": "uvx",
      "args": ["pdf-insight-mcp"]
    }
  }
}

Example configuration for a local checkout:

{
  "mcpServers": {
    "pdf-reader": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/pdf-reader-mcp",
        "run",
        "pdf-reader-mcp"
      ]
    }
  }
}

Replace /absolute/path/to/pdf-reader-mcp with the path to your local repository.
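The two configurations differ only in `command` and `args`. A small Python sketch that builds either entry follows; the function name `mcp_config` is hypothetical, and which settings file the resulting JSON belongs in depends on the MCP client.

```python
import json
from typing import Optional


def mcp_config(repo_path: Optional[str] = None, name: str = "pdf-reader") -> dict:
    """Build an mcpServers entry for this server.

    Without repo_path, run the published package through uvx; with it, run
    the local checkout via `uv --directory <repo_path> run pdf-reader-mcp`.
    """
    if repo_path is None:
        server = {"command": "uvx", "args": ["pdf-insight-mcp"]}
    else:
        server = {
            "command": "uv",
            "args": ["--directory", repo_path, "run", "pdf-reader-mcp"],
        }
    return {"mcpServers": {name: server}}


print(json.dumps(mcp_config(), indent=2))
print(json.dumps(mcp_config("/absolute/path/to/pdf-reader-mcp"), indent=2))
```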

Release

Recommended release path:

Publish the Python package to PyPI
Publish server.json to the official MCP Registry

The recommended automation is GitHub Actions with PyPI Trusted Publishing (OIDC) and MCP Registry GitHub OIDC authentication.

Typical release flow:

# 1. Bump the version in pyproject.toml and server.json
# 2. Commit the changes
git commit -am "Release v0.2.0"

# 3. Create the tag
git tag v0.2.0

# 4. Push the branch and the tag
git push origin main --tags

On v* tags, the release workflow will:

run tests
build the sdist and wheel
run twine check
publish to PyPI
publish to the MCP Registry
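Step 1 of the release flow requires the version in pyproject.toml, the version in server.json, and the tag to agree. Below is a hypothetical pre-tag check in Python. It assumes server.json has a top-level `version` field plus an optional `packages` list whose entries carry their own `version`, as in the MCP Registry server.json format; verify this against the schema revision you publish with. It uses `tomllib` (Python 3.11+) and falls back to the `tomli` backport on Python 3.10.

```python
import json

try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # Python 3.10: pip install tomli
    import tomli as tomllib


def release_mismatches(pyproject_text: str, server_json_text: str, tag: str) -> list[str]:
    """Return human-readable problems; an empty list means everything agrees."""
    version = tomllib.loads(pyproject_text)["project"]["version"]
    server = json.loads(server_json_text)
    problems = []
    if tag != f"v{version}":
        problems.append(f"tag {tag} does not match pyproject version {version}")
    if server.get("version") != version:
        problems.append(f"server.json version {server.get('version')} != {version}")
    # Assumption: each package entry repeats the version it points at.
    for pkg in server.get("packages", []):
        if pkg.get("version") != version:
            problems.append(
                f"server.json package {pkg.get('identifier')} version "
                f"{pkg.get('version')} != {version}"
            )
    return problems


# Illustrative inputs, not the project's real files.
pyproject = '[project]\nname = "pdf-insight-mcp"\nversion = "0.2.1"\n'
server = json.dumps(
    {"version": "0.2.1", "packages": [{"identifier": "pdf-insight-mcp", "version": "0.2.1"}]}
)
print(release_mismatches(pyproject, server, "v0.2.1"))
```

Running a check like this in CI before the publish steps fails the release early instead of after a partial upload.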

Response size and large-PDF notes

read_pdf_as_images returns base64-encoded images, so responses grow very quickly.
Image rendering remains limited to 20 pages per call.
read_pdf_as_text now defaults to at most 50 pages and 200000 characters; output beyond either limit is truncated and a warning is attached.
read_pdf_as_images now defaults to a total payload cap of about 20MB; once the cap is reached, rendering stops early and a warning is attached.
For scanned PDFs, request small page ranges, lower the dpi, use jpeg, and lower the quality.
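These limits interact through base64 arithmetic: encoding turns every 3 raw bytes into 4 characters, so each image grows by about a third. The budgeting sketch below is a rough model, not the server's code. The exact cap (20 MiB here), whether the server counts encoded or raw bytes, and the per-page sizes in the example are all assumptions, and `truncate_text` only mirrors the documented text limits.

```python
MAX_IMAGE_PAGES = 20          # documented per-call page limit
PAYLOAD_CAP = 20 * 1024 * 1024  # assumption: "about 20MB" taken as 20 MiB
MAX_TEXT_PAGES = 50           # documented default
MAX_TEXT_CHARS = 200_000      # documented default


def base64_len(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of raw_bytes bytes."""
    return (raw_bytes + 2) // 3 * 4


def pages_within_budget(page_sizes: list[int], cap: int = PAYLOAD_CAP) -> int:
    """Count leading pages whose encoded total stays within cap."""
    total = 0
    for count, size in enumerate(page_sizes[:MAX_IMAGE_PAGES]):
        total += base64_len(size)
        if total > cap:
            return count  # this page would overflow, so stop before it
    return min(len(page_sizes), MAX_IMAGE_PAGES)


def truncate_text(pages: list[str]) -> tuple[str, list[str]]:
    """Apply the page and character limits, collecting warnings."""
    warnings = []
    if len(pages) > MAX_TEXT_PAGES:
        warnings.append(f"only the first {MAX_TEXT_PAGES} of {len(pages)} pages were read")
        pages = pages[:MAX_TEXT_PAGES]
    text = "\n".join(pages)
    if len(text) > MAX_TEXT_CHARS:
        warnings.append(f"text truncated to {MAX_TEXT_CHARS} characters")
        text = text[:MAX_TEXT_CHARS]
    return text, warnings


png_pages = [2_500_000] * 20  # hypothetical high-dpi PNG page sizes in bytes
jpeg_pages = [300_000] * 20   # hypothetical lower-dpi JPEG page sizes in bytes
print(pages_within_budget(png_pages), pages_within_budget(jpeg_pages))  # prints "6 20"
```

Under these assumptions, the same 20-page request fits only 6 PNG pages but all 20 JPEG pages, which is why the notes recommend lower dpi, jpeg, and lower quality for scanned documents.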

Development

Install dev dependencies:

uv sync --extra dev

Run tests:

uv run pytest

Tech stack

Python 3.10+
MCP Python SDK
PyMuPDF

License

MIT

Project details

Maintainers

xwell

Release history

0.2.1 (this version): Jun 11, 2026
0.2.0: Jun 11, 2026

Download files

Source Distribution

pdf_insight_mcp-0.2.1.tar.gz (68.3 kB)

Uploaded Jun 11, 2026 Source

Built Distribution

pdf_insight_mcp-0.2.1-py3-none-any.whl (14.7 kB)

Uploaded Jun 11, 2026 Python 3

File details

Details for the file pdf_insight_mcp-0.2.1.tar.gz.

File metadata

Download URL: pdf_insight_mcp-0.2.1.tar.gz
Upload date: Jun 11, 2026
Size: 68.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pdf_insight_mcp-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`72c5b9dbb9c38a19349d96e824212c597ea84173ef11ee938c48faec30bbe32b`
MD5	`33a10c8eed2d82ca570edeafd2eea21b`
BLAKE2b-256	`47263f89a236b5ff6db96727f4945795c72813cb3f144f00c7f4361a512882cd`

Provenance

The following attestation bundles were made for pdf_insight_mcp-0.2.1.tar.gz:

Publisher: publish.yml on Xvvln/pdf-reader-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pdf_insight_mcp-0.2.1.tar.gz
- Subject digest: 72c5b9dbb9c38a19349d96e824212c597ea84173ef11ee938c48faec30bbe32b
- Sigstore transparency entry: 1790602488
- Sigstore integration time: Jun 11, 2026
Source repository:
- Permalink: Xvvln/pdf-reader-mcp@e478b3fd585f6c3a2d1b487fa564de3566c7944d
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/Xvvln
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e478b3fd585f6c3a2d1b487fa564de3566c7944d
- Trigger Event: push

File details

Details for the file pdf_insight_mcp-0.2.1-py3-none-any.whl.

File metadata

Download URL: pdf_insight_mcp-0.2.1-py3-none-any.whl
Upload date: Jun 11, 2026
Size: 14.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pdf_insight_mcp-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2931e5baf3bfd1e389cc2e44887c37d80e3a376983f25106b402c04029fa2257`
MD5	`7c72c5ec73b8c7a210c53dc8c95af2b8`
BLAKE2b-256	`54e2f5b1eeb45c3afe2993adbee510e25bd4342be614d9166d031e44eee09806`

Provenance

The following attestation bundles were made for pdf_insight_mcp-0.2.1-py3-none-any.whl:

Publisher: publish.yml on Xvvln/pdf-reader-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pdf_insight_mcp-0.2.1-py3-none-any.whl
- Subject digest: 2931e5baf3bfd1e389cc2e44887c37d80e3a376983f25106b402c04029fa2257
- Sigstore transparency entry: 1790602576
- Sigstore integration time: Jun 11, 2026
Source repository:
- Permalink: Xvvln/pdf-reader-mcp@e478b3fd585f6c3a2d1b487fa564de3566c7944d
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/Xvvln
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e478b3fd585f6c3a2d1b487fa564de3566c7944d
- Trigger Event: push
