Unified OCR MCP Server - PaddleOCR / Hunyuan / GLM-4V via Model Context Protocol

ocr-mcp-server

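A unified OCR MCP server that exposes multiple OCR engines through the Model Context Protocol (MCP). It runs locally on Windows, is packaged for distribution on PyPI, and can be called from Claude Desktop, Cursor, Trae, or custom Java and Python projects.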

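Supported OCR providers

| Provider | Type | Features | Setup |
|---|---|---|---|
| paddleocr | Local | No network required; Chinese, English, and other languages; data never leaves the machine | pip install "ocr-mcp-server[paddleocr]" |
| hunyuan | Tencent Cloud API | High accuracy; handwriting, tables, invoices; pay-as-you-go billing | Set HUNYUAN_SECRET_ID and HUNYUAN_SECRET_KEY |
| glm | Zhipu AI API | GLM-4V large model; understands complex layouts; the Flash tier has a free quota | Set GLM_API_KEY |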

Installation

# Basic install (cloud APIs only, without PaddleOCR)
pip install ocr-mcp-server

# With PaddleOCR local recognition (recommended on Windows)
pip install "ocr-mcp-server[paddleocr]"

Note for installing PaddleOCR on Windows: install the Visual C++ Redistributable first. Python 3.9-3.11 is recommended.

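Quick start

1. Configure environment variables

Create a .env file, or set the variables directly in the system environment:

# PaddleOCR (enabled automatically once installed; no configuration needed)

# Tencent Hunyuan OCR (optional)
HUNYUAN_SECRET_ID=AKIDxxxxxxxxxxxxxxxx
HUNYUAN_SECRET_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Zhipu GLM-OCR (optional)
GLM_API_KEY=xxxxxxxx.xxxxxxxxxxxxxxxx

# Default engine (optional; paddleocr is preferred when this is unset)
OCR_DEFAULT_PROVIDER=paddleocr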
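
The variables above imply a small selection procedure. The following is a minimal sketch of that logic, not the package's actual code. The function names `configured_providers` and `pick_default_provider` are hypothetical, `paddle_installed` stands in for whatever check the server performs for the optional extra, and both the fallback order after paddleocr (hunyuan, then glm) and the fallback when the requested engine is unavailable are assumptions.

```python
from typing import Mapping, Optional


def configured_providers(env: Mapping[str, str], paddle_installed: bool) -> list[str]:
    """Return the providers that look usable for this environment (hypothetical helper)."""
    providers = []
    if paddle_installed:
        # PaddleOCR needs no credentials once the extra is installed.
        providers.append("paddleocr")
    if env.get("HUNYUAN_SECRET_ID") and env.get("HUNYUAN_SECRET_KEY"):
        # Hunyuan needs both halves of the credential pair.
        providers.append("hunyuan")
    if env.get("GLM_API_KEY"):
        providers.append("glm")
    return providers


def pick_default_provider(env: Mapping[str, str], paddle_installed: bool) -> Optional[str]:
    """Honor OCR_DEFAULT_PROVIDER when usable, else take the first available provider."""
    available = configured_providers(env, paddle_installed)
    requested = env.get("OCR_DEFAULT_PROVIDER")
    if requested in available:
        return requested
    # Assumed fallback: first usable provider, with paddleocr listed first.
    return available[0] if available else None
```

In practice `env` would be `os.environ`, and a call such as `pick_default_provider(os.environ, paddle_installed=True)` would return the engine name to use.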

2. Run the server

# stdio mode (for Claude Desktop / Cursor / Trae)
ocr-mcp-server

# HTTP mode (for AI applications that call a remote server)
ocr-mcp-server --transport streamable-http --port 8000

# Choose the default provider
ocr-mcp-server --default-provider glm

MCP client configuration

Claude Desktop

Edit %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "ocr": {
      "command": "ocr-mcp-server",
      "env": {
        "GLM_API_KEY": "your_glm_api_key",
        "HUNYUAN_SECRET_ID": "your_secret_id",
        "HUNYUAN_SECRET_KEY": "your_secret_key"
      }
    }
  }
}
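
If the config file already lists other MCP servers, the "ocr" entry has to be merged in rather than pasted over them. A minimal sketch of that merge, assuming the file is plain JSON with a top-level "mcpServers" object as shown above; the helper names `add_ocr_server` and `update_config_file` are hypothetical and not part of this package:

```python
import json
from pathlib import Path


def add_ocr_server(config: dict, env: dict) -> dict:
    """Return a copy of a Claude Desktop config with an "ocr" server entry added."""
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    servers = updated.setdefault("mcpServers", {})
    servers["ocr"] = {"command": "ocr-mcp-server", "env": dict(env)}
    return updated


def update_config_file(path: Path, env: dict) -> None:
    """Read the config at `path` (if it exists), add the entry, and write it back."""
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = add_ocr_server(existing, env)
    path.write_text(json.dumps(merged, indent=2), encoding="utf-8")
```

On Windows the path would be `Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"`. Existing server entries are preserved, and only the "ocr" key is overwritten.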

Cursor / Trae IDE

Edit .cursor/mcp.json, or the MCP configuration in Trae:

{
  "mcpServers": {
    "ocr": {
      "command": "ocr-mcp-server",
      "args": ["--default-provider", "paddleocr"]
    }
  }
}

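Remote HTTP deployment (for Java projects)

# Start on the server
ocr-mcp-server --transport streamable-http --host 0.0.0.0 --port 8000

From a Java project, call the server through the Spring AI MCP Client:

// Add spring-ai-mcp-client-spring-boot-starter to pom.xml
// Note: this snippet uses the SSE transport and the /sse path, while the
// server command above starts streamable-http (the Python client below uses /mcp).
// Confirm which transport and path your deployment actually serves.
McpClient client = McpClient.sync(
    new HttpClientSseClientTransport("http://your-server:8000/sse")
).build();
client.initialize();
CallToolResult result = client.callTool(
    new CallToolRequest("ocr_recognize_from_url",
        Map.of("image_url", "https://example.com/invoice.jpg"))
);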

From a Python project:

import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client


async def main() -> None:
    # Connect to the streamable-http endpoint exposed by the server
    async with streamablehttp_client("http://your-server:8000/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "ocr_recognize_from_url",
                {"image_url": "https://example.com/invoice.jpg"},
            )
            print(result.content)


asyncio.run(main())
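
Both client libraries wrap a JSON-RPC 2.0 exchange. As a rough illustration of what a tool call looks like on the wire after initialization, the sketch below builds an MCP `tools/call` request by hand. The helper name `build_tool_call` and the request id are illustrative; real clients also handle the initialize handshake, session headers, and responses, none of which is shown here.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


payload = build_tool_call(
    1,
    "ocr_recognize_from_url",
    {"image_url": "https://example.com/invoice.jpg"},
)
print(payload)
```

Seeing the raw message can help when debugging a client in a language that has no MCP SDK.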

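Exposed MCP tools

| Tool | Description | Required parameter |
|---|---|---|
| ocr_recognize_from_url | Recognize text from an image URL | image_url |
| ocr_recognize_from_base64 | Recognize text from a Base64-encoded image | base64_image |
| ocr_list_providers | List all OCR engines and their status | (none) |

All tools accept these optional parameters:

  • provider: the OCR engine to use (selected automatically when omitted)
  • response_format: markdown (default) or json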
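
For a local file, the arguments for ocr_recognize_from_base64 can be assembled as in the sketch below. It assumes the server accepts a plain Base64 string without a `data:` URI prefix, which is not stated above; the helper name `build_base64_arguments` is hypothetical.

```python
import base64
from typing import Optional


def build_base64_arguments(
    image_bytes: bytes,
    provider: Optional[str] = None,
    response_format: str = "markdown",
) -> dict:
    """Build the arguments dict for the ocr_recognize_from_base64 tool."""
    if response_format not in ("markdown", "json"):
        raise ValueError("response_format must be 'markdown' or 'json'")
    arguments = {
        # Assumption: raw Base64 text, no "data:image/...;base64," prefix.
        "base64_image": base64.b64encode(image_bytes).decode("ascii"),
        "response_format": response_format,
    }
    if provider is not None:
        # Leave the key out entirely so the server can pick an engine itself.
        arguments["provider"] = provider
    return arguments
```

The result can be passed as the second argument of `session.call_tool("ocr_recognize_from_base64", ...)`, with `image_bytes` read from disk via `Path("invoice.jpg").read_bytes()`.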

Development and release

# Clone the project
git clone https://github.com/yourname/ocr-mcp-server
cd ocr-mcp-server

# Install development dependencies
pip install -e ".[dev]"

# Run the tests
pytest tests/ -v

# Debug locally (MCP Inspector)
npx @modelcontextprotocol/inspector ocr-mcp-server

# Build
python -m build

# Publish to PyPI
twine upload dist/*

# Publish to TestPyPI (do this first as a trial run)
twine upload --repository testpypi dist/*

Project structure

ocr-mcp-server/
├── pyproject.toml                  # Package config, dependencies, CLI entry point
├── README.md
├── LICENSE
├── scripts/
│   └── evaluation.xml              # MCP standard evaluation set
├── src/ocr_mcp_server/
│   ├── __init__.py
│   ├── __main__.py                 # CLI entry point: ocr-mcp-server
│   ├── config.py                   # Environment variable configuration
│   ├── server.py                   # FastMCP server + 3 tools
│   ├── providers/
│   │   ├── __init__.py             # BaseOcrProvider base class + OcrResult model
│   │   ├── paddleocr_provider.py   # PaddleOCR (local)
│   │   ├── hunyuan_provider.py     # Tencent Hunyuan OCR
│   │   └── glm_provider.py         # Zhipu GLM-4V OCR
│   └── utils/
│       └── image.py                # URL download + Base64 decoding helpers
└── tests/
    ├── test_providers.py
    └── test_image_utils.py
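
The tree names a BaseOcrProvider base class and an OcrResult model but does not show them. To illustrate how a provider layer of this shape might fit together, here is a hypothetical sketch. The fields, the method names `is_available` and `recognize`, and the toy `EchoProvider` are all assumptions made for illustration and may differ from the real code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class OcrResult:
    """Hypothetical result model; the real field names are not documented here."""
    provider: str
    text: str


class BaseOcrProvider(ABC):
    """Hypothetical common interface that each OCR engine would implement."""
    name: str = "base"

    @abstractmethod
    def is_available(self) -> bool:
        """Report whether the engine is installed or has credentials configured."""

    @abstractmethod
    def recognize(self, image_bytes: bytes) -> OcrResult:
        """Run OCR on raw image bytes and return the recognized text."""


class EchoProvider(BaseOcrProvider):
    """Toy provider for illustration only; it performs no OCR."""
    name = "echo"

    def is_available(self) -> bool:
        return True

    def recognize(self, image_bytes: bytes) -> OcrResult:
        return OcrResult(provider=self.name, text=f"{len(image_bytes)} bytes received")
```

With an interface like this, the server can treat local and cloud engines uniformly, and a new engine only needs one more subclass.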

License

MIT

Download files

Source distribution: ocr_mcp_server-1.0.0.tar.gz (29.8 kB)
Built distribution: ocr_mcp_server-1.0.0-py3-none-any.whl (31.9 kB, Python 3)
