Unified OCR MCP Server - PaddleOCR / Hunyuan / GLM-4V via Model Context Protocol
ocr-mcp-server
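A unified OCR MCP server that exposes multiple OCR engines through the Model Context Protocol (MCP). It runs locally on Windows, can be published to PyPI, and can be called from Claude Desktop, Cursor, Trae, or custom Java and Python projects.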
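Supported OCR Providers
| Provider | Type | Highlights | Setup |
|---|---|---|---|
| paddleocr | Runs locally | No network required; Chinese, English, and other languages; data never leaves the machine | pip install "ocr-mcp-server[paddleocr]" |
| hunyuan | Tencent Cloud API | High accuracy; handles handwriting, tables, and invoices; pay-as-you-go billing | Set HUNYUAN_SECRET_ID and HUNYUAN_SECRET_KEY |
| glm | Zhipu AI API | GLM-4V large model; understands complex layouts; the Flash tier has a free quota | Set GLM_API_KEY |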
Installation
# Basic install (cloud APIs only, without PaddleOCR)
pip install ocr-mcp-server
# With PaddleOCR local recognition (recommended on Windows)
pip install "ocr-mcp-server[paddleocr]"
Note for installing PaddleOCR on Windows: install the Visual C++ Redistributable first. Python 3.9-3.11 is recommended.
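Quick Start
1. Configure environment variables
Create a .env file, or set the variables directly in the system environment:
# PaddleOCR (enabled automatically once installed; no configuration needed)
# Tencent Hunyuan OCR (optional)
HUNYUAN_SECRET_ID=AKIDxxxxxxxxxxxxxxxx
HUNYUAN_SECRET_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Zhipu GLM OCR (optional)
GLM_API_KEY=xxxxxxxx.xxxxxxxxxxxxxxxx
# Default engine (optional; paddleocr is preferred when this is unset)
OCR_DEFAULT_PROVIDER=paddleocr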
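The selection rule described by these variables can be sketched roughly as follows. This is an illustration only, not the package's actual code: the function names, the `paddle_installed` flag, and the fallback order after paddleocr are assumptions. The README only states that paddleocr is preferred when `OCR_DEFAULT_PROVIDER` is unset.

```python
from typing import List, Mapping, Optional

# Assumed fallback order; only "paddleocr first" comes from the README.
FALLBACK_ORDER = ("paddleocr", "hunyuan", "glm")


def configured_providers(env: Mapping[str, str], paddle_installed: bool) -> List[str]:
    """Return the providers that look usable given the environment variables."""
    available = []
    if paddle_installed:
        available.append("paddleocr")
    # Hunyuan needs both halves of the credential pair.
    if env.get("HUNYUAN_SECRET_ID") and env.get("HUNYUAN_SECRET_KEY"):
        available.append("hunyuan")
    if env.get("GLM_API_KEY"):
        available.append("glm")
    return available


def pick_default_provider(env: Mapping[str, str], paddle_installed: bool) -> Optional[str]:
    """Honor OCR_DEFAULT_PROVIDER when it is usable, otherwise fall back in order."""
    available = configured_providers(env, paddle_installed)
    requested = env.get("OCR_DEFAULT_PROVIDER", "").strip().lower()
    if requested in available:
        return requested
    for name in FALLBACK_ORDER:
        if name in available:
            return name
    return None
```

Passing `os.environ` as `env` would apply the same rule to the real process environment.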
2. Run the server
# stdio mode (for Claude Desktop / Cursor / Trae)
ocr-mcp-server
# HTTP mode (for AI applications that call a remote server)
ocr-mcp-server --transport streamable-http --port 8000
# Set the default provider
ocr-mcp-server --default-provider glm
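For readers wiring the command into scripts, the flags shown in this README can be modeled with a small argparse sketch. It is not the package's real CLI code: the default values for `--host` and `--port` are assumptions, and only the flag names and the stdio default (running the command with no flags) come from the examples in this document.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Model the flags used in this README; defaults are partly assumed."""
    parser = argparse.ArgumentParser(prog="ocr-mcp-server")
    # Running with no flags starts stdio mode, per the README.
    parser.add_argument("--transport", choices=["stdio", "streamable-http"], default="stdio")
    parser.add_argument("--host", default="127.0.0.1")      # assumed default
    parser.add_argument("--port", type=int, default=8000)   # assumed default
    parser.add_argument(
        "--default-provider",
        choices=["paddleocr", "hunyuan", "glm"],
        default=None,  # None means "let the server choose"
    )
    return parser
```

Calling `build_parser().parse_args(["--transport", "streamable-http", "--port", "8000"])` yields a namespace with `transport="streamable-http"` and `port=8000`.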
MCP Client Configuration
Claude Desktop
Edit %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
  "mcpServers": {
    "ocr": {
      "command": "ocr-mcp-server",
      "env": {
        "GLM_API_KEY": "your_glm_api_key",
        "HUNYUAN_SECRET_ID": "your_secret_id",
        "HUNYUAN_SECRET_KEY": "your_secret_key"
      }
    }
  }
}
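If the config file already lists other MCP servers, editing it by hand risks overwriting them. A minimal sketch of merging the "ocr" entry programmatically is shown here. The helper name `register_ocr_server` is hypothetical, and the script assumes the file is plain JSON with the `mcpServers` layout shown above.

```python
import json
from pathlib import Path
from typing import Dict, Optional


def register_ocr_server(config_path: Path, env: Optional[Dict[str, str]] = None) -> dict:
    """Add or replace the "ocr" entry under mcpServers, keeping other entries."""
    config: dict = {}
    if config_path.exists():
        # Treat an empty file like an empty JSON object.
        config = json.loads(config_path.read_text(encoding="utf-8") or "{}")
    servers = config.setdefault("mcpServers", {})
    entry: dict = {"command": "ocr-mcp-server"}
    if env:
        entry["env"] = dict(env)
    servers["ocr"] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return config
```

On Windows the path would typically be built as `Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"`. Restart Claude Desktop afterwards so it rereads the file.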
Cursor / Trae IDE
Edit .cursor/mcp.json, or Trae's MCP configuration:
{
  "mcpServers": {
    "ocr": {
      "command": "ocr-mcp-server",
      "args": ["--default-provider", "paddleocr"]
    }
  }
}
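Remote HTTP Deployment (for Java projects)
# Start the server
ocr-mcp-server --transport streamable-http --host 0.0.0.0 --port 8000
Call it from a Java project through the Spring AI MCP Client:
// Add spring-ai-mcp-client-spring-boot-starter to pom.xml
// Note: this snippet uses the SSE client transport and the /sse path, while the
// server command above selects streamable-http and the Python example connects
// to /mcp. Verify which transport and path your deployment actually serves.
McpClient client = McpClient.sync(
    new HttpClientSseClientTransport("http://your-server:8000/sse")
).build();
client.initialize();
// Invoke the URL-based OCR tool with its required argument
CallToolResult result = client.callTool(
    new CallToolRequest("ocr_recognize_from_url",
        Map.of("image_url", "https://example.com/invoice.jpg"))
);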
Call it from a Python project:
import asyncio
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client


async def main():
    # Connect to the server's streamable HTTP endpoint
    async with streamablehttp_client("http://your-server:8000/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "ocr_recognize_from_url",
                {"image_url": "https://example.com/invoice.jpg"},
            )
            print(result.content)


asyncio.run(main())
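Exposed MCP Tools
| Tool | Description | Required parameter |
|---|---|---|
| ocr_recognize_from_url | Recognize text from an image URL | image_url |
| ocr_recognize_from_base64 | Recognize text from a Base64-encoded image | base64_image |
| ocr_list_providers | List all OCR engines and their status | None |
All tools accept these optional parameters:
- provider: the OCR engine to use (selected automatically when omitted)
- response_format: markdown (default) or json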
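To call ocr_recognize_from_base64 with a local file, the image bytes need to be Base64-encoded first. The helper below is a sketch: `build_base64_arguments` is a hypothetical name, and it assumes the server expects a bare Base64 string without a `data:image/...;base64,` prefix, which this README does not specify.

```python
import base64
from typing import Dict, Optional


def build_base64_arguments(
    image_bytes: bytes,
    provider: Optional[str] = None,
    response_format: Optional[str] = None,
) -> Dict[str, str]:
    """Build the argument dict for the ocr_recognize_from_base64 tool."""
    if response_format not in (None, "markdown", "json"):
        raise ValueError("response_format must be 'markdown' or 'json'")
    # Assumption: a bare Base64 string, no data-URI prefix.
    arguments = {"base64_image": base64.b64encode(image_bytes).decode("ascii")}
    # Optional parameters are sent only when set, so server defaults apply otherwise.
    if provider is not None:
        arguments["provider"] = provider
    if response_format is not None:
        arguments["response_format"] = response_format
    return arguments
```

Inside an initialized client session, the result would be passed as the second argument, for example `await session.call_tool("ocr_recognize_from_base64", build_base64_arguments(Path("invoice.jpg").read_bytes(), response_format="json"))`.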
Development and Release
# Clone the project
git clone https://github.com/yourname/ocr-mcp-server
cd ocr-mcp-server
# Install development dependencies
pip install -e ".[dev]"
# Run the tests
pytest tests/ -v
# Debug locally (MCP Inspector)
npx @modelcontextprotocol/inspector ocr-mcp-server
# Build
python -m build
# Publish to PyPI
twine upload dist/*
# Publish to TestPyPI (to test first)
twine upload --repository testpypi dist/*
Project Structure
ocr-mcp-server/
├── pyproject.toml                    # Package config, dependencies, CLI entry point
├── README.md
├── LICENSE
├── scripts/
│   └── evaluation.xml                # MCP standard evaluation set
├── src/ocr_mcp_server/
│   ├── __init__.py
│   ├── __main__.py                   # CLI entry point: ocr-mcp-server
│   ├── config.py                     # Environment variable configuration
│   ├── server.py                     # FastMCP server + 3 tools
│   ├── providers/
│   │   ├── __init__.py               # BaseOcrProvider base class + OcrResult model
│   │   ├── paddleocr_provider.py     # PaddleOCR (local)
│   │   ├── hunyuan_provider.py       # Tencent Hunyuan OCR
│   │   └── glm_provider.py           # Zhipu GLM-4V OCR
│   └── utils/
│       └── image.py                  # URL download + Base64 decoding helpers
└── tests/
    ├── test_providers.py
    └── test_image_utils.py
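The tree mentions a BaseOcrProvider base class and an OcrResult model in providers/__init__.py. Their real definitions are not shown in this README, so the following is only a guess at the general shape: the field names, the method names `is_available` and `recognize`, and the toy `EchoProvider` are all hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List


@dataclass
class OcrResult:
    """Hypothetical result model: which engine ran and what text it found."""
    provider: str
    text: str
    lines: List[str] = field(default_factory=list)


class BaseOcrProvider(ABC):
    """Hypothetical common interface that each OCR engine would implement."""
    name: str = "base"

    @abstractmethod
    def is_available(self) -> bool:
        """Report whether the engine is installed or has credentials."""

    @abstractmethod
    def recognize(self, image_bytes: bytes) -> OcrResult:
        """Run OCR on raw image bytes."""


class EchoProvider(BaseOcrProvider):
    """Toy provider for illustration: decodes bytes as UTF-8 instead of doing OCR."""
    name = "echo"

    def is_available(self) -> bool:
        return True

    def recognize(self, image_bytes: bytes) -> OcrResult:
        text = image_bytes.decode("utf-8", errors="replace")
        return OcrResult(provider=self.name, text=text, lines=text.splitlines())
```

A shared base class like this lets the server treat local and cloud engines uniformly: a tool such as ocr_list_providers can iterate over the registered providers and report each one's availability.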
License
MIT