An MCP server for the 'docmind_parser' library.
DocMind Parser MCP
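DocMind Parser MCP is a Python package and command-line utility that converts a variety of files to Markdown for indexing, text analysis, and similar uses.
Supported document formats
- Word documents (doc, docx)
- PowerPoint presentations (ppt, pptx)
- Excel spreadsheets (xls, xlsx, xlsm)
- Images (jpg, jpeg, png, bmp, gif)
- Other formats (markdown, html, epub, mobi, rtf, txt)
Files can be supplied either as local paths or as URLs.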
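Because both local paths and URLs are accepted, a client may want to check a file's extension before sending it to the server. The following is a minimal sketch and not part of the package: `SUPPORTED_EXTENSIONS` simply mirrors the list above, and `is_supported` is a hypothetical helper name.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse

# Mirrors the format list above; the server itself may accept more or fewer types.
SUPPORTED_EXTENSIONS = {
    "doc", "docx", "ppt", "pptx", "xls", "xlsx", "xlsm",
    "jpg", "jpeg", "png", "bmp", "gif",
    "markdown", "html", "epub", "mobi", "rtf", "txt",
}


def is_supported(uri: str) -> bool:
    """Return True if a path or URL ends in a listed extension (hypothetical helper)."""
    # urlparse separates the path from any query string, e.g. "?download=1".
    path = unquote(urlparse(uri).path)
    suffix = PurePosixPath(path).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_EXTENSIONS
```

A check like this only filters obvious mismatches on the client side; the server remains the authority on what it can parse.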
Installation
Run directly with uvx (recommended)
uvx docmind-parser-mcp
Install from source
cd docmind-parse-mcp
pip install .
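Quick start
Environment variables
The tool relies on the Alibaba Cloud Document Mind parsing service and requires the following environment variables:
export ALIBABA_CLOUD_ACCESS_KEY_ID=YOUR_ALIBABA_CLOUD_ACCESS_KEY_ID
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=YOUR_ALIBABA_CLOUD_ACCESS_KEY_SECRET
export ALIBABA_CLOUD_SECURITY_TOKEN=YOUR_ALIBABA_CLOUD_SECURITY_TOKEN # Optional, for temporary credentials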
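A launcher script can verify that the required credentials are present before starting the server. This is a minimal sketch: `missing_credentials` is a hypothetical helper, and treating the security token as optional follows the comment above.

```python
import os

# Variable names come from this README; the security token is optional.
REQUIRED_VARS = ("ALIBABA_CLOUD_ACCESS_KEY_ID", "ALIBABA_CLOUD_ACCESS_KEY_SECRET")


def missing_credentials(env=None) -> list[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing credentials:", ", ".join(missing))
```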
The following environment variables configure parsing options and the server:
# Parsing options
export DOC_MIND_STRUCTURE_TYPE=layout # Structure type: layout or doctree
export DOC_MIND_FORMULA_ENHANCEMENT=true # Formula enhancement: true or false
export DOC_MIND_LLM_ENHANCEMENT=false # LLM enhancement: true or false
export DOC_MIND_ENHANCEMENT_MODE=VLM # Enhancement mode: VLM or another supported mode; takes effect when DOC_MIND_LLM_ENHANCEMENT is enabled
# Server options
export SERVER_PROTOCOL_MODE=stdio # Server protocol mode: stdio or sse
export SERVER_BIND_HOST=127.0.0.1 # Host address to bind
export SERVER_LISTEN_PORT=3001 # Port to listen on
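To make the option types concrete, here is a sketch of how these variables could be read into a typed settings object. It is illustrative only: `Settings` and `load_settings` are hypothetical names, and the fallback values are assumptions taken from the example values above, not confirmed defaults of the package.

```python
import os
from dataclasses import dataclass


def _as_bool(value: str) -> bool:
    """Interpret the documented "true"/"false" strings, case-insensitively."""
    return value.strip().lower() == "true"


@dataclass(frozen=True)
class Settings:
    structure_type: str        # "layout" or "doctree"
    formula_enhancement: bool
    llm_enhancement: bool
    enhancement_mode: str      # e.g. "VLM"
    protocol_mode: str         # "stdio" or "sse"
    bind_host: str
    listen_port: int


def load_settings(env=None) -> Settings:
    """Read the variables above; fallback values are assumptions, not confirmed defaults."""
    env = os.environ if env is None else env
    return Settings(
        structure_type=env.get("DOC_MIND_STRUCTURE_TYPE", "layout"),
        formula_enhancement=_as_bool(env.get("DOC_MIND_FORMULA_ENHANCEMENT", "true")),
        llm_enhancement=_as_bool(env.get("DOC_MIND_LLM_ENHANCEMENT", "false")),
        enhancement_mode=env.get("DOC_MIND_ENHANCEMENT_MODE", "VLM"),
        protocol_mode=env.get("SERVER_PROTOCOL_MODE", "stdio"),
        bind_host=env.get("SERVER_BIND_HOST", "127.0.0.1"),
        listen_port=int(env.get("SERVER_LISTEN_PORT", "3001")),
    )
```

Parsing the port with `int` surfaces a malformed `SERVER_LISTEN_PORT` as a `ValueError` at startup rather than later at bind time.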
Launch modes
Two launch modes are supported, selected with the SERVER_PROTOCOL_MODE environment variable:
- stdio mode (default):
  export SERVER_PROTOCOL_MODE=stdio
  uvx docmind-parser-mcp
- SSE mode:
  export SERVER_PROTOCOL_MODE=sse
  uvx docmind-parser-mcp
In SSE mode, the service starts at http://127.0.0.1:3001/sse.
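For SSE mode, a Python client can connect with the `sse_client` transport from the MCP Python SDK. The sketch below assumes the server is already running in SSE mode and that the endpoint path is `/sse` as stated above; `sse_url` and `list_tools_over_sse` are hypothetical helper names.

```python
import asyncio
import os


def sse_url(env=None) -> str:
    """Build the SSE endpoint from the server variables, falling back to the address above."""
    env = os.environ if env is None else env
    host = env.get("SERVER_BIND_HOST", "127.0.0.1")
    port = env.get("SERVER_LISTEN_PORT", "3001")
    return f"http://{host}:{port}/sse"


async def list_tools_over_sse(url: str):
    """Connect to a running SSE server and return its tool listing."""
    # Imported lazily so sse_url() works even without the MCP SDK installed.
    from mcp import ClientSession
    from mcp.client.sse import sse_client

    async with sse_client(url) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            return await session.list_tools()


# Requires a server started with SERVER_PROTOCOL_MODE=sse:
# print(asyncio.run(list_tools_over_sse(sse_url())))
```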
Usage examples
Configuring an MCP client
stdio mode configuration
{
  "mcpServers": {
    "docmind-parser-mcp": {
      "name": "docmind-parser-mcp",
      "command": "uvx",
      "args": [
        "docmind-parser-mcp"
      ],
      "env": {
        "SERVER_PROTOCOL_MODE": "stdio",
        "ALIBABA_CLOUD_ACCESS_KEY_ID": "YOUR_ALIBABA_CLOUD_ACCESS_KEY_ID",
        "ALIBABA_CLOUD_ACCESS_KEY_SECRET": "YOUR_ALIBABA_CLOUD_ACCESS_KEY_SECRET",
        "ALIBABA_CLOUD_SECURITY_TOKEN": "YOUR_ALIBABA_CLOUD_SECURITY_TOKEN"
      }
    }
  }
}
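Rather than pasting secrets into the configuration by hand, the same structure can be generated from the current environment. This is a sketch under that assumption; `build_stdio_config` is a hypothetical helper, and where a given MCP client expects the resulting JSON to be saved is not covered here.

```python
import json
import os

CREDENTIAL_VARS = (
    "ALIBABA_CLOUD_ACCESS_KEY_ID",
    "ALIBABA_CLOUD_ACCESS_KEY_SECRET",
    "ALIBABA_CLOUD_SECURITY_TOKEN",
)


def build_stdio_config(env=None) -> dict:
    """Return the stdio client configuration shown above, filled from the environment."""
    env = os.environ if env is None else env
    server_env = {"SERVER_PROTOCOL_MODE": "stdio"}
    # Copy only the credentials that are actually set; the token is optional.
    for name in CREDENTIAL_VARS:
        if env.get(name):
            server_env[name] = env[name]
    return {
        "mcpServers": {
            "docmind-parser-mcp": {
                "name": "docmind-parser-mcp",
                "command": "uvx",
                "args": ["docmind-parser-mcp"],
                "env": server_env,
            }
        }
    }


# Print the JSON to paste into (or write to) the client's configuration file.
print(json.dumps(build_stdio_config(), indent=2))
```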
SSE mode configuration
{
  "mcpServers": {
    "docmind-parser-mcp": {
      "url": "http://127.0.0.1:3001/sse",
      "transportType": "sse"
    }
  }
}
Python client example
import asyncio
from mcp.client.stdio import stdio_client
from mcp import ClientSession, StdioServerParameters

# Configure how the server process is launched
server_params = StdioServerParameters(
    command='uvx',
    args=['docmind-parser-mcp'],
    env={
        "SERVER_PROTOCOL_MODE": "stdio",
        "ALIBABA_CLOUD_ACCESS_KEY_ID": "YOUR_ALIBABA_CLOUD_ACCESS_KEY_ID",
        "ALIBABA_CLOUD_ACCESS_KEY_SECRET": "YOUR_ALIBABA_CLOUD_ACCESS_KEY_SECRET",
        "ALIBABA_CLOUD_SECURITY_TOKEN": "YOUR_ALIBABA_CLOUD_SECURITY_TOKEN",
    }
)

async def main():
    # Start the server and open the stdio transport
    async with stdio_client(server_params) as (stdio, write):
        # Create the ClientSession on top of the transport streams
        async with ClientSession(stdio, write) as session:
            # Initialize the session
            await session.initialize()
            # List the available tools
            response = await session.list_tools()
            print("Available tools:", response)
            # Call the conversion tool
            response = await session.call_tool(
                'convert_to_markdown',
                {'uri': 'https://example.com/document.pdf'}
            )
            print("Conversion result:", response)

if __name__ == '__main__':
    asyncio.run(main())
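The client example prints the whole result object. To pull out just the converted text, a helper along the following lines may be useful. It is a sketch: it assumes `convert_to_markdown` returns its Markdown as text content items, which this README does not confirm, and `extract_markdown` is a hypothetical name. The `content` and `isError` attributes follow the MCP Python SDK's tool result type.

```python
from types import SimpleNamespace


def extract_markdown(result) -> str:
    """Join the text items of a tool result; raise if the call reported an error."""
    text_parts = [
        item.text for item in result.content
        if getattr(item, "type", None) == "text"
    ]
    joined = "\n".join(text_parts)
    if getattr(result, "isError", False):
        raise RuntimeError(f"convert_to_markdown failed: {joined}")
    return joined


# Stand-in object shaped like a tool result, for trying the helper offline.
fake = SimpleNamespace(
    isError=False,
    content=[SimpleNamespace(type="text", text="# Title")],
)
print(extract_markdown(fake))  # prints "# Title"
```

In the client example, this would be called as `extract_markdown(response)` after `session.call_tool(...)`.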
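Development
Running the tests
The project includes URI-handling test cases for different platforms:
# Install the test dependency
pip install pytest
# Run the tests
python -m pytest tests/ -v
The test cases cover the following scenarios:
- Ordinary file paths on POSIX systems (e.g. file:///home/user/document.txt)
- Drive-letter paths on Windows (e.g. file:///C:/Users/document.txt)
- UNC network paths on Windows (e.g. file://server/share/document.txt)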
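To illustrate what these three cases involve, here is a sketch of a `file://` URI to local path conversion. It is not the package's implementation: `file_uri_to_path` is a hypothetical function, and the `windows` flag is passed explicitly so that each platform's behaviour can be exercised on any machine.

```python
from urllib.parse import unquote, urlparse


def file_uri_to_path(uri: str, windows: bool) -> str:
    """Convert a file:// URI to a local path string (illustrative sketch)."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"not a file URI: {uri}")
    path = unquote(parsed.path)
    host = parsed.netloc
    if not windows:
        # POSIX: the URI path is already the filesystem path; any host is ignored here.
        return path
    if host and host != "localhost":
        # UNC: file://server/share/doc.txt -> \\server\share\doc.txt
        return "\\\\" + host + path.replace("/", "\\")
    # Drive letter: /C:/Users/doc.txt -> C:\Users\doc.txt
    if len(path) >= 3 and path[0] == "/" and path[2] == ":":
        path = path[1:]
    return path.replace("/", "\\")
```

A pytest case for each scenario could then assert on the three example URIs listed above.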
Building the package
cd docmind-parse-mcp
uv build
When the build finishes, the generated package files are in the dist/ directory.
File details
Details for the file docmind_parser_mcp-0.1.5-py3-none-any.whl.
File metadata
- Download URL: docmind_parser_mcp-0.1.5-py3-none-any.whl
- Upload date:
- Size: 14.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a16b29889299e8f1b734d7178b18dfaa20a8df54bca745f9e8d9a7d4d1cf22d0 |
| MD5 | b710fe30cf90973c50a148ec29944d1c |
| BLAKE2b-256 | 128d66880f386bdc338ea651ff45e6a431e0eb3b752d7db8aab4b28f225768ae |