GLM OCR MCP Server

MCP server for extracting text from images and PDFs using ZhipuAI GLM-OCR.

Usage

Using with Claude Code

Add to ~/.claude/mcp.json:

{
  "mcpServers": {
    "glm-ocr": {
      "command": "uvx",
      "args": ["glm-ocr-mcp"],
      "env": {
        "ZHIPU_API_KEY": "your_api_key_here",
        "ZHIPU_OCR_API_URL": "https://open.bigmodel.cn/api/paas/v4/layout_parsing"
      }
    }
  }
}
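
If ~/.claude/mcp.json already lists other servers, the entry above has to be merged rather than pasted over the file. The following is a minimal sketch of one way to do that; the helper name add_glm_ocr_server is hypothetical and not part of this package, and it only assumes the config layout shown above.

```python
import json
from pathlib import Path


def add_glm_ocr_server(config_path: Path, api_key: str) -> dict:
    """Merge a glm-ocr entry into an MCP config file, keeping other servers.

    Hypothetical helper for illustration; it is not shipped with glm-ocr-mcp.
    """
    # Start from the existing config if the file is present, else from empty.
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}

    # Add or overwrite only the "glm-ocr" entry; other servers stay untouched.
    servers = config.setdefault("mcpServers", {})
    servers["glm-ocr"] = {
        "command": "uvx",
        "args": ["glm-ocr-mcp"],
        "env": {
            "ZHIPU_API_KEY": api_key,
            "ZHIPU_OCR_API_URL": "https://open.bigmodel.cn/api/paas/v4/layout_parsing",
        },
    }

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

A call such as add_glm_ocr_server(Path.home() / ".claude" / "mcp.json", "your_api_key_here") would then write the merged file.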

Tools

The server provides one tool:

  • extract_text: Extract text from a local file or URL (png, jpg/jpeg, pdf)
    • By default, returns Markdown text
    • Set return_json=true to return structured JSON instead; the JSON omits md_results and contains page parsing details such as bbox_2d, content, and label
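
As a rough illustration of working with the return_json=true output, the sketch below groups element text by its label. The exact response shape is not documented here, so this assumes a flat list of dictionaries that each carry label and content keys (and possibly bbox_2d); the function name and the sample values are made up for the example.

```python
from collections import defaultdict
from typing import Any


def group_content_by_label(elements: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Collect the content strings of layout elements under their label.

    Assumes each element is a dict with "label" and "content" keys; the real
    JSON may nest these per page, in which case flatten it first.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for element in elements:
        # Fall back to placeholders so a missing key does not raise.
        label = element.get("label", "unknown")
        grouped[label].append(element.get("content", ""))
    return dict(grouped)


# Invented sample data, only to show the assumed shape.
sample = [
    {"label": "title", "content": "Invoice", "bbox_2d": [0, 0, 100, 20]},
    {"label": "text", "content": "Total: 42", "bbox_2d": [0, 30, 100, 50]},
]
by_label = group_content_by_label(sample)
```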

Parameters:

  • file_path: Local file path or URL for png, jpg/jpeg, or pdf
  • base64_data: Optional data URL/base64 payload (use when file_path is unavailable)
  • start_page_id: Optional PDF start page (1-based, only effective for PDF)
  • end_page_id: Optional PDF end page (1-based, only effective for PDF)
  • return_json: Optional boolean, default false. true returns JSON; false returns Markdown.
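
The parameter rules above can be checked on the client side before a call is made. This is a sketch only: build_extract_text_args is a hypothetical helper, and the checks that at least one input is given and that start_page_id does not exceed end_page_id are assumptions, not documented server behavior.

```python
from typing import Any, Optional


def build_extract_text_args(
    file_path: Optional[str] = None,
    base64_data: Optional[str] = None,
    start_page_id: Optional[int] = None,
    end_page_id: Optional[int] = None,
    return_json: bool = False,
) -> dict[str, Any]:
    """Validate and assemble arguments for the extract_text tool."""
    # Assumption: the tool needs one of the two input sources.
    if not file_path and not base64_data:
        raise ValueError("provide file_path or base64_data")

    # Page ids are 1-based according to the parameter list.
    for name, value in (("start_page_id", start_page_id), ("end_page_id", end_page_id)):
        if value is not None and value < 1:
            raise ValueError(f"{name} is 1-based and must be >= 1")

    # Assumption: an inverted range is a caller mistake.
    if start_page_id is not None and end_page_id is not None and start_page_id > end_page_id:
        raise ValueError("start_page_id must not exceed end_page_id")

    # Only include the optional values that were actually supplied.
    args: dict[str, Any] = {"return_json": return_json}
    optional = {
        "file_path": file_path,
        "base64_data": base64_data,
        "start_page_id": start_page_id,
        "end_page_id": end_page_id,
    }
    args.update({key: value for key, value in optional.items() if value is not None})
    return args
```

The returned dictionary can be passed as the arguments of a tool call by whichever MCP client is in use.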

Examples

# Extract text from local image
extract_text(file_path="./screenshot.png")

# Extract text from local PDF
extract_text(file_path="./document.pdf")

# Extract text from URL image
extract_text(file_path="https://example.com/test.jpg")

# Use base64/data URL
extract_text(base64_data="data:image/png;base64,iVBORw0KGgo...")

# Extract structured layout JSON
extract_text(file_path="https://example.com/test.png", return_json=True)
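
For the base64_data form shown above, a data URL can be built from a local file with the standard library. This is a minimal sketch; to_data_url is a hypothetical helper, and restricting it to png, jpg/jpeg, and pdf simply mirrors the formats listed for the tool.

```python
import base64
import mimetypes
from pathlib import Path

# MIME types matching the formats the tool lists (png, jpg/jpeg, pdf).
SUPPORTED_MIME_TYPES = {"image/png", "image/jpeg", "application/pdf"}


def to_data_url(path: str) -> str:
    """Encode a local file as a data URL suitable for base64_data."""
    # Guess the MIME type from the file extension.
    mime, _ = mimetypes.guess_type(path)
    if mime not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported file type: {path}")

    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The result can then be passed as extract_text(base64_data=to_data_url("./screenshot.png")).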

Development

# Create virtual environment
uv venv
source .venv/bin/activate

# Sync dependencies and install current project
uv sync

# Run server for testing
python -m glm_ocr_mcp.server

Windows PowerShell activation:

.venv\Scripts\Activate.ps1

Project Structure

glm-ocr-mcp/
├── pyproject.toml         # Project configuration
├── README.md              # Documentation
├── .env.example           # Environment variable template
└── src/
    └── glm_ocr_mcp/
        ├── __init__.py
        ├── __main__.py    # Entry point
        ├── ocr.py         # OCR client
        └── server.py      # MCP server
