Skip to main content

A Model Context Protocol server providing OCR capabilities for images and videos using GLM-4.1V-Thinking-Flash

Project description

MCP OCR Server

A Model Context Protocol (MCP) server that provides OCR (Optical Character Recognition) capabilities for images and videos using GLM-4.1V-Thinking-Flash.

Features

  • Image OCR: Extract text from images using GLM-4.1V-Thinking-Flash
  • Video OCR: Extract text from video frames (planned feature)
  • Custom Prompts: Support for custom OCR prompts
  • MCP Integration: Full MCP protocol support for seamless integration

Requirements

  • Python 3.10+
  • GLM-4.1V-Thinking-Flash API key from ZhipuAI

Quick Start with uvx

The fastest way to get started is using uvx:

# Install and run directly
uvx mcp-server-ocr

# Or install for development
uvx --python 3.10 --with-editable . mcp-server-ocr

Installation

Using uv (recommended)

# Clone the repository
git clone <repository-url>
cd ocr

# Install dependencies
uv sync

# Run the server
uv run mcp-server-ocr

Using pip

# Clone the repository
git clone <repository-url>
cd ocr

# Install dependencies
pip install -e .

# Run the server
mcp-server-ocr

Configuration

Set your GLM-4.1V-Thinking-Flash API key as an environment variable:

export ZHIPU_API_KEY="your-api-key-here"

Usage

Available Tools

  1. ocr_image: Perform OCR on image files

    • image_path: Path to the image file
    • prompt: Custom prompt for OCR processing (optional)
  2. ocr_video: Perform OCR on video frames (coming soon)

    • video_path: Path to the video file
    • prompt: Custom prompt for video OCR processing (optional)
    • frame_interval: Extract frames every N seconds (optional)

Available Prompts

  1. ocr_image: Extract text from an image
  2. ocr_video: Extract text from video frames

Build and Publish

Build the package

# Using uv
uv build

# Using pip
python -m build

Publish to PyPI

# Using uv
uv publish

# Using twine
twine upload dist/*

Development

Setup development environment

# Clone and setup
git checkout -b feature/your-feature
uv sync --dev

# Run linting and type checking
uv run ruff check .
uv run pyright .

Testing

# Run tests (when available)
uv run pytest

# Test the server manually
uv run python -m mcp_server_ocr

API Documentation

MCP Protocol

This server implements the Model Context Protocol (MCP) specification:

License

MIT License

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mcp_server_ocr-0.1.0.tar.gz (4.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mcp_server_ocr-0.1.0-py3-none-any.whl (5.5 kB view details)

Uploaded Python 3

File details

Details for the file mcp_server_ocr-0.1.0.tar.gz.

File metadata

  • Download URL: mcp_server_ocr-0.1.0.tar.gz
  • Upload date:
  • Size: 4.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.11

File hashes

Hashes for mcp_server_ocr-0.1.0.tar.gz
Algorithm Hash digest
SHA256 22fc18d4dbbf4a627c27a709a5c2c77ba34dd5a9b02671e773b92c9ca71b4f34
MD5 d7b7c71c2c14925bc067e69fa7ffc91c
BLAKE2b-256 976cee8e02ed64f620fcfeaae2e63818aa76ab48aab7e8f7f5099087096d2ef3

See more details on using hashes here.

File details

Details for the file mcp_server_ocr-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for mcp_server_ocr-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9f75416dfdca3ae7fb9d1bef017d3e9eb91c9e979cf356f32237f7450ce5e6be
MD5 3ab17b8f06036cd629692805e6b89b98
BLAKE2b-256 c6fb0e0660fc469254cc6f8a28ca1ae691339711167a0241a74e02479f7c2f71

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page