
High-performance German document OCR - Local & Cloud with GPU/CPU support



High-performance German document OCR - Local & Cloud



Features

| Feature | Local | Cloud |
|---|---|---|
| German documents | Invoices, contracts, forms | All document types |
| Output formats | Markdown, JSON, text | JSON, Markdown, text, n8n |
| PDF support | No (images only) | Up to 50 pages |
| Privacy | 100% local | GDPR-compliant (DSGVO), hosted in Frankfurt |
| Speed | ~5 s/page | ~2-3 s/page |
| Backends | Ollama, llama.cpp, HuggingFace | Cloud API |
| Hardware | CPU, GPU, NPU (CUDA/Metal/Vulkan/OpenVINO) | Managed |
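The table suggests a simple rule for picking a mode: local processing accepts images only, the cloud accepts PDFs of up to 50 pages, and only local mode keeps data entirely on your machine. The sketch below encodes that rule as a hypothetical helper (`choose_mode` is not part of german-ocr); preferring the cloud for images mirrors the "Recommended" label in Quick Start.

```python
from pathlib import Path
from typing import Optional

CLOUD_MAX_PDF_PAGES = 50  # cloud PDF limit from the feature table


def choose_mode(path: str, must_stay_local: bool, pdf_pages: Optional[int] = None) -> str:
    """Hypothetical helper: return "local" or "cloud" for one document."""
    is_pdf = Path(path).suffix.lower() == ".pdf"
    if not is_pdf:
        # Images work with both modes; stay local if data must not leave the machine
        return "local" if must_stay_local else "cloud"
    if must_stay_local:
        # Local backends accept images only
        raise ValueError("Local mode accepts images only; convert the PDF to images first")
    if pdf_pages is not None and pdf_pages > CLOUD_MAX_PDF_PAGES:
        raise ValueError(f"The cloud accepts PDFs up to {CLOUD_MAX_PDF_PAGES} pages; split the file")
    return "cloud"
```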

Installation

Python

pip install german-ocr

Node.js

npm install german-ocr

PHP

composer require keyvan/german-ocr

Quick Start

Option 1: Cloud API (Recommended)

No GPU required. Get your API credentials at app.german-ocr.de.

Python

from german_ocr import CloudClient

# API Key + Secret (Secret is only shown once at creation!)
client = CloudClient(
    api_key="gocr_xxxxxxxx",
    api_secret="your_64_char_secret_here"
)

# Simple extraction
result = client.analyze("invoice.pdf")
print(result.text)

# Structured JSON output (the German prompt asks for the invoice number and the total amount)
result = client.analyze(
    "invoice.pdf",
    prompt="Extrahiere Rechnungsnummer und Gesamtbetrag",
    output_format="json"
)
print(result.text)
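Rather than hard-coding credentials, you can read them from the same environment variables the CLI uses (`GERMAN_OCR_API_KEY`, `GERMAN_OCR_API_SECRET`). A minimal sketch: `load_credentials` is a hypothetical helper, and the `gocr_` prefix and 64-character secret checks are assumptions inferred from the placeholders above, so they only warn instead of failing.

```python
import os
import warnings


def load_credentials(env=None):
    """Hypothetical helper: read German-OCR cloud credentials from the environment."""
    env = os.environ if env is None else env
    api_key = env.get("GERMAN_OCR_API_KEY", "")
    api_secret = env.get("GERMAN_OCR_API_SECRET", "")
    if not api_key or not api_secret:
        raise RuntimeError("Set GERMAN_OCR_API_KEY and GERMAN_OCR_API_SECRET first")
    # Format checks are assumptions based on the README placeholders
    if not api_key.startswith("gocr_"):
        warnings.warn("API key does not start with 'gocr_'")
    if len(api_secret) != 64:
        warnings.warn("API secret is not 64 characters long")
    return api_key, api_secret


# Usage (assumes the german-ocr package is installed):
# from german_ocr import CloudClient
# api_key, api_secret = load_credentials()
# client = CloudClient(api_key=api_key, api_secret=api_secret)
```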

Node.js

const { GermanOCR } = require('german-ocr');

const client = new GermanOCR(
    process.env.GERMAN_OCR_API_KEY,
    process.env.GERMAN_OCR_API_SECRET
);

// CommonJS modules do not allow top-level await, so run inside an async function
(async () => {
    const result = await client.analyze('invoice.pdf', {
        model: 'german-ocr-ultra'
    });
    console.log(result.text);
})().catch(console.error);

PHP

<?php
require __DIR__ . '/vendor/autoload.php';

use GermanOCR\GermanOCR;

$client = new GermanOCR(
    getenv('GERMAN_OCR_API_KEY'),
    getenv('GERMAN_OCR_API_SECRET')
);

$result = $client->analyze('invoice.pdf', [
    'model' => GermanOCR::MODEL_ULTRA
]);
echo $result['text'];

Option 2: Local (Ollama)

Requires Ollama to be installed.

# Install the model
ollama pull Keyvan/german-ocr-turbo

from german_ocr import GermanOCR

ocr = GermanOCR()
text = ocr.extract("invoice.png")
print(text)
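The local backends process one image per call. To handle a whole folder from Python, you could loop over it yourself, as in this minimal sketch. `extract_folder` and the list of image suffixes are assumptions, not part of german-ocr; pass `ocr.extract` from the example above as the `extract` argument. The CLI's `--batch` option covers the same need from the shell.

```python
from pathlib import Path
from typing import Callable, Dict

# Assumed image types; local mode accepts images, not PDFs
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".webp"}


def extract_folder(folder: str, extract: Callable[[str], str]) -> Dict[str, str]:
    """Run `extract` on every image in `folder` and save each result as a .txt file."""
    results: Dict[str, str] = {}
    # sorted() materializes the listing first, so newly written .txt files are not revisited
    for image in sorted(Path(folder).iterdir()):
        if not image.is_file() or image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip PDFs, subfolders and other files
        text = extract(str(image))
        image.with_suffix(".txt").write_text(text, encoding="utf-8")
        results[image.name] = text
    return results
```

For example, `extract_folder("./invoices", GermanOCR().extract)` would write `invoice.txt` next to each `invoice.png`.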

Option 3: Local (llama.cpp)

Runs GGUF models directly, for maximum control and for edge deployments.

# Install with GPU support (CUDA); quote the extra so shells like zsh don't expand the brackets
CMAKE_ARGS="-DGGML_CUDA=on" pip install "german-ocr[llamacpp]"

# Or CPU only
pip install "german-ocr[llamacpp]"

from german_ocr import GermanOCR

# Auto-detect the best device (GPU/CPU)
ocr = GermanOCR(backend="llamacpp")
text = ocr.extract("invoice.png")

# Force CPU only (no layers offloaded to the GPU)
ocr = GermanOCR(backend="llamacpp", n_gpu_layers=0)

# Full GPU acceleration (offload all layers)
ocr = GermanOCR(backend="llamacpp", n_gpu_layers=-1)
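The three calls differ only in `n_gpu_layers`. In llama.cpp this is the number of model layers offloaded to the GPU: `0` keeps everything on the CPU, `-1` offloads all layers, and a positive number offloads that many (useful on GPUs with limited memory). Whether german-ocr forwards positive values unchanged is an assumption. A hypothetical helper that maps a device choice to constructor arguments:

```python
from typing import Any, Dict, Optional


def llamacpp_kwargs(device: str = "auto", gpu_layers: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical helper: keyword arguments for GermanOCR(backend="llamacpp", ...)."""
    kwargs: Dict[str, Any] = {"backend": "llamacpp"}
    if device == "cpu":
        kwargs["n_gpu_layers"] = 0  # no layers on the GPU
    elif device == "gpu":
        # -1 offloads every layer; a positive count (assumed supported) offloads only that many
        kwargs["n_gpu_layers"] = -1 if gpu_layers is None else gpu_layers
    elif device != "auto":
        raise ValueError(f"unknown device: {device!r}")
    # "auto": omit n_gpu_layers and let the library pick the best device
    return kwargs
```

Usage: `ocr = GermanOCR(**llamacpp_kwargs("cpu"))`.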

Cloud Models

| Model | Parameter | Best for |
|---|---|---|
| German-OCR Ultra | german-ocr-ultra | Maximum precision, structure recognition |
| German-OCR Pro | german-ocr-pro | Balance of speed and quality |
| German-OCR Turbo | german-ocr | GDPR-compliant (DSGVO), processed locally in Germany |

Model Selection

from german_ocr import CloudClient

client = CloudClient(
    api_key="gocr_xxxxxxxx",
    api_secret="your_64_char_secret_here"
)

# German-OCR Ultra - maximum precision
result = client.analyze("dokument.pdf", model="german-ocr-ultra")

# German-OCR Pro - fast cloud (default)
result = client.analyze("dokument.pdf", model="german-ocr-pro")

# German-OCR Turbo - processed in Germany, GDPR-compliant
result = client.analyze("dokument.pdf", model="german-ocr")
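If the model choice depends on the job, a small lookup keeps the parameter strings in one place. This is a hypothetical helper: the priority names are illustrative, while the model parameters come from the Cloud Models table.

```python
# Model parameters from the Cloud Models table; the priority keys are illustrative
CLOUD_MODELS = {
    "precision": "german-ocr-ultra",  # German-OCR Ultra: maximum precision
    "balanced": "german-ocr-pro",     # German-OCR Pro: speed/quality balance (default)
    "gdpr": "german-ocr",             # German-OCR Turbo: GDPR-compliant, processed in Germany
}


def pick_model(priority: str = "balanced") -> str:
    """Hypothetical helper: map a priority to a cloud model parameter."""
    try:
        return CLOUD_MODELS[priority]
    except KeyError:
        raise ValueError(f"priority must be one of {sorted(CLOUD_MODELS)}") from None
```

Usage: `client.analyze("dokument.pdf", model=pick_model("precision"))`.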

CLI Usage

Cloud

# Set API credentials (Secret shown only once at creation!)
export GERMAN_OCR_API_KEY="gocr_xxxxxxxx"
export GERMAN_OCR_API_SECRET="your_64_char_secret_here"

# Extract text (uses German-OCR Pro by default)
german-ocr --cloud invoice.pdf

# Use German-OCR Turbo (GDPR-compliant, processed in Germany)
german-ocr --cloud --model german-ocr invoice.pdf

# JSON output with German-OCR Ultra
german-ocr --cloud --model german-ocr-ultra --output-format json invoice.pdf

# With a custom prompt ("Extrahiere alle Betraege" = "extract all amounts")
german-ocr --cloud --prompt "Extrahiere alle Betraege" invoice.pdf
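To drive the CLI from another Python program, build the argument list and run it with `subprocess`; passing a list instead of a shell string avoids quoting problems with prompts. This is a sketch: `cloud_cli_args` and `run_cloud_cli` are hypothetical, only the flags shown above are used, and it assumes the CLI prints its result to standard output.

```python
import subprocess
from typing import List, Optional


def cloud_cli_args(path: str, model: Optional[str] = None,
                   output_format: Optional[str] = None,
                   prompt: Optional[str] = None) -> List[str]:
    """Build a german-ocr cloud command line from the flags documented above."""
    args = ["german-ocr", "--cloud"]
    if model:
        args += ["--model", model]
    if output_format:
        args += ["--output-format", output_format]
    if prompt:
        args += ["--prompt", prompt]
    args.append(path)
    return args


def run_cloud_cli(path: str, **options: Optional[str]) -> str:
    # The child process inherits GERMAN_OCR_API_KEY / GERMAN_OCR_API_SECRET
    # from the current environment; check=True raises on a non-zero exit code.
    completed = subprocess.run(cloud_cli_args(path, **options),
                               capture_output=True, text=True, check=True)
    return completed.stdout  # assumption: the result is written to stdout
```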

Local

# Single image
german-ocr invoice.png

# Batch processing
german-ocr --batch ./invoices/

# JSON output
german-ocr --format json invoice.png

Cloud API

Output Formats

| Format | Description |
|---|---|
| text | Plain text (default) |
| json | Structured JSON |
| markdown | Formatted Markdown |
| n8n | n8n-compatible format |
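When saving results, it helps to validate the format name and pick a matching file extension. A minimal sketch; `output_path` is hypothetical and the extension mapping is an assumption (in particular, that n8n output is JSON).

```python
from pathlib import Path

# Format names from the table above; the file extensions are assumptions
OUTPUT_EXTENSIONS = {
    "text": ".txt",
    "json": ".json",
    "markdown": ".md",
    "n8n": ".json",  # assumed to be JSON-based
}


def output_path(source: str, output_format: str = "text") -> Path:
    """Return where to save a result, e.g. invoice.pdf -> invoice.md for markdown."""
    if output_format not in OUTPUT_EXTENSIONS:
        raise ValueError(f"unsupported format {output_format!r}; choose from {sorted(OUTPUT_EXTENSIONS)}")
    return Path(source).with_suffix(OUTPUT_EXTENSIONS[output_format])
```

Usage: `output_path("invoice.pdf", "json").write_text(result.text, encoding="utf-8")`.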

Progress Tracking

from german_ocr import CloudClient

client = CloudClient(
    api_key="gocr_xxxxxxxx",
    api_secret="your_64_char_secret"
)

def on_progress(status):
    print(f"Page {status.current_page}/{status.total_pages}")

result = client.analyze(
    "large_document.pdf",
    on_progress=on_progress
)
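The callback receives a status object with `current_page` and `total_pages`. For long documents, a single redrawn progress bar is easier to read than one line per page. A sketch that relies only on those two attributes; `render_progress` is a hypothetical helper.

```python
import sys


def render_progress(current_page: int, total_pages: int, width: int = 30) -> str:
    """Return a one-line text bar such as '[#####-----]  50% (5/10)'."""
    fraction = current_page / total_pages if total_pages else 0.0
    fraction = min(max(fraction, 0.0), 1.0)
    filled = round(fraction * width)
    return f"[{'#' * filled}{'-' * (width - filled)}] {fraction:4.0%} ({current_page}/{total_pages})"


def on_progress(status) -> None:
    # Uses the current_page / total_pages attributes from the callback example above
    line = render_progress(status.current_page, status.total_pages)
    # "\r" redraws the same terminal line; finish with a newline on the last page
    end = "\n" if status.current_page >= status.total_pages else ""
    sys.stdout.write("\r" + line + end)
    sys.stdout.flush()
```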

Async Processing

# Submit job with German-OCR Pro
job = client.submit("document.pdf", model="german-ocr-pro", output_format="json")
print(f"Job ID: {job.job_id}")

# Check status
status = client.get_job(job.job_id)
print(f"Status: {status.status}")

# Wait for result
result = client.wait_for_result(job.job_id)

# Cancel a job that has not finished yet
client.cancel_job(job.job_id)
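`wait_for_result` already blocks until the job finishes. If you need your own timeout, or want to do other work between checks, a polling loop over `get_job` could look like this sketch with exponential backoff. The terminal status strings (`completed`, `failed`, `cancelled`) are assumptions; check the values `status.status` actually returns.

```python
import time
from typing import Callable, Iterable


def poll_job(get_status: Callable[[], str],
             done_states: Iterable[str] = ("completed", "failed", "cancelled"),
             interval: float = 1.0, max_interval: float = 10.0,
             timeout: float = 300.0,
             sleep: Callable[[float], None] = time.sleep,
             clock: Callable[[], float] = time.monotonic) -> str:
    """Call get_status() until it returns a terminal state, doubling the wait each time."""
    done = set(done_states)  # assumed terminal status values
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in done:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} s")
        sleep(interval)
        interval = min(interval * 2, max_interval)  # exponential backoff, capped


# Usage with the client and job from above:
# final = poll_job(lambda: client.get_job(job.job_id).status)
```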

Account Info

# Check balance
balance = client.get_balance()
print(f"Balance: {balance}")

# Usage statistics
usage = client.get_usage()
print(f"Usage: {usage}")

Local Models

Ollama Models

| Model | Size | Speed | Best for |
|---|---|---|---|
| german-ocr-turbo | 1.9 GB | ~5 s | Recommended |
| german-ocr | 3.2 GB | ~7 s | Standard |

GGUF Models (llama.cpp)

| Model | Size | Speed | Best for |
|---|---|---|---|
| german-ocr-2b | 1.5 GB | ~5 s (GPU) / ~25 s (CPU) | Edge/embedded |
| german-ocr-turbo | 1.9 GB | ~5 s (GPU) / ~20 s (CPU) | Best accuracy |
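Assuming the speed figures are per page (as in the feature table), batch run times are simple arithmetic: 100 pages with german-ocr-turbo take roughly 100 × 5 s ≈ 8 minutes on a GPU, versus 100 × 20 s ≈ 33 minutes on a CPU. As a hypothetical helper, with sequential processing assumed:

```python
# Approximate seconds per page from the GGUF table (assumed to be per page)
SECONDS_PER_PAGE = {
    ("german-ocr-2b", "gpu"): 5,
    ("german-ocr-2b", "cpu"): 25,
    ("german-ocr-turbo", "gpu"): 5,
    ("german-ocr-turbo", "cpu"): 20,
}


def estimate_minutes(model: str, device: str, pages: int) -> float:
    """Rough sequential processing time in minutes; real times vary with hardware."""
    return pages * SECONDS_PER_PAGE[(model, device)] / 60
```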

Hardware Support:

  • CUDA (NVIDIA GPUs)
  • Metal (Apple Silicon)
  • Vulkan (AMD/Intel/NVIDIA)
  • OpenVINO (Intel NPU)
  • CPU (all platforms)

Pricing

See current pricing at app.german-ocr.de

License

Apache 2.0 - See LICENSE for details.

Author

Keyvan Hardani - keyvan.ai


Made with love in Germany



Download files


Source Distribution

german_ocr-0.6.0.tar.gz (31.7 kB)

Uploaded Source

Built Distribution


german_ocr-0.6.0-py3-none-any.whl (31.7 kB)

Uploaded Python 3

File details

Details for the file german_ocr-0.6.0.tar.gz.

File metadata

  • Download URL: german_ocr-0.6.0.tar.gz
  • Upload date:
  • Size: 31.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for german_ocr-0.6.0.tar.gz
Algorithm Hash digest
SHA256 d5790f8723ab60b7e9c0d354d2da4ded4db48d50ede49508194248e839aac131
MD5 17918ba6ee2415819c239936c6e5ee7d
BLAKE2b-256 78230a7b8ff6d1b60ad0b3a805f306f104beea7043af997b9dfd9524c4e7512a
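To check a downloaded file against the SHA256 digest listed here, hash it locally and compare. A minimal sketch using only the standard library; `sha256_of` and `matches_digest` are illustrative names. pip can also enforce digests itself in `--require-hashes` mode (for example, a requirements line `german-ocr==0.6.0 --hash=sha256:<digest>`), where every dependency must then be pinned with a hash.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_sha256: str) -> bool:
    """Compare a file's digest with a published one (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Usage with the source distribution digest published above:
# matches_digest("german_ocr-0.6.0.tar.gz",
#                "d5790f8723ab60b7e9c0d354d2da4ded4db48d50ede49508194248e839aac131")
```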


Provenance

The following attestation bundles were made for german_ocr-0.6.0.tar.gz:

Publisher: publish.yml on Keyvanhardani/german-ocr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file german_ocr-0.6.0-py3-none-any.whl.

File metadata

  • Download URL: german_ocr-0.6.0-py3-none-any.whl
  • Upload date:
  • Size: 31.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for german_ocr-0.6.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6ff68bf00f93a382eb87aa7d19456a9f1ae0de001e600a0ca2b04eae838dcc03
MD5 94b4a3d9573597b4521965bc934874e8
BLAKE2b-256 2a1e8b52fbe7fda1d2b34f95d9c9ef90e4532b2c50c42154272aef6095ac5940


Provenance

The following attestation bundles were made for german_ocr-0.6.0-py3-none-any.whl:

Publisher: publish.yml on Keyvanhardani/german-ocr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
