Thulium - State-of-the-Art Multilingual Handwriting Text Recognition for Python

Thulium

State-of-the-art multilingual handwriting text recognition.

Thulium is a production-ready Python library for offline handwritten text recognition (HTR) supporting 52+ languages across Latin, Cyrillic, Greek, Arabic, Hebrew, Devanagari, Chinese, Japanese, Korean, and Georgian scripts.

Features

  • 52+ Languages — Comprehensive multilingual support with script-aware processing
  • Production Ready — Optimized inference with ONNX export and mixed precision
  • State-of-the-Art — CNN/ViT backbones with Transformer/LSTM sequence heads
  • Explainable AI — Attention visualization, saliency maps, and confidence analysis
  • Flexible Decoding — CTC beam search with n-gram and neural language models
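
To make the decoding bullet concrete, here is a minimal pure-Python sketch of CTC prefix beam search without any language model. It is an illustration of the general technique, not Thulium's decoder: the function names, the `beam_width` and `blank` parameters, and the toy probabilities are all assumptions made for this example.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")

def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

def ctc_beam_search(log_probs, alphabet, beam_width=4, blank=0):
    """Decode per-frame log-probabilities with CTC prefix beam search.

    log_probs: list of frames, each a list of log-probabilities per symbol.
    alphabet:  symbol strings; alphabet[blank] is the CTC blank.
    """
    # prefix -> (log P(path ends in blank), log P(path ends in non-blank))
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for s, lp in enumerate(frame):
                if s == blank:
                    # A blank keeps the prefix unchanged.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (logsumexp(b, p_b + lp, p_nb + lp), nb)
                    continue
                ext = prefix + (s,)
                b, nb = nxt[ext]
                if prefix and prefix[-1] == s:
                    # A repeated symbol extends the prefix only after a blank...
                    nxt[ext] = (b, logsumexp(nb, p_b + lp))
                    # ...otherwise it collapses into the existing last symbol.
                    b0, nb0 = nxt[prefix]
                    nxt[prefix] = (b0, logsumexp(nb0, p_nb + lp))
                else:
                    nxt[ext] = (b, logsumexp(nb, p_b + lp, p_nb + lp))
        # Keep only the most probable prefixes.
        ranked = sorted(nxt.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    best = max(beams, key=lambda p: logsumexp(*beams[p]))
    return "".join(alphabet[i] for i in best)

# Toy input (made up for illustration): three frames over {blank, a, b}.
alphabet = ["<blank>", "a", "b"]
frames = [[math.log(p) for p in row] for row in (
    (0.3, 0.6, 0.1),
    (0.3, 0.6, 0.1),
    (0.2, 0.1, 0.7),
)]
decoded = ctc_beam_search(frames, alphabet)
```

On this toy input `decoded` is `"ab"`: the two consecutive `a` frames collapse into one symbol. An n-gram or neural language model would typically be added by rescoring each prefix extension, which this sketch leaves out.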

Installation

pip install thulium-htr

For GPU acceleration:

pip install "thulium-htr[gpu]"

Quick Start

from thulium import HTRPipeline, recognize_image

# Single image recognition
result = recognize_image("document.png", language="en")
print(result.text)

# Batch recognition with confidence scores
# Placeholder paths (assumption: recognize_batch accepts file paths,
# like recognize_image, with one language code per image)
images = ["page_en.png", "page_de.png", "page_fr.png"]

pipeline = HTRPipeline.from_pretrained("thulium-base-multilingual")
results = pipeline.recognize_batch(images, languages=["en", "de", "fr"])

for r in results:
    print(f"{r.text} (confidence: {r.confidence:.2%})")
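
The confidence score printed above can also drive a simple review workflow. The following is a sketch only: `Result` is a hypothetical stand-in for whatever object `recognize_batch` returns (only the `text` and `confidence` attributes come from the Quick Start), and the `0.85` threshold and sample values are arbitrary illustrations.

```python
from dataclasses import dataclass

@dataclass
class Result:
    """Hypothetical stand-in for a recognition result (not Thulium's class)."""
    text: str
    confidence: float

def split_by_confidence(results, threshold=0.85):
    """Partition results into (accepted, needs_review) by confidence."""
    accepted = [r for r in results if r.confidence >= threshold]
    needs_review = [r for r in results if r.confidence < threshold]
    return accepted, needs_review

# Made-up sample values, for illustration only.
sample = [Result("Dear Sir", 0.97), Result("yo~rs tru1y", 0.41)]
accepted, needs_review = split_by_confidence(sample)
```

A sensible threshold depends on the model and the material, so it is worth calibrating against a small hand-checked sample before relying on it.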

Supported Languages

52+ languages across 10 scripts

Region          Languages
Western Europe  English, German, French, Spanish, Italian, Portuguese, Dutch
Scandinavia     Swedish, Norwegian, Danish, Finnish, Icelandic
Eastern Europe  Polish, Czech, Hungarian, Romanian, Bulgarian, Ukrainian, Russian
Baltic          Lithuanian, Latvian, Estonian
Caucasus        Georgian, Armenian, Azerbaijani
Middle East     Arabic, Hebrew, Persian, Turkish
South Asia      Hindi, Bengali, Tamil, Telugu, Urdu
East Asia       Chinese, Japanese, Korean

Documentation

Guide            Description
Getting Started  Installation and first steps
API Reference    Complete API documentation
Model Zoo        Pretrained model catalog
Training Guide   Train custom models
Architecture     System design overview

Performance

Benchmarks on the IAM Handwriting Database (CER: character error rate; WER: word error rate; lower is better):

Model          CER   WER    Latency
thulium-tiny   5.2%  14.1%  12 ms
thulium-base   3.8%  10.2%  28 ms
thulium-large  2.9%  7.8%   65 ms

Measured on an NVIDIA A100 at batch size 1 with PyTorch 2.0+.
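
For readers who want to compute comparable metrics on their own data, the sketch below shows the standard edit-distance definitions of character and word error rate. It is an assumption that the figures above follow these definitions; the README does not say how errors are aggregated across samples, and the function names here are chosen for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            ))
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate; assumes a non-empty reference string."""
    return edit_distance(reference, hypothesis) / len(reference)

def wer(reference, hypothesis):
    """Word error rate on whitespace tokens; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

For example, `wer("the quick brown fox", "the quick brown box")` is one substituted word out of four, and the same pair gives a CER of one character out of nineteen.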

Citation

@software{thulium2025,
  title={Thulium: Multilingual Handwriting Recognition},
  author={Thulium Authors},
  year={2025},
  url={https://github.com/thulium-dev/thulium}
}

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

License

Apache 2.0 — see LICENSE for details.

Download files

Download the file for your platform.

Source Distribution

thulium_htr-1.2.1.tar.gz (180.9 kB)

Uploaded Source

Built Distribution

thulium_htr-1.2.1-py3-none-any.whl (175.3 kB)

Uploaded Python 3

File details

Details for the file thulium_htr-1.2.1.tar.gz.

File metadata

  • Download URL: thulium_htr-1.2.1.tar.gz
  • Upload date:
  • Size: 180.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.1

File hashes

Hashes for thulium_htr-1.2.1.tar.gz

Algorithm    Hash digest
SHA256       7be1c631265861726b5223d3d34829ba939ce84c16c7bc619c5be15b42790eee
MD5          c541d3535eca3bfadc34eb99ac245b38
BLAKE2b-256  bea203338822954f6c60e391a2d1e38fdac7661fb7af3e156325c82b27428dc2
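
A downloaded file can be checked against the SHA256 digest listed above using only the Python standard library. This is a minimal sketch: the helper names `sha256_of` and `verify` are made up for this example, and the local file path in the commented usage line is an assumption about where the download was saved.

```python
import hashlib

# SHA256 digest of thulium_htr-1.2.1.tar.gz, copied from the table above.
EXPECTED_SDIST_SHA256 = (
    "7be1c631265861726b5223d3d34829ba939ce84c16c7bc619c5be15b42790eee"
)

def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected_hex):
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected_hex.lower()

# Usage (assumes the sdist was saved in the current directory):
# verify("thulium_htr-1.2.1.tar.gz", EXPECTED_SDIST_SHA256)
```

The same check works for the wheel by substituting its file name and SHA256 digest.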

File details

Details for the file thulium_htr-1.2.1-py3-none-any.whl.

File metadata

  • Download URL: thulium_htr-1.2.1-py3-none-any.whl
  • Upload date:
  • Size: 175.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.1

File hashes

Hashes for thulium_htr-1.2.1-py3-none-any.whl

Algorithm    Hash digest
SHA256       2326b9b317d0838d5535057f688f1a0e3403e74c24ad845dac241c710451079e
MD5          4dbf73f69b08dfa6c20ad4a0225dd72e
BLAKE2b-256  a05fb28ee3a1394887cb7d7564991b81be6bc177d77ef5f7bbf0080678b5e37d
