Thulium - State-of-the-Art Multilingual Handwriting Text Recognition for Python
Thulium
State-of-the-art multilingual handwriting text recognition.
Thulium is a production-ready Python library for offline handwritten text recognition (HTR) supporting 52+ languages across Latin, Cyrillic, Greek, Arabic, Hebrew, Devanagari, Chinese, Japanese, Korean, and Georgian scripts.
Features
- 52+ Languages — Comprehensive multilingual support with script-aware processing
- Production Ready — Optimized inference with ONNX export and mixed precision
- State-of-the-Art — CNN/ViT backbones with Transformer/LSTM sequence heads
- Explainable AI — Attention visualization, saliency maps, and confidence analysis
- Flexible Decoding — CTC beam search with n-gram and neural language models
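To illustrate what CTC beam search does, here is a minimal, dependency-free sketch of CTC prefix beam search without a language model. It is not Thulium's decoder: the function name `ctc_beam_search`, its parameters, and the convention that class index 0 is the blank are assumptions made for this example.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_beam_search(log_probs, alphabet, beam_width=4, blank=0):
    """Illustrative CTC prefix beam search (no language model).

    log_probs: T rows, each a list of per-class log-probabilities.
    alphabet:  maps class index -> character; the entry at `blank` is unused.
    """
    # For every prefix keep two scores: paths ending in blank / in non-blank.
    beams = {(): (0.0, NEG_INF)}
    for row in log_probs:
        next_beams = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for c, lp in enumerate(row):
                if c == blank:
                    # A blank keeps the prefix unchanged.
                    b, nb = next_beams[prefix]
                    next_beams[prefix] = (logsumexp(b, p_b + lp, p_nb + lp), nb)
                    continue
                new_prefix = prefix + (c,)
                b, nb = next_beams[new_prefix]
                if prefix and prefix[-1] == c:
                    # A repeated symbol extends the prefix only after a blank...
                    next_beams[new_prefix] = (b, logsumexp(nb, p_b + lp))
                    # ...otherwise it collapses into the unchanged prefix.
                    b2, nb2 = next_beams[prefix]
                    next_beams[prefix] = (b2, logsumexp(nb2, p_nb + lp))
                else:
                    next_beams[new_prefix] = (b, logsumexp(nb, p_b + lp, p_nb + lp))
        # Prune to the `beam_width` most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    best = max(beams.items(), key=lambda kv: logsumexp(*kv[1]))[0]
    return "".join(alphabet[c] for c in best)


# Two frames where the blank is the single most likely class at each step.
frames = [[math.log(0.6), math.log(0.4)]] * 2
print(repr(ctc_beam_search(frames, "_a")))
```

In this toy input, picking the best class per frame yields the empty string (probability 0.36), while the beam search sums the three alignments that collapse to "a" (0.24 + 0.24 + 0.16 = 0.64) and returns "a". A language model would typically be fused in by adding a weighted score whenever a prefix is extended.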
Installation
pip install thulium-htr
For GPU acceleration:
pip install "thulium-htr[gpu]"
Quick Start
from thulium import recognize_image
# Single image recognition
result = recognize_image("document.png", language="en")
print(result.text)
# Batch recognition with confidence scores
from thulium import HTRPipeline
pipeline = HTRPipeline.from_pretrained("thulium-base-multilingual")
# `images` is supplied by the caller (assumed here to be a list of image paths)
results = pipeline.recognize_batch(images, languages=["en", "de", "fr"])
for r in results:
    print(f"{r.text} (confidence: {r.confidence:.2%})")
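A common follow-up to batch recognition is routing low-confidence results to manual review. The sketch below assumes only the two attributes used in the Quick Start (`text` and `confidence`, with confidence as a fraction between 0 and 1). The `Result` dataclass is a hypothetical stand-in so the example runs without the library, and the 0.85 threshold is an arbitrary choice.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical stand-in exposing the attributes used in the Quick Start."""
    text: str
    confidence: float  # assumed to be a fraction in [0, 1]


def split_by_confidence(results, threshold=0.85):
    """Separate results into accepted and needs-review lists."""
    accepted = [r for r in results if r.confidence >= threshold]
    needs_review = [r for r in results if r.confidence < threshold]
    return accepted, needs_review


results = [Result("Dear Sir,", 0.97), Result("yours faithfu1ly", 0.62)]
accepted, needs_review = split_by_confidence(results)
for r in needs_review:
    print(f"REVIEW: {r.text} ({r.confidence:.2%})")
```

The same function should work on the objects returned by the pipeline, provided they expose those two attributes.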
Supported Languages
52+ languages across 10 scripts
| Region | Languages |
|---|---|
| Western Europe | English, German, French, Spanish, Italian, Portuguese, Dutch |
| Scandinavia | Swedish, Norwegian, Danish, Finnish, Icelandic |
| Eastern Europe | Polish, Czech, Hungarian, Romanian, Bulgarian, Ukrainian, Russian |
| Baltic | Lithuanian, Latvian, Estonian |
| Caucasus | Georgian, Armenian, Azerbaijani |
| Middle East | Arabic, Hebrew, Persian, Turkish |
| South Asia | Hindi, Bengali, Tamil, Telugu, Urdu |
| East Asia | Chinese, Japanese, Korean |
Documentation
| Guide | Description |
|---|---|
| Getting Started | Installation and first steps |
| API Reference | Complete API documentation |
| Model Zoo | Pretrained model catalog |
| Training Guide | Train custom models |
| Architecture | System design overview |
Performance
Benchmarks on the IAM Handwriting Database (CER: character error rate; WER: word error rate):
| Model | CER | WER | Latency |
|---|---|---|---|
| thulium-tiny | 5.2% | 14.1% | 12ms |
| thulium-base | 3.8% | 10.2% | 28ms |
| thulium-large | 2.9% | 7.8% | 65ms |
Measured on NVIDIA A100, batch size 1, PyTorch 2.0+
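For reference, CER and WER are conventionally computed as the Levenshtein edit distance between the recognized text and the ground truth, divided by the reference length in characters or words. The sketch below shows that standard definition. The function names are chosen for this example, and the normalization (case, punctuation, whitespace) used for the figures above is not stated here, so results may differ from the benchmark protocol.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


print(f"CER: {cer('hello world', 'hallo world'):.2%}")
print(f"WER: {wer('hello world', 'hallo world'):.2%}")
```

One wrong character out of eleven gives a CER of about 9%, while the same error makes one of two words wrong, for a WER of 50%. This is why WER is typically several times higher than CER.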
Citation
@software{thulium2025,
  title={Thulium: Multilingual Handwriting Recognition},
  author={Thulium Authors},
  year={2025},
  url={https://github.com/thulium-dev/thulium}
}
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
License
Apache 2.0 — see LICENSE for details.
File details
Details for the file thulium_htr-1.2.1.tar.gz.
File metadata
- Download URL: thulium_htr-1.2.1.tar.gz
- Upload date:
- Size: 180.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7be1c631265861726b5223d3d34829ba939ce84c16c7bc619c5be15b42790eee |
| MD5 | c541d3535eca3bfadc34eb99ac245b38 |
| BLAKE2b-256 | bea203338822954f6c60e391a2d1e38fdac7661fb7af3e156325c82b27428dc2 |
File details
Details for the file thulium_htr-1.2.1-py3-none-any.whl.
File metadata
- Download URL: thulium_htr-1.2.1-py3-none-any.whl
- Upload date:
- Size: 175.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2326b9b317d0838d5535057f688f1a0e3403e74c24ad845dac241c710451079e |
| MD5 | 4dbf73f69b08dfa6c20ad4a0225dd72e |
| BLAKE2b-256 | a05fb28ee3a1394887cb7d7564991b81be6bc177d77ef5f7bbf0080678b5e37d |
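A downloaded file can be checked against the SHA256 digests listed above with the standard library alone. This is a minimal sketch: the helper names are chosen for this example, and the local path in the usage note is hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_digest("thulium_htr-1.2.1.tar.gz", "7be1c631265861726b5223d3d34829ba939ce84c16c7bc619c5be15b42790eee")` should return `True` for an intact copy of the source distribution saved under that name in the current directory.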