Best Bangla/Bengali OCR - Transformer-based text recognition for handwritten & printed Bangla. Works on Colab, Kaggle, local. Zero dependency conflicts.
Bangla OCR 🇧🇩
The Best Bangla/Bengali OCR Package - Recognize Bangla text from images with zero dependency conflicts!
✨ Why Bangla OCR?
- 🔤 Accurate: State-of-the-art Transformer architecture for Bangla text recognition
- 🚀 Zero Conflicts: Works with any PyTorch ≥ 1.9.0 and NumPy ≥ 1.19.0 (see Requirements) - no dependency hell!
- ☁️ Cloud Ready: Works instantly on Google Colab, Kaggle, AWS, GCP, Azure
- 💻 Cross-Platform: Windows, Linux, macOS, Apple Silicon (M1/M2/M3/M4)
- 🎯 Smart Fallbacks: Works even without OpenCV (uses PIL as fallback)
- 📥 Multiple Inputs: Accepts file paths, PIL Images, numpy arrays, URLs, and bytes
- ⚡ Fast: GPU acceleration with CUDA and Apple MPS support
📦 Installation
pip install bangla-ocr
That's it! Works everywhere - Colab, Kaggle, local, Docker, cloud!
Optional: With OpenCV (slightly better preprocessing)
pip install "bangla-ocr[cv-headless]" # Recommended for servers/Colab/Kaggle
pip install "bangla-ocr[cv]" # For local development with GUI
pip install "bangla-ocr[full]" # Everything included
🚀 Quick Start
from bangla_ocr import BanglaOCR
# Initialize (automatically downloads model on first run)
ocr = BanglaOCR()
# Recognize text from an image
text = ocr.recognize('bangla_image.jpg')
print(text) # Output: বাংলা টেক্সট
📖 Usage Examples
From File Path
text = ocr.recognize('path/to/image.jpg')
From PIL Image
from PIL import Image
img = Image.open('image.png')
text = ocr.recognize(img)
From NumPy Array
import numpy as np
from PIL import Image
# Convert a PIL image to a NumPy array before passing it in
arr = np.array(Image.open('image.jpg'))
text = ocr.recognize(arr)
From URL
text = ocr.recognize('https://example.com/bangla_image.jpg')
From Bytes
with open('image.jpg', 'rb') as f:
    img_bytes = f.read()
text = ocr.recognize(img_bytes)
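The five input types above all go through the same `recognize` call, so some dispatch on the argument's type must happen. The following is a minimal sketch of how such a dispatch could look, not the package's actual implementation. The function name `classify_input` and the returned labels are hypothetical, and the attribute checks are an assumption chosen so that the sketch needs neither Pillow nor NumPy to run.

```python
import os


def classify_input(source):
    """Return a label describing what kind of image source was passed in."""
    if isinstance(source, (bytes, bytearray)):
        return "bytes"
    if isinstance(source, str) and source.lower().startswith(("http://", "https://")):
        return "url"
    if isinstance(source, (str, os.PathLike)):
        return "path"
    # NumPy arrays expose .shape and .dtype; PIL images expose .size and .mode.
    # Duck typing avoids importing either library just to classify the input.
    if hasattr(source, "shape") and hasattr(source, "dtype"):
        return "array"
    if hasattr(source, "size") and hasattr(source, "mode"):
        return "pil"
    raise TypeError(f"Unsupported image source: {type(source).__name__}")
```

Checking for URLs before paths matters here, because a URL is also a plain string and would otherwise be treated as a file path.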
Batch Prediction
texts = ocr.predict_batch(
    ['img1.jpg', 'img2.jpg', 'img3.jpg'],
    show_progress=True
)
for text in texts:
    print(text)
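When a batch contains an unreadable file, it can be useful to keep going and collect the failures. The sketch below wraps the documented `recognize` method in a loop; the helper name `recognize_many` is hypothetical, and how `predict_batch` itself handles errors is not stated here, so this is an alternative under that assumption rather than a description of its behavior.

```python
def recognize_many(ocr, sources):
    """Run ocr.recognize on each source, collecting failures instead of raising.

    Sources are used as dictionary keys, so they should be hashable
    (file paths or URLs, for example).
    """
    results, errors = {}, {}
    for src in sources:
        try:
            results[src] = ocr.recognize(src)
        except Exception as exc:  # keep going; report the failure afterwards
            errors[src] = str(exc)
    return results, errors
```

A call such as `results, errors = recognize_many(ocr, ['img1.jpg', 'img2.jpg'])` then gives the recognized text per path plus a separate record of what failed.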
🔧 Advanced Configuration
Check System Status
from bangla_ocr import check_dependencies, print_info
# Print formatted info
print_info()
# Get dependency status
status = check_dependencies()
print(status)
# {'torch': True, 'cuda': True, 'mps': False, 'opencv': True, ...}
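For readers curious how a status dictionary like this can be assembled, here is a rough sketch using only the standard library. It is not the code behind `check_dependencies`; the function names `probe_optional_packages` and `pick_image_backend` are hypothetical, and the OpenCV-then-Pillow preference simply mirrors the fallback described in the feature list.

```python
from importlib.util import find_spec


def probe_optional_packages(names=("torch", "cv2", "PIL", "numpy")):
    """Map each top-level module name to True/False without importing it."""
    return {name: find_spec(name) is not None for name in names}


def pick_image_backend(status):
    """Prefer OpenCV when present, otherwise fall back to Pillow."""
    if status.get("cv2"):
        return "opencv"
    if status.get("PIL"):
        return "pil"
    raise RuntimeError("Neither OpenCV nor Pillow is installed")
```

Using `find_spec` keeps the probe cheap, since heavy packages such as PyTorch are located but not loaded.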
Device Selection
# Auto-detect best device (default)
ocr = BanglaOCR()
# Force specific device
ocr = BanglaOCR(device='cpu') # CPU
ocr = BanglaOCR(device='cuda') # NVIDIA GPU
ocr = BanglaOCR(device='mps') # Apple Silicon
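If you want to make the device choice explicit in your own code, the following sketch shows one common way to pick between the three device strings above. The helper name `pick_device` is hypothetical and the CUDA-before-MPS order is an assumption; the package's own auto-detection may differ.

```python
def pick_device():
    """Return 'cuda', 'mps', or 'cpu' depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so there is nothing to accelerate
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch releases (1.12+),
    # so look it up defensively instead of assuming it is there.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

The result can be passed straight through, for example `ocr = BanglaOCR(device=pick_device())`.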
Custom Model Parameters
ocr = BanglaOCR(
    model_path='custom_model.pth',
    tokenizer_path='tokenizer.pk',
    d_model=256, # Hidden dimension
    nheads=4, # Attention heads
    num_decoder_layers=4 # Decoder layers
)
Get Model Info
ocr = BanglaOCR()
print(ocr.info)
# {'version': '1.0.0', 'device': 'cuda', 'd_model': 256, ...}
🏗️ Architecture
Input Image → ResNet-18 (CNN Encoder) → Transformer Decoder → Bangla Text
- Encoder: ResNet-18 backbone for visual feature extraction
- Positional Encoding: Sinusoidal positional encodings that carry spatial information
- Decoder: Transformer with multi-head attention
- Output: Character-level Bangla vocabulary
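To make the positional-encoding step concrete, here is a small pure-Python sketch of the standard one-dimensional sinusoidal scheme, where even columns hold sines and odd columns hold cosines at geometrically spaced frequencies. This is an illustration only: the function name `sinusoidal_encoding` is hypothetical, and the model may use a two-dimensional or otherwise modified variant.

```python
import math


def sinusoidal_encoding(num_positions, d_model):
    """Return a num_positions x d_model table of sine/cosine position codes."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(num_positions):
        row = []
        # i walks over the even column indices 0, 2, 4, ...
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table
```

With `d_model=256`, as in the configuration example, each position would get a 256-value code that is added to the visual features before decoding.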
📋 Requirements
| Package | Version | Notes |
|---|---|---|
| Python | ≥ 3.8 | 3.9-3.12 recommended |
| PyTorch | ≥ 1.9.0 | Any release from 1.9.0 onward |
| torchvision | ≥ 0.10.0 | Any release from 0.10.0 onward |
| NumPy | ≥ 1.19.0 | Any release from 1.19.0 onward |
| Pillow | ≥ 8.0.0 | Always used |
| OpenCV | Optional | PIL fallback available |
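The minimums in this table can be checked against an existing environment before installing. The sketch below uses only the standard library; the helper names and the simplified version parsing are assumptions made for illustration (local tags such as `+cu117` and suffixes such as `rc1` are dropped), and the distribution names in `MINIMUMS` are the usual PyPI names for these packages.

```python
import re
from importlib import metadata

# Minimum versions copied from the requirements table above.
MINIMUMS = {"torch": "1.9.0", "torchvision": "0.10.0", "numpy": "1.19.0", "Pillow": "8.0.0"}


def version_tuple(version):
    """Turn '1.13.1+cu117' into (1, 13, 1) by keeping the leading numeric parts."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings, padding the shorter one with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums=MINIMUMS):
    """Return {package: True/False/None}; None means the package is not installed."""
    report = {}
    for name, minimum in minimums.items():
        try:
            report[name] = meets_minimum(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report
```

Calling `check_requirements()` returns one entry per package, which makes it easy to see at a glance what needs upgrading.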
🌍 Platform Compatibility
| Platform | Status | Acceleration |
|---|---|---|
| Google Colab | ✅ Perfect | CUDA |
| Kaggle | ✅ Perfect | CUDA/TPU |
| Windows 10/11 | ✅ Perfect | CUDA |
| Linux (Ubuntu, etc.) | ✅ Perfect | CUDA |
| macOS Intel | ✅ Perfect | CPU |
| macOS Apple Silicon | ✅ Perfect | MPS |
| Docker | ✅ Perfect | CUDA |
| AWS/GCP/Azure | ✅ Perfect | CUDA |
🏋️ Training
Install training dependencies:
pip install "bangla-ocr[train]"
See the training guide for details.
📚 Citation
@inproceedings{ghosh2023bangla,
  title={Towards Full-page Offline Bangla Handwritten Text Recognition},
  author={Ghosh, A.},
  booktitle={IEEE SILCON},
  year={2023}
}
📄 License
MIT License - see LICENSE for details.
🤝 Contributing
Contributions welcome! Please submit a Pull Request.
File details
Details for the file bangla_ocr-1.0.3.tar.gz.
File metadata
- Download URL: bangla_ocr-1.0.3.tar.gz
- Upload date:
- Size: 18.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 935f24eab9bdcbb13bb8009ad646393ad675394ea687e86f453984f14029565c |
| MD5 | 2721feb27b25b033921ec1950bd68ec3 |
| BLAKE2b-256 | e2e051527ac7b7a4367c35dc06ef6e2a4e9897ba5d57b44727633e3f4ef2c7fd |
File details
Details for the file bangla_ocr-1.0.3-py3-none-any.whl.
File metadata
- Download URL: bangla_ocr-1.0.3-py3-none-any.whl
- Upload date:
- Size: 17.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c1eb6d213608f162e57db795105f71bb0c9e06f59d9b8ee1bf3bf49b7010940a |
| MD5 | d4af36eb157df31ba487710baee813fd |
| BLAKE2b-256 | dd97df0d8de4e4966ad607f2f2e646ede41799f709b27720ae2500aeec61b649 |