Multilingual handwritten OCR for student notes - production-grade text extraction

Maintainers

Familiar2170


🖋️ HandScribe OCR

Production-grade multilingual handwritten OCR for student notes

✨ Features

Three OCR Backends: EasyOCR, PaddleOCR, and TrOCR behind a unified interface
Multilingual Support: 80+ languages via EasyOCR and PaddleOCR, including English, Swahili, Arabic, Hindi, and French (TrOCR is English-only unless fine-tuned)
Advanced Preprocessing: Denoising, CLAHE contrast enhancement, adaptive binarization, deskew
CLI & REST API: Use from command line or integrate into any application
Docker Ready: One-command deployment, no Python environment needed
Batch Processing: Process hundreds of student notes automatically
Production-Grade: Tested, typed, CI/CD-enabled, PyPI-ready
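To make the preprocessing feature more concrete, here is a minimal pure-Python sketch of adaptive (local mean) binarization, one of the techniques listed above. It is an illustration only: the function name `adaptive_binarize` and its `window` and `offset` parameters are hypothetical, and HandScribe's own `preprocessing.py` may implement this step differently, most likely with an image library.

```python
def adaptive_binarize(gray, window=3, offset=5):
    """Binarize a grayscale image given as a list of rows of 0-255 ints.

    A pixel becomes 0 (ink) when it is darker than the mean of its
    local window minus `offset`; otherwise it becomes 255 (paper).
    """
    height, width = len(gray), len(gray[0])
    half = window // 2

    # Integral image with a zero border: integral[y][x] is the sum of all
    # pixels strictly above and to the left of (y, x).
    integral = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += gray[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    out = []
    for y in range(height):
        y0, y1 = max(0, y - half), min(height, y + half + 1)
        out_row = []
        for x in range(width):
            x0, x1 = max(0, x - half), min(width, x + half + 1)
            # Window sum in O(1) from four integral-image lookups.
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            mean = total / ((y1 - y0) * (x1 - x0))
            out_row.append(0 if gray[y][x] < mean - offset else 255)
        out.append(out_row)
    return out
```

Because each pixel is compared with its own neighborhood rather than one global threshold, unevenly lit pages (a common problem with phone photos of notes) are less likely to lose faint strokes in the darker regions.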

🚀 Quick Start

Option 1: Docker (Recommended)

docker run -p 8000:8000 ronaldgosso/handscribe

# Test it
curl -X POST http://localhost:8000/ocr \
  -F "file=@my_notes.jpg" \
  -F "languages=en,sw"

Option 2: Python Package

pip install "handscribe[easyocr]"  # quotes keep shells such as zsh from expanding the brackets

# CLI
handscribe extract student_notes.jpg -b easyocr -l en,sw

# API server
uvicorn ocr_engine.api:api --host 0.0.0.0 --port 8000

Option 3: From Source

git clone https://github.com/ronaldgosso/handscribe.git
cd handscribe
pip install -e ".[easyocr]"
handscribe --help

📖 Usage

CLI

# Extract text
handscribe extract image.jpg -b easyocr -l en,sw

# Output as JSON
handscribe extract image.jpg --json

# Save to file
handscribe extract image.jpg -o output.txt -c 0.6

# Batch process a directory
handscribe batch ./student_notes/ -o ./results/

# Compare all backends on the same image
handscribe compare image.jpg -l en,sw
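The `batch` command above takes an input directory and an output directory. As a rough sketch of how such a run might pair each image with an output file, the following uses only the standard library. The helper `plan_batch`, the list of extensions, and the `.txt` output naming are assumptions made for illustration, not HandScribe's documented behavior.

```python
from pathlib import Path

# Assumed set of image extensions; the real CLI may accept others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def plan_batch(input_dir, output_dir):
    """Return sorted (image_path, output_path) pairs for one batch run."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    pairs = []
    for image in sorted(input_dir.iterdir()):
        # Skip sub-directories and non-image files such as README or .md notes.
        if image.is_file() and image.suffix.lower() in IMAGE_SUFFIXES:
            pairs.append((image, output_dir / f"{image.stem}.txt"))
    return pairs
```

Each pair could then be passed to an OCR call and the returned text written to the second path.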

REST API

Start the server and open http://localhost:8000/docs for interactive docs.

# Extract text with bounding boxes
curl -X POST http://localhost:8000/ocr \
  -F "file=@notes.jpg" \
  -F "backend=easyocr" \
  -F "languages=en,sw" \
  -F "confidence=0.5"

# Plain text only
curl -X POST http://localhost:8000/ocr/text \
  -F "file=@notes.jpg"

# Batch (up to 10 files)
curl -X POST http://localhost:8000/ocr/batch \
  -F "files=@img1.jpg" -F "files=@img2.jpg"
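The batch endpoint above is noted as accepting up to 10 files per request, so a client with a larger set of images has to split it first. A small sketch of that splitting step; `chunked` is a hypothetical helper, and the default of 10 is taken from the comment above rather than from any published API limit.

```python
def chunked(items, size=10):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

A client would then send one `POST /ocr/batch` request per chunk and concatenate the responses in order.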

Python API

from ocr_engine import OCREngine, OCRBackend

engine = OCREngine(
    backend=OCRBackend.EASYOCR,
    languages=["en", "sw"],
    confidence_threshold=0.5,
)

# With bounding boxes
results = engine.extract("student_notes.jpg")
for r in results:
    print(f"{r.text} (conf: {r.confidence:.2f})")

# Plain text
text = engine.extract_text("student_notes.jpg")

# Batch
batch = engine.extract_batch(["note1.jpg", "note2.jpg"])
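Since each result carries a `confidence` value, one possible follow-up is to route low-confidence lines to manual review instead of discarding them. The sketch below assumes only the `text` and `confidence` attributes shown above; the `Line` dataclass is a stand-in for HandScribe's real result type, and `split_by_confidence` and its default threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """Stand-in for one OCR result; only `text` and `confidence` are assumed."""
    text: str
    confidence: float


def split_by_confidence(results, threshold=0.8):
    """Separate results into (accepted, needs_review) by confidence."""
    accepted = [r for r in results if r.confidence >= threshold]
    needs_review = [r for r in results if r.confidence < threshold]
    return accepted, needs_review


# Example with hand-made results in place of engine.extract(...)
sample = [Line("Habari", 0.93), Line("photosynthesis", 0.61)]
accepted, needs_review = split_by_confidence(sample)
```

With a low `confidence_threshold` on the engine and a stricter threshold here, uncertain words stay visible for a human to check.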

🔧 OCR Backends Comparison

Backend	Best For	Languages	Speed	Accuracy
EasyOCR	Quick setup, mixed scripts	80+	⚡⚡⚡	⭐⭐⭐⭐
PaddleOCR	Fast processing, documents	80+	⚡⚡⚡⚡	⭐⭐⭐⭐
TrOCR	Handwriting accuracy	English*	⚡⚡	⭐⭐⭐⭐⭐

*TrOCR can be fine-tuned for other languages.

Language Codes

Language	EasyOCR	PaddleOCR
English	`en`	`en`
Swahili	`sw`	`en` (Latin script)
Arabic	`ar`	`arabic`
Hindi	`hi`	`hi`
French	`fr`	`french`

Tanzanian Context: For notes that mix Swahili and English, use `-l en,sw` with EasyOCR. Swahili is written in Latin script, so Latin-script recognition handles most Swahili text well. For production-grade Swahili accuracy, fine-tuning TrOCR on a Swahili handwriting dataset is recommended.
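Because the two backends use different codes for the same language, a small lookup can keep calling code backend-agnostic. The sketch below copies its values from the table above; the names `LANGUAGE_CODES` and `codes_for` are hypothetical, and whether the `handscribe` CLI already normalizes codes internally is not stated here.

```python
# Codes copied from the table above; other languages are not covered here.
LANGUAGE_CODES = {
    "english": {"easyocr": "en", "paddleocr": "en"},
    "swahili": {"easyocr": "sw", "paddleocr": "en"},  # PaddleOCR: Latin-script fallback
    "arabic": {"easyocr": "ar", "paddleocr": "arabic"},
    "hindi": {"easyocr": "hi", "paddleocr": "hi"},
    "french": {"easyocr": "fr", "paddleocr": "french"},
}


def codes_for(backend, languages):
    """Translate language names into de-duplicated codes for one backend."""
    codes = []
    for name in languages:
        code = LANGUAGE_CODES[name.lower()][backend]
        if code not in codes:  # Swahili and English collapse to "en" on PaddleOCR
            codes.append(code)
    return codes
```

An unknown language or backend raises `KeyError`, which is a deliberate choice in this sketch: a silent fallback would hide misconfigured language lists.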

🐳 Docker

# Run
docker run -p 8000:8000 ronaldgosso/handscribe

# Build from source
docker build -t handscribe .
docker run -p 8000:8000 handscribe

# Docker Compose
docker compose up -d

See CONTRIBUTING.md for full Docker setup instructions.
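After `docker run` or `docker compose up -d`, the container may need some time before the API answers, for example while OCR models load. A script that calls the API right after starting the container can poll first. The following is a minimal standard-library sketch; `wait_for_server` and its defaults are hypothetical, and the `/docs` URL is the interactive docs page mentioned in the REST API section.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url, timeout=30.0, interval=1.0):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server is not accepting connections yet
        time.sleep(interval)
    return False
```

For example, `wait_for_server("http://localhost:8000/docs")` returns `True` once the page is served and `False` if the timeout elapses first.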

🏗️ Architecture

handscribe/
├── ocr_engine/
│   ├── __init__.py          # Package exports
│   ├── engine.py            # Core OCR engine (3 backends)
│   ├── preprocessing.py     # Advanced image preprocessing
│   ├── cli.py               # CLI interface (Typer)
│   └── api.py               # REST API (FastAPI)
├── tests/
│   └── test_engine.py       # Comprehensive test suite
├── pyproject.toml           # Package configuration
├── Dockerfile               # Multi-stage Docker build
├── docker-compose.yml       # Docker Compose setup
└── .github/workflows/ci.yml # CI/CD pipeline

🧪 Development

See CONTRIBUTING.md for the full developer guide, including:

Virtual environment setup
Running the CLI, API server, and Docker
Running tests with coverage
Linting with ruff, black, and mypy
Git workflow and PR submission
Adding new OCR backends

Quick start for developers:

git clone https://github.com/ronaldgosso/handscribe.git
cd handscribe
python -m venv .venv && source .venv/bin/activate  # or .venv\Scripts\Activate on Windows
pip install -e ".[all,dev]"
pytest tests/ -v

📝 Examples

Process Tanzanian Student Notes

handscribe extract notes.jpg -b easyocr -l en,sw -c 0.5
handscribe compare notes.jpg -l en,sw
handscribe batch ./semester_notes/ -l en,sw -o ./ocr_results/

API Integration (Python)

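import requests

# Upload the image as multipart form data, with the OCR options as form fields.
with open("student_notes.jpg", "rb") as f:
    response = requests.post(
        "http://localhost:8000/ocr",
        files={"file": f},
        data={"backend": "easyocr", "languages": "en,sw", "confidence": 0.5},
        timeout=60,  # OCR can be slow; avoid waiting forever on a stuck server
    )

response.raise_for_status()  # fail on HTTP errors instead of parsing an error body
print(response.json()["full_text"])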

🙏 Acknowledgments

EasyOCR — Jaided AI for excellent multilingual OCR
PaddleOCR — PaddlePaddle team for fast OCR implementation
TrOCR — Microsoft for transformer-based handwriting OCR
Tanzanian Students — Inspiring this tool for real-world impact

📧 Contact

Ronald Gosso — ronaldgosso@gmail.com

Project Link: https://github.com/ronaldgosso/handscribe

Made with ❤️ for students everywhere

Release history

0.1.2 (Apr 9, 2026)
0.1.0 (Apr 9, 2026, this version)

Download files

Source Distribution

handscribe-0.1.0.tar.gz (29.3 kB), uploaded Apr 9, 2026, Source

Built Distribution

handscribe-0.1.0-py3-none-any.whl (27.9 kB), uploaded Apr 9, 2026, Python 3

File details

Details for the file handscribe-0.1.0.tar.gz.

File metadata

Download URL: handscribe-0.1.0.tar.gz
Upload date: Apr 9, 2026
Size: 29.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for handscribe-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`5e9689294729c40fcee275f8b16e178567ab7014a1077e8243e1531104c997d8`
MD5	`07ffe1783201f5a7b279176564ea02fa`
BLAKE2b-256	`7dbce8b38f435a373d20a7ebe2ea6d4ac244272761845f7b6aec3e76cf0a2e50`

Provenance

The following attestation bundles were made for handscribe-0.1.0.tar.gz:

Publisher: publish.yml on ronaldgosso/handscribe

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: handscribe-0.1.0.tar.gz
- Subject digest: 5e9689294729c40fcee275f8b16e178567ab7014a1077e8243e1531104c997d8
- Sigstore transparency entry: 1262040259
- Sigstore integration time: Apr 9, 2026
Source repository:
- Permalink: ronaldgosso/handscribe@bbb231b1ca579283578f088c20715ea1b9291f23
- Branch / Tag: refs/heads/main
- Owner: https://github.com/ronaldgosso
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@bbb231b1ca579283578f088c20715ea1b9291f23
- Trigger Event: workflow_dispatch

File details

Details for the file handscribe-0.1.0-py3-none-any.whl.

File metadata

Download URL: handscribe-0.1.0-py3-none-any.whl
Upload date: Apr 9, 2026
Size: 27.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for handscribe-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8057f3bd62779735b82358108be6bd6fbe9e3fcafe63990cc364c72a003e2190`
MD5	`4ae84d55988d9e9044ab69352e175507`
BLAKE2b-256	`39b76c54576c72c801b302658c771eabc0d10de9163e9a656221ebecb622234c`

Provenance

The following attestation bundles were made for handscribe-0.1.0-py3-none-any.whl:

Publisher: publish.yml on ronaldgosso/handscribe

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: handscribe-0.1.0-py3-none-any.whl
- Subject digest: 8057f3bd62779735b82358108be6bd6fbe9e3fcafe63990cc364c72a003e2190
- Sigstore transparency entry: 1262040375
- Sigstore integration time: Apr 9, 2026
Source repository:
- Permalink: ronaldgosso/handscribe@bbb231b1ca579283578f088c20715ea1b9291f23
- Branch / Tag: refs/heads/main
- Owner: https://github.com/ronaldgosso
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@bbb231b1ca579283578f088c20715ea1b9291f23
- Trigger Event: workflow_dispatch
