
Synthetic Text Dataset Generator for OCR Training


🍞 TextBaker

Python 3.9+ · License: MIT · Code style: ruff

⚠️ Disclaimer: This project was developed with the assistance of AI tools (GitHub Copilot). While efforts have been made to ensure quality, it may still contain errors or bugs. Use at your own discretion and feel free to report any issues.


TextBaker generates synthetic text images by combining character datasets with backgrounds and applying transformations. It is intended for training OCR models and for data augmentation.


🖼️ Example Outputs

Example images (not reproduced here): Basic, Transformed (rotated), Colored, Background, Texture, Full Pipeline.

📖 See Examples Documentation for code samples.

✨ Features

  • 🎨 GUI Application - Interactive interface for real-time text generation
  • ✏️ Custom Character Drawing - Draw and save custom characters directly in the app
  • ✂️ Character Segmentation - Extract characters from images using polygon selection
  • 🖥️ CLI Tool - Batch processing from command line
  • 📚 Python Library - Programmatic API for integration
  • 🔄 Transformations - Rotation, perspective, scale, shear
  • 🎭 Textures & Backgrounds - Apply overlays and composite on images
  • 🔧 YAML/JSON Configs - Save and load configurations
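To make the Transformations feature concrete, here is a minimal sketch of how rotation, scale, and shear can be expressed and combined as 3x3 homogeneous matrices. It is an illustration only and not TextBaker's implementation: the function names (`rotation`, `scale`, `shear`, `matmul`, `transform_point`) are hypothetical, and perspective is left out because it also requires a division by the third coordinate.

```python
import math


def rotation(degrees):
    """3x3 matrix rotating counter-clockwise about the origin (y axis pointing up)."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def scale(sx, sy):
    """3x3 matrix scaling x by sx and y by sy."""
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]


def shear(kx):
    """3x3 matrix for a horizontal shear: x' = x + kx * y."""
    return [[1.0, kx, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Matrix product a @ b; the transform in b is applied first, then a."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def transform_point(m, point):
    """Apply an affine 3x3 matrix to an (x, y) point."""
    x, y = point
    return (
        m[0][0] * x + m[0][1] * y + m[0][2],
        m[1][0] * x + m[1][1] * y + m[1][2],
    )


if __name__ == "__main__":
    # Scale first, then rotate: order matters when transforms are chained.
    combined = matmul(rotation(10), scale(1.2, 1.0))
    print(transform_point(combined, (1.0, 0.0)))
```

In image coordinates the y axis usually points down, so the same rotation matrix appears clockwise on screen.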

📦 Installation

pip install textbaker

Or from source:

git clone https://github.com/q-viper/text-baker.git
cd text-baker
pip install -e .

🚀 Quick Start

GUI

textbaker

CLI

# Generate specific texts
textbaker generate "Hello" "World" -d ./dataset -o ./output

# Generate random samples with transforms
textbaker generate -n 100 --seed 42 -r "-15,15" -b ./backgrounds
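The second command combines a seed with a range argument. The sketch below shows one way such options can yield reproducible random transforms. It assumes that `-r "-15,15"` is read as a min,max rotation range in degrees, which this README does not spell out, and the helpers `parse_range` and `sample_angles` are hypothetical names, not part of TextBaker.

```python
import random


def parse_range(spec):
    """Parse a "min,max" string such as "-15,15" into a (low, high) float pair."""
    parts = spec.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected 'min,max', got {spec!r}")
    low, high = (float(p) for p in parts)
    if low > high:
        raise ValueError(f"min must not exceed max in {spec!r}")
    return low, high


def sample_angles(spec, count, seed):
    """Draw `count` angles uniformly from the range; the same seed gives the same list."""
    low, high = parse_range(spec)
    rng = random.Random(seed)  # private generator, leaves the global random state alone
    return [rng.uniform(low, high) for _ in range(count)]


if __name__ == "__main__":
    # Two runs with the same seed produce identical angles.
    print(sample_angles("-15,15", 3, seed=42) == sample_angles("-15,15", 3, seed=42))
```

Using a dedicated `random.Random` instance keeps the sampling reproducible even if other code also draws random numbers.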

Python

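from textbaker import TextGenerator, GeneratorConfig

# Create a generator (no arguments, so defaults are used)
generator = TextGenerator()
# Generate a synthetic image for the text "Hello"
result = generator.generate("Hello")
# Save the generated result
generator.save(result)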

📖 See full documentation for detailed usage.

📁 Dataset Structure

dataset/
├── A/
│   ├── sample1.png
│   └── sample2.png
├── B/
│   └── ...
└── 0/
    └── ...
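The layout above uses one folder per character, each holding sample images. As a rough sketch, a dataset in this shape can be indexed with the standard library before generating text, for example to check that every character of a target string has samples. The helpers `index_dataset` and `missing_characters` are hypothetical and not part of TextBaker, and the sketch assumes samples are `.png` files as in the tree shown.

```python
from pathlib import Path


def index_dataset(root):
    """Map each character label (folder name) to its sorted sample image paths."""
    index = {}
    for char_dir in sorted(Path(root).iterdir()):
        if not char_dir.is_dir():
            continue  # ignore stray files next to the character folders
        samples = sorted(p for p in char_dir.iterdir() if p.suffix.lower() == ".png")
        if samples:  # folders without any samples are left out of the index
            index[char_dir.name] = samples
    return index


def missing_characters(text, index):
    """Return the characters of `text` that have no samples, in first-seen order."""
    missing = []
    for ch in text:
        if ch not in index and ch not in missing:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    root = Path("dataset")
    if root.is_dir():
        index = index_dataset(root)
        print(sorted(index))
        print(missing_characters("Hello", index))
```

Folder names are compared literally, so `A` and `a` count as different characters in this sketch.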

🧪 Development

git clone https://github.com/q-viper/text-baker.git
cd text-baker
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run linting
ruff check . && ruff format .

📄 License

MIT License - see LICENSE for details.

🤝 Contributing

Contributions are welcome! Please fork the repository, create a feature branch, and submit a pull request.

👤 Author

Ramkrishna Acharya (@q-viper)


Built with PySide6, OpenCV, and Typer
