Synthetic Text Dataset Generator for OCR Training
🍞 TextBaker
⚠️ Disclaimer: This project was developed with the assistance of AI tools (GitHub Copilot). While efforts have been made to ensure quality, it may still contain errors or bugs. Use at your own discretion and feel free to report any issues.
TextBaker generates synthetic text images by combining character datasets with backgrounds and applying transformations. It is intended for training OCR models and for data augmentation.
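To make the idea of "combining character datasets with backgrounds" concrete, here is a minimal sketch in plain Python. It is not TextBaker's implementation: the function name `compose_line`, the glyph representation (2D lists of ink opacity between 0.0 and 1.0), and the grayscale background (2D list of values 0 to 255) are all assumptions chosen for illustration.

```python
def compose_line(glyphs, background, x0=0, y0=0):
    """Paste glyphs left to right onto a copy of the background.

    Hypothetical helper, not part of TextBaker.
    glyphs: list of non-empty 2D lists of ink opacity (0.0 = none, 1.0 = solid black).
    background: 2D list of grayscale values (0 = black, 255 = white).
    """
    canvas = [row[:] for row in background]  # leave the input untouched
    height, width = len(canvas), len(canvas[0])
    x = x0
    for glyph in glyphs:
        for dy, row in enumerate(glyph):
            for dx, alpha in enumerate(row):
                cy, cx = y0 + dy, x + dx
                # Skip pixels that fall outside the background
                if 0 <= cy < height and 0 <= cx < width:
                    # Blend black ink over the existing background pixel
                    canvas[cy][cx] = round(canvas[cy][cx] * (1.0 - alpha))
        x += len(glyph[0])  # advance by the glyph width
    return canvas


background = [[200] * 4 for _ in range(2)]
solid_then_empty = [[1.0], [0.0]]
half_ink = [[0.5], [0.5]]
image = compose_line([solid_then_empty, half_ink], background)
```

A real pipeline would work on image arrays and apply transformations to each glyph before blending; the sketch only shows the placement and blending step.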
🖼️ Example Outputs
Example outputs (images not shown): Basic, Transformed, Colored, Background, Texture, Full Pipeline.
📖 See Examples Documentation for code samples.
✨ Features
- 🎨 GUI Application - Interactive interface for real-time text generation
- ✏️ Custom Character Drawing - Draw and save custom characters directly in the app
- ✂️ Character Segmentation - Extract characters from images using polygon selection
- 🖥️ CLI Tool - Batch processing from command line
- 📚 Python Library - Programmatic API for integration
- 🔄 Transformations - Rotation, perspective, scale, shear
- 🎭 Textures & Backgrounds - Apply overlays and composite on images
- 🔧 YAML/JSON Configs - Save and load configurations
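The transformations listed above (rotation, perspective, scale, shear) are commonly expressed as 3x3 matrices acting on homogeneous coordinates. The following is a minimal sketch of that general technique using only the standard library; it does not show how TextBaker implements its transforms, and every function name here is hypothetical.

```python
import math


def rotation(degrees):
    """3x3 matrix rotating points about the origin."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def scale(sx, sy):
    """3x3 matrix scaling x by sx and y by sy."""
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]


def shear(kx):
    """3x3 matrix shifting x in proportion to y."""
    return [[1.0, kx, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def compose(*matrices):
    """Multiply matrices left to right; the rightmost is applied to a point first."""
    result = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for m in matrices:
        result = [
            [sum(result[i][k] * m[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)
        ]
    return result


def apply(matrix, x, y):
    """Transform one point; dividing by w also covers perspective matrices."""
    px = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    py = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    w = matrix[2][0] * x + matrix[2][1] * y + matrix[2][2]
    return px / w, py / w


# Rotate by 90 degrees first, then double the size
combined = compose(scale(2.0, 2.0), rotation(90))
point = apply(combined, 1.0, 0.0)
```

A perspective transform fits the same scheme by using a bottom row other than `[0, 0, 1]`, which is why `apply` divides by `w`. In image coordinates, where y grows downward, a positive angle appears as a clockwise rotation.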
📦 Installation
pip install textbaker
Or from source:
git clone https://github.com/q-viper/text-baker.git
cd text-baker
pip install -e .
🚀 Quick Start
GUI
textbaker
CLI
# Generate specific texts
textbaker generate "Hello" "World" -d ./dataset -o ./output
# Generate random samples with transforms
textbaker generate -n 100 --seed 42 -r "-15,15" -b ./backgrounds
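The second command combines a sample count, a seed, and a range given as `"-15,15"`. As a rough illustration of why a seed makes such runs repeatable, the sketch below parses a `min,max` string and draws seeded random values from it. It assumes that `-r` denotes a rotation range in degrees, which the command alone does not confirm, and the helper names `parse_range` and `sample_angles` are hypothetical rather than part of TextBaker.

```python
import random


def parse_range(spec):
    """Parse a 'min,max' string such as '-15,15' into two floats."""
    low_text, high_text = spec.split(",")
    low, high = float(low_text), float(high_text)
    if low > high:
        raise ValueError(f"range minimum {low} exceeds maximum {high}")
    return low, high


def sample_angles(spec, count, seed):
    """Draw `count` values uniformly from the range, reproducibly for a given seed."""
    low, high = parse_range(spec)
    rng = random.Random(seed)  # private generator, so global random state is untouched
    return [rng.uniform(low, high) for _ in range(count)]


angles = sample_angles("-15,15", count=100, seed=42)
```

Calling `sample_angles` again with the same arguments returns the same list, which is the property a fixed seed is meant to provide for dataset generation.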
Python
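from textbaker import TextGenerator
# Create a generator without passing any configuration
generator = TextGenerator()
# Generate a synthetic image for the text "Hello"
result = generator.generate("Hello")
# Save the generated result
generator.save(result)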
📖 See full documentation for detailed usage.
📁 Dataset Structure
dataset/
├── A/
│   ├── sample1.png
│   └── sample2.png
├── B/
│   └── ...
└── 0/
    └── ...
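The layout above uses one folder per character, with sample images inside. A small script can check such a folder before generation. This is a sketch under assumptions: the helper names `index_dataset` and `missing_characters` are hypothetical and not part of TextBaker, and the accepted file extensions are a guess, since only `.png` appears in the layout.

```python
from pathlib import Path

# Assumption: only .png is shown in the layout; the other suffixes are guesses
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def index_dataset(root):
    """Map each character label (folder name) to its sorted sample image paths."""
    index = {}
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue  # ignore stray files next to the character folders
        samples = sorted(
            p for p in folder.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
        if samples:  # folders without images are treated as missing
            index[folder.name] = samples
    return index


def missing_characters(text, index):
    """Return the characters of `text` that have no samples in the index."""
    return sorted({ch for ch in text if ch not in index})
```

For example, `missing_characters("Hello", index_dataset("./dataset"))` lists the characters that would need samples before "Hello" could be rendered from the dataset.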
📖 Documentation
- Installation & Quick Start
- Examples & Code Samples
- Configuration Reference
- CLI Reference
- API Reference
🧪 Development
git clone https://github.com/q-viper/text-baker.git
cd text-baker
pip install -e ".[dev]"
# Run tests
pytest tests/ -v
# Run linting and formatting
ruff check . && ruff format .
📄 License
MIT License - see LICENSE for details.
🤝 Contributing
Contributions welcome! Please fork, create a feature branch, and submit a PR.
👤 Author
Ramkrishna Acharya (@q-viper)
File details
Details for the file textbaker-0.1.6.tar.gz.
File metadata
- Download URL: textbaker-0.1.6.tar.gz
- Upload date:
- Size: 57.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0bd560d067e35dd02fa1ccfc35b71d653bddc013070fa14d49403b99db7cff49 |
| MD5 | 4cadcb64ba20317e8bcae980ac4a0f72 |
| BLAKE2b-256 | 45678a66334d3acb6320f4fb64dc7290dc05be49cc9b273483c950bc16626618 |
File details
Details for the file textbaker-0.1.6-py3-none-any.whl.
File metadata
- Download URL: textbaker-0.1.6-py3-none-any.whl
- Upload date:
- Size: 56.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 75b1814cfaea5d5407d351738698815726e4cc2df68a33d196f4d65f70a9cf0e |
| MD5 | dc297699caf7379d986c47310c6bc8b7 |
| BLAKE2b-256 | 95c0b87b83fe3f130c06997c60d1746237c6877abf16e9b917023f74c05a0eb1 |