An implementation library of Tiling and Corruption (TACo) Augmentations for OCR/HTR!
Tiling and Corruption (TACo)
TACo is a simple and effective data augmentation technique for Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) (see Reference).
taco-box is an implementation of the TACo algorithm. It is currently released under the Apache 2.0 license, so please feel free to use it in your project. Enjoy!
Installing
First, you need to have Python 3 installed on your system.
Next, you can install taco-box with pip or your favorite PyPI package manager.
pip install taco-box
Usage
Check out this Jupyter notebook for usage - Notebook
Here is an example:
from tacobox import Taco

# Create a Taco object. (Note: the parameters shown are the default values.)
mytaco = Taco(cp_vertical=0.25,
              cp_horizontal=0.25,
              max_tw_vertical=100,
              min_tw_vertical=20,
              max_tw_horizontal=50,
              min_tw_horizontal=10
              )

# Apply random vertical corruption.
# input_img is the image to augment; load it before this call.
augmented_img = mytaco.apply_vertical_taco(input_img, corruption_type='random')
mytaco.visualize(augmented_img)
-------Understanding Arguments--------
:cp_vertical: corruption probability of vertical tiles
:cp_horizontal: corruption probability for horizontal tiles
:max_tw_vertical: maximum possible tile width for vertical tiles in pixels
:min_tw_vertical: minimum tile width for vertical tiles in pixels
:max_tw_horizontal: maximum possible tile width for horizontal tiles in pixels
:min_tw_horizontal: minimum tile width for horizontal tiles in pixels
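To make the arguments above more concrete, here is a minimal pure-Python sketch of how vertical tiling and corruption could work. It is not taken from the taco-box source: the function name `vertical_tile_corruption`, the `fill` and `rng` parameters, and the choice to corrupt a tile by overwriting it with a constant value are all assumptions made for illustration. The image is assumed to be a grayscale image stored as a list of pixel rows.

```python
import random


def vertical_tile_corruption(img, cp=0.25, min_tw=20, max_tw=100, fill=255, rng=None):
    """Hypothetical sketch: split an image into vertical tiles and corrupt some.

    img is a list of rows, each row a list of pixel values (an assumption).
    Each tile gets a random width in [min_tw, max_tw] and is overwritten
    with `fill` with probability cp.
    """
    if min_tw < 1 or max_tw < min_tw:
        raise ValueError("need 1 <= min_tw <= max_tw")
    if rng is None:
        rng = random.Random()
    width = len(img[0]) if img else 0
    out = [row[:] for row in img]  # copy so the input is left untouched
    x = 0
    while x < width:
        # Draw this tile's width, clipped so the last tile ends at the image edge.
        tw = min(rng.randint(min_tw, max_tw), width - x)
        # Corrupt the whole tile with probability cp.
        if rng.random() < cp:
            for row in out:
                row[x:x + tw] = [fill] * tw
        x += tw
    return out


# Example: a blank 4 x 200 image, corrupted with the default settings.
blank = [[0] * 200 for _ in range(4)]
corrupted = vertical_tile_corruption(blank, rng=random.Random(0))
```

A horizontal variant would follow the same pattern, tiling along the rows instead of the columns.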
Expected results
The picture below shows the variations of the TACo augmentation algorithm produced by the current implementation:
Contributing
This project is in a very early stage of development. If you find a bug or have a feature request, feel free to open an issue. PRs are always welcome.
Reference
The TACo algorithm is part of a research project on Handwritten Text Recognition. A link to the original paper will be posted soon.
Source Distribution
File details
Details for the file taco-box-0.0.1.tar.gz.
File metadata
- Download URL: taco-box-0.0.1.tar.gz
- Upload date:
- Size: 3.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.6.0 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e8e5d3f576ae4be7d8c61d5caa66bdaedef2e78c158f8d537598c8eb01a1f1dd |
| MD5 | 87db7c2311465974761c1752f5b5fe70 |
| BLAKE2b-256 | ba6794ec37bc6920da9a9ee4d64c30c9c3ee81cedf586c91be89c7acec58447c |