A library implementing Tiling and Corruption (TACo) augmentations for OCR/HTR.

Tiling and Corruption (TACo)

TACo is a simple and effective data augmentation technique for Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) (see the Reference section).

taco-box is an implementation of the TACo algorithm. It is currently released under the Apache 2.0 license. Please feel free to use it in your project. Enjoy!

Installing

First, you need to have Python 3 installed on your system.

Next, you can install taco-box with pip or your favorite PyPI package manager.

pip install taco-box

Usage

Check out this Jupyter notebook on usage: Notebook

Here is an example:

from tacobox import Taco

# Create a Taco object. (Note: the parameters shown are the default values.)
mytaco = Taco(
    cp_vertical=0.25,
    cp_horizontal=0.25,
    max_tw_vertical=100,
    min_tw_vertical=20,
    max_tw_horizontal=50,
    min_tw_horizontal=10,
)

# input_img is the image to augment; load it yourself before this step.
# Apply random vertical corruption.
augmented_img = mytaco.apply_vertical_taco(input_img, corruption_type='random')
mytaco.visualize(augmented_img)
    -------Understanding Arguments--------
    :cp_vertical:        corruption probability for vertical tiles
    :cp_horizontal:      corruption probability for horizontal tiles
    :max_tw_vertical:    maximum possible tile width for vertical tiles in pixels
    :min_tw_vertical:    minimum tile width for vertical tiles in pixels
    :max_tw_horizontal:  maximum possible tile width for horizontal tiles in pixels
    :min_tw_horizontal:  minimum tile width for horizontal tiles in pixels
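
To make the arguments above more concrete, here is a minimal sketch of how a tile width range and a corruption probability could interact. It is not taco-box's actual code: the helper name plan_vertical_tiles is hypothetical, and the sketch assumes that one tile width is drawn per image and that each tile is corrupted independently with probability cp. The library may do this differently.

```python
import random


def plan_vertical_tiles(img_width, min_tw, max_tw, cp, seed=None):
    """Split the column range [0, img_width) into tiles and mark some for corruption.

    Hypothetical helper for illustration only; not part of taco-box.
    Assumption: a single tile width is drawn per image, and each tile
    is corrupted independently with probability cp.
    """
    if img_width <= 0:
        raise ValueError("img_width must be positive")
    rng = random.Random(seed)
    # Draw one tile width in pixels (inclusive range), clamped to the image width.
    tile_width = min(rng.randint(min_tw, max_tw), img_width)
    tiles = []
    for start in range(0, img_width, tile_width):
        end = min(start + tile_width, img_width)  # the last tile may be narrower
        corrupted = rng.random() < cp
        tiles.append((start, end, corrupted))
    return tiles


# Plan tiles for a 400-pixel-wide image using the default vertical settings.
tiles = plan_vertical_tiles(img_width=400, min_tw=20, max_tw=100, cp=0.25, seed=0)
for start, end, corrupted in tiles:
    print(f"columns {start:3d}-{end - 1:3d}: {'corrupt' if corrupted else 'keep'}")
```

With an image stored as a 2D array, a tile marked for corruption would correspond to the column slice from start to end, which an augmentation could then overwrite (for example with noise or a constant value). A higher cp marks more tiles, and a wider range between min_tw and max_tw gives more variation in tile size across images.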

Expected results

The picture below shows variations of the TACo augmentation produced by the current implementation:

Example Output

Contributing

This project is in the very early stages of development. If you find a bug or have a feature request, feel free to open an issue. PRs are always welcome.

Reference

The TACo algorithm is part of a research project on Handwritten Text Recognition. A link to the original paper will be posted soon.
