Collection of utilities for Tesseract OCR training
Python utilities for Tesseract OCR training
This module is a collection of training utilities for Tesseract OCR. The utilities are also available as console scripts, so they can be run from the command line.
Utilities
All utilities list their command line switches when run with the switch --help.
- rewrap: rewraps text lines to a specified maximum line length
- create_dictdata: creates all word lists and n-gram lists from a text file; these are then translated to DAWGs and added to the traineddata file
- language_metrics: creates random texts from a supplied word list and tests them for recognition error rates
- collect_ambiguities: extracts error-correction pairs from reference-hypothesis pairs and stores them in a JSON file
- json2unicharambigs: stores specified error-correction pairs from a JSON file in a unicharambigs file
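The idea behind collect_ambiguities can be illustrated with the standard library alone. The following is a minimal sketch, not the package's own implementation: the function names collect_pairs and pairs_to_json and the JSON layout are hypothetical, and the alignment uses difflib instead of whatever method the utility actually applies.

```python
import json
from collections import Counter
from difflib import SequenceMatcher


def collect_pairs(reference: str, hypothesis: str) -> Counter:
    """Count (error, correction) substring pairs.

    'error' is what the OCR produced (hypothesis), 'correction' is the
    corresponding piece of the ground truth (reference).
    Hypothetical helper, for illustration only.
    """
    pairs = Counter()
    matcher = SequenceMatcher(None, hypothesis, reference, autojunk=False)
    for tag, h1, h2, r1, r2 in matcher.get_opcodes():
        # Only substitutions are treated as ambiguities here;
        # pure insertions and deletions are ignored in this sketch.
        if tag == "replace":
            pairs[(hypothesis[h1:h2], reference[r1:r2])] += 1
    return pairs


def pairs_to_json(pairs: Counter) -> str:
    """Serialise the pairs; this JSON layout is an assumption."""
    records = [
        {"error": error, "correction": correction, "count": count}
        for (error, correction), count in pairs.most_common()
    ]
    return json.dumps(records, ensure_ascii=False)


if __name__ == "__main__":
    found = collect_pairs("modern corner", "modem comer")
    print(pairs_to_json(found))
```

In this example the OCR output "modem comer" confuses "rn" with "m" twice, which is the kind of pair a unicharambigs file is meant to describe.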
Requirements
This module requires the following modules to work:
- pytesseract (Running Tesseract OCR)
- editdistance (Calculation of error rates)
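To make the role of the edit distance concrete, here is a minimal sketch of character and word error rates. It assumes the common definition (edit distance divided by reference length), which may differ from what pytesstrain computes. The Levenshtein function is written out in pure Python so the example runs without extra dependencies; editdistance.eval(a, b) returns the same distance. The function names are hypothetical.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Classic dynamic-programming edit distance over two sequences."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (x != y),  # substitution or match
            ))
        previous = current
    return previous[-1]


def error_rate(reference: Sequence, hypothesis: Sequence) -> float:
    """Edit distance normalised by the reference length."""
    if len(reference) == 0:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Error rate over characters."""
    return error_rate(reference, hypothesis)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Error rate over whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


if __name__ == "__main__":
    print(char_error_rate("modern", "modem"))
    print(word_error_rate("the modern corner", "the modem corner"))
```

Note that with this definition the rate can exceed 1.0 when the hypothesis is much longer than the reference.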
Packages
The module is split into several packages. The package pytesstrain.train contains the workhorse function run_text(). The package pytesstrain.cli contains the utilities you can run at the command line. The package pytesstrain.ambigs contains functions for working with the unicharambigs file. The package pytesstrain.text2image contains the interface to the text2image command from Tesseract OCR; the interface relies on the pytesseract module and is modelled after it. The package pytesstrain.metrics contains error rate calculations as well as the interface class Metrics. The package pytesstrain.utils contains auxiliary functions.
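pytesseract works by calling the tesseract executable as a subprocess, and an interface to text2image can follow the same pattern. The sketch below shows that pattern under stated assumptions: the function names and parameters are hypothetical and are not the API of pytesstrain.text2image; only the text2image switches --text, --outputbase, --font and --fonts_dir are taken from the Tesseract training tools.

```python
import shutil
import subprocess
from typing import List, Optional


def build_text2image_command(
    text_file: str,
    output_base: str,
    font: str,
    fonts_dir: Optional[str] = None,
    executable: str = "text2image",
) -> List[str]:
    """Assemble the argument list for a text2image call (hypothetical helper)."""
    command = [
        executable,
        "--text=" + text_file,          # input text to render
        "--outputbase=" + output_base,  # basename for the generated files
        "--font=" + font,               # font name used for rendering
    ]
    if fonts_dir is not None:
        command.append("--fonts_dir=" + fonts_dir)
    return command


def run_text2image(text_file: str, output_base: str, font: str,
                   fonts_dir: Optional[str] = None) -> None:
    """Run text2image and raise if it is missing or exits with an error."""
    command = build_text2image_command(text_file, output_base, font, fonts_dir)
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(command[0] + " is not installed or not on PATH")
    subprocess.run(command, check=True)
```

Separating command construction from execution keeps the first function easy to check without a Tesseract installation.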
File details
Details for the file pytesstrain-0.1.1.tar.gz.
File metadata
- Download URL: pytesstrain-0.1.1.tar.gz
- Upload date:
- Size: 10.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.5.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6f3e0433f4b86dc6663f02b809490b7082043879bdd94a3eec0a8075020313ea
MD5 | f7f4b2cf8adc0b113f35e6cfa426b20c
BLAKE2b-256 | 2e4cfa3dcb279aa82946e56ccf9acfc3557e39c2aa6090fb0949fa1a362b9a7f
File details
Details for the file pytesstrain-0.1.1-py3-none-any.whl.
File metadata
- Download URL: pytesstrain-0.1.1-py3-none-any.whl
- Upload date:
- Size: 20.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.5.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6a16e4d1167462dee6137faf536fd4ed6fc7aeef2e4e534471eaa0f9d4c63b2e
MD5 | 7873d5b782ba1e82cf2a14f6061849dd
BLAKE2b-256 | 9e20e180ae421551259c05a42ddfaf825a06b184ce63610bfa423b88cb232779