
Tesserocr bindings


Segment regions and lines, and recognize text, with tesserocr


Installation

Required Ubuntu packages:

  • Tesseract headers (libtesseract-dev)

  • Some Tesseract language models (tesseract-ocr-{eng,deu,frk,...}) or script models (tesseract-ocr-script-{latn,frak,...})

  • Leptonica headers (libleptonica-dev)
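On Ubuntu, the package list above could be turned into a single install command roughly as follows. This is only a sketch: the choice of the eng and deu models is an example, and the snippet prints the command for review instead of running it, since the actual installation needs root privileges.

```shell
# Header packages named in the list above.
PKGS="libtesseract-dev libleptonica-dev"

# Language models to add; "eng" and "deu" are example choices, adjust as needed.
for lang in eng deu; do
    PKGS="$PKGS tesseract-ocr-$lang"
done

# Print the command for review; run it manually once the list looks right.
INSTALL_CMD="sudo apt-get install $PKGS"
echo "$INSTALL_CMD"
```

Script models can be added the same way by appending names of the form tesseract-ocr-script-latn to PKGS.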

pip install -r requirements.txt
pip install .

If tesserocr fails to compile with an error like:

$PREFIX/include/tesseract/unicharset.h:241:10: error: ‘string’ does not name a type; did you mean ‘stdin’?
       static string CleanupString(const char* utf8_str) {
              ^~~~~~
              stdin

This is caused by inconsistencies in the installed Tesseract headers (a fix is expected with the next Ubuntu upgrade and is already available in Debian). As a workaround, replace string with std::string in $PREFIX/include/tesseract/unicharset.h (line 265, column 5) and in $PREFIX/include/tesseract/unichar.h (line 164, column 10, and the following occurrences).
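The replacement can be rehearsed on a sample line before any header is edited. The sketch below is an assumption about how one might automate it, not the project's own procedure: the helper name qualify_string and the sed pattern are illustrative, and the pattern deliberately skips any occurrence that directly follows a letter, digit, underscore, colon, angle bracket or double quote.

```shell
# Rewrite unqualified `string` to `std::string` on text read from stdin.
# Illustrative pattern (an assumption): occurrences preceded by a letter,
# digit, `_`, `:`, `<` or `"` are skipped, so `std::string`, `substring`
# and `#include <string>` stay untouched.
qualify_string() {
    sed -E 's/(^|[^[:alnum:]_:<"])string([^[:alnum:]_]|$)/\1std::string\2/g'
}

# Try it on the line from the compiler error before touching any header.
FIXED=$(printf '%s\n' 'static string CleanupString(const char* utf8_str) {' | qualify_string)
echo "$FIXED"
```

To apply it to a real header, it is safer to pipe a backup copy through the function into a new file and inspect the result with diff before replacing the original.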

If tesserocr fails with an error about LSTM/CUBE, the versions of the Tesseract headers, data files and pkg-config metadata do not match. Run apt policy libtesseract-dev to list the apt-installable versions, and keep all components on the same version. Make sure there are no stray pkg-config artifacts, e.g. /usr/local/lib/pkgconfig/tesseract.pc. The same applies to the language models.
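The checks described above could be gathered into a short diagnostic, sketched here under the assumption that apt, pkg-config and the tesseract binary are the relevant sources of version information on the system. Each step falls back to a message if the tool is missing.

```shell
# Apt-installable versions of the headers (the command named above).
apt policy libtesseract-dev 2>/dev/null || echo "apt: no information available"

# Version that pkg-config reports for tesseract.
pkg-config --modversion tesseract 2>/dev/null || echo "pkg-config: tesseract not found"

# Version of the tesseract binary on PATH (first line of its version output).
tesseract --version 2>&1 | head -n 1

# A locally built copy may have left a stray .pc file behind.
STRAY=/usr/local/lib/pkgconfig/tesseract.pc
if [ -e "$STRAY" ]; then
    echo "found $STRAY: $(grep '^Version:' "$STRAY")"
else
    echo "no stray pkg-config file at $STRAY"
fi
```

If the reported versions differ, or the stray file exists alongside an apt-installed package, that mismatch is the likely cause of the error.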
