Tesserocr bindings
Segment regions and lines, and recognize text, with tesserocr.
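The following minimal sketch shows that workflow with the tesserocr API. It is not the ocrd_tesserocr processor code. It assumes tesserocr and Pillow are installed and works on a whole page image. It also assigns each line to the region that contains the line's center, a simple heuristic chosen only for illustration. The helper names (`to_corners`, `center_inside`, `segment_and_recognize`) are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels
Line = Tuple[Box, str]           # line box and its recognized text
Region = Tuple[Box, List[Line]]  # region box and the lines assigned to it


def to_corners(box: Dict[str, int]) -> Box:
    """Convert a tesserocr box dict with keys x, y, w, h into corner coordinates."""
    return (box["x"], box["y"], box["x"] + box["w"], box["y"] + box["h"])


def center_inside(inner: Box, outer: Box) -> bool:
    """True if the center of `inner` lies within `outer` (a simple heuristic)."""
    cx = (inner[0] + inner[2]) / 2
    cy = (inner[1] + inner[3]) / 2
    return outer[0] <= cx <= outer[2] and outer[1] <= cy <= outer[3]


def segment_and_recognize(image_path: str, lang: str = "eng") -> List[Region]:
    """Segment regions and lines on one page image, then recognize each line."""
    # Imported here so the geometry helpers above work without tesserocr installed.
    from PIL import Image
    from tesserocr import RIL, PyTessBaseAPI

    with PyTessBaseAPI(lang=lang) as api:
        api.SetImage(Image.open(image_path))
        # GetComponentImages returns 4-tuples whose second element is the box dict;
        # passing True restricts the results to text components.
        regions = [to_corners(b) for _, b, _, _ in api.GetComponentImages(RIL.BLOCK, True)]
        lines = [to_corners(b) for _, b, _, _ in api.GetComponentImages(RIL.TEXTLINE, True)]
        page: List[Region] = [(region, []) for region in regions]
        for line in lines:
            x0, y0, x1, y1 = line
            # Restrict recognition to this line's bounding box.
            api.SetRectangle(x0, y0, x1 - x0, y1 - y0)
            text = api.GetUTF8Text().strip()
            # Lines whose center lies in no region are dropped in this sketch.
            for region, members in page:
                if center_inside(line, region):
                    members.append((line, text))
                    break
    return page
```

For example, `segment_and_recognize("page.png", lang="deu")` would return the regions of a hypothetical `page.png` together with their recognized lines. Recognizing each line through `SetRectangle` mirrors the per-component usage shown in the tesserocr documentation.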
Installation
Required Ubuntu packages:
Tesseract headers (libtesseract-dev)
Some Tesseract language models (tesseract-ocr-{eng,deu,frk,...}) or script models (tesseract-ocr-script-{latn,frak,...})
Leptonica headers (libleptonica-dev)
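The brace notation above stands for one package per language or script code. As a hypothetical illustration, since this helper is not part of the project, the following Python snippet expands chosen codes into an apt-get command line using the package names listed above:

```python
import shlex
from typing import Iterable, List

# Header packages named in the list above.
HEADER_PACKAGES = ["libtesseract-dev", "libleptonica-dev"]


def apt_packages(languages: Iterable[str] = (), scripts: Iterable[str] = ()) -> List[str]:
    """Expand language and script codes into Ubuntu package names."""
    packages = list(HEADER_PACKAGES)
    packages += [f"tesseract-ocr-{code}" for code in languages]
    packages += [f"tesseract-ocr-script-{code}" for code in scripts]
    return packages


def apt_install_command(languages: Iterable[str] = (), scripts: Iterable[str] = ()) -> str:
    """Build a shell-safe apt-get install command for the given codes."""
    names = apt_packages(languages, scripts)
    return "sudo apt-get install " + " ".join(shlex.quote(name) for name in names)


if __name__ == "__main__":
    print(apt_install_command(["eng", "deu", "frk"], ["latn", "frak"]))
```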
Then install the Python requirements and the package itself::

    pip install -r requirements.txt
    pip install .
If tesserocr fails to compile with an error::
    $PREFIX/include/tesseract/unicharset.h:241:10: error: ‘string’ does not name a type; did you mean ‘stdin’?
          static string CleanupString(const char* utf8_str) {
                 ^~~~~~
                 stdin
This is caused by inconsistencies in the installed Tesseract C++ headers (a fix is expected with the next Ubuntu upgrade; Debian is already fixed). As a workaround, replace the bare type name string with std::string in $PREFIX/include/tesseract/unicharset.h:265:5 and $PREFIX/include/tesseract/unichar.h:164:10, and in subsequent occurrences.
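The manual replacement can also be scripted. The Python sketch below is a hypothetical helper, not part of ocrd_tesserocr. It rewrites every bare string token to std::string, skips preprocessor lines and // comments, and keeps a .orig backup of each file. It is a plain text substitution, so it would also touch /* */ comments or a string literal containing the word, and the result should be reviewed with diff before recompiling. For the Ubuntu packages $PREFIX is assumed to be /usr, and writing there requires root.

```python
import re
import sys
from pathlib import Path

# A bare `string` token: not part of a longer identifier and not already
# qualified (std::string or ::string).
BARE_STRING = re.compile(r"(?<![\w:])string(?!\w)")


def qualify_string(line: str) -> str:
    """Replace bare `string` with `std::string` in the code part of one line."""
    if line.lstrip().startswith("#"):
        return line  # leave preprocessor lines such as #include <string> alone
    code, sep, comment = line.partition("//")
    return BARE_STRING.sub("std::string", code) + sep + comment


def patch_header(path: Path) -> int:
    """Patch a header in place, keeping a .orig backup; return the changed line count."""
    original = path.read_text()
    lines = original.splitlines(keepends=True)
    patched = [qualify_string(line) for line in lines]
    changed = sum(old != new for old, new in zip(lines, patched))
    if changed:
        path.with_name(path.name + ".orig").write_text(original)
        path.write_text("".join(patched))
    return changed


if __name__ == "__main__":
    # e.g. sudo python3 fix_headers.py /usr/include/tesseract/unicharset.h /usr/include/tesseract/unichar.h
    for name in sys.argv[1:]:
        print(f"{name}: {patch_header(Path(name))} line(s) changed")
```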
If tesserocr fails with an error about LSTM/CUBE, the versions of the Tesseract headers, data and pkg-config files do not match. apt policy libtesseract-dev lists the versions installable via apt; keep all of these components at the same version. Make sure there are no stray pkg-config files left over from other installations, e.g. /usr/local/lib/pkgconfig/tesseract.pc. The same applies to language models.
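To look for leftover pkg-config files, a small Python sketch like the following may help. The function names and the directory list are assumptions for a typical 64-bit Ubuntu system; pkg-config --variable pc_path pkg-config prints the search path actually in use. It reads the Version field of every tesseract.pc it finds, so a stray file such as /usr/local/lib/pkgconfig/tesseract.pc shows up next to the packaged one.

```python
import os
from pathlib import Path
from typing import Dict, Iterable, Optional

# Typical pkg-config directories on 64-bit Ubuntu (an assumption; compare with
# the output of `pkg-config --variable pc_path pkg-config` on your system).
DEFAULT_PC_DIRS = [
    "/usr/local/lib/pkgconfig",
    "/usr/local/share/pkgconfig",
    "/usr/lib/x86_64-linux-gnu/pkgconfig",
    "/usr/lib/pkgconfig",
    "/usr/share/pkgconfig",
]


def pc_version(text: str) -> Optional[str]:
    """Return the value of the 'Version:' field of a .pc file, or None."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Version":
            return value.strip()
    return None


def find_tesseract_pc(dirs: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each tesseract.pc found in `dirs` to the version it declares."""
    found: Dict[str, Optional[str]] = {}
    for directory in dirs:
        pc_file = Path(directory) / "tesseract.pc"
        if pc_file.is_file():
            found[str(pc_file)] = pc_version(pc_file.read_text())
    return found


if __name__ == "__main__":
    # Directories from PKG_CONFIG_PATH are searched before the defaults.
    extra = [d for d in os.environ.get("PKG_CONFIG_PATH", "").split(os.pathsep) if d]
    found = find_tesseract_pc(extra + DEFAULT_PC_DIRS)
    for path, version in found.items():
        print(f"{path}: Version {version}")
    if len(found) > 1:
        print("Several tesseract.pc files found; remove the ones that do not "
              "belong to the installed libtesseract-dev.")
```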
Built Distributions
Hashes for ocrd_tesserocr-0.1.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 1648df71d28a9b3388f1e701256037eb9023f149a17a22d0a9c2dec4a0510002
MD5 | bbc586d5a04c44b640d7782a84e2de83
BLAKE2b-256 | 3408ea3ebc9476e1d28672e23b8d1332dbbc95ac9a3246cd7d02be2375995da6
Hashes for ocrd_tesserocr-0.1.3-py2-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 1549fbf8d314dc1f5ea20b45842e971a97b3c276f78d4d167a463432d5b77b18
MD5 | 0f69aed68ca01cf1018b35d91227d74a
BLAKE2b-256 | 187ffd08ca819e6f3980220ac680b5c931080247544c2704963e518db6f7a3d0
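To check a downloaded wheel against the digests listed for it, Python's hashlib can compute all three algorithms. This is a minimal sketch. The helper names are hypothetical, and the usage block assumes the py3 wheel was saved in the current directory.

```python
import hashlib
from pathlib import Path
from typing import Dict


def digests(data: bytes) -> Dict[str, str]:
    """Compute the SHA256, MD5 and BLAKE2b-256 digests as lowercase hex strings."""
    return {
        "SHA256": hashlib.sha256(data).hexdigest(),
        "MD5": hashlib.md5(data).hexdigest(),
        # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest.
        "BLAKE2b-256": hashlib.blake2b(data, digest_size=32).hexdigest(),
    }


def verify(path: str, expected: Dict[str, str]) -> bool:
    """Return True if every expected digest matches the file at `path`."""
    actual = digests(Path(path).read_bytes())
    return all(actual[name] == value.lower() for name, value in expected.items())


if __name__ == "__main__":
    # Assumes the py3 wheel was downloaded into the current directory.
    ok = verify("ocrd_tesserocr-0.1.3-py3-none-any.whl", {
        "SHA256": "1648df71d28a9b3388f1e701256037eb9023f149a17a22d0a9c2dec4a0510002",
    })
    print("OK" if ok else "MISMATCH")
```

For a quick check of the SHA256 digest alone, pip hash prints it for a local file.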