Python-tesseract is a Python wrapper for Google's Tesseract-OCR

Project description

Python-tesseract is an optical character recognition (OCR) tool for Python.
That is, it recognizes and "reads" the text embedded in images.

Python-tesseract is a wrapper for Google's Tesseract-OCR
( http://code.google.com/p/tesseract-ocr/ ). It is also useful as a
stand-alone invocation script for tesseract, because it can read all image types
supported by the Python Imaging Library, including jpeg, png, gif, bmp, tiff,
and others, whereas tesseract-ocr by default supports only tiff and bmp.
Additionally, when used as a script, Python-tesseract prints the recognized
text instead of writing it to a file. Support for confidence estimates and
bounding box data is planned for future releases.
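
To make the idea concrete, here is a minimal sketch of how such a wrapper could work. It is not Python-tesseract's actual source. It converts the image to BMP with PIL, calls the command line as `tesseract <image> <outputbase> [-l <lang>]`, and reads back the `.txt` file that tesseract writes. The helper names `build_tesseract_command` and `image_file_to_text` are hypothetical, and the sketch assumes that PIL (or Pillow) and the `tesseract` binary are installed.

```python
import io
import os
import shutil
import subprocess
import tempfile


def build_tesseract_command(input_path, output_base, lang=None,
                            tesseract_cmd="tesseract"):
    """Return the argument list for: tesseract <input> <output_base> [-l <lang>]."""
    command = [tesseract_cmd, input_path, output_base]
    if lang is not None:
        command += ["-l", lang]
    return command


def image_file_to_text(image_path, lang=None):
    """Hypothetical helper: convert a PIL-readable image to BMP, then run tesseract."""
    from PIL import Image  # imported lazily; requires PIL or Pillow

    work_dir = tempfile.mkdtemp()
    bmp_path = os.path.join(work_dir, "input.bmp")
    output_base = os.path.join(work_dir, "output")
    try:
        # Convert to RGB first so palette or alpha images save cleanly as BMP.
        Image.open(image_path).convert("RGB").save(bmp_path)
        subprocess.check_call(build_tesseract_command(bmp_path, output_base, lang))
        # tesseract appends ".txt" to the output base name.
        with io.open(output_base + ".txt", encoding="utf-8") as result:
            return result.read().strip()
    finally:
        # Remove the temporary BMP and text files.
        shutil.rmtree(work_dir, ignore_errors=True)
```

Keeping the command construction in its own function makes it easy to inspect what would be run without invoking tesseract.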


USAGE:
```
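from PIL import Image  # with classic PIL, a bare "import Image" also works
import pytesseract

# Recognize text using the default language
print(pytesseract.image_to_string(Image.open('test.png')))
# Recognize French text using the 'fra' language data
print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))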
```
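
Building on the calls above, a small helper could run OCR over several files and collect the results. This is a sketch only: `ocr_images` and its `recognize` hook are hypothetical names introduced for illustration, and the default path assumes that PIL and pytesseract are importable.

```python
def ocr_images(paths, lang=None, recognize=None):
    """Return a {path: text} dict with the recognized text for each image path.

    `recognize` is a hypothetical hook taking (path, lang) and returning text.
    By default it opens the file with PIL and calls pytesseract.image_to_string.
    """
    if recognize is None:
        from PIL import Image  # imported lazily; requires PIL or Pillow
        import pytesseract

        def recognize(path, lang):
            image = Image.open(path)
            if lang is None:
                return pytesseract.image_to_string(image)
            return pytesseract.image_to_string(image, lang=lang)

    return {path: recognize(path, lang) for path in paths}
```

For example, `ocr_images(['test-european.jpg'], lang='fra')` would return a dictionary with one entry for that file. Passing a custom `recognize` function makes the helper usable without tesseract installed, for instance in tests.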

INSTALLATION:
* Python-tesseract requires Python 2.5 or later.
* You will need the Python Imaging Library (PIL). Under Debian/Ubuntu, this is
the package "python-imaging".
* Install Google's tesseract-ocr from http://code.google.com/p/tesseract-ocr/ .
You must be able to invoke the tesseract command as "tesseract". If this
isn't the case, for example because tesseract isn't in your PATH, you will
have to change the "tesseract_cmd" variable at the top of 'tesseract.py'.
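
One way to check the last requirement is to ask Python whether a `tesseract` executable is on your PATH. The sketch below uses the standard library's `shutil.which`, so it assumes Python 3.3 or later rather than the Python 2.5 baseline listed above. The function name `find_tesseract` is hypothetical.

```python
import shutil


def find_tesseract(command="tesseract"):
    """Return the full path of `command` if it is on PATH, otherwise None."""
    return shutil.which(command)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        # Not on PATH: tesseract_cmd would need the full path to the binary.
        print('"tesseract" was not found on PATH')
    else:
        print("tesseract found at " + path)
```

If the lookup returns `None`, the full path to your tesseract binary is the value to use for the "tesseract_cmd" variable.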


LICENSE:
Python-tesseract is released under the GPL v3.

CONTRIBUTORS:
- Originally written by [Samuel Hoffstaetter](https://github.com/hoffstaetter)
- [Juarez Bochi](https://github.com/jbochi)
- [Matthias Lee](https://github.com/madmaze)

Download files

Source Distribution

pytesseract-0.1.3.tar.gz (2.8 kB)

Uploaded Source

Built Distribution

pytesseract-0.1.3-py2.7.egg (151.8 kB)

Uploaded Source

File details

Details for the file pytesseract-0.1.3.tar.gz.

File hashes

Hashes for pytesseract-0.1.3.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 5bdc9890b9a1fe2302c70a93ce26a9c6efc900c4057f39b65ba14ac863039b8a |
| MD5 | 4c56b70e8d0ddabdb1237d39ed3e300b |
| BLAKE2b-256 | 8714e044c1e0259cd43f28a92c82fc23b15091d72c4c50a9b2411a132fee754f |

File details

Details for the file pytesseract-0.1.3-py2.7.egg.

File hashes

Hashes for pytesseract-0.1.3-py2.7.egg

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 3d8c1484d215573466f95aa4c311e9b0c120183be2d9c842708a6d73889f7f8f |
| MD5 | 712eef2669f3e19abda39a7639faa5a9 |
| BLAKE2b-256 | 87424332790f20c68464ecb8c9db00cef33eff259a094da94ef4480a79b4fc6e |
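
The SHA256 digests listed for each file can be checked against a downloaded copy. The following is a minimal sketch using Python's standard `hashlib` module. The function names `sha256_of_file` and `matches_expected` are hypothetical, and the file is assumed to be in the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's SHA256 digest with a published value, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_expected("pytesseract-0.1.3.tar.gz", "5bdc9890b9a1fe2302c70a93ce26a9c6efc900c4057f39b65ba14ac863039b8a")` should return `True` for an intact copy of the source distribution.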
