A Python wrapper for OCR engines (Tesseract, Cuneiform, etc)
Project description
Pyocr is an optical character recognition (OCR) tool wrapper for python.
That is, it helps using OCR tools from a Python program.
It has been tested only on GNU/Linux systems. It should also work on similar
systems (*BSD, etc). It doesn't work on Windows, MacOSX, etc.
Pyocr can be used as a wrapper for google's Tesseract-OCR
( http://code.google.com/p/tesseract-ocr/ ) or Cuneiform. It can read all image
types supported by the Python Imaging Library, including jpeg, png, gif, bmp, tiff,
and others. It also support bounding box data.
USAGE:
import Image
import sys
from pyocr import pyocr
tools = pyocr.get_available_tools()[:]
if len(tools) == 0:
print("No OCR tool found")
sys.exit(1)
print("Using '%s'" % (tools[0].get_name()))
tools[0].image_to_string(Image.open('test.png'), lang='fra',
builder=TextBuilder())
DEPENDENCIES:
* Pyocr requires python 2.5 or later.
* You will need the Python Imaging Library (PIL). Under Debian/Ubuntu, this is
the package "python-imaging".
* Install an OCR:
* tesseract-ocr from http://code.google.com/p/tesseract-ocr/ .
You must be able to invoke the tesseract command as "tesseract".
Python-tesseract is tested with Tesseract >= 3.01 only.
* or cuneiform
INSTALLATION:
$ sudo python ./setup.py install
TESTS:
Tests are made to be run with the latest versions of Tesseract and Cuneiform.
the first test verifies that you're using the expected version.
COPYRIGHT:
Pyocr is released under the GPL v3+.
tesseract.py:
Copyright (c) Samuel Hoffstaetter, 2009
Copyright (c) Jerome Flesch, 2011-2012
other files:
Copyright (c) Jerome Flesch, 2011-2012
http://wiki.github.com/jflesch/python-tesseract
That is, it helps using OCR tools from a Python program.
It has been tested only on GNU/Linux systems. It should also work on similar
systems (*BSD, etc). It doesn't work on Windows, MacOSX, etc.
Pyocr can be used as a wrapper for google's Tesseract-OCR
( http://code.google.com/p/tesseract-ocr/ ) or Cuneiform. It can read all image
types supported by the Python Imaging Library, including jpeg, png, gif, bmp, tiff,
and others. It also support bounding box data.
USAGE:
import Image
import sys
from pyocr import pyocr
tools = pyocr.get_available_tools()[:]
if len(tools) == 0:
print("No OCR tool found")
sys.exit(1)
print("Using '%s'" % (tools[0].get_name()))
tools[0].image_to_string(Image.open('test.png'), lang='fra',
builder=TextBuilder())
DEPENDENCIES:
* Pyocr requires python 2.5 or later.
* You will need the Python Imaging Library (PIL). Under Debian/Ubuntu, this is
the package "python-imaging".
* Install an OCR:
* tesseract-ocr from http://code.google.com/p/tesseract-ocr/ .
You must be able to invoke the tesseract command as "tesseract".
Python-tesseract is tested with Tesseract >= 3.01 only.
* or cuneiform
INSTALLATION:
$ sudo python ./setup.py install
TESTS:
Tests are made to be run with the latest versions of Tesseract and Cuneiform.
the first test verifies that you're using the expected version.
COPYRIGHT:
Pyocr is released under the GPL v3+.
tesseract.py:
Copyright (c) Samuel Hoffstaetter, 2009
Copyright (c) Jerome Flesch, 2011-2012
other files:
Copyright (c) Jerome Flesch, 2011-2012
http://wiki.github.com/jflesch/python-tesseract
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
pyocr-0.2.0.tar.gz
(9.4 kB
view hashes)