keras-ocr
This is a slightly polished and packaged version of the Keras CRNN implementation and the published CRAFT text detection model. It provides a high-level API for training a text detection and OCR pipeline.
Please see the documentation for more examples, including for training a custom model.
Getting Started
Installation
# To install from master
pip install git+https://github.com/faustomorales/keras-ocr.git#egg=keras-ocr
# To install from PyPI
pip install keras-ocr
Using
The package ships with easy-to-use implementations of the CRAFT text detection model and the CRNN recognition model, both adapted from their original repositories.
import keras_ocr
# keras-ocr will automatically download pretrained
# weights for the detector and recognizer.
pipeline = keras_ocr.pipeline.Pipeline()
# Predictions is a list of (string, box) tuples.
predictions = pipeline.recognize(image='tests/test_image.jpg')
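If you want to inspect where text was detected, the (word, box) predictions can be drawn on the image with matplotlib. The snippet below is a minimal sketch, assuming that predictions comes from the call above and that each box is an array of four (x, y) corner points; the exact box format may differ between releases, so adjust accordingly.
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
# Load the same image that was passed to the pipeline.
image = plt.imread('tests/test_image.jpg')
fig, ax = plt.subplots()
ax.imshow(image)
for word, box in predictions:
    # Outline the predicted quadrilateral and label it with the recognized text.
    ax.add_patch(Polygon(box, closed=True, fill=False, edgecolor='red'))
    ax.annotate(word, xy=tuple(box[0]), color='red')
ax.axis('off')
plt.show()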
Comparing keras-ocr and other OCR approaches
You may be wondering how the models in this package compare to existing cloud OCR APIs. We provide some metrics below, along with the notebook used to compute them, based on the first 1,000 images in the COCO-Text validation set. We limited the comparison to 1,000 images because, at the time of writing, the Google Cloud free tier allows 1,000 calls per month. As always, caveats apply:
- No guarantees apply to these numbers; please compute your own metrics independently to verify them. As of this writing, they should be considered a very rough first draft. In particular, the cloud APIs have a variety of options that can be used to improve performance, and their responses can be parsed in different ways, so it is possible that I made an error in configuration or parsing. If you find a mistake, please open an issue.
- We ignore punctuation and letter case because the out-of-the-box recognizer in keras-ocr (which comes from an independent repository) does not support either. Note that both AWS Rekognition and Google Cloud Vision support punctuation as well as uppercase and lowercase characters.
- We ignore non-English text.
- We ignore illegible text.
| model | latency | precision | recall |
|---|---|---|---|
| AWS | 719 ms | 0.45 | 0.48 |
| GCP | 388 ms | 0.53 | 0.58 |
| keras-ocr (scale=2) | 417 ms | 0.53 | 0.54 |
| keras-ocr (scale=3) | 699 ms | 0.50 | 0.59 |
- Precision and recall were computed based on an intersection over union of 50% or higher and a text similarity to the ground truth of 50% or higher (a simplified sketch of this matching criterion follows this list).
- keras-ocr latency values were computed using a Tesla P4 GPU on Google Colab. scale refers to the argument provided to keras_ocr.pipeline.Pipeline(), which determines the upscaling applied to the image prior to inference.
- Latency for the cloud providers was measured with sequential requests, so you can obtain significant speed improvements by making multiple simultaneous API requests.
- Each of the entries provides a link to the JSON file containing the annotations made on each pass. You can use these with the notebook to compute the metrics without having to make the API calls yourself (though you are encouraged to replicate the results independently).
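To make the matching criterion above concrete, here is a minimal sketch of how a predicted (word, box) pair could be compared to a ground-truth annotation. This is an illustration under stated assumptions rather than the code from the notebook: the intersection over union is approximated with axis-aligned bounding rectangles, text similarity is measured with Python's difflib, and the is_match helper and its thresholds are hypothetical.
import difflib
import numpy as np

def iou(box_a, box_b):
    """Approximate IoU using the axis-aligned bounding rectangles of two
    quadrilaterals given as 4x2 arrays of (x, y) points."""
    ax1, ay1 = np.min(box_a, axis=0)
    ax2, ay2 = np.max(box_a, axis=0)
    bx1, by1 = np.min(box_b, axis=0)
    bx2, by2 = np.max(box_b, axis=0)
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0

def text_similarity(a, b):
    """Similarity ratio between two strings, ignoring case."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_match(prediction, ground_truth, iou_threshold=0.5, text_threshold=0.5):
    """Return True if a predicted (word, box) pair matches an annotation."""
    (pred_word, pred_box), (true_word, true_box) = prediction, ground_truth
    return (iou(pred_box, true_box) >= iou_threshold
            and text_similarity(pred_word, true_word) >= text_threshold)
In practice, predictions would also need to be matched to annotations one-to-one (for example, greedily by IoU) before counting true positives for precision and recall.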
Why not compare to Tesseract? In every configuration I tried, Tesseract did very poorly on this test. Tesseract performs best on scans of books, not on incidental scene text like that in this dataset.