
OCR Accuracy Reporter



Your OCR pipeline may have several stages and may use several tools.
You need a simple way to run samples through it, as a whole or piece by piece, and to state the resulting OCR accuracy as a number, for example 98%.


$ pip install ocraccuracyreporter

>>> from ocraccuracyreporter.oar import oar

.. topic:: initialising the reporter

>>> oreport = oar(expected='john', given='joh', label='name')

>>> print(oreport)
name,john,joh,86,100,86,86,94,1

Or you may have several OCR results for the same item, in which case you can initialise the reporter
with only the expected value, with or without a label, and set given afterwards.

>>> oreport = oar(expected='john', label='name')
>>> oreport.given = 'joh'
>>> repr(oreport)

Use repr(oreport) if you are creating a CSV report with header info.
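
A CSV report with a header row could be assembled along the following lines. This is a minimal sketch using only the standard library. It assumes each printed report line has the nine comma-separated fields seen in the sample output above. The column names in HEADER are taken from the report items and class variables listed in this description, and build_csv is a hypothetical helper, not part of the package.

```python
import csv
import io

# Assumed column order, inferred from the sample line
# "name,john,joh,86,100,86,86,94,1"; the package may name them differently.
HEADER = [
    "label", "expected", "given",
    "ratio", "partial_ratio", "token_sort_ratio", "token_set_ratio",
    "jaro_winkler", "distance",
]


def build_csv(report_lines):
    """Return CSV text: one header row followed by the given report lines."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    for line in report_lines:
        # Re-parse each line with csv.reader so quoted fields are kept intact.
        writer.writerow(next(csv.reader([line])))
    return buffer.getvalue()


print(build_csv(["name,john,joh,86,100,86,86,94,1"]))
```

In practice the list passed to build_csv would hold str(oreport) for each item you checked.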

.. topic:: Items in the report

ratio - pure Levenshtein-distance-based matching
(100 means a perfect match)

partial_ratio - matches based on the best-matching substrings

token_sort_ratio - tokenizes the strings and sorts the tokens alphabetically before comparing

token_set_ratio - tokenizes the strings and compares the intersection of the token sets

jaro_winkler - gives more weight to a common prefix
(for example, when the start of the string is correct and other parts are missing)

distance - the number of characters in given that actually differ
from expected
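
To make two of these items concrete, here is a minimal standard-library sketch. It is not the package's implementation. levenshtein_distance is a textbook dynamic-programming edit distance, and similarity_ratio substitutes difflib.SequenceMatcher for the package's ratio, so the two may differ on some inputs. Both function names are hypothetical. For 'john' versus 'joh' they give 1 and 86, which agree with the sample line earlier in this description.

```python
from difflib import SequenceMatcher


def levenshtein_distance(expected: str, given: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    # previous[j] holds the distance between the processed prefix of
    # `expected` and the first j characters of `given`.
    previous = list(range(len(given) + 1))
    for i, e_char in enumerate(expected, start=1):
        current = [i]
        for j, g_char in enumerate(given, start=1):
            cost = 0 if e_char == g_char else 1
            current.append(min(
                previous[j] + 1,         # delete e_char
                current[j - 1] + 1,      # insert g_char
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def similarity_ratio(expected: str, given: str) -> int:
    """Similarity on a 0-100 scale; a stand-in for the report's ratio item."""
    return round(100 * SequenceMatcher(None, expected, given).ratio())


print(levenshtein_distance("john", "joh"), similarity_ratio("john", "joh"))
```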

Class variables

label - a meaningful name for the OCR string
expected - the expected result
given - the result you got out of the OCR pipeline

total_expected_char_count - calculated expected char count
total_expected_word_count - calculated expected word count

total_given_char_count - calculated given char count
total_given_word_count - calculated given word count
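
The count variables above can be pictured with a short sketch. It assumes that characters are counted with len and that words are separated by whitespace. The package may count differently (for example, by ignoring spaces), and char_and_word_counts is a hypothetical helper.

```python
from typing import Tuple


def char_and_word_counts(text: str) -> Tuple[int, int]:
    """Return (character count, whitespace-separated word count)."""
    # Assumption: every character counts, including spaces.
    return len(text), len(text.split())


expected_chars, expected_words = char_and_word_counts("john smith")
given_chars, given_words = char_and_word_counts("joh smith")
print(expected_chars, expected_words, given_chars, given_words)
```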

