
OCR Accuracy Reporter

============
Overview
============

Your OCR pipeline may consist of several stages and use several tools.
You need a simple way to run samples through it, as a whole or piece by piece, and to state the resulting accuracy as a single figure, for example 98%.

=========
Usage
=========

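Install the package from PyPI (run this in a shell, not in the Python prompt)::

    pip install ocraccuracyreporter

Then import the reporter class:

>>> from ocraccuracyreporter.oar import oar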

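.. topic:: Initialising the reporter

   Pass the expected text, the OCR output, and an optional label, then print the reporter to get one comma-separated line of scores:

   >>> oreport = oar(expected='john', given='joh', label='name')
   >>> print(oreport)
   name,john,joh,86,100,86,86,94,1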

You may also have several OCR results for the same item. In that case, initialise the reporter with the expected value alone, with or without a label, and set ``given`` afterwards:

>>> oreport = oar(expected='john', label='name')
>>> oreport.given = 'joh'

If you are creating a CSV report and need the header info as well, use ``repr``:

>>> repr(oreport)

The result contains the header line followed by the values::

    label,expected,given,ratio,partial_ratio,token_sort_ratio,token_set_ratio,jaro_winkler,distance
    name,john,joh,86,100,86,86,94,1
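Once several such lines have been collected under one header, the report can be reduced to the single accuracy figure mentioned in the overview. The following is a minimal sketch that uses only the standard library. The helper name ``mean_score`` and the second data row (``city``) are hypothetical and exist only for illustration, and the sketch assumes that no field contains a comma.

```python
import csv
import io
from statistics import mean


def mean_score(report_text, column="ratio"):
    """Average one numeric column of a header-plus-rows CSV report."""
    rows = csv.DictReader(io.StringIO(report_text))
    return mean(int(row[column]) for row in rows)


# Header and first row as shown above; the second row is made up for the demo
# (identical strings score 100 on every ratio and have distance 0).
report = (
    "label,expected,given,ratio,partial_ratio,token_sort_ratio,"
    "token_set_ratio,jaro_winkler,distance\n"
    "name,john,joh,86,100,86,86,94,1\n"
    "city,paris,paris,100,100,100,100,100,0\n"
)

print(mean_score(report))              # average of the ratio column
print(mean_score(report, "distance"))  # average number of differing characters
```

Averaging ``ratio`` is only one possible definition of overall accuracy; any other column in the header can be passed instead.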

.. topic:: Items in the report

   ``ratio`` - pure Levenshtein-distance-based matching (100 means a perfect match)

   ``partial_ratio`` - matches on the best-matching substrings

   ``token_sort_ratio`` - tokenizes the strings and sorts the tokens alphabetically before comparing

   ``token_set_ratio`` - tokenizes the strings and compares the intersection of the tokens

   ``jaro_winkler`` - gives more weight to a common prefix (useful, for example, when some parts are read correctly and others are missing)

   ``distance`` - the number of characters in ``given`` that actually differ from ``expected``
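To make two of these scores concrete, here is a small standard-library sketch. It is not the package's own code: ``similarity_ratio`` approximates ``ratio`` with ``difflib.SequenceMatcher``, which can differ from a Levenshtein-based ratio on some inputs, and ``edit_distance`` is a textbook Levenshtein implementation assumed to correspond to ``distance``. Both function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity_ratio(expected, given):
    """Return a 0-100 similarity score; 100 means the strings are identical."""
    return round(100 * SequenceMatcher(None, expected, given).ratio())


def edit_distance(expected, given):
    """Levenshtein distance: fewest single-character insertions,
    deletions, or substitutions that turn `expected` into `given`."""
    # previous[j] is the distance between the processed prefix of
    # `expected` and the first j characters of `given`.
    previous = list(range(len(given) + 1))
    for i, e_char in enumerate(expected, start=1):
        current = [i]
        for j, g_char in enumerate(given, start=1):
            cost = 0 if e_char == g_char else 1
            current.append(min(
                previous[j] + 1,         # drop a character of `expected`
                current[j - 1] + 1,      # add a character of `given`
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


print(similarity_ratio("john", "joh"))
print(edit_distance("john", "joh"))
```

For the ``john`` / ``joh`` pair used earlier, the sketch yields 86 and 1, matching the ``ratio`` and ``distance`` columns of the sample report line.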




===============
Class variables
===============

``label`` - a meaningful name for the OCR string

``expected`` - the expected result

``given`` - the result you got out of the OCR pipeline

``total_expected_char_count`` - calculated character count of ``expected``

``total_expected_word_count`` - calculated word count of ``expected``

``total_given_char_count`` - calculated character count of ``given``

``total_given_word_count`` - calculated word count of ``given``
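How the package derives these counts is not documented here. As a rough illustration only, the sketch below assumes that every character (whitespace included) counts towards the character total and that words are whitespace-separated tokens. The function name ``count_chars_and_words`` is hypothetical.

```python
def count_chars_and_words(text):
    """Return (character count, word count) for a string.

    Assumption: all characters count, including spaces, and words
    are whatever str.split() yields on whitespace.
    """
    return len(text), len(text.split())


expected_counts = count_chars_and_words("john smith")
given_counts = count_chars_and_words("joh smith")
print(expected_counts, given_counts)
```

Comparing the two tuples gives a quick signal of dropped characters or merged words before looking at the similarity scores.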
