
OCR Accuracy Reporter


============
Overview
============

Your OCR pipeline may have several stages and may use several tools.
You need a simple way to run samples through it, either as a whole or stage by stage, and to state the resulting OCR accuracy as a number, for example 98%.

=========
Usage
=========

Install the package from the command line::

    pip install ocraccuracyreporter

Then import the reporter in Python:

>>> from ocraccuracyreporter.oar import oar

.. topic:: Initialising the reporter

   Pass the expected text, the text given by the OCR pipeline, and an optional label:

   >>> oreport = oar(expected='john', given='joh', label='name')
   >>> print(oreport)
   name,john,joh,86,100,86,86,94,1

You may also have several OCR results for the same item. In that case, initialise the reporter with the expected text alone, with or without a label, and set the given text afterwards:

>>> oreport = oar(expected='john', label='name')
>>> oreport.given = 'joh'

If you are creating a CSV report with header information, use ``repr``:

>>> print(repr(oreport))
label,expected,given,ratio,partial_ratio,token_sort_ratio,token_set_ratio,jaro_winkler,distance
name,john,joh,86,100,86,86,94,1
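
When several items go into one report, the header is usually wanted only once, at the top. The following is a minimal sketch of that idea, assuming each row is the comma-separated line shown above. ``build_csv_report`` is a hypothetical helper and not part of ocraccuracyreporter, and it assumes the expected and given texts contain no commas.

```python
# Column names copied from the header line shown in the usage example.
HEADER = ("label,expected,given,ratio,partial_ratio,"
          "token_sort_ratio,token_set_ratio,jaro_winkler,distance")


def build_csv_report(rows):
    """Join one header line and any number of report rows into CSV text.

    Hypothetical helper: `rows` is a list of comma-separated strings,
    one per OCR item, in the same column order as HEADER.
    """
    return "\n".join([HEADER, *rows]) + "\n"


report = build_csv_report(["name,john,joh,86,100,86,86,94,1"])
print(report, end="")
```

The resulting text can be written to a file and opened in any spreadsheet tool.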

.. topic:: Items in the report

   ratio - similarity based on pure Levenshtein distance matching
   (100 means a perfect match)

   partial_ratio - matches on the best-matching substrings

   token_sort_ratio - tokenizes the strings and sorts the tokens alphabetically before comparing

   token_set_ratio - tokenizes the strings and compares the intersection of the token sets

   jaro_winkler - an algorithm that gives more weight to a common prefix
   (for example, when the start of the text is read correctly but later parts are missing)

   distance - the number of characters in given that actually differ from expected
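
To make the scores more concrete, here is a rough sketch of three of them using only the Python standard library. This is an illustration, not the package's implementation: ``difflib.SequenceMatcher`` stands in for the Levenshtein-based ratio, so scores can differ from the real report on some inputs, and all function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity_ratio(expected, given):
    """Similarity as an integer from 0 to 100 (100 means a perfect match)."""
    return round(100 * SequenceMatcher(None, expected, given).ratio())


def token_sort_ratio(expected, given):
    """Sort whitespace-separated tokens alphabetically, then compare."""
    def normalise(text):
        return " ".join(sorted(text.lower().split()))
    return similarity_ratio(normalise(expected), normalise(given))


def edit_distance(expected, given):
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    previous = list(range(len(given) + 1))
    for i, e_char in enumerate(expected, start=1):
        current = [i]
        for j, g_char in enumerate(given, start=1):
            cost = 0 if e_char == g_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


print(similarity_ratio("john", "joh"))               # 86
print(edit_distance("john", "joh"))                  # 1
print(token_sort_ratio("smith john", "john smith"))  # 100
```

Word order does not affect the token-sorted score, which is why the last line reports a perfect match even though the plain ratio for the same pair is lower.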




===============
Class variables
===============

label - a meaningful name for the OCR string

expected - the expected result

given - the result you got out of the OCR pipeline

total_expected_char_count - calculated character count of the expected text

total_expected_word_count - calculated word count of the expected text

total_given_char_count - calculated character count of the given text

total_given_word_count - calculated word count of the given text
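
As a rough illustration of what the calculated counts could look like, the sketch below derives a character count and a word count from a string. ``TextCounts`` is a hypothetical class, not part of ocraccuracyreporter, and the counting rules (whitespace counts as characters, words are whitespace-separated) are assumptions that may differ from the package.

```python
from dataclasses import dataclass


@dataclass
class TextCounts:
    """Hypothetical helper holding one text and its derived counts."""
    text: str

    @property
    def char_count(self):
        # Assumption: every character counts, including whitespace.
        return len(self.text)

    @property
    def word_count(self):
        # Assumption: words are whitespace-separated tokens.
        return len(self.text.split())


expected = TextCounts("john smith")
given = TextCounts("joh smith")
print(expected.char_count, expected.word_count)  # 10 2
print(given.char_count, given.word_count)        # 9 2
```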
