OCR Accuracy Reporter

============
Overview
============
An OCR pipeline may have several stages and use several tools.
You need a simple way to run samples, as a whole or piece by piece, and to state the resulting OCR accuracy, for example 98%.
=========
Usage
=========
Install the package with pip::

    $ pip install ocraccuracyreporter

Then import the reporter class:

>>> from ocraccuracyreporter.oar import oar

.. topic:: Initialising the reporter

    >>> oreport = oar(expected='john', given='joh', label='name')
    >>> print(oreport)
    name,john,joh,86,100,86,86,94,1

    If you have several OCR results for the same item, you can initialise
    the reporter with ``expected`` alone, with or without a label, and set
    ``given`` afterwards:

    >>> oreport = oar(expected='john', label='name')
    >>> oreport.given = 'joh'
    >>> repr(oreport)

    If you are creating a CSV report, use the following header row; each
    report line follows the same column order::

        label,expected,given,ratio,partial_ratio,token_sort_ratio,token_set_ratio,jaro_winkler,distance
        name,john,joh,86,100,86,86,94,1

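The CSV report described above can be assembled with a few lines of Python. This is a minimal sketch, not part of the package: ``build_report`` is a hypothetical helper, and it assumes that ``str()`` of a reporter yields one comma-separated row, as the ``print(oreport)`` output suggests. Plain strings stand in for reporter objects so the sketch runs without the package installed.

```python
# Header row taken from the usage section of this document.
HEADER = ("label,expected,given,ratio,partial_ratio,"
          "token_sort_ratio,token_set_ratio,jaro_winkler,distance")


def build_report(rows):
    """Join the header and pre-formatted rows into one CSV string.

    Hypothetical helper: each item in ``rows`` is converted with str(),
    so reporter objects or plain strings both work.
    """
    lines = [HEADER]
    lines.extend(str(row) for row in rows)
    return "\n".join(lines) + "\n"


# Stand-in for str(oreport); in real use pass the reporter objects.
sample_rows = ["name,john,joh,86,100,86,86,94,1"]
report = build_report(sample_rows)
print(report, end="")
```

Note that this sketch does no quoting, so an expected or given value that itself contains a comma would need the standard ``csv`` module instead.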
.. topic:: Items in the report

    - ``ratio`` - pure Levenshtein-distance-based matching
      (100 means a perfect match)
    - ``partial_ratio`` - matches based on the best-matching substrings
    - ``token_sort_ratio`` - tokenizes the strings and sorts the tokens alphabetically before comparing
    - ``token_set_ratio`` - tokenizes the strings and compares the intersection
    - ``jaro_winkler`` - gives more weight to a common prefix
      (for example, some parts are good, others are missing)
    - ``distance`` - the number of characters in ``given`` that differ from ``expected``

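To make the ``distance`` and ``ratio`` columns more concrete, here is a small standard-library sketch. It is an illustration under stated assumptions, not the package's implementation: ``levenshtein`` is a textbook dynamic-programming edit distance, and ``simple_ratio`` is a stand-in built on ``difflib.SequenceMatcher``, which may differ from the matcher the package uses. Both function names are hypothetical.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def simple_ratio(a: str, b: str) -> int:
    """Return a 0-100 similarity score (difflib-based stand-in)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


print(levenshtein("john", "joh"))   # 1
print(simple_ratio("john", "joh"))  # 86
```

For the ``john``/``joh`` pair these values agree with the ``ratio`` and ``distance`` columns of the sample row; other inputs may score differently from the package.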
===============
Class variables
===============
- ``label`` - a meaningful name for the OCR string
- ``expected`` - the expected result
- ``given`` - the result you got out of the OCR pipeline
- ``total_expected_char_count`` - calculated character count of ``expected``
- ``total_expected_word_count`` - calculated word count of ``expected``
- ``total_given_char_count`` - calculated character count of ``given``
- ``total_given_word_count`` - calculated word count of ``given``
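The four calculated counts can be pictured with the sketch below. How the package actually counts is not stated here, so this assumes that characters include whitespace and that words are whitespace-separated tokens; ``count_chars`` and ``count_words`` are hypothetical helpers, and the sample strings are illustrative only.

```python
def count_chars(text: str) -> int:
    """Count every character, whitespace included (an assumption)."""
    return len(text)


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumption)."""
    return len(text.split())


expected = "john smith"
given = "joh smith"
print(count_chars(expected), count_words(expected))  # 10 2
print(count_chars(given), count_words(given))        # 9 2
```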