A model-agnostic Python package for evaluating HTR / OCR output.

Cthulhu

Cthulhu is a lightweight Python library for evaluating the quality of HTR (Handwritten Text Recognition) and OCR (Optical Character Recognition) transcriptions.

It compares a ground truth against a prediction and returns a set of standard metrics (WER, CER, Levenshtein distance, etc.), with optional text normalisation transforms applied before scoring.

Originally derived from kami-lib — redesigned as a standalone evaluation toolkit with no deep-learning dependencies.


Features

  • Flexible input: accepts plain strings, .txt files, ALTO XML, and PAGE-XML
  • Rich metrics: WER, CER, WACC, MER, CIP, CIL, ROUGE-1/2/L, Levenshtein, Hamming, weighted variants
  • Text transforms: normalise both sequences before scoring (remove digits, punctuation, diacritics, change case)
  • No heavy dependencies: no Kraken, no PyTorch; only three lightweight packages (one of them, python-Levenshtein, is a C extension)

Installation

pip install cthulhu-eval

Dependencies:

Package                      Role
python-Levenshtein >= 0.21   Fast C-extension for edit distance
Unidecode >= 1.3             Diacritics removal
termcolor >= 1.1             Coloured log output

Python: 3.9 or later


Quick start

from cthulhu.Cthulhu import Cthulhu

# Two plain strings
k = Cthulhu(["ground truth text", "predicted text"])
print(k.scores.board)   # full metrics dict
print(k.scores.wer)     # word error rate
print(k.scores.cer)     # character error rate
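
For readers who want to see what the two rates measure, here is a minimal pure-Python sketch of the conventional definitions: edit distance divided by the reference length, counted in words for WER and in characters for CER. The helper names (levenshtein, wer, cer) are illustrative and are not taken from Cthulhu; the library itself relies on python-Levenshtein and may tokenise or normalise differently.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of tokens)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def cer(reference, prediction):
    """Character error rate; assumes a non-empty reference."""
    return levenshtein(reference, prediction) / len(reference)


def wer(reference, prediction):
    """Word error rate on whitespace-separated tokens; assumes a non-empty reference."""
    ref_words = reference.split()
    return levenshtein(ref_words, prediction.split()) / len(ref_words)


reference = "Six semaines plus tard"
prediction = "Six semaiNEs plus tard"
print(wer(reference, prediction))  # 1 wrong word out of 4
print(cer(reference, prediction))  # 2 wrong characters out of 22
```

Because both rates divide by the reference length, a prediction with many insertions can score above 1.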

Input formats

The first argument, data, must be a list of exactly two elements, ground truth first and prediction second. Each element can be:

Type               Example
Plain string       "mon texte de référence"
Text file          "./gt.txt"
ALTO XML (v2–v4)   "./document_alto.xml"
PAGE-XML (PcGts)   "./document_page.xml"

Mix and match freely:

# XML ground truth vs plain-text prediction
k = Cthulhu(["./gt_alto.xml", "./prediction.txt"])

# Two XML files
k = Cthulhu(["./gt_page.xml", "./pred_page.xml"])

# String vs text file
k = Cthulhu(["reference string", "./pred.txt"])
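
To give an idea of what reading the two XML formats involves, the sketch below pulls line text out of ALTO and PAGE-XML with the standard library only. It is not the code in parser_xml.py: the function names (local_name, extract_text) and the choice to ignore namespaces are assumptions made for illustration. It relies on ALTO storing words in String elements with a CONTENT attribute and on PAGE-XML storing line text in TextEquiv/Unicode under each TextLine.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_text(xml_source):
    """Return one line of text per TextLine found in an ALTO or PAGE-XML string."""
    root = ET.fromstring(xml_source)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # ALTO: words are String elements carrying a CONTENT attribute.
        words = [child.get("CONTENT", "") for child in element.iter()
                 if local_name(child.tag) == "String"]
        if words:
            lines.append(" ".join(words))
            continue
        # PAGE-XML: the line text sits in TextEquiv/Unicode directly under TextLine.
        for child in element:
            if local_name(child.tag) == "TextEquiv":
                for node in child:
                    if local_name(node.tag) == "Unicode":
                        lines.append(node.text or "")
    return "\n".join(lines)


ALTO_SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="Six"/><SP/><String CONTENT="semaines"/></TextLine>
    <TextLine><String CONTENT="plus"/><SP/><String CONTENT="tard"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

PAGE_SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page><TextRegion>
    <TextLine><TextEquiv><Unicode>Six semaines</Unicode></TextEquiv></TextLine>
    <TextLine><TextEquiv><Unicode>plus tard</Unicode></TextEquiv></TextLine>
  </TextRegion></Page>
</PcGts>"""

print(extract_text(ALTO_SAMPLE))
print(extract_text(PAGE_SAMPLE))
```

Matching on local tag names keeps the same function usable across ALTO v2 to v4, whose namespaces differ, at the cost of not validating the schema.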

Metrics

All metrics are available via k.scores.board (dict) or as individual attributes on k.scores.

Metric                        Attribute            Description
Levenshtein distance (char)   lev_distance_char    Edit distance at character level
Levenshtein distance (word)   lev_distance_words   Edit distance at word level
Hamming distance              hamming              Char-level; "Ø" if lengths differ
WER                           wer                  Word Error Rate
CER                           cer                  Character Error Rate
WACC                          wacc                 Word Accuracy (1 − WER)
WER Hunt                      wer_hunt             WER with halved insertion/deletion costs
MER                           mer                  Match Error Rate
CIP                           cip                  Character Information Preserved
CIL                           cil                  Character Information Lost
ROUGE-1                       rouge_1              Unigram overlap (precision, recall, F1)
ROUGE-2                       rouge_2              Bigram overlap (precision, recall, F1)
ROUGE-L                       rouge_l              Longest Common Subsequence (precision, recall, F1)
Hits                          hits                 Matching characters
Substitutions                 substs               Character substitutions
Deletions                     deletions            Character deletions
Insertions                    insertions           Character insertions
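
The last four rows and the MER, CIP and CIL rows are closely related, which the following sketch tries to make concrete. It backtraces one optimal character alignment to count hits, substitutions, deletions and insertions, then derives the three rates using the usual definitions (MER as errors over hits plus errors, CIP as the product of the hit ratios on each side, CIL as 1 − CIP). These formulas are the character-level analogues of the standard word-level MER/WIP/WIL and are assumed here; the function names are hypothetical and Cthulhu's own implementation may differ.

```python
def edit_counts(reference, prediction):
    """Return (hits, substitutions, deletions, insertions) for one optimal alignment."""
    rows, cols = len(reference) + 1, len(prediction) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if reference[i - 1] == prediction[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # match or substitution
    hits = substs = deletions = insertions = 0
    i, j = len(reference), len(prediction)
    while i > 0 or j > 0:  # walk back from the bottom-right corner
        diagonal_ok = i > 0 and j > 0
        if diagonal_ok and reference[i - 1] == prediction[j - 1]:
            hits += 1
            i, j = i - 1, j - 1
        elif diagonal_ok and dist[i][j] == dist[i - 1][j - 1] + 1:
            substs += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            deletions += 1
            i -= 1
        else:
            insertions += 1
            j -= 1
    return hits, substs, deletions, insertions


def error_rates(reference, prediction):
    """MER, CIP and CIL from the alignment counts; assumes non-empty inputs."""
    hits, substs, deletions, insertions = edit_counts(reference, prediction)
    errors = substs + deletions + insertions
    cip = (hits / len(reference)) * (hits / len(prediction))
    return {"mer": errors / (hits + errors), "cip": cip, "cil": 1 - cip}


reference = "Six semaines plus tard"
prediction = "Six semaiNEs plu tard"
print(edit_counts(reference, prediction))  # two substitutions and one deletion
print(error_rates(reference, prediction))
```

When several alignments share the minimal cost, the split between substitutions and insertion/deletion pairs depends on the tie-breaking order of the backtrace, so counts from different tools can legitimately differ.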

Custom error weights

k = Cthulhu(
    ["reference", "prediction"],
    insertion_cost=0.5,
    deletion_cost=0.5,
    substitution_cost=1.0,
)
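
As an illustration of what per-operation costs change, here is a small weighted edit distance in plain Python. The keyword names mirror the ones in the call above, but the function itself (weighted_distance) is a hypothetical sketch, not Cthulhu's scoring code, and it says nothing about how the library normalises the result.

```python
def weighted_distance(reference, prediction,
                      insertion_cost=1.0, deletion_cost=1.0, substitution_cost=1.0):
    """Minimum total cost of turning reference into prediction."""
    previous = [j * insertion_cost for j in range(len(prediction) + 1)]
    for i, ref_item in enumerate(reference, start=1):
        current = [i * deletion_cost]
        for j, pred_item in enumerate(prediction, start=1):
            match_cost = 0.0 if ref_item == pred_item else substitution_cost
            current.append(min(
                previous[j] + deletion_cost,      # drop ref_item
                current[j - 1] + insertion_cost,  # add pred_item
                previous[j - 1] + match_cost,     # keep or substitute
            ))
        previous = current
    return previous[-1]


ref_words = "the cat sat on the mat".split()
pred_words = "the cat sat on mat today".split()

print(weighted_distance(ref_words, pred_words))
print(weighted_distance(ref_words, pred_words,
                        insertion_cost=0.5, deletion_cost=0.5))
```

With unit costs the example costs 2.0; with halved insertion and deletion costs, one deletion plus one insertion becomes cheaper than two substitutions and the total drops to 1.0. This is the same idea as the WER Hunt row in the metrics table.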

Display options

k = Cthulhu(
    ["reference", "prediction"],
    percent=True,        # express rates as percentages (e.g. 17.2 instead of 0.172)
    truncate=True,       # truncate floats
    round_digits=".001", # precision
)
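
The interaction between these three options can be pictured with the decimal module. The sketch below is only a guess at the behaviour: it assumes round_digits is a quantisation step such as ".001", that truncate means rounding toward zero, and that percent multiplies by 100 before rounding. The helper name format_score is hypothetical.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def format_score(value, percent=False, truncate=False, round_digits=".001"):
    """Scale and shorten a rate according to the display options."""
    number = Decimal(str(value))
    if percent:
        number *= 100
    rounding = ROUND_DOWN if truncate else ROUND_HALF_UP
    return float(number.quantize(Decimal(round_digits), rounding=rounding))


rate = 0.172468
print(format_score(rate))                               # rounded rate
print(format_score(rate, percent=True))                 # rounded percentage
print(format_score(rate, percent=True, truncate=True))  # truncated percentage
```

Going through Decimal(str(value)) avoids the binary floating-point noise that would otherwise appear when multiplying by 100.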

Text transforms

Apply normalisation steps before computing metrics. Each transform is scored individually and all together, letting you see the impact of each choice.

k = Cthulhu(
    ["Déjà 13 fois, Maxime !", "deja 13 fois maxime"],
    apply_transforms="XP",  # remove diacritics + punctuation
)

# k.scores.board contains:
# {
#   "default":           {...},  # raw scores
#   "remove_diacritics": {...},  # after X only
#   "remove_punctuation":{...},  # after P only
#   "all_transforms":    {...},  # after X + P combined
#   "Length_reference":  ...,
#   "Total_diacritics_removed_from_reference": ...,
#   ...
# }
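
The layout of that dictionary can be reproduced in a few lines, which may help when reading it. The sketch below scores CER only and uses stand-in transforms built on the standard library (unicodedata rather than Unidecode); build_board and the helper names are illustrative assumptions, and the real board holds many more metrics plus the length and count entries shown above.

```python
import string
import unicodedata


def levenshtein(a, b):
    """Plain edit distance with unit costs."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def remove_diacritics(text):
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def remove_punctuation(text):
    return text.translate(str.maketrans("", "", string.punctuation))


def build_board(reference, prediction, transforms):
    """Score the raw pair, each transform on its own, then all of them chained."""
    def score(ref, pred):
        return {"cer": levenshtein(ref, pred) / len(ref)}

    board = {"default": score(reference, prediction)}
    ref_all, pred_all = reference, prediction
    for name, transform in transforms.items():
        board[name] = score(transform(reference), transform(prediction))
        ref_all, pred_all = transform(ref_all), transform(pred_all)
    board["all_transforms"] = score(ref_all, pred_all)
    return board


board = build_board(
    "Déjà 13 fois, Maxime !",
    "deja 13 fois maxime",
    {"remove_diacritics": remove_diacritics,
     "remove_punctuation": remove_punctuation},
)
for name, scores in board.items():
    print(name, scores)
```

On this pair the CER falls at each step, from the raw score to the combined one, which is the kind of comparison the per-transform entries are meant to support.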

Transform codes:

Code   Name                 Effect
D      Remove digits        "1871" → ""
U      Uppercase            "texte" → "TEXTE"
L      Lowercase            "TEXTE" → "texte"
P      Remove punctuation   "Bonjour !" → "Bonjour "
X      Remove diacritics    "étaient" → "etaient"

Codes can be combined freely: "XP", "DLP", "XPLU", etc.
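
One way to picture a code string is as a sequence of single-letter keys, each looked up in a table of functions and applied in turn. The sketch below follows that idea with standard-library stand-ins; TRANSFORMS and apply_codes are hypothetical names, the order of application is an assumption, and unicodedata only strips combining marks whereas Unidecode transliterates more broadly.

```python
import string
import unicodedata


def remove_digits(text):
    return "".join(ch for ch in text if not ch.isdigit())


def remove_punctuation(text):
    # string.punctuation covers ASCII punctuation only.
    return text.translate(str.maketrans("", "", string.punctuation))


def remove_diacritics(text):
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


TRANSFORMS = {
    "D": remove_digits,
    "U": str.upper,
    "L": str.lower,
    "P": remove_punctuation,
    "X": remove_diacritics,
}


def apply_codes(text, codes):
    """Apply each one-letter transform in the order it appears in codes."""
    for code in codes:
        text = TRANSFORMS[code](text)
    return text


print(repr(apply_codes("Déjà 13 fois, Maxime !", "XPL")))
print(repr(apply_codes("Bonjour !", "P")))
```

Note that removing punctuation leaves the surrounding whitespace in place, as in the "Bonjour " row of the table, so a trailing space can still count as an error unless the text is also stripped.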

You can also use the transformation classes directly:

from cthulhu.preprocessing.transformation import (
    ToCompose,
    RemoveDiacritics,
    RemovePunctuation,
    ToLowerCase,
    RemoveNonUsefulWords,
    RemoveDigits,
    RemoveSpecificWords,
    SubRegex,
    Strip,
)

# Apply a chain of transforms to any string or list of strings
result = ToCompose(
    ["Déjà 13 fois, Maxime !", ""],
    [RemoveDiacritics(), RemovePunctuation(), ToLowerCase(), RemoveNonUsefulWords()]
)
print(result.reference)  # "deja 13 fois maxime"

Using Scorer directly

For lower-level access, use the Scorer class without the Cthulhu facade:

from cthulhu.metrics.evaluation import Scorer

scorer = Scorer(
    reference="Six semaines plus tard",
    prediction="Six semaiNEs plus tard",
    show_percent=True,
    truncate_score=True,
    round_digits=".001",
)

print(scorer.wer)
print(scorer.cer)
print(scorer.board)

Project structure

cthulhu/
├── cthulhu/
│   ├── Cthulhu.py           # Main facade class
│   ├── metrics/
│   │   ├── _base_metrics.py # Encoding helpers, rounding utilities
│   │   └── evaluation.py    # Scorer class
│   ├── parser/
│   │   ├── parser_xml.py    # ALTO / PAGE-XML parser (stdlib only)
│   │   └── parser_text.py   # Plain-text file reader
│   ├── preprocessing/
│   │   └── transformation.py # Transform classes
│   └── utils/
│       └── _utils.py        # Logging, timing decorator
├── tests/
├── datatest/
├── requirements.txt
└── setup.py

Running tests

python -m pytest tests/ -v

The suite has 19 tests and runs in about 0.06 s.


Roadmap

  • API-based HTR inference — send an image to an external model endpoint and compare the result against a ground-truth XML, without any local model

Authors & licence

Original work by Alix Chagué and Lucas Terriel (Inria). MIT Licence.
