ek (Evaluation Kit) -- a framework for building Knowledge Evaluation systems (evaluating information-extraction outputs; OCR as the noisiest special case)
ek
ek (Evaluation Kit) is a framework for building Knowledge Evaluation systems: it evaluates the outputs of information-extraction systems. OCR is treated as the noisiest special case of a general problem, so the core is source-agnostic and the OCR pieces are optional.
import ek
ek.score("hello wrld", "hello world") # -> Score(value=0.0909..., metric='cer')
ek.score("hello wrld", "hello world", metric="wer").value # 0.5
ek.evaluate([("ct", "cat"), ("dg", "dog")], metric="cer").aggregate # 0.333... (global CER)
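The values above follow the standard definitions: CER is character-level Levenshtein distance divided by the reference length, WER is the same over whitespace-split words, and the corpus aggregate sums edits and reference lengths before dividing. Below is a stdlib-only sketch of those definitions, not ek's implementation (its core depends on jiwer and rapidfuzz, so details such as text transforms may differ). The mixed corpus at the end shows why a naive mean of per-item ratios over-weights short items.

```python
from typing import Sequence


def edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Levenshtein distance; insertions, deletions and substitutions all cost 1."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            curr.append(min(
                prev[j] + 1,             # delete h from the hypothesis
                curr[j - 1] + 1,         # insert r into the hypothesis
                prev[j - 1] + (h != r),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def cer(pred: str, ref: str) -> float:
    return edit_distance(pred, ref) / len(ref)  # assumes a non-empty reference


def wer(pred: str, ref: str) -> float:
    ref_words = ref.split()
    return edit_distance(pred.split(), ref_words) / len(ref_words)


def global_cer(pairs) -> float:
    """Corpus CER: total edits over total reference characters."""
    edits = sum(edit_distance(p, r) for p, r in pairs)
    chars = sum(len(r) for _, r in pairs)
    return edits / chars


def naive_mean_cer(pairs) -> float:
    """The aggregation ek avoids: an unweighted mean of per-item CERs."""
    return sum(cer(p, r) for p, r in pairs) / len(pairs)


mixed = [("ct", "cat"), ("hello wrld", "hello world")]
print(global_cer(mixed))      # 2 edits / 14 reference chars ≈ 0.143
print(naive_mean_cer(mixed))  # mean of 1/3 and 1/11 ≈ 0.212
```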
What it does
Evaluating an extraction splits along two axes: is there a gold answer
(reference-based) or not (reference-free), and are we scoring one item or a whole
corpus? ek covers both halves through two facades over one shared typed schema:
- score() / evaluate(): reference-based. Compare against gold, for one item or a corpus, with the metric chosen by output type (string → CER/WER, record → field-F1) and aggregated correctly (global error-rate accumulation, micro-F1; never a naive mean), with optional per-slice cuts.
- estimate_quality(): reference-free. Gather signals → calibrate → validate → decide accept/flag/block, with no gold answer.
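For record outputs, micro-F1 pools true positives, false positives and false negatives over every field in the corpus before computing F1 once, whereas a macro (naive) mean averages per-record scores. The sketch below assumes exact-match field comparison and counts a wrong value as both a false positive and a false negative; ek's field-F1 may match fields differently, and the field names are made up for illustration.

```python
from typing import Iterable, Mapping, Tuple

Record = Mapping[str, str]  # field name -> extracted value


def field_counts(pred: Record, gold: Record) -> Tuple[int, int, int]:
    """Return (tp, fp, fn) for one record, using exact-match field values."""
    tp = sum(1 for field, value in pred.items() if field in gold and gold[field] == value)
    fp = len(pred) - tp  # predicted fields that are wrong or absent from gold
    fn = len(gold) - tp  # gold fields that are wrong or missing from the prediction
    return tp, fp, fn


def f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0  # convention: empty vs empty is perfect


def micro_f1(pairs: Iterable[Tuple[Record, Record]]) -> float:
    """Pool counts over all fields in the corpus, then compute F1 once."""
    tp = fp = fn = 0
    for pred, gold in pairs:
        t, p, n = field_counts(pred, gold)
        tp, fp, fn = tp + t, fp + p, fn + n
    return f1(tp, fp, fn)


def macro_f1(pairs) -> float:
    """The naive alternative: average the per-record F1 scores."""
    scores = [f1(*field_counts(pred, gold)) for pred, gold in pairs]
    return sum(scores) / len(scores)


pairs = [
    ({"total": "12.00"}, {"total": "12.00"}),
    (
        {"total": "9.00", "date": "2024-01-02", "vendor": "ACME"},
        {"total": "9.50", "date": "2024-01-02", "vendor": "ACME", "currency": "EUR"},
    ),
]
print(micro_f1(pairs))  # 3 tp, 1 fp, 2 fn -> 6/9 ≈ 0.667
print(macro_f1(pairs))  # mean of 1.0 and 4/7 ≈ 0.786
```

The one-field record counts as much as the four-field one under the macro mean, which is why it comes out higher.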
Everything swappable is a strategy injected with a smart default, so the simple call works out of the box and every layer stays replaceable.
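As an illustration of that pattern applied to the reference-free path, here is a minimal sketch in which the calibrate and decide steps are injectable callables with defaults. Every name here (estimate_quality_sketch, mean_calibrator, threshold_decider, the signal keys and the 0.9/0.5 thresholds) is hypothetical and not ek's estimate_quality() signature; gathering the signals is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

Signals = Mapping[str, float]  # hypothetical: named signals already scaled to [0, 1]


def mean_calibrator(signals: Signals) -> float:
    """Default calibrator (hypothetical): the plain mean of the signals."""
    return sum(signals.values()) / len(signals)


def threshold_decider(accept_at: float = 0.9, block_below: float = 0.5) -> Callable[[float], str]:
    """Build a decide strategy mapping a calibrated score to accept/flag/block."""
    def decide(score: float) -> str:
        if score >= accept_at:
            return "accept"
        if score < block_below:
            return "block"
        return "flag"
    return decide


@dataclass(frozen=True)
class QualityEstimate:
    score: float
    decision: str


def estimate_quality_sketch(
    signals: Signals,
    *,
    calibrate: Callable[[Signals], float] = mean_calibrator,
    decide: Callable[[float], str] = threshold_decider(),
) -> QualityEstimate:
    if not signals:
        raise ValueError("need at least one signal")
    score = calibrate(signals)
    if not 0.0 <= score <= 1.0:  # validate: refuse to decide on an out-of-range score
        raise ValueError(f"calibrated score {score} is outside [0, 1]")
    return QualityEstimate(score, decide(score))


signals = {"engine_confidence": 0.97, "lexicon_hit_rate": 0.3}
print(estimate_quality_sketch(signals).decision)  # mean 0.635 -> flag
print(estimate_quality_sketch(signals, calibrate=lambda s: min(s.values())).decision)  # 0.3 -> block
```

The bare call works with the defaults, while swapping in a stricter calibrator changes the decision without touching the rest of the pipeline.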
Evaluate an OCR engine
The first concrete instance measures OCR accuracy over a gold corpus. ek consumes
ocracy's normalized OcrResult, so it can
benchmark any of ocracy's ~16 engines, or any image -> OcrResult callable of your own.
import ek.ocr
gold = {"inv-1": {"image": "scan.png", "reference_text": "INVOICE 2024", "slice": "invoices"}}
report = ek.ocr.evaluate_ocr(
"ocrmac", gold, metric="cer", normalize=["lower", "collapse_whitespace"], persist=True,
)
report.aggregate # corpus CER
report.per_slice # CER per document slice
report.detail["per_item"] # prediction, reference, score, confidence per document
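A per-slice figure like report.per_slice can be computed with the same global accumulation, just grouped by slice, after normalizing both prediction and reference. In this sketch the NORMALIZERS registry is hypothetical: its keys mirror the normalize= values above, but the transforms are assumptions rather than ek's definitions, and the item dicts are an assumed shape, not ek's per_item format.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Sequence

# Hypothetical registry: names mirror evaluate_ocr's normalize= values,
# but these transforms are assumptions, not ek's definitions.
NORMALIZERS = {
    "lower": str.lower,
    "collapse_whitespace": lambda s: re.sub(r"\s+", " ", s).strip(),
}


def normalize(text: str, steps: Iterable[str]) -> str:
    for step in steps:  # apply in the order given
        text = NORMALIZERS[step](text)
    return text


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def per_slice_cer(items: Mapping[str, Mapping[str, str]], steps: Sequence[str] = ()) -> Dict[str, float]:
    """Global CER within each slice: summed edits over summed reference characters."""
    edits: Dict[str, int] = defaultdict(int)
    chars: Dict[str, int] = defaultdict(int)
    for item in items.values():
        pred = normalize(item["prediction"], steps)
        ref = normalize(item["reference"], steps)
        edits[item["slice"]] += edit_distance(pred, ref)
        chars[item["slice"]] += len(ref)
    return {name: edits[name] / chars[name] for name in chars}


items = {
    "inv-1": {"prediction": "invoice  2024", "reference": "INVOICE 2024", "slice": "invoices"},
    "inv-2": {"prediction": "TOTAL 12.0O", "reference": "TOTAL 12.00", "slice": "invoices"},
    "rcpt-1": {"prediction": "thank yu", "reference": "Thank you", "slice": "receipts"},
}
print(per_slice_cer(items, ["lower", "collapse_whitespace"]))
# invoices: 1 edit / 23 chars; receipts: 1 edit / 9 chars
```

Without the normalizers, the case mismatch in inv-1 alone would dominate the invoices slice, which is the kind of noise normalize= is meant to strip before scoring.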
Gold corpora, results, and runs persist to local dol stores under
~/.local/share/ek/.
Install
pip install ek # lean, permissive core (dol, config2py, jiwer, rapidfuzz)
pip install "ek[ocr]" # + the ocracy OCR fleet (install engines via ocracy extras)
pip install "ek[all]" # + the permissive capability tiers (metrics, calibration, ...)
Heavier or copyleft/non-commercial libraries are never installed by default; see the
extras in pyproject.toml. Some capabilities (e.g. the cost-weighted typed-graph
metric and the ROVER consensus engine) are on the roadmap; see the tracking issue.
CLI
ek cer "hello wrld" "hello world" # character error rate
ek wer "hello wrld" "hello world" # word error rate
ek where # the local data folder
ek check tesseract # what an OCR engine needs to run
For contributors
The architecture, conventions, and the research behind the design are documented for
agents and humans in AGENTS.md, the dev skills under skills/, and
the research reports under misc/docs/.
Download files
Source Distribution: ek-0.1.8.tar.gz
Built Distribution: ek-0.1.8-py3-none-any.whl
File details
Details for the file ek-0.1.8.tar.gz.
File metadata
- Download URL: ek-0.1.8.tar.gz
- Upload date:
- Size: 308.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.24 (CI, Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e6923d9bda6fea945ca2c8e69e91d8868e0fd8f4399947adae68910e2ef08514 |
| MD5 | 942a6a347263d83363686465bc7a0b6b |
| BLAKE2b-256 | 6a904901d0fca1f1fae8dd5570f004a51211e84d87221ed38c17ef7b4428ad63 |
File details
Details for the file ek-0.1.8-py3-none-any.whl.
File metadata
- Download URL: ek-0.1.8-py3-none-any.whl
- Upload date:
- Size: 101.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.24 (CI, Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 54475f6949d94a6135d3596743553db6c7ea61d1ac92a631f3ad0383c6f92b5f |
| MD5 | f15e094d125a1878d186d1c99397dfde |
| BLAKE2b-256 | 7bab17c2de5a5af184b4ba590f425a177b6a723919664d3838ffccfe603ad24f |