error-align

Text-to-text alignment algorithm for speech recognition error analysis.

ErrorAlign helps you dig deeper into your speech recognition projects by accurately aligning each word in a reference transcript with the corresponding words in the model-generated transcript. Unlike traditional methods such as Levenshtein-based alignment, it is not restricted to one-to-one alignments: it can map a single reference word to multiple words or subwords in the model output. This enables quick and reliable identification of error patterns in rare words, names, or domain-specific terms that matter most for your application.
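For contrast, here is a minimal sketch of the conventional word-level Levenshtein alignment (unit costs, whitespace tokenization). This is the traditional baseline, not ErrorAlign's algorithm, and `levenshtein_align` is a hypothetical helper name. It shows the one-to-one restriction: every operation pairs at most one reference word with at most one hypothesis word.

```python
from typing import List, Optional, Tuple

Op = Tuple[str, Optional[str], Optional[str]]  # (operation, reference word, hypothesis word)


def levenshtein_align(ref: str, hyp: str) -> List[Op]:
    """Word-level Levenshtein alignment with unit costs (baseline sketch, not ErrorAlign)."""
    r, h = ref.split(), hyp.split()
    n, m = len(r), len(h)

    # dp[i][j] = minimum number of edits turning r[:i] into h[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(1, m + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j - 1] + cost,  # match or substitute
                dp[i - 1][j] + 1,  # delete r[i-1]
                dp[i][j - 1] + 1,  # insert h[j-1]
            )

    # Backtrace from the bottom-right cell, preferring the diagonal on ties.
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            same = r[i - 1] == h[j - 1]
            if dp[i][j] == dp[i - 1][j - 1] + (0 if same else 1):
                ops.append(("MATCH" if same else "SUBSTITUTE", r[i - 1], h[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("DELETE", r[i - 1], None))
            i -= 1
        else:
            ops.append(("INSERT", None, h[j - 1]))
            j -= 1
    ops.reverse()
    return ops


if __name__ == "__main__":
    for op in levenshtein_align("some things are worth noting", "something worth nothing period"):
        print(op)
```

On the quickstart sentences (lowercased, punctuation removed), this sketch finds an alignment with edit cost 5 in which "something" can be credited to "things" but not also to "some". With the diagonal-first tie-breaking used here, it does not even pair "worth" with "worth", because several alignments share the minimum cost.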

Installation

pip install error-align

Quickstart

from error_align import error_align

ref = "Some things are worth noting!"
hyp = "Something worth nothing period?"

alignments = error_align(ref, hyp)

Resulting alignments:

Alignment(SUBSTITUTE: "Some" -> "Some"-),
Alignment(SUBSTITUTE: "things" -> -"thing"),
Alignment(DELETE: "are"),
Alignment(MATCH: "worth" == "worth"),
Alignment(SUBSTITUTE: "noting" -> "nothing"),
Alignment(INSERT: "period")
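To collect error patterns across many utterances, alignments like these can be tallied by operation type. The sketch below works on plain (operation, reference, hypothesis) tuples copied by hand from the output above, because the attribute names of the `Alignment` class are not documented here; `summarize` is a hypothetical helper, and adapting it to real `Alignment` objects would require checking the library's source.

```python
from collections import Counter

# (operation, reference word, hypothesis word) tuples transcribed by hand from the
# printed result above; continuation markers ("-") are dropped. These are plain
# stand-ins, not error_align's Alignment objects.
alignments = [
    ("SUBSTITUTE", "Some", "Some"),
    ("SUBSTITUTE", "things", "thing"),
    ("DELETE", "are", None),
    ("MATCH", "worth", "worth"),
    ("SUBSTITUTE", "noting", "nothing"),
    ("INSERT", None, "period"),
]


def summarize(pairs):
    """Count operations and list reference words that were not matched exactly."""
    counts = Counter(op for op, _, _ in pairs)
    missed = [ref for op, ref, _ in pairs if ref is not None and op != "MATCH"]
    return counts, missed


counts, missed = summarize(alignments)
print(dict(counts))
print(missed)  # ['Some', 'things', 'are', 'noting']
```

Aggregating the `missed` lists over a test set is one simple way to surface which reference terms a model gets wrong most often.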

Work-in-Progress

- Optimization for longform text.
- Efficient word-level first pass.
- C++ version with Python bindings.

Citation and Research

@article{borgholt2021alignment,
  title={A Text-To-Text Alignment Algorithm for Better Evaluation of Modern Speech Recognition Systems},
  author={Borgholt, Lasse and Havtorn, Jakob and Igel, Christian and Maal{\o}e, Lars and Tan, Zheng-Hua},
  journal={arXiv preprint arXiv:2509.24478},
  year={2025}
}

To reproduce results from the paper:

1. Install with the extra evaluation dependencies (quoted so that shells such as zsh do not expand the brackets):
   pip install "error-align[evaluation]"
2. Clone this repository:
   git clone https://github.com/borgholt/error-align.git
3. Navigate to the evaluation directory:
   cd error-align/evaluation
4. Transcribe a dataset for evaluation, for example:
   python transcribe_dataset.py --model_name whisper --dataset_name commonvoice --language_code fr
5. Run the evaluation script on the output file, for example:
   python evaluate_dataset.py --transcript_file transcribed_data/whisper_commonvoice_test_fr.parquet

Notes:

- To reproduce results on the primock57 dataset, first run: python prepare_primock57.py
- Use the --help flag to see all available options for transcribe_dataset.py and evaluate_dataset.py.
- All results reported in the paper are based on the test sets.

Collaborators:

Pioneer Centre for Artificial Intelligence

Release history

- 0.1.0b9 (pre-release): Apr 6, 2026
- 0.1.0b8 (pre-release): Jan 23, 2026
- 0.1.0b7 (pre-release): Jan 19, 2026
- 0.1.0b6 (pre-release): Dec 15, 2025
- 0.1.0b5 (pre-release): Dec 10, 2025
- 0.1.0b3 (pre-release): Oct 15, 2025
- 0.1.0b2 (pre-release): Oct 2, 2025
- 0.1.0b1 (pre-release, this version): Oct 1, 2025

Download files

Source Distribution

error_align-0.1.0b1.tar.gz (38.7 kB), uploaded Oct 1, 2025

Built Distribution

error_align-0.1.0b1-py3-none-any.whl (45.8 kB), uploaded Oct 1, 2025, Python 3

File details

Details for the file error_align-0.1.0b1.tar.gz.

File metadata

Download URL: error_align-0.1.0b1.tar.gz
Upload date: Oct 1, 2025
Size: 38.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.1

File hashes

Hashes for error_align-0.1.0b1.tar.gz
Algorithm	Hash digest
SHA256	`db76c661b2602b9e02d59c9cccc76d8cbf8c5367e950523fe72fb2de9950bc68`
MD5	`5b6f9d70ee3048781475ae169e69f247`
BLAKE2b-256	`233e6d803bc0f405b0c4c47278d8069e2bc243b527bf152e6393e06d8cd19f7f`

File details

Details for the file error_align-0.1.0b1-py3-none-any.whl.

File metadata

Download URL: error_align-0.1.0b1-py3-none-any.whl
Upload date: Oct 1, 2025
Size: 45.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.1

File hashes

Hashes for error_align-0.1.0b1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5984c7a21f54eaf19c5c556a1e6cb2a9feae00bebcd0cdb8842e9d13143a9489`
MD5	`df97af616df9ea576efa098c23a89411`
BLAKE2b-256	`a91a839d397a4b1a326cb8191943ec683d019800e24ac991c0feb5d1a39e454a`
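The published SHA256 digests can be used to check a downloaded file before installing it. Below is a minimal sketch using only Python's standard `hashlib`; the helper names are illustrative, and it assumes the files were downloaded into the current working directory (for example with `pip download error-align==0.1.0b1 --no-deps`).

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed in the file details above for error-align 0.1.0b1.
EXPECTED_SHA256 = {
    "error_align-0.1.0b1.tar.gz": "db76c661b2602b9e02d59c9cccc76d8cbf8c5367e950523fe72fb2de9950bc68",
    "error_align-0.1.0b1-py3-none-any.whl": "5984c7a21f54eaf19c5c556a1e6cb2a9feae00bebcd0cdb8842e9d13143a9489",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True if the file's SHA256 matches the published digest for its filename."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected


if __name__ == "__main__":
    # Assumes the downloaded files sit in the current working directory.
    for name in EXPECTED_SHA256:
        p = Path(name)
        if p.exists():
            print(name, "OK" if verify(p) else "MISMATCH")
```

pip can do the same check on its own: `pip hash <file>` prints a file's SHA256 digest, and the `--require-hashes` install mode enforces pinned digests at install time.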
