Text-to-text alignment algorithm for speech recognition error analysis.

ErrorAlign helps you dig deeper into your speech recognition projects by accurately aligning each word in a reference transcript with the model-generated transcript. Unlike traditional methods such as Levenshtein-based alignment, it is not restricted to one-to-one word alignment: it can map a single reference word to multiple words or subwords in the model output. This makes it quicker and more reliable to identify error patterns in rare words, names, or domain-specific terms that matter most for your application.
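For contrast, here is a minimal sketch of the traditional baseline mentioned above: a plain word-level Levenshtein alignment. It is not ErrorAlign's algorithm and does not use the error-align package. The function name `levenshtein_align` and the tuple output format are hypothetical, chosen only for illustration, and the inputs are assumed to be already lowercased and stripped of punctuation. Each emitted operation pairs at most one reference word with at most one hypothesis word, which is the one-to-one restriction ErrorAlign lifts.

```python
from typing import Optional


def levenshtein_align(ref: str, hyp: str) -> list[tuple[str, Optional[str], Optional[str]]]:
    """Word-level Levenshtein alignment (illustrative baseline only)."""
    r, h = ref.split(), hyp.split()
    n, m = len(r), len(h)

    # cost[i][j] is the edit distance between r[:i] and h[:j].
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrace from the bottom-right corner, preferring the diagonal on ties.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (r[i - 1] != h[j - 1]):
            op = "MATCH" if r[i - 1] == h[j - 1] else "SUBSTITUTE"
            ops.append((op, r[i - 1], h[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("DELETE", r[i - 1], None))
            i -= 1
        else:
            ops.append(("INSERT", None, h[j - 1]))
            j -= 1
    return ops[::-1]


# Each printed operation involves at most one word from each side.
for op in levenshtein_align("some things are worth noting",
                            "something worth nothing period"):
    print(op)
```

Because every operation here is word-to-word, "some things" versus "something" can only be expressed as a substitution plus a deletion (or similar), rather than as two reference words mapping onto parts of one hypothesis word.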

Update [2025-12-10]: As of version 0.1.0b5, error-align includes a word-level pass that efficiently identifies unambiguous matches, along with C++ extensions that accelerate beam search and backtrace construction. The combined speedup is ~15× over the pure-Python implementation ⚡

Installation

pip install error-align

Quickstart

from error_align import error_align

ref = "Some things are worth noting!"
hyp = "Something worth nothing period?"

alignments = error_align(ref, hyp)

Resulting alignments:

Alignment(SUBSTITUTE: "Some"- -> "Some"),
Alignment(SUBSTITUTE: -"thing" -> "things"),
Alignment(DELETE: "are"),
Alignment(MATCH: "worth" == "worth"),
Alignment(SUBSTITUTE: "noting" -> "nothing"),
Alignment(INSERT: "period")
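To summarize such a result, the operation types can be tallied. The following is a minimal sketch that works only on the printed text shown above, since the attribute names of the `Alignment` objects are not documented in this section. The variable names `printed_alignments` and `op_counts` are illustrative, and the regular expression assumes the `Alignment(OPERATION: ...)` format seen in the output.

```python
import re
from collections import Counter

# The printed alignments from the Quickstart, copied verbatim as text.
printed_alignments = """\
Alignment(SUBSTITUTE: "Some"- -> "Some"),
Alignment(SUBSTITUTE: -"thing" -> "things"),
Alignment(DELETE: "are"),
Alignment(MATCH: "worth" == "worth"),
Alignment(SUBSTITUTE: "noting" -> "nothing"),
Alignment(INSERT: "period")
"""

# Extract the operation name that follows "Alignment(" on each line.
op_counts = Counter(re.findall(r"Alignment\((\w+):", printed_alignments))

for op, count in op_counts.most_common():
    print(f"{op}: {count}")
```

When working with the returned objects directly, inspect them (for example with `help()` or `dir()`) to find the appropriate fields rather than parsing their printed form.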

Citation and Research

@article{borgholt2025text,
  title={A Text-To-Text Alignment Algorithm for Better Evaluation of Modern Speech Recognition Systems},
  author={Borgholt, Lasse and Havtorn, Jakob and Igel, Christian and Maal{\o}e, Lars and Tan, Zheng-Hua},
  journal={arXiv preprint arXiv:2509.24478},
  year={2025}
}

To reproduce results from the paper:

  • Install with extra evaluation dependencies (only supported with Python 3.12):
    • pip install "error-align[evaluation]"
  • Clone this repository:
    • git clone https://github.com/corticph/error-align.git
  • Navigate to the evaluation directory:
    • cd error-align/evaluation
  • Transcribe a dataset for evaluation. For example:
    • python transcribe_dataset.py --model_name whisper --dataset_name commonvoice --language_code fr
  • Run evaluation script on the output file. For example:
    • python evaluate_dataset.py --transcript_file transcribed_data/whisper_commonvoice_test_fr.parquet

Notes:

  • To reproduce results on the primock57 dataset, first run: python prepare_primock57.py.
  • Use the --help flag to see all available options for transcribe_dataset.py and evaluate_dataset.py.
  • All results reported in the paper are based on the test sets.
