Text-to-text alignment algorithm for speech recognition error analysis.

ErrorAlign helps you dig deeper into your speech recognition projects by accurately aligning each word in a reference transcript with the model-generated transcript. Unlike traditional methods such as Levenshtein-based alignment, it is not restricted to one-to-one word alignment: it can map a single reference word to multiple words or subwords in the model output. This enables quick and reliable identification of error patterns in rare words, names, or domain-specific terms that matter most for your application.

Update [2025-12-10]: As of version 0.1.0b5, error-align includes a word-level pass that efficiently identifies unambiguous matches, along with C++ extensions that accelerate beam search and backtrace construction. The combined speedup is ~15× over the pure-Python implementation ⚡

Contents: Installation | Quickstart | Citation and Research

Installation

pip install error-align

Quickstart

from error_align import error_align

ref = "Some things are worth noting!"
hyp = "Something worth nothing period?"

alignments = error_align(ref, hyp)

Resulting alignments:

Alignment(SUBSTITUTE: "Some"- -> "Some"),
Alignment(SUBSTITUTE: -"thing" -> "things"),
Alignment(DELETE: "are"),
Alignment(MATCH: "worth" == "worth"),
Alignment(SUBSTITUTE: "noting" -> "nothing"),
Alignment(INSERT: "period")
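
For contrast, here is a minimal sketch of the conventional one-to-one, word-level Levenshtein alignment that ErrorAlign is compared against. It is not part of error-align and does not use its API: the helper names `tokenize` and `levenshtein_align`, the lowercase/strip-punctuation normalization, the unit edit costs, and the tie-breaking order in the backtrace are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase and keep only word-like tokens (hypothetical normalization)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def levenshtein_align(ref_words, hyp_words):
    """Return (edit distance, list of (op, ref_word, hyp_word)) with unit costs."""
    n, m = len(ref_words), len(hyp_words)

    # dist[i][j] = edit distance between ref_words[:i] and hyp_words[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub_cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + sub_cost,  # match or substitution
                dist[i - 1][j] + 1,             # deletion (ref word dropped)
                dist[i][j - 1] + 1,             # insertion (extra hyp word)
            )

    # Backtrace from the bottom-right corner. Several optimal paths can exist;
    # this sketch arbitrarily prefers match, then insert, then delete, then
    # substitute.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and ref_words[i - 1] == hyp_words[j - 1]
        if same and dist[i][j] == dist[i - 1][j - 1]:
            ops.append(("MATCH", ref_words[i - 1], hyp_words[j - 1]))
            i, j = i - 1, j - 1
        elif j > 0 and dist[i][j] == dist[i][j - 1] + 1:
            ops.append(("INSERT", None, hyp_words[j - 1]))
            j -= 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("DELETE", ref_words[i - 1], None))
            i -= 1
        else:
            ops.append(("SUBSTITUTE", ref_words[i - 1], hyp_words[j - 1]))
            i, j = i - 1, j - 1
    ops.reverse()
    return dist[n][m], ops


ref = "Some things are worth noting!"
hyp = "Something worth nothing period?"

distance, ops = levenshtein_align(tokenize(ref), tokenize(hyp))
print("edit distance:", distance)
for op in ops:
    print(op)
```

With this tie-breaking, the baseline pairs "some" with "something" and marks "things" as deleted, because each hypothesis word can be paired with at most one reference word. In the ErrorAlign output above, both "Some" and "things" are instead aligned to parts of "Something". A different tie-breaking order can produce a different, equally cheap Levenshtein alignment.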

Citation and Research

@article{borgholt2025text,
  title={A Text-To-Text Alignment Algorithm for Better Evaluation of Modern Speech Recognition Systems},
  author={Borgholt, Lasse and Havtorn, Jakob and Igel, Christian and Maal{\o}e, Lars and Tan, Zheng-Hua},
  journal={arXiv preprint arXiv:2509.24478},
  year={2025}
}

To reproduce results from the paper:

  • Install with the extra evaluation dependencies (supported only on Python 3.12):
    • pip install "error-align[evaluation]"
  • Clone this repository:
    • git clone https://github.com/corticph/error-align.git
  • Navigate to the evaluation directory:
    • cd error-align/evaluation
  • Transcribe a dataset for evaluation. For example:
    • python transcribe_dataset.py --model_name whisper --dataset_name commonvoice --language_code fr
  • Run the evaluation script on the output file. For example:
    • python evaluate_dataset.py --transcript_file transcribed_data/whisper_commonvoice_test_fr.parquet

Notes:

  • To reproduce results on the primock57 dataset, first run: python prepare_primock57.py.
  • Use the --help flag to see all available options for transcribe_dataset.py and evaluate_dataset.py.
  • All results reported in the paper are based on the test sets.
