
A package containing generic error-rate functions implemented in Rust


Universal Edit Distance

Universal Edit Distance (UED), sometimes called Universal Error Rate because I struggle to be consistent, is a project aimed at creating a simple Python evaluation library implemented in Rust.

The "universal" part of the name comes from the fact that the Rust implementation is generic and works on any data type that implements PartialEq, whereas most implementations are limited to strings.
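To illustrate what "generic" means here, the following is a minimal pure-Python sketch of Levenshtein distance that only ever compares items with ==. It is not this library's Rust code or its Python API; the function name edit_distance is made up for the example.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def edit_distance(reference: Sequence[T], hypothesis: Sequence[T]) -> int:
    """Levenshtein distance between two sequences, using only == on the items.

    Hypothetical helper for illustration; not part of universal_edit_distance.
    """
    # previous[j] is the distance between the reference prefix processed
    # so far and hypothesis[:j]. It starts as the empty-prefix row.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]  # turning reference[:i] into an empty sequence costs i deletions
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (0 if ref_item == hyp_item else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


# The same function handles words, characters, integers and tuples.
word_distance = edit_distance("the cat sat".split(), "the cat sat down".split())
char_distance = edit_distance("kitten", "sitting")
int_distance = edit_distance([1, 2, 3, 4], [1, 3, 4])
tuple_distance = edit_distance([("a", 1), ("b", 2)], [("a", 1), ("c", 2)])
```

Because nothing in the loop depends on the items being characters, a word-level and a character-level metric differ only in how the input is split before it is passed in.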

✨ Features

  • Much quicker than HuggingFace's evaluate library (see benchmarks below)
  • Word-error-rate and character-error-rate functions that are compatible and comparable with evaluate's wer and cer metrics.
  • Functions that return the WER or CER for every test as an array within a fraction of a second.
  • Functions that return the edit distance for every test as an array of integers within a fraction of a second.
  • Generic implementations of the mean-error-rate and error-rate metrics that can work with any* Python type
  • Includes type hints to make development easier.

* I am pretty sure it works with any type, but it is still being tested

⚡️ Quick start

Since the library is still very much being developed and, while it works for my purposes, isn't tested very well, I haven't pushed it to PyPI yet. As such you have to install this Git repo directly like so:

Using pip

pip install git+https://gitlab.com/prebens-phd-adventures/universal-edit-distance

Using uv

uv add git+https://gitlab.com/prebens-phd-adventures/universal-edit-distance

Note: cargo needs to be installed in the environment for you to be able to compile the library.

You should now be able to import the module universal_edit_distance in your Python project.

🎯 Motivation and why this project exists

I love statistics, and when I evaluate my speech-recognition models (and other models) I like to run t-tests etc. However, doing that with HuggingFace's evaluate library, while possible, is horrendously slow.

If you only require the mean CER or WER you could continue using evaluate and your life would be fine. If you want to be more rigorous in your testing and evaluation, you should consider using this library.
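As a rough sketch of the kind of analysis that per-test error rates make possible, the example below computes a paired t statistic from two lists of per-test error rates using only the standard library. The function name paired_t_statistic and the numbers are made up for illustration and are not real results; turning the statistic into a p-value would still need a t-distribution table or a statistics package.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_statistic(rates_a: Sequence[float], rates_b: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for paired per-test error rates.

    Hypothetical helper for illustration; not part of universal_edit_distance.
    """
    if len(rates_a) != len(rates_b):
        raise ValueError("both systems must be scored on the same tests")
    differences = [a - b for a, b in zip(rates_a, rates_b)]
    n = len(differences)
    if n < 2:
        raise ValueError("need at least two paired tests")
    mean_difference = statistics.mean(differences)
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    std_difference = statistics.stdev(differences)
    if std_difference == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_value = mean_difference / (std_difference / math.sqrt(n))
    return t_value, n - 1


# Made-up per-test error rates for two systems on the same five tests.
system_a = [0.10, 0.20, 0.00, 0.30, 0.25]
system_b = [0.15, 0.20, 0.10, 0.35, 0.30]
t_value, degrees_of_freedom = paired_t_statistic(system_a, system_b)
```

A negative t value here means system A has the lower error rate on average. None of this can be done with a single mean WER or CER, which is why the per-test values matter.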

In addition, one thing that annoys me about a lot of Levenshtein implementations is that they only accept strings, even though the algorithm can literally work on any data type that supports comparison. I have tried to make the implementation found here as generic as possible.

Benchmarks

You can find the benchmarking script here: prebens-phd-adventures/ued-benchmarks

Note that the single floating-point result normally returned by evaluate is called the mean-error-rate in this library and in these results, since it is effectively the mean across all tests rather than the result of a single test. The metric that returns a floating-point result for each row in the test set is simply called the error-rate.
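The distinction between the two metrics can be sketched in plain Python. This is not the library's API: the function names are hypothetical, and the aggregate is assumed to be total edits divided by total reference length (a length-weighted mean), which may differ from what the library actually does. References are assumed to be non-empty.

```python
from typing import List, Sequence


def edit_distance(reference: Sequence, hypothesis: Sequence) -> int:
    """Levenshtein distance using only == on items (illustrative helper)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            current.append(min(
                previous[j - 1] + (0 if ref_item == hyp_item else 1),  # substitute or match
                previous[j] + 1,                                       # delete
                current[j - 1] + 1,                                    # insert
            ))
        previous = current
    return previous[-1]


def error_rates(references: Sequence[Sequence], hypotheses: Sequence[Sequence]) -> List[float]:
    """One error rate per row: edits divided by that row's reference length."""
    return [edit_distance(r, h) / len(r) for r, h in zip(references, hypotheses)]


def mean_error_rate(references: Sequence[Sequence], hypotheses: Sequence[Sequence]) -> float:
    """Single aggregate value, ASSUMED here to be total edits / total reference length."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_length = sum(len(r) for r in references)
    return total_edits / total_length


# Word-level example: split each sentence into a list of words first.
references = [s.split() for s in ["the cat sat", "hello world"]]
hypotheses = [s.split() for s in ["the cat sat down", "hello there world again"]]

per_row = error_rates(references, hypotheses)
aggregate = mean_error_rate(references, hypotheses)
```

Under this assumption the aggregate is not the plain average of the per-row values, because longer references carry more weight. That is worth keeping in mind when comparing the two.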

The tests in the table below were run using evaluate 0.4.3, jiwer 3.1.0, and universal-edit-distance 0.2.0 on a Polars DataFrame containing n = 12775 entries. For the mean-error-rate results each test was run 100 times; for the error-rate results each test was run only once because evaluate is too slow.

Metric   | evaluate | jiwer | ued   | Speed-up vs evaluate | Speed-up vs jiwer
Mean WER | 0.31s    | 0.16s | 0.02s | 15.28x               | 7.75x
Mean CER | 0.45s    | 0.24s | 0.09s | 5.01x                | 2.60x
WER      | 24.77s   | 0.27s | 0.02s | 1137.30x             | 12.61x
CER      | 25.34s   | 0.37s | 0.09s | 278.97x              | 4.03x

As can be seen in the table, ued beats evaluate and jiwer on every metric measured. The goal of the project was to make WER and CER faster, but I'll take the win for the other two. You'll also notice that the timings for the mean-error-rates and error-rates are the same for ued. That is expected and is due to the way it is implemented.

👩‍💻👨‍💻 Contribute to the project

This is my first ever Rust project, so while I have a vague idea about what I am doing, I am sure it can be improved. If you have any suggestions or requests, please feel free to open an issue!
