
A small, basic Python implementation of WER (word error rate) and Levenshtein distance

Project description

SimplePythonWER

The purpose of this repo is to provide a well-tested, basic Python implementation of Levenshtein distance and WER so it can be shared across projects. It is based on an existing implementation, with a couple of minor changes.
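Levenshtein distance is the minimum number of single-item insertions, deletions and substitutions needed to turn one sequence into another. The sketch below shows the standard dynamic-programming formulation, keeping only two rows of the table. It is a generic illustration, not necessarily how simplepythonwer implements it; the function here is local to the example.

```python
from typing import Hashable, Sequence


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b.

    Works on any sequence: pass strings for character-level distance,
    or lists of words for word-level distance (which is what WER needs).
    """
    # Row 0 of the table: turning the empty prefix of a into b[:j] takes j insertions
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty prefix of b takes i deletions
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute (free if the items match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))                            # 3
print(levenshtein("the cat sat".split(), "the mat sat".split()))  # 1
```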

Getting Started

  1. Install with: pip install simplepythonwer
  2. Import with: from simplepythonwer import wer
  3. Use with:
>>> wer("the cat sat on the mat", "the mat sat on the cat")
0.3333333333333333
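WER is the word-level edit distance between the ground truth and the ASR output, divided by the number of words in the ground truth: WER = (S + D + I) / N, with S substitutions, D deletions, I insertions and N reference words. A minimal sketch, assuming words are split on whitespace with str.split(); `wer_sketch` and `word_edit_distance` are hypothetical names, not the package's API. It reproduces the example above: swapping "cat" and "mat" costs two substitutions over six reference words, and 2 / 6 = 0.333...

```python
def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    # Two-row Levenshtein over words instead of characters
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def wer_sketch(ground_truth: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in the ground truth."""
    ref = ground_truth.split()  # assumption: plain whitespace tokenisation
    hyp = hypothesis.split()
    # Raises ZeroDivisionError when the ground truth has no words
    return word_edit_distance(ref, hyp) / len(ref)


print(wer_sketch("the cat sat on the mat", "the mat sat on the cat"))  # 0.3333333333333333
```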

Features

  • Simple, minimal, pure Python with zero external dependencies
  • Versioned and pip-installable
  • Includes examples with tests to ensure it works correctly

Caveats and Gotchas

  • Versions before v1.0.3 intentionally raised a divide-by-zero error when the ground truth was an empty string or contained only whitespace; see v1.0.3 in the change log.
  • WER can exceed 100% when the ASR output contains more words than the ground truth (every extra word counts as an insertion); this is normal. It's sometimes a good idea to cap results at 100% with min(), e.g. min(wer(ground_truth, new_asr_string), 1.0); otherwise an unbounded error rate can skew your averages.
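The two caveats can be handled together in a small wrapper. A sketch, assuming a hypothetical `bounded_wer` helper (not part of the package) that applies the min() cap suggested above and treats an empty ground truth with one common convention, also an assumption here: 0.0 if the ASR output is empty too, otherwise the maximum (capped) error.

```python
def _edit_distance(ref: list[str], hyp: list[str]) -> int:
    # Two-row word-level Levenshtein distance
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def bounded_wer(ground_truth: str, hypothesis: str, cap: float = 1.0) -> float:
    """Hypothetical helper: WER capped at `cap`, with a defined empty-reference result."""
    ref, hyp = ground_truth.split(), hypothesis.split()
    if not ref:
        # Assumed convention: both empty is a perfect match, otherwise maximal error
        return 0.0 if not hyp else cap
    return min(_edit_distance(ref, hyp) / len(ref), cap)


print(bounded_wer("hello", "hello there general kenobi"))                   # 1.0
print(bounded_wer("hello", "hello there general kenobi", cap=float("inf")))  # 3.0
print(bounded_wer("   ", ""))                                               # 0.0
```

Passing cap=float("inf") disables the cap, which shows the uncapped 300% WER that three inserted words produce against a one-word ground truth.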

Change Log

  • v1.0.0 - First release, with a ~15% speed improvement over the original implementation
  • v1.0.1 - Fixed pip packaging and added install steps; excluded tests from the pip package
  • v1.0.2 - Fixed a pip packaging issue
  • v1.0.3 - Fixed a divide-by-zero error when the ground truth is zero length (including ground truths that contain only whitespace)

Tests

Run from the repository root with:

PYTHONPATH=$(pwd) python3 -m unittest discover .

Results:

rob@rob-T480s:~/projects/SimplePythonWER (master)$ PYTHONPATH=$(pwd) python3 -m unittest discover .
..
----------------------------------------------------------------------
Ran 9 tests in 0.001s

OK
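For reference, a test in the style this suite might contain is sketched below. To keep it runnable without the package installed, it tests a local stand-in, `reference_wer` (hypothetical, splitting words on whitespace by assumption); swapping in `from simplepythonwer import wer` would point the same assertions at the real function. Saved under a name matching test*.py (the default pattern of `python3 -m unittest discover`), it would be picked up by the command above.

```python
import unittest


def reference_wer(ground_truth: str, hypothesis: str) -> float:
    # Local stand-in so the example runs without the package installed
    ref, hyp = ground_truth.split(), hypothesis.split()
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1] / len(ref)


class TestWerExamples(unittest.TestCase):
    def test_documented_example(self):
        # Two substitutions (cat <-> mat) out of six reference words
        self.assertAlmostEqual(
            reference_wer("the cat sat on the mat", "the mat sat on the cat"), 1 / 3)

    def test_identical_strings_score_zero(self):
        self.assertEqual(reference_wer("the cat sat", "the cat sat"), 0.0)

    def test_extra_whitespace_is_ignored(self):
        # str.split() with no argument collapses runs of whitespace
        self.assertEqual(reference_wer("the  cat\tsat", " the cat sat "), 0.0)

    def test_wer_can_exceed_one(self):
        # One reference word plus two inserted words gives 2 / 1
        self.assertEqual(reference_wer("cat", "cat sat down"), 2.0)


if __name__ == "__main__":
    unittest.main()
```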

Speed Improvements

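Timing levenshtein against levenshtein_original over 10,000 calls each:

from simplepythonwer.simplepythonwer import levenshtein, levenshtein_original
import timeit

sentence = "the cat sat on the mat" * 5
print(timeit.timeit('levenshtein(sentence, sentence[::-1])', number=10000, globals=globals()))
print(timeit.timeit('levenshtein_original(sentence, sentence[::-1])', number=10000, globals=globals()))

Output (total seconds for 10,000 calls):

38.16882774699479
44.751817572047

That is roughly 15% less time for levenshtein (38.2 s vs 44.8 s).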
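The page does not say what changed between levenshtein_original and levenshtein. As an illustration only, the sketch below times one common optimisation: keeping two rows of the dynamic-programming table instead of the full matrix. Both functions are hypothetical and local to the example, and timings vary by machine and Python version, so the numbers it prints should not be read as a reproduction of the figures above.

```python
import timeit


def levenshtein_matrix(a, b):
    # Classic formulation: fill the full (len(a)+1) x (len(b)+1) table
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,
                             dist[i][j - 1] + 1,
                             dist[i - 1][j - 1] + cost)
    return dist[-1][-1]


def levenshtein_two_rows(a, b):
    # Same recurrence, but only the previous row is kept: O(min(len(a), len(b))) memory
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric, so make b the shorter sequence
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


sentence = "the cat sat on the mat" * 5
assert levenshtein_matrix(sentence, sentence[::-1]) == levenshtein_two_rows(sentence, sentence[::-1])

for fn in (levenshtein_matrix, levenshtein_two_rows):
    seconds = timeit.timeit(lambda: fn(sentence, sentence[::-1]), number=50)
    print(f"{fn.__name__}: {seconds:.3f} s for 50 calls")
```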



Download files


Source Distribution

simplepythonwer-1.0.3.tar.gz (4.3 kB)

Built Distribution

simplepythonwer-1.0.3-py3-none-any.whl (5.6 kB, Python 3)

File details

Details for the file simplepythonwer-1.0.3.tar.gz.

File metadata

  • Download URL: simplepythonwer-1.0.3.tar.gz
  • Upload date:
  • Size: 4.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.22.0 setuptools/56.1.0 requests-toolbelt/0.8.0 tqdm/4.47.0 CPython/3.8.5

File hashes

Hashes for simplepythonwer-1.0.3.tar.gz
Algorithm Hash digest
SHA256 db79d1b52af523ba657270a30a8a1355261703bd217f5d73fe896481754e46ba
MD5 96cb5f71c0972636f10a412e9a447650
BLAKE2b-256 76ab1182a3e234239af88235319e3ad5f49d8ead28395c161704ec2f38977069


File details

Details for the file simplepythonwer-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: simplepythonwer-1.0.3-py3-none-any.whl
  • Upload date:
  • Size: 5.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.22.0 setuptools/56.1.0 requests-toolbelt/0.8.0 tqdm/4.47.0 CPython/3.8.5

File hashes

Hashes for simplepythonwer-1.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 69d827d7c678636d2054d9768e8aa440c65b448fe6c8027bd255f640aedbba83
MD5 75211063c8588ae924e85dfd4c59f236
BLAKE2b-256 d026fdc27ef32532432b84b612f6514688c7f9c5b1254f33228baa28978efdcc
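To check a downloaded file against the SHA256 digests listed above, Python's standard hashlib module is enough. A minimal sketch; the script and function names are hypothetical, and it assumes the files have already been downloaded to a local path. (pip can also enforce hashes itself in hash-checking mode, via a requirements line such as `simplepythonwer==1.0.3 --hash=sha256:<digest>` installed with `--require-hashes`.)

```python
import hashlib
import sys
from pathlib import Path

# SHA256 digests copied from the file hashes listed above
EXPECTED_SHA256 = {
    "simplepythonwer-1.0.3.tar.gz":
        "db79d1b52af523ba657270a30a8a1355261703bd217f5d73fe896481754e46ba",
    "simplepythonwer-1.0.3-py3-none-any.whl":
        "69d827d7c678636d2054d9768e8aa440c65b448fe6c8027bd255f640aedbba83",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path) -> bool:
    """True only if the file name is known and its SHA256 matches the published digest."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected


if __name__ == "__main__":
    # Usage: python check_hashes.py simplepythonwer-1.0.3.tar.gz ...
    for arg in sys.argv[1:]:
        p = Path(arg)
        status = "OK" if matches_published_hash(p) else "MISMATCH or unknown file"
        print(f"{p.name}: {status}")
```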

