Approximate the WER of an ASR transcript

Project description

Word Error Rate for automatic speech recognition

This repository contains a simple Python package to approximate the WER of a transcript. It computes the minimum edit distance between the ground-truth sentence and the hypothesis sentence of a speech-to-text API. The minimum edit distance is calculated using the Wagner-Fischer algorithm. Because this algorithm normally operates at the character level, every distinct word is assigned a unique integer, and the edit distance is computed over the resulting sequence of integers.
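The approach described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the package's own code: the helper names words_to_ids, edit_distance and approximate_wer are hypothetical, and dividing by the number of ground-truth words is the usual WER definition assumed here.

```python
def words_to_ids(truth_words, hypothesis_words):
    """Map every distinct word to a unique integer (hypothetical helper)."""
    vocabulary = {}
    for word in truth_words + hypothesis_words:
        vocabulary.setdefault(word, len(vocabulary))
    truth_ids = [vocabulary[w] for w in truth_words]
    hypothesis_ids = [vocabulary[w] for w in hypothesis_words]
    return truth_ids, hypothesis_ids


def edit_distance(source, target):
    """Wagner-Fischer dynamic programme, keeping one row of the table at a time."""
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        current = [i]
        for j, t in enumerate(target, start=1):
            substitution = previous[j - 1] + (s != t)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def approximate_wer(truth, hypothesis):
    """Word-level edit distance divided by the number of ground-truth words."""
    truth_ids, hypothesis_ids = words_to_ids(truth.split(), hypothesis.split())
    return edit_distance(truth_ids, hypothesis_ids) / len(truth_ids)


print(approximate_wer("hello world", "hello duck"))  # 0.5
```

One substitution over two ground-truth words gives 0.5. The sketch does not guard against an empty ground truth, which would raise a ZeroDivisionError.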

Installation

You should be able to install this package using pip:

$ pip install jiwer

Usage

The simplest use case is computing the WER between two strings:

from jiwer import wer

ground_truth = "hello world"
hypothesis = "hello duck"

error = wer(ground_truth, hypothesis)

You can also compute the WER over multiple sentences:

from jiwer import wer

ground_truth = ["hello world", "i like monty python"]
hypothesis = ["hello duck", "i like python"]

error = wer(ground_truth, hypothesis)

When the number of ground-truth sentences and hypothesis sentences differs, a minimum alignment is done over the merged sentence:

ground_truth = ["hello world", "i like monty python", "what do you mean, african or european swallow?"]
hypothesis = ["hello", "i like", "python", "what you mean swallow"]

# is equivalent to

ground_truth = "hello world i like monty python what do you mean african or european swallow"
hypothesis = "hello i like python what you mean swallow"
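A rough sketch of the merging step, assuming the sentences are simply joined with single spaces. The helper name merge_sentences is hypothetical, and punctuation handling is left out:

```python
def merge_sentences(sentences):
    """Join a list of sentences into one string (hypothetical helper)."""
    if isinstance(sentences, str):
        return sentences  # a single sentence is already merged
    return " ".join(sentences)


ground_truth = ["hello world", "i like monty python"]
hypothesis = ["hello", "i like", "python"]

merged_truth = merge_sentences(ground_truth)
merged_hypothesis = merge_sentences(hypothesis)

print(merged_truth)       # hello world i like monty python
print(merged_hypothesis)  # hello i like python
```

After merging, the two lists no longer need the same length, because the alignment runs over one word sequence per side.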

Additional preprocessing

Some additional preprocessing can be applied to the input. By default, whitespace is removed, everything is converted to lower case, the characters . and , are removed, everything between [] and <> (common for Kaldi models) is removed, and the text is tokenized into words by splitting on one or more spaces. Additionally, common contractions such as won't, let's and n't are expanded if standardize=True is passed to the wer method.

from jiwer import wer

ground_truth = "he's my nemesis"
hypothesis = "he is my <unk> [laughter]"

wer(ground_truth, hypothesis, standardize=True)

# is equivalent to

ground_truth = "he is my nemesis"
hypothesis = "he is my"

wer(ground_truth, hypothesis)
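The preprocessing steps can be approximated with a few regular expressions. The following is a minimal sketch, not the package's implementation: the function name preprocess and the CONTRACTIONS table are hypothetical, the table is only an illustrative subset (the "he's" entry is inferred from the example above), and applying tag removal regardless of standardize follows the prose description rather than any confirmed behaviour.

```python
import re

# Illustrative subset of contractions (assumption). Order matters:
# "won't" must be handled before the generic "n't" rule.
CONTRACTIONS = [
    (r"won't", "will not"),
    (r"let's", "let us"),
    (r"he's", "he is"),
    (r"n't", " not"),
]


def preprocess(text, standardize=False):
    """Return a list of cleaned-up words (hypothetical helper)."""
    text = text.lower()
    # Drop everything between [] and <>, e.g. [laughter] or <unk>.
    text = re.sub(r"\[[^\]]*\]|<[^>]*>", " ", text)
    # Drop full stops and commas.
    text = re.sub(r"[.,]", "", text)
    if standardize:
        for pattern, replacement in CONTRACTIONS:
            text = re.sub(pattern, replacement, text)
    # Splitting without arguments splits on runs of whitespace
    # and ignores leading and trailing whitespace.
    return text.split()


print(preprocess("he's my nemesis", standardize=True))
print(preprocess("he is my <unk> [laughter]", standardize=True))
```

The first call yields the words he, is, my, nemesis and the second yields he, is, my, which matches the equivalent inputs shown above.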

There is also an option to give a list of words to remove from the transcription, such as "yhe" or "so".

from jiwer import wer

ground_truth = "yhe about that bug"
hypothesis = "yeah about that bug"

wer(ground_truth, hypothesis, words_to_filter=["yhe", "yeah"])

# is equivalent to

ground_truth = "about that bug"
hypothesis = "about that bug"

wer(ground_truth, hypothesis)
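The filtering step amounts to dropping unwanted tokens before the alignment. A minimal sketch, assuming exact matches on whitespace-separated words (the helper name filter_words is hypothetical):

```python
def filter_words(words, words_to_filter=None):
    """Remove every word that appears in words_to_filter (hypothetical helper)."""
    unwanted = set(words_to_filter or [])
    return [word for word in words if word not in unwanted]


fillers = ["yhe", "yeah"]
truth = filter_words("yhe about that bug".split(), fillers)
hypothesis = filter_words("yeah about that bug".split(), fillers)

print(truth)                # ['about', 'that', 'bug']
print(truth == hypothesis)  # True
```

Because both filler spellings are removed, the two transcripts become identical and the resulting edit distance is zero.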

Project details

This version: 1.3.2 (Feb 24, 2019)
