Compute WER

compute-wer

A Python package for computing Word Error Rate (WER) and Sentence Error Rate (SER) for evaluating speech recognition systems.

Features

  • Compute WER and SER for speech recognition evaluation
  • Support for both word-level and character-level WER calculation
  • Detailed alignment visualization between reference and hypothesis texts
  • Support for case-sensitive and case-insensitive matching
  • Cluster-based error analysis (Chinese, English, Numbers, etc.)
  • Support for filtering results based on maximum WER threshold
  • Handle tagged text with option to remove tags
  • Support for ignoring punctuation in WER calculation

Installation

pip install compute-wer

Usage

Command Line Interface

Basic Usage

Compute WER between reference and hypothesis texts:

# Compare two texts directly
compute-wer "你好世界" "你好"

# Compare texts from files
compute-wer ref.txt hyp.txt wer.txt

File Format

Each line of an input file should have the format "utterance_id text": an utterance ID, a space, and then the transcript. For example:

ref.txt:

utt1 你好世界
utt2 欢迎使用 compute-wer

hyp.txt:

utt1 你好
utt2 欢迎使用 computer-wer
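
To illustrate the file format above, here is a minimal sketch of how such files could be read into a mapping from utterance ID to text. The helper names `parse_line` and `load_transcripts` are hypothetical and are not part of the compute-wer API; the package's own parser may treat edge cases (empty lines, duplicate IDs) differently.

```python
from typing import Dict, Iterable, Tuple


def parse_line(line: str) -> Tuple[str, str]:
    """Split one 'utterance_id text' line into (utterance_id, text)."""
    parts = line.strip().split(maxsplit=1)
    if not parts:
        raise ValueError("empty line")
    utt_id = parts[0]
    # An utterance with an ID but no transcript maps to an empty string.
    text = parts[1] if len(parts) > 1 else ""
    return utt_id, text


def load_transcripts(lines: Iterable[str]) -> Dict[str, str]:
    """Build {utterance_id: text}, skipping blank lines (an assumption)."""
    table: Dict[str, str] = {}
    for line in lines:
        if line.strip():
            utt_id, text = parse_line(line)
            table[utt_id] = text
    return table


refs = load_transcripts(["utt1 你好世界\n", "utt2 欢迎使用 compute-wer\n"])
hyps = load_transcripts(["utt1 你好\n", "utt2 欢迎使用 computer-wer\n"])
# Utterances present in both files can then be compared pairwise.
shared_ids = sorted(refs.keys() & hyps.keys())
```

Only the first run of whitespace separates the ID from the text, so transcripts that contain spaces are kept intact.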

Advanced Options

# Character-level WER
compute-wer --char ref.txt hyp.txt

# Case-sensitive matching
compute-wer --case-sensitive ref.txt hyp.txt

# Sort results by utterance-id or WER
compute-wer --sort utt ref.txt hyp.txt
compute-wer --sort wer ref.txt hyp.txt

# Remove tags from text
compute-wer --remove-tag ref.txt hyp.txt

# Filter results with WER <= 50%
compute-wer --max-wer 0.5 ref.txt hyp.txt

# Ignore specific words from a file
compute-wer --ignore-file ignore_words.txt ref.txt hyp.txt

# Ignore punctuation (except single quotes)
compute-wer --ignore-punctuation ref.txt hyp.txt
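
As a rough illustration of what the case and punctuation options imply, the sketch below normalizes a transcript before scoring. It is an assumption about the general technique, not the package's actual implementation: the function names are hypothetical, punctuation is detected through Unicode general categories, and case folding is done with `str.lower()`.

```python
import unicodedata


def strip_punctuation(text: str, keep: str = "'") -> str:
    """Drop Unicode punctuation characters, except those listed in `keep`."""
    kept = [
        ch for ch in text
        if ch in keep or not unicodedata.category(ch).startswith("P")
    ]
    # Collapse any whitespace runs left behind by removed characters.
    return " ".join("".join(kept).split())


def normalize(text: str, case_sensitive: bool = False,
              ignore_punctuation: bool = True) -> str:
    """Apply the (assumed) preprocessing steps before computing WER."""
    if ignore_punctuation:
        text = strip_punctuation(text)
    if not case_sensitive:
        text = text.lower()
    return text


print(normalize("Don't stop, now!"))
print(normalize("你好，世界！"))
```

Single quotes are preserved so that contractions such as "Don't" remain one token, which matches the "except single quotes" note above.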

Python API

from compute_wer import Calculator

# Initialize calculator
calculator = Calculator(
    to_char=False,            # Set to True for character-level WER
    case_sensitive=False,     # Set to True for case-sensitive matching
    remove_tag=True,          # Remove tags from text
    ignore_punctuation=True,  # Ignore punctuation (except single quotes)
    max_wer=float('inf')      # Maximum WER threshold
)

# Calculate WER
wer = calculator.calculate("你好世界", "你好")
print(f"WER: {wer}")
print(f"Reference : {' '.join(wer.reference)}")
print(f"Hypothesis: {' '.join(wer.hypothesis)}")

# Get overall statistics
overall_wer, cluster_wers = calculator.overall()
print(f"Overall WER: {overall_wer}")
for cluster, cluster_wer in cluster_wers.items():
    print(f"{cluster} WER: {cluster_wer}")

CLI Options

Option                     Description
--char, -c                 Use character-level WER instead of word-level WER
--sort, -s                 Sort the hypotheses by utterance-id or WER in ascending order
--case-sensitive, -cs      Use case-sensitive matching
--remove-tag, -rt          Remove tags from the reference and hypothesis
--ignore-punctuation, -ip  Ignore punctuation (except single quotes)
--ignore-file, -ig         Path to the ignore file
--max-wer, -mw             Filter hypotheses with WER <= this value
--verbose, -v              Print verbose output

Output Format

The output includes detailed alignment information:

utt: utt1
WER: 50.00 % N=4 Cor=2 Sub=0 Del=2 Ins=0
ref: 你 好 世 界
hyp: 你 好

===========================================================================
Overall -> 50.00 % N=4 Cor=2 Sub=0 Del=2 Ins=0
Chinese -> 50.00 % N=4 Cor=2 Sub=0 Del=2 Ins=0
SER -> 100.00 % N=1 Cor=0 Err=1 ML=1 MH=0
===========================================================================

Where:

  • N: Total number of reference words/characters
  • Cor: Correct matches
  • Sub: Substitutions
  • Del: Deletions
  • Ins: Insertions
  • SER: Sentence Error Rate
  • ML: Missing Labels (Extra Hypotheses)
  • MH: Missing Hypotheses (Extra Labels)
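
The counts above relate to the reported percentage through the standard definition WER = (Sub + Del + Ins) / N. The sketch below shows one conventional way to obtain these counts, using a Levenshtein alignment with backtracking. It is an illustration under that assumption, not the package's own code; `Counts` and `align_counts` are hypothetical names, and when several alignments have the same cost a different implementation may split the errors between Sub, Del, and Ins differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Counts:
    n: int      # number of reference tokens
    cor: int    # correct matches
    sub: int    # substitutions
    dele: int   # deletions
    ins: int    # insertions

    @property
    def wer(self) -> float:
        return (self.sub + self.dele + self.ins) / self.n if self.n else 0.0


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Counts:
    rows, cols = len(ref) + 1, len(hyp) + 1
    # cost[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    cost = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        cost[i][0] = i
    for j in range(cols):
        cost[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the bottom-right corner, classifying each step.
    i, j = len(ref), len(hyp)
    cor = sub = dele = ins = 0
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            if ref[i - 1] == hyp[j - 1]:
                cor += 1
            else:
                sub += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dele += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return Counts(len(ref), cor, sub, dele, ins)


counts = align_counts(list("你好世界"), list("你好"))
print(f"WER: {counts.wer:.2%} N={counts.n} Cor={counts.cor} "
      f"Sub={counts.sub} Del={counts.dele} Ins={counts.ins}")
```

For the example utterance, two of the four reference characters are deleted, so the rate is 2 / 4 = 50 %, in agreement with the sample output. SER is computed analogously at the sentence level, as the fraction of utterances that contain at least one error.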

License

MIT License
