
Compute WER


compute-wer


A Python package for computing Word Error Rate (WER) and Sentence Error Rate (SER) for evaluating speech recognition systems.

Features

  • Compute WER and SER for speech recognition evaluation
  • Support for both word-level and character-level WER calculation
  • Detailed alignment visualization between reference and hypothesis texts
  • Support for case-sensitive and case-insensitive matching
  • Cluster-based error analysis (Chinese, English, Numbers, etc.)
  • Support for filtering results by a maximum WER threshold
  • Support for tagged text, with an option to remove tags
  • Support for ignoring punctuation (except single quotes) in WER calculation
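WER is conventionally computed from a minimum edit-distance (Levenshtein) alignment between the reference and hypothesis token sequences, with WER = (Sub + Del + Ins) / N. The sketch below is a plain-Python illustration of that standard technique, not compute-wer's own implementation; `align_counts` and `word_error_rate` are hypothetical names, and ties between equally cheap alignments may be broken differently from the package.

```python
def align_counts(ref, hyp):
    """Count (cor, sub, dele, ins) on a minimum-edit-distance alignment of two token lists."""
    # cell[i][j] holds (cost, cor, sub, dele, ins) for ref[:i] aligned with hyp[:j]
    cell = [[(0, 0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        cell[i][0] = (i, 0, 0, i, 0)  # delete every reference token
    for j in range(1, len(hyp) + 1):
        cell[0][j] = (j, 0, 0, 0, j)  # insert every hypothesis token
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost, cor, sub, dele, ins = cell[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (cost, cor + 1, sub, dele, ins)      # correct match
            else:
                diag = (cost + 1, cor, sub + 1, dele, ins)  # substitution
            cost, cor, sub, dele, ins = cell[i - 1][j]
            up = (cost + 1, cor, sub, dele + 1, ins)        # deletion
            cost, cor, sub, dele, ins = cell[i][j - 1]
            left = (cost + 1, cor, sub, dele, ins + 1)      # insertion
            # Keep the cheapest option; on ties, min() returns the first one listed.
            cell[i][j] = min(diag, up, left, key=lambda t: t[0])
    return cell[len(ref)][len(hyp)][1:]


def word_error_rate(ref, hyp):
    """WER = (Sub + Del + Ins) / N with N = len(ref); assumes a non-empty reference."""
    _, sub, dele, ins = align_counts(ref, hyp)
    return (sub + dele + ins) / len(ref)
```

For example, `align_counts(list("你好世界"), list("你好"))` gives two correct tokens and two deletions, so the WER is 0.5. Because insertions add to the numerator but not to N, WER can exceed 100%.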

Installation

pip install compute-wer

Usage

Command Line Interface

Basic Usage

Compute WER between reference and hypothesis texts:

# Compare two texts directly
compute-wer "你好世界" "你好"

# Compare texts from files
compute-wer ref.txt hyp.txt wer.txt

File Format

Each line of an input file should contain an utterance ID followed by its text, separated by a space (utterance_id text). For example:

ref.txt:

utt1 你好世界
utt2 欢迎使用 compute-wer

hyp.txt:

utt1 你好
utt2 欢迎使用 computer-wer
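A minimal sketch of reading this format into a dictionary, assuming UTF-8 files and splitting each line at the first run of whitespace so that the text itself may contain spaces (as in utt2). `read_transcripts` and `common_ids` are hypothetical helpers, not part of compute-wer.

```python
def read_transcripts(lines):
    """Map utterance ID -> text for lines of the form 'utterance_id text'."""
    transcripts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split at the first whitespace run only; everything after it is the text.
        parts = line.split(maxsplit=1)
        transcripts[parts[0]] = parts[1] if len(parts) == 2 else ""
    return transcripts


def common_ids(refs, hyps):
    """Utterance IDs present in both the reference and the hypothesis, sorted."""
    return sorted(refs.keys() & hyps.keys())


# Usage with real files:
# with open("ref.txt", encoding="utf-8") as f:
#     refs = read_transcripts(f)
```

Only IDs found in both files can be aligned; IDs present in just one file are what the ML and MH counters in the output summarize.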

Advanced Options

# Character-level WER
compute-wer --char ref.txt hyp.txt

# Case-sensitive matching
compute-wer --case-sensitive ref.txt hyp.txt

# Sort results by utterance ID or by WER
compute-wer --sort utt ref.txt hyp.txt
compute-wer --sort wer ref.txt hyp.txt

# Remove tags from text
compute-wer --remove-tag ref.txt hyp.txt

# Keep only results with WER <= 50%
compute-wer --max-wer 0.5 ref.txt hyp.txt

# Ignore specific words from a file
compute-wer --ignore-file ignore_words.txt ref.txt hyp.txt

# Ignore punctuation (except single quotes)
compute-wer --ignore-punctuation ref.txt hyp.txt
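The normalization options above (--char, --case-sensitive, --remove-tag, --ignore-punctuation) can be thought of as a tokenization step applied to both texts before alignment. This is a hypothetical sketch, not compute-wer's code: the function name `tokenize`, the angle-bracket tag syntax, and the CJK character range are assumptions. Splitting Chinese per character at word level is inferred from the sample output, where 你好世界 is aligned as four tokens without --char.

```python
import re
import unicodedata

# Hypothetical tag syntax for illustration: anything in angle brackets, e.g. "<noise>".
TAG_RE = re.compile(r"<[^<>]*>")
# Basic CJK Unified Ideographs block only (an assumption; real coverage may be wider).
CJK_RE = re.compile(r"([\u4e00-\u9fff])")


def tokenize(text, to_char=False, case_sensitive=False,
             remove_tag=False, ignore_punctuation=False):
    """Normalize a transcript and split it into tokens (a sketch, not the package's code)."""
    if remove_tag:
        text = TAG_RE.sub(" ", text)
    if ignore_punctuation:
        # Replace every Unicode punctuation character except the single quote with a space.
        text = "".join(
            " " if ch != "'" and unicodedata.category(ch).startswith("P") else ch
            for ch in text
        )
    if not case_sensitive:
        text = text.lower()
    if to_char:
        # Character level: every non-whitespace character is its own token.
        return [ch for ch in text if not ch.isspace()]
    # Word level: split on whitespace, but treat each Chinese character as a word.
    return CJK_RE.sub(r" \1 ", text).split()
```

Keeping the single quote means contractions such as "don't" stay one token. Replacing punctuation with a space rather than deleting it is a design choice of this sketch: it splits "compute-wer" into two words instead of merging it into "computewer".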

Python API

from compute_wer import Calculator

# Initialize calculator
calculator = Calculator(
    to_char=False,            # True: character-level WER; False: word-level WER
    case_sensitive=False,     # True: case-sensitive matching
    remove_tag=True,          # Remove tags from text
    ignore_punctuation=True,  # Ignore punctuation (except single quotes)
    max_wer=float('inf')      # Maximum WER threshold (inf keeps every result)
)

# Calculate WER
wer = calculator.calculate("你好世界", "你好")
print(f"WER: {wer}")
print(f"Reference : {' '.join(wer.reference)}")
print(f"Hypothesis: {' '.join(wer.hypothesis)}")

# Get overall statistics
overall_wer, cluster_wers = calculator.overall()
print(f"Overall WER: {overall_wer}")
for cluster, cluster_wer in cluster_wers.items():
    print(f"{cluster} WER: {cluster_wer}")
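The cluster_wers dictionary breaks the errors down by token type (the sample output reports a Chinese cluster). The rules compute-wer uses to assign clusters are not documented here, so the following is a hypothetical sketch: `cluster_of`, `cluster_sizes`, their regular expressions, and the labels English, Number, and Other are all assumptions. Counting reference tokens per cluster gives what would be each cluster's N.

```python
import re
from collections import Counter


def cluster_of(token):
    """Assign a token to an error-analysis cluster (hypothetical rules)."""
    if re.fullmatch(r"[\u4e00-\u9fff]+", token):
        return "Chinese"
    if re.fullmatch(r"[A-Za-z']+", token):
        return "English"
    if re.fullmatch(r"\d+(\.\d+)?", token):
        return "Number"
    return "Other"  # mixed or unrecognized tokens, e.g. "compute-wer"


def cluster_sizes(ref_tokens):
    """Number of reference tokens that fall into each cluster."""
    return Counter(cluster_of(token) for token in ref_tokens)
```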

CLI Options

Option                        Description
--char, -c                    Use character-level WER instead of word-level WER
--sort, -s                    Sort results by utterance ID (utt) or by WER (wer) in ascending order
--case-sensitive, -cs         Use case-sensitive matching
--remove-tag, -rt             Remove tags from the reference and hypothesis
--ignore-punctuation, -ip     Ignore punctuation (except single quotes)
--ignore-file, -ig            Path to a file listing words to ignore
--max-wer, -mw                Keep only hypotheses with WER <= this value
--verbose, -v                 Print verbose output
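The --sort and --max-wer options act on per-utterance results. Below is a hypothetical sketch of that post-processing over (utterance_id, wer) pairs; `select_results` is an illustrative name, and whether compute-wer also excludes filtered utterances from the overall statistics is not stated here.

```python
def select_results(results, sort=None, max_wer=float("inf")):
    """Filter (utt_id, wer) pairs by max_wer, then optionally sort them ascending."""
    kept = [r for r in results if r[1] <= max_wer]
    if sort == "utt":
        kept.sort(key=lambda r: r[0])  # lexicographic: "utt10" sorts before "utt2"
    elif sort == "wer":
        kept.sort(key=lambda r: r[1])  # lowest WER first
    return kept
```

Usage: `select_results([("utt2", 0.8), ("utt1", 0.5), ("utt3", 0.0)], sort="wer", max_wer=0.5)` keeps utt3 and utt1, in that order.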

Output Format

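For each utterance, the output shows its WER breakdown and the alignment between reference and hypothesis, followed by overall, per-cluster, and sentence-level statistics: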

utt: utt1
WER: 50.00 % N=4 Cor=2 Sub=0 Del=2 Ins=0
ref: 你 好 世 界
hyp: 你 好

===========================================================================
Overall -> 50.00 % N=4 Cor=2 Sub=0 Del=2 Ins=0
Chinese -> 50.00 % N=4 Cor=2 Sub=0 Del=2 Ins=0
SER -> 100.00 % N=1 Cor=0 Err=1 ML=1 MH=0
===========================================================================

Where:

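  • N: Total number of reference words/characters (in the SER line, the number of sentences)
  • Cor: Correct matches (in the SER line, correctly recognized sentences)
  • Sub: Substitutions
  • Del: Deletions
  • Ins: Insertions
  • SER: Sentence Error Rate
  • Err: Sentences containing at least one error
  • ML: Missing Labels (Extra Hypotheses)
  • MH: Missing Hypotheses (Extra Labels)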
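Every reference token is either correct, substituted, or deleted, so N = Cor + Sub + Del, and the WER figure is 100 × (Sub + Del + Ins) / N; SER is 100 × Err / N over sentences. The hypothetical helpers below reproduce the summary lines of the sample output from their counts. The names `summary_line` and `ser_line` are illustrative, and ML/MH are only passed through, since how compute-wer folds them into SER is not documented here.

```python
def summary_line(cor, sub, dele, ins):
    """Format word-level counts like the WER lines in the sample output."""
    n = cor + sub + dele  # each reference token is correct, substituted, or deleted
    wer = 100.0 * (sub + dele + ins) / n
    return f"{wer:.2f} % N={n} Cor={cor} Sub={sub} Del={dele} Ins={ins}"


def ser_line(n_sents, err_sents, ml=0, mh=0):
    """Format sentence-level counts like the SER line in the sample output."""
    ser = 100.0 * err_sents / n_sents
    return f"{ser:.2f} % N={n_sents} Cor={n_sents - err_sents} Err={err_sents} ML={ml} MH={mh}"
```

With the counts from the example, `summary_line(2, 0, 2, 0)` yields the "50.00 % N=4 Cor=2 Sub=0 Del=2 Ins=0" line, and `summary_line(1, 0, 0, 2)` shows a WER of 200.00 %, since insertions can push WER above 100%.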

License

MIT License

Download files


Source Distribution

compute_wer-0.2.4.tar.gz (10.1 kB)

Uploaded Source

Built Distribution


compute_wer-0.2.4-py3-none-any.whl (11.9 kB)

Uploaded Python 3

File details

Details for the file compute_wer-0.2.4.tar.gz.

File metadata

  • Download URL: compute_wer-0.2.4.tar.gz
  • Size: 10.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for compute_wer-0.2.4.tar.gz
Algorithm Hash digest
SHA256 5680c50a91782c0ddf9da9d952957f30b54ee3004bf5945b2d11ed8bbf8023e8
MD5 e2757a1fedfa3c7bb274bb70a619345a
BLAKE2b-256 283efe76f6f72e9dc44a98aece775f5c79a082c22c141cc3da02eca4a6c6399d


File details

Details for the file compute_wer-0.2.4-py3-none-any.whl.

File metadata

  • Download URL: compute_wer-0.2.4-py3-none-any.whl
  • Size: 11.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for compute_wer-0.2.4-py3-none-any.whl
Algorithm Hash digest
SHA256 ff697622b49db906bdb361a9fbb879494e6032c9b92fb4104266e6c5ba3ddc93
MD5 4721a858cddb1fb0872fb28f16351143
BLAKE2b-256 c0c3387f355d1332b0274aaebd93f0ba268259fe8656b8c3d891d0a563fc6327

