Compute WER
compute-wer
A Python package for computing Word Error Rate (WER) and Sentence Error Rate (SER) for evaluating speech recognition systems.
Features
- Compute WER and SER for speech recognition evaluation
- Support for both word-level and character-level WER calculation
- Detailed alignment visualization between reference and hypothesis texts
- Support for case-sensitive and case-insensitive matching
- Cluster-based error analysis (Chinese, English, Numbers, etc.)
- Support for filtering results based on maximum WER threshold
- Handle tagged text with option to remove tags
- Support for ignoring punctuation in WER calculation
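The difference between word-level and character-level scoring comes down to how each text is split into tokens before alignment. The following is a minimal sketch of that idea, not the package's own tokenizer: `tokenize` and `_TOKEN` are hypothetical names, and the rule that each Chinese character becomes its own token in word mode is an assumption inferred from the sample output in this README.

```python
import re

# Assumption: one token per CJK ideograph (basic block only),
# otherwise one token per run of non-space, non-CJK characters.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[^\s\u4e00-\u9fff]+")


def tokenize(text, to_char=False):
    """Split text into tokens (hypothetical helper, not part of compute-wer)."""
    if to_char:
        # Character level: every non-space character is a token.
        return [ch for ch in text if not ch.isspace()]
    # Word level: Latin words stay whole, Chinese characters are split.
    return _TOKEN.findall(text)


word_tokens = tokenize("欢迎使用 compute-wer")
char_tokens = tokenize("欢迎使用 compute-wer", to_char=True)
print(word_tokens)  # ['欢', '迎', '使', '用', 'compute-wer']
print(len(char_tokens))  # 15
```

With character-level tokens, a single wrong letter in a long English word counts as one error out of many, whereas at word level the whole word counts as one substitution.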
Installation
pip install compute-wer
Usage
Command Line Interface
Basic Usage
Compute WER between reference and hypothesis texts:
# Compare two texts directly
compute-wer "你好世界" "你好"
# Compare texts from files (the third argument, wer.txt, is presumably the output report)
compute-wer ref.txt hyp.txt wer.txt
File Format
The input files should contain lines in the format utterance_id text. For example:
ref.txt:
utt1 你好世界
utt2 欢迎使用 compute-wer
hyp.txt:
utt1 你好
utt2 欢迎使用 computer-wer
Advanced Options
# Character-level WER
compute-wer --char ref.txt hyp.txt
# Case-sensitive matching
compute-wer --case-sensitive ref.txt hyp.txt
# Sort results by utterance-id or WER
compute-wer --sort utt ref.txt hyp.txt
compute-wer --sort wer ref.txt hyp.txt
# Remove tags from text
compute-wer --remove-tag ref.txt hyp.txt
# Filter results with WER <= 50%
compute-wer --max-wer 0.5 ref.txt hyp.txt
# Ignore specific words from a file
compute-wer --ignore-file ignore_words.txt ref.txt hyp.txt
# Ignore punctuation (except single quotes)
compute-wer --ignore-punctuation ref.txt hyp.txt
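The case and punctuation options amount to normalizing both texts before they are compared. The sketch below illustrates one way to do that; `normalize` is a hypothetical function, and lowercasing plus Unicode-category filtering are assumptions rather than a description of what compute-wer does internally.

```python
import unicodedata


def normalize(text, case_sensitive=False, ignore_punctuation=True):
    """Normalize text before scoring (hypothetical helper)."""
    if not case_sensitive:
        text = text.lower()
    if ignore_punctuation:
        # Drop every character in a Unicode punctuation category ("P*"),
        # but keep single quotes so contractions such as "don't" survive.
        text = "".join(
            ch
            for ch in text
            if ch == "'" or not unicodedata.category(ch).startswith("P")
        )
    return text


print(normalize("Hello, world! Don't stop."))  # hello world don't stop
print(normalize("Hello, world!", case_sensitive=True, ignore_punctuation=False))
```

Note that a category-based filter also removes hyphens, so "compute-wer" would become "computewer" under this sketch; a real implementation may treat such characters differently.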
Python API
from compute_wer import Calculator
# Initialize calculator
calculator = Calculator(
    to_char=False,            # Character-level WER
    case_sensitive=False,     # Case-sensitive matching
    remove_tag=True,          # Remove tags from text
    ignore_punctuation=True,  # Ignore punctuation (except single quotes)
    max_wer=float('inf'),     # Maximum WER threshold
)
# Calculate WER
wer = calculator.calculate("你好世界", "你好")
print(f"WER: {wer}")
print(f"Reference : {' '.join(wer.reference)}")
print(f"Hypothesis: {' '.join(wer.hypothesis)}")
# Get overall statistics
overall_wer, cluster_wers = calculator.overall()
print(f"Overall WER: {overall_wer}")
for cluster, cluster_wer in cluster_wers.items():
    print(f"{cluster} WER: {cluster_wer}")
CLI Options
| Option | Description |
|---|---|
| --char, -c | Use character-level WER instead of word-level WER |
| --sort, -s | Sort the hypotheses by utterance-id or WER in ascending order |
| --case-sensitive, -cs | Use case-sensitive matching |
| --remove-tag, -rt | Remove tags from the reference and hypothesis |
| --ignore-punctuation, -ip | Ignore punctuation (except single quotes) |
| --ignore-file, -ig | Path to the ignore file |
| --max-wer, -mw | Filter hypotheses with WER <= this value |
| --verbose, -v | Print verbose output |
Output Format
The output includes detailed alignment information:
utt: utt1
WER: 50.00 % N=4 Cor=2 Sub=0 Del=2 Ins=0
ref: 你 好 世 界
hyp: 你 好
===========================================================================
Overall -> 50.00 % N=4 Cor=2 Sub=0 Del=2 Ins=0
Chinese -> 50.00 % N=4 Cor=2 Sub=0 Del=2 Ins=0
SER -> 100.00 % N=1 Cor=0 Err=1 ML=1 MH=0
===========================================================================
Where:
- N: Total number of reference words/characters
- Cor: Correct matches
- Sub: Substitutions
- Del: Deletions
- Ins: Insertions
- SER: Sentence Error Rate
- ML: Missing Labels (Extra Hypotheses)
- MH: Missing Hypotheses (Extra Labels)
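These counters relate to each other through the standard definition WER = (Sub + Del + Ins) / N, where the counts come from a minimum-edit-distance alignment of reference and hypothesis tokens. The sketch below reproduces the numbers of the sample output with a plain Levenshtein alignment; `align_counts` is a hypothetical function written for illustration, and the package's own alignment may break ties differently.

```python
def align_counts(ref, hyp):
    """Return (cor, sub, dele, ins) from a minimum-edit-distance alignment."""
    n, m = len(ref), len(hyp)
    # cost[i][j] is the edit distance between ref[:i] and hyp[:j].
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the end, classifying each step of the alignment.
    cor = sub = dele = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                cor += 1
            else:
                sub += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dele += 1  # reference token with no counterpart in the hypothesis
            i -= 1
        else:
            ins += 1  # hypothesis token with no counterpart in the reference
            j -= 1
    return cor, sub, dele, ins


ref_tokens, hyp_tokens = list("你好世界"), list("你好")
cor, sub, dele, ins = align_counts(ref_tokens, hyp_tokens)
wer = (sub + dele + ins) / len(ref_tokens)
print(f"WER: {wer:.2%} N={len(ref_tokens)} Cor={cor} Sub={sub} Del={dele} Ins={ins}")

# SER: share of utterances that contain at least one error.
sentence_errors = [sub + dele + ins > 0]
ser = sum(sentence_errors) / len(sentence_errors)
print(f"SER: {ser:.2%}")
```

Because insertions are counted against the reference length N, WER can exceed 100 % when the hypothesis is much longer than the reference.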
License