jiwer-compatible WER normalizer with number, email, URL, filler, and symbol normalization for voice AI evaluation in English, German, and French

extended-wer-normalizer

jiwer-compatible text normalizer for Word Error Rate (WER) evaluation in voice AI.

Extends jiwer's built-in transforms with normalizations that matter for real-world ASR evaluation: phone numbers, emails, URLs, currency, percentages, ordinals, filler words, and stuttering.

Installation

pip install extended-wer-normalizer

Quick start

from extended_wer_normalizer import normalize_for_wer

normalize_for_wer("Call 0176 or email info@example.com, it costs $5.99")
# → "call 0 1 7 6 or email info at example dot com it costs five dollars ninety nine cents"

normalize_for_wer("Um, 1st place goes to Dr. Smith with 50% accuracy")
# → "first place goes to doctor smith with fifty percent accuracy"

jiwer integration

Every normalization is a jiwer.AbstractTransform subclass, so the transforms compose freely with jiwer's built-in ones:

import jiwer
from extended_wer_normalizer.transforms import NormalizeEmails, ExpandDigitRuns

pipeline = jiwer.Compose([
    NormalizeEmails(),
    ExpandDigitRuns(),
    jiwer.ToLowerCase(),
    jiwer.RemovePunctuation(),
    jiwer.ReduceToListOfListOfWords(),
])

wer = jiwer.wer("info at example dot com", "info@example.com", hypothesis_transform=pipeline)

Use the pre-built pipeline directly with jiwer.wer:

import jiwer
from extended_wer_normalizer import english_wer_pipeline

# reference and hypothesis are your own transcripts (a string or a list of strings)
wer = jiwer.wer(reference, hypothesis, reference_transform=english_wer_pipeline, hypothesis_transform=english_wer_pipeline)
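
To make concrete why normalization changes the score, here is a minimal standard-library sketch of the word-level edit distance that WER is based on. It is not jiwer's implementation, and the function name word_error_rate is an assumption made for this illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Without normalization the written email counts as one wrong word plus four missing ones.
raw = word_error_rate("info at example dot com", "info@example.com")
# Once the hypothesis is spelled out the same way, the transcripts agree.
normalized = word_error_rate("info at example dot com", "info at example dot com")
print(raw, normalized)
```

The same transcript pair scores as fully wrong or fully right depending only on how the email address is written, which is the gap the transforms in this package are meant to close.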

Available transforms

Transform	Example
`ExpandDigitRuns`	`"0176"` → `"0 1 7 6"`
`DigitWordsToChars`	`"zero one seven"` → `"0 1 7"`
`NormalizeEmails`	`"user@example.com"` → `"user at example dot com"`
`NormalizeURLs`	`"https://example.com/path"` → `"example dot com"`
`NormalizeCurrency`	`"$5.99"` → `"five dollars ninety nine cents"`
`NormalizePercentages`	`"50%"` → `"fifty percent"`
`NormalizeOrdinals`	`"1st"` → `"first"`, `"15th"` → `"fifteenth"`
`ExpandAbbreviations`	`"Dr."` → `"doctor"`, `"vs."` → `"versus"`
`NormalizeSymbols`	`"cats & dogs"` → `"cats and dogs"`
`RemoveFillerWords`	removes `um`, `uh`, `hmm`, `er`, `ah`, …
`CollapseRepetitions`	`"I I I think"` → `"I think"`
`ExpandFrenchElisions`	`"j'aime"` → `"j aime"`, `"qu'il"` → `"qu il"` (French only)

Every transform that consumes language-specific data accepts a language keyword argument, which defaults to "en": NormalizeEmails(language="fr"), ExpandAbbreviations(language="de"), etc.
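
As a rough illustration of what two of these rewrites do to a string, the sketch below reproduces the table examples for email addresses and digit runs with plain regular expressions. The function names and patterns are assumptions for this example only; the package's own transforms may match differently.

```python
import re

# Simplified patterns chosen for illustration, not the package's own.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
DIGIT_RUN_RE = re.compile(r"\d+")


def spell_out_email(text: str) -> str:
    """Rewrite each email address the way it would be read aloud."""
    def _speak(match: re.Match) -> str:
        return match.group(0).replace("@", " at ").replace(".", " dot ")
    return EMAIL_RE.sub(_speak, text)


def expand_digit_runs(text: str) -> str:
    """Separate every run of digits into single digits."""
    return DIGIT_RUN_RE.sub(lambda m: " ".join(m.group(0)), text)


print(spell_out_email("user@example.com"))
print(expand_digit_runs("Call 0176"))
```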

Pipeline design

The English pipeline applies transforms left-to-right in a single pass:

1. Pattern-specific (before punctuation is stripped): email, URL, symbol, abbreviation, currency, percentage, ordinal
2. Core: contractions (I'm → i am), lowercase, punctuation removal
3. Digit normalization: expand digit runs (0176 → 0 1 7 6), convert digit words (zero → 0)
4. Cleanup: filler words, repetition collapse
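
The ordering matters because pattern-specific steps depend on characters that punctuation removal deletes. The sketch below shows this with two stand-in steps written for this example (spell_out_symbols and strip_punctuation are hypothetical helpers, not the package's transforms).

```python
import string


def spell_out_symbols(text: str) -> str:
    # Stand-in for a pattern-specific step: it needs "&" and "%" to still be present.
    return text.replace("&", " and ").replace("%", " percent")


def strip_punctuation(text: str) -> str:
    # Deletes every ASCII punctuation character, including "&" and "%".
    return text.translate(str.maketrans("", "", string.punctuation))


def run(steps, text: str) -> str:
    """Apply the steps left to right, then collapse leftover whitespace."""
    for step in steps:
        text = step(text)
    return " ".join(text.split())


sample = "Cats & dogs: 50% off!"
pattern_first = run([spell_out_symbols, str.lower, strip_punctuation], sample)
punctuation_first = run([str.lower, strip_punctuation, spell_out_symbols], sample)
print(pattern_first)      # cats and dogs 50 percent off
print(punctuation_first)  # cats dogs 50 off
```

With punctuation stripped first, the symbols are gone before the symbol step can see them, so the spoken words "and" and "percent" never appear.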

Supported languages

Full pipelines ship for English, German, and French, each with language-specific abbreviations, fillers, lexicons, and number/ordinal/percentage word forms via num2words. Any other language value selects the minimal fallback: lowercasing, punctuation removal, and whitespace normalization.

from extended_wer_normalizer import normalize_for_wer

# German: titles, ordinals, fillers, percentages
normalize_for_wer("Hr. Müller, am 1. Januar, ähm, ungefähr 50% Rabatt", language="de")
# → "herr müller am erste januar ungefähr fünfzig prozent rabatt"

# French: elision contractions, ordinals, comma-decimal currency
normalize_for_wer("M. Dupont, le 1er janvier, c'est €5,99", language="fr")
# → "monsieur dupont le premier janvier c est cinq euros quatre vingt dix neuf centimes"

# Spanish, Italian, … fall through to the minimal pipeline
normalize_for_wer("¡Hola, mundo!", language="es")
# → "hola mundo"
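
For intuition, a minimal fallback of this kind can be sketched with the standard library alone. The helper below is an assumption about what "lowercase + punctuation + whitespace" amounts to, not the package's code; in particular, treating every Unicode punctuation character as removable is a choice made for this example.

```python
import unicodedata


def minimal_normalize(text: str) -> str:
    """Lowercase, drop Unicode punctuation, and collapse whitespace."""
    lowered = text.lower()
    # Unicode general categories starting with "P" cover punctuation such as "¡", ",", "!".
    without_punct = "".join(
        ch for ch in lowered if not unicodedata.category(ch).startswith("P")
    )
    return " ".join(without_punct.split())


print(minimal_normalize("¡Hola, mundo!"))
```

Note that this sketch also deletes apostrophes, so it would glue elided forms together; that is one reason a language such as French benefits from a dedicated pipeline.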

Per-language pipelines are also exposed for direct use with jiwer.wer:

from extended_wer_normalizer import (
    english_wer_pipeline,
    german_wer_pipeline,
    french_wer_pipeline,
)

To inspect or extend the language data:

from extended_wer_normalizer.languages import get_language_data, supported_languages

supported_languages()              # ["de", "en", "fr"]
get_language_data("de").abbreviations["hr."]  # "herr"

Quirks worth knowing

Comma vs. period decimals: French writes decimals with a comma (€5,99, 3,5%); the currency and percentage transforms accept either separator regardless of language.
German ordinals: matched as 1- to 3-digit numbers followed by . and a word (e.g. "1. Januar" but not "Es war 1990." or "1.5 Liter"). Numbers with four or more digits are skipped to avoid false positives on years, and decimals are skipped because they are not ordinals.
French ordinals: matched as 1er, 1ère, 2e, 2es, 2ème, 2èmes, 2nde, 2nds, 2nd. num2words returns masculine forms (premier, deuxième); feminine variants like première or seconde are not produced.
Contractions: jiwer.ExpandCommonEnglishContractions runs only for English. French has a custom ExpandFrenchElisions that splits j', l', d', n', s', m', t', c', qu', jusqu', lorsqu', puisqu', quoiqu' from the following word. German has no contraction step.
German pluralization: most currency units stay invariant (fünf Euro, not fünf Euros).
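
Two of the matching rules above can be approximated with regular expressions. The sketch below is illustrative only: the patterns and function names are assumptions written for this example, and the package's actual patterns may differ in edge cases.

```python
import re

# German ordinals: 1 to 3 digits, a period, then whitespace and a letter.
# The lookbehind rejects digits that belong to a longer number or a decimal.
GERMAN_ORDINAL_RE = re.compile(r"(?<![\d.,])(\d{1,3})\.(?=\s+[^\W\d_])")

# French elisions: one of the listed prefixes, an apostrophe, then a word character.
FRENCH_ELISION_RE = re.compile(
    r"\b(jusqu|lorsqu|puisqu|quoiqu|qu|[jldnsmtc])['’](?=\w)",
    re.IGNORECASE,
)


def find_german_ordinals(text: str) -> list[int]:
    """Return the numbers that look like German ordinals."""
    return [int(m.group(1)) for m in GERMAN_ORDINAL_RE.finditer(text)]


def split_french_elisions(text: str) -> str:
    """Replace the apostrophe after an elided prefix with a space."""
    return FRENCH_ELISION_RE.sub(r"\1 ", text)


print(find_german_ordinals("am 1. Januar"))
print(find_german_ordinals("Es war 1990."))
print(split_french_elisions("j'aime qu'il vienne"))
```

The word boundary in the French pattern keeps words such as aujourd'hui intact, because the d there is not at the start of a word.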

Release history

Version	Release date
0.3.0 (this version)	Apr 30, 2026
0.2.1	Apr 28, 2026
0.2.0	Apr 28, 2026
0.1.0	Apr 28, 2026

Download files

Source Distribution

extended_wer_normalizer-0.3.0.tar.gz (35.7 kB)

Uploaded Apr 30, 2026 Source

Built Distribution

extended_wer_normalizer-0.3.0-py3-none-any.whl (13.7 kB)

Uploaded Apr 30, 2026 Python 3

File details

Details for the file extended_wer_normalizer-0.3.0.tar.gz.

File metadata

Download URL: extended_wer_normalizer-0.3.0.tar.gz
Upload date: Apr 30, 2026
Size: 35.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.6.11

File hashes

Hashes for extended_wer_normalizer-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`489c2a250d8c802baa0fa6c0154694ad73d6f2fe95a0536a3f94e991f59146de`
MD5	`b0e48e700d4d38773da1ebc4f36be3b1`
BLAKE2b-256	`f17a3c10649b4b54362bdc162f8bc3d23d83ef98d2f1a0bae6839c50beb596a1`

File details

Details for the file extended_wer_normalizer-0.3.0-py3-none-any.whl.

File metadata

Download URL: extended_wer_normalizer-0.3.0-py3-none-any.whl
Upload date: Apr 30, 2026
Size: 13.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.6.11

File hashes

Hashes for extended_wer_normalizer-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f6cbadd4f00ccf01dfaeb822309bfdf210c146ff7feede5b33039e0e8a702cde`
MD5	`95eec8f71a50c21e9d016154f99c9857`
BLAKE2b-256	`f7a4eaf3970ca44601220e40e712c14b3636ddbfa277f1a42cefaca3a5317398`
