Text normalization for Whisper models
Whisper Normalization
A Python package for text normalization, specifically designed for use with Whisper models.
Note: This code is extracted from OpenAI's Whisper repository to allow for standalone usage of the text normalization modules without the heavy dependencies of the full Whisper package.
- Original Source: openai/whisper
- License: MIT
Installation
You can install this package directly from source:
pip install .
It can also be installed from PyPI:
pip install whisper-normalization
Usage
from whisper_normalization import EnglishTextNormalizer, BasicTextNormalizer
# English Normalization
normalizer = EnglishTextNormalizer()
text = "Mr. Smith bought $5.50 worth of apples in the 1990s."
normalized = normalizer(text)
print(normalized)
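# Output, assuming behavior identical to the upstream Whisper normalizer
# (abbreviations are expanded, digits and currency symbols are kept):
# "mister smith bought $5.50 worth of apples in the 1990s"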
# Basic Normalization
basic_normalizer = BasicTextNormalizer()
text = "Bonjour à tous!"
normalized = basic_normalizer(text)
print(normalized)
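Normalized text is commonly used to compare a transcript against a reference without penalizing differences in casing or punctuation. The following is a minimal sketch of that workflow, not part of this package: `word_error_rate` and `simple_normalize` are hypothetical names, and `simple_normalize` is only a stand-in so the block runs without the package installed. Since the normalizers above are called as `normalizer(text)`, an `EnglishTextNormalizer()` or `BasicTextNormalizer()` instance should be usable in its place.

```python
import re


def simple_normalize(text: str) -> str:
    """Stand-in normalizer (assumption): lowercase and drop punctuation."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def word_error_rate(reference: str, hypothesis: str, normalize) -> float:
    """Word-level edit distance divided by the reference length.

    `normalize` is any callable mapping str -> str; both inputs are
    normalized before they are split into words.
    """
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    if not ref:
        raise ValueError("reference is empty after normalization")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Casing and punctuation differences vanish after normalization.
print(word_error_rate("Hello, world!", "hello world", simple_normalize))
```

Passing the normalizer as an argument keeps the scoring code independent of which normalizer is chosen.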
Features
- Basic text cleaning (symbol removal, diacritic handling)
- Advanced English normalization:
  - Number normalization (integers, decimals, currencies, years)
  - Contraction expansion
  - Abbreviation expansion
  - British to American spelling normalization
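To make the features above more concrete, here is a rough standard-library illustration of the ideas behind them. It is a simplified sketch, not the package's implementation: the function names, the `strip_diacritics` flag, and the small rule table are all assumptions made for the example, and number handling is left out entirely.

```python
import re
import unicodedata


def basic_clean(text: str, strip_diacritics: bool = True) -> str:
    """Lowercase, replace symbols and punctuation with spaces, and
    optionally drop diacritics (hypothetical helper)."""
    # NFKD splits accented letters into base letter + combining mark,
    # so the marks can be dropped; NFKC keeps them composed.
    form = "NFKD" if strip_diacritics else "NFKC"
    chars = []
    for ch in unicodedata.normalize(form, text.lower()):
        category = unicodedata.category(ch)
        if strip_diacritics and category == "Mn":
            continue  # non-spacing combining mark, e.g. a grave accent
        # Marks, symbols, and punctuation become spaces.
        chars.append(" " if category[0] in "MSP" else ch)
    return " ".join("".join(chars).split())


# Illustrative rules only; a real table would be far larger.
REPLACEMENTS = [
    (r"\bwon't\b", "will not"),   # contraction
    (r"\bcan't\b", "can not"),    # contraction
    (r"n't\b", " not"),           # generic contraction suffix
    (r"\bmr\b\.?", "mister"),     # abbreviation
    (r"\bdr\b\.?", "doctor"),     # abbreviation
    (r"\bcolour\b", "color"),     # British to American spelling
]


def expand_english(text: str) -> str:
    """Apply the rule table in order to lowercased text."""
    text = text.lower()
    for pattern, replacement in REPLACEMENTS:
        text = re.sub(pattern, replacement, text)
    return text


print(basic_clean("Bonjour à tous!"))
print(expand_english("Dr. Jones won't change the colour"))
```

Rule order matters in a table like this: the specific contractions are listed before the generic `n't` rule so that "won't" does not become "wo not".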
Download files
File details
Details for the file whisper_normalization-1.0.0.tar.gz.
File metadata
- Download URL: whisper_normalization-1.0.0.tar.gz
- Upload date:
- Size: 21.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f40718b485635feb87e15cbe1d736a748e3e1dc1a4855fda6743038ea9bc1364 |
| MD5 | 52a5b65c6b404cded7ca1b461c9f85f3 |
| BLAKE2b-256 | af5dd9083d7ec558e1136a12d835e597b3c6875f2277c018b4ff90dd48e1316e |
File details
Details for the file whisper_normalization-1.0.0-py3-none-any.whl.
File metadata
- Download URL: whisper_normalization-1.0.0-py3-none-any.whl
- Upload date:
- Size: 20.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1fe0c47a84c00fc861787999d9bb3ad4f35295e58c24fa5b67ec974ee5d86d57 |
| MD5 | 96ca412d92ba52724814af20d2a93b8d |
| BLAKE2b-256 | a66a251a66beb4430650a9ca3291ad7597321b3fb910f3938333dac978be97fd |
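The SHA256 digests listed above can be used to check a downloaded file before installing it. The helper below is a small sketch using only the standard library; `sha256_matches` is a hypothetical name, and the local file path passed to it is whatever location the file was saved to.

```python
import hashlib
from pathlib import Path


def sha256_matches(path, expected_hex: str, chunk_size: int = 1 << 16) -> bool:
    """Return True if the file's SHA256 digest equals `expected_hex`."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

For example, calling it with the path of a downloaded `whisper_normalization-1.0.0.tar.gz` and the SHA256 value from the first table should return `True` for an intact file.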