Text normalization for Whisper models

Whisper Normalization

A Python package for text normalization, specifically designed for use with Whisper models.

Note: This code is extracted from OpenAI's Whisper repository to allow for standalone usage of the text normalization modules without the heavy dependencies of the full Whisper package.

Installation

You can install this package directly from source:

pip install .

You can also install it from PyPI:

pip install whisper-normalization

Usage

from whisper_normalization import EnglishTextNormalizer, BasicTextNormalizer

# English Normalization
normalizer = EnglishTextNormalizer()
text = "Mr. Smith bought $5.50 worth of apples in the 1990s."
normalized = normalizer(text)
print(normalized)
# Output: "mister smith bought five dollars and fifty cents worth of apples in the nineteen nineties"

# Basic Normalization
basic_normalizer = BasicTextNormalizer()
text = "Bonjour à tous!"
normalized = basic_normalizer(text)
print(normalized)

Features

  • Basic text cleaning (symbol removal, diacritic handling)
  • Advanced English normalization:
    • Number to text conversion (integers, decimals, currencies, years)
    • Contraction expansion
    • Abbreviation expansion
    • British to American spelling normalization
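
To make the "basic text cleaning" item concrete, here is a minimal standard-library sketch of symbol removal and diacritic handling. It is an illustration only, not this package's implementation: the helper name `basic_clean` is hypothetical, and the actual `BasicTextNormalizer` may handle more cases or produce different output.

```python
import re
import unicodedata


def basic_clean(text: str) -> str:
    """Hypothetical helper: lowercase, strip diacritics, replace symbols with spaces."""
    text = text.lower()
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    kept = []
    for ch in decomposed:
        category = unicodedata.category(ch)
        if category == "Mn":
            # Drop combining marks (the diacritics) outright.
            continue
        if category[0] in "MSP":
            # Replace other marks, symbols, and punctuation with a space.
            kept.append(" ")
        else:
            kept.append(ch)
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", "".join(kept)).strip()


print(basic_clean("Bonjour à tous!"))
# prints: bonjour a tous
```

Replacing symbols with a space rather than deleting them keeps neighbouring words from fusing together, which matters when the cleaned text is later compared word by word.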
