FIDA Normalizer

A text normalization package for TTS (Text-to-Speech) preprocessing with multi-language support.

Features

  • Multi-language Support: Italian, English, French, German, Spanish, Portuguese, Dutch, and Swedish
  • Text Normalization: Converts numbers, dates, times, emails, URLs, and special characters to spoken forms
  • Phonemization Support: Optional IPA phoneme generation using Espeak
  • TTS Integration: Designed to work with NeMo and other TTS systems
  • Currency & Units: Handles currency symbols and measurement units

Installation

From PyPI (Recommended)

pip install fida-normalizer

From Source

git clone <repository-url>
cd fida_tts
pip install -e .

For Development

pip install -e ".[dev]"

Usage

Basic Usage

from Normalizer import Normalizer

# Create a normalizer instance (Italian language by default)
normalizer = Normalizer(lang='it')

# Normalize text
result = normalizer.normalize("Il prezzo è 50,50 euro")
print(result)  # Output: il prezzo è cinquanta virgola cinquanta euro

With Different Languages

from Normalizer import Normalizer

# English normalizer
en_normalizer = Normalizer(lang='en')
result = en_normalizer.normalize("The price is $100.50")
print(result)  # Output: the price is dollar one hundred point fifty

With Phonemization

from Normalizer import Normalizer

# Enable phonemization (requires espeak/espeak-ng installed)
normalizer = Normalizer(lang='it', phonemize=True)
result = normalizer.normalize("Ciao mondo")
print(result)  # Output: IPA phonemes
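
Because phonemization relies on a system-level espeak installation, it can be useful to check for it before passing phonemize=True. The following is a minimal sketch using only the standard library. The helper name espeak_available is hypothetical and not part of this package, and looking for an executable named espeak-ng or espeak on PATH is only a rough heuristic: the phonemizer library may locate espeak by other means.

```python
import shutil


def espeak_available() -> bool:
    """Return True if an espeak or espeak-ng executable is found on PATH."""
    # Assumption: the executables are named "espeak-ng" or "espeak".
    return any(shutil.which(name) is not None for name in ("espeak-ng", "espeak"))


# Only request phonemization when the system dependency appears to be present.
use_phonemes = espeak_available()
print(f"phonemize={use_phonemes}")
```

The resulting flag could then be passed along, for example as Normalizer(lang='it', phonemize=use_phonemes).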

Configuration Options

from Normalizer import Normalizer

normalizer = Normalizer(
    lang='it',           # Language code: 'it', 'en', 'fr', 'de', 'es', 'pt', 'nl', 'sv'
    tts_mode=True,       # TTS mode: keeps punctuation suitable for TTS
    to_lower=True,       # Convert output to lowercase
    phonemize=False      # Enable/disable phonemization
)
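
A normalizer configured once can presumably be reused across many inputs, for instance when preparing a transcript file for TTS training. The sketch below shows one way to do that; normalize_lines is a hypothetical helper, not part of this package, and it accepts any callable that maps a string to a string so that it runs without the package installed.

```python
from typing import Callable, Iterable, List


def normalize_lines(normalize: Callable[[str], str], lines: Iterable[str]) -> List[str]:
    """Apply a normalize callable to each non-blank line, skipping blank ones."""
    results = []
    for line in lines:
        stripped = line.strip()
        if stripped:
            results.append(normalize(stripped))
    return results


# Stand-in callable (str.lower) so the sketch is self-contained;
# in practice one would pass normalizer.normalize instead.
demo = normalize_lines(str.lower, ["Ciao Mondo", "", "  Il Prezzo  "])
print(demo)  # ['ciao mondo', 'il prezzo']
```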

Supported Languages

Language      Code
Italian       it
English       en
French        fr
German        de
Spanish       es
Portuguese    pt
Dutch         nl
Swedish       sv
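
How the package reacts to an unsupported language code is not documented here, so callers may want to validate the code themselves first. This is a small sketch under that assumption; SUPPORTED_LANGS and check_lang are hypothetical names that simply mirror the table above.

```python
# Mirrors the language table above; not imported from the package.
SUPPORTED_LANGS = {
    "it": "Italian",
    "en": "English",
    "fr": "French",
    "de": "German",
    "es": "Spanish",
    "pt": "Portuguese",
    "nl": "Dutch",
    "sv": "Swedish",
}


def check_lang(code: str) -> str:
    """Return the lowercased language code, or raise ValueError if unsupported."""
    key = code.strip().lower()
    if key not in SUPPORTED_LANGS:
        supported = ", ".join(sorted(SUPPORTED_LANGS))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {supported}")
    return key


print(check_lang("EN"))  # en
```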

Supported Transformations

  • Numbers: Converts digits to spoken words (e.g., "123" → "one hundred twenty-three")
  • Decimals: Handles decimal separators based on locale
  • Percentages: Converts "%" to spoken form
  • Currency: Handles $, €, £, ¥ symbols
  • Dates: Converts date formats to spoken form
  • Times: Converts time formats (HH:MM, HH:MM:SS)
  • Emails: Spells out email addresses
  • URLs/Domains: Spells out web addresses
  • Units: Converts measurement units (m, kg, km/h, etc.)
  • Special Characters: Handles @, &, -, _, /, etc.
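
To give a feel for what a single transformation pass involves, here is a toy illustration of symbol expansion. It is not the package's implementation: the SPOKEN table and the expand_symbols function are hypothetical, cover English only, and ignore the locale handling that a real normalizer needs.

```python
import re

# Hypothetical symbol table for illustration only.
SPOKEN = {"%": "percent", "&": "and", "@": "at"}

# One alternation pattern that matches any symbol in the table.
_SYMBOLS = re.compile("|".join(re.escape(symbol) for symbol in SPOKEN))


def expand_symbols(text: str) -> str:
    """Replace each known symbol with its spoken word and tidy whitespace."""
    spaced = _SYMBOLS.sub(lambda m: f" {SPOKEN[m.group(0)]} ", text)
    return " ".join(spaced.split())


print(expand_symbols("50% off at A&B"))  # 50 percent off at A and B
```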

Requirements

  • Python >= 3.8
  • phonemizer >= 3.0.0 (optional, for phonemization)
  • espeak or espeak-ng (system dependency for phonemization)

License

MIT License

Contributing

Contributions are welcome. Feel free to submit a pull request.
