FIDA Normalizer

A text normalization package for TTS (Text-to-Speech) preprocessing with multi-language support.

Features

  • Multi-language Support: Italian, English, French, German, Spanish, Portuguese, Dutch, and Swedish
  • Text Normalization: Converts numbers, dates, times, emails, URLs, and special characters to spoken forms
  • Phonemization Support: Optional IPA phoneme generation using espeak or espeak-ng
  • TTS Integration: Designed to work with NeMo and other TTS systems
  • Currency & Units: Handles currency symbols and measurement units

Installation

From Source

git clone <repository-url>
cd fida_tts
pip install -e .

For Development

pip install -e ".[dev]"

Usage

Basic Usage

from Normalizer import Normalizer

# Create a normalizer instance for Italian (the default language)
normalizer = Normalizer(lang='it')

# Normalize text
result = normalizer.normalize("Il prezzo è 50,50 euro")
print(result)  # Output: il prezzo è cinquanta virgola cinquanta euro

With Different Languages

from Normalizer import Normalizer

# English normalizer
en_normalizer = Normalizer(lang='en')
result = en_normalizer.normalize("The price is $100.50")
print(result)  # Output: the price is dollar one hundred point fifty

With Phonemization

from Normalizer import Normalizer

# Enable phonemization (requires espeak/espeak-ng installed)
normalizer = Normalizer(lang='it', phonemize=True)
result = normalizer.normalize("Ciao mondo")
print(result)  # Prints the IPA phoneme transcription of the input

Configuration Options

from Normalizer import Normalizer

normalizer = Normalizer(
    lang='it',           # Language code: 'it', 'en', 'fr', 'de', 'es', 'pt', 'nl', 'sv'
    tts_mode=True,       # TTS mode: keeps punctuation suitable for TTS
    to_lower=True,       # Convert output to lowercase
    phonemize=False      # Enable/disable phonemization
)

Supported Languages

Language      Code
Italian       it
English       en
French        fr
German        de
Spanish       es
Portuguese    pt
Dutch         nl
Swedish       sv
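
Since the lang argument accepts only the codes listed above, it can be useful to validate a user-supplied code before constructing a normalizer. The following is a minimal sketch: SUPPORTED_LANGUAGES and resolve_language are hypothetical helpers, not part of the package, and how Normalizer itself reacts to an unknown code is not documented here.

```python
# Hypothetical helper: not part of the FIDA Normalizer package.
# The mapping mirrors the "Supported Languages" table.
SUPPORTED_LANGUAGES = {
    "it": "Italian",
    "en": "English",
    "fr": "French",
    "de": "German",
    "es": "Spanish",
    "pt": "Portuguese",
    "nl": "Dutch",
    "sv": "Swedish",
}


def resolve_language(code: str) -> str:
    """Return a cleaned-up language code, or raise ValueError if unsupported."""
    cleaned = code.strip().lower()
    if cleaned not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(
            f"Unsupported language {code!r}; expected one of: {supported}"
        )
    return cleaned
```

With the package installed, the result could be passed straight through, for example Normalizer(lang=resolve_language(user_input)).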

Supported Transformations

  • Numbers: Converts digits to spoken words (e.g., "123" → "one hundred twenty-three")
  • Decimals: Handles decimal separators based on locale
  • Percentages: Converts "%" to spoken form
  • Currency: Handles $, €, £, ¥ symbols
  • Dates: Converts date formats to spoken form
  • Times: Converts time formats (HH:MM, HH:MM:SS)
  • Emails: Spells out email addresses
  • URLs/Domains: Spells out web addresses
  • Units: Converts measurement units (m, kg, km/h, etc.)
  • Special Characters: Handles @, &, -, _, /, etc.
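
One way to see how these transformations behave for a given language is to run a handful of sample strings through a normalizer and compare input with output. The sketch below assumes only that normalize takes a string and returns a string, as in the usage examples. The preview helper and the SAMPLES list are illustrative names, not part of the package, and no expected outputs are shown because they depend on the language and configuration.

```python
from typing import Callable, Iterable, List, Tuple

# Illustrative inputs touching several of the categories listed above.
SAMPLES = [
    "Room 123",
    "Discount: 15%",
    "Meeting at 14:30",
    "Speed limit 50 km/h",
]


def preview(
    normalize: Callable[[str], str], samples: Iterable[str]
) -> List[Tuple[str, str]]:
    """Apply `normalize` to each sample and return (input, output) pairs."""
    return [(text, normalize(text)) for text in samples]


# With the package installed, this could be driven by the real normalizer:
#   from Normalizer import Normalizer
#   for raw, spoken in preview(Normalizer(lang="en").normalize, SAMPLES):
#       print(f"{raw!r} -> {spoken!r}")
```

Passing the normalize method as a plain callable keeps the helper independent of the package, so the same loop can compare several languages or configurations side by side.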

Requirements

  • Python >= 3.8
  • phonemizer >= 3.0.0 (optional, for phonemization)
  • espeak or espeak-ng (system dependency for phonemization)
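
Because phonemization depends on both an optional Python package and a system binary, a quick check before passing phonemize=True can avoid a failure at runtime. This is a rough sketch using only the standard library. The function names are hypothetical, and the espeak check looks only for an executable on PATH, so an installation that exposes just the shared library elsewhere would be missed.

```python
import importlib.util
import shutil
import sys


def python_supported() -> bool:
    """True if the running interpreter meets the stated minimum (3.8)."""
    return sys.version_info >= (3, 8)


def phonemization_available() -> bool:
    """True if the phonemizer package and an espeak executable are both found."""
    has_phonemizer = importlib.util.find_spec("phonemizer") is not None
    has_espeak = any(
        shutil.which(name) is not None for name in ("espeak-ng", "espeak")
    )
    return has_phonemizer and has_espeak
```

A caller could then fall back gracefully, for example Normalizer(lang='it', phonemize=phonemization_available()).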

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
