A text normalization package for TTS preprocessing with multi-language support
FIDA Normalizer
A text normalization package for TTS (Text-to-Speech) preprocessing with multi-language support.
Features
- Multi-language Support: Italian, English, French, German, Spanish, Portuguese, Dutch, and Swedish
- Text Normalization: Converts numbers, dates, times, emails, URLs, and special characters to spoken forms
- Phonemization Support: Optional IPA phoneme generation via espeak or espeak-ng
- TTS Integration: Designed to work with NeMo and other TTS systems
- Currency & Units: Handles currency symbols and measurement units
Installation
From PyPI (Recommended)
pip install fida-normalizer
From Source
git clone <repository-url>
cd fida_tts
pip install -e .
For Development
pip install -e ".[dev]"
Usage
Basic Usage
from Normalizer import Normalizer
# Create a normalizer instance (Italian language by default)
normalizer = Normalizer(lang='it')
# Normalize text
result = normalizer.normalize("Il prezzo è 50,50 euro")
print(result) # Output: il prezzo è cinquanta virgola cinquanta euro
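For batch preprocessing, the same `normalize` call can be applied line by line. The sketch below is a minimal illustration and not part of the package: `normalize_lines` and the `SupportsNormalize` protocol are hypothetical names, and the helper accepts any object that exposes a `normalize(text)` method, as `Normalizer` does in the example above.

```python
from typing import Iterable, List, Protocol


class SupportsNormalize(Protocol):
    """Anything with a normalize(text) -> str method (hypothetical protocol)."""

    def normalize(self, text: str) -> str: ...


def normalize_lines(normalizer: SupportsNormalize, lines: Iterable[str]) -> List[str]:
    """Normalize each line, skipping lines that are empty after stripping."""
    results = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry nothing to synthesize
        results.append(normalizer.normalize(stripped))
    return results
```

With the package installed, passing `Normalizer(lang='it')` as the first argument should return one normalized string per non-blank input line.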
With Different Languages
from Normalizer import Normalizer
# English normalizer
en_normalizer = Normalizer(lang='en')
result = en_normalizer.normalize("The price is $100.50")
print(result) # Output: the price is dollar one hundred point fifty
With Phonemization
from Normalizer import Normalizer
# Enable phonemization (requires espeak/espeak-ng installed)
normalizer = Normalizer(lang='it', phonemize=True)
result = normalizer.normalize("Ciao mondo")
print(result) # Output: IPA phonemes
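Because phonemization depends on a system binary, it can help to check for that binary before passing `phonemize=True`. The following is a minimal sketch using only the standard library. `espeak_available` is a hypothetical helper, and it assumes the executable is named `espeak-ng` or `espeak` and is on PATH, which may not hold on every platform.

```python
import shutil
from typing import Callable, Optional, Sequence


def espeak_available(
    candidates: Sequence[str] = ("espeak-ng", "espeak"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> bool:
    """Return True if any candidate executable is found on PATH."""
    return any(which(name) is not None for name in candidates)


# Only request phonemes when the backend can actually be found.
use_phonemes = espeak_available()
print(f"phonemize={use_phonemes}")
```

The resulting flag could then be passed along as `Normalizer(lang='it', phonemize=use_phonemes)`, so the code falls back to plain text normalization on machines without espeak.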
Configuration Options
from Normalizer import Normalizer
normalizer = Normalizer(
    lang='it',        # Language code: 'it', 'en', 'fr', 'de', 'es', 'pt', 'nl', 'sv'
    tts_mode=True,    # TTS mode: keeps punctuation suitable for TTS
    to_lower=True,    # Convert output to lowercase
    phonemize=False   # Enable/disable phonemization
)
Supported Languages
| Language | Code |
|---|---|
| Italian | it |
| English | en |
| French | fr |
| German | de |
| Spanish | es |
| Portuguese | pt |
| Dutch | nl |
| Swedish | sv |
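When language codes come from user input or locale strings, they can be checked against the table above before a normalizer is constructed. This is a sketch under assumptions: `resolve_lang` is a hypothetical helper, and reducing a tag such as `en-US` to its primary subtag is a convenience added here, not documented behavior of the package.

```python
# Codes copied from the Supported Languages table.
SUPPORTED_LANGS = {
    "it": "Italian",
    "en": "English",
    "fr": "French",
    "de": "German",
    "es": "Spanish",
    "pt": "Portuguese",
    "nl": "Dutch",
    "sv": "Swedish",
}


def resolve_lang(tag: str) -> str:
    """Map a tag like 'en-US' or 'PT_br' to a supported two-letter code."""
    primary = tag.strip().replace("_", "-").split("-")[0].lower()
    if primary not in SUPPORTED_LANGS:
        supported = ", ".join(sorted(SUPPORTED_LANGS))
        raise ValueError(f"Unsupported language {tag!r}; expected one of: {supported}")
    return primary
```

Failing early with a `ValueError` keeps an unsupported code from reaching the normalizer, whose behavior for unknown codes is not described here.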
Supported Transformations
- Numbers: Converts digits to spoken words (e.g., "123" → "one hundred twenty-three")
- Decimals: Handles decimal separators based on locale
- Percentages: Converts "%" to spoken form
- Currency: Handles $, €, £, ¥ symbols
- Dates: Converts date formats to spoken form
- Times: Converts time formats (HH:MM, HH:MM:SS)
- Emails: Spells out email addresses
- URLs/Domains: Spells out web addresses
- Units: Converts measurement units (m, kg, km/h, etc.)
- Special Characters: Handles @, &, -, _, /, etc.
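To illustrate what a rule like Numbers involves, here is a toy sketch that replaces standalone integers from 0 to 999 with English words using a regular expression. It is not the package's implementation: `int_to_words` and `spell_numbers` are hypothetical names, and a real normalizer also has to handle larger numbers, decimals, and per-language rules.

```python
import re

_ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty",
         "sixty", "seventy", "eighty", "ninety"]


def int_to_words(n: int) -> str:
    """Spell an integer in the range 0..999 in English."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0..999")
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + ("-" + _ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = _ONES[hundreds] + " hundred"
    return words + (" " + int_to_words(rest) if rest else "")


def spell_numbers(text: str) -> str:
    """Replace standalone integers of up to three digits with words."""
    return re.sub(r"\b\d{1,3}\b", lambda m: int_to_words(int(m.group())), text)


print(spell_numbers("Room 123 has 40 seats"))
# Room one hundred twenty-three has forty seats
```

Longer digit runs such as "2024" are left untouched by this pattern, which is one reason the ordering and scope of rules matters in a full normalizer.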
Requirements
- Python >= 3.8
- phonemizer >= 3.0.0 (optional, for phonemization)
- espeak or espeak-ng (system dependency for phonemization)
License
MIT License
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.