Multilingual grapheme-to-phoneme (G2P) conversion using Transformer models.
Phonemize
Phonemize is a multilingual grapheme-to-phoneme (G2P) conversion library built with Transformer models. It’s designed for high accuracy, fast inference, and simple integration into text-to-speech (TTS) or other speech-related systems.
Key Features
- Easy-to-use API: A simple interface for both training and inference.
- Multilingual Support: Train a single model on multiple languages.
- High Performance: Fast and accurate predictions powered by Transformer models.
- Custom Training: Effortlessly train your own models in just a few lines of code.
- Optimized for TTS: Ideal for both real-time and offline text-to-speech pipelines.
Installation
To install Phonemize, use the following command:
pip install phonemize
To train your own models, install the full package with all training dependencies:
Quickstart
Load a pre-trained model and perform phoneme prediction with this simple example:
from phonemize import Phonemizer
# Load the pre-trained model from a checkpoint
phonemizer = Phonemizer.from_checkpoint("phonemize_m1.pt")
# Phonemize an English text
result = phonemizer("Phonemizing an English text is imposimpable!", lang="en_us")
# Print the result
print(result)
Output:
foʊnɪmaɪzɪŋ æn ɪŋglɪʃ tɛkst ɪz ɪmpəzɪmpəbəl!
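In a real-time TTS pipeline the same sentences or words often recur, so repeated model calls can be avoided with a small cache. The following is a minimal sketch, not part of the Phonemize API: `make_cached_phonemizer` and `fake_phonemize` are hypothetical names, and the stand-in function only exists so the example runs without a checkpoint.

```python
from functools import lru_cache
from typing import Callable, List, Tuple


def make_cached_phonemizer(
    phonemize: Callable[[str, str], str], maxsize: int = 10_000
) -> Callable[[str, str], str]:
    """Wrap a (text, lang) -> phonemes callable with an in-memory LRU cache."""

    @lru_cache(maxsize=maxsize)
    def cached(text: str, lang: str) -> str:
        return phonemize(text, lang)

    return cached


# Stand-in for a loaded model (assumption, for illustration only).
# With a real Phonemizer instance this could be:
#   phonemize = lambda text, lang: phonemizer(text, lang=lang)
calls: List[Tuple[str, str]] = []


def fake_phonemize(text: str, lang: str) -> str:
    calls.append((text, lang))
    return f"<{lang}:{text}>"


cached = make_cached_phonemizer(fake_phonemize)
first = cached("hello", "en_us")
second = cached("hello", "en_us")  # served from the cache, no second model call
```

The cache key is the exact `(text, lang)` pair, so this only helps when inputs repeat verbatim; normalizing whitespace or case before the lookup is a separate design decision.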
Training Your Own Model
You can easily train your own forward or autoregressive Transformer model. All configuration parameters are defined in a simple YAML file (e.g., configs/forward.yaml).
from phonemize.preprocess import preprocess
from phonemize.train import train
# Define your training data
train_data = [
    ("en_us", "young", "jʌŋ"),
    ("de", "benützten", "bənʏt͡stn̩")
] * 1000
# Define your validation data
val_data = [
    ("en_us", "young", "jʌŋ"),
    ("de", "benützten", "bənʏt͡stn̩")
] * 100
# Specify the configuration file
config_file = "configs/forward.yaml"
# Preprocess the data
preprocess(
    config_file=config_file,
    train_data=train_data,
    val_data=val_data,
    deduplicate_train_data=False
)
# Train the model
train(rank=0, num_gpus=1, config_file=config_file)
Checkpoints will be saved in the directory specified in your configuration file.
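The training example above builds `train_data` and `val_data` by hand as lists of `(language, word, phonemes)` tuples. For a real dataset, those tuples would usually come from a file. The sketch below assumes a tab-separated layout (`lang<TAB>word<TAB>phonemes`), which is an assumption and not a format required by the library. `read_entries` and `split_entries` are hypothetical helpers.

```python
import io
import random
from typing import Iterable, List, Tuple

Entry = Tuple[str, str, str]  # (language code, written word, phoneme string)


def read_entries(lines: Iterable[str]) -> List[Entry]:
    """Parse 'lang<TAB>word<TAB>phonemes' lines, skipping blank or malformed rows."""
    entries: List[Entry] = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3 or not all(parts):
            continue  # not exactly three non-empty fields
        entries.append((parts[0], parts[1], parts[2]))
    return entries


def split_entries(
    entries: List[Entry], val_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[Entry], List[Entry]]:
    """Shuffle deterministically and hold out a fraction for validation."""
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# In practice `sample` would be an open file; StringIO keeps the sketch self-contained.
sample = io.StringIO("en_us\tyoung\tjʌŋ\nde\tbenützten\tbənʏt͡stn̩\n\nbad line\n")
entries = read_entries(sample)
train_data, val_data = split_entries(entries, val_fraction=0.5)
```

The resulting `train_data` and `val_data` have the same shape as the lists passed to `preprocess` above. Holding out a separate validation split, instead of reusing training entries, gives a more honest estimate of accuracy on unseen words.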
Inference Example
To perform inference with your trained model:
from phonemize import Phonemizer
# Load your custom model from a checkpoint
phonemizer = Phonemizer.from_checkpoint("checkpoints/best_model.pt")
# Get the phonemes for a given text
phonemes = phonemizer("Phonemizing text is simple!", lang="en_us")
print(phonemes)
To inspect detailed predictions, including confidence scores:
result = phonemizer.phonemise_list(["Phonemizing text is simple!"], lang="en_us")
for word, pred in result.predictions.items():
    print(f"Word: {word}, Phonemes: {pred.phonemes}, Confidence: {pred.confidence}")
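One use for the confidence scores is to flag words the model is unsure about, for example to route them to a pronunciation dictionary or manual review. The sketch below is not library code: `Prediction` is a stand-in dataclass that only mirrors the `phonemes` and `confidence` attributes used in the loop above, `low_confidence_words` is a hypothetical helper, and the confidence values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Prediction:
    """Stand-in exposing the two attributes read in the loop above."""
    phonemes: str
    confidence: float


def low_confidence_words(
    predictions: Dict[str, Prediction], threshold: float = 0.9
) -> List[str]:
    """Return words whose confidence is below the threshold, least confident first."""
    return sorted(
        (word for word, pred in predictions.items() if pred.confidence < threshold),
        key=lambda word: predictions[word].confidence,
    )


# Illustrative values only; real scores come from result.predictions.
predictions = {
    "text": Prediction("tɛkst", 0.98),
    "imposimpable": Prediction("ɪmpəzɪmpəbəl", 0.41),
    "phonemizing": Prediction("foʊnɪmaɪzɪŋ", 0.77),
}
flagged = low_confidence_words(predictions, threshold=0.9)
```

With a real model, `result.predictions` would take the place of the hand-written dictionary, provided its values expose the same two attributes. A suitable threshold depends on the model and data and is best chosen by inspecting a validation set.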
TorchScript Export
For optimized performance, you can easily export your trained Transformer model to TorchScript:
import torch
from phonemize import Phonemizer
# Load the model from a checkpoint
phonemizer = Phonemizer.from_checkpoint("checkpoints/best_model.pt")
# Convert the model to a TorchScript module
scripted_model = torch.jit.script(phonemizer.predictor.model)
phonemizer.predictor.model = scripted_model
# Run inference with the TorchScript model
phonemizer("Running the TorchScript model!", lang="en_us")
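Whether the exported model is actually faster depends on hardware, input length, and model size, so it is worth measuring before and after the export. The helper below is a generic timing sketch using only the standard library. `measure_latency` is a hypothetical name, and the workload in the example is a placeholder standing in for a phonemizer call.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(
    fn: Callable[[], object], warmup: int = 3, repeats: int = 20
) -> Dict[str, float]:
    """Time repeated calls to fn, discarding a few warm-up runs first."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(timings),
        "median_s": statistics.median(timings),
        "max_s": max(timings),
    }


# Placeholder workload; with a loaded model this could be
#   lambda: phonemizer("Running the TorchScript model!", lang="en_us")
stats = measure_latency(lambda: sum(range(10_000)), warmup=2, repeats=5)
```

Running the helper once with the original model and once after assigning the scripted module gives comparable numbers. The warm-up runs matter because the first calls to a scripted module can be slower than later ones.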
Pre-trained Models
The pre-trained model listed below has been modified for use with the Phonemize library.
| Model | Language | Dataset | Repo Version |
|---|---|---|---|
| phonemize_m1 | en_us | cmudict | 0.1.0 |
Acknowledgment
Phonemize is inspired by DeepPhonemizer, and has been refactored and optimized for simplicity, speed, and modern Python environments.
License
Phonemize is released under the MIT License and is compatible with Python 3.8+. Learn more at: https://github.com/arcosoph/phonemize