Multilingual grapheme-to-phoneme (G2P) conversion using Transformer models.
Phonimize
Phonimize is a multilingual grapheme-to-phoneme (G2P) conversion library built with Transformer models. It’s designed for high accuracy, fast inference, and simple integration into text-to-speech (TTS) or other speech-related systems.
Key Features
- Easy-to-use API: A simple interface for both training and inference.
- Multilingual Support: Train a single model on multiple languages.
- High Performance: Fast and accurate predictions powered by Transformer models.
- Custom Training: Effortlessly train your own models in just a few lines of code.
- Optimized for TTS: Ideal for both real-time and offline text-to-speech pipelines.
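A single multilingual model has to know which language it is pronouncing. A common design in Transformer G2P models, assumed here, is to prepend a language token such as `<en_us>` to the character sequence so that the same weights can be steered per language. The sketch below illustrates that idea only; the vocabulary layout and the `build_vocab`/`encode` helpers are hypothetical and are not Phonimize's actual tokenizer:

```python
# Hypothetical sketch: condition a multilingual G2P model on language by
# prepending a language token (e.g. "<en_us>") to the character IDs.
PAD, END = "_", "<end>"


def build_vocab(languages, characters):
    """Map padding, one start token per language, characters, and an end token to IDs."""
    tokens = [PAD] + [f"<{lang}>" for lang in languages] + sorted(characters) + [END]
    return {tok: idx for idx, tok in enumerate(tokens)}


def encode(word, lang, vocab):
    """Encode a word as [<lang>, chars..., <end>], skipping unknown characters."""
    ids = [vocab[f"<{lang}>"]]
    ids += [vocab[ch] for ch in word.lower() if ch in vocab]
    ids.append(vocab[END])
    return ids


vocab = build_vocab(["en_us", "de"], set("youngbenützten"))
print(encode("young", "en_us", vocab))
print(encode("young", "de", vocab))
```

The two encodings differ only in their first ID, which is how the language choice reaches the model under this assumption.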
Installation
To install Phonimize, use the following command:
pip install phonemize
Quickstart
Load a pre-trained model and perform phoneme prediction with this simple example:
from phonimize import Phonemizer
# Load the pre-trained model from a checkpoint
phonemizer = Phonemizer.from_checkpoint("phonemize_m1.pt")
# Phonemize an English text
result = phonemizer("Phonemizing an English text is imposimpable!", lang="en_us")
# Print the result
print(result)
Output:
foʊnɪmaɪzɪŋ æn ɪŋglɪʃ tɛkst ɪz ɪmpəzɪmpəbəl!
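To turn a whole sentence into output like the line above, a phonemizer typically splits the text into words, predicts phonemes for each word, and stitches the result back together with the original spaces and punctuation. The following is a minimal sketch of that flow, assuming a plain regex split and a hypothetical `predict_word` lookup standing in for the Transformer; neither is Phonimize's actual implementation:

```python
import re

# Hypothetical stand-in for the Transformer: a tiny lookup table.
FAKE_MODEL = {"english": "ɪŋglɪʃ", "text": "tɛkst", "is": "ɪz"}


def predict_word(word, lang):
    """Placeholder for model inference; lang is ignored by this stub and
    unknown words are returned unchanged."""
    return FAKE_MODEL.get(word.lower(), word)


def phonemize_text(text, lang="en_us"):
    """Split into word and non-word runs, phonemize words, keep the rest verbatim."""
    pieces = re.findall(r"\w+|\W+", text)
    return "".join(
        predict_word(piece, lang) if re.fullmatch(r"\w+", piece) else piece
        for piece in pieces
    )


print(phonemize_text("English text is fun!"))
```

A production tokenizer needs more care with apostrophes, hyphens, numbers, and acronyms than this regex split provides.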
Training Your Own Model
You can easily train your own forward or autoregressive Transformer model. All configuration parameters are defined in a simple YAML file (e.g., configs/forward.yaml).
from phonimize.preprocess import preprocess
from phonimize.train import train
# Define your training data
train_data = [
("en_us", "young", "jʌŋ"),
("de", "benützten", "bənʏt͡stn̩")
] * 1000
# Define your validation data
val_data = [
("en_us", "young", "jʌŋ"),
("de", "benützten", "bənʏt͡stn̩")
] * 100
# Specify the configuration file
config_file = "configs/forward.yaml"
# Preprocess the data
preprocess(
config_file=config_file,
train_data=train_data,
val_data=val_data,
deduplicate_train_data=False
)
# Train the model
train(rank=0, num_gpus=1, config_file=config_file)
Checkpoints will be saved in the directory specified in your configuration file.
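Each training example is a `(language, word, phonemes)` tuple, and the toy lists above repeat the same two entries many times, which is presumably why the example passes `deduplicate_train_data=False`. The sketch below uses a hypothetical `summarize` helper (not the library's `preprocess`) to show what deduplication would do to that data and which languages and symbol sets a preprocessing step would typically need to collect:

```python
def summarize(data, deduplicate=True):
    """Validate (lang, word, phonemes) tuples and collect languages and symbol sets."""
    for item in data:
        if len(item) != 3 or not all(isinstance(x, str) and x for x in item):
            raise ValueError(f"expected (lang, word, phonemes) strings, got {item!r}")
    if deduplicate:
        # dict.fromkeys keeps the first occurrence of each tuple, in order
        data = list(dict.fromkeys(data))
    languages = sorted({lang for lang, _, _ in data})
    # Character-level split: combining marks (the tie bar in t͡s, the
    # syllabic mark in n̩) become separate symbols here.
    text_symbols = sorted({ch for _, word, _ in data for ch in word})
    phoneme_symbols = sorted({ch for _, _, phon in data for ch in phon})
    return data, languages, text_symbols, phoneme_symbols


train_data = [("en_us", "young", "jʌŋ"), ("de", "benützten", "bənʏt͡stn̩")] * 1000

kept, langs, chars, phons = summarize(train_data, deduplicate=True)
print(len(train_data), "->", len(kept))  # 2000 -> 2
print(langs)                             # ['de', 'en_us']
```

With deduplication enabled, the 2,000 toy examples collapse to 2, so a real training run should use a genuine pronunciation lexicon rather than repeated entries.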
Inference Example
To perform inference with your trained model:
from phonimize import Phonemizer
# Load your custom model from a checkpoint
phonemizer = Phonemizer.from_checkpoint("checkpoints/best_model.pt")
# Get the phonemes for a given text
phonemes = phonemizer("Phonemizing text is simple!", lang="en_us")
print(phonemes)
To inspect detailed predictions, including confidence scores:
result = phonemizer.phonemise_list(["Phonemizing text is simple!"], lang="en_us")
for word, pred in result.predictions.items():
    print(f"Word: {word}, Phonemes: {pred.phonemes}, Confidence: {pred.confidence}")
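One common way to turn per-step model outputs into a word-level confidence, assumed here rather than taken from Phonimize's code, is to multiply the probability of the most likely phoneme at each output step, so that a single uncertain step lowers the whole word's score. A minimal sketch with hypothetical logits and a hypothetical `word_confidence` helper:

```python
import math


def softmax(logits):
    """Numerically stable softmax over one output step."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def word_confidence(step_logits):
    """Product of the top probability at each decoding step (hypothetical scoring rule)."""
    confidence = 1.0
    for logits in step_logits:
        confidence *= max(softmax(logits))
    return confidence


# Hypothetical logits for a 3-phoneme word over a 3-symbol vocabulary.
certain = [[10.0, 0.0, 0.0]] * 3
uncertain = [[10.0, 0.0, 0.0], [1.0, 1.0, 0.0], [10.0, 0.0, 0.0]]
print(round(word_confidence(certain), 3))
print(round(word_confidence(uncertain), 3))
```

Because probabilities multiply, longer words tend to score lower than short ones under this rule; some systems use a length-normalized score (the geometric mean of the step probabilities) instead.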
TorchScript Export
To optimize inference, you can convert the Transformer model inside a trained phonemizer to TorchScript:
import torch
from phonimize import Phonemizer
# Load the model from a checkpoint
phonemizer = Phonemizer.from_checkpoint("checkpoints/best_model.pt")
# Convert the model to a TorchScript module
scripted_model = torch.jit.script(phonemizer.predictor.model)
phonemizer.predictor.model = scripted_model
# Run inference with the TorchScript model
print(phonemizer("Running the TorchScript model!", lang="en_us"))
Pre-trained Models
The following pre-trained model has been adapted for use with the phonemize library.
| Model | Language | Dataset | Repo Version |
|---|---|---|---|
| phonemize_m1 | en_us | cmudict | 0.1.0 |
Acknowledgment
Phonimize is inspired by DeepPhonemizer and has been refactored and optimized for simplicity, speed, and modern Python environments.
Phonimize is compatible with Python 3.8+ and distributed under the MIT license. Learn more at: https://github.com/arcosoph/phonemize
License
This project is released under the MIT License.
Download files
- Source distribution: phonemize-0.2.3.tar.gz
- Built distribution: phonemize-0.2.3-py3-none-any.whl
File details
Details for the file phonemize-0.2.3.tar.gz.
File metadata
- Download URL: phonemize-0.2.3.tar.gz
- Upload date:
- Size: 27.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 249baea27f8ddf59ad395a502d45009023527df2c00a629793af7dabd5d9fd61 |
| MD5 | 9a08bb5fb728662cef22ae10d3004c68 |
| BLAKE2b-256 | 9b345b753c93886d823e71db534610f00b393c0833c7753dba2b7349aebdb16f |
File details
Details for the file phonemize-0.2.3-py3-none-any.whl.
File metadata
- Download URL: phonemize-0.2.3-py3-none-any.whl
- Upload date:
- Size: 35.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d7c8ab5bc659f3e0978bd48732c0904903902ecc18ecb37f1ac7e150468f6155 |
| MD5 | 939902df5073f4afde04aa93dd82d5ea |
| BLAKE2b-256 | 4d1d5f1a577a5d3828d6db872e603e908418cb82020a2e02a74aa58c22a78392 |