Phonimize

Phonimize is a multilingual grapheme-to-phoneme (G2P) conversion library built with Transformer models. It’s designed for high accuracy, fast inference, and simple integration into text-to-speech (TTS) or other speech-related systems.


Key Features

  • Easy-to-use API: A simple interface for both training and inference.
  • Multilingual Support: Train a single model on multiple languages.
  • High Performance: Fast and accurate predictions powered by Transformer models.
  • Custom Training: Effortlessly train your own models in just a few lines of code.
  • Optimized for TTS: Ideal for both real-time and offline text-to-speech pipelines.

Installation

To install Phonimize, use the following command:

pip install phonimize
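To confirm that the install succeeded, you can query the installed distribution's version from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is illustrative, and it assumes the distribution is registered under the same name used in the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumption: the distribution name matches the pip command above.
print(installed_version("phonimize"))
```

A result of `None` means pip did not register a distribution under that name in the active environment.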

Quickstart

Load a pre-trained model and perform phoneme prediction with this simple example:

from phonimize import Phonemizer

# Load the pre-trained model from a checkpoint
phonemizer = Phonemizer.from_checkpoint("phonemize_m1.pt")

# Phonemize an English text
result = phonemizer("Phonemizing an English text is imposimpable!", lang="en_us")

# Print the result
print(result)

Output:

foʊnɪmaɪzɪŋ æn ɪŋglɪʃ tɛkst ɪz ɪmpəzɪmpəbəl!

Training Your Own Model

You can easily train your own forward or autoregressive Transformer model. All configuration parameters are defined in a simple YAML file (e.g., configs/forward.yaml).

from phonimize.preprocess import preprocess
from phonimize.train import train

# Define your training data
train_data = [
    ("en_us", "young", "jʌŋ"),
    ("de", "benützten", "bənʏt͡stn̩")
] * 1000

# Define your validation data
val_data = [
    ("en_us", "young", "jʌŋ"),
    ("de", "benützten", "bənʏt͡stn̩")
] * 100

# Specify the configuration file
config_file = "configs/forward.yaml"

# Preprocess the data
preprocess(
    config_file=config_file,
    train_data=train_data,
    val_data=val_data,
    deduplicate_train_data=False
)

# Train the model
train(rank=0, num_gpus=1, config_file=config_file)

Checkpoints will be saved in the directory specified in your configuration file.
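The inline lists in the example above are placeholders. In practice, the (language, word, phonemes) tuples usually come from a lexicon file. The following is a minimal sketch, assuming a hypothetical tab-separated file with one `lang<TAB>word<TAB>phonemes` entry per line. The file layout and the helper names `load_lexicon` and `split_train_val` are illustrative and not part of Phonimize.

```python
import csv
import random
from typing import List, Tuple

Entry = Tuple[str, str, str]  # (language code, word, phoneme string)


def load_lexicon(path: str) -> List[Entry]:
    """Read (lang, word, phonemes) tuples from a tab-separated file.

    Hypothetical helper: the TSV layout is an assumption, not a Phonimize format.
    Rows that do not have exactly three non-empty fields are skipped.
    """
    entries: List[Entry] = []
    with open(path, encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            fields = [field.strip() for field in row]
            if len(fields) == 3 and all(fields):
                entries.append((fields[0], fields[1], fields[2]))
    return entries


def split_train_val(entries, val_fraction=0.1, seed=0):
    """Shuffle deterministically and hold out a fraction for validation."""
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction)) if shuffled else 0
    return shuffled[n_val:], shuffled[:n_val]
```

The two lists returned by `split_train_val` have the same tuple shape as `train_data` and `val_data` above, so they can be passed to `preprocess` in their place. Holding out distinct words for validation gives a more meaningful signal than repeating the training entries.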

Inference Example

To perform inference with your trained model:

from phonimize import Phonemizer

# Load your custom model from a checkpoint
phonemizer = Phonemizer.from_checkpoint("checkpoints/best_model.pt")

# Get the phonemes for a given text
phonemes = phonemizer("Phonemizing text is simple!", lang="en_us")
print(phonemes)

To inspect detailed predictions, including confidence scores:

result = phonemizer.phonemise_list(["Phonemizing text is simple!"], lang="en_us")

for word, pred in result.predictions.items():
    print(f"Word: {word}, Phonemes: {pred.phonemes}, Confidence: {pred.confidence}")
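Confidence scores are useful for flagging words that may need a manual pronunciation entry. The sketch below assumes only what the loop above shows: a mapping from word to an object with `.phonemes` and `.confidence` attributes. The helper name `low_confidence_words`, the 0.9 threshold, and the demo values are made up for illustration; stand-in objects replace a real model result so the snippet runs without a checkpoint.

```python
from types import SimpleNamespace


def low_confidence_words(predictions, threshold=0.9):
    """Return (word, phonemes, confidence) for predictions below threshold, lowest first.

    Assumes each value exposes `.phonemes` and `.confidence`, as in the loop above.
    """
    flagged = [
        (word, pred.phonemes, pred.confidence)
        for word, pred in predictions.items()
        if pred.confidence < threshold
    ]
    return sorted(flagged, key=lambda item: item[2])


# Stand-in objects with made-up values, used instead of a real model result.
demo = {
    "text": SimpleNamespace(phonemes="tɛkst", confidence=0.98),
    "imposimpable": SimpleNamespace(phonemes="ɪmpəzɪmpəbəl", confidence=0.41),
}
print(low_confidence_words(demo, threshold=0.9))
```

With a real result, `result.predictions` would be passed in place of `demo`. A suitable threshold depends on the model and should be chosen by inspecting its outputs.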

TorchScript Export

For optimized performance, you can easily export your trained Transformer model to TorchScript:

import torch
from phonimize import Phonemizer

# Load the model from a checkpoint
phonemizer = Phonemizer.from_checkpoint("checkpoints/best_model.pt")

# Convert the model to a TorchScript module
scripted_model = torch.jit.script(phonemizer.predictor.model)
phonemizer.predictor.model = scripted_model

# Run inference with the TorchScript model
result = phonemizer("Running the TorchScript model!", lang="en_us")
print(result)
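Whether the export actually speeds up inference depends on the model and hardware, so it is worth measuring. The following is a minimal timing sketch using only the standard library. The helper name `median_latency` is illustrative, and `fake_phonemizer` is a stand-in callable so the snippet runs without a checkpoint.

```python
import time
from statistics import median


def median_latency(fn, text, repeats=20, warmup=3, **kwargs):
    """Median wall-clock seconds for fn(text, **kwargs) over `repeats` timed calls."""
    # Untimed warm-up calls, so one-off setup cost does not skew the result.
    for _ in range(warmup):
        fn(text, **kwargs)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(text, **kwargs)
        timings.append(time.perf_counter() - start)
    return median(timings)


# Stand-in callable (not a real model) so the sketch is self-contained.
def fake_phonemizer(text, lang="en_us"):
    return text.lower()


print(f"{median_latency(fake_phonemizer, 'Hello world', lang='en_us'):.6f} s")
```

With a real model, call `median_latency(phonemizer, text, lang="en_us")` once before and once after assigning the scripted model, then compare the two numbers. The warm-up calls matter here because the first few invocations of a scripted module are typically slower than later ones.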

Pre-trained Models

The pre-trained model listed below has been modified for use with the phonemize library.

Model         Language  Dataset  Repo Version
phonemize_m1  en_us     cmudict  0.1.0

Acknowledgment

Phonimize is inspired by DeepPhonemizer and has been refactored and optimized for simplicity, speed, and modern Python environments.

License

This project is released under the MIT License.
