Multilingual grapheme-to-phoneme (G2P) conversion using Transformer models.

Phonemize

Phonemize is a multilingual grapheme-to-phoneme (G2P) conversion library built with Transformer models. It’s designed for high accuracy, fast inference, and simple integration into text-to-speech (TTS) or other speech-related systems.


Key Features

  • Easy-to-use API: A simple interface for both training and inference.
  • Multilingual Support: Train a single model on multiple languages.
  • High Performance: Fast and accurate predictions powered by Transformer models.
  • Custom Training: Effortlessly train your own models in just a few lines of code.
  • Optimized for TTS: Ideal for both real-time and offline text-to-speech pipelines.

Installation

To install Phonemize, use the following command:

pip install phonemize
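
After installing, it can be useful to confirm which version ended up in the active environment. The sketch below relies only on the standard library's importlib.metadata (available from Python 3.8 onward); the helper name installed_version is hypothetical and is not part of Phonemize.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("phonemize")
    print(version if version else "phonemize is not installed in this environment")
```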

To train your own models, install the full package with all training dependencies:


Quickstart

Load a pre-trained model and perform phoneme prediction with this simple example:

from phonemize import Phonemizer

# Load the pre-trained model from a checkpoint
phonemizer = Phonemizer.from_checkpoint("phonemize_m1.pt")

# Phonemize an English text
result = phonemizer("Phonemizing an English text is imposimpable!", lang="en_us")

# Print the result
print(result)

Output:

foʊnɪmaɪzɪŋ æn ɪŋglɪʃ tɛkst ɪz ɪmpəzɪmpəbəl!
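
For TTS front ends it is often handy to line each input word up with its phoneme string. The following is a minimal sketch that works on plain strings only and assumes the output keeps one whitespace-separated phoneme group per input word, as the example output above suggests; align_words is a hypothetical helper, not a Phonemize function.

```python
from typing import List, Tuple


def align_words(text: str, phonemes: str) -> List[Tuple[str, str]]:
    """Pair each whitespace-separated word with its phoneme group."""
    words = text.split()
    groups = phonemes.split()
    if len(words) != len(groups):
        # The one-group-per-word assumption does not hold for this input.
        raise ValueError(
            f"cannot align {len(words)} words with {len(groups)} phoneme groups"
        )
    return list(zip(words, groups))


# Input and output copied from the Quickstart example.
text = "Phonemizing an English text is imposimpable!"
phonemes = "foʊnɪmaɪzɪŋ æn ɪŋglɪʃ tɛkst ɪz ɪmpəzɪmpəbəl!"

for word, group in align_words(text, phonemes):
    print(f"{word}\t{group}")
```

If the model ever merges or splits words, the length check raises instead of silently misaligning the pairs.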

Training Your Own Model

You can easily train your own forward or autoregressive Transformer model. All configuration parameters are defined in a simple YAML file (e.g., configs/forward.yaml).

from phonemize.preprocess import preprocess
from phonemize.train import train

# Define your training data
train_data = [
    ("en_us", "young", "jʌŋ"),
    ("de", "benützten", "bənʏt͡stn̩")
] * 1000

# Define your validation data
val_data = [
    ("en_us", "young", "jʌŋ"),
    ("de", "benützten", "bənʏt͡stn̩")
] * 100

# Specify the configuration file
config_file = "configs/forward.yaml"

# Preprocess the data
preprocess(
    config_file=config_file,
    train_data=train_data,
    val_data=val_data,
    deduplicate_train_data=False
)

# Train the model
train(rank=0, num_gpus=1, config_file=config_file)

Checkpoints will be saved in the directory specified in your configuration file.
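
The training snippet above hard-codes its (language, word, phonemes) tuples. In practice the entries usually come from a lexicon file, so here is a hedged sketch that reads them from tab-separated text and splits them into training and validation lists. The file layout (lang, word, phonemes per row) and the helper names read_entries and split_entries are assumptions made for illustration; they are not part of Phonemize.

```python
import csv
import io
import random
from typing import List, Sequence, TextIO, Tuple

Entry = Tuple[str, str, str]  # (language, word, phonemes)


def read_entries(handle: TextIO) -> List[Entry]:
    """Read `lang<TAB>word<TAB>phonemes` rows, skipping blank or malformed rows."""
    entries = []
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 3:
            continue  # blank line or wrong number of columns
        lang, word, phonemes = (field.strip() for field in row)
        if lang and word and phonemes:
            entries.append((lang, word, phonemes))
    return entries


def split_entries(
    entries: Sequence[Entry], val_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[Entry], List[Entry]]:
    """Shuffle deterministically and return (train_data, val_data)."""
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# In-memory stand-in for a lexicon file, so the sketch runs as-is.
sample = io.StringIO("en_us\tyoung\tjʌŋ\nde\tbenützten\tbənʏt͡stn̩\n\nbroken line\n")
entries = read_entries(sample)
train_data, val_data = split_entries(entries, val_fraction=0.5)
print(len(train_data), len(val_data))
```

The resulting train_data and val_data lists have the same shape as the ones passed to preprocess above. Note that split_entries always reserves at least one entry for validation, so a single-entry lexicon leaves the training list empty.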

Inference Example

To perform inference with your trained model:

from phonemize import Phonemizer

# Load your custom model from a checkpoint
phonemizer = Phonemizer.from_checkpoint("checkpoints/best_model.pt")

# Get the phonemes for a given text
phonemes = phonemizer("Phonemizing text is simple!", lang="en_us")
print(phonemes)

To inspect detailed predictions, including confidence scores:

result = phonemizer.phonemise_list(["Phonemizing text is simple!"], lang="en_us")

for word, pred in result.predictions.items():
    print(f"Word: {word}, Phonemes: {pred.phonemes}, Confidence: {pred.confidence}")
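
Confidence scores are most useful for flagging words that deserve a manual check or a dictionary override. The sketch below filters a word-to-prediction mapping by a threshold. It only assumes that each prediction exposes the phonemes and confidence attributes used in the loop above; StubPrediction, low_confidence_words, the 0.8 threshold, and the confidence values are illustrative placeholders, not real model output.

```python
from dataclasses import dataclass
from typing import Any, Dict, Mapping


@dataclass
class StubPrediction:
    """Stand-in for a prediction object; carries only the two attributes used here."""
    phonemes: str
    confidence: float


def low_confidence_words(
    predictions: Mapping[str, Any], threshold: float = 0.8
) -> Dict[str, float]:
    """Return {word: confidence} for predictions below `threshold`, least confident first."""
    flagged = {
        word: pred.confidence
        for word, pred in predictions.items()
        if pred.confidence < threshold
    }
    return dict(sorted(flagged.items(), key=lambda item: item[1]))


# Placeholder confidences, chosen only to exercise the filter.
predictions = {
    "Phonemizing": StubPrediction("foʊnɪmaɪzɪŋ", 0.62),
    "text": StubPrediction("tɛkst", 0.99),
    "is": StubPrediction("ɪz", 0.97),
}
print(low_confidence_words(predictions))
```

With a real model, result.predictions from the snippet above could be passed in place of the stub dictionary, provided it behaves like a mapping as that loop implies.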

TorchScript Export

For faster inference, you can convert your trained Transformer model to TorchScript with torch.jit.script and swap it back into the phonemizer:

import torch
from phonemize import Phonemizer

# Load the model from a checkpoint
phonemizer = Phonemizer.from_checkpoint("checkpoints/best_model.pt")

# Convert the model to a TorchScript module
scripted_model = torch.jit.script(phonemizer.predictor.model)
phonemizer.predictor.model = scripted_model

# Run inference with the TorchScript model
phonemizer("Running the TorchScript model!", lang="en_us")
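
Whether scripting actually speeds things up depends on the model and hardware, so it is worth measuring. Below is a small, library-agnostic timing sketch using only the standard library; median_latency_ms is a hypothetical helper, and the workload in the demo is a stand-in so the code runs without a checkpoint. The warm-up calls matter because the first few invocations of a scripted module can include one-off optimization work.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(
    fn: Callable[[], object], warmup: int = 3, repeats: int = 20
) -> float:
    """Call `fn` `warmup` times untimed, then return the median of `repeats` timed calls in ms."""
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


if __name__ == "__main__":
    # Stand-in workload; replace with a call into the phonemizer to measure it.
    latency = median_latency_ms(lambda: sum(range(10_000)))
    print(f"median latency: {latency:.3f} ms")
```

To compare the two variants, one would time a call such as lambda: phonemizer("Running the TorchScript model!", lang="en_us") once before and once after assigning the scripted module.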

Pre-trained Models

The pre-trained model listed below has been modified for use with the phonemize library.

Model         Language  Dataset  Repo Version
phonemize_m1  en_us     cmudict  0.1.0

Acknowledgment

Phonemize is inspired by DeepPhonemizer, and has been refactored and optimized for simplicity, speed, and modern Python environments.

License

This project is released under the MIT License.

Phonemize is compatible with Python 3.8+. Learn more at: https://github.com/arcosoph/phonemize

