Multilingual grapheme-to-phoneme (G2P) conversion using Transformer models.

Phonimize

Phonimize is a multilingual grapheme-to-phoneme (G2P) conversion library built with Transformer models. It’s designed for high accuracy, fast inference, and simple integration into text-to-speech (TTS) or other speech-related systems.


Key Features

  • Easy-to-use API: A simple interface for both training and inference.
  • Multilingual Support: Train a single model on multiple languages.
  • High Performance: Fast and accurate predictions powered by Transformer models.
  • Custom Training: Train your own models in a few lines of code.
  • Optimized for TTS: Ideal for both real-time and offline text-to-speech pipelines.

Installation

To install Phonimize, use the following command:

pip install phonimize

Quickstart

Load a pre-trained model and perform phoneme prediction with this simple example:

from phonimize import Phonemizer

# Load the pre-trained model from a checkpoint
phonemizer = Phonemizer.from_checkpoint("phonemize_m1.pt")

# Phonemize an English text
result = phonemizer("Phonemizing an English text is imposimpable!", lang="en_us")

# Print the result
print(result)

Output:

foʊnɪmaɪzɪŋ æn ɪŋglɪʃ tɛkst ɪz ɪmpəzɪmpəbəl!

Training Your Own Model

You can train your own forward or autoregressive Transformer model. All configuration parameters are defined in a YAML file (e.g., configs/forward.yaml).

from phonimize.preprocess import preprocess
from phonimize.train import train

# Define your training data
train_data = [
    ("en_us", "young", "jʌŋ"),
    ("de", "benützten", "bənʏt͡stn̩")
] * 1000

# Define your validation data
val_data = [
    ("en_us", "young", "jʌŋ"),
    ("de", "benützten", "bənʏt͡stn̩")
] * 100

# Specify the configuration file
config_file = "configs/forward.yaml"

# Preprocess the data
preprocess(
    config_file=config_file,
    train_data=train_data,
    val_data=val_data,
    deduplicate_train_data=False
)

# Train the model
train(rank=0, num_gpus=1, config_file=config_file)

Checkpoints will be saved in the directory specified in your configuration file.
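
The training tuples above are written inline for illustration. Larger datasets may instead come from a pronunciation dictionary file. The following is a minimal sketch, assuming a hypothetical tab-separated layout with one lang, word, phonemes entry per line. The file layout and the helper names load_entries and split_entries are assumptions for illustration and are not part of Phonimize.

```python
import csv
import io
import random


def load_entries(handle):
    """Read (lang, word, phonemes) tuples from a tab-separated text stream."""
    entries = []
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank or malformed lines instead of failing on them.
        if len(row) != 3:
            continue
        lang, word, phonemes = (field.strip() for field in row)
        if lang and word and phonemes:
            entries.append((lang, word, phonemes))
    return entries


def split_entries(entries, val_fraction=0.1, seed=42):
    """Shuffle entries deterministically and split them into train and validation lists."""
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical sample data; a real file would be opened with
# open(path, encoding="utf-8", newline="").
sample = io.StringIO(
    "en_us\tyoung\tjʌŋ\nde\tbenützten\tbənʏt͡stn̩\n\nen_us\ttext\ttɛkst\n"
)
entries = load_entries(sample)
train_data, val_data = split_entries(entries, val_fraction=0.34)
```

The resulting train_data and val_data lists have the same shape as the ones passed to preprocess above, and a held-out validation split avoids validating on words the model has already seen during training.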

Inference Example

To perform inference with your trained model:

from phonimize import Phonemizer

# Load your custom model from a checkpoint
phonemizer = Phonemizer.from_checkpoint("checkpoints/best_model.pt")

# Get the phonemes for a given text
phonemes = phonemizer("Phonemizing text is simple!", lang="en_us")
print(phonemes)

To inspect detailed predictions, including confidence scores:

result = phonemizer.phonemise_list(["Phonemizing text is simple!"], lang="en_us")

for word, pred in result.predictions.items():
    print(f"Word: {word}, Phonemes: {pred.phonemes}, Confidence: {pred.confidence}")
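
Confidence scores can be used to flag words that may need manual review. The sketch below assumes only what the loop above shows: a mapping from each word to an object with phonemes and confidence attributes. The Prediction dataclass is a stand-in so the example runs without a model, and the 0.8 threshold and the sample values are arbitrary assumptions.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Stand-in for a per-word prediction; not the library's own class."""
    phonemes: str
    confidence: float


def low_confidence_words(predictions, threshold=0.8):
    """Return (word, confidence) pairs below the threshold, least confident first."""
    flagged = [
        (word, pred.confidence)
        for word, pred in predictions.items()
        if pred.confidence < threshold
    ]
    return sorted(flagged, key=lambda item: item[1])


# Made-up values for illustration only.
predictions = {
    "text": Prediction("tɛkst", 0.97),
    "imposimpable": Prediction("ɪmpəzɪmpəbəl", 0.41),
    "phonemizing": Prediction("foʊnɪmaɪzɪŋ", 0.73),
}
flagged = low_confidence_words(predictions)
```

With a real model, result.predictions from the snippet above would take the place of the made-up dictionary.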

TorchScript Export

For optimized performance, you can easily export your trained Transformer model to TorchScript:

import torch
from phonimize import Phonemizer

# Load the model from a checkpoint
phonemizer = Phonemizer.from_checkpoint("checkpoints/best_model.pt")

# Convert the model to a TorchScript module
scripted_model = torch.jit.script(phonemizer.predictor.model)
phonemizer.predictor.model = scripted_model

# Run inference with the TorchScript model
phonemes = phonemizer("Running the TorchScript model!", lang="en_us")
print(phonemes)
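
Whether the scripted model is actually faster depends on the model and the hardware, so it can be worth timing both variants. The helper below is a generic sketch that uses only the standard library. The name mean_latency is hypothetical, and str.upper is a placeholder for a real callable that wraps the phonemizer.

```python
import time


def mean_latency(fn, inputs, warmup=1, repeats=5):
    """Return the mean wall-clock seconds per call of fn over the given inputs."""
    if not inputs:
        raise ValueError("inputs must not be empty")
    # Warm-up runs are excluded from timing so one-off setup costs do not skew the result.
    for _ in range(warmup):
        for item in inputs:
            fn(item)
    start = time.perf_counter()
    for _ in range(repeats):
        for item in inputs:
            fn(item)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(inputs))


# Placeholder callable; swap in the eager and scripted phonemizers to compare them.
sentences = ["Running the TorchScript model!", "Phonemizing text is simple!"]
latency = mean_latency(str.upper, sentences)
print(f"{latency * 1e6:.2f} µs per call")
```

Running the same helper once before and once after the torch.jit.script step gives a like-for-like comparison on your own inputs.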

Pre-trained Models

The pre-trained model listed below has been modified for use with this library.

Model         Language  Dataset  Repo Version
phonemize_m1  en_us     cmudict  0.1.0

Acknowledgment

Phonimize is inspired by DeepPhonemizer, and has been refactored and optimized for simplicity, speed, and modern Python environments.

Phonimize is compatible with Python 3.8+ and distributed under the MIT license. Learn more at: https://github.com/arcosoph/phonemize

License

This project is released under the MIT License.
