A Python library for fast text-to-phoneme conversion, used for XPhoneBERT (forked and enhanced from the original work by Linh The Nguyen)

text2phonemefast: A Python Library for Fast Text to Phoneme Conversion

Fork Notice: This repository is maintained by Nguyễn Mạnh Cường as a fork with enhancements from the original Text2PhonemeSequence library created by Linh The Nguyen. Thanks to Linh The Nguyen and the co-developers of the project.

This repository is an enhanced and faster version of the original Text2PhonemeSequence library, which converts text to phoneme sequences for XPhoneBERT.

Key Improvements

Vietnamese Pronunciation Fixes

  • ✅ Fixed "uy" incorrectly pronounced as "ui"
  • ✅ Fixed "gì" incorrectly pronounced as "ghì"
  • ✅ Fixed "oo" sound pronunciation
  • ✅ Fixed "r", "d", "gi" being pronounced identically
  • 🔄 In progress: Fixing "s" and "x" pronounced identically

Performance & Architecture Enhancements

  • ✅ Applied phoneme post-processing to the dataset inference method (improved consistency)
  • ✅ Refactored codebase for better organization and maintainability
  • ✅ Created a unique phoneme dictionary per word (instead of segmenting) for improved speed
  • ✅ Words missing from the G2P dictionary are now saved after their first conversion, so they do not have to pass through the pretrained G2P model again (improved speed)
  • ✅ Merged the Vietnamese and English TSV dictionaries for easier multilingual support (Vietnamese takes priority where the two overlap, with an estimated 405 overlapping sounds)
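The caching idea in the list above (look a word up in the dictionary first, and store any word the model had to convert) can be sketched as follows. This is a minimal illustration, not the library's actual implementation: the class name `CachedG2P`, the `fallback` callable, and the one-entry-per-line `word<TAB>phonemes` layout are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable, Dict


class CachedG2P:
    """Word-level phoneme lookup backed by a TSV file (hypothetical sketch)."""

    def __init__(self, dict_path: str, fallback: Callable[[str], str]) -> None:
        self.dict_path = Path(dict_path)
        # `fallback` stands in for the slow pretrained G2P model (assumption).
        self.fallback = fallback
        self.entries: Dict[str, str] = {}
        if self.dict_path.exists():
            with self.dict_path.open(encoding="utf-8") as f:
                for line in f:
                    line = line.rstrip("\r\n")
                    if "\t" not in line:
                        continue  # skip blank or malformed lines
                    word, phonemes = line.split("\t", 1)
                    # Keep the first pronunciation seen for each word.
                    self.entries.setdefault(word, phonemes)

    def lookup(self, word: str) -> str:
        # Fast path: the word is already in the dictionary.
        if word in self.entries:
            return self.entries[word]
        # Slow path: run the fallback once, then persist the result.
        phonemes = self.fallback(word)
        self.entries[word] = phonemes
        with self.dict_path.open("a", encoding="utf-8") as f:
            f.write(f"{word}\t{phonemes}\n")
        return phonemes
```

With this shape, a second `lookup` of the same word never reaches the fallback, even across process restarts, because the entry is read back from the TSV file.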

Supported Dictionaries

This library supports several specialized pronunciation dictionaries:

  • Standard dictionaries - Automatically downloaded from CharsiuG2P when needed (e.g., vie-n.tsv, eng-us.tsv)
  • Enhanced dictionaries - Specifically optimized for better performance:
    • vie-n.unique.tsv - Vietnamese dictionary with optimized pronunciation
    • eng-us.unique.tsv - English dictionary with optimized pronunciation
    • vie-n.mix-eng-us.tsv - Mixed Vietnamese-English dictionary for multilingual support

When using the .unique or .mix dictionaries, the library will automatically download them from our repository. These specialized dictionaries provide better pronunciation accuracy, especially for Vietnamese.
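To illustrate how a mixed dictionary such as vie-n.mix-eng-us.tsv could be assembled with Vietnamese taking priority, here is a rough sketch. It assumes each dictionary line has the form `word<TAB>phonemes`; the helper names `parse_tsv` and `merge_with_priority` are hypothetical and are not part of the library, and the sample entries are made up for the demonstration.

```python
from typing import Dict, Iterable, Tuple


def parse_tsv(lines: Iterable[str]) -> Dict[str, str]:
    """Parse `word<TAB>phonemes` lines, keeping the first entry per word."""
    entries: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        word, phonemes = line.split("\t", 1)
        entries.setdefault(word, phonemes)
    return entries


def merge_with_priority(
    primary: Dict[str, str], secondary: Dict[str, str]
) -> Tuple[Dict[str, str], int]:
    """Merge two dictionaries; `primary` wins whenever a word is in both."""
    merged = dict(secondary)
    merged.update(primary)  # primary entries overwrite secondary ones
    overlap = len(primary.keys() & secondary.keys())
    return merged, overlap


# Made-up sample entries, for illustration only.
vietnamese = parse_tsv(["me\tm ɛ ˧˧\n", "xin\ts i n ˧˧\n"])
english = parse_tsv(["me\tm i\n", "hello\th ə l o ʊ\n"])
merged, overlap = merge_with_priority(vietnamese, english)
print(merged["me"], overlap)
```

Returning the overlap count alongside the merged dictionary makes it easy to check how many words were resolved in favor of the primary language.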

Installation

To install text2phonemefast:

$ pip install text2phonemefast

Usage Examples

This library uses the CharsiuG2P and segments toolkits for text-to-phoneme conversion. The available values for pretrained_g2p_model and language are documented in the CharsiuG2P repository.

Note: For languages where words are not separated by spaces (e.g., Vietnamese and Chinese), an external tokenizer should be used before feeding the text into the library.

from text2phonemefast import Text2PhonemeFast

# Load Text2PhonemeFast
model = Text2PhonemeFast(
    pretrained_g2p_model="charsiu/g2p_multilingual_byT5_small_100",
    tokenizer="google/byt5-small",
    g2p_dict_path="vie-n.unique.tsv",
    device="cpu",  # or "cuda"
    language="vie-n",
)

# Convert a raw corpus
model.infer_dataset(
    input_file="/absolute/path/to/input/file",
    output_file="/absolute/path/to/output/file",
)

# Convert a raw sentence
model.infer_sentence("xin chào tôi là Mạnh Cường .")
# Output: "s i n ˧˧ ▁ c a w ˧˨ ▁ t o j ˧˧ ▁ l a ˧˨ ▁ m ɛ ŋ ˨ˀ˩ ʔ ▁ k ɯ ə ŋ ˧˨ ▁ ."
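The infer_sentence output shown above is a single string in which phonemes are separated by spaces and words by the "▁" marker. If per-word phoneme lists are more convenient downstream, the string can be split as in the sketch below. The helper name `split_phoneme_sequence` is hypothetical and not part of the library.

```python
from typing import List

WORD_BOUNDARY = "▁"  # word separator used in the output string


def split_phoneme_sequence(sequence: str) -> List[List[str]]:
    """Split a phoneme string into one list of phoneme tokens per word."""
    words = []
    for chunk in sequence.split(WORD_BOUNDARY):
        tokens = chunk.split()  # phonemes and tone marks are space-separated
        if tokens:
            words.append(tokens)
    return words


output = "s i n ˧˧ ▁ c a w ˧˨ ▁ t o j ˧˧ ▁ l a ˧˨ ▁ m ɛ ŋ ˨ˀ˩ ʔ ▁ k ɯ ə ŋ ˧˨ ▁ ."
words = split_phoneme_sequence(output)
print(words[0])  # ['s', 'i', 'n', '˧˧']
```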

Credits

This project is a fork of the original work developed by:

  • Linh The Nguyen - Original author of Text2PhonemeSequence
  • VinAI Research - Developers of XPhoneBERT

Current Maintainer

  • Nguyễn Mạnh Cường
