Text2PhonemeSequence: A Python Library to convert text to phoneme sequences used for XPhoneBERT

Installation

  • To install Text2PhonemeSequence, run the following command:

    $ pip install text2phonemesequence

Usage example

The library uses CharsiuG2P to convert text to phoneme sequences. The accepted values for pretrained_g2p_model and language are listed in the CharsiuG2P repository. For languages whose words are not separated by spaces, such as Vietnamese and Chinese, segment the text into words with an external tokenizer before passing a dataset or sentence to Text2PhonemeSequence.
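The pre-segmentation step described above can be sketched as follows. This is a minimal illustration, not part of the library: `longest_match_segment`, `presegment`, and `toy_vocab` are hypothetical names, and the greedy longest-match segmenter is only a stand-in for a real external tokenizer. It also assumes the library expects words separated by single spaces.

```python
from typing import Callable, Iterable, List


def longest_match_segment(text: str, vocab: Iterable[str]) -> List[str]:
    """Toy greedy longest-match segmenter (stand-in for a real tokenizer)."""
    words = set(vocab)
    max_len = max((len(w) for w in words), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in words:
                tokens.append(piece)
                i += size
                break
    return tokens


def presegment(text: str, segmenter: Callable[[str], List[str]]) -> str:
    """Join the segmenter's tokens with single spaces, dropping blank tokens."""
    return " ".join(tok for tok in segmenter(text) if tok.strip())


# Hypothetical three-word vocabulary, for illustration only.
toy_vocab = ["我", "喜欢", "音乐"]
segmented = presegment("我喜欢音乐", lambda s: longest_match_segment(s, toy_vocab))
print(segmented)

# The space-separated string could then be passed to infer_sentence on a
# model loaded with the matching language code (not run here).
```

In practice, the lambda would be replaced by a call into whichever word segmenter suits the target language; only the space-joined output matters to the conversion step.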

from text2phonemesequence import Text2PhonemeSequence

# Load Text2PhonemeSequence
model = Text2PhonemeSequence(pretrained_g2p_model='charsiu/g2p_multilingual_byT5_tiny_16_layers_100', language='eng-uk', is_cuda=False)


# Convert a raw corpus
model.infer_dataset(input_file="/absolute/path/to/input/file", output_file="/absolute/path/to/output/file")

# Convert a raw sentence
model.infer_sentence("The overwhelming majority of people in this country know how to sift the wheat from the chaff in what they hear and what they read .")
# Output: "ˈθ i ▁ ˈo ʊ v ɝ ˌw ɛ ɫ m ɪ ŋ ▁ m ə ˈd ʒ ɔ ɹ ə t i ▁ ˈɑ f ▁ ˈp i p ə ɫ ▁ ˈɪ n ▁ ˈθ ɪ s ▁ ˈk a ʊ n t ɹ i ▁ ˈn o ʊ ▁ ˈh o ʊ ▁ ˈt o ʊ ▁ ˈs ɪ f t ▁ ˈθ i ▁ ˈw i t ▁ ˈf ɹ ɑ m ▁ ˈθ i ▁ ˈt ʃ æ f ▁ ˈɪ n ▁ ˈw æ t ▁ ˈθ e ɪ ▁ ˈh ɪ ɹ ▁ ˈæ n d ▁ ˈw æ t ▁ ˈθ e ɪ ▁ ˈɹ ɛ d ▁ ."
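
In the example output above, phonemes are separated by spaces and words by the "▁" symbol. A downstream script might want the sequence grouped by word; the sketch below shows one way to do that. `split_phoneme_sequence` is a hypothetical helper, not part of the library, and it assumes the output format always follows the pattern shown in the example.

```python
from typing import List

WORD_BOUNDARY = "▁"  # word separator as it appears in the example output


def split_phoneme_sequence(sequence: str) -> List[List[str]]:
    """Group a phoneme string into one list of phonemes per word."""
    words: List[List[str]] = []
    for chunk in sequence.split(WORD_BOUNDARY):
        phonemes = chunk.split()  # phonemes are whitespace-separated
        if phonemes:  # skip empty chunks from leading/trailing separators
            words.append(phonemes)
    return words


# A shortened excerpt of the example output: "The overwhelming ."
sample = "ˈθ i ▁ ˈo ʊ v ɝ ˌw ɛ ɫ m ɪ ŋ ▁ ."
words = split_phoneme_sequence(sample)
print(len(words), words[0])
```

Stress marks such as "ˈ" and "ˌ" stay attached to the phoneme they precede, because the split happens only on whitespace and on the word-boundary symbol.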
