A Python library to convert text to the phoneme sequences used for XPhoneBERT

These details have not been verified by PyPI

Project links

Homepage

Project description

Text2PhonemeSequence: a Python library for converting text into the phoneme sequences used by XPhoneBERT

Installation
Usage example

Installation

To install Text2PhonemeSequence, run the following command:

$ pip install text2phonemesequence

Usage example

The library uses the CharsiuG2P and segments toolkits to convert text to phoneme sequences. The available values for pretrained_g2p_model and language are listed in the CharsiuG2P repository. For languages in which spaces do not mark word boundaries, such as Vietnamese (where spaces separate syllables rather than words) and Chinese, apply an external word tokenizer to the dataset or sentences before feeding them into Text2PhonemeSequence.
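The pre-tokenization step described above can be sketched as follows. This is a minimal illustration, assuming that Text2PhonemeSequence treats single spaces as word boundaries. The greedy forward-maximum-matching segmenter, the `forward_max_match` and `to_space_separated` helpers, and the toy dictionary are all hypothetical stand-ins for a real Chinese tokenizer; none of them are part of the library.

```python
from typing import Iterable, List


def forward_max_match(text: str, dictionary: Iterable[str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching: at each position take the longest
    dictionary word (up to max_len characters); fall back to one character."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    vocab = set(dictionary)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit;
        # a single character is always accepted, so the loop always advances.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in vocab:
                tokens.append(candidate)
                i += length
                break
    return tokens


def to_space_separated(text: str, dictionary: Iterable[str]) -> str:
    """Join the segmented words with single spaces (the input form assumed
    here for Text2PhonemeSequence)."""
    return " ".join(forward_max_match(text, dictionary))


# Hypothetical toy dictionary; a real tokenizer would replace all of this.
TOY_DICT = {"我们", "喜欢", "自然", "语言", "处理", "自然语言"}

segmented = to_space_separated("我们喜欢自然语言处理", TOY_DICT)
print(segmented)  # 我们 喜欢 自然语言 处理
# phonemes = model.infer_sentence(segmented)  # model loaded as in the usage example
```

Forward maximum matching is used here only because it fits in a few lines; a real pipeline would use a dedicated segmenter. The commented `infer_sentence` call assumes `model` was loaded with a Chinese `language` code taken from the CharsiuG2P repository.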

from text2phonemesequence import Text2PhonemeSequence

# Load Text2PhonemeSequence
model = Text2PhonemeSequence(pretrained_g2p_model='charsiu/g2p_multilingual_byT5_small_100', language='eng-us', is_cuda=False)


# Convert a raw corpus
model.infer_dataset(input_file="/absolute/path/to/input/file", output_file="/absolute/path/to/output/file", batch_size=64)  # batch_size is the number of words fed into the CharsiuG2P model per batch.

# Convert a raw sentence
model.infer_sentence("The overwhelming majority of people in this country know how to sift the wheat from the chaff in what they hear and what they read .")
# Output: "ˈθ i ▁ ˈo ʊ v ɝ ˌw ɛ ɫ m ɪ ŋ ▁ m ə ˈd ʒ ɔ ɹ ə t i ▁ ˈɑ f ▁ ˈp i p ə ɫ ▁ ˈɪ n ▁ ˈθ ɪ s ▁ ˈk a ʊ n t ɹ i ▁ ˈn o ʊ ▁ ˈh o ʊ ▁ ˈt o ʊ ▁ ˈs ɪ f t ▁ ˈθ i ▁ ˈw i t ▁ ˈf ɹ ɑ m ▁ ˈθ i ▁ ˈt ʃ æ f ▁ ˈɪ n ▁ ˈw æ t ▁ ˈθ e ɪ ▁ ˈh ɪ ɹ ▁ ˈæ n d ▁ ˈw æ t ▁ ˈθ e ɪ ▁ ˈɹ ɛ d ▁ ."
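To work with the result programmatically, the output string can be split into per-word phoneme lists. This is a minimal sketch, assuming the format seen in the sample output above: phonemes separated by single spaces and words separated by a standalone `▁` token. `split_phoneme_sequence` is a hypothetical helper, not part of the library's API.

```python
from typing import List


def split_phoneme_sequence(sequence: str, boundary: str = "▁") -> List[List[str]]:
    """Split a space-separated phoneme string into one phoneme list per word,
    treating the standalone boundary token as the word separator."""
    words: List[List[str]] = [[]]
    for token in sequence.split():
        if token == boundary:
            words.append([])  # start collecting the next word
        else:
            words[-1].append(token)
    # Drop empty groups, e.g. from a leading, trailing or doubled boundary.
    return [w for w in words if w]


output = "ˈθ i ▁ ˈo ʊ v ɝ ˌw ɛ ɫ m ɪ ŋ ▁ m ə ˈd ʒ ɔ ɹ ə t i"
for phonemes in split_phoneme_sequence(output):
    print(phonemes)
```

In the full sample output the sentence-final `.` follows a `▁` boundary, so this helper returns it as a word of its own.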

Project details

Release history

0.1.4 (this version): Jun 6, 2023
0.1.3: Jun 4, 2023
0.1.2: Jun 4, 2023
0.1.1: Jun 4, 2023
0.1.0: Jun 4, 2023
0.0.9: Jun 4, 2023
0.0.8: May 30, 2023
0.0.7: May 29, 2023
0.0.6: May 21, 2023
0.0.5: May 20, 2023
0.0.4: May 19, 2023
0.0.3: May 19, 2023
0.0.2: May 19, 2023
0.0.1: May 19, 2023

Download files

Source Distribution

text2phonemesequence-0.1.4.tar.gz (4.5 kB)

Uploaded Jun 6, 2023 Source

File details

Details for the file text2phonemesequence-0.1.4.tar.gz.

File metadata

Download URL: text2phonemesequence-0.1.4.tar.gz
Upload date: Jun 6, 2023
Size: 4.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.28.1 setuptools/47.1.1.post20200604 requests-toolbelt/0.9.1 tqdm/4.63.0 CPython/3.7.7

File hashes

Hashes for text2phonemesequence-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`0922ed73841199c0e661102f6759a5685a95f080e0a8db1670eece2ac036d8e5`
MD5	`d52fc6efb5e6be6d961a04ba90eca933`
BLAKE2b-256	`1c02715f86f204ad1d75a97706c6629ea9ba5de520726d31197c335ecfe7d51a`
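A downloaded copy of the source distribution can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; the local file name passed to `Path` is an assumption about where the archive was saved, and `sha256_of` and `verify` are illustrative helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for text2phonemesequence-0.1.4.tar.gz.
EXPECTED_SHA256 = "0922ed73841199c0e661102f6759a5685a95f080e0a8db1670eece2ac036d8e5"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest matches the expected one."""
    return sha256_of(path) == expected.lower()


archive = Path("text2phonemesequence-0.1.4.tar.gz")  # assumed local download path
if archive.exists():
    print("OK" if verify(archive) else "MISMATCH")
```

pip can enforce the same check at install time through its hash-checking mode, by adding `--hash=sha256:<digest>` after the pinned requirement in a requirements file.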