Text2PhonemeSequence: A Python Library to convert text to phoneme sequences used for XPhoneBERT
Installation
To install Text2PhonemeSequence, run the following command:
$ pip install text2phonemesequence
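To confirm that the package is visible to the current interpreter after installation, the standard library's importlib.metadata module (Python 3.8 or newer) can report the installed distribution version. This is a generic Python check, not a feature of Text2PhonemeSequence. The distribution name is taken from the pip command above, and installed_version is an illustrative helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "text2phonemesequence") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version else "text2phonemesequence is not installed")
```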
Usage example
The library uses CharsiuG2P to convert text to phoneme sequences. Information on the supported values of the pretrained_g2p_model and language parameters can be found in the CharsiuG2P repository. For languages whose words are not separated by spaces, such as Vietnamese and Chinese, users need to run an external word segmenter before feeding a dataset or sentences into Text2PhonemeSequence.
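As noted above, Text2PhonemeSequence expects space-separated words, so unsegmented text must be split first. To illustrate the shape of input an external segmenter should produce, here is a minimal forward maximum matching segmenter over a tiny, hypothetical dictionary. TOY_VOCAB, forward_max_match and segment_for_g2p are illustrative names, not part of Text2PhonemeSequence, and for real corpora a proper word segmenter for the target language should be used instead.

```python
from typing import Iterable, List

# Hypothetical toy dictionary, for illustration only.
TOY_VOCAB = frozenset({"我们", "喜欢", "自然", "语言", "自然语言", "处理"})


def forward_max_match(text: str, vocab: Iterable[str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching: at each position take the longest
    dictionary word (up to max_len characters); fall back to a single character."""
    words = set(vocab)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        if text[i].isspace():  # existing whitespace is skipped
            i += 1
            continue
        # Try the longest candidate first, shrinking until a match (or 1 char).
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in words:
                tokens.append(candidate)
                i += size
                break
    return tokens


def segment_for_g2p(text: str, vocab: Iterable[str]) -> str:
    """Join the segmented words with single spaces, the separator the library expects."""
    return " ".join(forward_max_match(text, vocab))
```

With the toy dictionary, segment_for_g2p("我们喜欢自然语言处理", TOY_VOCAB) returns "我们 喜欢 自然语言 处理", which could then be passed to model.infer_sentence(...) with a language code chosen from the CharsiuG2P list. Greedy matching can mis-segment ambiguous strings, which is one reason a dedicated segmenter is preferable in practice.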
from text2phonemesequence import Text2PhonemeSequence
# Load Text2PhonemeSequence
model = Text2PhonemeSequence(pretrained_g2p_model='charsiu/g2p_multilingual_byT5_tiny_16_layers_100', language='eng-uk', is_cuda=False)
# Convert a raw corpus
model.infer_dataset(input_file="/absolute/path/to/input/file", output_file="/absolute/path/to/output/file")
# Convert a raw sentence
model.infer_sentence("The overwhelming majority of people in this country know how to sift the wheat from the chaff in what they hear and what they read .")
# Output: "ˈθ i ▁ ˈo ʊ v ɝ ˌw ɛ ɫ m ɪ ŋ ▁ m ə ˈd ʒ ɔ ɹ ə t i ▁ ˈɑ f ▁ ˈp i p ə ɫ ▁ ˈɪ n ▁ ˈθ ɪ s ▁ ˈk a ʊ n t ɹ i ▁ ˈn o ʊ ▁ ˈh o ʊ ▁ ˈt o ʊ ▁ ˈs ɪ f t ▁ ˈθ i ▁ ˈw i t ▁ ˈf ɹ ɑ m ▁ ˈθ i ▁ ˈt ʃ æ f ▁ ˈɪ n ▁ ˈw æ t ▁ ˈθ e ɪ ▁ ˈh ɪ ɹ ▁ ˈæ n d ▁ ˈw æ t ▁ ˈθ e ɪ ▁ ˈɹ ɛ d ▁ ."
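In the sample output above, phonemes are separated by single spaces and word boundaries are marked by a standalone "▁" token (the final "." is kept as its own word). Assuming that convention holds for other inputs, a small helper can regroup the flat string into per-word phoneme lists, for example to inspect or align them with the input words. split_words is a hypothetical helper, not part of the library.

```python
from typing import List

# Word-boundary token observed in the sample output (assumed to be U+2581).
WORD_BOUNDARY = "▁"


def split_words(phoneme_sequence: str) -> List[List[str]]:
    """Group a space-separated phoneme string into one list of phonemes per word."""
    words: List[List[str]] = []
    current: List[str] = []
    for token in phoneme_sequence.split():
        if token == WORD_BOUNDARY:
            if current:  # ignore empty groups from leading/repeated boundaries
                words.append(current)
                current = []
        else:
            current.append(token)
    if current:
        words.append(current)
    return words
```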
Source Distribution
File details
Details for the file text2phonemesequence-0.0.3.tar.gz.
File metadata
- Download URL: text2phonemesequence-0.0.3.tar.gz
- Upload date:
- Size: 4.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.28.1 setuptools/47.1.1.post20200604 requests-toolbelt/0.9.1 tqdm/4.63.0 CPython/3.7.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | d7b298b0579e4240ad63302c781c57f4817290a8ea6f8c01607d4e0f028e52d6
MD5 | 873c0d106a72a78dc877075839319c43
BLAKE2b-256 | ae7bf1cd8390bce0eb4c035de89fc7b831cf0c04b4359b2ba9222e3c6290d885
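A manually downloaded copy of the source archive can be checked against the SHA256 digest listed above using Python's standard hashlib module. This is a minimal sketch: file_digest and matches_expected are illustrative helper names, and the location of the downloaded archive is an assumption left to the caller.

```python
import hashlib

# SHA256 digest of text2phonemesequence-0.0.3.tar.gz, copied from the table above.
EXPECTED_SHA256 = "d7b298b0579e4240ad63302c781c57f4817290a8ea6f8c01607d4e0f028e52d6"


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so it is never read into memory at once."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 hex digest equals the expected one (case-insensitive)."""
    return file_digest(path) == expected.lower()
```

For example, matches_expected("text2phonemesequence-0.0.3.tar.gz") should return True for an intact download in the current directory. To check the BLAKE2b-256 row instead, hashlib.blake2b(digest_size=32) produces a digest of the matching length.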