Burmese text normalizer, word breaker, converter, cleaner, and phonemizer for speech-related tasks.
BURMESE PHONEMIZER AND CLEANER (BPC)
Installation
$ pip install bpc
or
$ pip install git+https://github.com/1chimaruGin/Burmese_Phomizer_and_Cleaner.git
Usage
For text cleaning
from bpc import Cleaner
cc = Cleaner()
cc.clean_text("မင်္ဂလာပါ? မင်္ဂလာပါ။ ၀န်းရံ ဝ၁၂၃၄ 5B")
# output: မင်္ဂလာပါ မင်္ဂလာပါ ၀န်းရံ ဝ၁၂၃၄ 5B
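To illustrate the kind of transformation shown in the output above, here is a minimal, self-contained sketch of punctuation stripping. It is not the bpc implementation: the helper name `strip_punctuation` and the punctuation set are assumptions made for illustration, and the real `Cleaner` may apply further normalization steps.

```python
import re

# Hypothetical helper, not part of bpc. The punctuation set below is an
# assumption: a few ASCII marks plus the Myanmar signs U+104A and U+104B.
PUNCTUATION = "?!,.\u104a\u104b"


def strip_punctuation(text: str) -> str:
    # Delete every character that appears in the assumed punctuation set.
    table = str.maketrans("", "", PUNCTUATION)
    stripped = text.translate(table)
    # Collapse runs of whitespace left behind and trim both ends.
    return re.sub(r"\s+", " ", stripped).strip()


print(strip_punctuation("မင်္ဂလာပါ? မင်္ဂလာပါ။ ၀န်းရံ ဝ၁၂၃၄ 5B"))
```

Digits and letters, Burmese or Latin, pass through untouched in this sketch; only the listed marks are removed.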
For phonemization
from bpc import BurmesePhonemizer
bp = BurmesePhonemizer()
bp.text_to_phone("မင်္ဂလာပါ")
# output: ['m', 'ŋ', 'ɡ', 'l', 't', 's', 'p', 'ˈe']
For data preparation
from bpc.dataset import PrepareDataset
dataset = PrepareDataset()
dataset.prepare_data(path='path/to/dataset', method='kfold', save=True)
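Assuming `method='kfold'` refers to a k-fold cross-validation split (the source does not say so explicitly), the idea can be sketched in plain Python as follows. The function `kfold_splits` and the file names are hypothetical and are not part of bpc; how `PrepareDataset` actually partitions or saves data may differ.

```python
import random


def kfold_splits(items, k=5, seed=0):
    """Yield (train, validation) lists for each of k folds."""
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and the number of items")
    shuffled = list(items)
    # A seeded shuffle keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    # Fold i takes every k-th item starting at offset i,
    # so fold sizes differ by at most one.
    folds = [shuffled[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, validation


# Hypothetical utterance file names, for illustration only.
utterances = [f"utt_{n:03d}.wav" for n in range(10)]
for train, validation in kfold_splits(utterances, k=5):
    print(len(train), len(validation))
```

Each item lands in exactly one validation fold, so every utterance is used for validation once and for training in the remaining folds.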
References
Citations
@inproceedings{watanabe2018espnet,
author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
title={{ESPnet}: End-to-End Speech Processing Toolkit},
year={2018},
booktitle={Proceedings of Interspeech},
pages={2207--2211},
doi={10.21437/Interspeech.2018-1456},
url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}
@article{Bernard2021,
doi = {10.21105/joss.03958},
url = {https://doi.org/10.21105/joss.03958},
year = {2021},
publisher = {The Open Journal},
volume = {6},
number = {68},
pages = {3958},
author = {Mathieu Bernard and Hadrien Titeux},
title = {Phonemizer: Text to Phones Transcription for Multiple Languages in Python},
journal = {Journal of Open Source Software}
}