Python library for Pyidaungsu Myanmar languages

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Pyidaungsu

Python library for the Myanmar language. Useful for natural language processing and text preprocessing of Myanmar text.

Installation

pip install pyidaungsu

Usage

Language detection (Myanmar <Zawgyi, Unicode>, Karen, Mon, Shan)

Starting with pyidaungsu 0.0.9, detection covers not only the Zawgyi and Unicode encodings of Myanmar text but also other languages: Mon, Karen, and Shan.

import pyidaungsu as pds

# language detection
pds.detect("ထမင်းစားပြီးပြီလား")
>> "mm_uni"
pds.detect("ထမင္းစားၿပီးၿပီလား")
>> "mm_zg"
pds.detect("တၢ်သိၣ်လိတၢ်ဖးလံာ် ကွဲးလံာ်အိၣ်လၢ မ့ရ့ၣ်အစုပူၤလီၤ.")
>> "karen"
pds.detect("ဇၟာပ်မၞိဟ်ဂှ် ကတဵုဒှ်ကၠုင် ပ္ဍဲကဵုဂကောံမွဲ ဖအိုတ်ရ၊၊")
>> "mon"
pds.detect("ၼႂ်းဢိူင်ႇမိူင်းၽူင်း ၸႄႈဝဵင်းတႃႈၶီႈလဵၵ်း ၾႆးမႆႈႁိူၼ်း ၵူၼ်းဝၢၼ်ႈ လင်ၼိုင်ႈ")
>> "shan"
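
The labels returned above can be used to sort a mixed corpus before further processing. The following is a minimal sketch, not part of pyidaungsu: `group_by_language` is a hypothetical helper that accepts any detector callable, so `pds.detect` can be passed in when the library is installed.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_language(
    texts: Iterable[str], detect: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Group texts under the label that `detect` returns for each one.

    `detect` is any callable mapping a string to a label such as
    "mm_uni", "mm_zg", "karen", "mon" or "shan" (the labels shown in
    the examples above). This helper is hypothetical, not a library API.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for text in texts:
        groups[detect(text)].append(text)
    return dict(groups)


# With the library installed, usage would look like:
#   import pyidaungsu as pds
#   groups = group_by_language(lines, pds.detect)
#   zawgyi_lines = groups.get("mm_zg", [])
```

Passing the detector as an argument keeps the helper easy to test with a stub and independent of the installed library version.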

Zawgyi-Unicode conversion

# convert to zawgyi
pds.cvt2zgi("ထမင်းစားပြီးပြီလား")
>> "ထမင္းစားၿပီးၿပီလား"

# convert to unicode
pds.cvt2uni("ထမင္းစားၿပီးၿပီလား")
>> "ထမင်းစားပြီးပြီလား"
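
Detection and conversion are often combined so that only Zawgyi input is converted and everything else passes through unchanged. The sketch below assumes the `"mm_zg"` label shown in the detection examples; `normalize_to_unicode` is a hypothetical helper, and the detector and converter are passed in as callables (for example `pds.detect` and `pds.cvt2uni`).

```python
from typing import Callable

# Label shown for Zawgyi-encoded Myanmar text in the detection examples.
ZAWGYI_LABEL = "mm_zg"


def normalize_to_unicode(
    text: str,
    detect: Callable[[str], str],
    to_unicode: Callable[[str], str],
) -> str:
    """Convert `text` to Unicode only when `detect` labels it as Zawgyi.

    Hypothetical helper, not a pyidaungsu function. Text with any other
    label (Unicode Myanmar, Karen, Mon, Shan) is returned unchanged.
    """
    if detect(text) == ZAWGYI_LABEL:
        return to_unicode(text)
    return text


# With the library installed, usage would look like:
#   import pyidaungsu as pds
#   clean = normalize_to_unicode(raw_text, pds.detect, pds.cvt2uni)
```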

Tokenization

# syllable level tokenization for Burmese (lang defaults to "mm")
pds.tokenize("Alan TuringကိုArtificial Intelligenceနဲ့Computerတွေရဲ့ဖခင်ဆိုပြီးလူသိများပါတယ်")
>> ['Alan', 'Turing', 'ကို', 'Artificial', 'Intelligence', 'နဲ့', 'Computer', 'တွေ', 'ရဲ့', 'ဖ', 'ခင်', 'ဆို', 'ပြီး', 'လူ', 'သိ', 'များ', 'ပါ', 'တယ်']

# syllable level tokenization for Karen
pds.tokenize("ကၠိသရၣ်,သရၣ်မုၣ် ခဲလၢာ်ဟးထီၣ် (၃၅) ဂၤန့ၣ်လီၤ.", lang="karen")
>> ['ကၠိ', 'သ', 'ရၣ်', ',', 'သ', 'ရၣ်', 'မုၣ်', 'ခဲ', 'လၢာ်', 'ဟး', 'ထီၣ်', '(', '၃၅', ')', 'ဂၤ', 'န့ၣ်', 'လီၤ', '.']

# word level tokenization
pds.tokenize("ဖေဖေနဲ့မေမေ၏ကျေးဇူးတရားမှာကြီးမားလှပေသည်", form="word")
>> ['ဖေဖေ', 'နဲ့', 'မေမေ', '၏', 'ကျေးဇူးတရား', 'မှာ', 'ကြီးမား', 'လှ', 'ပေ', 'သည်']

Syllable-level tokenization supports 4 languages (Burmese, Karen, Shan, Mon). Word-level tokenization currently supports only Burmese.
Available values for the lang parameter of the tokenize function: "mm", "karen", "mon", "shan"
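
The constraints above (four languages at syllable level, Burmese only at word level) can be checked before calling the tokenizer. This is a minimal sketch under those stated constraints: `checked_tokenize` and its `word_level` flag are hypothetical, and the tokenizer is passed in as a callable so that `pds.tokenize` can be supplied when the library is installed.

```python
from typing import Callable, List

# Values taken from the lists above.
SYLLABLE_LANGS = {"mm", "karen", "mon", "shan"}
WORD_LANGS = {"mm"}


def checked_tokenize(
    text: str,
    tokenize: Callable[..., List[str]],
    lang: str = "mm",
    word_level: bool = False,
) -> List[str]:
    """Validate `lang` against the supported set, then call `tokenize`.

    Hypothetical wrapper, not a pyidaungsu function. Raises ValueError
    for an unsupported language/level combination.
    """
    allowed = WORD_LANGS if word_level else SYLLABLE_LANGS
    if lang not in allowed:
        level = "word" if word_level else "syllable"
        raise ValueError(
            f"{level}-level tokenization does not support lang={lang!r}; "
            f"expected one of {sorted(allowed)}"
        )
    if word_level:
        # Only Burmese reaches this branch, so the call mirrors the
        # word-level example above and relies on the default lang.
        return tokenize(text, form="word")
    return tokenize(text, lang=lang)


# With the library installed, usage would look like:
#   import pyidaungsu as pds
#   tokens = checked_tokenize(sentence, pds.tokenize, lang="shan")
```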

Future work

Add tokenizer for Burmese (syllable- and word-level tokenization)
Add more tokenizers (BPE, WordPiece, etc.)
Add Part-of-Speech (POS) tagger for Burmese
Add Named Entity Recognition (NER) classifier for Burmese
Add thorough documentation

Release history

0.1.4 (this version): Jul 15, 2021
0.1.3: Jul 13, 2021
0.1.2: Jul 11, 2021
0.1.1: Jul 11, 2021
0.0.9: Jul 8, 2020
0.0.8: May 5, 2020
0.0.7: Apr 15, 2020
0.0.6: Apr 12, 2020

Download files

Source Distribution

pyidaungsu-0.1.4.tar.gz (5.5 MB)

Uploaded Jul 15, 2021 Source

Built Distribution

pyidaungsu-0.1.4-py3-none-any.whl (5.5 MB)

Uploaded Jul 15, 2021 Python 3

File details

Details for the file pyidaungsu-0.1.4.tar.gz.

File metadata

Download URL: pyidaungsu-0.1.4.tar.gz
Upload date: Jul 15, 2021
Size: 5.5 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.10

File hashes

Hashes for pyidaungsu-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`15b91d0cbfee85c30aa71fda02b968c6a83fff3558fb38ea9c5c31ce9e3d0c7d`
MD5	`8c28af42c1a828407d9c0eb255b5ea9c`
BLAKE2b-256	`f71db720f0f4e84e923751b13a7a42657eb800be5bc5ef830be6812c7af47a80`
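
A downloaded file can be checked against the SHA256 digest listed above using only the Python standard library. The sketch below is illustrative: `sha256_of` and `matches_sha256` are hypothetical helper names, and the file is read in chunks so that a multi-megabyte archive is never held in memory at once.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with an expected hex string.

    Surrounding whitespace and letter case in `expected_hex` are ignored.
    """
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the archive was downloaded to the current directory:
#   matches_sha256(
#       "pyidaungsu-0.1.4.tar.gz",
#       "15b91d0cbfee85c30aa71fda02b968c6a83fff3558fb38ea9c5c31ce9e3d0c7d",
#   )
```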


File details

Details for the file pyidaungsu-0.1.4-py3-none-any.whl.

File metadata

Download URL: pyidaungsu-0.1.4-py3-none-any.whl
Upload date: Jul 15, 2021
Size: 5.5 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.10

File hashes

Hashes for pyidaungsu-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f9f07912ecf33bfadc5a8b3265e0a6757221047476745cdd4a7f66db03aeef9a`
MD5	`f80637cb72af895d40e5b78b7388d64e`
BLAKE2b-256	`cda9596a86adb1d388f0748f5a982daacbe2430e89a2caf7fccfa1ac767011f0`
