Burmese NLP Tools

Burmese-Tools

burmese-tools is a Python library designed mainly for converting Burmese text between the Unicode and Zawgyi encodings. It also supports syllable tokenization for Burmese Unicode text and partial syllable tokenization for Burmese Zawgyi text.

Features

  • Zawgyi to Unicode Conversion: Convert Zawgyi-encoded text to Unicode.
  • Unicode to Zawgyi Conversion: Transform Unicode text back to Zawgyi.
  • Unicode Syllable Tokenization: Tokenize Burmese Unicode text into syllables with customizable splitting.
  • Partial Zawgyi Syllable Tokenization: Tokenize Burmese Zawgyi text into syllables.

Installation

You can install this library from PyPI:

pip install burmese-tools
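
To confirm which version ended up in the current environment, the standard library's importlib.metadata can be queried. This is a minimal sketch: installed_version is a hypothetical helper name used only for illustration, and the distribution name burmese-tools is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("burmese-tools"))
```

Returning None instead of raising keeps the check usable in setup scripts that want to decide whether to run pip at all.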

Usage

Importing the Library

from burmese_tools import tools

Convert Zawgyi to Unicode

text_zawgyi = "ကႏၲာရ"
converted_text = tools.zaw2uni(text_zawgyi)
print(converted_text)  # Output: ကန္တာရ

Convert Unicode to Zawgyi

text_unicode = "ကန္တာရ"
converted_text = tools.uni2zaw(text_unicode)
print(converted_text)  # Output: ကႏၲာ႐
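
The Zawgyi and Unicode spellings are meant to display the same word, but they are stored as different code point sequences, which is why a conversion step is needed. A minimal sketch for inspecting them with only the standard library follows; code_points is a hypothetical helper and not part of burmese-tools, and the two literals are copied from the examples above.

```python
def code_points(text):
    """List each character of text as a 'U+XXXX' code point label."""
    return [f"U+{ord(ch):04X}" for ch in text]


text_unicode = "ကန္တာရ"  # Unicode spelling from the example above
text_zawgyi = "ကႏၲာရ"  # Zawgyi spelling from the example above

# Print both sequences side by side for comparison.
print(code_points(text_unicode))
print(code_points(text_zawgyi))

# Compare the raw strings directly.
print(text_unicode == text_zawgyi)
```

Printing code points is also a quick way to check which encoding a piece of text uses before passing it to zaw2uni or uni2zaw.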

Tokenize Unicode Text into Syllables

The uni_syllable function tokenizes Unicode text into syllables and supports two splitting methods.

Features

  • Tokenizes Unicode text into syllables.
  • Provides two types of splitting:
    • Type 1: Splits ဂန္ဓာရ into ['ဂ', 'န္ဓာ', 'ရ']. (default)
    • Type 2: Splits ဂန္ဓာရ into ['ဂန္', 'ဓာ', 'ရ'].
  • Supports an optional transform parameter that replaces the virama (္, U+1039) with the asat (်, U+103A) in the output (applies only when type=2).
    • default = True

text = "ကန္တာရ"
tokens = tools.uni_syllable(text, type=1)
print(tokens)  # Output: ['က', 'န္တာ', 'ရ']

text = "ကန္တာရ"
tokens = tools.uni_syllable(text, type=2)
print(tokens)  # Output: ['ကန်', 'တာ', 'ရ']

text = "ကန္တာရ"
tokens = tools.uni_syllable(text, type=2, transform=False)
print(tokens)  # Output: ['ကန္', 'တာ', 'ရ']
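
The difference between the two split types can be illustrated with a small regex-based sketch. This is not the library's implementation: sketch_split and its patterns are hypothetical, written only to reproduce the three outputs documented above for ကန္တာရ. It assumes plain consonants (U+1000 to U+1021), the virama (U+1039), and the asat (U+103A); kinzi, independent vowels, digits, punctuation, and non-Burmese text are not handled, and the library's actual rules may differ.

```python
import re

VIRAMA = "\u1039"  # stacking sign
ASAT = "\u103a"  # vowel killer
CONSONANTS = "\u1000-\u1021"  # consonant range used by this sketch

# Type 1 style: break before a consonant unless it is stacked under a
# virama or closed by an asat.
_TYPE1 = re.compile(f"(?<!{VIRAMA})(?=[{CONSONANTS}](?!{ASAT}))")
# Type 2 style: break before a consonant unless it is followed by an asat
# or a virama, so the virama stays with the preceding syllable.
_TYPE2 = re.compile(f"(?=[{CONSONANTS}](?![{ASAT}{VIRAMA}]))")


def sketch_split(text, type=1, transform=True):
    """Split text at zero-width boundaries (needs Python 3.7+ for re.split).

    The parameter names mirror the keywords shown in the README; the logic
    is an assumption, not the library's code.
    """
    pattern = _TYPE1 if type == 1 else _TYPE2
    pieces = [piece for piece in pattern.split(text) if piece]
    if type == 2 and transform:
        # Replace the trailing stacking sign with a visible asat.
        pieces = [piece.replace(VIRAMA, ASAT) for piece in pieces]
    return pieces


word = "\u1000\u1014\u1039\u1010\u102c\u101b"  # the example word, escaped
print(sketch_split(word, type=1))
print(sketch_split(word, type=2))
print(sketch_split(word, type=2, transform=False))
```

The only difference between the two patterns is which side of the virama the boundary falls on: type 1 keeps the stacked pair together, while type 2 ends the syllable at the virama and starts a new one at the stacked consonant.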

Tokenize Zawgyi Text into Partial Syllables

text = "ကႏၲာရ"
tokens = tools.zaw_partial_syllable(text)
print(tokens)  # Output: ['က', 'ႏၲာ', 'ရ'] (Unicode equivalent: ['က', 'န္တာ', 'ရ'])

Contributing

Contributions are welcome! Please follow these steps:

  • Fork the repository.
  • Create a new branch for your feature/bug fix.
  • Make your changes and test thoroughly.
  • Submit a pull request.

License

This library is licensed under the MIT License. Feel free to use, modify, and distribute it.

Acknowledgments

This library was developed to simplify Burmese text processing for developers and linguists. Special thanks to Sa Phyo Thu Thet of Simbolo for his invaluable guidance, kindness, and support in teaching me. His mentorship has been instrumental in shaping my understanding and skills.

Contributions and feedback from the community are also highly appreciated.

Download files

Source Distribution

burmese_tools-0.1.2.tar.gz (5.2 kB view details)

Uploaded Source

Built Distribution

burmese_tools-0.1.2-py3-none-any.whl (5.5 kB view details)

Uploaded Python 3

File details

Details for the file burmese_tools-0.1.2.tar.gz.

File metadata

  • Download URL: burmese_tools-0.1.2.tar.gz
  • Upload date:
  • Size: 5.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for burmese_tools-0.1.2.tar.gz
Algorithm Hash digest
SHA256 b8b5055d01aae7e4db226d285f86cfa8a576f33fb2681ef28f5b6c1a77dbd636
MD5 30a10162df0d1fc94b2a9c997d7e8bb9
BLAKE2b-256 a867cfc3257d5530bf5c2d3186bb34e43536e6cdcffc38ed79f37f760de06cfa

File details

Details for the file burmese_tools-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: burmese_tools-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 5.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for burmese_tools-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 158d031e2ffeb777c954352cc42cc04bbb4955baf649e8a3e47a6cc10021e436
MD5 c1722f9a351fdb09fe13263792b0af8c
BLAKE2b-256 96ca2da5d0f78ec19c6f8c49d900ee77f7e84fcb504724bc8fb3c527a59d7151
