Project description

SimboloSiamese provides tools for syllable-based tokenization of Burmese text. It breaks Burmese text into individual syllables, which supports language-processing tasks such as text analysis, machine learning, and natural language processing (NLP) for Burmese. It also converts Zawgyi-encoded text to Unicode and converts between Burmese script and a Romanized form.

Features

Syllable Tokenization: Splits Burmese text into syllables using Unicode-based rules, giving a consistent unit for segmenting and analyzing Burmese sentences.

Efficient Processing: Designed to handle large texts with minimal memory overhead, so it can be applied to large-scale text analysis.

Burmese Unicode Support: Supports Burmese script and its syllable rules as encoded in the Unicode Standard (the Myanmar block), so tokenization follows the native structure of Burmese text.

Burmese-to-Romanization: Converts Burmese script into a Romanized equivalent, which gives pronunciation guidance and helps readers unfamiliar with the script read and pronounce Burmese text. The Romanization process follows linguistic rules for Burmese phonetic transcription. This feature is useful for language learners, cross-lingual applications, and linguistic studies that require Romanized Burmese text. The reverse conversion, from Romanized text back to Burmese script, is also available.
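To make the idea of Unicode-rule syllable tokenization concrete, here is a minimal sketch that does not use SimboloSiamese and is not the package's implementation. It assumes a simplified rule: a new syllable starts at a Myanmar consonant (U+1000 to U+1021) unless that consonant is stacked under the previous one (preceded by the virama, U+1039) or closes the previous syllable (followed by the asat, U+103A, or the virama). The function name split_syllables is hypothetical. Independent vowels, digits, punctuation, and non-Burmese characters are not treated as boundaries here and stay attached to the preceding piece.

```python
import re

# Simplified boundary rule (an assumption for illustration):
# a consonant starts a syllable unless it follows a virama (stacked form)
# or is followed by an asat or virama (syllable-final consonant).
_SYLLABLE_START = re.compile(r"(?<!\u1039)[\u1000-\u1021](?![\u103A\u1039])")


def split_syllables(text):
    """Split text into pieces, cutting before each syllable-initial consonant."""
    pieces = []
    start = 0
    for match in _SYLLABLE_START.finditer(text):
        # Emit everything accumulated before this syllable start.
        if match.start() > start:
            pieces.append(text[start:match.start()])
        start = match.start()
    # Emit the final piece, if any text remains.
    if start < len(text):
        pieces.append(text[start:])
    return pieces


if __name__ == "__main__":
    # The word used in the tokenization example of this README.
    print(split_syllables("တက္ကသိုလ်"))
```

Under this rule the stacked consonant pair stays in one piece, so the nine code points of တက္ကသိုလ် are cut into two pieces. The output of the package's own tokenizer may differ, in particular depending on its virama option.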

How to use (Getting Started)

# Install the SimboloSiamese package using pip
# pip install SimboloSiamese

# Import the BurmeseConverter from the Siamese module
from Siamese import BurmeseConverter

converter = BurmeseConverter()

# Example: Zawgyi to Unicode
zawgyi_text = "ဖြွှော်"
try:
    # Convert Zawgyi text to Unicode
    unicode_output = converter.zawgyi_to_unicode(zawgyi_text)
    # Print the Unicode output
    print("Unicode Output:", unicode_output)
except Exception as e:
    # Handle any errors that occur during conversion
    print(f"Error in Zawgyi to Unicode conversion: {e}")

# Example: Tokenization of a Burmese word
tokenization_text = "တက္ကသိုလ်"
try:
    # Tokenize the Burmese word. The first argument controls virama handling:
    # 1 means with the virama mark; pass any other number if you do not want
    # to tokenize the virama mark.
    # If syllable_tokenization does not work, try process_text instead.
    tokenized_output = converter.syllable_tokenization(1, tokenization_text)
    print("Tokenized Output:", tokenized_output)
except Exception as e:
    # Handle any errors that occur during tokenization
    print(f"Cannot Tokenize the word: {e}")

# Example: Convert Burmese text to Romanized script
burmese_text = "ကော်"
try:
    # Convert Burmese text to Romanized script
    romanized_output = converter.burmese_to_romanization(burmese_text)
    # Print the Romanized output
    print("Romanized Output:", romanized_output)
except Exception as e:
    # Handle any errors that occur during Romanization
    print(f"Error in Burmese Romanization: {e}")

# Example: Convert Romanized text back to Burmese script
romanized_text = "le kReAc: liuc:, K rI: sq a mHt, ၂ ၂ ၈ ၃, jQ, SeAF piu liu mRiu., lU ne rp kWk peAF jiu., pYk kY KL. pRI:, liuk pA lA jU, ၆ ၂, OO: s luN:, je SuN: KL. jQ // "
try:
    # Convert the Romanized text to Burmese script
    burmese_output = converter.romanization_to_burmese(romanized_text)
    # Print the Burmese output
    print("Burmese Output:", burmese_output)
except Exception as e:
    # Handle any errors that occur during conversion
    print(f"Error in Romanization to Burmese conversion: {e}")
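The examples above repeat the same try/except pattern for each conversion. As an optional convenience, a small wrapper can factor that out. This is only a sketch: try_convert is a hypothetical helper and not part of SimboloSiamese, and it assumes nothing about the converter beyond what the examples show (methods that take arguments and may raise an exception).

```python
def try_convert(func, *args, label="conversion"):
    """Call func(*args) and return its result.

    If the call raises, print a short error message and return None.
    The name and behavior are illustrative, not part of the package.
    """
    try:
        return func(*args)
    except Exception as exc:
        print(f"Error in {label}: {exc}")
        return None


# Usage with the converter from the examples above (requires the package):
# unicode_output = try_convert(converter.zawgyi_to_unicode, zawgyi_text,
#                              label="Zawgyi to Unicode conversion")
# tokenized_output = try_convert(converter.syllable_tokenization, 1,
#                                tokenization_text, label="tokenization")
```

Returning None on failure keeps a batch job running, but the caller then has to check for None before using the result.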
