Project description

SimboloSiamese provides tools for processing Burmese text: syllable-based tokenization, Zawgyi-to-Unicode conversion, and conversion between Burmese script and Romanized text. The tokenizer breaks Burmese text into individual syllables, which supports language processing tasks such as text analysis, machine learning, and natural language processing (NLP) for Burmese.

Features

Syllable Tokenization: Splits Unicode-encoded Burmese text into syllables. Syllable boundaries give segmentation and sentence analysis a consistent unit to work with.

Efficient Processing: Designed to handle large texts with low memory overhead, so it can be used for large-scale text analysis.

Burmese Unicode Support: Supports Burmese script and its syllable rules as encoded in the Myanmar block of the Unicode Standard, so tokenization follows the structure of native Burmese text.

Burmese-to-Romanization: Converts Burmese script into a Romanized equivalent, and Romanized text back into Burmese script. The Romanization follows phonetic transcription rules for Burmese, so users unfamiliar with the script can read and pronounce the text. This is useful for language learners, cross-lingual applications, and linguistic studies that require Romanized Burmese text.
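To make the syllable tokenization feature above more concrete, here is a minimal sketch of one common regex-based approach to Burmese syllable splitting. It is an illustration only and is not taken from SimboloSiamese, whose internal rules may differ. The names `split_syllables` and `SYLLABLE_START` are hypothetical. The sketch assumes Unicode (not Zawgyi) input and treats a consonant as the start of a new syllable unless it is stacked under a virama or is itself closed by an asat or followed by a virama. Independent vowels, digits, and punctuation are not handled.

```python
import re

# Myanmar consonants KA..A (U+1000..U+1021), written as escapes for clarity.
CONSONANTS = "\u1000-\u1021"
VIRAMA = "\u1039"  # stacking sign between stacked consonants
ASAT = "\u103A"    # mark that closes a final consonant

# A consonant starts a syllable when it is not stacked (no virama before it)
# and is not a syllable-final consonant (no asat or virama after it).
SYLLABLE_START = re.compile(
    f"(?<!{VIRAMA})[{CONSONANTS}](?![{ASAT}{VIRAMA}])"
)


def split_syllables(text):
    """Split Unicode Burmese text into syllable-like chunks."""
    starts = [m.start() for m in SYLLABLE_START.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any leading material as its own chunk
    bounds = starts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:]) if a != b]


# The word used in the tokenization example of this page: တက္ကသိုလ်
word = "\u1010\u1000\u1039\u1000\u101E\u102D\u102F\u101C\u103A"
print(split_syllables(word))  # two chunks: တက္က and သိုလ်
```

Joining the returned chunks always reproduces the input, so the sketch never drops characters even where its boundary rule is too simple.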

How to use (Getting Started)

# Install the SimboloSiamese package using pip
# pip install SimboloSiamese

# Import the BurmeseConverter from the Siamese module
from Siamese import BurmeseConverter

converter = BurmeseConverter()

# Example: Zawgyi to Unicode
zawgyi_text = "ဖြွှော်"
try:
    # Convert Zawgyi text to Unicode
    unicode_output = converter.zawgyi_to_unicode(zawgyi_text)
    # Print the Unicode output
    print("Unicode Output:", unicode_output)
except Exception as e:
    # Handle any errors that occur during conversion
    print(f"Error in Zawgyi to Unicode conversion: {e}")

# Example: Tokenization of a Burmese word
tokenization_text = "တက္ကသိုလ်"
try:
    # Tokenize the Burmese word. Pass 1 as the first argument to tokenize with the virama mark;
    # pass any other number to tokenize without it.
    # If syllable_tokenization does not work, try process_text instead.
    tokenized_output = converter.syllable_tokenization(1, tokenization_text)
    print("Tokenized Output:", tokenized_output)
except Exception as e:
    # Handle any errors that occur during tokenization
    print(f"Error in tokenization: {e}")

# Example: Convert Burmese text to Romanized script
burmese_text = "ကော်"
try:
    # Convert Burmese text to Romanized script
    romanized_output = converter.burmese_to_romanization(burmese_text)
    # Print the Romanized output
    print("Romanized Output:", romanized_output)
except Exception as e:
    # Handle any errors that occur during Romanization
    print(f"Error in Burmese Romanization: {e}")

# Example: Convert Romanized text back to Burmese script
romanized_text = "le kReAc: liuc:, K rI: sq a mHt, ၂ ၂ ၈ ၃, jQ, SeAF piu liu mRiu., lU ne rp kWk peAF jiu., pYk kY KL. pRI:, liuk pA lA jU, ၆ ၂, OO: s luN:, je SuN: KL. jQ // "
try:
    # Convert the Romanized text to Burmese script
    burmese_output = converter.romanization_to_burmese(romanized_text)
    # Print the Burmese output
    print("Burmese Output:", burmese_output)
except Exception as e:
    # Handle any errors that occur during conversion
    print(f"Error in Romanization to Burmese conversion: {e}")
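The examples above wrap each call in its own try/except. When converting many strings, one option is a small helper that applies a conversion callable to every input and collects failures separately. This is a minimal sketch and not part of SimboloSiamese: `convert_batch` is a hypothetical name, and it assumes only that the conversion callable takes a single string, as `burmese_to_romanization` does in the example above.

```python
from typing import Callable, Iterable, List, Tuple


def convert_batch(
    convert: Callable[[str], object],
    texts: Iterable[str],
) -> Tuple[List[Tuple[str, object]], List[Tuple[str, str]]]:
    """Apply `convert` to each text and return (results, errors).

    results holds (input, output) pairs; errors holds (input, message) pairs.
    """
    results = []
    errors = []
    for text in texts:
        try:
            results.append((text, convert(text)))
        except Exception as exc:  # same broad handling as the examples above
            errors.append((text, str(exc)))
    return results, errors


# Hypothetical usage with the converter from the examples above:
# results, errors = convert_batch(converter.burmese_to_romanization, ["ကော်"])
# for source, message in errors:
#     print(f"Could not convert {source!r}: {message}")
```

Keeping failures in a separate list means one bad input does not stop the rest of the batch, and the caller can decide whether to retry or report them.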

Download files

Source Distribution

simbolosiamese-0.1.8.tar.gz (7.7 kB)

Uploaded Source

Built Distribution

SimboloSiamese-0.1.8-py3-none-any.whl (6.6 kB)

Uploaded Python 3

File details

Details for the file simbolosiamese-0.1.8.tar.gz.

File metadata

  • Download URL: simbolosiamese-0.1.8.tar.gz
  • Size: 7.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for simbolosiamese-0.1.8.tar.gz
Algorithm Hash digest
SHA256 4c843316b7a741ef2d69e10fee84dc84f7852cf53cd221400c79a89ae2a2661b
MD5 ed73692a470f5bf99115eea640f67a63
BLAKE2b-256 053d07d0617f7b5e6cdbe211bdfeda40c644ff1b414d48ddd84cf669f04317f5
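The digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's standard `hashlib`. The function names `sha256_of_file` and `matches_sha256` are hypothetical, and the local path in the usage comment is an assumption about where the file was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Hypothetical usage, assuming the source distribution was saved locally:
# expected = "4c843316b7a741ef2d69e10fee84dc84f7852cf53cd221400c79a89ae2a2661b"
# print(matches_sha256("simbolosiamese-0.1.8.tar.gz", expected))
```

Reading in chunks keeps memory use constant regardless of file size, although the files listed here are only a few kilobytes.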

File details

Details for the file SimboloSiamese-0.1.8-py3-none-any.whl.

File hashes

Hashes for SimboloSiamese-0.1.8-py3-none-any.whl
Algorithm Hash digest
SHA256 5bb18b5366c03d5826dd8736b0925478e6e3110b95d6a70df8c3b4db8fe0d409
MD5 a8bb26d5dfc40860591788468f5f17c7
BLAKE2b-256 6fea87efe76dbc91853c941db1d6cb66e8ba0705c34e20de0ac539f721527a6d
