Project description

SimboloSiamese is a tool for syllable-based tokenization of Burmese text. It splits Burmese text into individual syllables, which supports language processing tasks such as text analysis and machine learning for Burmese.

Features

Syllable Tokenization: Splits Burmese text into syllables based on Unicode rules, giving a consistent segmentation for analyzing Burmese sentences.

Efficient Processing: Designed to handle large texts with minimal memory overhead, so it scales to large-scale text analysis.

Burmese Unicode Support: Supports Burmese script and syllable rules as encoded in the Myanmar block of the Unicode Standard, so that tokenization follows the native structure of Burmese text.

Burmese-to-Romanization: Converts Burmese script into its Romanized equivalent, which gives pronunciation guidance to readers unfamiliar with the script. The Romanization follows standard linguistic rules for Burmese phonetic transcription. This feature is particularly useful for language learners, cross-lingual applications, and linguistic studies that require Romanized Burmese text.
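The syllable tokenization feature above can be illustrated with a small regex-based sketch. This is not SimboloSiamese's implementation; `split_syllables` and its constants are hypothetical names chosen for illustration. It assumes a syllable starts at a consonant (U+1000 to U+1021) that is neither stacked under a virama (U+1039) nor closed by an asat (U+103A) or a virama. It ignores independent vowels, digits, and punctuation, which a full tokenizer would also treat as break points.

```python
import re
from typing import List

# Assumed building blocks (Myanmar block of Unicode); names are illustrative only.
VIRAMA = "\u1039"           # stacking sign, joins two consonants vertically
ASAT = "\u103A"             # "killer" mark, closes a syllable with a final consonant
FIRST_CONSONANT = "\u1000"  # MYANMAR LETTER KA
LAST_CONSONANT = "\u1021"   # MYANMAR LETTER A

# A consonant opens a new syllable unless it is stacked (preceded by virama)
# or acts as a final or upper stacked consonant (followed by asat or virama).
_SYLLABLE_START = re.compile(
    f"(?<!{VIRAMA})[{FIRST_CONSONANT}-{LAST_CONSONANT}](?![{ASAT}{VIRAMA}])"
)


def split_syllables(text: str) -> List[str]:
    """Split Burmese text at every position where a new syllable starts."""
    starts = [m.start() for m in _SYLLABLE_START.finditer(text)]
    # Make sure any leading material before the first consonant is kept.
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    ends = starts[1:] + [len(text)]
    return [text[a:b] for a, b in zip(starts, ends) if a != b]


if __name__ == "__main__":
    # Stacked consonants stay in one chunk because the virama blocks a break.
    print(split_syllables("တက္ကသိုလ်"))
```

Under these assumptions a stacked cluster is never split, which roughly corresponds to tokenizing "with the virama mark"; how the package itself treats such clusters may differ.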

How to use (Getting Started)

# Install the SimboloSiamese package using pip
# pip install SimboloSiamese

# Import the BurmeseConverter from the Siamese module
from Siamese import BurmeseConverter

converter = BurmeseConverter()

# Example: Zawgyi to Unicode
zawgyi_text = "ဖြွှော်"
try:
    # Convert Zawgyi text to Unicode
    unicode_output = converter.zawgyi_to_unicode(zawgyi_text)
    # Print the Unicode output
    print("Unicode Output:", unicode_output)
except Exception as e:
    # Handle any errors that occur during conversion
    print(f"Error in Zawgyi to Unicode conversion: {e}")

# Example: Tokenization of a Burmese word
tokenization_text = "တက္ကသိုလ်"
try:
    # Tokenize the Burmese word into syllables.
    # Passing 1 tokenizes with the virama mark; pass any other number to tokenize without it.
    # If syllable_tokenization does not work, try process_text instead.
    tokenized_output = converter.syllable_tokenization(1, tokenization_text)
    print("Tokenized Output:", tokenized_output)
except Exception as e:
    # Handle any errors that occur during tokenization
    print(f"Cannot tokenize the word: {e}")

# Example: Convert Burmese text to Romanized script
burmese_text = "ကော်"
try:
    # Convert Burmese text to Romanized script
    romanized_output = converter.burmese_to_romanization(burmese_text)
    # Print the Romanized output
    print("Romanized Output:", romanized_output)
except Exception as e:
    # Handle any errors that occur during Romanization
    print(f"Error in Burmese Romanization: {e}")

# Example: Convert Romanized text back to Burmese script
romanized_text = "le kReAc: liuc:, K rI: sq a mHt, ၂ ၂ ၈ ၃, jQ, SeAF piu liu mRiu., lU ne rp kWk peAF jiu., pYk kY KL. pRI:, liuk pA lA jU, ၆ ၂, OO: s luN:, je SuN: KL. jQ // "
try:
    # Convert Romanized text to Burmese script
    burmese_output = converter.romanization_to_burmese(romanized_text)
    print("Burmese Output:", burmese_output)
except Exception as e:
    # Handle any errors that occur during conversion
    print(f"Error in Romanization to Burmese conversion: {e}")
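The examples above all repeat the same try/except pattern. One way to factor it out is sketched below. `run_conversion` is a hypothetical helper, not part of SimboloSiamese; it works with any callable, so it makes no assumption about the package beyond the method names already shown above.

```python
from typing import Callable, Optional


def run_conversion(label: str, func: Callable[..., object], *args: object) -> Optional[object]:
    """Call func(*args), print the result or the error, and return the result (None on failure)."""
    try:
        result = func(*args)
    except Exception as exc:
        # Report the failure instead of stopping the whole script.
        print(f"Error in {label}: {exc}")
        return None
    print(f"{label} Output:", result)
    return result


# Possible usage with the converter created earlier (requires the package to be installed):
# run_conversion("Unicode", converter.zawgyi_to_unicode, "ဖြွှော်")
# run_conversion("Tokenized", converter.syllable_tokenization, 1, "တက္ကသိုလ်")
# run_conversion("Romanized", converter.burmese_to_romanization, "ကော်")
```

Returning `None` on failure keeps the calling script short, at the cost of hiding the exception type; re-raise inside the `except` block if callers need to react to specific errors.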

Download files

Source Distribution

simbolosiamese-0.1.7.tar.gz (7.7 kB)

Uploaded Source

Built Distribution

SimboloSiamese-0.1.7-py3-none-any.whl (6.6 kB)

Uploaded Python 3
