Project description

SimboloSiamese provides syllable-based tokenization of Burmese text. It breaks Burmese text into individual syllables for downstream tasks such as text analysis, machine learning, and natural language processing (NLP). It also converts Zawgyi-encoded text to Unicode and converts between Burmese script and a Romanized form.

Features

Syllable Tokenization: Splits Burmese text into syllables using rules over Unicode code points, giving a consistent unit for segmenting and analyzing Burmese sentences.

Efficient Processing: Designed to handle large texts with low memory overhead, so it scales to large-scale text analysis.

Burmese Unicode Support: Supports Burmese script as encoded in the Myanmar block of the Unicode Standard, so that tokenization follows the native syllable structure of Burmese text.

Burmese-to-Romanization: Converts Burmese script into a Romanized equivalent, which helps with pronunciation and lets readers unfamiliar with the script read Burmese text. The Romanization follows standard rules for Burmese phonetic transcription. This feature is particularly useful for language learners, cross-lingual applications, and linguistic studies that need Romanized Burmese text.
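To make the idea of rule-based syllable tokenization concrete, here is a minimal sketch of one common rule: start a new syllable before a consonant unless that consonant is stacked (preceded by a virama) or closes the previous syllable (followed by an asat or virama). This is an illustration only and is not taken from the package. The function name `split_syllables` is hypothetical, and the sketch ignores independent vowels, digits, punctuation, and non-Burmese text.

```python
import re

# Zero-width break point: not right after a virama (U+1039), and directly
# before a consonant (U+1000..U+1021) that is not itself followed by a
# virama (U+1039) or an asat (U+103A).
# Assumption: this single rule is a simplification, not the package's logic.
_SYLLABLE_BREAK = re.compile(r"(?<!\u1039)(?=[\u1000-\u1021](?![\u1039\u103A]))")


def split_syllables(text):
    """Split Burmese text into syllables with a single consonant-based rule."""
    # re.split on a zero-width pattern requires Python 3.7 or later.
    return [part for part in _SYLLABLE_BREAK.split(text) if part]


# The word တက္ကသိုလ် ("university"), spelled out as code points.
word = "\u1010\u1000\u1039\u1000\u101E\u102D\u102F\u101C\u103A"
print(split_syllables(word))
```

With the virama kept inside the stacked consonant pair, the word splits into two syllables. A production tokenizer needs further rules for the characters this sketch leaves out.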

How to use (Getting Started)

# Install the SimboloSiamese package using pip
# pip install SimboloSiamese

# Import the BurmeseConverter from the Siamese module
from Siamese import BurmeseConverter

converter = BurmeseConverter()

# Example: Zawgyi to Unicode
zawgyi_text = "ဖြွှော်"
try:
    # Convert Zawgyi text to Unicode
    unicode_output = converter.zawgyi_to_unicode(zawgyi_text)
    # Print the Unicode output
    print("Unicode Output:", unicode_output)
except Exception as e:
    # Handle any errors that occur during conversion
    print(f"Error in Zawgyi to Unicode conversion: {e}")

# Example: Tokenization of a Burmese word
tokenization_text = "တက္ကသိုလ်"
try:
    # Tokenize the Burmese word. Pass 1 as the first argument to tokenize with the
    # virama mark; pass any other number to tokenize without it.
    # If syllable_tokenization does not work, try process_text instead.
    tokenized_output = converter.syllable_tokenization(1, tokenization_text)
    print("Tokenized Output:", tokenized_output)
except Exception as e:
    # Handle any errors that occur during tokenization
    print(f"Cannot Tokenize the word: {e}")

# Example: Convert Burmese text to Romanized script
burmese_text = "ကော်"
try:
    # Convert Burmese text to Romanized script
    romanized_output = converter.burmese_to_romanization(burmese_text)
    # Print the Romanized output
    print("Romanized Output:", romanized_output)
except Exception as e:
    # Handle any errors that occur during Romanization
    print(f"Error in Burmese Romanization: {e}")

# Example: Convert Romanized text back to Burmese script
romanized_text = "le kReAc: liuc:, K rI: sq a mHt, ၂ ၂ ၈ ၃, jQ, SeAF piu liu mRiu., lU ne rp kWk peAF jiu., pYk kY KL. pRI:, liuk pA lA jU, ၆ ၂, OO: s luN:, je SuN: KL. jQ // "
try:
    # Convert the Romanized text to Burmese script
    burmese_output = converter.romanization_to_burmese(romanized_text)
    # Print the Burmese output
    print("Burmese Output:", burmese_output)
except Exception as e:
    # Handle any errors that occur during conversion
    print(f"Error in Romanization to Burmese conversion: {e}")
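The examples above repeat the same try/except pattern for each call. When processing many lines, one option is a small generator that applies a conversion function line by line and reports failures per line instead of stopping. The sketch below is a suggestion, not part of the package: `convert_lines` and `_demo_convert` are hypothetical names, and `_demo_convert` is a stand-in for a converter method such as `converter.zawgyi_to_unicode`.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple


def convert_lines(
    lines: Iterable[str],
    convert: Callable[[str], str],
) -> Iterator[Tuple[str, Optional[str], Optional[Exception]]]:
    """Lazily apply `convert` to each non-empty line.

    Yields (line, result, error). `error` is None when the call succeeded.
    """
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        try:
            yield line, convert(line), None
        except Exception as exc:  # broad on purpose, like the examples above
            yield line, None, exc


def _demo_convert(text: str) -> str:
    """Stand-in converter (hypothetical) so the sketch runs without the package."""
    if text == "bad":
        raise ValueError("cannot convert")
    return text.upper()


results = list(convert_lines(["abc\n", "\n", "bad\n"], _demo_convert))
for line, result, error in results:
    if error is None:
        print("Converted:", result)
    else:
        print(f"Error converting {line!r}: {error}")
```

Because the function is a generator, it can be fed an open file object and will hold only one line in memory at a time.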

Download files

Source Distribution

simbolosiamese-0.1.9.tar.gz (7.7 kB view details)

Uploaded Source

Built Distribution

SimboloSiamese-0.1.9-py3-none-any.whl (6.6 kB view details)

Uploaded Python 3

File details

Details for the file simbolosiamese-0.1.9.tar.gz.

File metadata

  • Download URL: simbolosiamese-0.1.9.tar.gz
  • Upload date:
  • Size: 7.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for simbolosiamese-0.1.9.tar.gz
Algorithm Hash digest
SHA256 7c363b455178a9821d36a0b403135a8268b57ed78f3fe64f870fedc162d3f979
MD5 b6264a11ea37d7b17877c069861adddd
BLAKE2b-256 a5c4efda8de869b35648923ac52cbf812c8e3e7d69c691d92fcff447e195361e

File details

Details for the file SimboloSiamese-0.1.9-py3-none-any.whl.

File hashes

Hashes for SimboloSiamese-0.1.9-py3-none-any.whl
Algorithm Hash digest
SHA256 801d1f4d8c4d07ff075e158f51c72a1a54b7cbc2f6343ebceaadd4089c70e5b6
MD5 ad51355d9f044508caf546d04bfd211e
BLAKE2b-256 71858d74d4d3b4500aeb7d9263aeaf7d038b2d52280d705610bc3e8f723c2a4e
