A Python package for transliterating English text to Khmer script.

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Khmer Text Transliteration

A Python-based system for transliterating English text to Khmer script using sequence-to-sequence neural networks.

Overview

This project provides tools for converting romanized, phonetically spelled text (for example "srolanh") into Khmer script. Transliteration is performed by a sequence-to-sequence model built from LSTM layers.

Features

- English to Khmer text transliteration
- Multiple prediction variants
- Fuzzy matching and similarity search
- Web interface using Gradio

Project Structure

Pre-trained Models

The project includes pre-trained models located in khmer_text_transliteration/models/pretrained/:

khmer_transliterator.keras: A pre-trained sequence-to-sequence model for English to Khmer transliteration

Training Assets

Tokenizer and model assets are stored in data/processed/:

khmer_transliteration_assets.pkl: Contains the English and Khmer tokenizers, along with sequence length information
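To check what the assets file actually holds, it can be unpickled and summarized. The sketch below assumes the pickle stores a dict at the top level; the key names are not documented here, so the helper (a hypothetical `describe_assets`) only reports whatever it finds. Loading the real file will likely require the library that defined the tokenizer classes to be installed, and pickles should only be loaded from a trusted source.

```python
import pickle
from pathlib import Path


def describe_assets(path):
    """Load a pickled assets file and summarize its top-level entries.

    Assumption: the file holds a dict (e.g. tokenizers plus sequence lengths).
    Only unpickle files that come from a source you trust.
    """
    with Path(path).open("rb") as fh:
        assets = pickle.load(fh)
    if not isinstance(assets, dict):
        # Not a dict after all: report the type of the root object instead.
        return {"<root>": type(assets).__name__}
    # Map each key to the type name of its value.
    return {key: type(value).__name__ for key, value in assets.items()}
```

Calling `describe_assets("data/processed/khmer_transliteration_assets.pkl")` from the project root would then list the stored entries and their types.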

Training Data

Raw data for training and reference is available in data/raw/:

eng_khm_data.txt: Training data with English-Khmer word pairs
khmer_words.txt: Dictionary of Khmer words
1000-most-common-khmer-words/: Collection of common Khmer words for reference

Training Process

The model training process is documented in the notebooks:

notebooks/khmer_seq2seq.ipynb: Jupyter notebook containing the complete training pipeline, including:
- Data preprocessing
- Model architecture
- Training configuration
- Evaluation metrics
- Example predictions

To train a new model or experiment with the existing one, refer to the training notebook for detailed instructions and parameters.
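Data preprocessing for a sequence-to-sequence transliterator usually means mapping characters to integer ids and padding every word to a fixed length. The notebook's actual steps are not reproduced here; the following is a minimal sketch that assumes character-level tokenization, with hypothetical helper names `build_vocab` and `encode`.

```python
def build_vocab(words):
    """Map each distinct character to an integer id; 0 is reserved for padding."""
    chars = sorted({ch for word in words for ch in word})
    return {ch: idx for idx, ch in enumerate(chars, start=1)}


def encode(word, vocab, max_len):
    """Convert a word to a fixed-length list of ids.

    Characters missing from the vocabulary are skipped, longer words are
    truncated, and shorter ones are right-padded with 0.
    """
    ids = [vocab[ch] for ch in word if ch in vocab][:max_len]
    return ids + [0] * (max_len - len(ids))
```

In practice one vocabulary would be built for the English side and another for the Khmer side, which matches the two tokenizers mentioned under Training Assets.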

Core Functions

1. Basic Transliteration

from khmer_text_transliteration.predict import transliterate

# Convert English text to Khmer
result = transliterate("somlor")  # Returns: សម្ល

2. Generate Multiple Variants

from khmer_text_transliteration.predict import transliterate_variants

# Get multiple possible transliterations
variants = transliterate_variants("srolanh", num_variants=3, temperature=0.7)
# Returns: ['ស្រឡាញ់', 'ស្រលាញ', 'ស្រលាញ់']
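A temperature parameter is commonly used to control how adventurous sampling is: scores are divided by the temperature before the softmax, so values below 1 concentrate probability on the top candidates and values above 1 flatten the distribution. Whether transliterate_variants works exactly this way is an assumption; the sketch below only illustrates the general technique in plain Python, with hypothetical function names.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature=1.0, rng=random):
    """Draw one index according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Sampling one character at a time this way, and repeating the whole decode several times, is one plausible way to obtain several distinct variants for the same input.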

3. Find Similar Words

from khmer_text_transliteration.predict_with_clean import find_similar

# Find similar Khmer words
similar_words = find_similar("min", max_results=2)
# Returns: ['មិន', 'មីន']
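Fuzzy matching of this kind typically ranks dictionary entries by edit distance to a candidate word. The requirements list python-Levenshtein and rapidfuzz, which provide fast implementations; the sketch below uses a pure-Python Levenshtein distance to stay dependency-free. It is not find_similar's actual implementation, and `closest_words` is a hypothetical helper.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions).

    Note: this compares Unicode code points, so a Khmer syllable made of
    several code points counts as several characters.
    """
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_words(candidate, dictionary, max_results=2):
    """Return the dictionary words with the smallest edit distance to candidate."""
    ranked = sorted(dictionary, key=lambda word: (levenshtein(candidate, word), word))
    return ranked[:max_results]
```

A pipeline along these lines would first transliterate the English input and then pass the resulting Khmer string as `candidate`, with the contents of a word list such as khmer_words.txt as `dictionary`.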

4. TF-IDF Based Similarity Search

from khmer_text_transliteration.predict_with_clean import find_similar_tfidf

# Find similar words using TF-IDF
similar = find_similar_tfidf("min", max_results=2)
# Returns: ['មិន', 'មីន']
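TF-IDF similarity search represents each word as a weighted bag of character n-grams and ranks candidates by cosine similarity to the query. Since scikit-learn is among the requirements, the package may rely on its vectorizer, but that is an assumption; the sketch below is a dependency-free illustration of the technique with hypothetical function names and a bigram default.

```python
import math
from collections import Counter


def char_ngrams(word, n=2):
    """Split a word into overlapping character n-grams (the word itself if shorter)."""
    if len(word) < n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def build_tfidf(words, n=2):
    """Return (vectors, idf): one sparse TF-IDF dict per word, plus the IDF table."""
    grams_per_word = [Counter(char_ngrams(w, n)) for w in words]
    # Document frequency: in how many words each n-gram appears at least once.
    doc_freq = Counter(g for grams in grams_per_word for g in grams)
    total = len(words)
    # Smoothed IDF, so n-grams present in every word still get a positive weight.
    idf = {g: math.log((1 + total) / (1 + df)) + 1 for g, df in doc_freq.items()}
    vectors = [{g: tf * idf[g] for g, tf in grams.items()} for grams in grams_per_word]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(g, 0.0) for g, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(query, words, max_results=2, n=2):
    """Rank words by TF-IDF cosine similarity to query, best first."""
    vectors, idf = build_tfidf(words, n)
    # n-grams that never occur in the word list carry no weight in the query.
    query_counts = Counter(char_ngrams(query, n))
    query_vec = {g: tf * idf[g] for g, tf in query_counts.items() if g in idf}
    scored = sorted(zip(words, vectors), key=lambda pair: -cosine(query_vec, pair[1]))
    return [word for word, _ in scored[:max_results]]
```

Compared with edit distance, this approach rewards shared substrings regardless of where they occur in the word, which can surface different candidates for the same query.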

5. Last Result Prediction

from khmer_text_transliteration.predict_with_clean import predict_last_result

# Get final predictions with scoring
results = predict_last_result("snam", num_results=3)
# Returns: ['ស្នាម', 'ស្នំ', 'សម្នាម']

Requirements

- TensorFlow 2.x
- NumPy
- scikit-learn
- python-Levenshtein
- rapidfuzz
- gradio (for web interface)

Installation

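To install the released package from PyPI:

pip install khmer-english-transliteration

When working from a source checkout, install the dependencies with:

pip install -r requirements.txt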

License

MIT License

Release history

- 0.1.4 (this version): Feb 17, 2025
- 0.1.3: Feb 17, 2025
- 0.1.2: Feb 17, 2025
- 0.1.1: Feb 17, 2025
- 0.1.0: Feb 17, 2025

Download files

Source Distribution

khmer-english-transliteration-0.1.4.tar.gz (1.7 MB)

Uploaded Feb 17, 2025 Source

Built Distribution

khmer_english_transliteration-0.1.4-py3-none-any.whl (1.7 MB)

Uploaded Feb 17, 2025 Python 3

File details

Details for the file khmer-english-transliteration-0.1.4.tar.gz.

File metadata

Download URL: khmer-english-transliteration-0.1.4.tar.gz
Upload date: Feb 17, 2025
Size: 1.7 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.0

File hashes

Hashes for khmer-english-transliteration-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`2e2cadf5eec442fbb5cd9945fe1e37b8bf3ccecd7c87c99b04fa798868a8d969`
MD5	`5a6e30e2ce31dc09ab3da0d0df36e50d`
BLAKE2b-256	`2f9167350d9e5b0755c62a86eb84b32b663a4c5104ca8dbc52da40a93454076f`
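A downloaded archive can be checked against the published SHA256 digest before installing it. The sketch below uses only the standard library; the function names and the local file path in the usage note are illustrative, while the expected digest is the one listed above for the 0.1.4 source distribution.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest listed above for khmer-english-transliteration-0.1.4.tar.gz.
EXPECTED_SDIST_SHA256 = "2e2cadf5eec442fbb5cd9945fe1e37b8bf3ccecd7c87c99b04fa798868a8d969"


def matches_expected(path, expected=EXPECTED_SDIST_SHA256):
    """Return True if the file at path has the expected SHA256 digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("khmer-english-transliteration-0.1.4.tar.gz")` should return True for an intact download; the same helper works for the wheel when its own digest is passed as `expected`.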

File details

Details for the file khmer_english_transliteration-0.1.4-py3-none-any.whl.

File metadata

Download URL: khmer_english_transliteration-0.1.4-py3-none-any.whl
Upload date: Feb 17, 2025
Size: 1.7 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.0

File hashes

Hashes for khmer_english_transliteration-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`631c34c8b5d081f1aa6261786f6b6e2116a6770879d39eca8e7c886483cf577d`
MD5	`42eb0a8fb9ce1cda6618e49f72fa0920`
BLAKE2b-256	`439cf04bad7bef8525f43a56a6b79f3ae065748a68dfe0252e8d2217d60c8c55`

khmer-english-transliteration 0.1.4
