En2Th Transliterator
A Python package for transliterating English text to Thai using a ByT5 model.
Features
- Byte-level processing: More robust against spelling variations
- Beam search & sampling: Decoding parameters let you trade off output quality against diversity
- Batch processing: Efficient for large-scale transliteration
- Mixed precision (FP16): Faster inference on compatible GPUs
- Command-line interface: Easy to use from the terminal
- Hugging Face integration: Automatically downloads and caches the model
Installation
You can install the package via pip:
pip install en2th-transliterator
Usage
As a Python Package
Basic Usage
from en2th_transliterator import En2ThTransliterator
# Initialize with the default model
model = En2ThTransliterator()
# Transliterate a single text
thai_text = model.transliterate("hello")
print(f"Thai: {thai_text}")
Advanced Usage
from en2th_transliterator import En2ThTransliterator
# Initialize with custom parameters
model = En2ThTransliterator(
    model_path=None,  # Use default HF model
    max_length=50,
    num_beams=5,
    length_penalty=1.5,
    verbose=True,
    fp16=True  # Enable mixed precision
)
# Transliterate using sampling
thai_text = model.transliterate(
    "artificial intelligence",
    temperature=0.8,
    top_k=40,
    top_p=0.95
)
print(f"Thai: {thai_text}")
# Batch transliteration
english_texts = ["computer", "keyboard", "mouse", "monitor"]
thai_texts = model.batch_transliterate(
    english_texts,
    batch_size=2,
    temperature=0.5
)
for eng, thai in zip(english_texts, thai_texts):
    print(f"{eng} → {thai}")
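The sampling parameters used above (temperature, top_k, top_p) follow the usual text-generation conventions. The sketch below illustrates the general technique on a toy score table; it is not the package's code, and `filter_distribution` is a hypothetical helper written for this illustration.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return renormalized probabilities after temperature, top-k and top-p filtering.

    `logits` maps each candidate token to its raw score (toy data, not model output).
    """
    # Temperature: divide scores before the softmax; values below 1 sharpen the distribution.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(
        ((tok, e / total) for tok, e in exps.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

    # Top-k: keep only the k most probable candidates (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]
        mass = sum(p for _, p in ranked)
        ranked = [(tok, p / mass) for tok, p in ranked]

    # Top-p (nucleus): keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize so the surviving candidates sum to 1 before sampling.
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}


toy_logits = {"a": 2.0, "b": 1.0, "c": 0.0, "d": -1.0}
print(filter_distribution(toy_logits, temperature=0.8, top_k=3, top_p=0.95))
```

Lower temperatures and tighter top_k or top_p values concentrate probability on the highest-scoring candidates, which makes the output more predictable; looser values allow more varied transliterations.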
Command Line Interface
Basic Usage
en2th-transliterate --text "hello"
Transliterate from a File
en2th-transliterate --file input.txt --output results.txt
Output in JSON Format
en2th-transliterate --file input.txt --format json --output results.json
Output in TSV Format
en2th-transliterate --file input.txt --format tsv --output results.tsv
Using Custom Parameters
en2th-transliterate --text "hello" --fp16 --temperature 0.7 --num-beams 5
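When the CLI is driven from another Python script, the commands above can be assembled as an argument list and run with `subprocess`. This is a minimal sketch: `build_command` and `run_transliteration` are hypothetical helpers, only the flags shown above are used, and it assumes the CLI reports failure through a non-zero exit status.

```python
import subprocess
from pathlib import Path


def build_command(input_path, output_path, output_format=None, extra_args=()):
    """Build an argument list for en2th-transliterate from the flags shown above."""
    cmd = ["en2th-transliterate", "--file", str(input_path), "--output", str(output_path)]
    if output_format is not None:
        cmd += ["--format", output_format]
    # Extra flags such as "--num-beams", "5" are passed through unchanged.
    cmd += list(extra_args)
    return cmd


def run_transliteration(input_path, output_path, output_format=None, extra_args=()):
    """Run the CLI; raises subprocess.CalledProcessError on a non-zero exit status."""
    cmd = build_command(input_path, output_path, output_format, extra_args)
    subprocess.run(cmd, check=True)
    return Path(output_path)


print(build_command("input.txt", "results.json", output_format="json"))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when file names contain spaces.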
Model
The package uses a ByT5 model fine-tuned on English-to-Thai transliteration data. ByT5 operates on raw UTF-8 bytes rather than a subword vocabulary, so unusual or inconsistent spellings in the input never fall outside the model's vocabulary.
This package uses the yacht/byt5-base-en2th-transliterator model from Hugging Face Hub.
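To make the byte-level operation concrete, the sketch below mimics how a ByT5-style tokenizer turns text into ids: one id per UTF-8 byte, shifted by 3 to leave room for the pad, end-of-sequence, and unknown tokens. `to_byt5_ids` is a hypothetical helper for illustration; the package itself relies on the tokenizer shipped with the model.

```python
def to_byt5_ids(text):
    """Map text to ByT5-style ids: each UTF-8 byte plus 3, then the end-of-sequence id."""
    special_offset = 3  # ids 0, 1, 2 are reserved for pad, end-of-sequence, unknown
    eos_id = 1
    return [byte + special_offset for byte in text.encode("utf-8")] + [eos_id]


english = "hello"
thai = "ไทย"  # the word "Thai", used only as a sample string

for sample in (english, thai):
    ids = to_byt5_ids(sample)
    print(f"{sample!r}: {len(sample)} characters, {len(ids)} ids")
```

Each ASCII letter takes one byte, while each Thai character takes three bytes in UTF-8. A length limit counted in byte-level tokens therefore covers roughly a third as many Thai characters as its value suggests, which may be worth keeping in mind when choosing max_length.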
Performance Optimization
FP16 Mixed Precision
The package supports FP16 mixed precision for faster inference on compatible GPUs. This is enabled by default but can be disabled if needed:
model = En2ThTransliterator(fp16=False)
Or from the command line:
en2th-transliterate --text "hello" --no-fp16
Batch Processing
For transliterating multiple texts, batch processing is more efficient:
texts = ["hello", "world", "computer", "science"]
results = model.batch_transliterate(texts, batch_size=4)
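As an illustration of what batch_size means, the sketch below splits a list into consecutive groups of at most batch_size items. How the package batches internally is not documented here, so `chunked` is a hypothetical helper that shows the general idea only.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of `items`, each holding at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


texts = ["hello", "world", "computer", "science", "keyboard"]
batches = list(chunked(texts, batch_size=2))
print(batches)
```

With batch_size=2, the five texts are grouped into three batches, the last one holding a single item. Larger batches mean fewer forward passes but more memory per pass, so the best value depends on the available GPU memory.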
Development
Setting Up Development Environment
# Clone the repository
git clone https://github.com/tchayintr/en2th-transliterator.git
cd en2th-transliterator
# Install in development mode
pip install -e .
Running Tests
# Run the test script
python test_package.py
Building the Package
# Install build tools
pip install build twine
# Build the package
python -m build
# Upload to PyPI
python -m twine upload dist/*
License
This project is licensed under the MIT License - see the LICENSE file for details.
Citation
If you use this package in your research, please cite:
@software{en2th_transliterator,
author = {Thodsaporn Chay-intr},
title = {En2Th Transliterator: English to Thai Transliteration using ByT5},
year = {2025},
url = {https://github.com/tchayintr/en2th-transliterator}
}