# Faster Translate
A high-performance translation library powered by state-of-the-art models. Faster Translate offers optimized inference using CTranslate2 and vLLM backends, providing an easy-to-use interface for applications requiring efficient and accurate translations.
## 🚀 Features
- High-performance inference using CTranslate2 and vLLM backends
- Seamless integration with Hugging Face models
- Flexible API for single sentence, batch, and large-scale translation
- Dataset translation with direct Hugging Face integration
- Multi-backend support for both traditional (CTranslate2) and LLM-based (vLLM) models
- Text normalization for improved translation quality
## 📦 Installation

```bash
pip install faster-translate
```

### Optional Dependencies

For specific normalizers or model backends:

```bash
# For Bengali text normalization
pip install git+https://github.com/csebuetnlp/normalizer

# For vLLM backend support (required for LLM-based models)
pip install vllm
```
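Since vLLM is an optional dependency, it can be useful to check at runtime whether the LLM backend is actually installed before requesting a vLLM-based model. The helper below is an illustrative sketch, not part of the faster-translate API:

```python
import importlib.util

def vllm_available() -> bool:
    """Return True if the optional vLLM package is importable."""
    return importlib.util.find_spec("vllm") is not None

# Fall back to the CTranslate2 backend when vLLM is not installed
backend = "vllm" if vllm_available() else "ctranslate2"
print(f"Using backend: {backend}")
```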
## 🔍 Usage

### Basic Translation

```python
from faster_translate import TranslatorModel

# Initialize with a pre-configured model
translator = TranslatorModel.from_pretrained("banglanmt_bn2en")

# Translate a single sentence
english_text = translator.translate_single("দেশে বিদেশি ঋণ নিয়ে এখন বেশ আলোচনা হচ্ছে।")
print(english_text)

# Translate a batch of sentences
bengali_sentences = [
    "দেশে বিদেশি ঋণ নিয়ে এখন বেশ আলোচনা হচ্ছে।",
    "রাত তিনটার দিকে কাঁচামাল নিয়ে গুলিস্তান থেকে পুরান ঢাকার শ্যামবাজারের আড়তে যাচ্ছিলেন লিটন ব্যাপারী।"
]
translations = translator.translate_batch(bengali_sentences)
```
### Using Different Model Backends

```python
# Using a CTranslate2-based model
ct2_translator = TranslatorModel.from_pretrained("banglanmt_bn2en")

# Using a vLLM-based model
vllm_translator = TranslatorModel.from_pretrained("bangla_qwen_en2bn")
```
### Loading Models from Hugging Face

```python
# Load a specific model from Hugging Face
translator = TranslatorModel.from_pretrained(
    "sawradip/faster-translate-banglanmt-bn2en-t5",
    normalizer_func="buetnlpnormalizer"
)
```
### Translating Hugging Face Datasets

Translate an entire dataset with a single function call:

```python
translator = TranslatorModel.from_pretrained("banglanmt_en2bn")

# Translate the entire dataset
translator.translate_hf_dataset(
    "sawradip/bn-translation-mega-raw-noisy",
    batch_size=16
)

# Translate specific subsets
translator.translate_hf_dataset(
    "sawradip/bn-translation-mega-raw-noisy",
    subset_name=["google"],
    batch_size=16
)

# Translate a portion of the dataset
translator.translate_hf_dataset(
    "sawradip/bn-translation-mega-raw-noisy",
    subset_name="alt",
    batch_size=16,
    translation_size=0.5  # Translate 50% of the dataset
)
```
### Publishing Translated Datasets

Push translated datasets directly to Hugging Face:

```python
translator.translate_hf_dataset(
    "sawradip/bn-translation-mega-raw-noisy",
    subset_name="alt",
    batch_size=16,
    push_to_hub=True,
    token="your_huggingface_token",
    save_repo_name="your-username/translated-dataset"
)
```
## 🌐 Supported Models

| Model ID | Source Language | Target Language | Backend | Description |
|---|---|---|---|---|
| banglanmt_bn2en | Bengali | English | CTranslate2 | BanglaNMT model from BUET |
| banglanmt_en2bn | English | Bengali | CTranslate2 | BanglaNMT model from BUET |
| bangla_mbartv1_en2bn | English | Bengali | CTranslate2 | MBart-based translation model |
| bangla_qwen_en2bn | English | Bengali | vLLM | Qwen-based translation model |
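The table above can also be expressed as a small lookup for picking a pre-configured model ID by translation direction. The mapping below is an illustrative sketch derived from the table, not an API provided by the library:

```python
# Map (source, target) language codes to pre-configured model IDs,
# taken from the Supported Models table. Illustrative only.
MODELS = {
    ("bn", "en"): ["banglanmt_bn2en"],
    ("en", "bn"): ["banglanmt_en2bn", "bangla_mbartv1_en2bn", "bangla_qwen_en2bn"],
}

def model_ids(source: str, target: str) -> list[str]:
    """Return candidate pre-configured model IDs for a direction."""
    return MODELS.get((source, target), [])

print(model_ids("en", "bn"))
```

Any of the returned IDs could then be passed to `TranslatorModel.from_pretrained`.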
## 🛠️ Advanced Configuration

### Custom Sampling Parameters for vLLM Models

```python
from vllm import SamplingParams

# Create custom sampling parameters
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=512
)

# Initialize translator with custom parameters
translator = TranslatorModel.from_pretrained(
    "bangla_qwen_en2bn",
    sampling_params=sampling_params
)
```
## 💪 Contributors
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
## 📚 Citation

If you use Faster Translate in your research, please cite:

```bibtex
@software{faster_translate,
  author = {Sawradip Saha and Contributors},
  title  = {Faster Translate: High-Performance Machine Translation Library},
  url    = {https://github.com/sawradip/faster-translate},
  year   = {2024},
}
```