
Faster Translate


A high-performance translation library powered by state-of-the-art models. Faster Translate offers optimized inference using CTranslate2 and vLLM backends, providing an easy-to-use interface for applications requiring efficient and accurate translations.

🚀 Features

  • High-performance inference using CTranslate2 and vLLM backends
  • Seamless integration with Hugging Face models
  • Flexible API for single sentence, batch, and large-scale translation
  • Dataset translation with direct Hugging Face integration
  • Multi-backend support for both traditional (CTranslate2) and LLM-based (vLLM) models
  • Text normalization for improved translation quality

📦 Installation

pip install faster-translate

Optional Dependencies

For specific normalizers or model backends:

# For Bengali text normalization
pip install git+https://github.com/csebuetnlp/normalizer

# For vLLM backend support (required for LLM-based models)
pip install vllm
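Since vLLM is optional, it can help to check for it before attempting to load an LLM-based model. The helper below is a generic sketch using only the standard library, not part of faster-translate:

```python
import importlib.util

def backend_available(module_name: str) -> bool:
    """Return True if the optional dependency can be imported."""
    return importlib.util.find_spec(module_name) is not None

# "vllm" is only needed for LLM-based models; CTranslate2 models work without it.
if not backend_available("vllm"):
    print("vllm not installed; LLM-based models are unavailable.")
```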

🔍 Usage

Basic Translation

from faster_translate import TranslatorModel

# Initialize with a pre-configured model
translator = TranslatorModel.from_pretrained("banglanmt_bn2en")

# Translate a single sentence
english_text = translator.translate_single("দেশে বিদেশি ঋণ নিয়ে এখন বেশ আলোচনা হচ্ছে।")
print(english_text)

# Translate a batch of sentences
bengali_sentences = [
    "দেশে বিদেশি ঋণ নিয়ে এখন বেশ আলোচনা হচ্ছে।",
    "রাত তিনটার দিকে কাঁচামাল নিয়ে গুলিস্তান থেকে পুরান ঢাকার শ্যামবাজারের আড়তে যাচ্ছিলেন লিটন ব্যাপারী।"
]
translations = translator.translate_batch(bengali_sentences)
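`translate_batch` takes a list of sentences in one call. For very large inputs, a small chunking helper keeps each call to a bounded size; the helper below is illustrative and not part of the library:

```python
from typing import Iterator, List

def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield successive fixed-size slices of a list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Hypothetical usage with a loaded translator:
# translations = []
# for chunk in chunked(bengali_sentences, 32):
#     translations.extend(translator.translate_batch(chunk))

print(list(chunked(["a", "b", "c"], 2)))  # [['a', 'b'], ['c']]
```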

Using Different Model Backends

# Using a CTranslate2-based model
ct2_translator = TranslatorModel.from_pretrained("banglanmt_bn2en")

# Using a vLLM-based model
vllm_translator = TranslatorModel.from_pretrained("bangla_qwen_en2bn")

Loading Models from Hugging Face

# Load a specific model from Hugging Face
translator = TranslatorModel.from_pretrained(
    "sawradip/faster-translate-banglanmt-bn2en-t5",
    normalizer_func="buetnlpnormalizer"
)

Translating Hugging Face Datasets

Translate an entire dataset with a single function call:

translator = TranslatorModel.from_pretrained("banglanmt_en2bn")

# Translate the entire dataset
translator.translate_hf_dataset(
    "sawradip/bn-translation-mega-raw-noisy", 
    batch_size=16
)

# Translate specific subsets
translator.translate_hf_dataset(
    "sawradip/bn-translation-mega-raw-noisy",
    subset_name=["google"], 
    batch_size=16
)

# Translate a portion of the dataset
translator.translate_hf_dataset(
    "sawradip/bn-translation-mega-raw-noisy",
    subset_name="alt",
    batch_size=16, 
    translation_size=0.5  # Translate 50% of the dataset
)
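Per the example above, `translation_size` is a fraction of the dataset (0.5 translates 50% of the rows). A sketch of how such a fraction maps to a row count, assuming simple truncation (whether the library truncates or rounds is an implementation detail):

```python
def subset_row_count(total_rows: int, fraction: float) -> int:
    """Number of rows covered by a translation_size fraction (assumes truncation)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_rows * fraction)

print(subset_row_count(10_000, 0.5))  # 5000
```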

Publishing Translated Datasets

Push translated datasets directly to Hugging Face:

translator.translate_hf_dataset(
    "sawradip/bn-translation-mega-raw-noisy",
    subset_name="alt",
    batch_size=16, 
    push_to_hub=True,
    token="your_huggingface_token",
    save_repo_name="your-username/translated-dataset"
)
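Rather than hardcoding the token string as in the example, it is safer to read it from the environment. The helper below is a generic sketch (the `HF_TOKEN` variable name is a common convention, not something faster-translate requires):

```python
import os
from typing import Optional

def resolve_hf_token(env_var: str = "HF_TOKEN") -> Optional[str]:
    """Read a Hugging Face token from the environment rather than hardcoding it."""
    return os.environ.get(env_var)

# Hypothetical usage with a loaded translator:
# translator.translate_hf_dataset(
#     "sawradip/bn-translation-mega-raw-noisy",
#     subset_name="alt",
#     batch_size=16,
#     push_to_hub=True,
#     token=resolve_hf_token(),
#     save_repo_name="your-username/translated-dataset",
# )
```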

🌐 Supported Models

| Model ID | Source Language | Target Language | Backend | Description |
|---|---|---|---|---|
| banglanmt_bn2en | Bengali | English | CTranslate2 | BanglaNMT model from BUET |
| banglanmt_en2bn | English | Bengali | CTranslate2 | BanglaNMT model from BUET |
| bangla_mbartv1_en2bn | English | Bengali | CTranslate2 | MBart-based translation model |
| bangla_qwen_en2bn | English | Bengali | vLLM | Qwen-based translation model |
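An application supporting several language pairs can map a pair to one of the registered model IDs above with a plain lookup. The helper is illustrative, not part of the library, and picks one default per pair (English→Bengali has multiple options):

```python
# One default model ID per (source, target) pair, taken from the table above.
MODEL_IDS = {
    ("bn", "en"): "banglanmt_bn2en",
    ("en", "bn"): "banglanmt_en2bn",
}

def model_for(source: str, target: str) -> str:
    """Look up a registered model ID for a language pair."""
    try:
        return MODEL_IDS[(source, target)]
    except KeyError:
        raise ValueError(f"no registered model for {source}->{target}") from None

print(model_for("bn", "en"))  # banglanmt_bn2en
```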

🛠️ Advanced Configuration

Custom Sampling Parameters for vLLM Models

from vllm import SamplingParams

# Create custom sampling parameters
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=512
)

# Initialize translator with custom parameters
translator = TranslatorModel.from_pretrained(
    "bangla_qwen_en2bn", 
    sampling_params=sampling_params
)

💪 Contributors

See the full list of contributors on GitHub.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📚 Citation

If you use Faster Translate in your research, please cite:

@software{faster_translate,
  author = {Sawradip Saha and Contributors},
  title = {Faster Translate: High-Performance Machine Translation Library},
  url = {https://github.com/sawradip/faster-translate},
  year = {2024},
}


