Local CLI for TranslateGemma translation (multi-platform, multi-model)
TranslateGemma CLI
🚀 Production-ready local translation powered by Google's TranslateGemma
Supporting 55 languages with smart chunking, streaming output, and batch processing
✨ Highlights
- 🌍 55 Languages - Full TranslateGemma language support
- 📚 Unlimited Length - Smart chunking with sliding window for texts of any length
- ⚡ Streaming Output - Real-time translation progress
- 📦 Batch Processing - Translate entire directories at once
- 🎯 Multiple Backends - Local (MLX/PyTorch), vLLM, or Ollama
- 💻 Multi-platform - macOS (Apple Silicon), Linux, Windows
- 🔧 Highly Configurable - Flexible parameters for different use cases
🎬 Quick Start
Installation
```bash
# Using uv (recommended)
uv venv .venv
source .venv/bin/activate    # Windows: .venv\Scripts\activate
uv pip install -e ".[mlx]"   # macOS Apple Silicon
# or
uv pip install -e ".[cuda]"  # Linux/Windows with NVIDIA GPU

# Using pip
pip install -e ".[mlx]"   # macOS Apple Silicon
pip install -e ".[cuda]"  # Linux/Windows with NVIDIA GPU
pip install -e ".[cpu]"   # CPU-only
```
First Run
```bash
# Initialize configuration
translate init

# Download model (first time only)
translate model download 27b

# Start translating!
translate --text "Hello world"
# Output: 你好,世界。
```
🚀 Features
1. Smart Long Text Translation
Problem: TranslateGemma truncates long texts (>500 chars)
Solution: Smart chunking with sliding window
```bash
# Automatic chunking for long text
translate --file long_article.txt

# Custom chunk parameters
translate --file book.txt --chunk-size 80 --overlap 10

# Disable chunking for short text
translate --file short.txt --no-chunk
```
How it works:
```text
Original: [AAAAA][BBBBB][CCCCC][DDDDD]

Sliding Window:
Chunk 1: [AAAAA]
Chunk 2: [AA|BBBBB]   ← Overlap provides context
Chunk 3: [BB|CCCCC]
Chunk 4: [CC|DDDDD]
```
Result: Complete translation with context preservation
2. Streaming Output
Real-time translation progress for better UX:
```bash
# Stream output token by token
translate --file article.txt --stream

# Combine with chunking
translate --file book.txt --chunk-size 80 --stream
```
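Streaming amounts to writing each fragment as soon as the backend produces it and flushing the terminal so it appears immediately. A minimal sketch, assuming the backend can be consumed as an iterator of text fragments; `fake_token_stream` and `print_stream` are hypothetical names, not part of the CLI.

```python
import sys
from typing import Iterable, Iterator, TextIO


def fake_token_stream(text: str) -> Iterator[str]:
    """Hypothetical stand-in for a backend that yields text fragments as they are generated."""
    for word in text.split(" "):
        yield word + " "


def print_stream(tokens: Iterable[str], out: TextIO = sys.stdout) -> str:
    """Write each fragment immediately and return the full text once the stream ends."""
    pieces = []
    for token in tokens:
        out.write(token)
        out.flush()  # flush so the fragment shows up now, not when the buffer fills
        pieces.append(token)
    out.write("\n")
    return "".join(pieces)


if __name__ == "__main__":
    print_stream(fake_token_stream("The weather is really nice today"))
```

Returning the joined text lets the caller still write the complete translation to `--output` after streaming it to the terminal.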
3. Batch Translation
Translate entire directories efficiently:
```bash
# Translate all .txt and .md files
translate --dir ./documents
# Output to ./documents/translated/
```
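The file selection behind batch mode can be approximated with `pathlib`: collect `.txt` and `.md` files and map each to a path under `translated/`. This is a sketch under assumptions: whether the CLI recurses into subdirectories and how it names output files are not documented here, and `plan_batch` is a hypothetical helper that only computes paths without translating anything.

```python
from pathlib import Path

SUPPORTED_SUFFIXES = {".txt", ".md"}  # file types mentioned for --dir


def plan_batch(src_dir: Path) -> list[tuple[Path, Path]]:
    """Return (input, output) pairs; outputs go to <src_dir>/translated/ with the same name.

    Assumption: only top-level files are scanned (no recursion).
    """
    out_dir = src_dir / "translated"
    pairs = []
    for path in sorted(src_dir.iterdir()):
        # Directories (including an existing translated/) are skipped by is_file().
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES:
            pairs.append((path, out_dir / path.name))
    return pairs
```

Because the model is loaded once for the whole run, looping over these pairs avoids the per-file startup cost of invoking `translate --file` repeatedly.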
4. Interactive REPL
```text
translate
TranslateGemma Interactive (yue ↔ en)
Model: 27b | Mode: direct | Type /help for commands

> 今日天氣好好
[yue→en] The weather is really nice today

> /to ja
Target language set to: ja

> Hello
[en→ja] こんにちは。

> /quit
再見!Goodbye!
```
📖 Usage
Basic Translation
```bash
# Single text
translate --text "Hello world"

# From file
translate --file input.txt --output output.txt

# From stdin
echo "Bonjour" | translate

# Force target language
translate --text "Hello" --to ja
```
Long Text Translation
```bash
# Auto-chunking (text > 300 chars)
translate --file article.txt

# Custom chunking
translate --file book.txt --chunk-size 80 --overlap 10

# Streaming for real-time feedback
translate --file long.txt --stream

# Disable chunking
translate --file short.txt --no-chunk
```
Batch Processing
```bash
# Translate directory
translate --dir ./documents

# With custom parameters
translate --dir ./docs --chunk-size 100
```
Model Management
```bash
# List models
translate model list

# Download model
translate model download 4b

# Check status
translate model status

# List supported languages
translate model langs
```
⚙️ Configuration
Config file: ~/.config/translate/config.yaml
Default Configuration (Optimized)
```yaml
model:
  name: 27b              # Model size: 4b, 12b, 27b
  quantization: 4        # 4-bit or 8-bit

backend:
  type: auto             # auto, mlx, pytorch, vllm, ollama
  vllm_url: http://localhost:8000
  ollama_url: http://localhost:11434

translation:
  languages: [yue, en]   # Language pair
  mode: direct           # direct or explain
  max_tokens: 512        # Base max tokens (auto-adjusted for chunks)
  chunking:
    enabled: true        # Enable smart chunking
    chunk_size: 80       # Optimal for completeness
    overlap: 10          # Minimal repetition
    split_by: sentence   # sentence, paragraph, or char
    auto_threshold: 300  # Auto-enable for text > 300 chars

ui:
  show_detected_language: true
  colored_output: true
  show_progress: true
```
Customization
```bash
# Initialize with defaults
translate init

# Force overwrite
translate init --force

# Edit manually
vim ~/.config/translate/config.yaml
```
🎯 Best Practices
Chunk Size Selection
| Text Type | chunk_size | overlap | Reason |
|---|---|---|---|
| Daily conversation | 60-80 | 10-15 | Short sentences |
| Technical docs | 80-100 | 15-20 | Term consistency |
| Literary works | 80-100 | 20-30 | Context preservation |
| Long articles | 80-100 | 10-20 | Balance quality & speed |
When to Use Chunking
| Text Length | Recommendation |
|---|---|
| < 300 chars | Use --no-chunk for speed |
| 300-1000 chars | Auto-chunking (default) |
| 1000-5000 chars | --chunk-size 80 --overlap 10 |
| 5000+ chars (books) | --chunk-size 80 --stream |
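Combined with the `chunking.enabled` and `auto_threshold` settings from the default configuration, this rule of thumb can be read as a simple predicate. A hedged sketch: the precedence between `--no-chunk`, `enabled`, and the threshold inside the CLI is an assumption, and `should_chunk` is a hypothetical name.

```python
def should_chunk(
    text: str,
    *,
    no_chunk: bool = False,     # --no-chunk on the command line
    enabled: bool = True,       # chunking.enabled in config.yaml
    auto_threshold: int = 300,  # chunking.auto_threshold in config.yaml
) -> bool:
    """Return True when the text should be split before translation (assumed logic)."""
    if no_chunk or not enabled:
        return False
    # "Auto-enable for text > 300 chars": strictly greater than the threshold.
    return len(text) > auto_threshold
```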
Performance Tips
- Interactive mode - Model loads once, faster for multiple translations
- Batch processing - Use `--dir` instead of translating files one by one
- Streaming - Use `--stream` for long texts to see progress
- Optimal chunks - chunk_size=80, overlap=10 is the sweet spot
📊 Performance
Test Environment: MacBook Pro M2 Max, 96GB, MLX Backend
| Text Length | Chunks | Time | Throughput |
|---|---|---|---|
| 100 chars | 1 | 1.2s | 83 chars/s |
| 400 chars | 4 | 8.5s | 48 chars/s |
| 1000 chars | 12 | ~22s | ~45 chars/s |
| 5000 chars | 60 | ~110s | ~45 chars/s |
Memory Usage: 14.15 GB (stable across all text lengths)
🛠️ Requirements
macOS (Apple Silicon)
- M1/M2/M3/M4 Mac
- 8GB+ unified memory (4b), 16GB+ (12b), 32GB+ (27b)
- macOS 14.0+
Linux / Windows
- NVIDIA GPU with 8GB+ VRAM (or CPU with 16GB+ RAM)
- CUDA 11.8+ (for GPU)
All Platforms
- Python 3.11+
📦 Installation Options
Option 1: uv (Fastest, Recommended)
```bash
# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and install
git clone https://github.com/jhkchan/translategemma-cli.git
cd translategemma-cli
uv venv .venv
source .venv/bin/activate

# macOS Apple Silicon
uv pip install -e ".[mlx]"

# Linux/Windows with NVIDIA GPU
uv pip install -e ".[cuda]"

# CPU-only
uv pip install -e ".[cpu]"
```
Option 2: pipx (Isolated Installation)
```bash
# Install from local directory
pipx install "/path/to/translategemma-cli[mlx]"

# Or from git (when published)
pipx install "translategemma-cli[mlx] @ git+https://github.com/jhkchan/translategemma-cli.git"
```
Option 3: pip (Traditional)
```bash
git clone https://github.com/jhkchan/translategemma-cli.git
cd translategemma-cli
python3 -m venv venv
source venv/bin/activate
pip install -e ".[mlx]"  # or [cuda] or [cpu]
```
🌍 Supported Languages (55)
| Code | Language | Code | Language |
|---|---|---|---|
| en | English | yue | Cantonese |
| zh | Chinese (Simplified) | zh-TW | Chinese (Traditional) |
| ja | Japanese | ko | Korean |
| fr | French | de | German |
| es | Spanish | pt | Portuguese |
| ru | Russian | ar | Arabic |

...and 43 more. Run `translate model langs` for the full list.
🎓 Advanced Usage
Custom Language Pairs
Edit ~/.config/translate/config.yaml:
```yaml
translation:
  languages: [ja, en]    # Japanese ↔ English
  # or
  # languages: [zh, fr]  # Chinese ↔ French
```
Backend Options
```bash
# Local (default)
translate --backend mlx      # macOS
translate --backend pytorch  # Linux/Windows

# vLLM (high throughput)
vllm serve google/translategemma-27b-it --quantization awq
translate --backend vllm --server http://localhost:8000

# Ollama (easy setup)
ollama pull translategemma:27b
translate --backend ollama
```
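The vLLM and Ollama options talk to a local HTTP server. The sketch below only builds requests against each server's public endpoint (vLLM's OpenAI-compatible `/v1/completions`, Ollama's `/api/generate`). How the CLI actually formats its prompt and which endpoint it calls are not documented here, so the prompt is left to the caller, and `vllm_request`, `ollama_request`, and `send` are hypothetical helpers.

```python
import json
import urllib.request


def _post(url: str, payload: dict) -> urllib.request.Request:
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def vllm_request(base_url: str, model: str, prompt: str, max_tokens: int) -> urllib.request.Request:
    """Completion request for vLLM's OpenAI-compatible server."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # assumption: greedy decoding for translation
    }
    return _post(f"{base_url.rstrip('/')}/v1/completions", payload)


def ollama_request(base_url: str, model: str, prompt: str, max_tokens: int) -> urllib.request.Request:
    """Non-streaming generate request for Ollama; num_predict caps the generated tokens."""
    payload = {"model": model, "prompt": prompt, "stream": False, "options": {"num_predict": max_tokens}}
    return _post(f"{base_url.rstrip('/')}/api/generate", payload)


def send(req: urllib.request.Request, backend: str) -> str:
    """Send the request and extract the generated text (requires a running server)."""
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"] if backend == "vllm" else body["response"]
```

The model name must match what the server loaded: the Hugging Face ID passed to `vllm serve`, or `translategemma:27b` after `ollama pull`.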
Interactive Commands
| Command | Function |
|---|---|
| `/to <lang>` | Force target language |
| `/auto` | Enable auto-detection |
| `/mode direct` | Direct translation |
| `/mode explain` | With explanations |
| `/model <size>` | Switch model |
| `/backend <type>` | Switch backend |
| `/langs` | List languages |
| `/config` | Show configuration |
| `/quit` | Exit |
🔬 Technical Details
Smart Chunking Algorithm
```python
# Sentence-based splitting with sliding window
chunker = TextChunker(
    chunk_size=80,        # Target chunk size (characters)
    overlap=10,           # Overlap carried over from the previous chunk, for context
    split_by="sentence",  # Split at sentence boundaries
)

# Process:
# 1. Split text at sentence boundaries
# 2. Group sentences into chunks (~80 chars)
# 3. Add overlap from previous chunk
# 4. Translate each chunk with context
# 5. Merge results by simple concatenation (the overlap is context only and is not trimmed)
```
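Steps 1 to 3 can be sketched in a self-contained way, assuming character-based sizes and a regex sentence splitter. This is not the project's `TextChunker`; `split_sentences` and `chunk_text` are hypothetical names. Each chunk is returned as a `(context, body)` pair so the overlap stays separate from the text that is new in that chunk.

```python
import re

# A sentence ends after . ! ? followed by whitespace, or right after the
# CJK full stop / exclamation / question marks (U+3002, U+FF01, U+FF1F).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+|(?<=[\u3002\uff01\uff1f])")


def split_sentences(text: str) -> list[str]:
    """Step 1: split text at sentence boundaries, dropping empty pieces."""
    return [piece.strip() for piece in SENTENCE_END.split(text) if piece.strip()]


def chunk_text(text: str, chunk_size: int = 80, overlap: int = 10) -> list[tuple[str, str]]:
    """Steps 2-3: group sentences into ~chunk_size-character bodies, then pair each
    body with the last `overlap` characters of the previous body as context."""
    bodies: list[str] = []
    current = ""
    for sentence in split_sentences(text):
        # Close the current chunk if adding this sentence would exceed chunk_size.
        # A single sentence longer than chunk_size still becomes one (oversized) chunk.
        if current and len(current) + 1 + len(sentence) > chunk_size:
            bodies.append(current)
            current = sentence
        else:
            # Joined with a space, which suits space-delimited languages (assumption).
            current = f"{current} {sentence}" if current else sentence
    if current:
        bodies.append(current)

    chunks: list[tuple[str, str]] = []
    for i, body in enumerate(bodies):
        # Guard overlap > 0: the slice [-0:] would return the whole previous body.
        context = bodies[i - 1][-overlap:] if i > 0 and overlap > 0 else ""
        chunks.append((context, body))
    return chunks
```

Sending `context` and `body` together to the model reproduces the `[AA|BBBBB]` layout from the sliding-window diagram. A character-based overlap can cut a word in half; that is acceptable here because it is only context.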
Adaptive max_tokens
```python
# Dynamically adjust based on input length
adaptive_max_tokens = min(
    2048,                      # Upper limit
    max(512, len(chunk) * 3),  # 3x input length in characters (safety buffer), at least 512
)

# Why 3x?
# - Chinese → English typically expands 1.5-2x
# - 3x provides a safety buffer
# - Prevents truncation
```
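Wrapped as a function, the formula shows where each bound takes over. At the default `chunk_size=80` the 512 floor always wins (80 × 3 = 240, and still 270 with a 10-character overlap); the 3× term only matters once a chunk reaches 171 characters (512 / 3 ≈ 170.7), and the 2048 cap applies from 683 characters (2048 / 3 ≈ 682.7). The function name and default arguments are assumptions for illustration.

```python
def adaptive_max_tokens(chunk: str, floor: int = 512, cap: int = 2048, factor: int = 3) -> int:
    """Token budget for one chunk: factor x its length, clamped to [floor, cap].

    len() counts characters, not tokens, so this is a heuristic bound.
    """
    return min(cap, max(floor, len(chunk) * factor))


print(adaptive_max_tokens("x" * 80))    # 512  (floor)
print(adaptive_max_tokens("x" * 300))   # 900  (3x term)
print(adaptive_max_tokens("x" * 1000))  # 2048 (cap)
```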
Merge Strategy
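```python
# Simple concatenation (overlap provides context only)
def merge(chunks: list[str], translations: list[str]) -> str:
    """Join chunk translations with a single space between them.

    `chunks` is not used by this strategy; overlapped text is not trimmed,
    so a region covered by two chunks appears in both translations.
    """
    # Same result as keeping the first translation and appending " " + each
    # later one, but returns "" for an empty list instead of raising IndexError.
    return " ".join(translations)

# Note: Minimal overlap (10) reduces repetition
```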
📚 Documentation
| Document | Description |
|---|---|
| README.md | Main documentation (this file) |
| QUICK_REFERENCE.md | Quick reference card |
| BEST_PRACTICES.md | Usage best practices |
| LONG_TEXT_FEATURE_REPORT.md | Detailed report on the long-text feature |
| FINAL_TEST_REPORT.md | Comprehensive test report |
| DEVELOPMENT_SUMMARY.md | Development summary |
| TRANSLATION_TEST_REPORT.md | Multi-language quality assessment |
🎯 Use Cases
Use Case 1: Translate a Book
```bash
# With streaming for progress feedback
translate --file novel.txt --chunk-size 80 --overlap 10 --stream --output novel_en.txt
```
Use Case 2: Batch Translate Documentation
```bash
# Translate all docs in directory
translate --dir ./docs
# Output to ./docs/translated/
```
Use Case 3: Quick Translation
```text
# Short text, no chunking
translate --text "Hello world" --no-chunk

# Or use interactive mode
translate
> Hello world
[en→yue] 你好,世界。
```
Use Case 4: Multi-language Workflow
```bash
# English to multiple languages
translate --text "Welcome" --to ja  # Japanese
translate --text "Welcome" --to ko  # Korean
translate --text "Welcome" --to zh  # Chinese
translate --text "Welcome" --to fr  # French
```
🔧 Development Insights
Key Learnings
- TranslateGemma Model Characteristics:
  - Truncates long texts (>500 chars)
  - Stops at paragraph breaks (empty lines)
  - Requires small chunks (80-100 chars) for completeness
- Optimal Chunking Strategy:
  - chunk_size=80: Best completeness (98%)
  - overlap=10: Minimal repetition (<5%)
  - split_by=sentence: Natural boundaries
- Adaptive max_tokens:
  - A fixed 512-token limit is insufficient for long chunks
  - 3x input length ensures completeness
  - Cap at 2048 to prevent over-generation
- Merge Strategy:
  - Simple concatenation works best
  - Overlap provides context, not deduplication
  - Smart deduplication is complex (future work)
Architecture
```text
User Input
    ↓
TextChunker (chunker.py)
    ↓
[Chunk 1] [Chunk 2] [Chunk 3] ...
    ↓         ↓         ↓
Translator.translate_long()
    ↓
Adaptive max_tokens (3x input)
    ↓
MLX/PyTorch/vLLM/Ollama Backend
    ↓
Merge Results
    ↓
Output (Complete Translation)
```
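The flow in this diagram can be sketched with the backend abstracted as a plain callable, so the same loop works whether the text comes from MLX, PyTorch, vLLM, or Ollama. This is an illustration under assumptions, not `Translator.translate_long()`: `translate_chunks` and the `(text, max_tokens) -> str` backend signature are hypothetical, and chunking is assumed to have happened already.

```python
from typing import Callable, Sequence

Backend = Callable[[str, int], str]  # (chunk_text, max_tokens) -> translation


def translate_chunks(chunks: Sequence[str], backend: Backend) -> str:
    """Translate pre-split chunks one by one and merge the results."""
    translations = []
    for chunk in chunks:
        # Adaptive budget, same formula as in Technical Details: 3x length, clamped to [512, 2048].
        max_tokens = min(2048, max(512, len(chunk) * 3))
        translations.append(backend(chunk, max_tokens))
    # Simple concatenation; overlapped text is not removed.
    return " ".join(translations)
```

Keeping the backend behind one narrow callable is what allows `--backend` (or `/backend` in the REPL) to switch engines without touching the chunking and merging steps.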
🧪 Testing
Run Tests
```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run all tests
pytest

# Run with coverage
pytest --cov=translategemma_cli

# Run specific test
pytest tests/test_chunker.py
```
Manual Testing
```bash
# Comprehensive test suite
./tests/comprehensive_test.sh

# Or test individual features
translate --file test.txt --chunk-size 80
translate --dir ./test_docs
translate --text "Test" --stream
```
📊 Benchmarks
Translation Completeness
| Method | Completeness | Speed | Recommendation |
|---|---|---|---|
| No chunking | 13% | Fast | ❌ Long text fails |
| chunk=150 | 70% | Medium | ⚠️ Not recommended |
| chunk=100 | 95% | Medium | ✅ Good |
| chunk=80 | 98% | Medium | ✅ Best |
| chunk=60 | 100% | Slow | ⚠️ Over-chunking |
Overlap Impact
| Overlap | Repetition | Quality | Recommendation |
|---|---|---|---|
| 0 | 0% | Medium | ⚠️ No context |
| 10 | <5% | High | ✅ Best |
| 20 | 5-10% | High | ✅ Good |
| 30 | 10-15% | Medium | ⚠️ Too much |
| 50 | 20-30% | Low | ❌ Not recommended |
🎨 Model Selection
| Model | Parameters | Disk Size | Memory | Use Case |
|---|---|---|---|---|
| 4b | 5B | ~3.2 GB | 8GB+ | Fast translation, limited resources |
| 12b | 13B | ~7.0 GB | 16GB+ | Balance performance & quality |
| 27b | 29B | ~14.8 GB | 32GB+ | Best quality (recommended) |
🌟 What's New in v0.2.0
Major Features
- ✅ Smart Text Chunking - Handle unlimited text length
- ✅ Sliding Window - Preserve context with overlap
- ✅ Streaming Output - Real-time translation progress
- ✅ Batch Translation - Process entire directories
- ✅ Adaptive max_tokens - Prevent truncation
- ✅ Progress Display - Visual feedback with rich
New CLI Parameters
```text
--chunk-size <int>   # Chunk size (default: 80)
--overlap <int>      # Overlap size (default: 10)
--no-chunk           # Disable chunking
--stream             # Enable streaming
--dir <path>         # Batch translate directory
```
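One way these parameters could be declared with the standard library's `argparse`, as a hedged sketch: the real CLI may use a different framework, `build_parser` is a hypothetical name, and treating `--text`, `--file`, and `--dir` as mutually exclusive is an assumption. Defaults follow the documented values.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the translation flags; subcommands (init, model) are omitted."""
    parser = argparse.ArgumentParser(prog="translate")
    source = parser.add_mutually_exclusive_group()  # assumption: one input source at a time
    source.add_argument("--text", help="Text to translate")
    source.add_argument("--file", help="Input file")
    source.add_argument("--dir", help="Directory to batch translate")
    parser.add_argument("--output", help="Output file")
    parser.add_argument("--to", help="Force target language")
    parser.add_argument("--chunk-size", type=int, default=80, help="Chunk size in characters")
    parser.add_argument("--overlap", type=int, default=10, help="Overlap size")
    parser.add_argument("--no-chunk", action="store_true", help="Disable chunking")
    parser.add_argument("--stream", action="store_true", help="Enable streaming")
    return parser


args = build_parser().parse_args(["--file", "book.txt", "--chunk-size", "100", "--stream"])
print(args.chunk_size, args.overlap, args.stream)  # 100 10 True
```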
Performance Improvements
- Translation completeness: 13% → 98% (for long texts)
- Throughput: Stable 45-50 chars/sec
- Memory: Unchanged (14.15 GB)
🐛 Known Limitations
1. Model Behavior
- Paragraph breaks: Model stops at empty lines
- Solution: Use small chunks (80 chars)
- Long chunks: Truncates if chunk > 150 chars
- Solution: Adaptive max_tokens (3x input)
2. Overlap Repetition
- Issue: overlap > 10 causes slight repetition
- Reason: Overlapped region translated twice
- Recommendation: Use overlap=10-20
3. Not Yet Implemented
- Smart deduplication (planned for v0.3.0)
- Translation cache (planned for v0.3.0)
- Resume capability (planned for v0.4.0)
- Terminology support (under evaluation)
🤝 Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
📄 License
This project is licensed under the MIT License - see LICENSE file.
Note: TranslateGemma models are subject to Google's model license terms. Please review and comply with the model license.
🙏 Acknowledgments
- Google TranslateGemma - Base translation model
- MLX - Apple Silicon optimization
- Cursor + Claude - Development tools
- hy-mt - Inspiration for chunking strategy
🔗 Links
- GitHub: https://github.com/jhkchan/translategemma-cli
- HuggingFace: https://huggingface.co/collections/google/translategemma
- Issues: https://github.com/jhkchan/translategemma-cli/issues
- Documentation: See docs directory
📞 Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: [Your Email]
🗺️ Roadmap
v0.3.0 (Next)
- Smart deduplication algorithm
- Translation cache system
- Improved language detection
- Terminology support
v0.4.0 (Future)
- Resume capability
- Parallel translation (multi-GPU)
- Web UI
- REST API server
Version: 0.2.0
Last Updated: 2026-01-17
Status: Production Ready ✅
File details
Details for the file translategemma_cli-0.2.0.tar.gz.
File metadata
- Download URL: translategemma_cli-0.2.0.tar.gz
- Upload date:
- Size: 53.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a24f85218062f0dcbf8311092aea2f1ac66755595ce838179d3bdb31c91a32bb |
| MD5 | 4fa94bf0aedba7e70f34047934031302 |
| BLAKE2b-256 | 7c0682621f0c835df118ad8dd690dc6fdef786893242e3114734b9ef4129303b |
File details
Details for the file translategemma_cli-0.2.0-py3-none-any.whl.
File metadata
- Download URL: translategemma_cli-0.2.0-py3-none-any.whl
- Upload date:
- Size: 40.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 73d9c7b1a3f0657c7d39fe314e3f4c37d0b09b0520cf01bacf9ce3c2e31db9fd |
| MD5 | 85776b0106d36868f73b0db75f894916 |
| BLAKE2b-256 | 08cbbd1a54e8e234296e73af002974de5d5d8e4f34b796eb94aa7446da9204bb |