Local CLI for TranslateGemma translation (multi-platform, multi-model)

TranslateGemma CLI

🚀 Production-ready local translation powered by Google's TranslateGemma
Supporting 55 languages with smart chunking, streaming output, and batch processing

✨ Highlights

🌍 55 Languages - Full TranslateGemma language support
📚 Unlimited Length - Smart chunking with sliding window for texts of any length
⚡ Streaming Output - Real-time translation progress
📦 Batch Processing - Translate entire directories at once
🎯 Multiple Backends - Local (MLX/PyTorch), vLLM, or Ollama
💻 Multi-platform - macOS (Apple Silicon), Linux, Windows
🔧 Highly Configurable - Flexible parameters for different use cases

🎬 Quick Start

Installation

# Run these from a clone of the repository (see Installation Options below)

# Using uv (recommended)
uv venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
uv pip install -e ".[mlx]"  # macOS Apple Silicon
# or
uv pip install -e ".[cuda]"  # Linux/Windows with NVIDIA GPU

# Using pip (pick the one line that matches your hardware)
pip install -e ".[mlx]"  # macOS Apple Silicon
pip install -e ".[cuda]"  # Linux/Windows with NVIDIA GPU
pip install -e ".[cpu]"  # CPU-only

First Run

# Initialize configuration
translate init

# Download model (first time only)
translate model download 27b

# Start translating!
translate --text "Hello world"
# Output: 你好，世界。

🚀 Features

1. Smart Long Text Translation

Problem: TranslateGemma truncates long texts (>500 chars)

Solution: Smart chunking with sliding window

# Automatic chunking for long text
translate --file long_article.txt

# Custom chunk parameters
translate --file book.txt --chunk-size 80 --overlap 10

# Disable chunking for short text
translate --file short.txt --no-chunk

How it works:

Original: [AAAAA][BBBBB][CCCCC][DDDDD]

Sliding Window:
Chunk 1: [AAAAA]
Chunk 2:    [AA|BBBBB]    ← Overlap provides context
Chunk 3:         [BB|CCCCC]
Chunk 4:              [CC|DDDDD]

Result: Complete translation with context preservation
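
The diagram above can be reproduced with a few lines of Python. This is a minimal character-based sketch, not the project's chunker: the function name `sliding_window` and its return shape are assumptions made for illustration, and the toy sizes (5 and 2) are chosen only to match the diagram.

```python
def sliding_window(text: str, chunk_size: int = 5, overlap: int = 2) -> list[tuple[str, str]]:
    """Return (context, body) pairs.

    body    -- the new text covered by this chunk
    context -- the tail of the preceding text, carried along for context only
    """
    if chunk_size <= 0 or overlap < 0:
        raise ValueError("chunk_size must be positive and overlap non-negative")
    pairs = []
    for start in range(0, len(text), chunk_size):
        body = text[start:start + chunk_size]
        context = text[max(0, start - overlap):start]
        pairs.append((context, body))
    return pairs


for context, body in sliding_window("AAAAABBBBBCCCCCDDDDD"):
    # Prints [AAAAA], [AA|BBBBB], [BB|CCCCC], [CC|DDDDD]
    print(f"[{context}|{body}]" if context else f"[{body}]")
```

Keeping the context separate from the body makes it explicit which part of each chunk is new text and which part is only there to help the model.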

2. Streaming Output

Real-time translation progress for better UX:

# Stream output token by token
translate --file article.txt --stream

# Combine with chunking
translate --file book.txt --chunk-size 80 --stream

3. Batch Translation

Translate entire directories efficiently:

# Translate all .txt and .md files
translate --dir ./documents

# Output to ./documents/translated/
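
To make the input-to-output mapping concrete, here is a small sketch of how a batch run could be planned. It assumes a non-recursive scan and case-insensitive suffix matching, neither of which the README confirms; `is_translatable`, `output_path`, and `plan_batch` are hypothetical helper names, not part of the package.

```python
from pathlib import Path

# Suffixes the README says batch mode picks up
TRANSLATABLE_SUFFIXES = {".txt", ".md"}


def is_translatable(path: Path) -> bool:
    """True for .txt and .md files (case-insensitive match is an assumption)."""
    return path.suffix.lower() in TRANSLATABLE_SUFFIXES


def output_path(source: Path, root: Path) -> Path:
    """Map a source file to <root>/translated/<same file name>."""
    return root / "translated" / source.name


def plan_batch(root: Path) -> list[tuple[Path, Path]]:
    """List (source, destination) pairs for the top level of root."""
    if not root.is_dir():
        return []
    return [
        (path, output_path(path, root))
        for path in sorted(root.iterdir())
        if path.is_file() and is_translatable(path)
    ]


if __name__ == "__main__":
    for source, destination in plan_batch(Path("./documents")):
        print(f"{source} -> {destination}")
```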

4. Interactive REPL

translate

TranslateGemma Interactive (yue ↔ en)
Model: 27b | Mode: direct | Type /help for commands

> 今日天氣好好
[yue→en] The weather is really nice today

> /to ja
Target language set to: ja

> Hello
[en→ja] こんにちは。

> /quit
再見！Goodbye!

📖 Usage

Basic Translation

# Single text
translate --text "Hello world"

# From file
translate --file input.txt --output output.txt

# From stdin
echo "Bonjour" | translate

# Force target language
translate --text "Hello" --to ja

Long Text Translation

# Auto-chunking (text > 300 chars)
translate --file article.txt

# Custom chunking
translate --file book.txt --chunk-size 80 --overlap 10

# Streaming for real-time feedback
translate --file long.txt --stream

# Disable chunking
translate --file short.txt --no-chunk

Batch Processing

# Translate directory
translate --dir ./documents

# With custom parameters
translate --dir ./docs --chunk-size 100

Model Management

# List models
translate model list

# Download model
translate model download 4b

# Check status
translate model status

# List supported languages
translate model langs

⚙️ Configuration

Config file: ~/.config/translate/config.yaml

Default Configuration (Optimized)

model:
  name: 27b              # Model size: 4b, 12b, 27b
  quantization: 4        # 4-bit or 8-bit

backend:
  type: auto             # auto, mlx, pytorch, vllm, ollama
  vllm_url: http://localhost:8000
  ollama_url: http://localhost:11434

translation:
  languages: [yue, en]   # Language pair
  mode: direct           # direct or explain
  max_tokens: 512        # Base max tokens (auto-adjusted for chunks)
  
  chunking:
    enabled: true        # Enable smart chunking
    chunk_size: 80       # Optimal for completeness
    overlap: 10          # Minimal repetition
    split_by: sentence   # sentence, paragraph, or char
    auto_threshold: 300  # Auto-enable for text > 300 chars

ui:
  show_detected_language: true
  colored_output: true
  show_progress: true

Customization

# Initialize with defaults
translate init

# Force overwrite
translate init --force

# Edit manually
vim ~/.config/translate/config.yaml

🎯 Best Practices

Chunk Size Selection

Text Type	chunk_size	overlap	Reason
Daily conversation	60-80	10-15	Short sentences
Technical docs	80-100	15-20	Term consistency
Literary works	80-100	20-30	Context preservation
Long articles	80-100	10-20	Balance quality & speed

When to Use Chunking

Text Length	Recommendation
< 300 chars	Use `--no-chunk` for speed
300-1000 chars	Auto-chunking (default)
1000-5000 chars	`--chunk-size 80 --overlap 10`
5000+ chars (books)	`--chunk-size 80 --stream`
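
The table can be turned into a small helper that picks CLI flags from the text length. This is only a sketch: `suggest_flags` is a hypothetical name, and because the table's ranges share their endpoints, the choice to put exactly 1000 and exactly 5000 characters in the lower band is an assumption.

```python
def suggest_flags(text: str) -> list[str]:
    """Return the extra `translate` flags recommended for a text of this length."""
    length = len(text)
    if length < 300:
        return ["--no-chunk"]                               # short text: skip chunking
    if length <= 1000:
        return []                                           # rely on auto-chunking
    if length <= 5000:
        return ["--chunk-size", "80", "--overlap", "10"]    # explicit chunking
    return ["--chunk-size", "80", "--stream"]               # book-length: show progress


if __name__ == "__main__":
    for size in (120, 800, 3000, 20000):
        print(size, suggest_flags("x" * size))
```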

Performance Tips

Interactive mode - Model loads once, faster for multiple translations
Batch processing - Use --dir instead of translating files one by one
Streaming - Use --stream for long texts to see progress
Optimal chunks - chunk_size=80, overlap=10 is the sweet spot

📊 Performance

Test Environment: MacBook Pro M2 Max, 96GB, MLX Backend

Text Length	Chunks	Time	Throughput
100 chars	1	1.2s	83 chars/s
400 chars	4	8.5s	47 chars/s
1000 chars	12	~22s	~45 chars/s
5000 chars	60	~110s	~45 chars/s

Memory Usage: 14.15 GB (stable across all text lengths)
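
Throughput in the table is simply length divided by time, so the figures can be rechecked and used for rough planning. The sketch below recomputes the rate from the length and time columns and estimates the duration of a longer job. The 45 chars/s default is taken from the table's steady-state rows; real speed will depend on hardware, model size, and language pair.

```python
# (characters, seconds) pairs copied from the table above
MEASUREMENTS = [(100, 1.2), (400, 8.5), (1000, 22.0), (5000, 110.0)]


def throughput(chars: int, seconds: float) -> float:
    """Characters translated per second."""
    return chars / seconds


def estimate_seconds(chars: int, rate: float = 45.0) -> float:
    """Rough duration for a text of `chars` characters at `rate` chars/s."""
    return chars / rate


if __name__ == "__main__":
    for chars, seconds in MEASUREMENTS:
        print(f"{chars:>5} chars in {seconds:>6.1f}s -> {throughput(chars, seconds):.1f} chars/s")
    # A 20,000-character document at the steady-state rate
    print(f"~{estimate_seconds(20_000) / 60:.1f} min for 20,000 chars")
```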

🛠️ Requirements

macOS (Apple Silicon)

M1/M2/M3/M4 Mac
8GB+ unified memory (4b), 16GB+ (12b), 32GB+ (27b)
macOS 14.0+

Linux / Windows

NVIDIA GPU with 8GB+ VRAM (or CPU with 16GB+ RAM)
CUDA 11.8+ (for GPU)

All Platforms

Python 3.11+

📦 Installation Options

Option 1: uv (Fastest, Recommended)

# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and install
git clone https://github.com/jhkchan/translategemma-cli.git
cd translategemma-cli
uv venv .venv
source .venv/bin/activate

# macOS Apple Silicon
uv pip install -e ".[mlx]"

# Linux/Windows with NVIDIA GPU
uv pip install -e ".[cuda]"

# CPU-only
uv pip install -e ".[cpu]"

Option 2: pipx (Isolated Installation)

# Install from local directory (quote the path so the shell does not expand the brackets)
pipx install "/path/to/translategemma-cli[mlx]"

# Or from git (when published)
pipx install "translategemma-cli[mlx] @ git+https://github.com/jhkchan/translategemma-cli.git"

Option 3: pip (Traditional)

git clone https://github.com/jhkchan/translategemma-cli.git
cd translategemma-cli
python3 -m venv venv
source venv/bin/activate
pip install -e ".[mlx]"  # or [cuda] or [cpu]

🌍 Supported Languages (55)

Code	Language	Code	Language
`en`	English	`yue`	Cantonese
`zh`	Chinese (Simplified)	`zh-TW`	Chinese (Traditional)
`ja`	Japanese	`ko`	Korean
`fr`	French	`de`	German
`es`	Spanish	`pt`	Portuguese
`ru`	Russian	`ar`	Arabic

...and 43 more. Run `translate model langs` for the full list.

🎓 Advanced Usage

Custom Language Pairs

Edit ~/.config/translate/config.yaml:

translation:
  languages: [ja, en]    # Japanese ↔ English
  # or, for Chinese ↔ French:
  # languages: [zh, fr]

Backend Options

# Local (default)
translate --backend mlx  # macOS
translate --backend pytorch  # Linux/Windows

# vLLM (high throughput)
vllm serve google/translategemma-27b-it --quantization awq
translate --backend vllm --server http://localhost:8000

# Ollama (easy setup)
ollama pull translategemma:27b
translate --backend ollama
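
The README does not spell out how `type: auto` picks a backend. One plausible reading, based only on the platform comments above, is sketched here. Treat the rule as an assumption; `resolve_backend` is a hypothetical name, not a function from the package.

```python
import platform
from typing import Optional


def resolve_backend(configured: str = "auto",
                    system: Optional[str] = None,
                    machine: Optional[str] = None) -> str:
    """Pick a backend name; explicit settings always win over detection."""
    if configured != "auto":
        return configured
    system = system or platform.system()      # e.g. "Darwin", "Linux", "Windows"
    machine = machine or platform.machine()   # e.g. "arm64", "x86_64"
    # Assumed rule: MLX on Apple Silicon, PyTorch everywhere else
    if system == "Darwin" and machine == "arm64":
        return "mlx"
    return "pytorch"


if __name__ == "__main__":
    print(resolve_backend())
```

Server backends such as vLLM and Ollama are never auto-selected in this sketch because they depend on an external process being up.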

Interactive Commands

Command	Function
`/to <lang>`	Force target language
`/auto`	Enable auto-detection
`/mode direct`	Direct translation
`/mode explain`	With explanations
`/model <size>`	Switch model
`/backend <type>`	Switch backend
`/langs`	List languages
`/config`	Show configuration
`/quit`	Exit

🔬 Technical Details

Smart Chunking Algorithm

# Sentence-based splitting with sliding window
TextChunker(
    chunk_size=80,       # Target chunk size
    overlap=10,          # Overlap for context
    split_by="sentence"  # Split at sentence boundaries
)

# Process:
# 1. Split text at sentence boundaries
# 2. Group sentences into chunks (~80 chars)
# 3. Add overlap from previous chunk
# 4. Translate each chunk with context
# 5. Merge results by concatenation (the overlap is not deduplicated; see Merge Strategy)
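
Steps 1 to 3 can be sketched as follows. This is an illustrative stand-in for `TextChunker`, whose real implementation is not shown in this README: the regular expression, the function names, and the `(context, body)` return shape are all assumptions. The sentence splitter is deliberately naive (it will split on the dot in "3.14"), and joining sentences with a space is a simplification that does not suit CJK text.

```python
import re

# One or more non-terminator characters followed by any run of terminators
SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]*")


def split_sentences(text: str) -> list[str]:
    """Step 1: split at sentence boundaries."""
    return [s.strip() for s in SENTENCE.findall(text) if s.strip()]


def chunk_sentences(text: str, chunk_size: int = 80, overlap: int = 10) -> list[tuple[str, str]]:
    """Steps 2 and 3: group sentences, then attach the previous chunk's tail."""
    groups: list[str] = []
    current = ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > chunk_size:
            groups.append(current)      # current chunk is full, start a new one
            current = sentence
        else:
            current = candidate         # an oversized single sentence stays whole
    if current:
        groups.append(current)

    chunks = []
    for index, body in enumerate(groups):
        context = groups[index - 1][-overlap:] if index > 0 and overlap > 0 else ""
        chunks.append((context, body))
    return chunks


if __name__ == "__main__":
    for context, body in chunk_sentences("One. Two. Three.", chunk_size=10, overlap=3):
        print(repr(context), repr(body))
```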

Adaptive max_tokens

# Dynamically adjust based on input length
adaptive_max_tokens = min(
    2048,                      # Upper limit
    max(512, len(chunk) * 3)   # 3x input (safety buffer)
)

# Why 3x?
# - Chinese → English typically expands 1.5-2x
# - 3x provides safety buffer
# - Prevents truncation
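
Wrapping the formula in a function makes its behaviour at different chunk lengths easy to check. The function name and parameter names are illustrative; the constants are the ones given above. Note that `len(chunk)` counts characters, so the result is a heuristic token budget rather than an exact conversion.

```python
def adaptive_max_tokens(chunk: str, base: int = 512, factor: int = 3, cap: int = 2048) -> int:
    """Token budget: factor x input length, clamped to the range [base, cap]."""
    return min(cap, max(base, len(chunk) * factor))


if __name__ == "__main__":
    for length in (80, 300, 1000):
        print(length, adaptive_max_tokens("x" * length))
    # 80   -> 512   (3 x 80 = 240, raised to the 512 floor)
    # 300  -> 900   (3 x 300)
    # 1000 -> 2048  (3 x 1000 = 3000, capped)
```

With the default `chunk_size` of 80, the 512 floor always applies; the scaling only takes effect for chunks longer than about 170 characters.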

Merge Strategy

# Simple concatenation (overlap provides context only)
def merge(chunks, translations):
    """Join translated chunks in order; `chunks` is accepted but not used here."""
    # Skip empty results so an empty list or a blank chunk cannot raise or add stray spaces
    parts = [trans.strip() for trans in translations if trans and trans.strip()]
    return " ".join(parts)  # Add space between chunks

# Note: Minimal overlap (10) reduces repetition

📚 Documentation

Document	Description
README.md	Main documentation (this file)
QUICK_REFERENCE.md	Quick reference card
BEST_PRACTICES.md	Usage best practices
LONG_TEXT_FEATURE_REPORT.md	Detailed feature report
FINAL_TEST_REPORT.md	Comprehensive test report
DEVELOPMENT_SUMMARY.md	Development summary
TRANSLATION_TEST_REPORT.md	Multi-language quality assessment

🎯 Use Cases

Use Case 1: Translate a Book

# With streaming for progress feedback
translate --file novel.txt --chunk-size 80 --overlap 10 --stream --output novel_en.txt

Use Case 2: Batch Translate Documentation

# Translate all docs in directory
translate --dir ./docs

# Output to ./docs/translated/

Use Case 3: Quick Translation

# Short text, no chunking
translate --text "Hello world" --no-chunk

# Or use interactive mode
translate
> Hello world
[en→yue] 你好，世界。

Use Case 4: Multi-language Workflow

# English to multiple languages
translate --text "Welcome" --to ja  # Japanese
translate --text "Welcome" --to ko  # Korean
translate --text "Welcome" --to zh  # Chinese
translate --text "Welcome" --to fr  # French
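
The same workflow can be scripted from Python by building the command for each target language. The sketch below uses only the `--text` and `--to` flags shown above and assumes `translate` is on `PATH`. What the CLI writes to stdout (for example, whether it includes a language tag) is not specified here, so `run_translation` returns the output unparsed.

```python
import shlex
import subprocess


def build_command(text: str, target: str) -> list[str]:
    """Argument list for one translation; no shell quoting needed in list form."""
    return ["translate", "--text", text, "--to", target]


def run_translation(text: str, target: str) -> str:
    """Run the CLI and return its raw stdout (raises if the command fails)."""
    result = subprocess.run(
        build_command(text, target),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them
    for target in ("ja", "ko", "zh", "fr"):
        print(shlex.join(build_command("Welcome", target)))
```

Each call starts a new process and reloads the model, so for many short strings the interactive mode is likely to be faster.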

🔧 Development Insights

Key Learnings

TranslateGemma Model Characteristics:
- Truncates long texts (>500 chars)
- Stops at paragraph breaks (empty lines)
- Requires small chunks (80-100 chars) for completeness

Optimal Chunking Strategy:
- chunk_size=80: Best completeness (98%)
- overlap=10: Minimal repetition (<5%)
- split_by=sentence: Natural boundaries

Adaptive max_tokens:
- Fixed 512 tokens insufficient for long chunks
- 3x input length ensures completeness
- Cap at 2048 to prevent over-generation

Merge Strategy:
- Simple concatenation works best
- Overlap provides context, not for deduplication
- Smart deduplication is complex (future work)

Architecture

User Input
    ↓
TextChunker (chunker.py)
    ↓
[Chunk 1] [Chunk 2] [Chunk 3] ...
    ↓         ↓         ↓
Translator.translate_long()
    ↓
Adaptive max_tokens (3x input)
    ↓
MLX/PyTorch/vLLM/Ollama Backend
    ↓
Merge Results
    ↓
Output (Complete Translation)
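
The flow in the diagram can be sketched end to end with a pluggable backend. This is not the package's `Translator` class: `translate_chunks` and the `translate_fn(text, max_tokens)` callback signature are assumptions made for illustration, and the fake backend merely upper-cases its input so the example runs without a model.

```python
from typing import Callable


def translate_chunks(chunks: list[str], translate_fn: Callable[[str, int], str]) -> str:
    """Translate each chunk with an adaptive token budget, then concatenate."""
    translations = []
    for chunk in chunks:
        # Adaptive max_tokens: 3x input length, clamped to [512, 2048]
        max_tokens = min(2048, max(512, len(chunk) * 3))
        translations.append(translate_fn(chunk, max_tokens).strip())
    # Merge: simple concatenation, skipping empty results
    return " ".join(part for part in translations if part)


def fake_backend(text: str, max_tokens: int) -> str:
    """Stand-in for MLX/PyTorch/vLLM/Ollama: returns the text upper-cased."""
    return text.upper()


if __name__ == "__main__":
    print(translate_chunks(["first chunk.", "second chunk."], fake_backend))
    # FIRST CHUNK. SECOND CHUNK.
```

Passing the backend as a callable keeps the chunking and merging logic independent of which engine actually produces the translation.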

🧪 Testing

Run Tests

# Install dev dependencies
pip install -e ".[dev]"

# Run all tests
pytest

# Run with coverage
pytest --cov=translategemma_cli

# Run specific test
pytest tests/test_chunker.py

Manual Testing

# Comprehensive test suite
./tests/comprehensive_test.sh

# Or test individual features
translate --file test.txt --chunk-size 80
translate --dir ./test_docs
translate --text "Test" --stream

📊 Benchmarks

Translation Completeness

Method	Completeness	Speed	Recommendation
No chunking	13%	Fast	❌ Long text fails
chunk=150	70%	Medium	⚠️ Not recommended
chunk=100	95%	Medium	✅ Good
chunk=80	98%	Medium	✅ Best
chunk=60	100%	Slow	⚠️ Over-chunking

Overlap Impact

Overlap	Repetition	Quality	Recommendation
0	0%	Medium	⚠️ No context
10	<5%	High	✅ Best
20	5-10%	High	✅ Good
30	10-15%	Medium	⚠️ Too much
50	20-30%	Low	❌ Not recommended

🎨 Model Selection

Model	Parameters	Disk Size	Memory	Use Case
4b	5B	~3.2 GB	8GB+	Fast translation, limited resources
12b	13B	~7.0 GB	16GB+	Balance performance & quality
27b	29B	~14.8 GB	32GB+	Best quality (recommended)
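
The memory column can be read as a simple lookup. The helper below returns the largest model whose stated minimum fits in the available memory; `largest_model_for` is a hypothetical name, and the thresholds are just the "8GB+ / 16GB+ / 32GB+" figures from the table, not measured requirements.

```python
from typing import Optional

# Minimum memory (GB) per model, from the table above
MODEL_MIN_MEMORY_GB = {"4b": 8, "12b": 16, "27b": 32}


def largest_model_for(memory_gb: float) -> Optional[str]:
    """Largest model whose minimum memory fits, or None if none fits."""
    fitting = [name for name, need in MODEL_MIN_MEMORY_GB.items() if memory_gb >= need]
    if not fitting:
        return None
    return max(fitting, key=lambda name: MODEL_MIN_MEMORY_GB[name])


if __name__ == "__main__":
    for memory in (4, 8, 24, 96):
        print(f"{memory} GB -> {largest_model_for(memory)}")
```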

🌟 What's New in v0.2.0

Major Features

✅ Smart Text Chunking - Handle unlimited text length
✅ Sliding Window - Preserve context with overlap
✅ Streaming Output - Real-time translation progress
✅ Batch Translation - Process entire directories
✅ Adaptive max_tokens - Prevent truncation
✅ Progress Display - Visual feedback with rich

New CLI Parameters

--chunk-size <int>    # Chunk size (default: 80)
--overlap <int>       # Overlap size (default: 10)
--no-chunk            # Disable chunking
--stream              # Enable streaming
--dir <path>          # Batch translate directory

Performance Improvements

Translation completeness: 13% → 98% (for long texts)
Throughput: Stable 45-50 chars/sec
Memory: Unchanged (14.15 GB)

🐛 Known Limitations

1. Model Behavior

Paragraph breaks: Model stops at empty lines
- Solution: Use small chunks (80 chars)

Long chunks: Truncates if chunk > 150 chars
- Solution: Adaptive max_tokens (3x input)
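
To see why empty lines matter, consider a pre-processing step that never sends a paragraph break to the model. The sketch below splits on blank lines, translates each paragraph separately, and restores the breaks afterwards. It is an alternative illustration, not the project's approach (which relies on small sentence-based chunks); both function names are hypothetical and `translate_fn` stands in for any backend call.

```python
import re
from typing import Callable


def split_paragraphs(text: str) -> list[str]:
    """Split on one or more blank lines, dropping empty paragraphs."""
    return [part.strip() for part in re.split(r"\n\s*\n", text) if part.strip()]


def translate_by_paragraph(text: str, translate_fn: Callable[[str], str]) -> str:
    """Translate paragraphs one at a time so no request contains an empty line."""
    return "\n\n".join(translate_fn(paragraph) for paragraph in split_paragraphs(text))


if __name__ == "__main__":
    sample = "first paragraph\n\nsecond paragraph"
    # str.upper stands in for a real translation call
    print(translate_by_paragraph(sample, str.upper))
```

Paragraphs longer than the chunk size would still need sentence-level chunking on top of this.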

2. Overlap Repetition

Issue: overlap > 10 causes slight repetition
Reason: Overlapped region translated twice
Recommendation: Use overlap=10-20

3. Not Yet Implemented

Smart deduplication (planned for v0.3.0)
Translation cache (planned for v0.3.0)
Resume capability (planned for v0.4.0)
Terminology support (under evaluation)

🤝 Contributing

Contributions welcome! Please:

Fork the repository
Create feature branch (git checkout -b feature/AmazingFeature)
Commit changes (git commit -m 'Add AmazingFeature')
Push to branch (git push origin feature/AmazingFeature)
Open Pull Request

📄 License

This project is licensed under the MIT License; see the LICENSE file.

Note: TranslateGemma models are subject to Google's model license terms. Please review and comply with the model license.

🙏 Acknowledgments

Google TranslateGemma - Base translation model
MLX - Apple Silicon optimization
Cursor + Claude - Development tools
hy-mt - Inspiration for chunking strategy

🔗 Links

GitHub: https://github.com/jhkchan/translategemma-cli
HuggingFace: https://huggingface.co/collections/google/translategemma
Issues: https://github.com/jhkchan/translategemma-cli/issues
Documentation: See docs directory

📞 Support

Issues: GitHub Issues
Discussions: GitHub Discussions

🗺️ Roadmap

v0.3.0 (Next)

Smart deduplication algorithm
Translation cache system
Improved language detection
Terminology support

v0.4.0 (Future)

Resume capability
Parallel translation (multi-GPU)
Web UI
REST API server

Version: 0.2.0
Last Updated: 2026-01-17
Status: Production Ready ✅
