Translate DOCX files while preserving formatting.
Project description
WordFlux ๐
Translate DOCX documents with perfect formatting
WordFlux is a powerful and intelligent tool for translating Microsoft Word documents (.docx) from one language to another while preserving the original formatting, structure, and layout completely.
โจ Key Features
๐ง Comprehensive Translation
- โ Regular text paragraphs - Translate while preserving formatting (bold, italic, underline, superscript, subscript)
- โ Tables - Translate content in table cells
- โ Charts - Translate titles and data labels
- โ SmartArt - Translate text in SmartArt diagrams
- โ Complex formatting - Preserve all text formatting, colors, fonts
โก High Performance
- ๐ Parallel processing - Use async/await to translate multiple segments simultaneously
- ๐ฆ Smart chunking - Automatically split content to optimize API calls
- ๐ฏ Concurrent requests - Support up to 100 concurrent requests (configurable)
- ๐พ Checkpoint system - Save progress to resume if interrupted
๐ก๏ธ Reliable
- ๐ Retry mechanism - Automatically retry on errors
- ๐ Progress tracking - Track progress with progress bars
- ๐จ Format preservation - Maintain original formatting completely
- ๐ Error handling - Handle errors intelligently and user-friendly
๐ Installation
System Requirements
- Python 3.12+
- OpenAI API key
Install from source
# Clone repository
git clone https://github.com/pnnbao97/wordflux.git
cd wordflux
# Install dependencies
pip install -e .
Manual dependency installation
pip install openai>=2.3.0 python-docx>=1.2.0 pyyaml>=6.0.3 tqdm>=4.67.1
โ๏ธ Configuration
Create a config.yaml file in the root directory:
# OpenAI Configuration
openai_api_key: "sk-your-openai-api-key-here" # Replace with your API key
model: "gpt-4o-mini" # Can use gpt-4, gpt-3.5-turbo, etc.
# Translation Settings
source_lang: "English"
target_lang: "Vietnamese"
# Performance Settings
max_concurrent: 100 # Maximum concurrent requests
max_chunk_size: 5000 # Maximum chunk size (characters)
Supported OpenAI Models
gpt-4o-mini(default, cost-effective)gpt-4ogpt-4gpt-3.5-turbo- And other OpenAI models
๐ Usage
1. Command Line Usage
# Translate DOCX file
python main.py input_file.docx
# Specify output directory
python main.py input_file.docx --output_dir ./my_output
# Concrete example
python main.py document.docx --output_dir ./translated_docs
2. Use as Python Module
from wordflux import DocxTranslator
# Initialize translator
translator = DocxTranslator(
input_file="document.docx",
output_dir="output",
openai_api_key="your-api-key",
model="gpt-4o-mini",
source_lang="English",
target_lang="Vietnamese",
max_chunk_size=5000,
max_concurrent=100
)
# Perform translation
translator.translate()
# Get translated file path
output_path = translator.get_output_path()
print(f"Translated file: {output_path}")
3. Step-by-step Usage
from wordflux import DocxTranslator
translator = DocxTranslator("document.docx", "output", "your-api-key")
# Step 1: Extract content
translator.extract()
# Step 2: Translate content
translator.translator.translate()
# Step 3: Inject translations into file
translator.inject()
๐ง Advanced Configuration
Performance Tuning
# config.yaml
max_concurrent: 50 # Reduce if encountering rate limit errors
max_chunk_size: 3000 # Reduce for complex documents
Change Languages
source_lang: "English"
target_lang: "French" # Or "Spanish", "German", "Chinese", etc.
Use Different Models
model: "gpt-4o" # More powerful model, more expensive
# or
model: "gpt-3.5-turbo" # Faster model, cheaper
๐ Project Structure
wordflux/
โโโ ๐ main.py # Entry point
โโโ โ๏ธ config.yaml # Configuration
โโโ ๐ pyproject.toml # Project metadata
โโโ ๐ README.md # This documentation
โโโ ๐๏ธ output/ # Output directory for translated files
โ โโโ document_translated.docx
โ โโโ document_checkpoint.json
โโโ ๐ฆ wordflux/ # Main package
โโโ ๐ __init__.py
โโโ ๐ง docxtranslator.py # Main class
โโโ ๐ document/ # Data models
โ โโโ document.py
โโโ ๐จ worker/ # Core workers
โ โโโ extractor.py # Extract content
โ โโโ translator.py # Translate content
โ โโโ injector.py # Inject translations
โโโ ๐ ๏ธ utils/ # Utilities
โโโ decorator.py # Decorators (timer, retry, etc.)
โโโ is_numeric.py # Helper functions
โโโ openai_client.py # OpenAI client manager
โโโ prompt_builder.py # Build prompts
โโโ spinner.py # Loading spinner
๐ฏ Usage Examples
Simple Document Translation
# Translate document.docx from English to Vietnamese
python main.py document.docx
๐จ Error Handling
API Key Error
โ Translation failed: OpenAI API key not found in config
Solution: Check config.yaml file and ensure API key is correct.
Rate Limit Error
โ Translation failed: Rate limit exceeded
Solution: Reduce max_concurrent in config.yaml from 100 to 50 or 25.
File Not Found Error
โ Translation failed: [Errno 2] No such file or directory: 'document.docx'
Solution: Check input file path.
๐ก Tips and Tricks
1. Cost Optimization
- Use
gpt-4o-miniinstead ofgpt-4ofor simple documents - Adjust
max_chunk_sizeaccording to content
2. Speed Optimization
- Increase
max_concurrentif you have high API quota - Use SSD for temporary file storage
3. Large Document Handling
- Split large documents into smaller files
- Use checkpoint system to resume if interrupted
4. Quality Control
- Always review translations before use
- Adjust prompts if necessary
๐ค Contributing
We welcome contributions! Please:
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Create a Pull Request
๐ License
This project is distributed under the MIT License. See the LICENSE file for more information.
๐จโ๐ป Author
Pham Nguyen Ngoc Bao
- ๐ง Email: pnnbao@gmail.com
- ๐ GitHub: @pnnbao97
- ๐ Facebook: pnnbao
๐ Acknowledgments
- OpenAI API for powerful translation capabilities
- python-docx library for DOCX file processing
- Python community for supporting libraries
๐ Support
If you encounter issues or have questions:
- ๐ Read this documentation carefully
- ๐ Check Issues
- ๐ Create a new issue if no solution exists
- ๐ง Contact directly: pnnbao@gmail.com
WordFlux - Smart document translation with perfect formatting preservation! ๐โจ
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file wordflux-0.1.0.tar.gz.
File metadata
- Download URL: wordflux-0.1.0.tar.gz
- Upload date:
- Size: 42.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
62adcc9db0a38838c0517a3c4c0768d9f01fc0a42a7504e1d479f6ec2a55d0b1
|
|
| MD5 |
a27908c55cf13804669892e1d3134499
|
|
| BLAKE2b-256 |
5193edcbcf31a0e1ed9a51292e3671c643f0c94e0c87342e3bf4c08022f7b659
|
File details
Details for the file wordflux-0.1.0-py3-none-any.whl.
File metadata
- Download URL: wordflux-0.1.0-py3-none-any.whl
- Upload date:
- Size: 19.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
53a0a566ae2b6da61ab8682bf763dc6d20c0b373fa002a56980096f841b88045
|
|
| MD5 |
d3325d281e83b09873d1d32d0918214f
|
|
| BLAKE2b-256 |
439ac1844ac67c4fc87e51217f7dee426c76b84845f60137c32e210aa57aea5a
|