Translate DOCX files while preserving formatting.

WordFlux 🌀

Translate DOCX documents with perfect formatting

WordFlux translates Microsoft Word documents (.docx) from one language to another while preserving the original formatting, structure, and layout.

✨ Key Features

🔧 Comprehensive Translation

  • ✅ Regular text paragraphs - Translate while preserving formatting (bold, italic, underline, superscript, subscript)
  • ✅ Tables - Translate content in table cells
  • ✅ Charts - Translate titles and data labels
  • ✅ SmartArt - Translate text in SmartArt diagrams
  • ✅ Complex formatting - Preserve all text formatting, colors, fonts

⚡ High Performance

  • 🚀 Parallel processing - Use async/await to translate multiple segments simultaneously
  • 📦 Smart chunking - Automatically split content to optimize API calls
  • 🎯 Concurrent requests - Support up to 100 concurrent requests (configurable)
  • 💾 Checkpoint system - Save progress to resume if interrupted
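
As a rough illustration of how async/await and a configurable concurrency limit can work together, here is a minimal sketch using asyncio.Semaphore. It is not WordFlux's actual implementation: translate_segment is a hypothetical stand-in for a real API call, and translate_all is an illustrative name.

```python
import asyncio


async def translate_segment(text: str) -> str:
    """Hypothetical stand-in for one API call; a real version would call the model."""
    await asyncio.sleep(0.01)  # simulate network latency
    return text.upper()


async def translate_all(segments: list[str], max_concurrent: int = 100) -> list[str]:
    """Translate all segments concurrently, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(text: str) -> str:
        # Each task waits here until one of the max_concurrent slots is free.
        async with semaphore:
            return await translate_segment(text)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(s) for s in segments))


if __name__ == "__main__":
    print(asyncio.run(translate_all(["hello", "world"], max_concurrent=2)))
```

Lowering max_concurrent trades throughput for fewer simultaneous requests, which is the same lever the max_concurrent setting exposes.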

๐Ÿ›ก๏ธ Reliable

  • ๐Ÿ”„ Retry mechanism - Automatically retry on errors
  • ๐Ÿ“Š Progress tracking - Track progress with progress bars
  • ๐ŸŽจ Format preservation - Maintain original formatting completely
  • ๐Ÿ” Error handling - Handle errors intelligently and user-friendly

🚀 Installation

System Requirements

  • Python 3.12+
  • OpenAI API key

Install from source

# Clone repository
git clone https://github.com/pnnbao97/wordflux.git
cd wordflux

# Install dependencies
pip install -e .

Manual dependency installation

# Quote each requirement so the shell does not treat ">" as output redirection
pip install "openai>=2.3.0" "python-docx>=1.2.0" "pyyaml>=6.0.3" "tqdm>=4.67.1"

โš™๏ธ Configuration

Create a config.yaml file in the root directory:

# OpenAI Configuration
openai_api_key: "sk-your-openai-api-key-here"  # Replace with your API key
model: "gpt-4o-mini"  # Can use gpt-4, gpt-3.5-turbo, etc.

# Translation Settings
source_lang: "English"
target_lang: "Vietnamese"

# Performance Settings
max_concurrent: 100      # Maximum concurrent requests
max_chunk_size: 5000     # Maximum chunk size (characters)
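
It can help to validate these settings before starting a long run. The following is a minimal sketch, not part of WordFlux: the Settings class and settings_from_dict helper are hypothetical, and the input mapping is assumed to be what a YAML parser such as yaml.safe_load would return for the file above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Settings:
    """Hypothetical typed view of config.yaml; defaults mirror the example above."""
    openai_api_key: str = ""
    model: str = "gpt-4o-mini"
    source_lang: str = "English"
    target_lang: str = "Vietnamese"
    max_concurrent: int = 100
    max_chunk_size: int = 5000

    def __post_init__(self):
        if not self.openai_api_key or "your-openai-api-key" in self.openai_api_key:
            raise ValueError("openai_api_key is missing or still the placeholder")
        if self.max_concurrent < 1:
            raise ValueError("max_concurrent must be at least 1")
        if self.max_chunk_size < 1:
            raise ValueError("max_chunk_size must be at least 1")


def settings_from_dict(raw: dict) -> Settings:
    """Build Settings from a parsed config mapping, rejecting unknown keys."""
    known = {f.name for f in fields(Settings)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return Settings(**raw)
```

Rejecting unknown keys catches typos such as max_concurent early, instead of silently falling back to a default.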

Supported OpenAI Models

  • gpt-4o-mini (default, cost-effective)
  • gpt-4o
  • gpt-4
  • gpt-3.5-turbo
  • And other OpenAI models

📖 Usage

1. Command Line Usage

# Translate DOCX file
python main.py input_file.docx

# Specify output directory
python main.py input_file.docx --output_dir ./my_output

# Concrete example
python main.py document.docx --output_dir ./translated_docs

2. Use as Python Module

from wordflux import DocxTranslator

# Initialize translator
translator = DocxTranslator(
    input_file="document.docx",
    output_dir="output",
    openai_api_key="your-api-key",
    model="gpt-4o-mini",
    source_lang="English",
    target_lang="Vietnamese",
    max_chunk_size=5000,
    max_concurrent=100
)

# Perform translation
translator.translate()

# Get translated file path
output_path = translator.get_output_path()
print(f"Translated file: {output_path}")

3. Step-by-step Usage

from wordflux import DocxTranslator

translator = DocxTranslator("document.docx", "output", "your-api-key")

# Step 1: Extract content
translator.extract()

# Step 2: Translate content
translator.translator.translate()

# Step 3: Inject translations into file
translator.inject()

🔧 Advanced Configuration

Performance Tuning

# config.yaml
max_concurrent: 50      # Reduce if encountering rate limit errors
max_chunk_size: 3000    # Reduce for complex documents
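
To make the effect of max_chunk_size concrete, here is a minimal sketch of greedy chunking by character count. It is an assumption about how such a limit could be applied, not WordFlux's own algorithm; chunk_segments is a hypothetical helper.

```python
def chunk_segments(segments: list[str], max_chunk_size: int = 5000) -> list[list[str]]:
    """Greedily pack segments into chunks whose total length stays within max_chunk_size.

    A single segment longer than the limit gets a chunk of its own rather than being split.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    current_len = 0
    for seg in segments:
        # Close the current chunk if adding this segment would exceed the limit.
        if current and current_len + len(seg) > max_chunk_size:
            chunks.append(current)
            current, current_len = [], 0
        current.append(seg)
        current_len += len(seg)
    if current:
        chunks.append(current)
    return chunks
```

A smaller limit produces more, shorter requests, which can make failures cheaper to retry at the cost of more API calls.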

Change Languages

source_lang: "English"
target_lang: "French"   # Or "Spanish", "German", "Chinese", etc.

Use Different Models

model: "gpt-4o"         # More powerful model, more expensive
# or
model: "gpt-3.5-turbo"  # Faster model, cheaper

๐Ÿ“ Project Structure

wordflux/
โ”œโ”€โ”€ ๐Ÿ“„ main.py                 # Entry point
โ”œโ”€โ”€ โš™๏ธ config.yaml            # Configuration
โ”œโ”€โ”€ ๐Ÿ“‹ pyproject.toml         # Project metadata
โ”œโ”€โ”€ ๐Ÿ“– README.md              # This documentation
โ”œโ”€โ”€ ๐Ÿ—‚๏ธ output/               # Output directory for translated files
โ”‚   โ”œโ”€โ”€ document_translated.docx
โ”‚   โ””โ”€โ”€ document_checkpoint.json
โ””โ”€โ”€ ๐Ÿ“ฆ wordflux/              # Main package
    โ”œโ”€โ”€ ๐Ÿ“„ __init__.py
    โ”œโ”€โ”€ ๐Ÿ”ง docxtranslator.py  # Main class
    โ”œโ”€โ”€ ๐Ÿ“„ document/          # Data models
    โ”‚   โ””โ”€โ”€ document.py
    โ”œโ”€โ”€ ๐Ÿ”จ worker/            # Core workers
    โ”‚   โ”œโ”€โ”€ extractor.py      # Extract content
    โ”‚   โ”œโ”€โ”€ translator.py     # Translate content
    โ”‚   โ””โ”€โ”€ injector.py       # Inject translations
    โ””โ”€โ”€ ๐Ÿ› ๏ธ utils/             # Utilities
        โ”œโ”€โ”€ decorator.py      # Decorators (timer, retry, etc.)
        โ”œโ”€โ”€ is_numeric.py     # Helper functions
        โ”œโ”€โ”€ openai_client.py  # OpenAI client manager
        โ”œโ”€โ”€ prompt_builder.py # Build prompts
        โ””โ”€โ”€ spinner.py        # Loading spinner

🎯 Usage Examples

Simple Document Translation

# Translate document.docx from English to Vietnamese
python main.py document.docx

🚨 Error Handling

API Key Error

โŒ Translation failed: OpenAI API key not found in config

Solution: Check config.yaml file and ensure API key is correct.

Rate Limit Error

โŒ Translation failed: Rate limit exceeded

Solution: Reduce max_concurrent in config.yaml from 100 to 50 or 25.

File Not Found Error

โŒ Translation failed: [Errno 2] No such file or directory: 'document.docx'

Solution: Check input file path.
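
When calling WordFlux from your own script, two of the failures above (missing file, missing key) can be caught before any API request is made. The sketch below is one way to do that; preflight is a hypothetical helper and is not part of the package.

```python
from pathlib import Path


def preflight(input_file: str, api_key: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means nothing was found."""
    problems = []
    path = Path(input_file)
    if not path.is_file():
        problems.append(f"input file not found: {path}")
    elif path.suffix.lower() != ".docx":
        problems.append(f"expected a .docx file, got: {path.name}")
    if not api_key.strip():
        problems.append("OpenAI API key is empty")
    return problems


if __name__ == "__main__":
    for problem in preflight("document.docx", api_key=""):
        print(problem)
```

Collecting every problem in one pass, rather than stopping at the first, lets the user fix the path and the key together.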

💡 Tips and Tricks

1. Cost Optimization

  • Use gpt-4o-mini instead of gpt-4o for simple documents
  • Adjust max_chunk_size according to content

2. Speed Optimization

  • Increase max_concurrent if you have high API quota
  • Use SSD for temporary file storage

3. Large Document Handling

  • Split large documents into smaller files
  • Use checkpoint system to resume if interrupted
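
For intuition about how a checkpoint lets an interrupted run resume, here is a minimal sketch that stores finished translations in a JSON file and skips them on the next run. The file layout (segment ID mapped to translated text) and all function names are assumptions for illustration; the actual format of document_checkpoint.json may differ.

```python
import json
import os
import tempfile
from pathlib import Path


def load_checkpoint(path: Path) -> dict[str, str]:
    """Return previously saved translations keyed by segment ID, or {} if none exist."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def save_checkpoint(path: Path, done: dict[str, str]) -> None:
    """Write atomically: dump to a temp file, then rename it over the target."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(done, fh, ensure_ascii=False)
    os.replace(tmp, path)


def translate_with_checkpoint(segments: dict[str, str], path: Path, translate) -> dict[str, str]:
    """Translate only segments missing from the checkpoint, saving after each one."""
    done = load_checkpoint(path)
    for seg_id, text in segments.items():
        if seg_id not in done:
            done[seg_id] = translate(text)
            save_checkpoint(path, done)
    return done
```

Writing through a temporary file means an interruption in the middle of a save leaves the previous checkpoint intact.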

4. Quality Control

  • Always review translations before use
  • Adjust prompts if necessary

๐Ÿค Contributing

We welcome contributions! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Create a Pull Request

📄 License

This project is distributed under the MIT License. See the LICENSE file for more information.

๐Ÿ‘จโ€๐Ÿ’ป Author

Pham Nguyen Ngoc Bao

๐Ÿ™ Acknowledgments

  • OpenAI API for powerful translation capabilities
  • python-docx library for DOCX file processing
  • Python community for supporting libraries

📞 Support

If you encounter issues or have questions:

  1. 📖 Read this documentation carefully
  2. 🔍 Check Issues
  3. 🆕 Create a new issue if no solution exists
  4. 📧 Contact directly: pnnbao@gmail.com

WordFlux - Smart document translation with perfect formatting preservation! 🌀✨
