A Word document translation SDK with a multi-model workflow.

Word Document Translation SDK

A Python SDK for translating Word documents (.docx) using a multi-model workflow (Translate -> Evaluate -> Optimize -> Select).

Features

Multi-Model Workflow:
- Translation (Model A): Produces the initial translation.
- Evaluation (Model B): Scores each translation on Accuracy, Fluency, Consistency, Terminology, and Completeness.
- Optimization (Model C): Revises the initial translation using the suggestions from the evaluation.
- Selection: Automatically keeps whichever translation (initial or optimized) scores higher.
Format Preservation: Preserves paragraph styles, tables, and formulas.
Strict Language Enforcement: Prevents accidental translation to English and ensures strict adherence to the target language.
Model Number Preservation: Automatically detects and preserves alphanumeric codes and model numbers (e.g., "STR-1650").
Comprehensive Reporting: Generates Excel and PDF reports with detailed evaluation metrics.
Bilingual Output: Generates a bilingual document (Original + Translation) with preserved indentation and formatting.
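
The SDK's actual mechanism for Model Number Preservation is not documented here. As a minimal sketch of one common approach, codes can be swapped for placeholders before the text is sent to a model and restored afterwards. The regular expression, the `__CODE_n__` placeholder format, and both function names are assumptions made for illustration, not part of docu-fluent.

```python
import re

# Assumed pattern (not the SDK's): two or more capitals, an optional hyphen,
# then digits, optionally followed by more capitals, digits, or hyphens.
CODE_PATTERN = re.compile(r"\b[A-Z]{2,}-?\d+[A-Z0-9-]*\b")
PLACEHOLDER_PATTERN = re.compile(r"__CODE_(\d+)__")


def protect_codes(text):
    """Replace each detected code with a numbered placeholder.

    Returns the masked text and the list of original codes, in order.
    """
    codes = []

    def _swap(match):
        codes.append(match.group(0))
        return f"__CODE_{len(codes) - 1}__"

    return CODE_PATTERN.sub(_swap, text), codes


def restore_codes(text, codes):
    """Put the original codes back in place of their placeholders."""
    return PLACEHOLDER_PATTERN.sub(lambda m: codes[int(m.group(1))], text)
```

A round trip would mask the source segment with `protect_codes`, translate the masked text, and then call `restore_codes` on the result, so the model never gets the chance to rewrite "STR-1650".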

Workflow Overview

graph TD
    Start["Start: Input Document"] --> Extract["Extract Segments"]
    Extract --> Stage1["Stage 1: Initial Translation (Model A)"]
    Stage1 --> CheckSimple{"Is Simple Segment?"}
    CheckSimple -- Yes --> Skip["Skip Translation"]
    CheckSimple -- No --> CheckCache{"In Cache?"}
    CheckCache -- Yes --> UseCache["Use Cached Translation"]
    CheckCache -- No --> Translate["Call Model A"]
    Translate --> Stage1_5["Stage 1.5: Repair Checks"]
    UseCache --> Stage1_5
    
    Stage1_5 --> CheckFail{"Translation == Original?"}
    CheckFail -- Yes --> Repair["Retry Translation"]
    CheckFail -- No --> Stage2
    Repair --> Stage2["Stage 2: Evaluation 1 (Model B)"]
    Skip --> Stage2
    
    Stage2 --> EvalA["Evaluate Model A"]
    EvalA --> Stage3["Stage 3: Optimization (Model C)"]
    
    Stage3 --> CheckScore{"Score >= 9.5?"}
    CheckScore -- Yes --> SkipOpt["Skip Optimization"]
    CheckScore -- No --> Optimize["Call Model C with Suggestions"]
    
    SkipOpt --> Stage4
    Optimize --> Stage4["Stage 4: Comparative Evaluation (Model B)"]
    
    Stage4 --> EvalComp["Evaluate A vs C"]
    EvalComp --> Stage5["Stage 5: Selection"]
    
    Stage5 --> Select{"Score C > Score A?"}
    Select -- Yes --> FinalC["Select Model C"]
    Select -- No --> FinalA["Select Model A"]
    
    FinalC --> Output["Generate Documents & Reports"]
    FinalA --> Output
    
    style Start fill:#f9f,stroke:#333,stroke-width:2px
    style Output fill:#f9f,stroke:#333,stroke-width:2px
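
The two score-based decisions in the diagram (Stage 3 and Stage 5) can be sketched in a few lines of Python. This is an illustration of the flow as drawn, not the SDK's source: the function names are hypothetical, and the only values taken from the diagram are the 9.5 threshold and the strict "Score C > Score A" comparison.

```python
# Threshold taken from the "Score >= 9.5?" node in the diagram.
SKIP_OPTIMIZATION_THRESHOLD = 9.5


def needs_optimization(score_a):
    """Stage 3: optimize only when Model A's score is below the threshold."""
    return score_a < SKIP_OPTIMIZATION_THRESHOLD


def select_translation(translation_a, score_a, translation_c=None, score_c=None):
    """Stage 5: pick Model C only if it exists and scores strictly higher.

    Returns a (label, text) tuple. A tie keeps Model A, matching the
    "Score C > Score A?" node.
    """
    if translation_c is not None and score_c is not None and score_c > score_a:
        return "C", translation_c
    return "A", translation_a
```

When optimization is skipped there is no Model C candidate, so `select_translation` falls through to Model A.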

Installation

You can install the SDK directly using pip or uv:

# Using pip
pip install docu-fluent
 
# Using uv
uv pip install docu-fluent

Alternatively, for development:

# Clone the repository
git clone <repository-url>
cd translation-sdk
 
# Install dependencies
uv sync

Usage

Command Line Interface (CLI)

You can use the SDK directly from the command line.

# Set up your API key (if using OpenAI)
export OPENAI_API_KEY="your-api-key"

# Run translation with Azure OpenAI
uv run python -m docu_fluent input.docx \
    --output-dir output \
    --provider azure \
    --base-url https://your-resource.openai.azure.com/ \
    --api-key your-azure-key \
    --api-version 2023-05-15 \
    --model-translation gpt-35-turbo-deployment \
    --model-evaluation gpt-4-deployment \
    --model-optimization gpt-4-deployment \
    --source-lang auto \
    --target-lang "French"

Arguments:

input_file: Path to the .docx file to translate.
--output-dir: Directory to save the output files (default: output).
--provider: LLM provider to use (openai, azure, or mock). Default is mock.
--api-key: API key for the provider.
--base-url: Base URL (OpenAI) or Azure Endpoint (Azure).
--api-version: API Version (Azure only, e.g., 2023-05-15).
--model-translation: Model/Deployment for translation.
--model-evaluation: Model/Deployment for evaluation.
--model-optimization: Model/Deployment for optimization.
--source-lang: Source language (default: auto).
--target-lang: Target language (default: Chinese).
--config: Path to a JSON configuration file (e.g., model_config.json). If provided, model arguments are ignored.
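
When the CLI is driven from another Python program, the argument list can be assembled programmatically. The helper below is a hypothetical sketch: `build_cli_command` is not part of the SDK, and it assumes docu-fluent is installed in the current interpreter's environment (the examples above use `uv run python` instead). It only maps keyword names onto the flags documented above.

```python
import sys


def build_cli_command(input_file, **options):
    """Build an argv list for `python -m docu_fluent`.

    Keyword names become flags by replacing underscores with hyphens,
    e.g. target_lang="French" becomes `--target-lang French`.
    Options set to None are left out.
    """
    command = [sys.executable, "-m", "docu_fluent", str(input_file)]
    for name, value in options.items():
        if value is None:
            continue
        command += [f"--{name.replace('_', '-')}", str(value)]
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)`. The helper does not validate flag names, so a misspelled keyword is only caught when the CLI itself rejects it.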

Quick Start with Configuration File

For easier usage, define your models in a model_config.json file and pass it to the CLI with --config.

Configure Models: Create or edit model_config.json:

{
    "translation_config": {
        "provider": "openai",
        "api_key": "sk-...",
        "base_url": "https://api.openai.com/v1",
        "model": "gpt-3.5-turbo"
    },
    "evaluation_config": {
        "provider": "openai",
        "api_key": "sk-...",
        "base_url": "https://api.openai.com/v1",
        "model": "gpt-4"
    },
    "optimization_config": {
        "provider": "openai",
        "api_key": "sk-...",
        "base_url": "https://api.openai.com/v1",
        "model": "gpt-4"
    },
    "concurrency_config": {
        "translation": 32,
        "evaluation_1": 32,
        "optimization": 32,
        "evaluation_2": 32
    }
}

Run Translation:

uv run python -m docu_fluent input.docx --config model_config.json --target-lang "Chinese"
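
To keep API keys out of a hand-edited file, the configuration can also be generated. The sketch below reproduces the four top-level keys shown above and reads the key from the `OPENAI_API_KEY` environment variable. The helper names, and the choice to reuse one model for both evaluation and optimization, are assumptions for illustration.

```python
import json
import os
from pathlib import Path


def build_model_config(translation_model, review_model, workers=32):
    """Return a dict with the same layout as the model_config.json example."""

    def stage(model):
        # Assumption: every stage uses the OpenAI provider and the same key.
        return {
            "provider": "openai",
            "api_key": os.environ.get("OPENAI_API_KEY", ""),
            "base_url": "https://api.openai.com/v1",
            "model": model,
        }

    return {
        "translation_config": stage(translation_model),
        "evaluation_config": stage(review_model),
        "optimization_config": stage(review_model),
        "concurrency_config": {
            "translation": workers,
            "evaluation_1": workers,
            "optimization": workers,
            "evaluation_2": workers,
        },
    }


def write_model_config(path, config):
    """Write the configuration as indented JSON."""
    Path(path).write_text(json.dumps(config, indent=4), encoding="utf-8")
```

For example, `write_model_config("model_config.json", build_model_config("gpt-3.5-turbo", "gpt-4"))` would produce a file equivalent to the one above, with the key taken from the environment.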

Web Interface (GUI)

You can launch a web interface to run translations without passing command-line arguments.

Launch Command:

# Using the test script
uv run python translate.py --gui

# OR using the main SDK module (if installed)
uv run python -m docu_fluent --gui

Features:

File Upload: Drag and drop .docx files.
Language Selection: Select source (auto-detect supported) and target languages from a dropdown.
Settings (JSON): Directly edit the model_config.json content in the "Settings" tab to configure:
- Models: Provider, API Key, Base URL, and Model Name for each stage (Translation, Evaluation, Optimization).
- Concurrency: Number of parallel workers for each stage.
Progress Tracking: Real-time progress bars for translation, evaluation, and optimization stages.
Downloads: Download all generated files (Translated Doc, Bilingual Doc, Reports, JSON Logs) directly from the browser.

Python SDK

You can also use the SDK in your Python code.

from docu_fluent.sdk import TranslationSDK

# Initialize SDK with specific configurations for each model
# This allows using different providers/models for each step
sdk = TranslationSDK(
    translation_config={
        "provider": "openai",
        "api_key": "key-for-provider-1",
        "base_url": "https://api.provider1.com/v1",
        "model": "model-name-1"
    },
    evaluation_config={
        "provider": "openai",
        "api_key": "key-for-provider-2",
        "base_url": "https://api.provider2.com/v1",
        "model": "model-name-2"
    },
    optimization_config={
        "provider": "azure",
        "api_key": "azure-key",
        "base_url": "https://your-resource.openai.azure.com/",
        "api_version": "2023-05-15",
        "model": "gpt-4-deployment"
    }
)

# Translate a document
sdk.translate_document(
    "path/to/document.docx", 
    output_dir="output",
    source_lang="English",
    target_lang="Spanish"
)
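
To translate several documents with one SDK instance, `translate_document` can be called in a loop. The helper below is a hypothetical sketch, not part of the SDK. It relies only on the `translate_document` call shown above, and it assumes that `"auto"` is accepted as `source_lang` in the Python API as it is in the CLI.

```python
from pathlib import Path


def translate_folder(sdk, input_dir, output_dir, source_lang="auto", target_lang="Chinese"):
    """Call sdk.translate_document for every .docx file directly inside input_dir.

    Returns the names of the files that were submitted, in sorted order.
    """
    translated = []
    for path in sorted(Path(input_dir).glob("*.docx")):
        if path.name.startswith("~$"):
            # Skip the lock files Word creates while a document is open.
            continue
        sdk.translate_document(
            str(path),
            output_dir=str(output_dir),
            source_lang=source_lang,
            target_lang=target_lang,
        )
        translated.append(path.name)
    return translated
```

With the `sdk` object created above, `translate_folder(sdk, "docs", "output", target_lang="Spanish")` would process each document in turn. Errors are not caught, so one failing document stops the batch.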

Regenerating Documents

If you need to make manual corrections, you can edit the generated Excel report ({filename}_report.xlsx) and then regenerate the documents.

Open the Excel report.
(Optional) Add a column named final_translation with your corrected text.
Run the regeneration script:

uv run python regenerate_docs.py \
    --input-docx "path/to/original.docx" \
    --input-excel "path/to/edited_report.xlsx" \
    --output-dir "output_folder"

This will generate:

{filename}_regenerated_translated.docx
{filename}_regenerated_bilingual.docx

Output Files

The SDK generates the following files in the output directory:

{filename}_translated.docx: The fully translated document.
{filename}_bilingual.docx: A document with both original and translated text.
{filename}_report.xlsx: An Excel file containing detailed scores for each segment across 5 dimensions.
{filename}_report.pdf: A PDF summary of the translation quality.
{filename}_usage.json: Token usage statistics for the translation task.
{filename}_model_mapping.json: Mapping of model aliases (A, B, C) to actual model names used.
{filename}_results.json: Full detailed results including all intermediate steps and raw LLM responses.
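
A script that runs the SDK can check that every listed file was produced. The sketch below builds the expected paths from the suffixes above. It assumes `{filename}` is the input file name without its extension, which the list implies but does not state, and the function names are hypothetical.

```python
from pathlib import Path

# Suffixes taken from the list of output files above.
OUTPUT_SUFFIXES = [
    "_translated.docx",
    "_bilingual.docx",
    "_report.xlsx",
    "_report.pdf",
    "_usage.json",
    "_model_mapping.json",
    "_results.json",
]


def expected_outputs(input_file, output_dir="output"):
    """Return the paths the SDK is expected to write for input_file."""
    stem = Path(input_file).stem  # assumption: {filename} is the stem
    return [Path(output_dir) / f"{stem}{suffix}" for suffix in OUTPUT_SUFFIXES]


def missing_outputs(input_file, output_dir="output"):
    """Return the expected paths that do not exist on disk."""
    return [path for path in expected_outputs(input_file, output_dir) if not path.exists()]
```

An empty list from `missing_outputs("input.docx")` after a run would indicate that all seven files are present.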

Evaluation Dimensions

Each translation is scored from 0 to 10 on five dimensions:

Accuracy: How accurately the meaning is conveyed.
Fluency: How natural the translation sounds.
Consistency: Consistency of terminology and style.
Terminology: Accuracy of specific domain terms.
Completeness: Whether all content is translated.
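
This README does not say how the five scores are combined into the single score used for selection. As a sketch under the assumption of an unweighted mean, the dimensions could be modeled as follows. `SegmentScores` and `overall` are hypothetical names, and the SDK's real aggregation may differ.

```python
from dataclasses import dataclass, fields
from statistics import mean


@dataclass(frozen=True)
class SegmentScores:
    """Scores for one translated segment, each on the documented 0-10 scale."""

    accuracy: float
    fluency: float
    consistency: float
    terminology: float
    completeness: float

    def __post_init__(self):
        # Reject values outside the documented 0-10 range.
        for field in fields(self):
            value = getattr(self, field.name)
            if not 0 <= value <= 10:
                raise ValueError(f"{field.name} must be between 0 and 10, got {value}")

    def overall(self):
        """Unweighted mean of the five dimensions (an assumed aggregation)."""
        return mean(getattr(self, field.name) for field in fields(self))
```

Under this assumption, a segment scored 9, 8, 10, 9, and 9 has an overall score of 9, which is below the 9.5 threshold in the workflow diagram and would therefore go on to optimization.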

License

This project is licensed under the GNU Affero General Public License v3.0 (AGPLv3).

See the LICENSE file for details.

Project details

Version 0.0.1, released Dec 3, 2025.

Download files

Source Distribution

docu_fluent-0.0.1.tar.gz (130.7 kB), uploaded Dec 3, 2025

Built Distribution

docu_fluent-0.0.1-py3-none-any.whl (36.6 kB), uploaded Dec 3, 2025, Python 3

File details

Details for the file docu_fluent-0.0.1.tar.gz.

File metadata

Download URL: docu_fluent-0.0.1.tar.gz
Upload date: Dec 3, 2025
Size: 130.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.14

File hashes

Hashes for docu_fluent-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`71b3b41dafaa590f21b1a629680a0bcfc0a0a49fb8c812680b4c9ddec397045c`
MD5	`f1ac1b69affb90e65b954a319707ef7a`
BLAKE2b-256	`35927072642b017c42d96f1e180bb9f23aa2d89407ff2d39bfd7f9f9b0a492be`

File details

Details for the file docu_fluent-0.0.1-py3-none-any.whl.

File metadata

Download URL: docu_fluent-0.0.1-py3-none-any.whl
Upload date: Dec 3, 2025
Size: 36.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.14

File hashes

Hashes for docu_fluent-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b909ee6d443fa18cce0e606b74aa83ecd640bc6e07e1c5021ac8af9a8e24a2c9`
MD5	`4045d59a2bfa1c091eed6966e83fed49`
BLAKE2b-256	`8516cb4a3da7aa85a45528fbefbd146a395f5cafd63ebbff1d92bf3a69d95efe`
