A PDF translator that preserves layout

PDFlator 📄

PDFlator translates PDF files while preserving their original layout. The primary goal of the project is not only the translation functionality itself but also to serve as a practical demonstration of modular software design, SOLID principles, and common design patterns (like Factory). It aims to be extensible and maintainable.

✨ Features

  • PDF Translation: Translate text content of PDF files.
  • Layout Preservation: Maintains the original layout, including text positioning.
  • Language Selection: Choose source/target languages, with auto-detection for the source.
  • Multiple Translation Providers: Supports Google Translate and LibreTranslate (configurable API endpoint). Easily extendable with new providers.
  • Language-Specific Handling: Adapts text alignment and bounding box resizing based on language characteristics (e.g., LTR vs. RTL). Extendable with new languages.
  • Web Interface: User-friendly interface built with Flask.
    • Dark Theme: Sleek dark theme with green accents.
    • Configuration Page: Adjust translation parameters like font size, scaling, and redaction color via the UI (saved to .env).
  • Command Line Interface (CLI): Translate files directly from the terminal.
  • Configuration via .env: Manage settings like font size, scaling factor, redaction color, and the LibreTranslate API endpoint (using LIBRE_TRANSLATE_API).
  • Installable Package: Install via pip for easy use.
  • (Coming Soon) Translation History: View past translations.
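
The language-specific handling listed above can be pictured with a small sketch. It is illustrative only: the `Box` type, the `alignment` property, the `resize_box` method, and the `English` and `Arabic` subclasses are assumptions made for this example, not PDFlator's actual API. The idea is that each language decides how text is aligned and which edge of a bounding box stays fixed when translated text needs more room.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in PDF points (hypothetical helper type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self) -> float:
        return self.x1 - self.x0


class Language(ABC):
    """Hypothetical base class: one subclass per supported language."""

    @property
    @abstractmethod
    def alignment(self) -> str:
        """Return 'left' for LTR scripts or 'right' for RTL scripts."""

    def resize_box(self, box: Box, factor: float) -> Box:
        """Scale the box width by `factor`, keeping the reading-start edge fixed."""
        new_width = box.width * factor
        if self.alignment == "right":
            # RTL text starts at the right edge, so the box grows to the left.
            return Box(box.x1 - new_width, box.y0, box.x1, box.y1)
        # LTR text starts at the left edge, so the box grows to the right.
        return Box(box.x0, box.y0, box.x0 + new_width, box.y1)


class English(Language):
    @property
    def alignment(self) -> str:
        return "left"


class Arabic(Language):
    @property
    def alignment(self) -> str:
        return "right"
```

With this layout, supporting another script means adding one subclass; the resizing logic does not change.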

🎯 Project Philosophy & Design

This project emphasizes:

  • Modularity: Components (languages, translators) are designed as independent modules.
  • Extensibility: Adding new languages or translation providers requires creating new classes that inherit from abstract base classes (Language, Translator) without modifying core logic.
  • SOLID Principles: Adherence to principles like Single Responsibility and Open/Closed.
  • Design Patterns: Utilizes patterns like the Factory Method (LanguageFactory, TranslatorFactory) for object creation.

It serves as an example of building a maintainable application where functionality can be added or changed with minimal impact on existing code.
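
A minimal sketch of how an abstract base class and a factory can work together is shown here. Only the names `Translator` and `TranslatorFactory` come from this README; the `translate`, `register`, and `create` methods, the decorator-based registry, and the toy `EchoTranslator` are assumptions chosen for illustration and may differ from the real code.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type


class Translator(ABC):
    """Base class every translation provider implements (method name assumed)."""

    @abstractmethod
    def translate(self, text: str, source: str, target: str) -> str:
        ...


class TranslatorFactory:
    """Maps a short provider name to a Translator subclass."""

    _registry: Dict[str, Type[Translator]] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[Type[Translator]], Type[Translator]]:
        """Class decorator that records a provider under `name`."""
        def decorator(translator_cls: Type[Translator]) -> Type[Translator]:
            cls._registry[name] = translator_cls
            return translator_cls
        return decorator

    @classmethod
    def create(cls, name: str) -> Translator:
        """Instantiate the provider registered under `name`."""
        try:
            return cls._registry[name]()
        except KeyError:
            raise ValueError(f"Unknown translator: {name!r}") from None


@TranslatorFactory.register("echo")
class EchoTranslator(Translator):
    """Toy provider: tags the text instead of calling a real service."""

    def translate(self, text: str, source: str, target: str) -> str:
        return f"[{source}->{target}] {text}"
```

Calling `TranslatorFactory.create("echo").translate("Bonjour", "fr", "en")` returns `"[fr->en] Bonjour"`. Adding a provider only requires a new decorated class, which is the Open/Closed behavior described above.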

๐Ÿ› ๏ธ Technologies Used

  • Python: Core language.
  • Flask: Web framework.
  • PyMuPDF (fitz): PDF processing.
  • googletrans: Unofficial Google Translate client (Note: can be unstable).
  • libretranslatepy: LibreTranslate API access.
  • python-dotenv: Environment variable management.
  • Bootstrap: Frontend styling.
  • Setuptools: Packaging.

🚀 Getting Started

Prerequisites

  • Python 3.7+
  • Pip (Python package manager)
  • Git (for cloning)

Installation

Option 1: Install as a Python Package (Recommended)

# Install from PyPI (if published)
# pip install pdflator

# Or for isolated installation (if published)
# pipx install pdflator

# Currently, install from source or use development mode
pip install git+https://github.com/your-username/PDFlator.git # Replace with actual URL if public

Option 2: Clone and Install Locally (Development)

  1. Clone the repository:

    git clone https://github.com/your-username/PDFlator.git # Replace with actual URL
    cd PDFlator
    
  2. Create and activate a virtual environment (Recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
    
  3. Install in development mode:

    pip install -e .
    # Or use the script: ./install_dev.sh
    
  4. Configure Environment (.env): Create a .env file in the project root (where setup.py is located) with the following content:

    OUTPUT_FONT_SIZE=12
    WHITE_COLOR=(1,1,1)
    SCALING_FACTOR=0.75
    LIBRE_TRANSLATE_API=http://localhost:8000/
    
    • Set LIBRE_TRANSLATE_API to the full URL of your LibreTranslate instance (e.g., http://127.0.0.1:5000/).
    • Other values can be configured via the web UI's Configuration page.
  5. Set up LibreTranslate (Optional): If using the LibreTranslate provider, ensure a LibreTranslate API server is running and accessible at the URL specified in LIBRE_TRANSLATE_API. See the LibreTranslate repository.
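
As a rough illustration of how the values in the sample .env could be turned into typed settings, here is a standard-library sketch. The `Settings` class and `load_settings` function are hypothetical names, and the defaults simply mirror the sample file above. It assumes the variables are already in the process environment (for example after python-dotenv's `load_dotenv()` has run).

```python
import ast
import os
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class Settings:
    font_size: int
    redaction_color: Tuple[float, float, float]
    scaling_factor: float
    libre_translate_api: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment-style key/value strings."""
    # literal_eval parses "(1,1,1)" into a tuple without executing code.
    color = ast.literal_eval(env.get("WHITE_COLOR", "(1,1,1)"))
    if not (isinstance(color, tuple) and len(color) == 3):
        raise ValueError("WHITE_COLOR must be a 3-tuple such as (1,1,1)")
    return Settings(
        font_size=int(env.get("OUTPUT_FONT_SIZE", "12")),
        redaction_color=tuple(float(c) for c in color),
        scaling_factor=float(env.get("SCALING_FACTOR", "0.75")),
        libre_translate_api=env.get("LIBRE_TRANSLATE_API", "http://localhost:8000/"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to test without touching real environment variables.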

📖 Usage

(Ensure your virtual environment is activated if installed locally)

Command Line Interface

PDFlator provides a unified CLI:

Directly Translate a PDF

# Basic translation (uses defaults from .env and code)
pdflator translate -i input.pdf -o output.pdf

# Specify languages and translator
pdflator translate -i input.pdf -o output.pdf -il fr -ol en -t gtrans

# Use LibreTranslate
pdflator translate -i input.pdf -o output.pdf -t libre

Run pdflator translate --help for the full list of parameters.

Start the Web Interface

# Start with default settings (http://127.0.0.1:5000)
pdflator web

# Specify host and port
pdflator web --host 0.0.0.0 --port 8080

# Run in debug mode
pdflator web --debug

Run pdflator web --help for the full list of parameters.

Other Commands

# Get version information
pdflator --version

# Display help for all commands
pdflator --help
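
For readers curious how a unified CLI with these subcommands might be wired up, the following argparse sketch reproduces the flags shown above. It is an assumption-laden illustration, not the project's main.py: the `build_parser` function, the `dest` names, the default values ("auto", "en", "gtrans"), and the version string are all placeholders.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Return a parser with `translate` and `web` subcommands."""
    parser = argparse.ArgumentParser(prog="pdflator")
    # Placeholder version string for the sketch.
    parser.add_argument("--version", action="version", version="%(prog)s 1.0")
    sub = parser.add_subparsers(dest="command", required=True)

    translate = sub.add_parser("translate", help="Translate a PDF file")
    translate.add_argument("-i", dest="input", required=True, help="Input PDF path")
    translate.add_argument("-o", dest="output", required=True, help="Output PDF path")
    translate.add_argument("-il", dest="input_lang", default="auto",
                           help="Source language code (assumed default: auto)")
    translate.add_argument("-ol", dest="output_lang", default="en",
                           help="Target language code (assumed default: en)")
    translate.add_argument("-t", dest="translator", default="gtrans",
                           choices=["gtrans", "libre"], help="Translation provider")

    web = sub.add_parser("web", help="Start the web interface")
    web.add_argument("--host", default="127.0.0.1")
    web.add_argument("--port", type=int, default=5000)
    web.add_argument("--debug", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Keeping each subcommand's options on its own sub-parser means `pdflator translate --help` and `pdflator web --help` list only the flags relevant to that command.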

Web Interface

  1. Run pdflator web.
  2. Open the provided URL (e.g., http://127.0.0.1:5000) in your browser.
  3. Upload a PDF.
  4. Select languages and provider.
  5. Click "Translate".
  6. Download the result.
  7. Visit the "Configuration" page to adjust settings.

📂 Directory Structure

PDFlator/ (Project Root)
├── pdflator/              # Main package source code
│   ├── __init__.py
│   ├── main.py            # CLI entry point logic
│   ├── web.py             # Flask web application logic
│   ├── translate_pdf.py   # Core PDF translation function
│   ├── languages/         # Language-specific modules (e.g., alignment)
│   │   ├── __init__.py
│   │   ├── language.py    # Abstract Base Class for Language
│   │   └── ... (english.py, arabic.py, etc.)
│   ├── static/            # Static web assets (CSS, JS, images)
│   │   ├── __init__.py
│   │   └── css/
│   │       └── style.css
│   ├── templates/         # HTML templates for Flask
│   │   ├── __init__.py
│   │   └── ... (index.html, result.html, etc.)
│   └── translation/       # Translation provider modules
│       ├── __init__.py
│       ├── translator.py  # Abstract Base Class for Translator
│       └── ... (google_translator.py, libretranslate_translator.py, etc.)
├── .env                   # Environment variables (API URL, config) - *Not in Git*
├── .gitignore
├── MANIFEST.in            # Specifies files to include in the package
├── README.md              # This file
├── install_dev.sh         # Helper script for development install
├── pyproject.toml         # Build system requirements & tool config (Black, isort)
├── requirements.txt       # List of dependencies (can be generated from setup.py)
├── setup.py               # Package build and installation script
└── venv/                  # Virtual environment directory - *Not in Git*

๐Ÿค Contributing

Contributions focusing on improving modularity, adding well-designed features, or enhancing demonstrations of design principles are welcome! Please open an issue first to discuss changes.

📜 License

MIT License. See the LICENSE file (if included) or standard MIT terms.

🌟 Acknowledgments


Happy Translating & Coding! 💻
