A PDF translator that preserves layout
PDFlator
PDFlator is an application designed to translate PDF files while preserving their original layout. Crucially, the primary goal of this project is not just the translation functionality itself, but to serve as a practical demonstration of modular software design, SOLID principles, and common design patterns (like Factory). It aims to be extensible and maintainable.
Features
- PDF Translation: Translate text content of PDF files.
- Layout Preservation: Maintains the original layout, including text positioning.
- Language Selection: Choose source/target languages, with auto-detection for the source.
- Multiple Translation Providers: Supports Google Translate and LibreTranslate (configurable API endpoint). Easily extendable with new providers.
- Language-Specific Handling: Adapts text alignment and bounding box resizing based on language characteristics (e.g., LTR vs. RTL). Extendable with new languages.
- Web Interface: User-friendly interface built with Flask.
- Dark Theme: Sleek dark theme with green accents.
- Configuration Page: Adjust translation parameters like font size, scaling, and redaction color via the UI (saved to .env).
- Command Line Interface (CLI): Translate files directly from the terminal.
- Configuration via .env: Manage settings like font size, scaling factor, redaction color, and the LibreTranslate API endpoint (using LIBRE_TRANSLATE_API).
- Installable Package: Install via pip for easy use.
- (Coming Soon) Translation History: View past translations.
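One way to picture the layout-preservation feature is a redact-then-rewrite loop with PyMuPDF: each text block is blanked out with the redaction color, and the translated text is written back into the same bounding box. The sketch below is an illustration only, not PDFlator's actual translate_pdf.py. The function name `translate_pdf_sketch` and the `translate` callback are hypothetical, and it assumes PyMuPDF is installed.

```python
from typing import Callable

WHITE = (1, 1, 1)  # assumed to correspond to WHITE_COLOR in .env


def translate_pdf_sketch(
    src_path: str,
    dst_path: str,
    translate: Callable[[str], str],  # hypothetical callback: text in, translated text out
    font_size: float = 12.0,
) -> None:
    """Replace every text block with its translation, keeping each block's box."""
    import fitz  # PyMuPDF; imported lazily so this module loads without it

    doc = fitz.open(src_path)
    for page in doc:
        # Block tuples are (x0, y0, x1, y1, text, block_no, block_type); type 0 is text.
        blocks = [b for b in page.get_text("blocks") if b[6] == 0 and b[4].strip()]

        # Cover the original text with the redaction color, then remove it.
        for b in blocks:
            page.add_redact_annot(fitz.Rect(b[:4]), fill=WHITE)
        page.apply_redactions()

        # Write the translated text into the same rectangles.
        for b in blocks:
            page.insert_textbox(fitz.Rect(b[:4]), translate(b[4]), fontsize=font_size)

    doc.save(dst_path)
    doc.close()
```

A call such as `translate_pdf_sketch("input.pdf", "output.pdf", str.upper)` would exercise the loop without any translation provider. Translated text is often longer than the original, so a real implementation would also need to handle text that does not fit its box.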
Project Philosophy & Design
This project emphasizes:
- Modularity: Components (languages, translators) are designed as independent modules.
- Extensibility: Adding new languages or translation providers requires creating new classes that inherit from abstract base classes (Language, Translator) without modifying core logic.
- SOLID Principles: Adherence to principles like Single Responsibility and Open/Closed.
- Design Patterns: Utilizes patterns like the Factory Method (LanguageFactory, TranslatorFactory) for object creation.
It serves as an example of building a maintainable application where functionality can be added or changed with minimal impact on existing code.
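The extensibility idea above can be sketched as an abstract base class plus a factory. The class names Translator and TranslatorFactory come from this README; everything else (the `translate` signature, the `register` and `create` methods, and the toy `UppercaseTranslator`) is assumed for illustration. The sketch uses a simple registry-based factory, which may differ from the project's actual Factory Method implementation.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class Translator(ABC):
    """Base class every translation provider inherits from."""

    @abstractmethod
    def translate(self, text: str, source: str, target: str) -> str:
        """Return `text` translated from `source` to `target`."""


class UppercaseTranslator(Translator):
    """Toy provider for illustration only; it makes no network calls."""

    def translate(self, text: str, source: str, target: str) -> str:
        return text.upper()


class TranslatorFactory:
    """Maps a short provider name to a Translator subclass."""

    _registry: Dict[str, Type[Translator]] = {}

    @classmethod
    def register(cls, name: str, translator_cls: Type[Translator]) -> None:
        cls._registry[name] = translator_cls

    @classmethod
    def create(cls, name: str) -> Translator:
        try:
            return cls._registry[name]()
        except KeyError:
            raise ValueError(f"Unknown translator: {name!r}") from None


# Adding a provider is one new class and one registration call;
# the factory and its callers stay unchanged (Open/Closed).
TranslatorFactory.register("upper", UppercaseTranslator)
translator = TranslatorFactory.create("upper")
print(translator.translate("bonjour", "fr", "en"))  # BONJOUR
```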
Technologies Used
- Python: Core language.
- Flask: Web framework.
- PyMuPDF (fitz): PDF processing.
- googletrans: Google Translate API access (Note: can be unstable).
- libretranslatepy: LibreTranslate API access.
- python-dotenv: Environment variable management.
- Bootstrap: Frontend styling.
- Setuptools: Packaging.
Getting Started
Prerequisites
- Python 3.7+
- Pip (Python package manager)
- Git (for cloning)
Installation
Option 1: Install as a Python Package (Recommended)
# Install from PyPI (if published)
# pip install pdflator
# Or for isolated installation (if published)
# pipx install pdflator
# Currently, install from source or use development mode
pip install git+https://github.com/your-username/PDFlator.git # Replace with actual URL if public
Option 2: Clone and Install Locally (Development)
- Clone the repository:
  git clone https://github.com/your-username/PDFlator.git # Replace with actual URL
  cd PDFlator
- Create and activate a virtual environment (Recommended):
  python -m venv venv
  source venv/bin/activate # On Windows use `venv\Scripts\activate`
- Install in development mode:
  pip install -e . # Or use the script: ./install_dev.sh
- Configure Environment (.env): Create a .env file in the project root (where setup.py is located) with the following content:
  OUTPUT_FONT_SIZE=12
  WHITE_COLOR=(1,1,1)
  SCALING_FACTOR=0.75
  LIBRE_TRANSLATE_API=http://localhost:8000/
  - Set LIBRE_TRANSLATE_API to the full URL of your LibreTranslate instance (e.g., http://127.0.0.1:5000/).
  - Other values can be configured via the web UI's Configuration page.
- Set up LibreTranslate (Optional): If using the LibreTranslate provider, ensure a LibreTranslate API server is running and accessible at the URL specified in LIBRE_TRANSLATE_API. See the LibreTranslate repository.
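As a rough illustration of how the four .env values might be turned into typed settings, the sketch below parses them from any string mapping. The `Settings` class and `load_settings` function are hypothetical names, not part of PDFlator, and the fallback defaults simply mirror the sample file above.

```python
import ast
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class Settings:
    output_font_size: float
    white_color: Tuple[float, float, float]
    scaling_factor: float
    libre_translate_api: str


def load_settings(env: Mapping[str, str]) -> Settings:
    """Parse PDFlator-style variables from a mapping such as os.environ."""
    # WHITE_COLOR is stored as text like "(1,1,1)"; literal_eval parses it
    # safely without executing arbitrary code.
    color = ast.literal_eval(env.get("WHITE_COLOR", "(1,1,1)"))
    if not (isinstance(color, tuple) and len(color) == 3):
        raise ValueError("WHITE_COLOR must be a tuple of three numbers, e.g. (1,1,1)")
    return Settings(
        output_font_size=float(env.get("OUTPUT_FONT_SIZE", "12")),
        white_color=(float(color[0]), float(color[1]), float(color[2])),
        scaling_factor=float(env.get("SCALING_FACTOR", "0.75")),
        libre_translate_api=env.get("LIBRE_TRANSLATE_API", "http://localhost:8000/"),
    )


example_env = {
    "OUTPUT_FONT_SIZE": "12",
    "WHITE_COLOR": "(1,1,1)",
    "SCALING_FACTOR": "0.75",
    "LIBRE_TRANSLATE_API": "http://localhost:8000/",
}
settings = load_settings(example_env)
print(settings.white_color)  # (1.0, 1.0, 1.0)
```

In an application, the same function could be given `os.environ` after python-dotenv's `load_dotenv()` has read the .env file.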
Usage
(Ensure your virtual environment is activated if installed locally)
Command Line Interface
PDFlator provides a unified CLI:
Directly Translate a PDF
# Basic translation (uses defaults from .env and code)
pdflator translate -i input.pdf -o output.pdf
# Specify languages and translator
pdflator translate -i input.pdf -o output.pdf -il fr -ol en -t gtrans
# Use LibreTranslate
pdflator translate -i input.pdf -o output.pdf -t libre
Parameters are detailed in pdflator translate --help
Start the Web Interface
# Start with default settings (http://127.0.0.1:5000)
pdflator web
# Specify host and port
pdflator web --host 0.0.0.0 --port 8080
# Run in debug mode
pdflator web --debug
Parameters are detailed in pdflator web --help
Other Commands
# Get version information
pdflator --version
# Display help for all commands
pdflator --help
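For readers curious how a unified CLI like this can be wired, here is a minimal argparse sketch with the same subcommands and flags shown above. It is not PDFlator's actual main.py: the `dest` names, the default values for -il, -ol, and -t, and the version string are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pdflator")
    parser.add_argument("--version", action="version", version="pdflator (sketch)")
    sub = parser.add_subparsers(dest="command", required=True)

    # "translate" subcommand: flags taken from the examples above.
    translate = sub.add_parser("translate", help="Translate a PDF file")
    translate.add_argument("-i", dest="input", required=True, help="Input PDF")
    translate.add_argument("-o", dest="output", required=True, help="Output PDF")
    translate.add_argument("-il", dest="source_lang", default="auto")  # assumed default
    translate.add_argument("-ol", dest="target_lang", default="en")  # assumed default
    translate.add_argument(
        "-t", dest="translator", choices=["gtrans", "libre"], default="gtrans"
    )

    # "web" subcommand: host and port defaults match http://127.0.0.1:5000.
    web = sub.add_parser("web", help="Start the web interface")
    web.add_argument("--host", default="127.0.0.1")
    web.add_argument("--port", type=int, default=5000)
    web.add_argument("--debug", action="store_true")
    return parser


args = build_parser().parse_args(
    ["translate", "-i", "input.pdf", "-o", "output.pdf", "-il", "fr", "-ol", "en", "-t", "gtrans"]
)
print(args.command, args.source_lang, args.target_lang, args.translator)
# translate fr en gtrans
```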
Web Interface
- Run pdflator web.
- Open the provided URL (e.g., http://127.0.0.1:5000) in your browser.
- Upload a PDF.
- Select languages and provider.
- Click "Translate".
- Download the result.
- Visit the "Configuration" page to adjust settings.
Directory Structure
PDFlator/ (Project Root)
├── pdflator/ # Main package source code
│   ├── __init__.py
│   ├── main.py # CLI entry point logic
│   ├── web.py # Flask web application logic
│   ├── translate_pdf.py # Core PDF translation function
│   ├── languages/ # Language-specific modules (e.g., alignment)
│   │   ├── __init__.py
│   │   ├── language.py # Abstract Base Class for Language
│   │   └── ... (english.py, arabic.py, etc.)
│   ├── static/ # Static web assets (CSS, JS, images)
│   │   ├── __init__.py
│   │   └── css/
│   │       └── style.css
│   ├── templates/ # HTML templates for Flask
│   │   ├── __init__.py
│   │   └── ... (index.html, result.html, etc.)
│   └── translation/ # Translation provider modules
│       ├── __init__.py
│       ├── translator.py # Abstract Base Class for Translator
│       └── ... (google_translator.py, libretranslate_translator.py, etc.)
├── .env # Environment variables (API URL, config) - *Not in Git*
├── .gitignore
├── MANIFEST.in # Specifies files to include in the package
├── README.md # This file
├── install_dev.sh # Helper script for development install
├── pyproject.toml # Build system requirements & tool config (Black, isort)
├── requirements.txt # List of dependencies (can be generated from setup.py)
├── setup.py # Package build and installation script
└── venv/ # Virtual environment directory - *Not in Git*
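To make the languages/ package more concrete, the sketch below shows what a Language base class and two subclasses might look like, with alignment and bounding-box resizing depending on text direction. Only the class name Language and the file names english.py and arabic.py come from this README; the `alignment` property, the `resize_box` method, and the alignment constants are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

# Hypothetical alignment constants for this sketch.
ALIGN_LEFT, ALIGN_RIGHT = 0, 2


class Language(ABC):
    """Base class for language-specific layout behaviour."""

    code: str

    @property
    @abstractmethod
    def alignment(self) -> int:
        """Alignment used when writing translated text into a box."""

    def resize_box(self, box: Rect, factor: float) -> Rect:
        """Scale the box width, keeping the edge where each line starts fixed."""
        x0, y0, x1, y1 = box
        width = (x1 - x0) * factor
        if self.alignment == ALIGN_RIGHT:
            return (x1 - width, y0, x1, y1)  # RTL: anchor the right edge
        return (x0, y0, x0 + width, y1)  # LTR: anchor the left edge


class English(Language):
    code = "en"

    @property
    def alignment(self) -> int:
        return ALIGN_LEFT


class Arabic(Language):
    code = "ar"

    @property
    def alignment(self) -> int:
        return ALIGN_RIGHT


box = (100, 50, 200, 70)
print(English().resize_box(box, 1.5))  # (100, 50, 250.0, 70)
print(Arabic().resize_box(box, 1.5))   # (50.0, 50, 200, 70)
```

Under this layout, supporting another language would mean adding one more subclass in its own module, without touching the base class.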
Contributing
Contributions focusing on improving modularity, adding well-designed features, or enhancing demonstrations of design principles are welcome! Please open an issue first to discuss changes.
License
MIT License. See the LICENSE file (if included) or standard MIT terms.
Acknowledgments
Happy Translating & Coding!