Advanced PDF to PowerPoint converter with high fidelity
PDFToPPT - Advanced PDF to PowerPoint Converter
A Python package for converting PDF documents to PowerPoint presentations. It aims for high-fidelity output, preserving layouts, images, text formatting, and basic vector graphics (lines and rectangles).
Features
- High Fidelity Conversion: Preserves original PDF layouts, fonts, colors, and formatting
- Vector Graphics Support: Converts PDF vector elements (lines, rectangles) to PowerPoint shapes
- Image Preservation: Extracts and embeds images with transparency support
- Text Formatting: Maintains font styles, sizes, colors, bold, and italic formatting
- Custom Page Ranges: Convert specific pages or page ranges
- Batch Processing: Process multiple pages efficiently
- Command Line Interface: Easy-to-use CLI for automation
- Python API: Full programmatic access for integration
Installation
From PyPI (Recommended)
pip install pdftoppt
From Source
git clone https://github.com/amitpanda007/pdftoppt.git
cd pdftoppt
pip install -e .
Dependencies
- Python 3.7+
- PyMuPDF (fitz) >= 1.20.0
- python-pptx >= 0.6.18
- Pillow >= 8.0.0
Quick Start
Command Line Usage
# Convert entire PDF
pdftoppt input.pdf output.pptx
# Convert specific page range
pdftoppt input.pdf output.pptx --pages 1-5
# Convert with verbose logging
pdftoppt input.pdf output.pptx --verbose
# Get help
pdftoppt --help
Python API Usage
from pdftoppt import AdvancedPDFToPowerPointConverter
# Basic conversion
with AdvancedPDFToPowerPointConverter() as converter:
    success = converter.convert("input.pdf", "output.pptx")
    print(f"Conversion successful: {success}")
    print(f"Slides created: {converter.slides_created}")
# Convert specific pages
with AdvancedPDFToPowerPointConverter() as converter:
    success = converter.convert(
        pdf_path="input.pdf",
        output_path="output.pptx",
        page_range=(1, 5),  # Convert pages 1-5
    )
# With error handling; the context manager cleans up temporary files,
# including when convert() raises
try:
    with AdvancedPDFToPowerPointConverter() as converter:
        converter.convert("input.pdf", "output.pptx")
except FileNotFoundError:
    print("PDF file not found")
except ValueError as e:
    print(f"Invalid parameters: {e}")
Advanced Usage
Logging Configuration
import logging
from pdftoppt import AdvancedPDFToPowerPointConverter
# Enable debug logging
logging.basicConfig(level=logging.DEBUG)
with AdvancedPDFToPowerPointConverter() as converter:
    converter.convert("input.pdf", "output.pptx")
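Setting `level=logging.DEBUG` on the root logger also turns on DEBUG output from every other library in the process. A minimal sketch of scoping DEBUG output to the converter only, assuming (this is not documented) that the package logs under the `"pdftoppt"` logger name, which is what `logging.getLogger(__name__)` inside the package would produce:

```python
import logging

# Keep everything else at WARNING.
logging.basicConfig(level=logging.WARNING)

# ASSUMPTION: pdftoppt logs via loggers named "pdftoppt" or "pdftoppt.<module>".
# Child loggers inherit this level, and their records still reach the
# root handler created by basicConfig (that handler has no level of its own).
logging.getLogger("pdftoppt").setLevel(logging.DEBUG)
```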
Batch Processing
from pathlib import Path
from pdftoppt import AdvancedPDFToPowerPointConverter
def batch_convert_pdfs(input_dir, output_dir):
    """Convert every PDF in input_dir to a .pptx file in output_dir."""
    input_path = Path(input_dir)
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)
    with AdvancedPDFToPowerPointConverter() as converter:
        for pdf_file in input_path.glob("*.pdf"):
            output_file = output_path / f"{pdf_file.stem}.pptx"
            try:
                success = converter.convert(str(pdf_file), str(output_file))
                print(f"{'✓' if success else '✗'} {pdf_file.name}")
            except Exception as e:
                # Report the failure and continue with the remaining files
                print(f"✗ {pdf_file.name}: {e}")
# Usage
batch_convert_pdfs("./pdfs", "./presentations")
How It Works
The converter uses a multi-step process to ensure high-fidelity conversion:
- PDF Analysis: Extracts text, images, and vector graphics from each PDF page
- Element Processing: Processes fonts, colors, positioning, and formatting
- PowerPoint Generation: Creates a presentation whose slide size matches the PDF page dimensions
- Content Reconstruction: Rebuilds all elements as native PowerPoint objects
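The PowerPoint Generation and Content Reconstruction steps both involve a change of units. PDF geometry is measured in points (1/72 inch), while PowerPoint and python-pptx use English Metric Units (914,400 EMU per inch), so one point is 12,700 EMU. The sketch below assumes a direct point-to-EMU mapping; it is not pdftoppt's code, and the helper names are hypothetical. PyMuPDF reports page coordinates with a top-left origin and y increasing downward, which is the same orientation PowerPoint uses, so no vertical flip is needed.

```python
from typing import NamedTuple, Tuple

EMU_PER_INCH = 914_400
POINTS_PER_INCH = 72
EMU_PER_POINT = EMU_PER_INCH // POINTS_PER_INCH  # 12,700


class Box(NamedTuple):
    """Position and size in EMU, in the order python-pptx shape methods take."""
    left: int
    top: int
    width: int
    height: int


def pt_to_emu(value: float) -> int:
    """Convert a length in PDF points to whole EMU."""
    return round(value * EMU_PER_POINT)


def slide_size_for_page(width_pt: float, height_pt: float) -> Tuple[int, int]:
    """Slide (width, height) in EMU matching a PDF page of the given size."""
    return pt_to_emu(width_pt), pt_to_emu(height_pt)


def bbox_to_box(x0: float, y0: float, x1: float, y1: float) -> Box:
    """Convert a top-left-origin bounding box in points to an EMU Box."""
    return Box(pt_to_emu(x0), pt_to_emu(y0), pt_to_emu(x1 - x0), pt_to_emu(y1 - y0))
```

For example, a US Letter page (612 × 792 pt) maps to a 7,772,400 × 10,058,400 EMU slide, i.e. 8.5 × 11 inches. The resulting integers are in the unit python-pptx expects for `Presentation.slide_width`, `Presentation.slide_height`, and the `left, top, width, height` arguments of methods such as `slide.shapes.add_textbox`.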
Supported Elements
- ✅ Text with formatting (fonts, sizes, colors, bold, italic)
- ✅ Images (JPEG, PNG) with transparency support
- ✅ Vector graphics (rectangles, lines)
- ✅ Colors and fill patterns
- ✅ Positioning and layouts
- ⚠️ Complex vector paths (simplified to basic shapes)
- ❌ Interactive elements (forms, hyperlinks)
- ❌ Animations and transitions
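The exact strategy used to simplify complex vector paths is not documented. One plausible approach, shown here purely as a hypothetical sketch, is to replace a path with its bounding rectangle, which maps directly onto a PowerPoint rectangle shape:

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def bounding_rect(points: Iterable[Point]) -> Tuple[float, float, float, float]:
    """Return (x0, y0, x1, y1) enclosing every point of a path.

    Hypothetical simplification: the whole path becomes one rectangle.
    """
    pts = list(points)
    if not pts:
        raise ValueError("path has no points")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return min(xs), min(ys), max(xs), max(ys)
```

If a path's Bézier control points are included in `points`, the result still encloses the curve, because a Bézier curve lies within the convex hull of its control points; the box may, however, be larger than the tight bound.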
Performance
Typical conversion speeds (actual throughput depends on hardware and document content):
- Simple text documents: ~2-5 pages/second
- Image-heavy documents: ~0.5-2 pages/second
- Complex mixed content: ~1-3 pages/second
Memory usage scales with document complexity and image content.
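Because these figures depend on the machine and the documents, it can help to measure throughput on your own files. A minimal timing sketch; `pages_per_second` is a hypothetical helper, not part of pdftoppt:

```python
import time
from typing import Callable


def pages_per_second(convert: Callable[[], object], page_count: int) -> float:
    """Run `convert` once and return the observed pages per second."""
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    start = time.perf_counter()  # monotonic, high-resolution clock
    convert()
    elapsed = time.perf_counter() - start
    return page_count / elapsed if elapsed > 0 else float("inf")
```

Usage, timing a 10-page range with the documented API: inside `with AdvancedPDFToPowerPointConverter() as converter:`, call `pages_per_second(lambda: converter.convert("input.pdf", "output.pptx", page_range=(1, 10)), 10)`.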
Troubleshooting
Common Issues
Import Error for PyMuPDF:
pip install --upgrade PyMuPDF
Memory issues with large PDFs:
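import fitz  # PyMuPDF
from pdftoppt import AdvancedPDFToPowerPointConverter
with fitz.open("large.pdf") as doc:
    total_pages = len(doc)
# Process in smaller page ranges (1-based, inclusive); +1 so the last page is included
with AdvancedPDFToPowerPointConverter() as converter:
    for start in range(1, total_pages + 1, 10):
        end = min(start + 9, total_pages)
        converter.convert("large.pdf", f"output_part_{start}.pptx",
                          page_range=(start, end))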
Font rendering issues:
- Ensure system has required fonts installed
- Check PDF for embedded fonts
Debug Mode
Enable verbose logging to diagnose issues:
pdftoppt input.pdf output.pptx --verbose
API Reference
AdvancedPDFToPowerPointConverter
Methods
__init__()
- Initializes the converter with a temporary directory for processing
convert(pdf_path, output_path, page_range=None)
- Main conversion method
- Parameters:
  - pdf_path (str): Path to the input PDF file
  - output_path (str): Path for the output PowerPoint file
  - page_range (tuple, optional): (start_page, end_page) for partial conversion
- Returns: bool - True if successful
- Raises: FileNotFoundError, ValueError
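The exact rules behind the ValueError are not documented. A hypothetical sketch of the kind of page-range check that could apply, assuming `page_range` is 1-based and inclusive (as the `(1, 5)  # Convert pages 1-5` example suggests); the function names are illustrative only:

```python
from typing import Optional, Tuple


def resolve_page_range(page_range: Optional[Tuple[int, int]],
                       page_count: int) -> Tuple[int, int]:
    """Return a concrete (start, end); None means the whole document."""
    if page_range is None:
        return 1, page_count
    start, end = page_range
    if not 1 <= start <= end <= page_count:
        raise ValueError(
            f"invalid page range {page_range} for a {page_count}-page document")
    return start, end


def page_indices(start: int, end: int) -> range:
    """0-based page indices (as PyMuPDF uses) for a 1-based inclusive range."""
    return range(start - 1, end)
```

PyMuPDF indexes pages from 0 (`doc[0]` is the first page), which is why `page_indices(1, 3)` yields 0, 1, 2.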
Context Manager Support:
with AdvancedPDFToPowerPointConverter() as converter:
    converter.convert("input.pdf", "output.pptx")
# Automatic cleanup on exit
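The reference says `__init__` creates a temporary directory and the context manager cleans up automatically. A hypothetical sketch of how that pattern is commonly built with `tempfile` and `shutil`; `TempWorkspace` is an illustrative name and this is not pdftoppt's implementation:

```python
import shutil
import tempfile
from pathlib import Path


class TempWorkspace:
    """Owns a scratch directory and removes it when the with block exits."""

    def __init__(self) -> None:
        # Create the working directory up front, as __init__ does in the reference.
        self.temp_dir = Path(tempfile.mkdtemp(prefix="workspace_"))

    def cleanup(self) -> None:
        # ignore_errors=True: no failure if the directory is already gone.
        shutil.rmtree(self.temp_dir, ignore_errors=True)

    def __enter__(self) -> "TempWorkspace":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Runs on normal exit and when an exception escapes the block.
        self.cleanup()
        return False  # do not swallow exceptions
```

Returning `False` from `__exit__` lets errors such as FileNotFoundError reach the caller's `except` clauses while still removing the temporary files.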
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Setup
git clone https://github.com/amitpanda007/pdftoppt.git
cd pdftoppt
pip install -e ".[dev]"
Running Tests
pytest tests/
Code Quality
black pdftoppt/
flake8 pdftoppt/
mypy pdftoppt/
License
This project is licensed under the MIT License - see the LICENSE file for details.
Changelog
v1.0.0
- Initial release
- High-fidelity PDF to PowerPoint conversion
- Support for text, images, and vector graphics
- Command-line interface
- Python API with context manager support
Acknowledgments
- Built with PyMuPDF for PDF processing
- Uses python-pptx for PowerPoint generation
- Image processing powered by Pillow
Made with ❤️ for the Python community
File details
Details for the file pdftoppt-1.0.0.tar.gz.
File metadata
- Download URL: pdftoppt-1.0.0.tar.gz
- Size: 23.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 53a811c00dd7cf1a323c1c26e533d457ab54c8876e04062f869e9e342add7340 |
| MD5 | 4e944beab5b9db594bd9c28a95241bfc |
| BLAKE2b-256 | b379198349649027c7896d700adca3e0eeb9977e5ce85128ee61e67231747150 |
File details
Details for the file pdftoppt-1.0.0-py3-none-any.whl.
File metadata
- Download URL: pdftoppt-1.0.0-py3-none-any.whl
- Size: 12.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d9745da32ba08a8a1be0056f2e57f00d32435561de55a78b25a0f66e95d41816 |
| MD5 | 25e8b69420446439c01455bcaebfcb3c |
| BLAKE2b-256 | eb247b9994e5edc5d394fcd4be8c91949c3b990febfe407d345b62be52c9f00a |