PDFToPPT - Advanced PDF to PowerPoint Converter

A Python package for converting PDF documents to PowerPoint presentations with high fidelity. It preserves layouts, images, and text formatting, and converts basic vector graphics (lines, rectangles) to native PowerPoint shapes.

Features

  • High Fidelity Conversion: Preserves original PDF layouts, fonts, colors, and formatting
  • Vector Graphics Support: Converts PDF vector elements (lines, rectangles) to PowerPoint shapes
  • Image Preservation: Extracts and embeds images with transparency support
  • Text Formatting: Maintains font styles, sizes, colors, bold, and italic formatting
  • Custom Page Ranges: Convert specific pages or page ranges
  • Batch Processing: Convert multiple pages or files in a single run
  • Command Line Interface: Easy-to-use CLI for automation
  • Python API: Full programmatic access for integration

Installation

From PyPI (Recommended)

pip install pdftoppt

From Source

git clone https://github.com/amitpanda007/pdftoppt.git
cd pdftoppt
pip install -e .

Dependencies

  • Python 3.7+
  • PyMuPDF (fitz) >= 1.20.0
  • python-pptx >= 0.6.18
  • Pillow >= 8.0.0

Quick Start

Command Line Usage

# Convert entire PDF
pdftoppt input.pdf output.pptx

# Convert specific page range
pdftoppt input.pdf output.pptx --pages 1-5

# Convert with verbose logging
pdftoppt input.pdf output.pptx --verbose

# Get help
pdftoppt --help
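
For automation from Python, the same commands can be driven through `subprocess`. The sketch below uses only the flags shown above (`--pages`, `--verbose`). The helper names `build_command` and `run_conversion` are hypothetical and not part of the package, and it assumes the CLI is on `PATH` and exits with a non-zero status on failure.

```python
import subprocess
from typing import List, Optional


def build_command(pdf: str, pptx: str, pages: Optional[str] = None,
                  verbose: bool = False) -> List[str]:
    """Assemble an argv list for the pdftoppt CLI (hypothetical helper)."""
    cmd = ["pdftoppt", pdf, pptx]
    if pages:
        cmd += ["--pages", pages]  # e.g. "1-5"
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_conversion(pdf: str, pptx: str, **options) -> bool:
    """Run the CLI and report success based on its exit status.

    Assumption: pdftoppt returns a non-zero exit code when conversion fails.
    """
    result = subprocess.run(build_command(pdf, pptx, **options))
    return result.returncode == 0
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.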

Python API Usage

from pdftoppt import AdvancedPDFToPowerPointConverter

# Basic conversion
with AdvancedPDFToPowerPointConverter() as converter:
    success = converter.convert("input.pdf", "output.pptx")
    print(f"Conversion successful: {success}")
    print(f"Slides created: {converter.slides_created}")

# Convert specific pages
with AdvancedPDFToPowerPointConverter() as converter:
    success = converter.convert(
        pdf_path="input.pdf",
        output_path="output.pptx",
        page_range=(1, 5)  # Convert pages 1-5
    )

# With error handling (the context manager cleans up temporary files,
# even if the conversion raises)
try:
    with AdvancedPDFToPowerPointConverter() as converter:
        converter.convert("input.pdf", "output.pptx")
except FileNotFoundError:
    print("PDF file not found")
except ValueError as e:
    print(f"Invalid parameters: {e}")
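
When page ranges arrive as strings (for example from a config file), they need converting to the `(start, end)` tuple that `convert` accepts. A minimal sketch, assuming pages are 1-based and inclusive as the `(1, 5)` example suggests; `parse_page_range` is a hypothetical helper, not part of pdftoppt.

```python
from typing import Tuple


def parse_page_range(spec: str) -> Tuple[int, int]:
    """Turn "3" or "1-5" into a (start, end) tuple of 1-based page numbers."""
    first, sep, last = spec.strip().partition("-")
    start = int(first)                 # raises ValueError on non-numeric input
    end = int(last) if sep else start  # a single page means start == end
    if start < 1 or end < start:
        raise ValueError(f"invalid page range: {spec!r}")
    return start, end
```

The result can be passed straight through, for example `converter.convert("input.pdf", "output.pptx", page_range=parse_page_range("1-5"))`.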

Advanced Usage

Logging Configuration

import logging
from pdftoppt import AdvancedPDFToPowerPointConverter

# Enable debug logging
logging.basicConfig(level=logging.DEBUG)

# The context manager removes temporary files when the block exits
with AdvancedPDFToPowerPointConverter() as converter:
    converter.convert("input.pdf", "output.pptx")

Batch Processing

from pathlib import Path
from pdftoppt import AdvancedPDFToPowerPointConverter

def batch_convert_pdfs(input_dir, output_dir):
    """Convert all PDFs in a directory."""
    input_path = Path(input_dir)
    output_path = Path(output_dir)
    # Create the output directory, including any missing parents
    output_path.mkdir(parents=True, exist_ok=True)

    with AdvancedPDFToPowerPointConverter() as converter:
        for pdf_file in input_path.glob("*.pdf"):
            output_file = output_path / f"{pdf_file.stem}.pptx"
            try:
                success = converter.convert(str(pdf_file), str(output_file))
                print(f"{'✓' if success else '✗'} {pdf_file.name}")
            except Exception as e:
                print(f"✗ {pdf_file.name}: {e}")

# Usage
batch_convert_pdfs("./pdfs", "./presentations")

How It Works

The converter uses a multi-step process to ensure high-fidelity conversion:

  1. PDF Analysis: Extracts text, images, and vector graphics from each PDF page
  2. Element Processing: Processes fonts, colors, positioning, and formatting
  3. PowerPoint Generation: Creates custom-sized presentation matching PDF dimensions
  4. Content Reconstruction: Rebuilds all elements as native PowerPoint objects
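
Step 3 depends on a unit conversion: PDF page sizes are measured in points (1/72 inch), while PowerPoint stores lengths in English Metric Units (914,400 per inch), so one point equals 12,700 EMU. The sketch below illustrates that arithmetic only. It is not taken from pdftoppt's source, the function names are hypothetical, and it assumes top-left-origin coordinates such as those PyMuPDF reports.

```python
EMU_PER_INCH = 914_400
POINTS_PER_INCH = 72
EMU_PER_POINT = EMU_PER_INCH // POINTS_PER_INCH  # 12,700


def points_to_emu(points: float) -> int:
    """Convert a length in PDF points to whole EMU."""
    return round(points * EMU_PER_POINT)


def slide_size_for_page(width_pt: float, height_pt: float):
    """Return (slide_width, slide_height) in EMU for a PDF page size."""
    return points_to_emu(width_pt), points_to_emu(height_pt)


def rect_to_shape_box(x0: float, y0: float, x1: float, y1: float):
    """Map a page rectangle in points to (left, top, width, height) in EMU."""
    return (
        points_to_emu(x0),
        points_to_emu(y0),
        points_to_emu(x1 - x0),
        points_to_emu(y1 - y0),
    )
```

With python-pptx, values like these would typically be assigned to a presentation's `slide_width` and `slide_height` and passed as the position and size of each shape.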

Supported Elements

  • ✅ Text with formatting (fonts, sizes, colors, bold, italic)
  • ✅ Images (JPEG, PNG) with transparency support
  • ✅ Vector graphics (rectangles, lines)
  • ✅ Colors and fill patterns
  • ✅ Positioning and layouts
  • ⚠️ Complex vector paths (simplified to basic shapes)
  • ❌ Interactive elements (forms, hyperlinks)
  • ❌ Animations and transitions
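
Text formatting can be recovered from the span metadata that PyMuPDF returns from `page.get_text("dict")`: each span carries a `flags` bit field and a `color` packed as an sRGB integer. The sketch below decodes those two fields. It illustrates the technique rather than pdftoppt's actual code, and `decode_span_style` is a hypothetical name.

```python
ITALIC_BIT = 1 << 1  # PyMuPDF span flag bit for italic text
BOLD_BIT = 1 << 4    # PyMuPDF span flag bit for bold text


def decode_span_style(span: dict) -> dict:
    """Extract font name, size, bold/italic, and RGB color from a span dict."""
    color = span.get("color", 0)  # packed as 0xRRGGBB
    flags = span.get("flags", 0)
    return {
        "font": span.get("font", ""),
        "size": span.get("size", 0.0),
        "bold": bool(flags & BOLD_BIT),
        "italic": bool(flags & ITALIC_BIT),
        "rgb": ((color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF),
    }
```

The resulting values map naturally onto a python-pptx run's font properties (name, size, bold, italic, color).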

Performance

Typical conversion speeds:

  • Simple text documents: ~2-5 pages/second
  • Image-heavy documents: ~0.5-2 pages/second
  • Complex mixed content: ~1-3 pages/second

Memory usage scales with document complexity and image content.
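
These rates translate into rough time estimates. As a worked example, a 120-page text-only PDF at 2-5 pages/second would take roughly 24-60 seconds. The helper below is only that arithmetic; the rates are the approximate figures quoted above, not measurements, and `estimate_seconds` is a hypothetical name.

```python
def estimate_seconds(pages: int, slow_rate: float, fast_rate: float):
    """Return (best_case, worst_case) seconds for a rate range in pages/second."""
    if pages < 0 or slow_rate <= 0 or fast_rate < slow_rate:
        raise ValueError("expected pages >= 0 and 0 < slow_rate <= fast_rate")
    return pages / fast_rate, pages / slow_rate


# 120 pages of simple text at 2-5 pages/second
best, worst = estimate_seconds(120, slow_rate=2, fast_rate=5)
print(f"Estimated time: {best:.0f}-{worst:.0f} s")
```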

Troubleshooting

Common Issues

Import Error for PyMuPDF:

pip install --upgrade PyMuPDF

Memory issues with large PDFs:

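import fitz  # PyMuPDF, used here only to read the page count
from pdftoppt import AdvancedPDFToPowerPointConverter
with fitz.open("large.pdf") as doc:
    total_pages = doc.page_count
# Process in chunks of 10 pages (page numbers are 1-based and inclusive)
with AdvancedPDFToPowerPointConverter() as converter:
    for start in range(1, total_pages + 1, 10):
        end = min(start + 9, total_pages)
        converter.convert("large.pdf", f"output_part_{start}.pptx",
                          page_range=(start, end))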
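
Chunk boundaries are easy to get wrong by one page, so it can help to compute them separately and test them. A minimal sketch, assuming 1-based inclusive page numbers; `page_chunks` is a hypothetical helper, not part of pdftoppt.

```python
from typing import Iterator, Tuple


def page_chunks(total_pages: int, size: int = 10) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) page ranges that cover 1..total_pages inclusive."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # The stop value is total_pages + 1 so the final page is never skipped
    for start in range(1, total_pages + 1, size):
        yield start, min(start + size - 1, total_pages)
```

Each yielded tuple can be passed directly as `page_range` to `convert`.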

Font rendering issues:

  • Ensure system has required fonts installed
  • Check PDF for embedded fonts

Debug Mode

Enable verbose logging to diagnose issues:

pdftoppt input.pdf output.pptx --verbose

API Reference

AdvancedPDFToPowerPointConverter

Methods

__init__()

  • Initializes converter with temporary directory for processing

convert(pdf_path, output_path, page_range=None)

  • Main conversion method
  • Parameters:
    • pdf_path (str): Path to input PDF file
    • output_path (str): Path for output PowerPoint file
    • page_range (tuple, optional): (start_page, end_page) for partial conversion
  • Returns: bool - True if successful
  • Raises: FileNotFoundError, ValueError
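
Because `convert` signals failure in two ways (a `False` return value and the exceptions listed above), callers may want to fold both into a single result. The wrapper below is a sketch: `try_convert` is a hypothetical name, and it works with any object that exposes a compatible `convert` method.

```python
from typing import Optional, Tuple


def try_convert(converter, pdf_path: str, output_path: str,
                page_range: Optional[Tuple[int, int]] = None) -> Tuple[bool, str]:
    """Return (ok, message) instead of mixing a bool result with exceptions."""
    try:
        ok = converter.convert(pdf_path, output_path, page_range=page_range)
    except FileNotFoundError:
        return False, f"PDF not found: {pdf_path}"
    except ValueError as exc:
        return False, f"invalid parameters: {exc}"
    return (True, "converted") if ok else (False, "conversion failed")
```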

Context Manager Support:

with AdvancedPDFToPowerPointConverter() as converter:
    converter.convert("input.pdf", "output.pptx")
# Automatic cleanup

Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

git clone https://github.com/amitpanda007/pdftoppt.git
cd pdftoppt
pip install -e ".[dev]"

Running Tests

pytest tests/

Code Quality

black pdftoppt/
flake8 pdftoppt/
mypy pdftoppt/

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

v1.0.0

  • Initial release
  • High-fidelity PDF to PowerPoint conversion
  • Support for text, images, and vector graphics
  • Command-line interface
  • Python API with context manager support

Made with ❤️ for the Python community
