
Convert a PDF file into editable slides in .pptx format

Project Description

pdf2slides is an open-source Python library that converts PDF files into editable slides in the .pptx format. A basic conversion takes three lines of code. The library handles both machine-generated PDF files with selectable text and scanned PDF files, although support for the latter is currently experimental. It is intended for use inside applications or scripts that automate the creation of PowerPoint presentations from PDF documents.

Installation

To install pdf2slides, use pip:

pip install pdf2slides

Usage

from pdf2slides import Converter

# Create an instance of Converter
converter = Converter()

# Convert a PDF file to slides
converter.convert('input_file.pdf', 'output_file.pptx')
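
For scripts that process many files, the same call can be wrapped in a loop. The following is a minimal sketch, assuming only the convert(input, output) call shown above. The helper name convert_all and the directory names are hypothetical and not part of pdf2slides. The converter is passed in as an argument, so the loop does not depend on how the Converter instance was configured.

```python
from pathlib import Path


def convert_all(converter, src_dir, out_dir):
    """Convert every .pdf in src_dir to a .pptx of the same name in out_dir.

    `converter` is any object with a convert(input_path, output_path) method,
    such as an instance of pdf2slides.Converter. Returns the list of output
    paths that were passed to the converter.
    """
    src = Path(src_dir)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    outputs = []
    for pdf_path in sorted(src.glob("*.pdf")):
        pptx_path = out / (pdf_path.stem + ".pptx")
        # Paths are passed as strings, matching the usage example above.
        converter.convert(str(pdf_path), str(pptx_path))
        outputs.append(pptx_path)
    return outputs


# Example use (requires pdf2slides; "pdfs" and "slides" are placeholder names):
#   from pdf2slides import Converter
#   convert_all(Converter(), "pdfs", "slides")
```

Reusing one Converter instance for the whole batch is a design choice of this sketch, not a documented requirement of the library.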

Advanced Usage

You can customize the Converter instance with various parameters:

  1. Specifying a Default Font: Use the default_font parameter to set the font for OCR'ed text. The font must be installed on your system.
from pdf2slides import Converter

converter = Converter(default_font='Arial')
converter.convert('input_file.pdf', 'output_file_with_arial_font.pptx')
  2. Enabling OCR Mode: Set enable_ocr=True to convert scanned PDFs. This feature is experimental and is not recommended for PDF files known to have selectable text.
from pdf2slides import Converter

converter = Converter(enable_ocr=True)
converter.convert('scanned_file.pdf', 'output_file.pptx')
  3. Enforcing Default Font: Set enforce_default_font=True to make the output slides use the default font even when the input file is not scanned.
from pdf2slides import Converter

converter = Converter(default_font='Arial', enforce_default_font=True)
converter.convert('not_scanned_file.pdf', 'output_file_with_arial_font.pptx')
  4. Manually Setting Image Retention Level: Use image_retention_level to adjust the threshold for keeping pure-text images when OCR is enabled.
from pdf2slides import Converter

# Lower the image retention level if non-editable text remains in the output as images.
converter = Converter(enable_ocr=True, image_retention_level=0.3)
converter.convert('scanned_file.pdf', 'output_file.pptx')
  5. Multilingual Support: Use the lang parameter to specify the language for OCR processing. The default setting supports English and Chinese. For a list of supported languages, see the PaddleOCR Multi-language Model documentation.
from pdf2slides import Converter

converter = Converter(enable_ocr=True, lang='fr')
converter.convert('scanned_file_in_french.pdf', 'output_file.pptx')
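
When the settings come from a configuration file or command-line flags, it can help to assemble the keyword arguments in one place. The sketch below is illustrative only: the helper converter_options and its argument names are hypothetical, and it assumes the parameters listed above can be combined freely in a single Converter call, which the examples above show only for some pairs. The OCR-related options are emitted only when OCR is enabled, since they are described above as applying to OCR processing.

```python
def converter_options(scanned=False, lang=None, font=None,
                      enforce_font=False, image_retention_level=None):
    """Build keyword arguments for pdf2slides.Converter.

    Only the parameters described above are emitted, and only when a value
    was given, so the library's own defaults stay in effect otherwise.
    """
    options = {}
    if scanned:
        options["enable_ocr"] = True
        # lang and image_retention_level are OCR settings, so they are
        # ignored here unless OCR is enabled.
        if lang is not None:
            options["lang"] = lang
        if image_retention_level is not None:
            options["image_retention_level"] = image_retention_level
    if font is not None:
        options["default_font"] = font
        # Enforcing a font only makes sense when a default font is set.
        if enforce_font:
            options["enforce_default_font"] = True
    return options


# Example use (requires pdf2slides to be installed):
#   from pdf2slides import Converter
#   converter = Converter(**converter_options(scanned=True, lang="fr"))
#   converter.convert("scanned_file_in_french.pdf", "output_file.pptx")
```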

License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.
