Convert a PDF file into editable slides in .pptx format
Project Description
pdf2slides is an open-source Python library for converting PDF files into editable slides. A conversion takes three lines of code and works on both machine-generated PDF files with selectable text and scanned PDF files, although support for the latter is currently experimental. The library outputs slides in the .pptx format, so it can be integrated into applications or scripts that automate presentation creation from PDF documents.
Installation
To install pdf2slides, use pip:
pip install pdf2slides
Usage
from pdf2slides import Converter
# Create an instance of Converter
converter = Converter()
# Convert a PDF file to slides
converter.convert('input_file.pdf', 'output_file.pptx')
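To convert several files in one run, the two-argument convert call shown above can be wrapped in a loop. The following is a minimal sketch and not part of pdf2slides: the helper name convert_folder and the directory layout are assumptions, and the converter argument is any object exposing convert(input_path, output_path) as in the example above.

```python
from pathlib import Path


def convert_folder(converter, src_dir, dst_dir):
    """Convert every .pdf in src_dir to a .pptx of the same name in dst_dir.

    `converter` is assumed to expose convert(input_path, output_path),
    like the Converter instance in the usage example (hypothetical helper,
    not a pdf2slides function). Returns the list of requested output paths.
    """
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    outputs = []
    # sorted() keeps the processing order deterministic across platforms.
    for pdf_path in sorted(src.glob("*.pdf")):
        pptx_path = dst / (pdf_path.stem + ".pptx")
        converter.convert(str(pdf_path), str(pptx_path))
        outputs.append(pptx_path)
    return outputs
```

With pdf2slides installed, a call such as convert_folder(Converter(), 'pdfs', 'slides') would request one .pptx per input PDF. Passing the converter in as an argument means a single configured instance is reused for every file.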
Advanced Usage
You can customize the Converter instance with various parameters:
- Specifying a Default Font: Use this parameter to set a font for OCR'ed text. The font must be installed on your system.
from pdf2slides import Converter
converter = Converter(default_font='Arial')
converter.convert('input_file.pdf', 'output_file_with_arial_font.pptx')
- Enabling OCR Mode: Enable OCR for converting scanned PDFs. This feature is experimental and is not recommended for use with PDF files known to have selectable text.
from pdf2slides import Converter
converter = Converter(enable_ocr=True)
converter.convert('scanned_file.pdf', 'output_file.pptx')
- Enforcing Default Font: Ensure the output slides use the default font, even when the input file is not scanned.
from pdf2slides import Converter
converter = Converter(default_font='Arial', enforce_default_font=True)
converter.convert('not_scanned_file.pdf', 'output_file_with_arial_font.pptx')
- Manually Setting Image Retention Level: Adjust the threshold for keeping pure-text images when OCR is enabled.
from pdf2slides import Converter
# Lower the image retention level if non-editable text remains in the output as images.
converter = Converter(enable_ocr=True, image_retention_level=0.3)
converter.convert('scanned_file.pdf', 'output_file.pptx')
- Multilingual Support: Specify the language for OCR processing. The default setting supports English and Chinese. For a list of supported languages, see PaddleOCR Multi-language Model.
from pdf2slides import Converter
converter = Converter(enable_ocr=True, lang='fr')
converter.convert('scanned_file_in_french.pdf', 'output_file.pptx')
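The parameters above can be combined in a single Converter call. As a rough sketch of how a script might expose them on the command line, the helper below maps flags to the keyword arguments documented in this section (default_font, enforce_default_font, enable_ocr, image_retention_level, lang). The function name and the flag names are hypothetical and not part of pdf2slides; the helper only builds a dictionary and does not import the library.

```python
import argparse


def build_converter_kwargs(argv=None):
    """Parse CLI flags; return (input_pdf, output_pptx, converter_kwargs).

    Flag names are illustrative assumptions. The dictionary keys are the
    Converter parameters named in the Advanced Usage section.
    """
    parser = argparse.ArgumentParser(description="Convert a PDF to .pptx")
    parser.add_argument("input_pdf")
    parser.add_argument("output_pptx")
    parser.add_argument("--font", dest="default_font")
    parser.add_argument("--enforce-font", dest="enforce_default_font",
                        action="store_true", default=None)
    parser.add_argument("--ocr", dest="enable_ocr",
                        action="store_true", default=None)
    parser.add_argument("--image-retention", dest="image_retention_level",
                        type=float)
    parser.add_argument("--lang")
    args = vars(parser.parse_args(argv))

    input_pdf = args.pop("input_pdf")
    output_pptx = args.pop("output_pptx")
    # Forward only the options the user set, so the library's own defaults
    # apply to everything else.
    kwargs = {name: value for name, value in args.items() if value is not None}
    return input_pdf, output_pptx, kwargs
```

The result could then be passed on as Converter(**kwargs).convert(input_pdf, output_pptx). Leaving unset options out of the dictionary avoids guessing at the library's default values.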
License
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.
Hashes for pdf2slides-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2bb52778808534eb7e0df3b134e75188b28e4ef1c8cb67f953957326f8462bd0 |
| MD5 | f6fc38329fef23c68d79c987ddf9452f |
| BLAKE2b-256 | fb25b7646ec35ec0dd6c4c3cf2e548073fef8c8139c411af3addee125ecacbb1 |
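A downloaded wheel can be checked against the SHA256 digest listed above before installing it. The snippet below is a minimal sketch using only the Python standard library; the function names are illustrative, and the file path passed in is whatever location the wheel was saved to.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_sha256('pdf2slides-0.1.0-py3-none-any.whl', '<SHA256 digest from the table>') should return True for an intact download.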