Convert a PDF file into editable slides in .pptx format
Project description
pdf2slides is an open-source Python library that converts PDF files into editable slides in three lines of code. It supports both machine-generated PDF files with selectable text and scanned PDF files, although support for scanned files is currently experimental. The library writes slides in the .pptx format, so it can be used to automate presentation creation from various types of PDF content. It is intended to be integrated into applications or scripts that need to turn PDF documents into PowerPoint presentations.
Installation
To install pdf2slides, use pip:
pip install pdf2slides
Usage
from pdf2slides import Converter
# Create an instance of Converter
converter = Converter()
# Convert a PDF file to slides
converter.convert('input_file.pdf', 'output_file.pptx')
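The single-file call above extends naturally to a whole folder. The following is a minimal sketch, not part of pdf2slides: the helper names plan_outputs, convert_folder and main, and the folder names "pdfs" and "slides", are hypothetical. It relies only on the documented Converter().convert(input, output) call and takes the converter as an argument so the helpers can be exercised without the library installed.

```python
from pathlib import Path


def plan_outputs(input_dir, output_dir):
    """Pair every *.pdf file in input_dir with a .pptx path in output_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    return [
        (pdf, output_dir / (pdf.stem + ".pptx"))
        for pdf in sorted(input_dir.glob("*.pdf"))
    ]


def convert_folder(converter, input_dir, output_dir):
    """Call converter.convert(input, output) per PDF and collect failures."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    failures = []
    for pdf, pptx in plan_outputs(input_dir, output_dir):
        try:
            converter.convert(str(pdf), str(pptx))
        except Exception as exc:
            # Assumption: the library's exception types are not documented
            # here, so any error is recorded and the loop continues.
            failures.append((pdf, exc))
    return failures


def main():
    # Imported lazily so the helpers above work without pdf2slides installed.
    from pdf2slides import Converter

    for pdf, exc in convert_folder(Converter(), "pdfs", "slides"):
        print(f"failed: {pdf}: {exc}")
```

Calling main() converts every PDF in the hypothetical pdfs folder and prints the files that could not be converted instead of stopping at the first error.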
Advanced Usage
You can customize the Converter instance with various parameters:
- Specifying a Default Font: Use this parameter to set the font for text recognized by OCR. The font must be installed on your system.
from pdf2slides import Converter
converter = Converter(default_font='Arial')
converter.convert('input_file.pdf', 'output_file_with_arial_font.pptx')
- Enabling OCR Mode: Enable OCR for converting scanned PDFs. This feature is experimental and is not recommended for use with PDF files known to have selectable text.
from pdf2slides import Converter
converter = Converter(enable_ocr=True)
converter.convert('scanned_file.pdf', 'output_file.pptx')
- Enforcing Default Font: Ensure the output slides use the default font, even when the input file is not scanned.
from pdf2slides import Converter
converter = Converter(default_font='Arial', enforce_default_font=True)
converter.convert('not_scanned_file.pdf', 'output_file_with_arial_font.pptx')
- Manually Setting Image Retention Level: Adjust the threshold for keeping pure-text images when OCR is enabled.
from pdf2slides import Converter
# Lower the image retention level if non-editable text remains in the output as images.
converter = Converter(enable_ocr=True, image_retention_level=0.3)
converter.convert('scanned_file.pdf', 'output_file.pptx')
- Multilingual Support: Specify the language for OCR processing. The default setting supports English and Chinese. For a list of supported languages, see PaddleOCR Multi-language Model.
from pdf2slides import Converter
converter = Converter(enable_ocr=True, lang='fr')
converter.convert('scanned_file_in_french.pdf', 'output_file.pptx')
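The parameters above can be combined. As a rough sketch, the hypothetical helper converter_options below (not part of pdf2slides) assembles keyword arguments for Converter from the options listed in this section: default_font, enable_ocr, enforce_default_font, image_retention_level and lang. It assumes, based on the descriptions above, that the OCR-related options only matter when OCR is enabled and that a font for non-scanned input requires enforce_default_font.

```python
def converter_options(scanned, font=None, lang=None, image_retention_level=None):
    """Build keyword arguments for Converter from the documented parameters."""
    options = {}
    if scanned:
        # Scanned input: turn on the experimental OCR mode.
        options["enable_ocr"] = True
        if lang is not None:
            options["lang"] = lang
        if image_retention_level is not None:
            options["image_retention_level"] = image_retention_level
        if font is not None:
            options["default_font"] = font
    elif font is not None:
        # Non-scanned input: per the list above, enforce_default_font is
        # what applies the default font to text that did not come from OCR.
        options["default_font"] = font
        options["enforce_default_font"] = True
    return options


def main():
    # Imported lazily so the helper above works without pdf2slides installed.
    from pdf2slides import Converter

    converter = Converter(**converter_options(scanned=True, lang="fr"))
    converter.convert("scanned_file_in_french.pdf", "output_file.pptx")
```

Keeping the option logic in a plain function makes it easy to check which arguments will be passed before running a slow OCR conversion.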
License
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.
File details
Details for the file pdf2slides-0.1.0.tar.gz.
File metadata
- Download URL: pdf2slides-0.1.0.tar.gz
- Upload date:
- Size: 22.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6308ef8017076f73b22028b2da9a820853cf84601d750d1f3b4e843407a0ae0c
MD5 | 23da720fbb5c807823d616e3337c794a
BLAKE2b-256 | ff988d782a374f8b236f2b89bfece8afbacf5ceb4b4ac1b3c28b32e5527667a9
File details
Details for the file pdf2slides-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pdf2slides-0.1.0-py3-none-any.whl
- Upload date:
- Size: 21.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2bb52778808534eb7e0df3b134e75188b28e4ef1c8cb67f953957326f8462bd0
MD5 | f6fc38329fef23c68d79c987ddf9452f
BLAKE2b-256 | fb25b7646ec35ec0dd6c4c3cf2e548073fef8c8139c411af3addee125ecacbb1
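
A downloaded file can be checked against the digests listed above. The sketch below uses only Python's standard hashlib module; the helper names file_digests and matches are hypothetical, and it assumes that the BLAKE2b-256 row corresponds to BLAKE2b with a 32-byte digest.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Return hex digests of a file for the three algorithms listed above."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # Assumption: "BLAKE2b-256" means BLAKE2b with a 32-byte digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches(path, algorithm, expected_hex):
    """Compare one digest of the file with an expected hex string."""
    return file_digests(path)[algorithm] == expected_hex.strip().lower()
```

For example, matches("pdf2slides-0.1.0.tar.gz", "SHA256", expected) would be called with the SHA256 value copied from the table for that file.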