Comprehensive PDF manipulation and conversion toolkit designed for RPA (Robotic Process Automation) workflows.

PDF documents operations library

RPA Automation | Robocorp Compatible | Python 3.10+ | Coverage 85% | Security: Bandit | Ruff | CI Testing

rpa-pdf is a Python module that provides a comprehensive set of actions for manipulating PDF files and converting other file formats to PDF. It is designed to integrate easily into Robotic Process Automation (RPA) workflows.

The module is natively compatible with Robocorp and provides streamlined abstractions over complex PDF tasks.

Business Capabilities

rpa-pdf accelerates your automation projects by providing out-of-the-box features for:

Universal Document Conversion: Convert Office documents (Word, Excel, PowerPoint), HTML, Text, Emails (.msg, .eml), and Images into standard PDF formats without manual intervention.
Document Manipulation: Merge, compress, split, and rotate PDF files.
Content Extraction: Extract raw text or embedded images directly from PDF pages.
Compliance & Stamping: Automatically stamp PDFs with text (watermarks) or Code39 barcodes (useful for invoice tracking, archiving, and indexing).
Automated Printing: Send PDFs directly to physical or virtual printers without UI prompts.

Architecture & Structure

The package is built with a modular architecture leveraging robust Python libraries (pypdf, fpdf2, Pillow, comtypes, xhtml2pdf, extract-msg) abstracted behind a simple, RPA-friendly API.

rpa_pdf.Pdf: The core facade exposing primary manipulation methods (merge, compress, print) and text-to-PDF capabilities.
rpa_pdf.Stamp: Utilities focused entirely on adding text and barcode overlays onto existing documents.
rpa_pdf.Converter: A universal, registry-based format converter that dispatches each file to the appropriate engine based on its file extension.
rpa_pdf.TextExtractor & rpa_pdf.ImageExtractor: Extract embedded content from PDFs.
rpa_pdf.PdfSplitter & rpa_pdf.PdfRotator: Handle page-level PDF mutations.
Bundled Executables: Bundles necessary fonts and lightweight viewers (like SumatraPDF) to guarantee predictable cross-environment execution, especially for silent printing operations.
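
The registry-based dispatch described for rpa_pdf.Converter can be illustrated with a small, generic sketch. This is not the package's actual implementation: the names register, resolve_engine, word_engine, and image_engine are hypothetical, and the engines are placeholders that only return the target path.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical engine signature: (source_path, target_path) -> target_path.
Engine = Callable[[str, str], str]

# Maps a lower-case file extension (including the dot) to an engine.
_REGISTRY: Dict[str, Engine] = {}


def register(*extensions: str) -> Callable[[Engine], Engine]:
    """Decorator that registers one engine for one or more extensions."""
    def decorator(engine: Engine) -> Engine:
        for ext in extensions:
            _REGISTRY[ext.lower()] = engine
        return engine
    return decorator


@register(".docx", ".doc")
def word_engine(source: str, target: str) -> str:
    # Placeholder: a real engine would perform the conversion here.
    return target


@register(".png", ".jpg")
def image_engine(source: str, target: str) -> str:
    # Placeholder: a real engine would perform the conversion here.
    return target


def resolve_engine(source: str) -> Engine:
    """Pick the engine registered for the source file's extension."""
    ext = Path(source).suffix.lower()
    try:
        return _REGISTRY[ext]
    except KeyError:
        raise ValueError(f"No engine registered for {ext!r}") from None
```

With this pattern, supporting a new format means registering one more function, and the lookup code stays unchanged.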

Installation

We recommend using a modern package manager like uv or standard pip.

# Using pip
pip install rpa-pdf

# Using uv
uv pip install rpa-pdf
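
After installation, a robot can check which version is present before it runs. The sketch below uses only the standard library; the helper name installed_version is an assumption made for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "rpa-pdf") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if installed_version() is None:
    print("rpa-pdf is not installed in this environment")
```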

Examples

1. Universal File Conversions (Word/Excel/PowerPoint/HTML/Email/Images)

The simplest way to convert any supported format to PDF is the universal Converter class. Note: Office conversion requires Microsoft Office to be installed on the machine.

from rpa_pdf import Converter

converter = Converter()

# Converts Word documents
converter.convert('c:/temp/document.docx', 'c:/temp/document.pdf')

# Converts Excel spreadsheets
converter.convert('c:/temp/data.xlsx') # Defaults to c:/temp/data.xlsx.pdf

# Converts Emails (.eml, .msg)
converter.convert('c:/temp/email.msg', 'c:/temp/email.pdf')

# Converts HTML & Images
converter.convert('c:/temp/index.html', 'c:/temp/website.pdf')
converter.convert('c:/temp/scan.png', 'c:/temp/scan.pdf')
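
Because the default output name keeps the original extension (data.xlsx.pdf in the example above), it can be convenient to compute an explicit target path instead. The following is a minimal sketch using the standard library; pdf_path_for is a hypothetical helper, not part of rpa-pdf.

```python
from pathlib import PureWindowsPath


def pdf_path_for(source: str) -> str:
    """Return the source path with its final extension replaced by .pdf."""
    return PureWindowsPath(source).with_suffix(".pdf").as_posix()


# Intended use with the two-argument form shown above:
#   converter.convert(source, pdf_path_for(source))
print(pdf_path_for("c:/temp/data.xlsx"))
```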

2. Merging PDF Files

Combine multiple invoice documents into one package.

from rpa_pdf import Pdf

pdf = Pdf()
pdf.merge(['c:/temp/invoice_1.pdf', 'c:/temp/invoice_2.pdf'], 'c:/temp/merged_invoices.pdf')
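
When the input list is built from a folder rather than typed by hand, the file order matters for the merged result. The sketch below gathers PDF paths in numeric-aware order so that invoice_2.pdf comes before invoice_10.pdf; collect_pdfs and natural_key are hypothetical helper names, and the sketch assumes that merge keeps the order of the list it receives.

```python
import re
from pathlib import Path
from typing import List


def natural_key(path: Path) -> list:
    """Split a file name into text and integer chunks for numeric-aware sorting."""
    chunks = re.split(r"(\d+)", path.name)
    # re.split with a capture group alternates: text, digits, text, digits, ...
    return [int(c) if i % 2 else c.lower() for i, c in enumerate(chunks)]


def collect_pdfs(folder: str) -> List[str]:
    """Return all *.pdf paths in a folder, sorted in natural order."""
    return [p.as_posix() for p in sorted(Path(folder).glob("*.pdf"), key=natural_key)]


# Intended use with the merge call shown above:
#   pdf.merge(collect_pdfs('c:/temp/invoices'), 'c:/temp/merged_invoices.pdf')
```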

3. Adding Text and Barcode Stamps

Useful for adding tracking codes or "CONFIDENTIAL" watermarks to documents before they are printed or archived.

from rpa_pdf import Stamp

stamp = Stamp()

# Add a text watermark to the center of the first page
stamp.add_text_stamp(
    input_pdf_file_path='c:/temp/input.pdf',
    output_pdf_file_path='c:/temp/watermarked.pdf',
    text='CONFIDENTIAL',
    font_size=40,
    text_horizontal_position='center',
    text_vertical_position='center'
)

# Add a Code39 tracking barcode
stamp.add_code39_stamp(
    input_pdf_file_path='c:/temp/input.pdf',
    output_pdf_file_path='c:/temp/barcoded.pdf',
    code='12345678',
    vertical_position='top',
    horizontal_position='right'
)
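
Standard Code 39 can only encode digits, uppercase letters, the space, and the symbols - . $ / + %, so it may be worth checking a tracking code before stamping it. How add_code39_stamp handles unsupported characters is not documented here; the helper below, is_valid_code39, is a hypothetical pre-check written for illustration.

```python
# The 43 data characters of standard Code 39 (the * start/stop symbol is excluded).
CODE39_CHARSET = frozenset("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%")


def is_valid_code39(code: str) -> bool:
    """Return True if the code is non-empty and uses only Code 39 data characters."""
    return bool(code) and all(ch in CODE39_CHARSET for ch in code)


for candidate in ("12345678", "INV-2024/001", "invoice_1"):
    print(candidate, is_valid_code39(candidate))
```

Lowercase letters and underscores fail the check, so a robot could uppercase or reject such codes before calling the stamp method.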

4. Splitting & Rotating PDFs

Extract individual pages from a large batch document or fix the orientation of scanned pages.

from rpa_pdf import PdfSplitter, PdfRotator

# Split all pages into individual files
splitter = PdfSplitter()
splitter.split('c:/temp/batch_invoices.pdf', 'c:/temp/output_folder')

# Rotate the first page 90 degrees
rotator = PdfRotator()
rotator.rotate('c:/temp/scanned_doc.pdf', angle=90, pages='first')
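
PDF page rotation is expressed in multiples of 90 degrees, and pypdf, one of the libraries listed above, rejects other values. Whether PdfRotator validates the angle itself is not stated, so the sketch below normalizes it up front; normalize_angle is a hypothetical helper.

```python
def normalize_angle(angle: int) -> int:
    """Map an angle to 0, 90, 180, or 270, rejecting values that are not multiples of 90."""
    if angle % 90 != 0:
        raise ValueError(f"Rotation must be a multiple of 90 degrees, got {angle}")
    return angle % 360


print(normalize_angle(-90))
print(normalize_angle(450))
```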

5. Content Extraction

Pull text and embedded images out of a PDF.

from rpa_pdf import TextExtractor, ImageExtractor

text_extractor = TextExtractor()
text = text_extractor.extract_text('c:/temp/document.pdf', pages=[0, 1])

img_extractor = ImageExtractor()
img_extractor.extract_images('c:/temp/document.pdf', 'c:/temp/images')
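
The pages=[0, 1] argument above suggests zero-based page indexes, although that is an assumption rather than something the description confirms. Under that assumption, a human-friendly range such as "1-3, 5" can be converted with a small helper; parse_pages is hypothetical and not part of rpa-pdf.

```python
from typing import List


def parse_pages(spec: str) -> List[int]:
    """Convert a 1-based page spec like "1-3, 5" into sorted zero-based indexes."""
    indexes = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, sep, last = part.partition("-")
        start = int(first)
        end = int(last) if sep else start
        if start < 1 or end < start:
            raise ValueError(f"Invalid page range: {part!r}")
        # Page N (1-based) is index N - 1, so the range start..end maps to start-1..end-1.
        indexes.update(range(start - 1, end))
    return sorted(indexes)


# Intended use with the extraction call shown above:
#   text_extractor.extract_text('c:/temp/document.pdf', pages=parse_pages("1-2"))
```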

6. Generate PDF from Raw Text

Write a string directly to a new PDF document.

from rpa_pdf import Pdf

pdf = Pdf()
pdf.text_to_pdf(text="Automated report execution successful.", output_file_path="c:/temp/report.pdf")

7. Compressing a PDF File

Reduce the file size of a PDF before sending it via email.

from rpa_pdf import Pdf

pdf = Pdf()
pdf.compress('c:/temp/large_file.pdf')
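
To confirm that compression was worthwhile, the file sizes before and after can be compared. The description does not say whether compress overwrites the input or writes a separate file, so this sketch assumes the robot keeps a copy of the original to compare against; reduction_percent is a hypothetical helper.

```python
import os


def reduction_percent(original_path: str, compressed_path: str) -> float:
    """Return how much smaller the compressed file is, as a percentage of the original."""
    before = os.path.getsize(original_path)
    after = os.path.getsize(compressed_path)
    if before == 0:
        return 0.0
    return round(100.0 * (before - after) / before, 1)
```

A negative result means the output grew, which can happen with files that are already well compressed.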

8. Print a PDF Silently

Print directly to a named printer using the bundled SumatraPDF engine.

from rpa_pdf import Pdf

pdf = Pdf()
pdf.print('c:/temp/document.pdf', 'printer_name')
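
For context, SumatraPDF supports silent printing from the command line through its -print-to and -silent options. The sketch below only builds such a command; it is not taken from rpa-pdf, the exact invocation the package uses may differ, and both the helper name and the executable path are assumptions.

```python
from typing import List


def sumatra_print_command(exe_path: str, pdf_path: str, printer: str) -> List[str]:
    """Build a SumatraPDF command line that prints a file to a named printer without dialogs."""
    return [exe_path, "-print-to", printer, "-silent", pdf_path]


# Hypothetical executable location; pass the list to subprocess.run to execute it.
print(sumatra_print_command("SumatraPDF.exe", "c:/temp/document.pdf", "printer_name"))
```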

Development & Testing

This project uses uv for dependency management and ruff for linting. Tests are run via pytest.

# Install dependencies including development extras
uv sync --all-extras

# Run tests
pytest

Release history

2.0.3 (Jun 8, 2026), this version
1.1.16 (Apr 15, 2025)
1.1.15 (Jan 25, 2024)
1.1.14 (Jan 25, 2024)
1.1.13 (Aug 14, 2023)
1.1.12 (Aug 14, 2023)
1.1.11 (Aug 14, 2023)
1.1.10 (Aug 8, 2023)
1.1.9 (Aug 3, 2023)
1.1.8 (Jul 26, 2023)
1.1.7 (Jul 26, 2023)
1.1.6 (Jul 25, 2023)
1.1.5 (Jul 25, 2023)
1.1.4 (Jul 25, 2023)
1.1.3 (Jul 24, 2023)
1.1.2 (Jul 24, 2023)
1.1.1 (Jul 24, 2023)
1.1.0 (Jul 19, 2023)
1.0.2 (Nov 16, 2022)
1.0.1 (Oct 19, 2022), yanked. Reason: buggy print function
1.0.0 (Oct 19, 2022), yanked

