Enterprise PDF SDK — render, extract, annotate, sign, and validate PDFs. Pure Rust, zero system dependencies.

Project description

pdfluent

Enterprise PDF SDK for Python — built on a pure-Rust stack, zero system dependencies.

Render pages, extract text, fill forms, annotate, redact, encrypt, merge, and validate PDF/A — all from a single pip install.

Installation

pip install pdfluent

# Optional extras (quoted so that shells such as zsh do not expand the brackets)
pip install "pdfluent[pillow]"   # PIL Image support
pip install "pdfluent[numpy]"    # NumPy array support

Requires Python ≥ 3.8. Pre-built wheels are available for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows (x86_64).
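
To check ahead of time whether a machine falls inside the wheel matrix listed above, a small sketch along these lines may help. The helper name `has_prebuilt_wheel` and the lookup table are illustrative and not part of pdfluent. The table only transcribes the platform list above onto the values reported by Python's `platform` module, where macOS reports `Darwin` and 64-bit Windows reports `AMD64`. The set of wheels actually published may differ from release to release.

```python
import platform

# Transcribed from the platform list above; keys are platform.system() values.
# Assumption: this mirrors the documented wheel matrix, not any specific release.
PREBUILT_WHEELS = {
    "Linux": {"x86_64", "aarch64"},
    "Darwin": {"x86_64", "arm64"},    # macOS
    "Windows": {"amd64", "x86_64"},   # 64-bit Windows reports "AMD64"
}


def has_prebuilt_wheel(system=None, machine=None):
    """Return True if (system, machine) appears in the documented wheel matrix.

    Both arguments default to the values of the running interpreter.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return machine.lower() in PREBUILT_WHEELS.get(system, set())
```

Calling `has_prebuilt_wheel()` with no arguments checks the current machine. If it returns False, the Building from Source section is probably the relevant path.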

Quick Start

from pdfluent import Document

# Open, inspect, render
with Document("invoice.pdf") as doc:
    print(f"{doc.page_count} pages — {doc.metadata.title}")

    img = doc[0].render(dpi=150)
    img.save("page_0.png")          # requires Pillow

# Extract text
doc = Document("report.pdf")
for page in doc:
    print(page.extract_text())

# Fill a form field and save
doc = Document("form.pdf")
doc.set_form_field("Name", "Jane Doe")
doc.save("form_filled.pdf")

# Search-and-redact
doc = Document("contract.pdf")
report = doc.redact_text("Confidential")
print(f"Redacted {report.areas_redacted} areas on {report.pages_affected} pages")
doc.save("contract_redacted.pdf")

# PDF/A validation
from pdfluent import validate_pdfa

report = validate_pdfa("archive.pdf")
if report.is_compliant:
    print(f"✓ {report.pdfa_level} compliant")
else:
    for issue in report.issues:
        print(f"[{issue.severity}] {issue.rule}: {issue.message}")

# Merge PDFs
from pdfluent import merge_pdfs
merge_pdfs(["a.pdf", "b.pdf", "c.pdf"], "merged.pdf")

# Encrypt / decrypt
doc = Document("sensitive.pdf")
doc.encrypt("sensitive_enc.pdf", password="s3cr3t")

from pdfluent import decrypt_pdf
decrypt_pdf("sensitive_enc.pdf", "sensitive_dec.pdf", password="s3cr3t")

Features

Feature            Description
Render             Pages to RGBA pixels, PIL Images, or NumPy arrays at any DPI
Text extraction    Plain text or structured TextBlock/TextSpan with position
Text search        Find pages containing a query string
Forms (AcroForm)   Read and fill text, checkbox, and dropdown fields
Annotations        Read existing annotations; add highlights and free-text notes
Redaction          Search-and-redact: black-box all occurrences of a string
Encryption         AES-256 (PDF 2.0) encrypt/decrypt with user + owner passwords
Merge / split      Merge multiple PDFs; split into individual pages (via page slicing)
PDF/A validation   Validate against PDF/A-1B, 2B, 3B with issue-level reporting
Metadata           Read title, author, subject, keywords, creator, producer
Bookmarks          Traverse the document outline tree
Thumbnails         Fast downscaled preview images

API Overview

Document(source, password=None)

Opens a PDF from a file path (str) or raw bytes.

from pathlib import Path

doc = Document("file.pdf")                      # from path
doc = Document(Path("file.pdf").read_bytes())   # from bytes; the file is closed after reading
doc = Document("encrypted.pdf", password="pw")  # password-protected file

Properties: page_count, metadata, bookmarks

Methods: render_all(dpi), search(query), extract_text(page_num), save(path), get_form_fields(), set_form_field(name, value), get_annotations(page), add_annotation(page, type, rect, content), redact_text(term, page=None), encrypt(path, password), decrypt(path, password)

Protocols: len(doc), doc[0], for page in doc, with Document(...) as doc
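
As a sketch of how the form methods listed above might be combined, the helper below sets several fields and saves once. `fill_form` is a hypothetical name, not part of pdfluent. It relies only on `set_form_field(name, value)` and `save(path)` from the method list, so any object with those two methods works. Which value types are accepted for checkbox or dropdown fields is not stated here, so string values are an assumption.

```python
def fill_form(doc, values, output_path):
    """Set each field in `values` on `doc`, then save to `output_path`.

    `doc` is expected to expose set_form_field(name, value) and save(path),
    as listed for Document above. Returns the field names in the order set.
    """
    filled = []
    for name, value in values.items():
        doc.set_form_field(name, value)
        filled.append(name)
    # Save once, after every field has been set.
    doc.save(output_path)
    return filled


# Intended use with the SDK (not executed here):
#
#   from pdfluent import Document
#   with Document("form.pdf") as doc:
#       fill_form(doc, {"Name": "Jane Doe"}, "form_filled.pdf")
```

Taking the document as a parameter keeps the helper easy to test with a stand-in object.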

Page

Properties: index, width, height, rotation, geometry

Methods: render(dpi, width, height, background), thumbnail(max_dimension), extract_text(), extract_text_blocks()
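
The relationship between page size and the `dpi` argument can be worked out by hand. The default PDF user-space unit is 1/72 inch, so a length in points scales by `dpi / 72`. The sketch below assumes that `Page.width` and `Page.height` are expressed in points and that the renderer rounds to the nearest pixel. Neither is confirmed here, and `pixel_size` is an illustrative helper rather than an SDK function.

```python
POINTS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch


def pixel_size(width_pt, height_pt, dpi):
    """Convert a page size in PDF points to a pixel size at the given DPI.

    Assumption: rounding to the nearest pixel; a real renderer may round
    differently (for example, always up).
    """
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


# US Letter is 612 x 792 points (8.5 x 11 inches).
letter_at_150 = pixel_size(612, 792, 150)  # (1275, 1650)
```

At 72 DPI the pixel size equals the size in points, which gives a quick sanity check.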

RenderedImage

Properties: width, height, pixels (raw RGBA bytes)

Methods: to_pil(), to_numpy(), save(path)
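
For callers who want to read `pixels` directly without Pillow or NumPy, the following sketch shows one way to index a raw RGBA buffer. It assumes the bytes are tightly packed, four bytes per pixel, in row-major order starting at the top-left corner. That layout is common but not confirmed here, and `pixel_at` is a hypothetical helper.

```python
BYTES_PER_PIXEL = 4  # one byte each for R, G, B, A


def pixel_at(pixels, width, height, x, y):
    """Return the (r, g, b, a) tuple at column x, row y of an RGBA buffer.

    Assumption: row-major, tightly packed rows with no padding.
    """
    if len(pixels) != width * height * BYTES_PER_PIXEL:
        raise ValueError("buffer length does not match width * height * 4")
    if not (0 <= x < width and 0 <= y < height):
        raise IndexError("pixel coordinates out of range")
    offset = (y * width + x) * BYTES_PER_PIXEL
    r, g, b, a = pixels[offset:offset + BYTES_PER_PIXEL]
    return r, g, b, a


# A synthetic 2 x 2 image: red, green / blue, transparent white.
sample = bytes([255, 0, 0, 255,   0, 255, 0, 255,
                0, 0, 255, 255,   255, 255, 255, 0])
top_right = pixel_at(sample, 2, 2, 1, 0)  # (0, 255, 0, 255)
```

With a real image, the call would look like `pixel_at(img.pixels, img.width, img.height, x, y)`.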

Module-level functions

Function                                 Description
open_pdf(path, password=None)            Alias for Document(path, password)
merge_pdfs(paths, output)                Merge a list of PDFs
validate_pdfa(path) → ComplianceReport   Run PDF/A validation
decrypt_pdf(input, output, password)     Decrypt to a new file
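
When many files need checking, the validation function can be wrapped in a small batch helper. The sketch below is illustrative: `summarize_compliance` is a hypothetical name, and the validator is passed in as a parameter so the helper does not import the SDK itself. It uses only the report attributes shown in the Quick Start (`is_compliant`, `issues`, and each issue's `severity`, `rule`, and `message`).

```python
def summarize_compliance(paths, validate):
    """Run `validate` on each path and split the paths by compliance.

    `validate` is any callable that returns a report with `is_compliant`
    and `issues`, such as pdfluent.validate_pdfa.
    Returns (compliant_paths, {failing_path: [formatted issue, ...]}).
    """
    compliant = []
    failing = {}
    for path in paths:
        report = validate(path)
        if report.is_compliant:
            compliant.append(path)
        else:
            failing[path] = [
                f"[{issue.severity}] {issue.rule}: {issue.message}"
                for issue in report.issues
            ]
    return compliant, failing


# Intended use with the SDK (not executed here):
#
#   from pdfluent import validate_pdfa
#   ok, bad = summarize_compliance(["a.pdf", "b.pdf"], validate_pdfa)
```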

Comparison

pdfluent pypdf pdfminer pdfplumber pikepdf
Rendering ✓ (via pdfminer)
Text extraction
Form fill
Redaction
Encryption ✓ (AES-256)
PDF/A validation
Native deps none none none none libqpdf
Language Rust Python Python Python C++

License Activation

The SDK runs in Trial mode by default; output is marked via /Producer metadata. Activate a license to unlock the paid-tier capability set.

import pdfluent

# Activate from a key string
pdfluent.activate_license_key("tier:enterprise")

# Or read the key from a UTF-8 text file
pdfluent.activate_license_file("/path/to/key.lic")

# Inspect the current status (always succeeds; defaults to Trial)
status = pdfluent.license_status()
print(status.tier)              # "Enterprise"
print(status.source)            # "Explicit" | "EnvVar" | "Default"
print(status.output_is_marked)  # False

The PDFLUENT_LICENSE_KEY environment variable is honoured automatically on process start when no explicit activation has happened.

Behavior to be aware of:

  • The active tier is process-global and set-once. Re-activating with the same key is a no-op. Re-activating with a different tier raises RuntimeError; restart Python to switch tiers.
  • Invalid keys raise ValueError; missing license files raise OSError.
  • The key string is never logged or stored beyond the call to activate_license_key.
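
The set-once rule in the first bullet can be pictured with a toy model. The class below is not the SDK's implementation; it is a minimal sketch that compares tier names only, assuming that "the same key" and "the same tier" behave alike. `SetOnceTier` is a hypothetical name.

```python
class SetOnceTier:
    """Toy model of a process-global, set-once license tier."""

    def __init__(self):
        self._tier = None  # None stands for "nothing activated yet"

    def activate(self, tier):
        """Record the tier on first use; reject a different tier afterwards."""
        tier = tier.lower()
        if self._tier is None:
            self._tier = tier
        elif self._tier != tier:
            raise RuntimeError(
                f"tier already set to {self._tier!r}; restart to switch"
            )
        # Activating the same tier again falls through as a no-op.
        return self._tier


registry = SetOnceTier()
registry.activate("enterprise")
registry.activate("enterprise")  # no-op, no error
```

In application code, this suggests activating once at startup and treating RuntimeError from a later activation as a configuration bug rather than something to retry.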

The 1.0 release accepts the simple evaluation format tier:<name> (trial/developer/team/business/enterprise). Cryptographically signed payloads will be accepted by the same functions in 1.1 without breaking the API.
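
To catch a malformed key before handing it to the SDK, the evaluation format can be checked with a few lines of plain Python. This is a sketch under stated assumptions: `parse_evaluation_key` is a hypothetical helper, the tier names are copied from the list above, and matching is case-sensitive because the SDK's own case handling is not documented here. It mirrors the documented behavior of raising ValueError for invalid keys.

```python
# Tier names as listed for the 1.0 evaluation format.
KNOWN_TIERS = ("trial", "developer", "team", "business", "enterprise")


def parse_evaluation_key(key):
    """Return the tier name from a 'tier:<name>' key, or raise ValueError.

    Assumption: matching is case-sensitive and surrounding whitespace
    is ignored.
    """
    prefix, sep, name = key.strip().partition(":")
    if sep != ":" or prefix != "tier":
        raise ValueError("expected a key of the form 'tier:<name>'")
    if name not in KNOWN_TIERS:
        raise ValueError(f"unknown tier {name!r}")
    return name


tier = parse_evaluation_key("tier:enterprise")  # "enterprise"
```

A pre-check like this would need revisiting once signed payloads are accepted, since those will not follow the `tier:<name>` shape.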

Building from Source

Requires a Rust toolchain and maturin.

pip install maturin
cd crates/pdf-python
maturin develop --release          # install in current venv
maturin build --release            # build wheel in ./dist/

License

PDFluent Commercial License. See LICENSE.

Source Distributions

No source distribution files available for this release.

Built Distribution

pdfluent-1.0.0b7-cp311-cp311-macosx_10_12_x86_64.whl (5.7 MB)

Uploaded for CPython 3.11, macOS 10.12+, x86-64

File hashes

Hashes for pdfluent-1.0.0b7-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 0fc713ac2dd1cb8536fc63157b6c8de5f21703fd9988e1eb7b3f1e45101d7c69
MD5 d4cf139991e74951e32d6150ccfa6599
BLAKE2b-256 7db78d6a0c23634b397e489f3f42f746121edd8c0db8b170bbf764a5165cf313
