Enterprise PDF SDK — render, extract, annotate, sign, and validate PDFs. Pure Rust, zero system dependencies.
pdfluent
Enterprise PDF SDK for Python — built on a pure-Rust stack, zero system dependencies.
Render pages, extract text, fill forms, annotate, redact, encrypt, merge, and validate PDF/A — all from a single pip install.
Installation
pip install pdfluent
# Optional extras
pip install pdfluent[pillow] # PIL Image support
pip install pdfluent[numpy] # NumPy array support
Requires Python ≥ 3.8. Pre-built wheels for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows (x86_64).
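Because Pillow and NumPy support are optional extras, a script may want to check which ones are present before calling helpers that depend on them. The sketch below uses only the standard library; `has_module`, `installed_extras`, and `EXTRA_MODULES` are illustrative names and not part of pdfluent.

```python
import importlib.util

# Extra name (as used in `pip install pdfluent[...]`) -> module it provides.
EXTRA_MODULES = {"pillow": "PIL", "numpy": "numpy"}


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def installed_extras() -> dict:
    """Map each optional extra to whether its backing module is installed."""
    return {extra: has_module(module) for extra, module in EXTRA_MODULES.items()}


if __name__ == "__main__":
    for extra, present in installed_extras().items():
        print(f"{extra}: {'installed' if present else 'missing'}")
```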
Quick Start
from pdfluent import Document
# Open, inspect, render
with Document("invoice.pdf") as doc:
    print(f"{doc.page_count} pages — {doc.metadata.title}")
    img = doc[0].render(dpi=150)
    img.save("page_0.png")  # requires Pillow
# Extract text
doc = Document("report.pdf")
for page in doc:
    print(page.extract_text())
# Fill a form field and save
doc = Document("form.pdf")
doc.set_form_field("Name", "Jane Doe")
doc.save("form_filled.pdf")
# Search-and-redact
doc = Document("contract.pdf")
report = doc.redact_text("Confidential")
print(f"Redacted {report.areas_redacted} areas on {report.pages_affected} pages")
doc.save("contract_redacted.pdf")
# PDF/A validation
from pdfluent import validate_pdfa
report = validate_pdfa("archive.pdf")
if report.is_compliant:
    print(f"✓ {report.pdfa_level} compliant")
else:
    for issue in report.issues:
        print(f"[{issue.severity}] {issue.rule}: {issue.message}")
# Merge PDFs
from pdfluent import merge_pdfs
merge_pdfs(["a.pdf", "b.pdf", "c.pdf"], "merged.pdf")
# Encrypt / decrypt
doc = Document("sensitive.pdf")
doc.encrypt("sensitive_enc.pdf", password="s3cr3t")
from pdfluent import decrypt_pdf
decrypt_pdf("sensitive_enc.pdf", "sensitive_dec.pdf", password="s3cr3t")
Features
| Feature | Description |
|---|---|
| Render | Pages to RGBA pixels, PIL Images, or NumPy arrays at any DPI |
| Text extraction | Plain text or structured TextBlock/TextSpan with position |
| Text search | Find pages containing a query string |
| Forms (AcroForm) | Read and fill text, checkbox, and dropdown fields |
| Annotations | Read existing annotations; add highlights and free-text notes |
| Redaction | Search-and-redact: black-box all occurrences of a string |
| Encryption | AES-256 (PDF 2.0) encrypt/decrypt with user + owner passwords |
| Merge / split | Merge multiple PDFs; split into individual pages (via page slicing) |
| PDF/A validation | Validate against PDF/A-1B, 2B, 3B with issue-level reporting |
| Metadata | Read title, author, subject, keywords, creator, producer |
| Bookmarks | Traverse the document outline tree |
| Thumbnails | Fast downscaled preview images |
API Overview
Document(source, password=None)
Opens a PDF from a file path (str) or raw bytes.
doc = Document("file.pdf") # from path
doc = Document(open("file.pdf","rb").read()) # from bytes
doc = Document("encrypted.pdf", password="pw")
Properties: page_count, metadata, bookmarks
Methods: render_all(dpi), search(query), extract_text(page_num), save(path),
get_form_fields(), set_form_field(name, value), get_annotations(page),
add_annotation(page, type, rect, content), redact_text(term, page=None),
encrypt(path, password), decrypt(path, password)
Protocols: len(doc), doc[0], for page in doc, with Document(...) as doc
Page
Properties: index, width, height, rotation, geometry
Methods: render(dpi, width, height, background), thumbnail(max_dimension),
extract_text(), extract_text_blocks()
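To estimate how large a rendered page will be at a given DPI, the usual PDF convention of 72 points per inch can be applied. This is a sketch under two assumptions that the text above does not confirm: that `Page.width` and `Page.height` are expressed in PDF points, and that the renderer rounds to the nearest pixel. `pixel_size` is a hypothetical helper, not a pdfluent function.

```python
POINTS_PER_INCH = 72.0  # default PDF user-space unit


def pixel_size(width_pt: float, height_pt: float, dpi: float) -> tuple:
    """Estimate the raster size in pixels for a page measured in points.

    Assumes nearest-pixel rounding; a real renderer may round differently.
    """
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


if __name__ == "__main__":
    # US Letter is 612 x 792 points.
    print(pixel_size(612, 792, 150))
```

Memory use for an RGBA raster grows with the square of the DPI (width × height × 4 bytes), which is worth keeping in mind before rendering at high resolutions.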
RenderedImage
Properties: width, height, pixels (raw RGBA bytes)
Methods: to_pil(), to_numpy(), save(path)
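When neither Pillow nor NumPy is installed, the raw `pixels` buffer can still be indexed directly. The sketch below assumes a tightly packed, row-major layout with four bytes per pixel in R, G, B, A order and rows stored top to bottom; the layout details are an assumption here. `pixel_at` is an illustrative helper, not part of the SDK.

```python
def pixel_at(pixels: bytes, width: int, height: int, x: int, y: int) -> tuple:
    """Return the (r, g, b, a) tuple at column x, row y of an RGBA buffer.

    Assumes no row padding: len(pixels) == width * height * 4.
    """
    if len(pixels) != width * height * 4:
        raise ValueError("buffer size does not match width * height * 4")
    if not (0 <= x < width and 0 <= y < height):
        raise IndexError("pixel coordinates out of range")
    offset = (y * width + x) * 4
    r, g, b, a = pixels[offset:offset + 4]
    return r, g, b, a


if __name__ == "__main__":
    # A synthetic 2x2 image: red, green / blue, transparent white.
    buf = bytes([
        255, 0, 0, 255,    0, 255, 0, 255,
        0, 0, 255, 255,    255, 255, 255, 0,
    ])
    print(pixel_at(buf, 2, 2, 1, 0))
```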
Module-level functions
| Function | Description |
|---|---|
| open_pdf(path, password=None) | Alias for Document(path) |
| merge_pdfs(paths, output) | Merge a list of PDFs |
| validate_pdfa(path) → ComplianceReport | Run PDF/A validation |
| decrypt_pdf(input, output, password) | Decrypt to a new file |
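For reports with many findings, it can help to group the issues in a `ComplianceReport` by severity before printing them. The sketch below relies only on the `severity`, `rule`, and `message` attributes shown in the Quick Start. The sample objects are stand-ins built with `SimpleNamespace`, and their severity labels and rule names are made up for illustration.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_issues_by_severity(issues):
    """Group issue-like objects into {severity: ["rule: message", ...]}."""
    grouped = defaultdict(list)
    for issue in issues:
        grouped[str(issue.severity)].append(f"{issue.rule}: {issue.message}")
    return dict(grouped)


# Stand-ins for report.issues; values are hypothetical, not real validator output.
sample_issues = [
    SimpleNamespace(severity="error", rule="rule-a", message="font not embedded"),
    SimpleNamespace(severity="warning", rule="rule-b", message="missing XMP field"),
    SimpleNamespace(severity="error", rule="rule-c", message="encryption present"),
]

grouped = group_issues_by_severity(sample_issues)
for severity, lines in grouped.items():
    print(f"{severity} ({len(lines)})")
    for line in lines:
        print(f"  {line}")
```

With a real report, the same function would be called as `group_issues_by_severity(report.issues)`.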
Comparison
| | pdfluent | pypdf | pdfminer | pdfplumber | pikepdf |
|---|---|---|---|---|---|
| Rendering | ✓ | – | – | ✓ (via pdfminer) | – |
| Text extraction | ✓ | ✓ | ✓ | ✓ | – |
| Form fill | ✓ | ✓ | – | – | ✓ |
| Redaction | ✓ | – | – | – | ✓ |
| Encryption | ✓ (AES-256) | ✓ | – | – | ✓ |
| PDF/A validation | ✓ | – | – | – | – |
| Native deps | none | none | none | none | libqpdf |
| Language | Rust | Python | Python | Python | C++ |
License Activation
The SDK runs in Trial mode by default; output is marked via /Producer
metadata. Activate a license to unlock the paid-tier capability set.
import pdfluent
# Activate from a key string
pdfluent.activate_license_key("tier:enterprise")
# Or read the key from a UTF-8 text file
pdfluent.activate_license_file("/path/to/key.lic")
# Inspect the current status (always succeeds; defaults to Trial)
status = pdfluent.license_status()
print(status.tier) # "Enterprise"
print(status.source) # "Explicit" | "EnvVar" | "Default"
print(status.output_is_marked) # False
The PDFLUENT_LICENSE_KEY environment variable is honoured automatically
on process start when no explicit activation has happened.
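The precedence described above (explicit activation first, then the environment variable, then the Trial default) can be modelled in a few lines. This is only an illustration of the ordering, using the `source` labels reported by `license_status()`; `resolve_license_source` is a hypothetical function, and treating an empty variable as unset is an assumption.

```python
import os


def resolve_license_source(explicit_key, environ=None):
    """Return (source, key) following Explicit > EnvVar > Default precedence.

    The key is None for the default case, which corresponds to Trial mode.
    """
    if environ is None:
        environ = os.environ
    if explicit_key is not None:
        return "Explicit", explicit_key
    env_key = environ.get("PDFLUENT_LICENSE_KEY")
    if env_key:  # assumption: an empty value counts as unset
        return "EnvVar", env_key
    return "Default", None


if __name__ == "__main__":
    print(resolve_license_source(None, {"PDFLUENT_LICENSE_KEY": "tier:team"}))
```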
Behavior to be aware of:
- The active tier is process-global and set-once. Re-activating with the
same key is a no-op. Re-activating with a different tier raises
RuntimeError; restart Python to switch tiers.
- Invalid keys raise ValueError; missing license files raise OSError.
- The key string is never logged or stored beyond the call to
activate_license_key.
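The set-once rule is easier to reason about with a small model. The class below is not the SDK's implementation; it is a pure-Python sketch of the documented behaviour (first activation wins, same tier is a no-op, a different tier raises RuntimeError). `TierRegistry` and its lock are illustrative choices.

```python
import threading


class TierRegistry:
    """Toy model of a process-global, set-once license tier."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tier = None  # None means no activation yet (Trial)

    def activate(self, tier: str) -> None:
        with self._lock:
            if self._tier is None:
                self._tier = tier  # first activation wins
            elif self._tier != tier:
                raise RuntimeError(
                    f"tier already set to {self._tier!r}; restart to switch"
                )
            # Same tier again: nothing to do.

    @property
    def tier(self) -> str:
        return self._tier if self._tier is not None else "trial"


if __name__ == "__main__":
    registry = TierRegistry()
    registry.activate("enterprise")
    registry.activate("enterprise")  # no-op
    try:
        registry.activate("team")
    except RuntimeError as exc:
        print(f"rejected: {exc}")
```

In practice this means activation belongs in one place, early in program start-up, and not in library code that may be imported by processes using a different tier.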
The 1.0 release accepts the simple evaluation format tier:<name>
(trial/developer/team/business/enterprise). Cryptographically
signed payloads will be accepted by the same functions in 1.1 without
breaking the API.
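A key in the evaluation format can be checked locally before it is handed to `activate_license_key`, for example to give a clearer error when reading it from configuration. The sketch below assumes tier names are lowercase and that surrounding whitespace (such as a trailing newline from a key file) should be ignored; neither detail is confirmed above. `parse_evaluation_key` is a hypothetical helper.

```python
VALID_TIERS = ("trial", "developer", "team", "business", "enterprise")


def parse_evaluation_key(key: str) -> str:
    """Return the tier name from a 'tier:<name>' key, or raise ValueError."""
    prefix, sep, name = key.strip().partition(":")
    if prefix != "tier" or not sep:
        raise ValueError("key must have the form 'tier:<name>'")
    if name not in VALID_TIERS:
        raise ValueError(f"unknown tier {name!r}; expected one of {VALID_TIERS}")
    return name


if __name__ == "__main__":
    print(parse_evaluation_key("tier:enterprise"))
```

Signed payloads planned for 1.1 would not match this format, so a check like this should be treated as a convenience for the 1.0 key style only.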
Building from Source
Requires a Rust toolchain and maturin.
pip install maturin
cd crates/pdf-python
maturin develop --release # install in current venv
maturin build --release # build wheel in ./dist/
License
PDFluent Commercial License. See LICENSE.