Enterprise PDF SDK — render, extract, annotate, sign, and validate PDFs. Pure Rust, zero system dependencies.
pdfluent
Enterprise PDF SDK for Python — built on a pure-Rust stack, zero system dependencies.
Render pages, extract text, fill forms, annotate, redact, encrypt, merge, and validate PDF/A — all from a single pip install.
Installation
pip install pdfluent
# Optional extras
pip install "pdfluent[pillow]" # PIL Image support (quotes keep shells such as zsh from expanding the brackets)
pip install "pdfluent[numpy]" # NumPy array support
Requires Python ≥ 3.8. Pre-built wheels for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows (x86_64).
Quick Start
from pdfluent import Document
# Open, inspect, render
with Document("invoice.pdf") as doc:
    print(f"{doc.page_count} pages - {doc.metadata.title}")
    img = doc[0].render(dpi=150)
    img.save("page_0.png") # requires Pillow
# Extract text
doc = Document("report.pdf")
for page in doc:
    print(page.extract_text())
# Fill a form field and save
doc = Document("form.pdf")
doc.set_form_field("Name", "Jane Doe")
doc.save("form_filled.pdf")
# Search-and-redact
doc = Document("contract.pdf")
report = doc.redact_text("Confidential")
print(f"Redacted {report.areas_redacted} areas on {report.pages_affected} pages")
doc.save("contract_redacted.pdf")
# PDF/A validation
from pdfluent import validate_pdfa
report = validate_pdfa("archive.pdf")
if report.is_compliant:
    print(f"✓ {report.pdfa_level} compliant")
else:
    for issue in report.issues:
        print(f"[{issue.severity}] {issue.rule}: {issue.message}")
# Merge PDFs
from pdfluent import merge_pdfs
merge_pdfs(["a.pdf", "b.pdf", "c.pdf"], "merged.pdf")
# Encrypt / decrypt
doc = Document("sensitive.pdf")
doc.encrypt("sensitive_enc.pdf", password="s3cr3t")
from pdfluent import decrypt_pdf
decrypt_pdf("sensitive_enc.pdf", "sensitive_dec.pdf", password="s3cr3t")
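When validating many files, it can help to condense the issue list from the PDF/A step above into counts per severity. The following is a minimal sketch that assumes only the severity, rule, and message attributes used in the Quick Start. The Issue dataclass is a stand-in so the snippet runs without pdfluent installed, and the severity and rule strings are illustrative values, not ones taken from the library.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Issue:
    """Stand-in with the attributes shown in the Quick Start (not a pdfluent class)."""
    severity: str
    rule: str
    message: str


def count_by_severity(issues: Iterable[Issue]) -> Counter:
    """Count issues per severity label."""
    return Counter(str(issue.severity) for issue in issues)


# Illustrative data only; real reports come from validate_pdfa().
sample = [
    Issue("error", "rule-a", "first problem"),
    Issue("warning", "rule-b", "second problem"),
    Issue("error", "rule-c", "third problem"),
]
counts = count_by_severity(sample)
for severity, n in sorted(counts.items()):
    print(f"{severity}: {n}")
```

With a real report, the same function could be called as count_by_severity(report.issues).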
Features
| Feature | Description |
|---|---|
| Render | Pages to RGBA pixels, PIL Images, or NumPy arrays at any DPI |
| Text extraction | Plain text or structured TextBlock/TextSpan with position |
| Text search | Find pages containing a query string |
| Forms (AcroForm) | Read and fill text, checkbox, and dropdown fields |
| Annotations | Read existing annotations; add highlights and free-text notes |
| Redaction | Search-and-redact: black-box all occurrences of a string |
| Encryption | AES-256 (PDF 2.0) encrypt/decrypt with user + owner passwords |
| Merge / split | Merge multiple PDFs; split into individual pages (via page slicing) |
| PDF/A validation | Validate against PDF/A-1B, 2B, 3B with issue-level reporting |
| Metadata | Read title, author, subject, keywords, creator, producer |
| Bookmarks | Traverse the document outline tree |
| Thumbnails | Fast downscaled preview images |
API Overview
Document(source, password=None)
Opens a PDF from a file path (str) or raw bytes.
doc = Document("file.pdf") # from path
doc = Document(open("file.pdf","rb").read()) # from bytes
doc = Document("encrypted.pdf", password="pw")
Properties: page_count, metadata, bookmarks
Methods: render_all(dpi), search(query), extract_text(page_num), save(path),
get_form_fields(), set_form_field(name, value), get_annotations(page),
add_annotation(page, type, rect, content), redact_text(term, page=None),
encrypt(path, password), decrypt(path, password)
Protocols: len(doc), doc[0], for page in doc, with Document(...) as doc
Page
Properties: index, width, height, rotation, geometry
Methods: render(dpi, width, height, background), thumbnail(max_dimension),
extract_text(), extract_text_blocks()
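To estimate how large a rendered page will be before calling render(dpi), the page size can be scaled by the DPI. This sketch assumes that Page.width and Page.height are reported in PDF points (1/72 inch, the default PDF user-space unit) and that pixel dimensions are rounded to the nearest integer. Neither is confirmed here, so treat the result as an estimate and check it against RenderedImage.width and height.

```python
from typing import Tuple

POINTS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch


def estimated_pixel_size(width_pt: float, height_pt: float, dpi: float) -> Tuple[int, int]:
    """Estimate rendered pixel dimensions for a page size given in points.

    Assumption: the renderer rounds to the nearest pixel; the library may
    round up or truncate instead.
    """
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return (
        round(width_pt * dpi / POINTS_PER_INCH),
        round(height_pt * dpi / POINTS_PER_INCH),
    )


# US Letter is 612 x 792 points.
letter_at_150 = estimated_pixel_size(612, 792, 150)
print(letter_at_150)
```

An RGBA buffer of that size needs width * height * 4 bytes, which is useful for budgeting memory before calling render_all().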
RenderedImage
Properties: width, height, pixels (raw RGBA bytes)
Methods: to_pil(), to_numpy(), save(path)
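For code that works on the raw pixels property without Pillow or NumPy, a single pixel can be read by computing its byte offset. This sketch assumes the buffer is tightly packed, row-major from the top-left, with 4 bytes per pixel in R, G, B, A order. The RGBA format is stated above; the row layout is an assumption, and the helper names are hypothetical.

```python
from typing import Tuple

BYTES_PER_PIXEL = 4  # R, G, B, A


def buffer_matches(pixels: bytes, width: int, height: int) -> bool:
    """Check that the buffer length fits a tightly packed RGBA image."""
    return len(pixels) == width * height * BYTES_PER_PIXEL


def rgba_at(pixels: bytes, width: int, x: int, y: int) -> Tuple[int, int, int, int]:
    """Return the (r, g, b, a) tuple at column x, row y (assumes no row padding)."""
    offset = (y * width + x) * BYTES_PER_PIXEL
    r, g, b, a = pixels[offset:offset + BYTES_PER_PIXEL]
    return r, g, b, a


# A 2x2 test image: red, green on the first row; blue, white on the second.
demo = bytes([
    255, 0, 0, 255,    0, 255, 0, 255,
    0, 0, 255, 255,    255, 255, 255, 255,
])
print(rgba_at(demo, 2, 0, 1))
```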
TextSpan
Structured text with position data.
Properties: text, x, y, font_size
G1 font-metadata (Optional): font_name, is_bold, is_italic, color
G1 fields return None in the current release. They are typed as Optional so that downstream code handles the None case correctly today and will automatically receive data once the G1 extraction milestone lands.
for block in page.extract_text_blocks():
    for span in block.spans:
        if span.font_name is not None:
            print(f"{span.font_name} {'bold' if span.is_bold else ''}")
        print(f" '{span.text}' @ ({span.x:.1f}, {span.y:.1f})")
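Because the G1 fields may be None, it can be convenient to centralise the None handling in one helper. The sketch below builds a style label from font_name, is_bold, and is_italic. The Span dataclass is a stand-in with the documented attribute names so the snippet runs without pdfluent installed, and style_label is a hypothetical helper, not part of the library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Stand-in mirroring the documented TextSpan attributes (not the real class)."""
    text: str
    x: float
    y: float
    font_size: float
    font_name: Optional[str] = None
    is_bold: Optional[bool] = None
    is_italic: Optional[bool] = None


def style_label(span: Span) -> str:
    """Describe a span's font, treating None metadata as unknown or unset."""
    parts = [span.font_name or "unknown font"]
    if span.is_bold:      # None and False both skip the label
        parts.append("bold")
    if span.is_italic:
        parts.append("italic")
    return " ".join(parts)


print(style_label(Span("Total", 72.0, 700.0, 12.0)))
print(style_label(Span("Total", 72.0, 700.0, 12.0, font_name="Helvetica", is_bold=True)))
```

Code written this way behaves the same today, when every G1 field is None, and after the fields start carrying data.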
Module-level functions
| Function | Description |
|---|---|
| open_pdf(path, password=None) | Alias for Document(path) |
| merge_pdfs(paths, output) | Merge a list of PDFs |
| validate_pdfa(path) → ComplianceReport | Run PDF/A validation |
| decrypt_pdf(input, output, password) | Decrypt to a new file |
Exception Hierarchy
Every pdfluent-specific error derives from PdfluentError, so a single
except PdfluentError: clause catches all library errors:
from pdfluent import Document, PdfluentError, PdfluentParseError, PdfluentEncryptedError
try:
    with Document("broken.pdf") as doc:
        doc.render_all()
except PdfluentParseError as exc:
    print(f"Not a valid PDF: {exc}")
except PdfluentEncryptedError:
    print("PDF is password-protected")
except PdfluentError as exc:
    print(f"PDF error: {exc}")
Full hierarchy:
PdfluentError — base; catch all pdfluent errors
├── PdfluentParseError — corrupt / non-PDF bytes
├── PdfluentValidationError — schema / compliance failures
├── PdfluentRenderError — rendering and XFA flatten failures
├── PdfluentEncryptedError — operation blocked by encryption
├── PdfluentPageRangeError — page index out of range
├── PdfluentIoError — file-system I/O errors
├── PdfluentLicenseError — invalid / expired license
├── PdfluentGeometryError — invalid page geometry
└── PdfluentLimitError — processing-limit exceeded
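Python checks except clauses from top to bottom, so the specific subclasses must be listed before PdfluentError or the base clause will catch everything first. The sketch below demonstrates that ordering with local stand-in classes that mirror part of the hierarchy above, so it runs without pdfluent installed. The classify helper and its return strings are hypothetical.

```python
from typing import Callable


# Stand-ins mirroring the documented hierarchy; in real code, import these from pdfluent.
class PdfluentError(Exception):
    pass


class PdfluentParseError(PdfluentError):
    pass


class PdfluentEncryptedError(PdfluentError):
    pass


class PdfluentIoError(PdfluentError):
    pass


def classify(action: Callable[[], object]) -> str:
    """Run action and map any library error to a short category string."""
    try:
        action()
    except PdfluentParseError:
        return "not-a-pdf"
    except PdfluentEncryptedError:
        return "needs-password"
    except PdfluentError:
        # Every other subclass (I/O, limits, ...) falls through to the base.
        return "other-pdf-error"
    return "ok"


print(classify(lambda: None))
```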
Typing Support
pdfluent ships with hand-written .pyi stub files for IDE completion and
mypy --strict compatibility:
- pdfluent/__init__.pyi: full public API stubs
- pdfluent/_native.pyi: native extension stubs (for mypy without a build)
Verifying with mypy
pip install mypy
cd crates/pdf-python
MYPYPATH=python mypy --strict tests/test_pdfluent_typing.py
Example with typed annotations
from __future__ import annotations
from typing import Optional
from pdfluent import Document, TextSpan, PdfluentError
def get_font(span: TextSpan) -> Optional[str]:
    """Return the font name if available."""
    return span.font_name # Optional[str]; mypy knows this may be None
def safe_open(path: str) -> Optional[Document]:
    try:
        return Document(path)
    except PdfluentError:
        return None
License Activation
from pdfluent import activate_license, LicenseInfo, PdfluentLicenseError
# Activate from a JSON license string or base64-encoded key
try:
    info: LicenseInfo = activate_license(open("my.license").read())
    print(f"{info.tier} license for {info.company} ({info.seats} seats)")
except PdfluentLicenseError as exc:
    print(f"License error: {exc}")
# Or set the environment variable and call with empty string:
# PDFLUENT_LICENSE_KEY="<base64-key>" python myscript.py
info = activate_license("") # reads PDFLUENT_LICENSE_KEY from env
LicenseInfo fields: licensee, company, tier, expires_at (Unix timestamp), seats.
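Since expires_at is a Unix timestamp, it usually needs converting before it can be shown or compared. The sketch below assumes the value is in seconds since the epoch (UTC), which is the common convention but is not confirmed above. days_remaining is a hypothetical helper, not part of pdfluent.

```python
from datetime import datetime, timezone
from typing import Optional


def days_remaining(expires_at: int, now: Optional[datetime] = None) -> int:
    """Whole days until expiry; negative once the license has expired.

    Assumption: expires_at is seconds since the Unix epoch, in UTC.
    """
    current = now if now is not None else datetime.now(timezone.utc)
    expiry = datetime.fromtimestamp(expires_at, tz=timezone.utc)
    return (expiry - current).days


# Fixed reference dates keep the example deterministic.
reference_now = datetime(2024, 1, 1, tzinfo=timezone.utc)
example_expiry = int(datetime(2024, 1, 31, tzinfo=timezone.utc).timestamp())
print(days_remaining(example_expiry, now=reference_now))
```

In application code this could be called as days_remaining(info.expires_at) to warn before a license lapses.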
Comparison
| | pdfluent | pypdf | pdfminer | pdfplumber | pikepdf |
|---|---|---|---|---|---|
| Rendering | ✓ | – | – | ✓ (via pdfminer) | – |
| Text extraction | ✓ | ✓ | ✓ | ✓ | – |
| Form fill | ✓ | ✓ | – | – | ✓ |
| Redaction | ✓ | – | – | – | ✓ |
| Encryption | ✓ (AES-256) | ✓ | – | – | ✓ |
| PDF/A validation | ✓ | – | – | – | – |
| Typed stubs | ✓ | partial | – | – | – |
| Native deps | none | none | none | none | libqpdf |
| Language | Rust | Python | Python | Python | C++ |
License Activation
The SDK runs in Trial mode by default; output is marked via /Producer
metadata. Activate a license to unlock the paid-tier capability set.
import pdfluent
# Activate from a key string
pdfluent.activate_license_key("tier:enterprise")
# Or read the key from a UTF-8 text file
pdfluent.activate_license_file("/path/to/key.lic")
# Inspect the current status (always succeeds; defaults to Trial)
status = pdfluent.license_status()
print(status.tier) # "Enterprise"
print(status.source) # "Explicit" | "EnvVar" | "Default"
print(status.output_is_marked) # False
The PDFLUENT_LICENSE_KEY environment variable is honoured automatically
on process start when no explicit activation has happened.
Behavior to be aware of:
- The active tier is process-global and set-once. Re-activating with the
same key is a no-op. Re-activating with a different tier raises
RuntimeError; restart Python to switch tiers.
- Invalid keys raise ValueError; missing license files raise OSError.
- The key string is never logged or stored beyond the call to
activate_license_key.
The 1.0 release accepts the simple evaluation format tier:<name>
(trial/developer/team/business/enterprise). Cryptographically
signed payloads will be accepted by the same functions in 1.1 without
breaking the API.
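A configuration loader may want to reject a malformed key before handing it to the SDK. The sketch below checks the tier:<name> shape against the five tier names listed above. It is a strict, case-sensitive check written for illustration; whether the library itself trims whitespace or ignores case is not stated, and parse_tier_key is a hypothetical helper.

```python
KNOWN_TIERS = ("trial", "developer", "team", "business", "enterprise")


def parse_tier_key(key: str) -> str:
    """Return the tier name from an evaluation key of the form 'tier:<name>'.

    Raises ValueError for any other shape or an unlisted tier name.
    """
    prefix, separator, name = key.partition(":")
    if prefix != "tier" or not separator or name not in KNOWN_TIERS:
        raise ValueError("expected 'tier:<name>' with a known tier name")
    return name


print(parse_tier_key("tier:enterprise"))
```

Signed payloads planned for 1.1 will not match this shape, so a check like this belongs only in code that targets the 1.0 evaluation format.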
Building from Source
Requires a Rust toolchain and maturin.
pip install maturin
cd crates/pdf-python
maturin develop --release # install in current venv
maturin build --release # build wheel in ./dist/
License
PDFluent Commercial License. See LICENSE.