Python wrapper for Exis.PdfEditor — comprehensive PDF toolkit (find/replace, merge, split, forms, redaction, watermark, encryption, signatures, PDF/A, and more)
Exis.PdfEditor for Python
Comprehensive PDF toolkit for Python — find/replace, merge, split, form filling, redaction, image editing, watermark, Bates stamping, page editing, encryption, optimization, digital signatures, PDF/A compliance, XMP + /Info metadata, OCR for scanned PDFs, and more.
Powered by the Exis.PdfEditor .NET library, compiled to a native binary via .NET Native AOT. No .NET runtime required.
Installation
pip install exis-pdfeditor
Platform-specific wheels are available for:
- Windows x64 (win_amd64)
- Linux x64 (manylinux_2_17_x86_64)
- macOS ARM64 / Apple Silicon (macosx_11_0_arm64)
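As a quick pre-install check, the sketch below maps the running interpreter's platform to the wheel tags listed above. The helper name `expected_wheel_tag` is hypothetical and not part of the package; it uses only the standard library and does not inspect pip's own tag resolution.

```python
import platform

# Wheel tags from the list above, keyed by (OS name, lowercased CPU architecture).
WHEEL_TAGS = {
    ("Windows", "amd64"): "win_amd64",
    ("Windows", "x86_64"): "win_amd64",
    ("Linux", "x86_64"): "manylinux_2_17_x86_64",
    ("Darwin", "arm64"): "macosx_11_0_arm64",
}


def expected_wheel_tag(system=None, machine=None):
    """Return the listed wheel tag for a platform, or None if none is listed."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    return WHEEL_TAGS.get((system, machine))


if __name__ == "__main__":
    print(expected_wheel_tag() or "no prebuilt wheel listed for this platform")
```

A `None` result means pip is unlikely to find a matching wheel for that platform.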
Quick start
import exis_pdfeditor
# Optional: set a license key (or use EXIS_PDF_LICENSE_KEY env var).
# Without a key, a 14-day trial is activated automatically.
exis_pdfeditor.initialize("XXXX-XXXX-XXXX-XXXX")
# Inspect a PDF (free — no license required)
info = exis_pdfeditor.inspect("document.pdf")
print(f"{info.pageCount} pages, title: {info.title}")
# Find and replace text
result = exis_pdfeditor.find_replace(
"input.pdf", "output.pdf",
"old text", "new text",
case_sensitive=False,
)
print(f"{result.totalReplacements} replacements made")
Features
| Feature | Function |
|---|---|
| Inspect | inspect(path) — metadata, fonts, pages, encryption status |
| Text extraction | extract_text(path), extract_text_structured(path) |
| Find & replace | find_replace(...) — regex, case-insensitive, whole-word, styling |
| Merge | merge(paths, output) |
| Split | split(path, output_dir) |
| Extract pages | extract_pages(path, output, pages=[1,3,5]) |
| Form fields | list_fields(path, split_duplicate_widgets=...), fill_form(..., text_alignment=..., split_duplicate_widgets=..., flatten=...) |
| Redaction | redact(path, output, redactions) |
| Watermark | watermark(path, output, "DRAFT", position="across") |
| Stamp | stamp(path, output, stamp_pdf, mode="overlay") |
| Bates stamping | bates_stamp(path, output, prefix="ABC", digits=6) |
| Optimize | optimize(path, output, downsample_images=True) |
| Encrypt / Decrypt | encrypt(...), decrypt(...) |
| Page editing | rotate(...), crop(...), reorder(...), delete_pages(...), insert_blank_pages(...) |
| Images | find_images(path), replace_image(...) |
| Digital signatures | sign(...), verify(...) |
| PDF/A | pdfa_validate(path), pdfa_convert(path, output) |
| Metadata (XMP + /Info) | get_metadata(path), set_xmp(...), set_info(...), remove_xmp(...), remove_info(...), remove_metadata(...) |
| Page classification | analyze_pages(path) — pre-OCR triage (free, no license) |
| OCR — searchable PDF (Windows) | make_searchable_pdf(path, output, languages=..., progress=...) |
| OCR — de-identify scans (Windows) | redact_scanned_pdf(path, output, terms=..., visible_replacement=...) |
Platform support
| Function group | Windows | Linux | macOS |
|---|---|---|---|
| All non-OCR functions (find/replace, merge, forms, redaction, watermark, optimization, signatures, PDF/A, metadata, …) | yes | yes | yes |
| analyze_pages (page classification) | yes | yes | yes |
| make_searchable_pdf, redact_scanned_pdf (OCR) | yes | not yet (raises OcrNotSupportedError) | not yet (raises OcrNotSupportedError) |
OCR is currently Windows-only because the page rasterizer (Exis.PdfOcr.Windows) depends on the WinRT Windows.Data.Pdf API and System.Drawing.Common. When a cross-platform rasterizer ships, the Python side will pick it up automatically.
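Because the OCR functions raise OcrNotSupportedError outside Windows, a caller may prefer to branch on the platform before attempting a run. A minimal sketch: `ocr_available` and `choose_pipeline` are hypothetical helpers that only mirror the table above; they do not query the library.

```python
import platform


def ocr_available(system=None):
    """True when the OS is the one the platform table lists as OCR-capable."""
    return (system or platform.system()) == "Windows"


def choose_pipeline(system=None):
    """Pick a processing path: OCR on Windows, page triage only elsewhere."""
    if ocr_available(system):
        return "ocr"  # make_searchable_pdf / redact_scanned_pdf can be called
    return "triage-only"  # analyze_pages is listed as available on every wheel


if __name__ == "__main__":
    print(choose_pipeline())
```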
Inspect
Retrieve metadata, page info, fonts, encryption status, and form field counts. No license required.
info = exis_pdfeditor.inspect("document.pdf")
print(f"Version: {info.version}")
print(f"Pages: {info.pageCount}")
print(f"Title: {info.title}")
print(f"Author: {info.author}")
print(f"Encrypted: {info.isEncrypted}")
print(f"Has forms: {info.hasFormFields} ({info.formFieldCount} fields)")
print(f"Fonts: {info.fontsUsed}")
for page in info.pages:
    print(f" Page {page.pageNumber}: {page.widthInPoints}x{page.heightInPoints} pt, "
          f"{page.characterCount} chars")
Text extraction
Plain text
result = exis_pdfeditor.extract_text("document.pdf")
print(result.fullText)
for page in result.pages:
    print(f"Page {page.pageNumber}: {page.text[:100]}...")
Specific pages only
result = exis_pdfeditor.extract_text("document.pdf", pages=[1, 3, 5])
Structured text (with positions and font data)
result = exis_pdfeditor.extract_text_structured("document.pdf")
for page in result.pages:
    for block in page.textBlocks:
        print(f" [{block.x:.0f}, {block.y:.0f}] {block.fontName} {block.fontSize}pt: "
              f"{block.text}")
Find & replace
Single pair
result = exis_pdfeditor.find_replace(
"input.pdf", "output.pdf",
"old text", "new text",
)
print(f"{result.totalReplacements} replacements across {result.pagesModified} pages")
Multiple pairs
result = exis_pdfeditor.find_replace(
"input.pdf", "output.pdf",
pairs=[
{"search": "foo", "replace": "bar"},
{"search": "hello", "replace": "world"},
],
)
Regex
result = exis_pdfeditor.find_replace(
"input.pdf", "output.pdf",
pairs=[
{"search": r"\d{3}-\d{2}-\d{4}", "replace": "XXX-XX-XXXX", "isRegex": True},
],
)
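Before running a pattern over a document, it can help to try it on a sample string. The sketch below uses Python's own `re` module. The library runs on .NET, so its regex engine may differ in edge cases (an assumption: the source does not say which engine is used), but a simple pattern like this one reads the same in both. `preview_replace` is a hypothetical helper, not part of the package.

```python
import re

SSN_PATTERN = r"\d{3}-\d{2}-\d{4}"


def preview_replace(sample, pattern=SSN_PATTERN, replacement="XXX-XX-XXXX"):
    """Apply the pattern to a plain string to preview what would be replaced."""
    return re.sub(pattern, replacement, sample)


if __name__ == "__main__":
    # The phone number is left alone because it does not match the 3-2-4 shape.
    print(preview_replace("SSN: 123-45-6789, phone: 555-1234"))
```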
All options
result = exis_pdfeditor.find_replace(
"input.pdf", "output.pdf",
"confidential", "[REDACTED]",
case_sensitive=False,
whole_word=True,
use_regex=False,
page_range=[1, 2, 3],
text_fitting="adaptive", # none, preserve_width, fit_to_page, adaptive
min_horizontal_scale=70,
max_font_size_reduction=1.5,
replacement_text_color={"r": 1, "g": 0, "b": 0},
replacement_highlight_color={"r": 1, "g": 1, "b": 0},
replacement_bold=True,
replacement_underline=False,
replacement_strikethrough=False,
preserve_form_fields=True,
use_incremental_update=True,
)
for detail in result.details:
    print(f" Page {detail.pageNumber}: '{detail.originalText}' -> '{detail.replacementText}'")
Merge
exis_pdfeditor.merge(["part1.pdf", "part2.pdf", "part3.pdf"], "merged.pdf")
Split
Split a PDF into one file per page:
result = exis_pdfeditor.split("document.pdf", "output_folder/")
print(f"Split into {result.pageCount} files")
for path in result.files:
    print(f" {path}")
Extract pages
Extract specific pages into a new PDF:
exis_pdfeditor.extract_pages("document.pdf", "pages_1_and_3.pdf", pages=[1, 3])
Form fields
Each field has a PDF name, optional displayName (smart label from nearby page text), type, value, options (choice fields), isReadOnly, and hasDuplicateWidgets (true when one logical field has multiple widgets at different positions — e.g. two-up receipts). Pass split_duplicate_widgets=True to list_fields / fill_form to use per-widget keys name_1, name_2, … (page-reading order).
List all fields
fields = exis_pdfeditor.list_fields("form.pdf")
for field in fields:
    print(f" {field.name} ({field.type}): {field.value}")
    if field.displayName:
        print(f" Label: {field.displayName}")
    if field.options:
        print(f" Options: {field.options}")
    if field.hasDuplicateWidgets:
        print(" (duplicate widgets; use split_duplicate_widgets=True for per-widget names)")
List with duplicate-widget split (two-up / carbon copy)
fields = exis_pdfeditor.list_fields("two-up-receipt.pdf", split_duplicate_widgets=True)
# e.g. Address_1, Address_2, date_1, date_2, ...
Fill a form
result = exis_pdfeditor.fill_form(
"form.pdf", "filled.pdf",
fields={
"FirstName": "John",
"LastName": "Doe",
"Email": "john@example.com",
"AgreeToTerms": "Yes",
},
)
print(f"Filled {result.fieldsFilled} fields, {result.fieldsNotFound} not found")
Force text alignment (default honors each field's PDF /Q)
result = exis_pdfeditor.fill_form(
"form.pdf", "filled_centered.pdf",
fields={"Comments": "Centered text"},
text_alignment="center", # "auto" | "left" | "center" | "right"
)
Fill duplicate-widget fields (different value per widget; auto-flattens)
result = exis_pdfeditor.fill_form(
"two-up-receipt.pdf", "filled.pdf",
fields={
"Address_1": "Mr. David G Cruz",
"date_1": "10-Jan-67",
"Address_2": "Ms. Claudia Morales",
"date_2": "22-Feb-26",
},
split_duplicate_widgets=True,
)
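The per-widget keys follow the name_1, name_2 pattern described above, so the fields dict for a two-up form can be built from one record per copy. A sketch under that assumption; `per_widget_fields` is a hypothetical helper, not part of the package.

```python
def per_widget_fields(records):
    """Build a fill_form-style dict from one record per widget copy.

    records[0] feeds the *_1 keys, records[1] the *_2 keys, and so on,
    matching the page-reading order described above.
    """
    fields = {}
    for copy_index, record in enumerate(records, start=1):
        for name, value in record.items():
            fields[f"{name}_{copy_index}"] = value
    return fields


if __name__ == "__main__":
    print(per_widget_fields([
        {"Address": "Mr. David G Cruz", "date": "10-Jan-67"},
        {"Address": "Ms. Claudia Morales", "date": "22-Feb-26"},
    ]))
```

The resulting dict can then be passed as `fields=` together with `split_duplicate_widgets=True`.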
Fill and flatten (make fields non-editable)
result = exis_pdfeditor.fill_form(
"form.pdf", "filled_flat.pdf",
fields={"Name": "Jane Doe"},
flatten=True,
)
Redaction
Redact text
result = exis_pdfeditor.redact(
"document.pdf", "redacted.pdf",
redactions=[
{"text": "John Doe", "replaceWith": "[NAME REDACTED]"},
{"text": r"\d{3}-\d{2}-\d{4}", "isRegex": True, "replaceWith": "[SSN]"},
],
)
print(f"{result.redactionsApplied} redactions applied")
Redact a specific area on a page
result = exis_pdfeditor.redact(
"document.pdf", "redacted.pdf",
redactions=[
{
"area": {"x": 100, "y": 700, "width": 200, "height": 20},
"pageNumber": 1,
},
],
)
Combined text + area redaction
result = exis_pdfeditor.redact(
"document.pdf", "redacted.pdf",
redactions=[
{"text": "confidential", "caseSensitive": False},
{"area": {"x": 50, "y": 50, "width": 500, "height": 30}, "pageNumber": 2},
],
)
Page classification (pre-OCR triage)
analyze_pages classifies every page as Digital, Scanned, AlreadyOcrd, or Empty purely from the content stream — no rendering and no OCR. Use it to predict which pages will need OCR before paying for a run. Free, no license required, available on every wheel.
pages = exis_pdfeditor.analyze_pages("mixed.pdf")
for p in pages:
    print(f" page {p.pageNumber}: {p.kind} "
          f"(text {p.textCoverageRatio:.0%}, images {p.imageCoverageRatio:.0%})")
scanned = [p.pageNumber for p in pages if p.kind == "Scanned"]
if scanned:
    print(f"{len(scanned)} pages need OCR: {scanned}")
Each entry has pageNumber, kind, textCharCount, textCoverageRatio, imageCoverageRatio, and hasInvisibleTextLayer (signature of an already-OCR'd page).
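To estimate how much of a batch will actually go through OCR, the entries can be tallied by kind. A sketch using stand-in objects shaped like the fields listed above; `triage` is a hypothetical helper and the sample data is made up, not real library output.

```python
from collections import Counter
from types import SimpleNamespace


def triage(pages):
    """Return (count per kind, share of pages that are true scans)."""
    counts = Counter(p.kind for p in pages)
    share = counts["Scanned"] / len(pages) if pages else 0.0
    return counts, share


if __name__ == "__main__":
    sample = [
        SimpleNamespace(pageNumber=1, kind="Digital"),
        SimpleNamespace(pageNumber=2, kind="Scanned"),
        SimpleNamespace(pageNumber=3, kind="AlreadyOcrd"),
        SimpleNamespace(pageNumber=4, kind="Scanned"),
    ]
    counts, share = triage(sample)
    print(dict(counts), f"{share:.0%} of pages need OCR")
```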
Searchable PDFs (OCR) — Windows only
make_searchable_pdf adds an invisible, selectable text layer over scanned pages so find/replace, extraction, and redaction work on the result like any digital PDF. Born-digital and already-OCR'd pages are passed through untouched — only true scans are rasterized and recognized, which is fast on mixed documents.
result = exis_pdfeditor.make_searchable_pdf(
"scan.pdf", "searchable.pdf",
languages=("eng",), # default
progress=lambda p: print(f" page {p.page}/{p.total}: {p.phase}"),
)
print(f"OCR'd {result.pagesProcessed}, skipped {result.pagesSkipped}, "
f"avg confidence {result.averageConfidence:.0%}")
# Confidence is never swallowed — flag low-confidence pages for human review.
for page in result.pages:
    if page.wasOcrd and page.confidence < 0.80:
        print(f" page {page.pageNumber} needs review (conf {page.confidence:.0%})")
Additional languages
The wheel ships English (eng.traineddata). To add languages, drop the matching xxx.traineddata files into the wheel's bundled tessdata/ folder, or place them in your own folder and point at it:
result = exis_pdfeditor.make_searchable_pdf(
"scan.pdf", "searchable.pdf",
languages=("eng", "spa", "vie"),
tessdata_path=r"C:\my-app\tessdata",
)
The invisible text layer embeds a Unicode font (DejaVu Sans), so accented and non-Latin names (e.g. Nguyễn, Peña) survive find/replace intact rather than being mangled.
Options
| Parameter | Default | Notes |
|---|---|---|
| languages | ("eng",) | Tesseract language codes |
| dpi | 300 | Rendering DPI for non-image pages (full-page scans use their native resolution) |
| apply_deskew | True | Pre-process hint (engine-dependent) |
| apply_denoise | False | Pre-process hint (engine-dependent) |
| min_confidence_to_include | 0.0 | Words below this are still included (dropping them would risk leaking names you need to match) but counted in per-page confidence |
| tessdata_path | bundled | Override the trained-data folder |
| progress | None | Callback (page, total, phase, confidence) |
De-identify scanned pages — Windows only
redact_scanned_pdf finds terms on scanned pages via OCR and burns opaque boxes into the actual page image. Unlike OCR-layer redaction (which only changes selectable text), this destroys the underlying pixels — the term is gone from the visible scan and unrecoverable, appropriate for sending de-identified scans externally.
r = exis_pdfeditor.redact_scanned_pdf(
"intake_scan.pdf", "intake_redacted.pdf",
terms=["Jane Doe", "555-12-3456"],
visible_replacement="[REDACTED]", # drawn into the raster
fill="white", # or "black"
)
print(f"Redacted {r.occurrencesRedacted} occurrences on {r.pagesAffected} pages")
if r.pagesSkipped:
    print(f"WARNING: {r.pagesSkipped} pages could not be safely burned; "
          "review them before sending the output.")
for occ in r.occurrences:
    print(f" page {occ.pageNumber}: {occ.term!r} (conf {occ.confidence:.0%})")
Scope: this handles image-based pages. Born-digital text pages are NOT modified; use redact(...) for those.
Pages that are image-based but not a single full-page image are reported in pagesSkipped and left unchanged. A non-zero value means the output is not safe to send as-is.
The output is rewritten via the optimizer to drop the superseded original image objects, so the un-redacted pixels are not recoverable with PDF forensics.
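Since a non-zero pagesSkipped means the output is not safe to send, a release step can be gated on it. A sketch with a stand-in result object; `release_or_hold` is a hypothetical helper, not part of the package.

```python
from types import SimpleNamespace


def release_or_hold(result):
    """Release only when every image-based page could be burned."""
    if result.pagesSkipped == 0:
        return "release"
    return f"hold: {result.pagesSkipped} page(s) need manual review"


if __name__ == "__main__":
    # Stand-in shaped like the result fields used above (not real library output).
    print(release_or_hold(SimpleNamespace(pagesSkipped=2)))
```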
Watermark
Basic diagonal watermark
result = exis_pdfeditor.watermark("input.pdf", "watermarked.pdf", "DRAFT")
print(f"Watermarked {result.pagesWatermarked} of {result.totalPages} pages")
All watermark options
result = exis_pdfeditor.watermark(
"input.pdf", "watermarked.pdf",
"CONFIDENTIAL",
position="across", # top, bottom, center, across
font_size=72,
text_color={"r": 1, "g": 0, "b": 0},
opacity=0.15,
page_range=[1, 2],
)
Stamp (PDF overlay / underlay)
Overlay a letterhead on top
result = exis_pdfeditor.stamp(
"document.pdf", "stamped.pdf",
"letterhead.pdf",
mode="overlay",
)
print(f"Stamped {result.pagesStamped} pages")
Underlay a background behind content
result = exis_pdfeditor.stamp(
"document.pdf", "stamped.pdf",
"background.pdf",
mode="underlay",
opacity=0.5,
page_range=[1],
)
Bates stamping
Sequential page numbering for legal production and discovery workflows. Each page receives
a zero-padded identifier (e.g. ABC000001, ABC000002) rendered in a chosen corner of the
visual page. Placement is relative to the page's /Rotate orientation, so mixed-rotation
documents render consistently. An XMP audit block recording the range, digit width, and
prefix/suffix is written to the document catalog by default.
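The identifier format described above can be previewed in plain Python, which is handy when drafting a production log. This is only a sketch of the zero-padding rule as described here; `bates_id` is a hypothetical helper and does not call the library.

```python
def bates_id(number, prefix="", suffix="", digits=6):
    """Zero-pad number to at least `digits` characters, wrapped in prefix/suffix."""
    return f"{prefix}{number:0{digits}d}{suffix}"


if __name__ == "__main__":
    print(bates_id(1, prefix="ABC"))        # ABC000001
    print(bates_id(2, prefix="ABC"))        # ABC000002
    print(bates_id(1234567, prefix="ABC"))  # ABC1234567 (already wider than 6 digits)
```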
Defaults (number starts at 1, 6 digits, bottom-right corner)
result = exis_pdfeditor.bates_stamp("input.pdf", "stamped.pdf")
print(f"Stamped pages {result.firstNumber}-{result.lastNumber}")
print(f"Digits used: {result.digitsUsed}")
Prefix, custom position, color, confidentiality label
result = exis_pdfeditor.bates_stamp(
"input.pdf", "stamped.pdf",
prefix="ABC",
start_number=1,
digits=6, # -> "ABC000001"
position="bottom_right", # top_left, top_center, top_right,
# bottom_left, bottom_center, bottom_right
font_size=10,
text_color={"r": 0, "g": 0, "b": 0},
background_color={"r": 1, "g": 1, "b": 1}, # opaque box behind stamp
margin_inches=0.5,
confidentiality_label="CONFIDENTIAL", # stacked above the Bates number
)
Continuous numbering across a batch
Thread lastNumber + 1 into the next call:
# docs is your list of input PDF paths, in production order
next_n = 1
for path in docs:
    r = exis_pdfeditor.bates_stamp(
        path, path + ".stamped.pdf",
        prefix="ABC", start_number=next_n,
    )
    next_n = r.lastNumber + 1
Skip the cover page (legal convention)
The cover is ABC000001 in the production log even if physically unstamped:
exis_pdfeditor.bates_stamp(
"input.pdf", "stamped.pdf",
prefix="ABC",
skip_first_page=True,
counter_advances_on_skipped_pages=True, # default
)
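The interaction between skip_first_page and counter_advances_on_skipped_pages can be modeled in a few lines. This sketch reflects the convention stated above (the cover consumes a number even when unstamped); the behavior with the flag set to False is an assumption, and `assign_numbers` is a hypothetical helper.

```python
def assign_numbers(page_count, start_number=1, skip_first_page=False,
                   counter_advances_on_skipped_pages=True):
    """Return {page: Bates number, or None if the page is left unstamped}."""
    numbers = {}
    counter = start_number
    for page in range(1, page_count + 1):
        if skip_first_page and page == 1:
            numbers[page] = None
            if counter_advances_on_skipped_pages:
                counter += 1  # the cover still consumes a number in the log
        else:
            numbers[page] = counter
            counter += 1
    return numbers


if __name__ == "__main__":
    print(assign_numbers(3, skip_first_page=True))
    # {1: None, 2: 2, 3: 3}
    print(assign_numbers(3, skip_first_page=True,
                         counter_advances_on_skipped_pages=False))
    # {1: None, 2: 1, 3: 2}
```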
Stamp only selected pages
exis_pdfeditor.bates_stamp(
"input.pdf", "stamped.pdf",
page_range=[2, 3, 5], # 1-based
)
Signed input
Stamping invalidates signatures — opt in explicitly:
result = exis_pdfeditor.bates_stamp(
"signed.pdf", "stamped.pdf",
allow_signed_input=True, # result.warnings will record it
)
for w in result.warnings:
    print(f" warning: {w}")
Suppress the XMP audit block
exis_pdfeditor.bates_stamp(
"input.pdf", "stamped.pdf",
write_xmp_metadata=False,
)
All options at a glance
| Parameter | Default | Purpose |
|---|---|---|
| prefix / suffix | "" | Text bracketing the number |
| start_number | 1 | First Bates number (for batch continuation) |
| digits | 6 | Minimum zero-padded width; auto-expands if needed |
| position | "bottom_right" | Corner/edge on the visual page |
| font_size | 10.0 | Point size (Helvetica) |
| text_color | black | RGB dict {"r": 0, "g": 0, "b": 0} |
| background_color | none | Opaque rectangle behind stamp for legibility |
| margin_inches | 0.5 | Distance from trimmed page edge |
| confidentiality_label | none | e.g. "CONFIDENTIAL" |
| confidentiality_position | same corner | Override to place label elsewhere |
| confidentiality_font_size | matches font_size | Label point size |
| page_range | all pages | 1-based page numbers to stamp |
| skip_first_page | False | Skip page 1 (cover sheets) |
| counter_advances_on_skipped_pages | True | Advance counter on skipped pages |
| allow_signed_input | False | Stamp PDFs with digital signatures |
| write_xmp_metadata | True | Write an XMP audit block to the catalog |
Result fields: firstNumber, lastNumber, pagesStamped, digitsUsed (equals
digits unless auto-expanded), warnings (non-fatal diagnostics: digit expansion,
signed-input stamped anyway, etc.).
Optimize
Default optimization (compress + deduplicate)
result = exis_pdfeditor.optimize("large.pdf", "smaller.pdf")
print(f"Reduced {result.originalSize:,} -> {result.optimizedSize:,} bytes "
f"({result.reductionPercent:.1f}% smaller)")
print(f" Streams compressed: {result.streamsCompressed}")
print(f" Duplicates removed: {result.duplicatesRemoved}")
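If you log sizes yourself, the percentage can be recomputed from the two byte counts. The formula below is the usual relative reduction and is an assumption about how reductionPercent is derived; the source does not spell it out. `reduction_percent` is a hypothetical helper.

```python
def reduction_percent(original_size, optimized_size):
    """Relative size reduction in percent (negative if the file grew)."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return (original_size - optimized_size) / original_size * 100.0


if __name__ == "__main__":
    print(f"{reduction_percent(1000, 750):.1f}% smaller")
```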
With image downsampling
result = exis_pdfeditor.optimize(
"large.pdf", "smaller.pdf",
downsample_images=True,
max_image_dpi=150,
remove_metadata=True,
)
print(f"Images downsampled: {result.imagesDownsampled}")
Encrypt & decrypt
Encrypt with a password
exis_pdfeditor.encrypt(
"document.pdf", "protected.pdf",
user_password="openme",
owner_password="secret",
permissions=["Print", "CopyText"],
)
Available permissions: Print, ModifyContents, CopyText, AddAnnotations,
FillForms, PrintHighQuality, All.
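A typo in a permission name is easy to miss, so it may be worth checking the list against the names above before calling encrypt. A sketch; `check_permissions` is a hypothetical helper, and treating the names as case-sensitive is an assumption.

```python
ALLOWED_PERMISSIONS = {
    "Print", "ModifyContents", "CopyText", "AddAnnotations",
    "FillForms", "PrintHighQuality", "All",
}


def check_permissions(permissions):
    """Return the list unchanged, or raise ValueError naming unknown entries."""
    unknown = sorted(set(permissions) - ALLOWED_PERMISSIONS)
    if unknown:
        raise ValueError(f"unknown permission name(s): {', '.join(unknown)}")
    return list(permissions)


if __name__ == "__main__":
    print(check_permissions(["Print", "CopyText"]))
```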
Decrypt
exis_pdfeditor.decrypt("protected.pdf", "unlocked.pdf", password="openme")
Page editing
Rotate pages
result = exis_pdfeditor.rotate("input.pdf", "rotated.pdf", angle=90)
print(f"Rotated {result.pagesModified} pages")
# Rotate only specific pages
result = exis_pdfeditor.rotate("input.pdf", "rotated.pdf", angle=180, pages=[2, 4])
Crop pages
result = exis_pdfeditor.crop(
"input.pdf", "cropped.pdf",
rect={"x": 50, "y": 50, "width": 500, "height": 700},
)
Reorder pages
exis_pdfeditor.reorder("input.pdf", "reordered.pdf", order=[3, 1, 2])
Delete pages
exis_pdfeditor.delete_pages("input.pdf", "trimmed.pdf", pages=[2, 4])
Insert blank pages
Each insertion specifies an "after page" anchor, given as a 1-based page number with 0 meaning "before page 1", and an
optional page size. Supported sizes: letter (default), legal, a4, a3, a5,
tabloid.
exis_pdfeditor.insert_blank_pages(
"input.pdf", "with_blanks.pdf",
insertions=[
{"afterPage": 0, "size": "a4"}, # before page 1
{"afterPage": 3, "size": "letter"}, # after page 3
],
)
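To sanity-check a set of anchors before writing a file, the resulting page sequence can be modeled in plain Python. This sketch assumes every anchor refers to the original page numbering (the source does not say how multiple insertions interact); `plan_page_order` is a hypothetical helper.

```python
def plan_page_order(page_count, after_pages):
    """Return the output sequence: original page numbers plus "blank" markers."""
    blanks_after = {}
    for anchor in after_pages:
        if not 0 <= anchor <= page_count:
            raise ValueError(f"anchor {anchor} is outside 0..{page_count}")
        blanks_after[anchor] = blanks_after.get(anchor, 0) + 1

    order = ["blank"] * blanks_after.get(0, 0)  # anchor 0 = before page 1
    for page in range(1, page_count + 1):
        order.append(page)
        order.extend(["blank"] * blanks_after.get(page, 0))
    return order


if __name__ == "__main__":
    print(plan_page_order(4, [0, 3]))  # ['blank', 1, 2, 3, 'blank', 4]
```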
Images
Find all images
result = exis_pdfeditor.find_images("document.pdf")
print(f"Found {result.totalImages} images across {result.pagesSearched} pages")
for img in result.images:
    print(f" Image {img.index}: {img.pixelWidth}x{img.pixelHeight} "
          f"{img.colorSpace} {img.format}")
    print(f" Pages: {img.pageNumbers}")
Find images and save to disk
result = exis_pdfeditor.find_images("document.pdf", output_dir="extracted_images/")
Replace all images
result = exis_pdfeditor.replace_image(
"document.pdf", "replaced.pdf",
"new_logo.png",
)
print(f"Replaced {result.imagesReplaced} of {result.imagesFound} images")
Replace specific images by index or page
result = exis_pdfeditor.replace_image(
"document.pdf", "replaced.pdf",
"new_logo.jpg",
image_indices=[0, 2],
page_range=[1],
scale_mode="scale_to_fit", # match_original_size, preserve_aspect_ratio, scale_to_fit
)
Digital signatures
Sign a PDF
exis_pdfeditor.sign(
"document.pdf", "signed.pdf",
cert_path="certificate.pfx",
cert_password="certpass",
reason="Approved",
location="New York, NY",
signer_name="John Doe",
)
Sign with a visible signature
exis_pdfeditor.sign(
"document.pdf", "signed.pdf",
cert_path="certificate.pfx",
cert_password="certpass",
visible=True,
page=1,
rect={"x": 50, "y": 50, "width": 200, "height": 60},
reason="Reviewed and approved",
)
Verify signatures
info = exis_pdfeditor.verify("signed.pdf")
print(f"Signed: {info.isSigned}")
print(f"Signer: {info.signerName}")
print(f"Valid: {info.isValid}")
print(f"Reason: {info.reason}")
print(f"Date: {info.signDate}")
# Verify all signatures in a multi-signed document
signatures = exis_pdfeditor.verify("multi_signed.pdf", all_signatures=True)
for sig in signatures:
    print(f" {sig.signerName}: valid={sig.isValid}")
PDF/A compliance
Validate
result = exis_pdfeditor.pdfa_validate("document.pdf", level="2b")
print(f"Compliant: {result.isCompliant}")
if not result.isCompliant:
    for v in result.violations:
        print(f" [{v.code}] {v.message} (auto-fixable: {v.canAutoFix})")
Convert to PDF/A
exis_pdfeditor.pdfa_convert("document.pdf", "archive.pdf", level="2b")
Supported levels: 1b, 2b, 2u, 3b, 3u.
XMP + /Info metadata
PDFs carry document metadata in two places:
- XMP: the modern RDF/XML packet referenced from the Catalog /Metadata entry.
- /Info: the legacy trailer dictionary (Title, Author, Subject, Keywords, Creator, Producer, CreationDate, ModDate, plus custom keys).
get_metadata() is free and works on encrypted PDFs (XMP only).
All mutation calls (set_xmp, set_info, remove_xmp, remove_info, remove_metadata)
require a license and reject encrypted input — decrypt first with decrypt().
Read metadata
meta = exis_pdfeditor.get_metadata("document.pdf")
if meta.hasXmp:
    print(f"XMP packet: {meta.xmpByteSize} bytes")
    print(meta.xmpXml)  # Full <?xpacket ... ?> payload as UTF-8 text
if meta.hasInfo:
    info = meta.info
    print(f"Title: {info.title}")
    print(f"Author: {info.author}")
    print(f"Subject: {info.subject}")
    print(f"Keywords: {info.keywords}")
    print(f"Creator: {info.creator}")
    print(f"Producer: {info.producer}")
    print(f"Created: {info.creationDate}")  # ISO 8601 string or None
    print(f"Modified: {info.modificationDate}")  # same
    # Non-standard /Info keys (e.g. "Company", custom producer fields)
    for key, value in vars(info.custom).items():
        print(f" [custom] {key} = {value}")
Replace the XMP packet
Pass a full <?xpacket ...?>-wrapped RDF/XML document as a string:
xmp = """<?xpacket begin=""?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title><rdf:Alt><rdf:li xml:lang="x-default">Quarterly Report</rdf:li></rdf:Alt></dc:title>
<dc:creator><rdf:Seq><rdf:li>Finance Team</rdf:li></rdf:Seq></dc:creator>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>"""
exis_pdfeditor.set_xmp("input.pdf", "output.pdf", xmp)
Passing an empty string writes an empty packet (still present, no data).
To drop the /Metadata reference entirely, use remove_xmp().
Replace the /Info dictionary
The PDF must already have an /Info reference in its trailer; otherwise
this raises PdfEditorError. Fields you omit are written as missing (not
empty strings) so "not set" semantics are preserved.
from datetime import datetime, timezone
exis_pdfeditor.set_info(
"input.pdf", "output.pdf",
info={
"title": "Quarterly Report",
"author": "Finance Team",
"subject": "Q1 2026 earnings summary",
"keywords": "earnings, Q1, 2026",
"creator": "exis-pdfeditor",
"producer": "Exis.PdfEditor 3.6",
"creationDate": datetime(2026, 4, 1, 9, 0, tzinfo=timezone.utc),
"modificationDate": datetime.now(timezone.utc),
# Arbitrary non-standard /Info keys
"custom": {
"Company": "Exis LLC",
"Revision": "v3.6.4",
},
},
)
Dates can be provided as datetime / date objects or ISO 8601 strings
(e.g. "2026-04-20T09:00:00Z"). Snake_case aliases creation_date and
modification_date are also accepted.
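When the same dates also go into logs or sidecar files, it can be convenient to normalize them to one ISO 8601 form first. A sketch only: the library accepts the types above directly, `to_iso8601` is a hypothetical helper, and treating naive datetimes as UTC is an assumption made here.

```python
from datetime import date, datetime, timezone


def to_iso8601(value):
    """Normalize a str, datetime, or date to an ISO 8601 string."""
    if isinstance(value, str):
        return value  # assumed to be ISO 8601 already
    if isinstance(value, datetime):  # check before date: datetime subclasses date
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    if isinstance(value, date):
        return value.isoformat()
    raise TypeError(f"unsupported date value: {value!r}")


if __name__ == "__main__":
    print(to_iso8601(datetime(2026, 4, 20, 9, 0, tzinfo=timezone.utc)))
```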
Remove metadata (wipe for privacy)
# Drop the XMP packet only
exis_pdfeditor.remove_xmp("input.pdf", "stripped-xmp.pdf")
# Empty the /Info dict (trailer reference kept, contents cleared)
exis_pdfeditor.remove_info("input.pdf", "stripped-info.pdf")
# Drop both in one incremental update (smaller output than calling both)
exis_pdfeditor.remove_metadata("input.pdf", "stripped.pdf")
Diagnostic structure dump
When a PDF fails to process and you can't share the file, dump_structure() produces
a self-contained report you can paste into a bug report. It walks every object in the
file, tallies filter chains and font subtypes, lists encryption details, and records
any streams that fail to decode — without needing the original file. No license required.
Human-readable report
dump = exis_pdfeditor.dump_structure("problem.pdf")
# The dump has a built-in text report — just print it
# (or paste it into a support email / bug report)
print(dump)
Individual fields
dump = exis_pdfeditor.dump_structure("problem.pdf")
print(f"PDF version: {dump.version}")
print(f"Pages: {dump.pageCount}")
print(f"Total objects: {dump.totalObjects}")
print(f"Stream objects: {dump.streamObjectCount}")
print(f"Xref entries: {dump.xrefEntryCount}")
print(f"Xref streams: {dump.usesXrefStreams}")
print(f"Producer: {dump.producer}")
print(f"Creator: {dump.creator}")
# Encryption details
print(f"Encrypted: {dump.isEncrypted}")
if dump.isEncrypted:
    print(f" Version: {dump.encryptionVersion}")
    print(f" Revision: {dump.encryptionRevision}")
    print(f" Key length: {dump.encryptionKeyLengthBits} bits")
# Catalog flags
print(f"AcroForm: {dump.hasAcroForm}")
print(f"Signed: {dump.hasDigitalSignature}")
print(f"Embedded files: {dump.hasEmbeddedFiles}")
# Filter chains — which compression methods are used and how often
for chain in dump.filterChains:
    print(f" {chain}: {dump.filterChains[chain]} streams")
# Font subtypes
for subtype in dump.fontSubtypes:
    print(f" /{subtype}: {dump.fontSubtypes[subtype]}")
# Streams that failed to decode (capped at 50)
for bad in dump.unsupportedStreams:
    print(f" obj {bad.objectNumber} [{bad.filterChain}]: {bad.error}")
# Free-form notes about parse anomalies
for note in dump.notes:
    print(f" {note}")
Licensing
- Free: inspect(), analyze_pages(), pdfa_validate(), dump_structure(), and get_metadata() work without a license.
- Trial: Call exis_pdfeditor.initialize() with no key for a 14-day full-feature trial.
- Licensed: Pass your key to exis_pdfeditor.initialize("XXXX-XXXX-XXXX-XXXX") or set the EXIS_PDF_LICENSE_KEY environment variable.
- Evaluation: After trial expiry, all features work on documents up to 3 pages.
Purchase a license at pdfbatcheditor.com/developers.
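In deployment scripts it is common to resolve the key in one place. The sketch below looks for an explicit key first and then the EXIS_PDF_LICENSE_KEY variable; that precedence is an assumption, and `resolve_license_key` is a hypothetical helper rather than part of the package.

```python
import os


def resolve_license_key(explicit_key=None, environ=None):
    """Return the key to use, or None to fall back to trial/evaluation mode."""
    environ = os.environ if environ is None else environ
    return explicit_key or environ.get("EXIS_PDF_LICENSE_KEY") or None


if __name__ == "__main__":
    key = resolve_license_key()
    print("license key found" if key else "no key configured")
```

The result can be passed to `exis_pdfeditor.initialize(...)` when it is not None.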
Requirements
- Python 3.9+
- No external dependencies — the native binary is bundled in the wheel.