
Python wrapper for Exis.PdfEditor — comprehensive PDF toolkit (find/replace, merge, split, forms, redaction, watermark, encryption, signatures, PDF/A, and more)


Exis.PdfEditor for Python

Comprehensive PDF toolkit for Python — find/replace, merge, split, form filling, redaction, image editing, watermark, Bates stamping, page editing, encryption, optimization, digital signatures, PDF/A compliance, XMP + /Info metadata, OCR for scanned PDFs, and more.

Powered by the Exis.PdfEditor .NET library, compiled to a native binary via .NET Native AOT. No .NET runtime required.

Installation

pip install exis-pdfeditor

Platform-specific wheels are available for:

  • Windows x64 (win_amd64)
  • Linux x64 (manylinux_2_17_x86_64)
  • macOS ARM64 / Apple Silicon (macosx_11_0_arm64)

Quick start

import exis_pdfeditor

# Optional: set a license key (or use EXIS_PDF_LICENSE_KEY env var).
# Without a key, a 14-day trial is activated automatically.
exis_pdfeditor.initialize("XXXX-XXXX-XXXX-XXXX")

# Inspect a PDF (free — no license required)
info = exis_pdfeditor.inspect("document.pdf")
print(f"{info.pageCount} pages, title: {info.title}")

# Find and replace text
result = exis_pdfeditor.find_replace(
    "input.pdf", "output.pdf",
    "old text", "new text",
    case_sensitive=False,
)
print(f"{result.totalReplacements} replacements made")

Features

| Feature | Function |
|---|---|
| Inspect | inspect(path): metadata, fonts, pages, encryption status |
| Text extraction | extract_text(path), extract_text_structured(path) |
| Find & replace | find_replace(...): regex, case-insensitive, whole-word, styling |
| Merge | merge(paths, output) |
| Split | split(path, output_dir) |
| Extract pages | extract_pages(path, output, pages=[1,3,5]) |
| Form fields | list_fields(path, split_duplicate_widgets=...), fill_form(..., text_alignment=..., split_duplicate_widgets=..., flatten=...) |
| Redaction | redact(path, output, redactions) |
| Watermark | watermark(path, output, "DRAFT", position="across") |
| Stamp | stamp(path, output, stamp_pdf, mode="overlay") |
| Bates stamping | bates_stamp(path, output, prefix="ABC", digits=6) |
| Optimize | optimize(path, output, downsample_images=True) |
| Encrypt / Decrypt | encrypt(...), decrypt(...) |
| Page editing | rotate(...), crop(...), reorder(...), delete_pages(...), insert_blank_pages(...) |
| Images | find_images(path), replace_image(...) |
| Digital signatures | sign(...), verify(...) |
| PDF/A | pdfa_validate(path), pdfa_convert(path, output) |
| Metadata (XMP + /Info) | get_metadata(path), set_xmp(...), set_info(...), remove_xmp(...), remove_info(...), remove_metadata(...) |
| Page classification | analyze_pages(path): pre-OCR triage (free, no license) |
| OCR: searchable PDF (Windows) | make_searchable_pdf(path, output, languages=..., progress=...) |
| OCR: de-identify scans (Windows) | redact_scanned_pdf(path, output, terms=..., visible_replacement=...) |

Platform support

| Function group | Windows | Linux | macOS |
|---|---|---|---|
| All non-OCR functions (find/replace, merge, forms, redaction, watermark, optimization, signatures, PDF/A, metadata, …) | yes | yes | yes |
| analyze_pages (page classification) | yes | yes | yes |
| make_searchable_pdf, redact_scanned_pdf (OCR) | yes | not yet (raises OcrNotSupportedError) | not yet (raises OcrNotSupportedError) |

OCR is currently Windows-only because the page rasterizer (Exis.PdfOcr.Windows) depends on the WinRT Windows.Data.Pdf API and System.Drawing.Common. When a cross-platform rasterizer ships, the Python side will pick it up automatically.
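Because OCR availability depends on the platform, a caller may want to decide up front which path a document takes. The sketch below is one way to do that with the standard library only; `ocr_supported` and `choose_pipeline` are hypothetical helper names, and the returned labels are purely illustrative.

```python
import sys


def ocr_supported(platform: str = sys.platform) -> bool:
    """Return True when the OCR functions are expected to work.

    Per the platform table, only the Windows wheel includes OCR today.
    sys.platform reports "win32" on Windows (64-bit included).
    """
    return platform == "win32"


def choose_pipeline(scanned_pages: list, platform: str = sys.platform) -> str:
    """Pick a processing path for one document (labels are illustrative)."""
    if not scanned_pages:
        return "digital-only"   # nothing needs OCR
    if ocr_supported(platform):
        return "ocr"            # safe to call make_searchable_pdf here
    return "defer-ocr"          # e.g. queue the file for a Windows worker
```

The list of scanned pages could come from analyze_pages, which the table lists as available on every platform.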


Inspect

Retrieve metadata, page info, fonts, encryption status, and form field counts. No license required.

info = exis_pdfeditor.inspect("document.pdf")

print(f"Version:    {info.version}")
print(f"Pages:      {info.pageCount}")
print(f"Title:      {info.title}")
print(f"Author:     {info.author}")
print(f"Encrypted:  {info.isEncrypted}")
print(f"Has forms:  {info.hasFormFields} ({info.formFieldCount} fields)")
print(f"Fonts:      {info.fontsUsed}")

for page in info.pages:
    print(f"  Page {page.pageNumber}: {page.widthInPoints}x{page.heightInPoints} pt, "
          f"{page.characterCount} chars")

Text extraction

Plain text

result = exis_pdfeditor.extract_text("document.pdf")
print(result.fullText)

for page in result.pages:
    print(f"Page {page.pageNumber}: {page.text[:100]}...")

Specific pages only

result = exis_pdfeditor.extract_text("document.pdf", pages=[1, 3, 5])

Structured text (with positions and font data)

result = exis_pdfeditor.extract_text_structured("document.pdf")

for page in result.pages:
    for block in page.textBlocks:
        print(f"  [{block.x:.0f}, {block.y:.0f}] {block.fontName} {block.fontSize}pt: "
              f"{block.text}")

Find & replace

Single pair

result = exis_pdfeditor.find_replace(
    "input.pdf", "output.pdf",
    "old text", "new text",
)
print(f"{result.totalReplacements} replacements across {result.pagesModified} pages")

Multiple pairs

result = exis_pdfeditor.find_replace(
    "input.pdf", "output.pdf",
    pairs=[
        {"search": "foo", "replace": "bar"},
        {"search": "hello", "replace": "world"},
    ],
)

Regex

result = exis_pdfeditor.find_replace(
    "input.pdf", "output.pdf",
    pairs=[
        {"search": r"\d{3}-\d{2}-\d{4}", "replace": "XXX-XX-XXXX", "isRegex": True},
    ],
)
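Before running a regex pair against a real PDF, it can help to preview the pattern on sample text. The sketch below uses Python's own `re` module for that. This is an approximation: the library is built on .NET, so its regex flavor may differ from Python's in edge cases. `preview_replacements` is a hypothetical helper, not part of exis_pdfeditor.

```python
import re

SSN_PATTERN = r"\d{3}-\d{2}-\d{4}"


def preview_replacements(text: str, pattern: str, replacement: str):
    """Return (rewritten_text, matches) so a pattern can be checked on plain text."""
    matches = re.findall(pattern, text)
    return re.sub(pattern, replacement, text), matches


sample = "Applicant SSN 123-45-6789, phone 555-0100."
redacted, hits = preview_replacements(sample, SSN_PATTERN, "XXX-XX-XXXX")
print(redacted)  # the SSN is masked, the phone number is left alone
print(hits)
```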

All options

result = exis_pdfeditor.find_replace(
    "input.pdf", "output.pdf",
    "confidential", "[REDACTED]",
    case_sensitive=False,
    whole_word=True,
    use_regex=False,
    page_range=[1, 2, 3],
    text_fitting="adaptive",         # none, preserve_width, fit_to_page, adaptive
    min_horizontal_scale=70,
    max_font_size_reduction=1.5,
    replacement_text_color={"r": 1, "g": 0, "b": 0},
    replacement_highlight_color={"r": 1, "g": 1, "b": 0},
    replacement_bold=True,
    replacement_underline=False,
    replacement_strikethrough=False,
    preserve_form_fields=True,
    use_incremental_update=True,
)

for detail in result.details:
    print(f"  Page {detail.pageNumber}: '{detail.originalText}' -> '{detail.replacementText}'")

Merge

exis_pdfeditor.merge(["part1.pdf", "part2.pdf", "part3.pdf"], "merged.pdf")

Split

Split a PDF into one file per page:

result = exis_pdfeditor.split("document.pdf", "output_folder/")
print(f"Split into {result.pageCount} files")
for path in result.files:
    print(f"  {path}")

Extract pages

Extract specific pages into a new PDF:

exis_pdfeditor.extract_pages("document.pdf", "pages_1_and_3.pdf", pages=[1, 3])
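Page lists often arrive as user input such as "1, 3-5". A minimal sketch of turning that into the 1-based list that the pages= arguments in this README use; `parse_pages` is a hypothetical helper and the input syntax is an assumption of this example, not something the library defines.

```python
def parse_pages(spec: str) -> list[int]:
    """Parse "1, 3-5" into [1, 3, 4, 5], keeping first-seen order without duplicates."""
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            candidates = range(start, end + 1)
        else:
            candidates = [int(part)]
        for n in candidates:
            if n < 1:
                raise ValueError(f"page numbers are 1-based, got {n}")
            if n not in pages:
                pages.append(n)
    return pages
```

The result can then be passed along, for example as pages=parse_pages("1, 3-5").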

Form fields

Each field has a PDF name, optional displayName (smart label from nearby page text), type, value, options (choice fields), isReadOnly, and hasDuplicateWidgets (true when one logical field has multiple widgets at different positions — e.g. two-up receipts). Pass split_duplicate_widgets=True to list_fields / fill_form to use per-widget keys name_1, name_2, … (page-reading order).

List all fields

fields = exis_pdfeditor.list_fields("form.pdf")

for field in fields:
    print(f"  {field.name} ({field.type}): {field.value}")
    if field.displayName:
        print(f"    Label: {field.displayName}")
    if field.options:
        print(f"    Options: {field.options}")
    if field.hasDuplicateWidgets:
        print("    (duplicate widgets — use split_duplicate_widgets=True for per-widget names)")

List with duplicate-widget split (two-up / carbon copy)

fields = exis_pdfeditor.list_fields("two-up-receipt.pdf", split_duplicate_widgets=True)
# e.g. Address_1, Address_2, date_1, date_2, ...

Fill a form

result = exis_pdfeditor.fill_form(
    "form.pdf", "filled.pdf",
    fields={
        "FirstName": "John",
        "LastName": "Doe",
        "Email": "john@example.com",
        "AgreeToTerms": "Yes",
    },
)
print(f"Filled {result.fieldsFilled} fields, {result.fieldsNotFound} not found")

Force text alignment (default honors each field's PDF /Q)

result = exis_pdfeditor.fill_form(
    "form.pdf", "filled_centered.pdf",
    fields={"Comments": "Centered text"},
    text_alignment="center",  # "auto" | "left" | "center" | "right"
)

Fill duplicate-widget fields (different value per widget; auto-flattens)

result = exis_pdfeditor.fill_form(
    "two-up-receipt.pdf", "filled.pdf",
    fields={
        "Address_1": "Mr. David G Cruz",
        "date_1": "10-Jan-67",
        "Address_2": "Ms. Claudia Morales",
        "date_2": "22-Feb-26",
    },
    split_duplicate_widgets=True,
)
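When the values for each copy of a two-up form come from separate records, the per-widget keys can be generated rather than typed by hand. A minimal sketch, assuming the name_1, name_2, … convention described above and that records are supplied in page-reading order; `per_widget_fields` is a hypothetical helper.

```python
def per_widget_fields(records: list[dict]) -> dict[str, str]:
    """Flatten one record per widget copy into name_1, name_2, ... keys."""
    fields = {}
    for index, record in enumerate(records, start=1):
        for name, value in record.items():
            fields[f"{name}_{index}"] = value
    return fields


fields = per_widget_fields([
    {"Address": "Mr. David G Cruz", "date": "10-Jan-67"},
    {"Address": "Ms. Claudia Morales", "date": "22-Feb-26"},
])
```

The resulting dict has the same shape as the fields= argument in the call above.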

Fill and flatten (make fields non-editable)

result = exis_pdfeditor.fill_form(
    "form.pdf", "filled_flat.pdf",
    fields={"Name": "Jane Doe"},
    flatten=True,
)

Redaction

Redact text

result = exis_pdfeditor.redact(
    "document.pdf", "redacted.pdf",
    redactions=[
        {"text": "John Doe", "replaceWith": "[NAME REDACTED]"},
        {"text": r"\d{3}-\d{2}-\d{4}", "isRegex": True, "replaceWith": "[SSN]"},
    ],
)
print(f"{result.redactionsApplied} redactions applied")

Redact a specific area on a page

result = exis_pdfeditor.redact(
    "document.pdf", "redacted.pdf",
    redactions=[
        {
            "area": {"x": 100, "y": 700, "width": 200, "height": 20},
            "pageNumber": 1,
        },
    ],
)

Combined text + area redaction

result = exis_pdfeditor.redact(
    "document.pdf", "redacted.pdf",
    redactions=[
        {"text": "confidential", "caseSensitive": False},
        {"area": {"x": 50, "y": 50, "width": 500, "height": 30}, "pageNumber": 2},
    ],
)
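For longer term lists, the redactions argument can be assembled programmatically. The sketch below builds entries using only the keys shown in the examples above (text, isRegex, replaceWith). `build_redactions` is a hypothetical helper; it compiles each pattern with Python's `re` purely as a sanity check, which may not match the library's own regex flavor exactly.

```python
import re


def build_redactions(literals=(), patterns=(), replace_with=None):
    """Build a redactions list from literal terms and regex patterns."""
    entries = []
    for text in literals:
        entries.append({"text": text})
    for pattern in patterns:
        re.compile(pattern)  # raises re.error early on a malformed pattern
        entries.append({"text": pattern, "isRegex": True})
    if replace_with is not None:
        for entry in entries:
            entry["replaceWith"] = replace_with
    return entries


redactions = build_redactions(
    literals=["John Doe"],
    patterns=[r"\d{3}-\d{2}-\d{4}"],
    replace_with="[REDACTED]",
)
```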

Page classification (pre-OCR triage)

analyze_pages classifies every page as Digital, Scanned, AlreadyOcrd, or Empty purely from the content stream — no rendering and no OCR. Use it to predict which pages will need OCR before paying for a run. Free, no license required, available on every wheel.

pages = exis_pdfeditor.analyze_pages("mixed.pdf")
for p in pages:
    print(f"  page {p.pageNumber}: {p.kind}  "
          f"(text {p.textCoverageRatio:.0%}, images {p.imageCoverageRatio:.0%})")

scanned = [p.pageNumber for p in pages if p.kind == "Scanned"]
if scanned:
    print(f"{len(scanned)} pages need OCR: {scanned}")

Each entry has pageNumber, kind, textCharCount, textCoverageRatio, imageCoverageRatio, and hasInvisibleTextLayer (signature of an already-OCR'd page).
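The entries lend themselves to a quick per-kind summary before deciding whether an OCR run is worthwhile. A minimal sketch: `pages_by_kind` and `needs_ocr` are hypothetical helpers, and the sample objects are stand-ins with invented values, shaped like the entries described above.

```python
from collections import defaultdict
from types import SimpleNamespace


def pages_by_kind(pages):
    """Group page numbers by their classification label."""
    groups = defaultdict(list)
    for p in pages:
        groups[p.kind].append(p.pageNumber)
    return dict(groups)


def needs_ocr(pages):
    """True when at least one page is classified as Scanned."""
    return any(p.kind == "Scanned" for p in pages)


# Stand-in objects shaped like analyze_pages entries (values invented).
sample = [
    SimpleNamespace(pageNumber=1, kind="Digital"),
    SimpleNamespace(pageNumber=2, kind="Scanned"),
    SimpleNamespace(pageNumber=3, kind="AlreadyOcrd"),
    SimpleNamespace(pageNumber=4, kind="Scanned"),
]
summary = pages_by_kind(sample)
```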

Searchable PDFs (OCR) — Windows only

make_searchable_pdf adds an invisible, selectable text layer over scanned pages so find/replace, extraction, and redaction work on the result like any digital PDF. Born-digital and already-OCR'd pages are passed through untouched — only true scans are rasterized and recognized, which is fast on mixed documents.

result = exis_pdfeditor.make_searchable_pdf(
    "scan.pdf", "searchable.pdf",
    languages=("eng",),                   # default
    progress=lambda p: print(f"  page {p.page}/{p.total}: {p.phase}"),
)

print(f"OCR'd {result.pagesProcessed}, skipped {result.pagesSkipped}, "
      f"avg confidence {result.averageConfidence:.0%}")

# Confidence is never swallowed — flag low-confidence pages for human review.
for page in result.pages:
    if page.wasOcrd and page.confidence < 0.80:
        print(f"  page {page.pageNumber} needs review (conf {page.confidence:.0%})")

Additional languages

The wheel ships English (eng.traineddata). To add languages, drop the matching xxx.traineddata files into the wheel's bundled tessdata/ folder, or keep them in a folder of your own and pass its location as tessdata_path:

result = exis_pdfeditor.make_searchable_pdf(
    "scan.pdf", "searchable.pdf",
    languages=("eng", "spa", "vie"),
    tessdata_path=r"C:\my-app\tessdata",
)

The invisible text layer embeds a Unicode font (DejaVu Sans), so accented and non-Latin names (e.g. Nguyễn, Peña) survive find/replace intact rather than being mangled.

Options

| Parameter | Default | Notes |
|---|---|---|
| languages | ("eng",) | Tesseract language codes |
| dpi | 300 | Rendering DPI for non-image pages (full-page scans use their native resolution) |
| apply_deskew | True | Pre-process hint (engine-dependent) |
| apply_denoise | False | Pre-process hint (engine-dependent) |
| min_confidence_to_include | 0.0 | Words below this are still included (dropping them would risk leaking names you need to match) but counted in per-page confidence |
| tessdata_path | bundled | Override the trained-data folder |
| progress | None | Callback invoked with a progress object exposing page, total, phase, and confidence |

De-identify scanned pages — Windows only

redact_scanned_pdf finds terms on scanned pages via OCR and burns opaque boxes into the actual page image. Unlike OCR-layer redaction (which only changes selectable text), this destroys the underlying pixels — the term is gone from the visible scan and unrecoverable, appropriate for sending de-identified scans externally.

r = exis_pdfeditor.redact_scanned_pdf(
    "intake_scan.pdf", "intake_redacted.pdf",
    terms=["Jane Doe", "555-12-3456"],
    visible_replacement="[REDACTED]",     # drawn into the raster
    fill="white",                         # or "black"
)

print(f"Redacted {r.occurrencesRedacted} occurrences on {r.pagesAffected} pages")
if r.pagesSkipped:
    print(f"WARNING: {r.pagesSkipped} pages could not be safely burned — "
          "review them before sending the output.")

for occ in r.occurrences:
    print(f"  page {occ.pageNumber}: {occ.term!r} (conf {occ.confidence:.0%})")

Scope: this handles image-based pages. Born-digital text pages are NOT modified — use redact(...) for those.

Pages that are image-based but not a single full-page image are reported in pagesSkipped and left unchanged. A non-zero value means the output is not safe to send as-is.

The output is rewritten via the optimizer to drop the superseded original image objects, so the un-redacted pixels are not recoverable with PDF forensics.
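Since a non-zero pagesSkipped means the output is not safe to send, a pipeline may want an explicit gate before release. The sketch below is one possible check. `release_problems` is a hypothetical helper, the sample result is a stand-in with invented values, and it assumes occ.term holds the term string exactly as it was passed in.

```python
from types import SimpleNamespace


def release_problems(result, expected_terms):
    """Return a list of reasons the redacted output should be held for review."""
    problems = []
    if result.pagesSkipped:
        problems.append(
            f"{result.pagesSkipped} page(s) were skipped and left unchanged"
        )
    matched = {occ.term for occ in result.occurrences}
    for term in expected_terms:
        if term not in matched:
            # Not necessarily an error, but worth a manual look.
            problems.append(f"no occurrence recorded for {term!r}")
    return problems


# Stand-in shaped like a redact_scanned_pdf result (values invented).
fake = SimpleNamespace(
    pagesSkipped=1,
    occurrences=[SimpleNamespace(term="Jane Doe", pageNumber=1, confidence=0.97)],
)
issues = release_problems(fake, ["Jane Doe", "555-12-3456"])
```

An empty list means neither check fired; it does not replace a human review of sensitive output.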

Watermark

Basic diagonal watermark

result = exis_pdfeditor.watermark("input.pdf", "watermarked.pdf", "DRAFT")
print(f"Watermarked {result.pagesWatermarked} of {result.totalPages} pages")

All watermark options

result = exis_pdfeditor.watermark(
    "input.pdf", "watermarked.pdf",
    "CONFIDENTIAL",
    position="across",          # top, bottom, center, across
    font_size=72,
    text_color={"r": 1, "g": 0, "b": 0},
    opacity=0.15,
    page_range=[1, 2],
)
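Colors are passed as r/g/b dicts. If your palette is stored as hex strings, a small converter keeps call sites readable. This sketch assumes the components are floats in the 0 to 1 range, which matches the examples here (1 and 0 for pure red) but is not stated explicitly; `hex_to_rgb` is a hypothetical helper.

```python
def hex_to_rgb(color: str) -> dict[str, float]:
    """Convert "#RRGGBB" (or "RRGGBB") to an {"r", "g", "b"} dict of 0-1 floats."""
    digits = color.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {color!r}")
    # int(..., 16) raises ValueError on non-hex characters.
    r, g, b = (int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return {"r": r, "g": g, "b": b}
```

For example, text_color=hex_to_rgb("#FF0000") would produce the same red used above.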

Stamp (PDF overlay / underlay)

Overlay a letterhead on top

result = exis_pdfeditor.stamp(
    "document.pdf", "stamped.pdf",
    "letterhead.pdf",
    mode="overlay",
)
print(f"Stamped {result.pagesStamped} pages")

Underlay a background behind content

result = exis_pdfeditor.stamp(
    "document.pdf", "stamped.pdf",
    "background.pdf",
    mode="underlay",
    opacity=0.5,
    page_range=[1],
)

Bates stamping

Sequential page numbering for legal production and discovery workflows. Each page receives a zero-padded identifier (e.g. ABC000001, ABC000002) rendered in a chosen corner of the visual page. Placement is relative to the page's /Rotate orientation, so mixed-rotation documents render consistently. An XMP audit block recording the range, digit width, and prefix/suffix is written to the document catalog by default.

Defaults (number starts at 1, 6 digits, bottom-right corner)

result = exis_pdfeditor.bates_stamp("input.pdf", "stamped.pdf")
print(f"Stamped pages {result.firstNumber}-{result.lastNumber}")
print(f"Digits used: {result.digitsUsed}")

Prefix, custom position, color, confidentiality label

result = exis_pdfeditor.bates_stamp(
    "input.pdf", "stamped.pdf",
    prefix="ABC",
    start_number=1,
    digits=6,                                 # -> "ABC000001"
    position="bottom_right",                  # top_left, top_center, top_right,
                                              # bottom_left, bottom_center, bottom_right
    font_size=10,
    text_color={"r": 0, "g": 0, "b": 0},
    background_color={"r": 1, "g": 1, "b": 1},  # opaque box behind stamp
    margin_inches=0.5,
    confidentiality_label="CONFIDENTIAL",     # stacked above the Bates number
)

Continuous numbering across a batch

Thread lastNumber + 1 into the next call:

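docs = ["part1.pdf", "part2.pdf"]             # input files, in production order
next_n = 1
for path in docs:
    r = exis_pdfeditor.bates_stamp(
        path, path + ".stamped.pdf",
        prefix="ABC", start_number=next_n,
    )
    next_n = r.lastNumber + 1                 # continue where this file ended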

Skip the cover page (legal convention)

The cover is ABC000001 in the production log even if physically unstamped:

exis_pdfeditor.bates_stamp(
    "input.pdf", "stamped.pdf",
    prefix="ABC",
    skip_first_page=True,
    counter_advances_on_skipped_pages=True,   # default
)
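The interaction between the two flags is easier to see with a small simulation. The sketch below models one reading of the behaviour described above: a skipped cover either consumes a number or does not. `plan_numbers` is a hypothetical helper written for illustration, not the library's implementation.

```python
def plan_numbers(page_count, start_number=1, skip_first_page=False,
                 counter_advances_on_skipped_pages=True):
    """Map each 1-based page to its Bates number, or None if left unstamped."""
    plan = {}
    counter = start_number
    for page in range(1, page_count + 1):
        if skip_first_page and page == 1:
            plan[page] = None                  # cover page stays unstamped
            if counter_advances_on_skipped_pages:
                counter += 1                   # cover still consumes a number
        else:
            plan[page] = counter
            counter += 1
    return plan


with_gap = plan_numbers(3, skip_first_page=True)
no_gap = plan_numbers(3, skip_first_page=True,
                      counter_advances_on_skipped_pages=False)
```

With the default, page 2 carries number 2, so the cover implicitly owns number 1 in the production log; with the flag off, page 2 carries number 1.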

Stamp only selected pages

exis_pdfeditor.bates_stamp(
    "input.pdf", "stamped.pdf",
    page_range=[2, 3, 5],                     # 1-based
)

Signed input

Stamping invalidates signatures — opt in explicitly:

result = exis_pdfeditor.bates_stamp(
    "signed.pdf", "stamped.pdf",
    allow_signed_input=True,                  # result.warnings will record it
)
for w in result.warnings:
    print(f"  warning: {w}")

Suppress the XMP audit block

exis_pdfeditor.bates_stamp(
    "input.pdf", "stamped.pdf",
    write_xmp_metadata=False,
)

All options at a glance

| Parameter | Default | Purpose |
|---|---|---|
| prefix / suffix | "" | Text bracketing the number |
| start_number | 1 | First Bates number (for batch continuation) |
| digits | 6 | Minimum zero-padded width; auto-expands if needed |
| position | "bottom_right" | Corner/edge on the visual page |
| font_size | 10.0 | Point size (Helvetica) |
| text_color | black | RGB dict {"r": 0, "g": 0, "b": 0} |
| background_color | none | Opaque rectangle behind stamp for legibility |
| margin_inches | 0.5 | Distance from trimmed page edge |
| confidentiality_label | none | e.g. "CONFIDENTIAL" |
| confidentiality_position | same corner | Override to place label elsewhere |
| confidentiality_font_size | matches font_size | Label point size |
| page_range | all pages | 1-based page numbers to stamp |
| skip_first_page | False | Skip page 1 (cover sheets) |
| counter_advances_on_skipped_pages | True | Advance counter on skipped pages |
| allow_signed_input | False | Stamp PDFs with digital signatures |
| write_xmp_metadata | True | Write an XMP audit block to the catalog |

Result fields: firstNumber, lastNumber, pagesStamped, digitsUsed (equals digits unless auto-expanded), warnings (non-fatal diagnostics: digit expansion, signed-input stamped anyway, etc.).
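To reproduce the identifiers in a production log without opening the PDFs, the label format can be sketched in plain Python. This assumes that auto-expansion widens every label to the width of the largest number, which is a guess at what digitsUsed reports; `digits_needed` and `bates_label` are hypothetical helpers.

```python
def digits_needed(last_number: int, digits: int = 6) -> int:
    """Width required so the largest number still fits (never below `digits`)."""
    return max(digits, len(str(last_number)))


def bates_label(number: int, prefix: str = "", suffix: str = "",
                digits: int = 6) -> str:
    """Format one identifier, e.g. prefix "ABC" and number 1 give "ABC000001"."""
    return f"{prefix}{number:0{digits}d}{suffix}"


first = bates_label(1, prefix="ABC")
width = digits_needed(1_000_000)           # seven digits no longer fit in six
wide = bates_label(2, prefix="ABC", digits=width)
```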

Optimize

Default optimization (compress + deduplicate)

result = exis_pdfeditor.optimize("large.pdf", "smaller.pdf")
print(f"Reduced {result.originalSize:,} -> {result.optimizedSize:,} bytes "
      f"({result.reductionPercent:.1f}% smaller)")
print(f"  Streams compressed: {result.streamsCompressed}")
print(f"  Duplicates removed: {result.duplicatesRemoved}")

With image downsampling

result = exis_pdfeditor.optimize(
    "large.pdf", "smaller.pdf",
    downsample_images=True,
    max_image_dpi=150,
    remove_metadata=True,
)
print(f"Images downsampled: {result.imagesDownsampled}")

Encrypt & decrypt

Encrypt with a password

exis_pdfeditor.encrypt(
    "document.pdf", "protected.pdf",
    user_password="openme",
    owner_password="secret",
    permissions=["Print", "CopyText"],
)

Available permissions: Print, ModifyContents, CopyText, AddAnnotations, FillForms, PrintHighQuality, All.

Decrypt

exis_pdfeditor.decrypt("protected.pdf", "unlocked.pdf", password="openme")

Page editing

Rotate pages

result = exis_pdfeditor.rotate("input.pdf", "rotated.pdf", angle=90)
print(f"Rotated {result.pagesModified} pages")

# Rotate only specific pages
result = exis_pdfeditor.rotate("input.pdf", "rotated.pdf", angle=180, pages=[2, 4])

Crop pages

result = exis_pdfeditor.crop(
    "input.pdf", "cropped.pdf",
    rect={"x": 50, "y": 50, "width": 500, "height": 700},
)
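Rectangles are easier to reason about when derived from physical margins. The sketch below assumes the rect is expressed in PDF points (1/72 inch) with the origin at the bottom-left of the page; the inspect output reports page sizes in points, but the unit of crop's rect is an assumption here. All helper names are hypothetical.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def inches(value: float) -> float:
    """Convert inches to PDF points."""
    return value * POINTS_PER_INCH


def mm(value: float) -> float:
    """Convert millimetres to PDF points."""
    return value / MM_PER_INCH * POINTS_PER_INCH


def rect_from_margins(page_width: float, page_height: float, margin: float) -> dict:
    """Build a rect that trims the same margin (in points) from every edge."""
    return {
        "x": margin,
        "y": margin,
        "width": page_width - 2 * margin,
        "height": page_height - 2 * margin,
    }


# US Letter is 612 x 792 points; trim half an inch all round.
letter_crop = rect_from_margins(612, 792, inches(0.5))
```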

Reorder pages

exis_pdfeditor.reorder("input.pdf", "reordered.pdf", order=[3, 1, 2])

Delete pages

exis_pdfeditor.delete_pages("input.pdf", "trimmed.pdf", pages=[2, 4])

Insert blank pages

Each insertion specifies an afterPage anchor, the 1-based number of the page that the blank page follows (0 inserts before page 1), and an optional page size. Supported sizes: letter (default), legal, a4, a3, a5, tabloid.

exis_pdfeditor.insert_blank_pages(
    "input.pdf", "with_blanks.pdf",
    insertions=[
        {"afterPage": 0, "size": "a4"},       # before page 1
        {"afterPage": 3, "size": "letter"},   # after page 3
    ],
)

Images

Find all images

result = exis_pdfeditor.find_images("document.pdf")
print(f"Found {result.totalImages} images across {result.pagesSearched} pages")

for img in result.images:
    print(f"  Image {img.index}: {img.pixelWidth}x{img.pixelHeight} "
          f"{img.colorSpace} {img.format}")
    print(f"    Pages: {img.pageNumbers}")

Find images and save to disk

result = exis_pdfeditor.find_images("document.pdf", output_dir="extracted_images/")

Replace all images

result = exis_pdfeditor.replace_image(
    "document.pdf", "replaced.pdf",
    "new_logo.png",
)
print(f"Replaced {result.imagesReplaced} of {result.imagesFound} images")

Replace specific images by index or page

result = exis_pdfeditor.replace_image(
    "document.pdf", "replaced.pdf",
    "new_logo.jpg",
    image_indices=[0, 2],
    page_range=[1],
    scale_mode="scale_to_fit",  # match_original_size, preserve_aspect_ratio, scale_to_fit
)

Digital signatures

Sign a PDF

exis_pdfeditor.sign(
    "document.pdf", "signed.pdf",
    cert_path="certificate.pfx",
    cert_password="certpass",
    reason="Approved",
    location="New York, NY",
    signer_name="John Doe",
)

Sign with a visible signature

exis_pdfeditor.sign(
    "document.pdf", "signed.pdf",
    cert_path="certificate.pfx",
    cert_password="certpass",
    visible=True,
    page=1,
    rect={"x": 50, "y": 50, "width": 200, "height": 60},
    reason="Reviewed and approved",
)

Verify signatures

info = exis_pdfeditor.verify("signed.pdf")
print(f"Signed: {info.isSigned}")
print(f"Signer: {info.signerName}")
print(f"Valid:  {info.isValid}")
print(f"Reason: {info.reason}")
print(f"Date:   {info.signDate}")

# Verify all signatures in a multi-signed document
signatures = exis_pdfeditor.verify("multi_signed.pdf", all_signatures=True)
for sig in signatures:
    print(f"  {sig.signerName}: valid={sig.isValid}")

PDF/A compliance

Validate

result = exis_pdfeditor.pdfa_validate("document.pdf", level="2b")
print(f"Compliant: {result.isCompliant}")

if not result.isCompliant:
    for v in result.violations:
        print(f"  [{v.code}] {v.message} (auto-fixable: {v.canAutoFix})")

Convert to PDF/A

exis_pdfeditor.pdfa_convert("document.pdf", "archive.pdf", level="2b")

Supported levels: 1b, 2b, 2u, 3b, 3u.

XMP + /Info metadata

PDFs carry document metadata in two places:

  • XMP — the modern RDF/XML packet referenced from the Catalog /Metadata entry.
  • /Info — the legacy trailer dictionary (Title, Author, Subject, Keywords, Creator, Producer, CreationDate, ModDate, plus custom keys).

get_metadata() is free and works on encrypted PDFs (XMP only). All mutation calls (set_xmp, set_info, remove_xmp, remove_info, remove_metadata) require a license and reject encrypted input — decrypt first with decrypt().

Read metadata

meta = exis_pdfeditor.get_metadata("document.pdf")

if meta.hasXmp:
    print(f"XMP packet: {meta.xmpByteSize} bytes")
    print(meta.xmpXml)          # Full <?xpacket ... ?> payload as UTF-8 text

if meta.hasInfo:
    info = meta.info
    print(f"Title:    {info.title}")
    print(f"Author:   {info.author}")
    print(f"Subject:  {info.subject}")
    print(f"Keywords: {info.keywords}")
    print(f"Creator:  {info.creator}")
    print(f"Producer: {info.producer}")
    print(f"Created:  {info.creationDate}")        # ISO 8601 string or None
    print(f"Modified: {info.modificationDate}")    # same

    # Non-standard /Info keys (e.g. "Company", custom producer fields)
    for key, value in vars(info.custom).items():
        print(f"  [custom] {key} = {value}")

Replace the XMP packet

Pass a full <?xpacket ...?>-wrapped RDF/XML document as a string:

xmp = """<?xpacket begin=""?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Quarterly Report</rdf:li></rdf:Alt></dc:title>
      <dc:creator><rdf:Seq><rdf:li>Finance Team</rdf:li></rdf:Seq></dc:creator>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>"""

exis_pdfeditor.set_xmp("input.pdf", "output.pdf", xmp)

Passing an empty string writes an empty packet (still present, no data). To drop the /Metadata reference entirely, use remove_xmp().
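When the title or creator comes from user data, the packet should be generated with XML escaping rather than pasted together by hand. A minimal sketch that mirrors the packet layout shown above, using only the standard library; `build_xmp` is a hypothetical helper and covers just the two Dublin Core properties from that example.

```python
from xml.sax.saxutils import escape


def build_xmp(title: str, creators: list[str]) -> str:
    """Return an xpacket-wrapped XMP document with dc:title and dc:creator."""
    creator_items = "".join(f"<rdf:li>{escape(c)}</rdf:li>" for c in creators)
    return (
        '<?xpacket begin=""?>\n'
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">\n'
        '  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">\n'
        '    <rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/">\n'
        '      <dc:title><rdf:Alt><rdf:li xml:lang="x-default">'
        f'{escape(title)}</rdf:li></rdf:Alt></dc:title>\n'
        f'      <dc:creator><rdf:Seq>{creator_items}</rdf:Seq></dc:creator>\n'
        '    </rdf:Description>\n'
        '  </rdf:RDF>\n'
        '</x:xmpmeta>\n'
        '<?xpacket end="w"?>'
    )


xmp = build_xmp("R&D <Q1> Report", ["Finance & Ops"])
```

The returned string can be passed as the third argument to set_xmp in place of the literal above.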

Replace the /Info dictionary

The PDF must already have an /Info reference in its trailer; otherwise this raises PdfEditorError. Fields you omit are written as missing (not empty strings) so "not set" semantics are preserved.

from datetime import datetime, timezone

exis_pdfeditor.set_info(
    "input.pdf", "output.pdf",
    info={
        "title":    "Quarterly Report",
        "author":   "Finance Team",
        "subject":  "Q1 2026 earnings summary",
        "keywords": "earnings, Q1, 2026",
        "creator":  "exis-pdfeditor",
        "producer": "Exis.PdfEditor 3.6",
        "creationDate":     datetime(2026, 4, 1, 9, 0, tzinfo=timezone.utc),
        "modificationDate": datetime.now(timezone.utc),
        # Arbitrary non-standard /Info keys
        "custom": {
            "Company":  "Exis LLC",
            "Revision": "v3.6.4",
        },
    },
)

Dates can be provided as datetime / date objects or ISO 8601 strings (e.g. "2026-04-20T09:00:00Z"). Snake_case aliases creation_date and modification_date are also accepted.
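If you prefer to pass strings, the ISO 8601 form shown above can be produced from a timezone-aware datetime with the standard library. A minimal sketch; `to_iso_utc` is a hypothetical helper, and rejecting naive datetimes is a design choice of this example rather than a library requirement.

```python
from datetime import datetime, timedelta, timezone


def to_iso_utc(value: datetime) -> str:
    """Format an aware datetime as e.g. "2026-04-20T09:00:00Z"."""
    if value.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone first")
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


utc_stamp = to_iso_utc(datetime(2026, 4, 20, 9, 0, tzinfo=timezone.utc))
# A local time two hours ahead of UTC normalizes to the same instant.
local_stamp = to_iso_utc(
    datetime(2026, 4, 20, 11, 0, tzinfo=timezone(timedelta(hours=2)))
)
```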

Remove metadata (wipe for privacy)

# Drop the XMP packet only
exis_pdfeditor.remove_xmp("input.pdf", "stripped-xmp.pdf")

# Empty the /Info dict (trailer reference kept, contents cleared)
exis_pdfeditor.remove_info("input.pdf", "stripped-info.pdf")

# Drop both in one incremental update (smaller output than calling both)
exis_pdfeditor.remove_metadata("input.pdf", "stripped.pdf")

Diagnostic structure dump

When a PDF fails to process and you can't share the file, dump_structure() produces a self-contained report you can paste into a bug report. It walks every object in the file, tallies filter chains and font subtypes, lists encryption details, and records any streams that fail to decode — without needing the original file. No license required.

Human-readable report

dump = exis_pdfeditor.dump_structure("problem.pdf")

# The dump has a built-in text report — just print it
# (or paste it into a support email / bug report)
print(dump)

Individual fields

dump = exis_pdfeditor.dump_structure("problem.pdf")

print(f"PDF version:    {dump.version}")
print(f"Pages:          {dump.pageCount}")
print(f"Total objects:  {dump.totalObjects}")
print(f"Stream objects: {dump.streamObjectCount}")
print(f"Xref entries:   {dump.xrefEntryCount}")
print(f"Xref streams:   {dump.usesXrefStreams}")
print(f"Producer:       {dump.producer}")
print(f"Creator:        {dump.creator}")

# Encryption details
print(f"Encrypted:      {dump.isEncrypted}")
if dump.isEncrypted:
    print(f"  Version:      {dump.encryptionVersion}")
    print(f"  Revision:     {dump.encryptionRevision}")
    print(f"  Key length:   {dump.encryptionKeyLengthBits} bits")

# Catalog flags
print(f"AcroForm:       {dump.hasAcroForm}")
print(f"Signed:         {dump.hasDigitalSignature}")
print(f"Embedded files: {dump.hasEmbeddedFiles}")

# Filter chains — which compression methods are used and how often
for chain in dump.filterChains:
    print(f"  {chain}: {dump.filterChains[chain]} streams")

# Font subtypes
for subtype in dump.fontSubtypes:
    print(f"  /{subtype}: {dump.fontSubtypes[subtype]}")

# Streams that failed to decode (capped at 50)
for bad in dump.unsupportedStreams:
    print(f"  obj {bad.objectNumber} [{bad.filterChain}]: {bad.error}")

# Free-form notes about parse anomalies
for note in dump.notes:
    print(f"  {note}")

Licensing

  • Free: inspect(), analyze_pages(), pdfa_validate(), dump_structure(), and get_metadata() work without a license.
  • Trial: Call exis_pdfeditor.initialize() with no key for a 14-day full-feature trial.
  • Licensed: Pass your key to exis_pdfeditor.initialize("XXXX-XXXX-XXXX-XXXX") or set the EXIS_PDF_LICENSE_KEY environment variable.
  • Evaluation: After trial expiry, all features work on documents up to 3 pages.

Purchase a license at pdfbatcheditor.com/developers.
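An application may want to know at startup whether it is about to run licensed or fall back to the trial, for example to log a warning. The sketch below resolves the key from an explicit value or the EXIS_PDF_LICENSE_KEY variable named above. `resolve_license_key` is a hypothetical helper; the library reads the variable on its own, so this is only for your own diagnostics.

```python
import os


def resolve_license_key(explicit=None, environ=os.environ):
    """Return the key to use, preferring an explicit value, or None if unset."""
    key = explicit or environ.get("EXIS_PDF_LICENSE_KEY")
    if key and key.strip():
        return key.strip()
    return None


key = resolve_license_key(environ={"EXIS_PDF_LICENSE_KEY": " XXXX-XXXX-XXXX-XXXX "})
mode = "licensed" if key else "trial or evaluation"
```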

Requirements

  • Python 3.9+
  • No external dependencies — the native binary is bundled in the wheel.

Built distributions

Release 3.8.0 ships wheels only; no source distribution is available.

| File | Size | Target |
|---|---|---|
| exis_pdfeditor-3.8.0-py3-none-win_amd64.whl | 48.8 MB | Python 3, Windows x86-64 |
| exis_pdfeditor-3.8.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl | 4.8 MB | Python 3, manylinux: glibc 2.17+ x86-64 |
| exis_pdfeditor-3.8.0-py3-none-macosx_11_0_arm64.whl | 4.7 MB | Python 3, macOS 11.0+ ARM64 |

File hashes

Hashes for exis_pdfeditor-3.8.0-py3-none-win_amd64.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | eaf9561cef09e8aeae7d295ecde96cbfc514847b661264079de6d53219749293 |
| MD5 | e34e08a5809da565845a05624d97536c |
| BLAKE2b-256 | 9796155c3328101322d91d993e6a12a6d8d97899792807b9623b348a95d82187 |

Hashes for exis_pdfeditor-3.8.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 3453377bf2a6eefe8e2d5f2c7e25f72653c3c758fbb1b2d37582c6496c52f3bc |
| MD5 | 0ca3c005cfef4d8021b8dcfb673a8216 |
| BLAKE2b-256 | 75ee5985592478b85b37b34cb22cdc09eea31b2a0fb2e5f5c50296b44e484d10 |

Hashes for exis_pdfeditor-3.8.0-py3-none-macosx_11_0_arm64.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 3425ee0b65683ac522a31a8f8f88336a963da2908c87e44240d4620a01497d24 |
| MD5 | 2818ebfa197aff2df05b658ac332e62f |
| BLAKE2b-256 | 19db710900129f2394ed1f836471979afba97622658ec36e42489f456d3b90c4 |
