Python wrapper for Exis.PdfEditor — comprehensive PDF toolkit (find/replace, merge, split, forms, redaction, watermark, encryption, signatures, PDF/A, and more)

Exis.PdfEditor for Python

Comprehensive PDF toolkit for Python — find/replace, merge, split, form filling, redaction, image editing, watermark, Bates stamping, page editing, encryption, optimization, digital signatures, PDF/A compliance, XMP + /Info metadata, OCR for scanned PDFs, and more.

Powered by the Exis.PdfEditor .NET library, compiled to a native binary via .NET Native AOT. No .NET runtime required.

Installation

pip install exis-pdfeditor

Platform-specific wheels are available for:

Windows x64 (win_amd64)
Linux x64 (manylinux_2_17_x86_64)
macOS ARM64 / Apple Silicon (macosx_11_0_arm64)

Quick start

import exis_pdfeditor

# Optional: set a license key (or use EXIS_PDF_LICENSE_KEY env var).
# Without a key, a 14-day trial is activated automatically.
exis_pdfeditor.initialize("XXXX-XXXX-XXXX-XXXX")

# Inspect a PDF (free — no license required)
info = exis_pdfeditor.inspect("document.pdf")
print(f"{info.pageCount} pages, title: {info.title}")

# Find and replace text
result = exis_pdfeditor.find_replace(
    "input.pdf", "output.pdf",
    "old text", "new text",
    case_sensitive=False,
)
print(f"{result.totalReplacements} replacements made")

Features

Feature	Function
Inspect	`inspect(path)` — metadata, fonts, pages, encryption status
Text extraction	`extract_text(path)`, `extract_text_structured(path)`
Find & replace	`find_replace(...)` — regex, case-insensitive, whole-word, styling
Merge	`merge(paths, output)`
Split	`split(path, output_dir)`
Extract pages	`extract_pages(path, output, pages=[1,3,5])`
Form fields	`list_fields(path, split_duplicate_widgets=...)`, `fill_form(..., text_alignment=..., split_duplicate_widgets=..., flatten=...)`
Redaction	`redact(path, output, redactions)`
Watermark	`watermark(path, output, "DRAFT", position="across")`
Stamp	`stamp(path, output, stamp_pdf, mode="overlay")`
Bates stamping	`bates_stamp(path, output, prefix="ABC", digits=6)`
Optimize	`optimize(path, output, downsample_images=True)`
Encrypt / Decrypt	`encrypt(...)`, `decrypt(...)`
Page editing	`rotate(...)`, `crop(...)`, `reorder(...)`, `delete_pages(...)`, `insert_blank_pages(...)`
Images	`find_images(path)`, `replace_image(...)`
Digital signatures	`sign(...)`, `verify(...)`
PDF/A	`pdfa_validate(path)`, `pdfa_convert(path, output)`
Metadata (XMP + `/Info`)	`get_metadata(path)`, `set_xmp(...)`, `set_info(...)`, `remove_xmp(...)`, `remove_info(...)`, `remove_metadata(...)`
Page classification	`analyze_pages(path)` — pre-OCR triage (free, no license)
OCR — searchable PDF (Windows)	`make_searchable_pdf(path, output, languages=..., progress=...)`
OCR — de-identify scans (Windows)	`redact_scanned_pdf(path, output, terms=..., visible_replacement=...)`

Platform support

Function group	Windows	Linux	macOS
All non-OCR functions (find/replace, merge, forms, redaction, watermark, optimization, signatures, PDF/A, metadata, …)	yes	yes	yes
`analyze_pages` — page classification	yes	yes	yes
`make_searchable_pdf`, `redact_scanned_pdf` — OCR	yes	not yet — raises `OcrNotSupportedError`	not yet — raises `OcrNotSupportedError`

OCR is currently Windows-only because the page rasterizer (Exis.PdfOcr.Windows) depends on the WinRT Windows.Data.Pdf API and System.Drawing.Common. When a cross-platform rasterizer ships, the Python side will pick it up automatically.
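
As a minimal sketch of guarding OCR calls, the check below uses only the standard library. The helper name ocr_supported is hypothetical and not part of exis_pdfeditor; it simply encodes the table above (OCR on Windows only), so it would need revisiting once a cross-platform rasterizer ships.

```python
import sys


def ocr_supported(platform: str = sys.platform) -> bool:
    """Return True when the platform table above lists OCR as available.

    Hypothetical helper, not part of exis_pdfeditor. CPython reports
    Windows as "win32", including on 64-bit builds.
    """
    return platform == "win32"


if ocr_supported():
    print("OCR functions should be usable on this platform")
else:
    print("Skip OCR here: the docs say it raises OcrNotSupportedError")
```

Checking up front lets a batch job route scans to a Windows worker instead of catching the error page by page.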

Inspect

Retrieve metadata, page info, fonts, encryption status, and form field counts. No license required.

info = exis_pdfeditor.inspect("document.pdf")

print(f"Version:    {info.version}")
print(f"Pages:      {info.pageCount}")
print(f"Title:      {info.title}")
print(f"Author:     {info.author}")
print(f"Encrypted:  {info.isEncrypted}")
print(f"Has forms:  {info.hasFormFields} ({info.formFieldCount} fields)")
print(f"Fonts:      {info.fontsUsed}")

for page in info.pages:
    print(f"  Page {page.pageNumber}: {page.widthInPoints}x{page.heightInPoints} pt, "
          f"{page.characterCount} chars")

Text extraction

Plain text

result = exis_pdfeditor.extract_text("document.pdf")
print(result.fullText)

for page in result.pages:
    print(f"Page {page.pageNumber}: {page.text[:100]}...")

Specific pages only

result = exis_pdfeditor.extract_text("document.pdf", pages=[1, 3, 5])

Structured text (with positions and font data)

result = exis_pdfeditor.extract_text_structured("document.pdf")

for page in result.pages:
    for block in page.textBlocks:
        print(f"  [{block.x:.0f}, {block.y:.0f}] {block.fontName} {block.fontSize}pt: "
              f"{block.text}")

Find & replace

Single pair

result = exis_pdfeditor.find_replace(
    "input.pdf", "output.pdf",
    "old text", "new text",
)
print(f"{result.totalReplacements} replacements across {result.pagesModified} pages")

Multiple pairs

result = exis_pdfeditor.find_replace(
    "input.pdf", "output.pdf",
    pairs=[
        {"search": "foo", "replace": "bar"},
        {"search": "hello", "replace": "world"},
    ],
)

Regex

result = exis_pdfeditor.find_replace(
    "input.pdf", "output.pdf",
    pairs=[
        {"search": r"\d{3}-\d{2}-\d{4}", "replace": "XXX-XX-XXXX", "isRegex": True},
    ],
)
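
Before running a pattern over a real PDF, it can help to try it on plain text. The sketch below uses Python's own re module, and preview_replacement is a hypothetical helper. Because the library is built on .NET, its regex dialect may differ from Python's in edge cases, so treat this as a rough preview rather than a guarantee.

```python
import re

# Same pattern as the example above: a US Social Security number shape.
SSN_PATTERN = r"\d{3}-\d{2}-\d{4}"


def preview_replacement(sample: str, pattern: str, replacement: str) -> str:
    """Apply the pattern to plain text with Python's re module.

    Hypothetical helper for local experimentation only; it never touches
    a PDF and does not call exis_pdfeditor.
    """
    return re.sub(pattern, replacement, sample)


print(preview_replacement("SSN: 123-45-6789", SSN_PATTERN, "XXX-XX-XXXX"))
```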

All options

result = exis_pdfeditor.find_replace(
    "input.pdf", "output.pdf",
    "confidential", "[REDACTED]",
    case_sensitive=False,
    whole_word=True,
    use_regex=False,
    page_range=[1, 2, 3],
    text_fitting="adaptive",         # none, preserve_width, fit_to_page, adaptive
    min_horizontal_scale=70,
    max_font_size_reduction=1.5,
    replacement_text_color={"r": 1, "g": 0, "b": 0},
    replacement_highlight_color={"r": 1, "g": 1, "b": 0},
    replacement_bold=True,
    replacement_underline=False,
    replacement_strikethrough=False,
    preserve_form_fields=True,
    use_incremental_update=True,
)

for detail in result.details:
    print(f"  Page {detail.pageNumber}: '{detail.originalText}' -> '{detail.replacementText}'")

Merge

exis_pdfeditor.merge(["part1.pdf", "part2.pdf", "part3.pdf"], "merged.pdf")

Split

Split a PDF into one file per page:

result = exis_pdfeditor.split("document.pdf", "output_folder/")
print(f"Split into {result.pageCount} files")
for path in result.files:
    print(f"  {path}")

Extract pages

Extract specific pages into a new PDF:

exis_pdfeditor.extract_pages("document.pdf", "pages_1_and_3.pdf", pages=[1, 3])

Form fields

Each field has a PDF name, optional displayName (smart label from nearby page text), type, value, options (choice fields), isReadOnly, and hasDuplicateWidgets (true when one logical field has multiple widgets at different positions — e.g. two-up receipts). Pass split_duplicate_widgets=True to list_fields / fill_form to use per-widget keys name_1, name_2, … (page-reading order).

List all fields

fields = exis_pdfeditor.list_fields("form.pdf")

for field in fields:
    print(f"  {field.name} ({field.type}): {field.value}")
    if field.displayName:
        print(f"    Label: {field.displayName}")
    if field.options:
        print(f"    Options: {field.options}")
    if field.hasDuplicateWidgets:
        print("    (duplicate widgets — use split_duplicate_widgets=True for per-widget names)")

List with duplicate-widget split (two-up / carbon copy)

fields = exis_pdfeditor.list_fields("two-up-receipt.pdf", split_duplicate_widgets=True)
# e.g. Address_1, Address_2, date_1, date_2, ...

Fill a form

result = exis_pdfeditor.fill_form(
    "form.pdf", "filled.pdf",
    fields={
        "FirstName": "John",
        "LastName": "Doe",
        "Email": "john@example.com",
        "AgreeToTerms": "Yes",
    },
)
print(f"Filled {result.fieldsFilled} fields, {result.fieldsNotFound} not found")

Force text alignment (default honors each field's PDF `/Q`)

result = exis_pdfeditor.fill_form(
    "form.pdf", "filled_centered.pdf",
    fields={"Comments": "Centered text"},
    text_alignment="center",  # "auto" | "left" | "center" | "right"
)

Fill duplicate-widget fields (different value per widget; auto-flattens)

result = exis_pdfeditor.fill_form(
    "two-up-receipt.pdf", "filled.pdf",
    fields={
        "Address_1": "Mr. David G Cruz",
        "date_1": "10-Jan-67",
        "Address_2": "Ms. Claudia Morales",
        "date_2": "22-Feb-26",
    },
    split_duplicate_widgets=True,
)
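
When the per-widget values come from records rather than hand-typed keys, the name_1, name_2, … keys can be generated. This is a small sketch with a hypothetical helper, per_widget_fields; it assumes you supply each field's values in the same page-reading order the library uses for numbering.

```python
def per_widget_fields(values_by_name: dict[str, list[str]]) -> dict[str, str]:
    """Expand {"Address": ["a", "b"]} into {"Address_1": "a", "Address_2": "b"}.

    Hypothetical helper, not part of exis_pdfeditor. Suffixes start at 1,
    matching the name_1, name_2 convention described above.
    """
    fields: dict[str, str] = {}
    for name, values in values_by_name.items():
        for position, value in enumerate(values, start=1):
            fields[f"{name}_{position}"] = value
    return fields


fields = per_widget_fields({
    "Address": ["Mr. David G Cruz", "Ms. Claudia Morales"],
    "date": ["10-Jan-67", "22-Feb-26"],
})
print(fields)
```

The resulting dict can be passed as fields= together with split_duplicate_widgets=True.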

Fill and flatten (make fields non-editable)

result = exis_pdfeditor.fill_form(
    "form.pdf", "filled_flat.pdf",
    fields={"Name": "Jane Doe"},
    flatten=True,
)

Redaction

Redact text

result = exis_pdfeditor.redact(
    "document.pdf", "redacted.pdf",
    redactions=[
        {"text": "John Doe", "replaceWith": "[NAME REDACTED]"},
        {"text": r"\d{3}-\d{2}-\d{4}", "isRegex": True, "replaceWith": "[SSN]"},
    ],
)
print(f"{result.redactionsApplied} redactions applied")

Redact a specific area on a page

result = exis_pdfeditor.redact(
    "document.pdf", "redacted.pdf",
    redactions=[
        {
            "area": {"x": 100, "y": 700, "width": 200, "height": 20},
            "pageNumber": 1,
        },
    ],
)

Combined text + area redaction

result = exis_pdfeditor.redact(
    "document.pdf", "redacted.pdf",
    redactions=[
        {"text": "confidential", "caseSensitive": False},
        {"area": {"x": 50, "y": 50, "width": 500, "height": 30}, "pageNumber": 2},
    ],
)

Page classification (pre-OCR triage)

analyze_pages classifies every page as Digital, Scanned, AlreadyOcrd, or Empty purely from the content stream — no rendering and no OCR. Use it to predict which pages will need OCR before paying for a run. Free, no license required, available on every wheel.

pages = exis_pdfeditor.analyze_pages("mixed.pdf")
for p in pages:
    print(f"  page {p.pageNumber}: {p.kind}  "
          f"(text {p.textCoverageRatio:.0%}, images {p.imageCoverageRatio:.0%})")

scanned = [p.pageNumber for p in pages if p.kind == "Scanned"]
if scanned:
    print(f"{len(scanned)} pages need OCR: {scanned}")

Each entry has pageNumber, kind, textCharCount, textCoverageRatio, imageCoverageRatio, and hasInvisibleTextLayer (signature of an already-OCR'd page).
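
For triage reports it can be handy to bucket page numbers by kind. The sketch below relies only on the pageNumber and kind fields listed above; pages_by_kind is a hypothetical helper, and the SimpleNamespace entries are stand-ins for what analyze_pages would return.

```python
from collections import defaultdict
from types import SimpleNamespace


def pages_by_kind(pages):
    """Group page numbers under their classification (e.g. "Scanned")."""
    grouped = defaultdict(list)
    for page in pages:
        grouped[page.kind].append(page.pageNumber)
    return dict(grouped)


# Stand-in entries; real ones would come from exis_pdfeditor.analyze_pages(...).
sample = [
    SimpleNamespace(pageNumber=1, kind="Digital"),
    SimpleNamespace(pageNumber=2, kind="Scanned"),
    SimpleNamespace(pageNumber=3, kind="Scanned"),
    SimpleNamespace(pageNumber=4, kind="Empty"),
]
print(pages_by_kind(sample))
```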

Searchable PDFs (OCR) — Windows only

make_searchable_pdf adds an invisible, selectable text layer over scanned pages so find/replace, extraction, and redaction work on the result like any digital PDF. Born-digital and already-OCR'd pages are passed through untouched — only true scans are rasterized and recognized, which is fast on mixed documents.

result = exis_pdfeditor.make_searchable_pdf(
    "scan.pdf", "searchable.pdf",
    languages=("eng",),                   # default
    progress=lambda p: print(f"  page {p.page}/{p.total}: {p.phase}"),
)

print(f"OCR'd {result.pagesProcessed}, skipped {result.pagesSkipped}, "
      f"avg confidence {result.averageConfidence:.0%}")

# Confidence is never swallowed — flag low-confidence pages for human review.
for page in result.pages:
    if page.wasOcrd and page.confidence < 0.80:
        print(f"  page {page.pageNumber} needs review (conf {page.confidence:.0%})")

Additional languages

The wheel ships English (eng.traineddata). To add languages, drop the matching xxx.traineddata files into the wheel's bundled tessdata/ folder, or place them in your own folder and point at it:

result = exis_pdfeditor.make_searchable_pdf(
    "scan.pdf", "searchable.pdf",
    languages=("eng", "spa", "vie"),
    tessdata_path=r"C:\my-app\tessdata",
)

The invisible text layer embeds a Unicode font (DejaVu Sans), so accented and non-Latin names (e.g. Nguyễn, Peña) survive find/replace intact rather than being mangled.

Options

Parameter	Default	Notes
`languages`	`("eng",)`	Tesseract language codes
`dpi`	`300`	Rendering DPI for non-image pages (full-page scans use their native resolution)
`apply_deskew`	`True`	Pre-process hint (engine-dependent)
`apply_denoise`	`False`	Pre-process hint (engine-dependent)
`min_confidence_to_include`	`0.0`	Words below this are still included (dropping them would risk leaking names you need to match) but counted in per-page confidence
`tessdata_path`	bundled	Override the trained-data folder
`progress`	`None`	Callback invoked with a progress object exposing `page`, `total`, `phase`, and `confidence`

De-identify scanned pages — Windows only

redact_scanned_pdf finds terms on scanned pages via OCR and burns opaque boxes into the actual page image. Unlike OCR-layer redaction (which only changes selectable text), this destroys the underlying pixels — the term is gone from the visible scan and unrecoverable, appropriate for sending de-identified scans externally.

r = exis_pdfeditor.redact_scanned_pdf(
    "intake_scan.pdf", "intake_redacted.pdf",
    terms=["Jane Doe", "555-12-3456"],
    visible_replacement="[REDACTED]",     # drawn into the raster
    fill="white",                         # or "black"
)

print(f"Redacted {r.occurrencesRedacted} occurrences on {r.pagesAffected} pages")
if r.pagesSkipped:
    print(f"WARNING: {r.pagesSkipped} pages could not be safely burned — "
          "review them before sending the output.")

for occ in r.occurrences:
    print(f"  page {occ.pageNumber}: {occ.term!r} (conf {occ.confidence:.0%})")

Scope: this handles image-based pages. Born-digital text pages are NOT modified — use redact(...) for those.

Pages that are image-based but not a single full-page image are reported in pagesSkipped and left unchanged. A non-zero value means the output is not safe to send as-is.

The output is rewritten via the optimizer to drop the superseded original image objects, so the un-redacted pixels are not recoverable with PDF forensics.
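
Since a non-zero pagesSkipped means the output still contains unredacted scans, a pipeline may want to fail closed rather than only print a warning. The sketch below is one way to do that; require_fully_redacted is a hypothetical helper, and it reads only the pagesSkipped field described above.

```python
from types import SimpleNamespace


def require_fully_redacted(result):
    """Raise if any image-based page was left unchanged; otherwise return result."""
    if result.pagesSkipped:
        raise RuntimeError(
            f"{result.pagesSkipped} page(s) were skipped; "
            "review the output before sending it."
        )
    return result


# Stand-in for a redact_scanned_pdf(...) result with nothing skipped.
clean = SimpleNamespace(pagesSkipped=0, occurrencesRedacted=4)
require_fully_redacted(clean)
```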

Watermark

Basic diagonal watermark

result = exis_pdfeditor.watermark("input.pdf", "watermarked.pdf", "DRAFT")
print(f"Watermarked {result.pagesWatermarked} of {result.totalPages} pages")

All watermark options

result = exis_pdfeditor.watermark(
    "input.pdf", "watermarked.pdf",
    "CONFIDENTIAL",
    position="across",          # top, bottom, center, across
    font_size=72,
    text_color={"r": 1, "g": 0, "b": 0},
    opacity=0.15,
    page_range=[1, 2],
)

Stamp (PDF overlay / underlay)

Overlay a letterhead on top

result = exis_pdfeditor.stamp(
    "document.pdf", "stamped.pdf",
    "letterhead.pdf",
    mode="overlay",
)
print(f"Stamped {result.pagesStamped} pages")

Underlay a background behind content

result = exis_pdfeditor.stamp(
    "document.pdf", "stamped.pdf",
    "background.pdf",
    mode="underlay",
    opacity=0.5,
    page_range=[1],
)

Bates stamping

Sequential page numbering for legal production and discovery workflows. Each page receives a zero-padded identifier (e.g. ABC000001, ABC000002) rendered in a chosen corner of the visual page. Placement is relative to the page's /Rotate orientation, so mixed-rotation documents render consistently. An XMP audit block recording the range, digit width, and prefix/suffix is written to the document catalog by default.

Defaults (number starts at 1, 6 digits, bottom-right corner)

result = exis_pdfeditor.bates_stamp("input.pdf", "stamped.pdf")
print(f"Stamped pages {result.firstNumber}-{result.lastNumber}")
print(f"Digits used: {result.digitsUsed}")

Prefix, custom position, color, confidentiality label

result = exis_pdfeditor.bates_stamp(
    "input.pdf", "stamped.pdf",
    prefix="ABC",
    start_number=1,
    digits=6,                                 # -> "ABC000001"
    position="bottom_right",                  # top_left, top_center, top_right,
                                              # bottom_left, bottom_center, bottom_right
    font_size=10,
    text_color={"r": 0, "g": 0, "b": 0},
    background_color={"r": 1, "g": 1, "b": 1},  # opaque box behind stamp
    margin_inches=0.5,
    confidentiality_label="CONFIDENTIAL",     # stacked above the Bates number
)

Continuous numbering across a batch

Thread lastNumber + 1 into the next call:

next_n = 1
for path in docs:
    r = exis_pdfeditor.bates_stamp(
        path, path + ".stamped.pdf",
        prefix="ABC", start_number=next_n,
    )
    next_n = r.lastNumber + 1

Skip the cover page (legal convention)

The cover is ABC000001 in the production log even if physically unstamped:

exis_pdfeditor.bates_stamp(
    "input.pdf", "stamped.pdf",
    prefix="ABC",
    skip_first_page=True,
    counter_advances_on_skipped_pages=True,   # default
)
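
To make the counter behaviour concrete, here is a pure-Python sketch of how the numbers would line up under the two settings. It reflects a reading of the description above, not the library's implementation, and plan_bates_numbers is a hypothetical helper that ignores page_range.

```python
def plan_bates_numbers(page_count, start_number=1, skip_first_page=False,
                       counter_advances_on_skipped_pages=True):
    """Map each stamped 1-based page to the Bates number it would receive."""
    plan = {}
    counter = start_number
    for page in range(1, page_count + 1):
        skipped = skip_first_page and page == 1
        if not skipped:
            plan[page] = counter
        # A skipped page consumes a number only when the counter advances.
        if not skipped or counter_advances_on_skipped_pages:
            counter += 1
    return plan


print(plan_bates_numbers(3, skip_first_page=True))
print(plan_bates_numbers(3, skip_first_page=True,
                         counter_advances_on_skipped_pages=False))
```

With the default, page 2 carries number 2 and the cover silently owns number 1; with the counter frozen on skipped pages, page 2 carries number 1.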

Stamp only selected pages

exis_pdfeditor.bates_stamp(
    "input.pdf", "stamped.pdf",
    page_range=[2, 3, 5],                     # 1-based
)

Signed input

Stamping invalidates signatures — opt in explicitly:

result = exis_pdfeditor.bates_stamp(
    "signed.pdf", "stamped.pdf",
    allow_signed_input=True,                  # result.warnings will record it
)
for w in result.warnings:
    print(f"  warning: {w}")

Suppress the XMP audit block

exis_pdfeditor.bates_stamp(
    "input.pdf", "stamped.pdf",
    write_xmp_metadata=False,
)

All options at a glance

Parameter	Default	Purpose
`prefix` / `suffix`	`""`	Text bracketing the number
`start_number`	`1`	First Bates number (for batch continuation)
`digits`	`6`	Minimum zero-padded width; auto-expands if needed
`position`	`"bottom_right"`	Corner/edge on the visual page
`font_size`	`10.0`	Point size (Helvetica)
`text_color`	black	RGB dict `{"r": 0, "g": 0, "b": 0}`
`background_color`	none	Opaque rectangle behind stamp for legibility
`margin_inches`	`0.5`	Distance from trimmed page edge
`confidentiality_label`	none	e.g. `"CONFIDENTIAL"`
`confidentiality_position`	same corner	Override to place label elsewhere
`confidentiality_font_size`	matches `font_size`	Label point size
`page_range`	all pages	1-based page numbers to stamp
`skip_first_page`	`False`	Skip page 1 (cover sheets)
`counter_advances_on_skipped_pages`	`True`	Advance counter on skipped pages
`allow_signed_input`	`False`	Stamp PDFs with digital signatures
`write_xmp_metadata`	`True`	Write an XMP audit block to the catalog

Result fields: firstNumber, lastNumber, pagesStamped, digitsUsed (equals digits unless auto-expanded), warnings (non-fatal diagnostics: digit expansion, signed-input stamped anyway, etc.).
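
The identifier format and the digits / digitsUsed relationship can be illustrated without the library. The sketch below assumes the label is simply prefix + zero-padded number + suffix, as the examples above suggest; bates_label and digits_used are hypothetical helpers.

```python
def bates_label(number: int, prefix: str = "", suffix: str = "", digits: int = 6) -> str:
    """Format a Bates identifier; digits is a minimum width, so wider numbers still fit."""
    return f"{prefix}{number:0{digits}d}{suffix}"


def digits_used(last_number: int, digits: int = 6) -> int:
    """Width actually needed: the requested minimum unless the number is wider."""
    return max(digits, len(str(last_number)))


print(bates_label(1, prefix="ABC"))
print(bates_label(1234567, prefix="ABC"))
print(digits_used(1234567))
```

This is useful for predicting labels in a production log before any file is stamped.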

Optimize

Default optimization (compress + deduplicate)

result = exis_pdfeditor.optimize("large.pdf", "smaller.pdf")
print(f"Reduced {result.originalSize:,} -> {result.optimizedSize:,} bytes "
      f"({result.reductionPercent:.1f}% smaller)")
print(f"  Streams compressed: {result.streamsCompressed}")
print(f"  Duplicates removed: {result.duplicatesRemoved}")

With image downsampling

result = exis_pdfeditor.optimize(
    "large.pdf", "smaller.pdf",
    downsample_images=True,
    max_image_dpi=150,
    remove_metadata=True,
)
print(f"Images downsampled: {result.imagesDownsampled}")

Encrypt & decrypt

Encrypt with a password

exis_pdfeditor.encrypt(
    "document.pdf", "protected.pdf",
    user_password="openme",
    owner_password="secret",
    permissions=["Print", "CopyText"],
)

Available permissions: Print, ModifyContents, CopyText, AddAnnotations, FillForms, PrintHighQuality, All.
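
If permission names come from configuration, a quick check against the list above catches typos before the call. This is a sketch only: check_permissions is a hypothetical helper, and it assumes the names are case-sensitive exactly as printed here, which the source does not state.

```python
# Names copied from the list above.
ALLOWED_PERMISSIONS = frozenset({
    "Print", "ModifyContents", "CopyText", "AddAnnotations",
    "FillForms", "PrintHighQuality", "All",
})


def check_permissions(permissions):
    """Return the permissions as a list, or raise ValueError naming unknown entries."""
    unknown = sorted(set(permissions) - ALLOWED_PERMISSIONS)
    if unknown:
        raise ValueError(f"Unknown permission name(s): {', '.join(unknown)}")
    return list(permissions)


print(check_permissions(["Print", "CopyText"]))
```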

Decrypt

exis_pdfeditor.decrypt("protected.pdf", "unlocked.pdf", password="openme")

Page editing

Rotate pages

result = exis_pdfeditor.rotate("input.pdf", "rotated.pdf", angle=90)
print(f"Rotated {result.pagesModified} pages")

# Rotate only specific pages
result = exis_pdfeditor.rotate("input.pdf", "rotated.pdf", angle=180, pages=[2, 4])

Crop pages

result = exis_pdfeditor.crop(
    "input.pdf", "cropped.pdf",
    rect={"x": 50, "y": 50, "width": 500, "height": 700},
)

Reorder pages

exis_pdfeditor.reorder("input.pdf", "reordered.pdf", order=[3, 1, 2])

Delete pages

exis_pdfeditor.delete_pages("input.pdf", "trimmed.pdf", pages=[2, 4])

Insert blank pages

Each insertion specifies an "after page" anchor: afterPage is the 1-based number of the page the blank page follows, with 0 meaning before page 1. Each insertion may also set a page size. Supported sizes: letter (default), legal, a4, a3, a5, tabloid.

exis_pdfeditor.insert_blank_pages(
    "input.pdf", "with_blanks.pdf",
    insertions=[
        {"afterPage": 0, "size": "a4"},       # before page 1
        {"afterPage": 3, "size": "letter"},   # after page 3
    ],
)
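
For reference when computing crop or redaction rectangles against these sizes, the sketch below derives the standard portrait dimensions in PDF points (72 per inch). It assumes the library's size names follow the usual paper standards and that its rectangles are expressed in points; neither is confirmed by the text above, and PAGE_SIZES_PT is a hypothetical lookup table.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def inches_to_points(inches: float) -> float:
    return inches * POINTS_PER_INCH


def mm_to_points(mm: float) -> float:
    return mm / MM_PER_INCH * POINTS_PER_INCH


# Portrait (width, height) in points, from the standard paper dimensions.
PAGE_SIZES_PT = {
    "letter": (inches_to_points(8.5), inches_to_points(11)),
    "legal": (inches_to_points(8.5), inches_to_points(14)),
    "tabloid": (inches_to_points(11), inches_to_points(17)),
    "a3": (mm_to_points(297), mm_to_points(420)),
    "a4": (mm_to_points(210), mm_to_points(297)),
    "a5": (mm_to_points(148), mm_to_points(210)),
}

for name, (width, height) in PAGE_SIZES_PT.items():
    print(f"{name}: {width:.0f} x {height:.0f} pt")
```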

Images

Find all images

result = exis_pdfeditor.find_images("document.pdf")
print(f"Found {result.totalImages} images across {result.pagesSearched} pages")

for img in result.images:
    print(f"  Image {img.index}: {img.pixelWidth}x{img.pixelHeight} "
          f"{img.colorSpace} {img.format}")
    print(f"    Pages: {img.pageNumbers}")

Find images and save to disk

result = exis_pdfeditor.find_images("document.pdf", output_dir="extracted_images/")

Replace all images

result = exis_pdfeditor.replace_image(
    "document.pdf", "replaced.pdf",
    "new_logo.png",
)
print(f"Replaced {result.imagesReplaced} of {result.imagesFound} images")

Replace specific images by index or page

result = exis_pdfeditor.replace_image(
    "document.pdf", "replaced.pdf",
    "new_logo.jpg",
    image_indices=[0, 2],
    page_range=[1],
    scale_mode="scale_to_fit",  # match_original_size, preserve_aspect_ratio, scale_to_fit
)

Digital signatures

Sign a PDF

exis_pdfeditor.sign(
    "document.pdf", "signed.pdf",
    cert_path="certificate.pfx",
    cert_password="certpass",
    reason="Approved",
    location="New York, NY",
    signer_name="John Doe",
)

Sign with a visible signature

exis_pdfeditor.sign(
    "document.pdf", "signed.pdf",
    cert_path="certificate.pfx",
    cert_password="certpass",
    visible=True,
    page=1,
    rect={"x": 50, "y": 50, "width": 200, "height": 60},
    reason="Reviewed and approved",
)

Verify signatures

info = exis_pdfeditor.verify("signed.pdf")
print(f"Signed: {info.isSigned}")
print(f"Signer: {info.signerName}")
print(f"Valid:  {info.isValid}")
print(f"Reason: {info.reason}")
print(f"Date:   {info.signDate}")

# Verify all signatures in a multi-signed document
signatures = exis_pdfeditor.verify("multi_signed.pdf", all_signatures=True)
for sig in signatures:
    print(f"  {sig.signerName}: valid={sig.isValid}")

PDF/A compliance

Validate

result = exis_pdfeditor.pdfa_validate("document.pdf", level="2b")
print(f"Compliant: {result.isCompliant}")

if not result.isCompliant:
    for v in result.violations:
        print(f"  [{v.code}] {v.message} (auto-fixable: {v.canAutoFix})")

Convert to PDF/A

exis_pdfeditor.pdfa_convert("document.pdf", "archive.pdf", level="2b")

Supported levels: 1b, 2b, 2u, 3b, 3u.

XMP + `/Info` metadata

PDFs carry document metadata in two places:

XMP — the modern RDF/XML packet referenced from the Catalog /Metadata entry.
/Info — the legacy trailer dictionary (Title, Author, Subject, Keywords, Creator, Producer, CreationDate, ModDate, plus custom keys).

get_metadata() is free and works on encrypted PDFs (XMP only). All mutation calls (set_xmp, set_info, remove_xmp, remove_info, remove_metadata) require a license and reject encrypted input — decrypt first with decrypt().

Read metadata

meta = exis_pdfeditor.get_metadata("document.pdf")

if meta.hasXmp:
    print(f"XMP packet: {meta.xmpByteSize} bytes")
    print(meta.xmpXml)          # Full <?xpacket ... ?> payload as UTF-8 text

if meta.hasInfo:
    info = meta.info
    print(f"Title:    {info.title}")
    print(f"Author:   {info.author}")
    print(f"Subject:  {info.subject}")
    print(f"Keywords: {info.keywords}")
    print(f"Creator:  {info.creator}")
    print(f"Producer: {info.producer}")
    print(f"Created:  {info.creationDate}")        # ISO 8601 string or None
    print(f"Modified: {info.modificationDate}")    # same

    # Non-standard /Info keys (e.g. "Company", custom producer fields)
    for key, value in vars(info.custom).items():
        print(f"  [custom] {key} = {value}")

Replace the XMP packet

Pass a full <?xpacket ...?>-wrapped RDF/XML document as a string:

xmp = """<?xpacket begin=""?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Quarterly Report</rdf:li></rdf:Alt></dc:title>
      <dc:creator><rdf:Seq><rdf:li>Finance Team</rdf:li></rdf:Seq></dc:creator>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>"""

exis_pdfeditor.set_xmp("input.pdf", "output.pdf", xmp)

Passing an empty string writes an empty packet (still present, no data). To drop the /Metadata reference entirely, use remove_xmp().
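
When the title or creator comes from user data, the values need XML escaping before they go into the packet. The sketch below builds the same minimal packet shape with the standard library; build_xmp is a hypothetical helper, and the result is only a string that you would then pass to set_xmp.

```python
from xml.sax.saxutils import escape

XMP_TEMPLATE = """<?xpacket begin=""?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title><rdf:Alt><rdf:li xml:lang="x-default">{title}</rdf:li></rdf:Alt></dc:title>
      <dc:creator><rdf:Seq><rdf:li>{creator}</rdf:li></rdf:Seq></dc:creator>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>"""


def build_xmp(title: str, creator: str) -> str:
    """Fill the template, escaping &, < and > so the packet stays well-formed."""
    return XMP_TEMPLATE.format(title=escape(title), creator=escape(creator))


print(build_xmp("R&D Quarterly Report", "Finance Team"))
```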

Replace the `/Info` dictionary

The PDF must already have an /Info reference in its trailer; otherwise this raises PdfEditorError. Fields you omit are written as missing (not empty strings) so "not set" semantics are preserved.

from datetime import datetime, timezone

exis_pdfeditor.set_info(
    "input.pdf", "output.pdf",
    info={
        "title":    "Quarterly Report",
        "author":   "Finance Team",
        "subject":  "Q1 2026 earnings summary",
        "keywords": "earnings, Q1, 2026",
        "creator":  "exis-pdfeditor",
        "producer": "Exis.PdfEditor 3.6",
        "creationDate":     datetime(2026, 4, 1, 9, 0, tzinfo=timezone.utc),
        "modificationDate": datetime.now(timezone.utc),
        # Arbitrary non-standard /Info keys
        "custom": {
            "Company":  "Exis LLC",
            "Revision": "v3.6.4",
        },
    },
)

Dates can be provided as datetime / date objects or ISO 8601 strings (e.g. "2026-04-20T09:00:00Z"). Snake_case aliases creation_date and modification_date are also accepted.
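
If you prefer to pass strings, the ISO 8601 form shown above can be produced from an aware datetime with the standard library. This is a small sketch; to_iso_utc is a hypothetical helper, and it rejects naive datetimes rather than guessing their time zone.

```python
from datetime import datetime, timedelta, timezone


def to_iso_utc(moment: datetime) -> str:
    """Render an aware datetime as e.g. "2026-04-20T09:00:00Z" (UTC, second precision)."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


local = datetime(2026, 4, 20, 11, 0, tzinfo=timezone(timedelta(hours=2)))
print(to_iso_utc(local))
```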

Remove metadata (wipe for privacy)

# Drop the XMP packet only
exis_pdfeditor.remove_xmp("input.pdf", "stripped-xmp.pdf")

# Empty the /Info dict (trailer reference kept, contents cleared)
exis_pdfeditor.remove_info("input.pdf", "stripped-info.pdf")

# Drop both in one incremental update (smaller output than calling both)
exis_pdfeditor.remove_metadata("input.pdf", "stripped.pdf")

Diagnostic structure dump

When a PDF fails to process and you can't share the file, dump_structure() produces a self-contained report you can paste into a bug report. It walks every object in the file, tallies filter chains and font subtypes, lists encryption details, and records any streams that fail to decode — without needing the original file. No license required.

Human-readable report

dump = exis_pdfeditor.dump_structure("problem.pdf")

# The dump has a built-in text report — just print it
# (or paste it into a support email / bug report)
print(dump)

Individual fields

dump = exis_pdfeditor.dump_structure("problem.pdf")

print(f"PDF version:    {dump.version}")
print(f"Pages:          {dump.pageCount}")
print(f"Total objects:  {dump.totalObjects}")
print(f"Stream objects: {dump.streamObjectCount}")
print(f"Xref entries:   {dump.xrefEntryCount}")
print(f"Xref streams:   {dump.usesXrefStreams}")
print(f"Producer:       {dump.producer}")
print(f"Creator:        {dump.creator}")

# Encryption details
print(f"Encrypted:      {dump.isEncrypted}")
if dump.isEncrypted:
    print(f"  Version:      {dump.encryptionVersion}")
    print(f"  Revision:     {dump.encryptionRevision}")
    print(f"  Key length:   {dump.encryptionKeyLengthBits} bits")

# Catalog flags
print(f"AcroForm:       {dump.hasAcroForm}")
print(f"Signed:         {dump.hasDigitalSignature}")
print(f"Embedded files: {dump.hasEmbeddedFiles}")

# Filter chains — which compression methods are used and how often
for chain in dump.filterChains:
    print(f"  {chain}: {dump.filterChains[chain]} streams")

# Font subtypes
for subtype in dump.fontSubtypes:
    print(f"  /{subtype}: {dump.fontSubtypes[subtype]}")

# Streams that failed to decode (capped at 50)
for bad in dump.unsupportedStreams:
    print(f"  obj {bad.objectNumber} [{bad.filterChain}]: {bad.error}")

# Free-form notes about parse anomalies
for note in dump.notes:
    print(f"  {note}")

Licensing

Free: inspect(), analyze_pages(), pdfa_validate(), dump_structure(), and get_metadata() work without a license.
Trial: Call exis_pdfeditor.initialize() with no key for a 14-day full-feature trial.
Licensed: Pass your key to exis_pdfeditor.initialize("XXXX-XXXX-XXXX-XXXX") or set the EXIS_PDF_LICENSE_KEY environment variable.
Evaluation: After trial expiry, all features work on documents up to 3 pages.

Purchase a license at pdfbatcheditor.com/developers.

Requirements

Python 3.9+
No external dependencies — the native binary is bundled in the wheel.

Release history

3.8.1 (May 27, 2026)
3.8.0 (May 27, 2026), this version
3.7.8 (May 14, 2026)
3.7.0 (Apr 21, 2026)
3.6.4 (Apr 21, 2026)
3.6.2 (Apr 20, 2026)
3.4.1 (Apr 7, 2026)

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

exis_pdfeditor-3.8.0-py3-none-win_amd64.whl (48.8 MB)

Uploaded May 27, 2026. Python 3, Windows x86-64

exis_pdfeditor-3.8.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.8 MB)

Uploaded May 27, 2026. Python 3, manylinux: glibc 2.17+ x86-64

exis_pdfeditor-3.8.0-py3-none-macosx_11_0_arm64.whl (4.7 MB)

Uploaded May 27, 2026. Python 3, macOS 11.0+ ARM64

File details

Details for the file exis_pdfeditor-3.8.0-py3-none-win_amd64.whl.

File metadata

Download URL: exis_pdfeditor-3.8.0-py3-none-win_amd64.whl
Upload date: May 27, 2026
Size: 48.8 MB
Tags: Python 3, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for exis_pdfeditor-3.8.0-py3-none-win_amd64.whl
Algorithm	Hash digest
SHA256	`eaf9561cef09e8aeae7d295ecde96cbfc514847b661264079de6d53219749293`
MD5	`e34e08a5809da565845a05624d97536c`
BLAKE2b-256	`9796155c3328101322d91d993e6a12a6d8d97899792807b9623b348a95d82187`

File details

Details for the file exis_pdfeditor-3.8.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

Download URL: exis_pdfeditor-3.8.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Upload date: May 27, 2026
Size: 4.8 MB
Tags: Python 3, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for exis_pdfeditor-3.8.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`3453377bf2a6eefe8e2d5f2c7e25f72653c3c758fbb1b2d37582c6496c52f3bc`
MD5	`0ca3c005cfef4d8021b8dcfb673a8216`
BLAKE2b-256	`75ee5985592478b85b37b34cb22cdc09eea31b2a0fb2e5f5c50296b44e484d10`

File details

Details for the file exis_pdfeditor-3.8.0-py3-none-macosx_11_0_arm64.whl.

File metadata

Download URL: exis_pdfeditor-3.8.0-py3-none-macosx_11_0_arm64.whl
Upload date: May 27, 2026
Size: 4.7 MB
Tags: Python 3, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for exis_pdfeditor-3.8.0-py3-none-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`3425ee0b65683ac522a31a8f8f88336a963da2908c87e44240d4620a01497d24`
MD5	`2818ebfa197aff2df05b658ac332e62f`
BLAKE2b-256	`19db710900129f2394ed1f836471979afba97622658ec36e42489f456d3b90c4`
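
A downloaded wheel can be checked against the SHA256 digests listed above with the standard library. The sketch below is one way to do it; sha256_of_file is a hypothetical helper and the local file path is whatever you saved the wheel as.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large wheels need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage, with the macOS digest from the table above:
# expected = "3425ee0b65683ac522a31a8f8f88336a963da2908c87e44240d4620a01497d24"
# assert sha256_of_file("exis_pdfeditor-3.8.0-py3-none-macosx_11_0_arm64.whl") == expected
```

pip can perform the same comparison itself when a requirements file pins hashes and is installed with --require-hashes.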
