A capable CLI tool for PDF manipulation inspired by pdftk.

pdftl

pdftl ("PDF tackle") is a CLI tool for PDF manipulation written in Python. It is intended to be a command-line compatible extension of the venerable pdftk.

Built on pikepdf (qpdf) and other modern libraries, it offers advanced capabilities such as cropping, chopping, regex text replacement, adding text, and arbitrary content stream injection.

Quick start

pipx install pdftl[full]

# Merge, crop to letter paper, rotate the last page, and write encrypted output, all in one command
pdftl A=a.pdf B=b.pdf cat A1-5 B2-end \
    --- crop '4-8,12(letter)' \
    --- rotate endright \
    output out.pdf owner_pw foo user_pw bar encrypt_aes256

Key features and pdftk compatibility

  • Familiar syntax: Command-line compatible with pdftk. Verified against Mike Haertl's php-pdftk test suite and the pdftk-java test suite logic, so s/pdftk/pdftl/ should result in working scripts.
  • Pipelining: Chain multiple operations in a single command using ---.
  • Performant: in informal benchmarks, pdftl is faster than pdftk-java for many operations, most likely because it mostly drives pikepdf, which in turn drives qpdf, a fast C++ library.
  • Extended operations: zooming pages, smart merging that preserves links and outlines, cropping and chopping pages, text extraction, and image optimization.
  • Modern security: Supports AES-256 encryption and modern permission flags out of the box.
  • Content editing: Find & replace text via regular expressions, inject raw PDF operators, or overlay dynamic text.

pdftl maintains command-line compatibility with pdftk while introducing features required for modern PDF workflows.

| Feature | pdftk (Legacy) | pdftl (Modern) |
| --- | --- | --- |
| Pipelining | ❌ (Requires temp files) | Native (Chain ops with ---) |
| Encryption | ⚠️ (Obsolete RC4) | AES-256 Support |
| Syntax | Standard | Compatible Extension |
| Page Geometry | | Crop to fit, Zoom, & Chop |
| Pipelined Logic | | Rotate + Stamp in one command |
| Plugins | | Custom operations/mutation scripts written in Python |
| Installation | Often complex binary | Simple pipx install pdftl |
| Performance | Variable | Powered by pikepdf/qpdf |
| Link Integrity | ⚠️ Often breaks TOC/Links | Preserves internal cross-refs |
| Shell Completion | | bash, zsh and powershell |
| Help | ⚠️ Basic (manpage) | Self-documenting: pdftl help <operation/option/topic/tag> |

Installation

Install pipx, and then:

pipx install pdftl[full]

A plain pip install pdftl[full] is also supported.

Note: The [full] install includes ocrmypdf for image optimization, reportlab for text generation, pypdfium2 for text extraction and robust flattening, and pyHanko for cryptographic signature functionality. Omit [full] to omit those features and dependencies.
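
To see which of these optional dependencies are present in a given environment, a small check along the following lines may help. This is only a sketch: the import names (ocrmypdf, reportlab, pypdfium2, pyhanko) are assumed from the package names in the note, and pdftl itself may detect its extras differently.

```python
from importlib.util import find_spec

# Import names are ASSUMED from the package names in the note above.
OPTIONAL_MODULES = {
    "ocrmypdf": "image optimization",
    "reportlab": "text generation",
    "pypdfium2": "text extraction and flattening",
    "pyhanko": "cryptographic signatures",
}


def optional_feature_status():
    """Map each optional module name to True if it is importable here."""
    return {name: find_spec(name) is not None for name in OPTIONAL_MODULES}


for name, available in optional_feature_status().items():
    state = "available" if available else "missing"
    print(f"{name:10s} {state:10s} ({OPTIONAL_MODULES[name]})")
```

Because pipx installs into an isolated environment, the check is only meaningful when run with the interpreter that pdftl was installed into.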

Key features

📄 Standard operations

✂️ Geometry & splitting

  • Rotate: rotate pages (absolute or relative).
  • Crop: crop to margins or standard paper sizes (e.g., "A4").
  • Chop: chop pages into grids or rows (e.g., split a scanned spread into two pages).
  • Place: shift, scale and spin page content inside the page boundaries.
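
The geometry behind chopping a page into a grid can be pictured with a few lines of arithmetic. The sketch below is not pdftl's implementation; chop_grid is a hypothetical helper that assumes boxes are given as (left, bottom, right, top) in PDF points with the origin at the bottom left, and that all pieces are the same size.

```python
def chop_grid(box, cols, rows):
    """Split a (left, bottom, right, top) box into cols x rows equal boxes.

    Pieces are returned row by row, top row first, left to right.
    """
    left, bottom, right, top = box
    width = (right - left) / cols
    height = (top - bottom) / rows
    pieces = []
    for r in range(rows):
        piece_top = top - r * height
        for c in range(cols):
            piece_left = left + c * width
            pieces.append(
                (piece_left, piece_top - height, piece_left + width, piece_top)
            )
    return pieces


# A scanned spread of two US Letter pages side by side: 1224 x 792 points.
spread = (0, 0, 1224, 792)
left_page, right_page = chop_grid(spread, cols=2, rows=1)
print(left_page)   # (0.0, 0.0, 612.0, 792.0)
print(right_page)  # (612.0, 0.0, 1224.0, 792.0)
```

Each resulting box is what a crop of the original page to that region would show, so one input page becomes cols x rows output pages.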

📝 Forms & annotations

🔐 Security

🛠️ Advanced

  • Text replacement: replace text in content streams using regular expressions (experimental).
  • Code injection: inject raw PDF operators at the head/tail of content streams.
  • Optimization: optimize_images (smart compression via OCRmyPDF).
  • Dynamic text: add_text supports Bates stamping and can add page numbers, filenames, timestamps, etc.
  • Cleanup: normalize content streams, linearize for web viewing.
  • Plugins: write your own operation in Python, save it to ~/.config/pdftl/operations (*nix) or %APPDATA%\pdftl\config (Windows), and use it in pdftl just like the built-in operations. You can also use mutate_content with simple Python scripts.
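
To make the Bates stamping mentioned under dynamic text more concrete: the Examples section uses the template DEF-{page+120:06d}, which yields DEF-000121 on the first page. The sketch below reproduces that arithmetic in plain Python. It assumes the template behaves like a Python format spec with page numbers starting at 1; bates_labels is a hypothetical helper, not part of pdftl.

```python
def bates_labels(prefix, first_number, page_count, width=6):
    """Return one zero-padded label per page, numbering from first_number."""
    offset = first_number - 1  # ASSUMPTION: page numbers start at 1
    return [
        f"{prefix}{page + offset:0{width}d}"
        for page in range(1, page_count + 1)
    ]


print(bates_labels("DEF-", 121, 3))
# ['DEF-000121', 'DEF-000122', 'DEF-000123']
```

The offset of 120 in the template is therefore just the desired first number minus one.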

Examples

For more than 100 further examples, run pdftl help examples.

Concatenation

# Merge two files
pdftl in1.pdf in2.pdf cat output combined.pdf

# Now with in2.pdf zoomed in
pdftl A=in1.pdf B=in2.pdf cat A Bz1 output combined2.pdf

Geometry

# Take pages 1-5, rotate them 90 degrees East, and crop to A4
pdftl in.pdf cat 1-5east --- crop "(a4)" output out.pdf

Pipelining

You can chain operations without intermediate files using ---:

# Burst a file, but rotate and stamp every page first
pdftl in.pdf rotate south \
  --- stamp watermark.pdf \
  --- burst output page_%04d.pdf
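
When such pipelines are generated from a script, it can be convenient to assemble the argument list programmatically. The following is a minimal sketch that only builds the command line shown above; build_pipeline is a hypothetical helper, not part of pdftl, and it assumes every stage is a list of already-split arguments.

```python
import shlex

SEPARATOR = "---"  # the stage separator used in the example above


def build_pipeline(inputs, stages, output):
    """Return an argv list: inputs, stages joined by '---', then the output."""
    if not stages:
        raise ValueError("at least one stage is required")
    argv = ["pdftl", *inputs]
    for index, stage in enumerate(stages):
        if index > 0:
            argv.append(SEPARATOR)
        argv.extend(stage)
    argv.extend(["output", output])
    return argv


argv = build_pipeline(
    inputs=["in.pdf"],
    stages=[["rotate", "south"], ["stamp", "watermark.pdf"], ["burst"]],
    output="page_%04d.pdf",
)
print(shlex.join(argv))
```

Passing the resulting list to subprocess.run(argv, check=True) would execute it without any shell quoting concerns, provided pdftl is on the PATH.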

Forms and metadata

# Fill a form and flatten it (make it non-editable)
pdftl form.pdf fill_form data.fdf flatten output signed.pdf

Modify annotations

# Change all Highlight annotations on odd pages to Red
pdftl docs.pdf modify_annots "odd/Highlight(C=[1 0 0])" output red_notes.pdf

Modify content

# Add a watermark, the pdftk way
pdftl in.pdf stamp watermark.pdf output marked1.pdf
# Add an obnoxious semi-transparent red watermark on odd pages only
pdftl in.pdf add_text 'odd/YOUR AD HERE/(position=mid-center, font=Helvetica-Bold, size=72, rotate=45, color=1 0 0 0.5)' output with_ads.pdf
# Add Bates numbering starting at 000121
# Result: DEF-000121, DEF-000122, ...
pdftl in.pdf \
  add_text "/DEF-{page+120:06d}/(position=bottom-center, offset-y=10)" \
  output bates.pdf
# Content stream replacement with regular expressions (YMMV)
# Change black to red
pdftl in.pdf replace '/0 0 0 (RG|rg)/1 0 0 \1/' output redder.pdf
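
The effect of the last example can be previewed on a content stream fragment with Python's re module. This is a sketch under the assumption that the replace operation follows Python regex semantics, with the pattern and replacement taken from the /pattern/replacement/ argument above; recolor is a hypothetical helper and the sample stream is made up for illustration.

```python
import re

# Pattern and replacement from the example: '/0 0 0 (RG|rg)/1 0 0 \1/'
BLACK_RGB = re.compile(r"0 0 0 (RG|rg)")


def recolor(content):
    """Replace black RGB stroke/fill colour operators with red ones."""
    return BLACK_RGB.sub(r"1 0 0 \1", content)


sample = "0 0 0 rg\n10 10 100 50 re\nf\n0 0 0 RG\n0 0 m 100 100 l S\n"
print(recolor(sample))

# A pitfall of an unanchored pattern: it also matches inside "1.0 0 0 rg",
# turning a red fill into the invalid-looking "1.1 0 0 rg".
print(recolor("1.0 0 0 rg"))
```

The second call shows one reason to treat regex edits of content streams with care: a pattern without a boundary before the first operand can match in the middle of a number.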

Python API

While pdftl is primarily a CLI tool, it also exposes a robust Python API for integrating PDF workflows into your scripts. It supports both a Functional interface (similar to the CLI) and a Fluent interface (for method chaining).

from pdftl import pipeline

# Chain operations fluently without saving intermediate files
(
    pipeline("input.pdf")
    .rotate("right")
    .stamp("watermark.pdf")
    .save("output.pdf")
)

See the API Tutorial for more details.

Operations and options

| Operation | Description |
| --- | --- |
| add_text | Add user-specified text strings to PDF pages |
| attach_files | Attach files to the output PDF |
| background | Use a 1-page PDF as the background for each page |
| burst | Split a single PDF into individual page files |
| cat | Concatenate pages from input PDFs into a new PDF |
| chop | Chop pages into multiple smaller pieces |
| crop | Crop pages |
| delete | Delete pages from an input PDF |
| delete_annots | Delete annotation info |
| dump_annots | Dump annotation info |
| dump_data | Metadata, page and bookmark info (XML-escaped) |
| dump_data_annots | Dump annotation info in pdftk style |
| dump_data_fields | Print PDF form field data with XML-style escaping |
| dump_data_fields_utf8 | Print PDF form field data in UTF-8 |
| dump_data_utf8 | Metadata, page and bookmark info (in UTF-8) |
| dump_dests | Print PDF named destinations data to the console |
| dump_files | List file attachments |
| dump_layers | Dump layer info (JSON) |
| dump_signatures | List and validate digital signatures |
| dump_text | Print PDF text data to the console or a file |
| fill_form | Fill a PDF form |
| filter | Do nothing (the default if <operation> is absent) |
| generate_fdf | Generate an FDF file containing PDF form data |
| inject | Inject code at start or end of page content streams |
| insert | Insert blank pages |
| modify_annots | Modify properties of existing annotations |
| move | Move pages to a new location |
| multibackground | Use multiple pages as backgrounds |
| multistamp | Stamp multiple pages onto an input PDF |
| mutate_content | Mutate page content streams using a user-supplied Python script |
| normalize | Reformat page content streams |
| optimize_images | Optimize images |
| place | Shift, scale, and spin page content |
| replace | Regex replacement on page content streams |
| render | Render PDF pages as images |
| rotate | Rotate pages in a PDF |
| shuffle | Interleave pages from multiple input PDFs |
| stamp | Stamp a 1-page PDF onto each page of an input PDF |
| unpack_files | Unpack file attachments |
| update_info | Update PDF metadata from dump_data instructions |
| update_info_utf8 | Update PDF metadata from dump_data_utf8 instructions |

| Option | Description |
| --- | --- |
| allow <perm> | Specify permissions for encrypted files |
| compress | Compress output file streams (default) |
| drop_info | Discard document-level info metadata |
| drop_xfa | Discard form XFA data if present |
| drop_xmp | Discard document-level XMP metadata |
| encrypt_128bit | Use 128 bit encryption (obsolete, maybe insecure) |
| encrypt_40bit | Use 40 bit encryption (obsolete, highly insecure) |
| encrypt_aes128 | Use 128 bit AES encryption (maybe obsolete) |
| encrypt_aes256 | Use 256 bit AES encryption |
| flatten | Flatten all annotations |
| keep_final_id | Copy final input PDF's ID metadata to output |
| keep_first_id | Copy first input PDF's ID metadata to output |
| linearize | Linearize output file(s) |
| no_encrypt_metadata | Leave metadata unencrypted |
| need_appearances | Set a form rendering flag in the output PDF |
| output <file> | The output file path, or a template for burst |
| owner_pw <pw> | Set owner password and encrypt output |
| replacement_font <file> | Replace the font used for all form fields with a TTF file |
| sign_cert <file> | Path to certificate PEM |
| sign_field <name> | Signature field name (default: Signature1) |
| sign_key <file> | Path to private key PEM |
| sign_pass_env <var> | Environment variable with sign_cert passphrase |
| sign_pass_prompt | Prompt for sign_cert passphrase |
| uncompress | Disable compression of output file streams |
| user_pw <pw> | Set user password and encrypt output |
| verbose | Turn on verbose output |

Links
