A library for manipulating PDF content streams.

pdfbeaver

A context-aware PDF content stream editor.

beaver: an animal which manipulates water streams.

pdfbeaver: a library which manipulates PDF content streams.

pdfbeaver bridges the gap between reading PDFs (calculating text positions, tracking graphics state) and writing PDFs (injecting operators, removing content). With pdfbeaver, you can write PDF content stream filters that know "where you are on the page" at any point in the content stream.

Example applications:

change colors of PDF text and vector graphics
redact PDF text content without disrupting the rest of the text
optimize vector paths in PDF graphics
replace fonts in a PDF file

It is built on top of pikepdf (and qpdf) for PDF writing/manipulation and pdfminer.six for stream parsing and state tracking.

🚀 Key Features

User-friendly API: Register stream editing methods using decorators.
Context-Aware Editing: Modify operators based on the current graphics state (Font, Color, Matrix, CTM).
Safe Recursion: Automatically traverses and modifies Form XObjects, ensuring nested content is treated exactly like page content.
State Tracking: Tracks the cursor position ($x, y$) and transformation matrices ($Tm, CTM$) as you parse.
Peephole Optimization: Includes passes to remove dead stores (unused graphics state updates) to keep output files small.
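
To make the state-tracking idea concrete, here is a minimal sketch of how a current transformation matrix (CTM) evolves as `cm` operators are read. It follows the PDF convention that a matrix is written as six numbers `a b c d e f` and that `cm` pre-multiplies the existing CTM. The names `concat` and `apply` are hypothetical helpers written for this illustration; they are not part of the pdfbeaver API.

```python
# Illustrative only: not pdfbeaver code. Matrices use the PDF six-number form.
IDENTITY = (1, 0, 0, 1, 0, 0)


def concat(m, ctm):
    """Return m x ctm, which is the new CTM after the operator `m cm`."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = ctm
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def apply(matrix, x, y):
    """Map a point from the current coordinate system to default user space."""
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)


ctm = IDENTITY
ctm = concat((1, 0, 0, 1, 100, 50), ctm)  # 1 0 0 1 100 50 cm (translate)
ctm = concat((2, 0, 0, 2, 0, 0), ctm)     # 2 0 0 2 0 0 cm (scale)
point = apply(ctm, 10, 10)                # where (10, 10) lands on the page
```

Because the scale is concatenated after the translation, it acts in the already-translated coordinate system: the point is scaled first and then shifted.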

📦 Installation

pip install pdfbeaver

(Note: Requires pikepdf and pdfminer.six)

⚡ Quick Start

1. Simple Operator Replacement

Change the color of all text to red.

import pikepdf
import pdfbeaver

pdf = pikepdf.open("input.pdf")

@pdfbeaver.register("Tj", "TJ", "'", '"')
def make_text_red(op, operands, raw_bytes):
    # Return a sequence of instructions:
    # 1. Set RGB color to Red (1, 0, 0)
    # 2. Draw the original text
    return [
        ([1, 0, 0], "rg"),  # Non-stroking red
        ([1, 0, 0], "RG"),  # Stroking red
        raw_bytes           # Original text op
    ]

pdfbeaver.process(pdf)
pdf.save("output_red.pdf")

2. Context-Aware Modification (Redaction)

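Delete text only if it appears in the top-left region of the page. The thresholds below (x < 300, y > 400) roughly match the top-left quadrant of a Letter-size page, since the default PDF user space has its origin at the bottom left.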

@pdfbeaver.register("Tj", "TJ")
def delete_top_left(context):
    # The first two components are the x and y of the text position
    x, y = pdfbeaver.extract_text_position(context.pre_input)[:2]
    if x < 300 and y > 400:
        return None  # Returning None removes the operator
    return pdfbeaver.UNCHANGED  # Pass through unchanged
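
The fixed thresholds above only fit one page size. As a sketch of how the test could be derived from the page itself, the hypothetical helper below (not part of pdfbeaver) takes a box given as four numbers `[llx, lly, urx, ury]` and checks whether a point falls in its top-left quadrant. It assumes an unrotated page in default user space, where y grows upward.

```python
def in_top_left_quadrant(x, y, mediabox):
    """Return True if (x, y) lies in the top-left quadrant of the box.

    mediabox is assumed to be four numbers: [llx, lly, urx, ury].
    """
    llx, lly, urx, ury = (float(v) for v in mediabox)
    mid_x = (llx + urx) / 2
    mid_y = (lly + ury) / 2
    # y grows upward in default user space, so "top" means above the midline.
    return x < mid_x and y > mid_y


letter = [0, 0, 612, 792]  # US Letter in points
top_left = in_top_left_quadrant(100, 700, letter)
top_right = in_top_left_quadrant(400, 700, letter)
bottom_left = in_top_left_quadrant(100, 100, letter)
```

Inside a handler, the box could plausibly come from the `page` argument (for example pikepdf's `page.mediabox`), though that wiring is an assumption and is not shown in the original example.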

Flexible Signatures

The @register decorator inspects your function signature. You can include any of the following arguments in any order:

operands (or args): List of arguments for the operator.
operator (or op): The operator string (e.g. "Tj").
raw_bytes: The original binary data for this instruction.
context: The StreamContext object.
pdf: The pikepdf.Pdf document.
page: The pikepdf.Page object.
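
For readers curious how this kind of signature-based dispatch can work, the following is a minimal sketch using the standard library's `inspect.signature`. It is not pdfbeaver's actual implementation; `ALIASES` and `call_with_requested_args` are hypothetical names invented for the illustration.

```python
import inspect

# Hypothetical alias table: alternative parameter names -> canonical names.
ALIASES = {"args": "operands", "op": "operator"}


def call_with_requested_args(handler, available):
    """Call handler, passing only the arguments its signature names."""
    kwargs = {}
    for name in inspect.signature(handler).parameters:
        canonical = ALIASES.get(name, name)
        if canonical not in available:
            raise TypeError(f"unsupported handler argument: {name}")
        kwargs[name] = available[canonical]
    return handler(**kwargs)


def show_operator(op, raw_bytes):
    return f"{op}: {len(raw_bytes)} bytes"


def count_operands(context, args):
    return len(args)


# Stand-in values for everything a handler may ask for.
available = {
    "operands": ["Hello"],
    "operator": "Tj",
    "raw_bytes": b"(Hello) Tj",
    "context": None,
    "pdf": None,
    "page": None,
}
summary = call_with_requested_args(show_operator, available)
operand_count = call_with_requested_args(count_operands, available)
```

Passing arguments by keyword is what makes the parameter order irrelevant: each handler names only what it needs.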

🏗 Architecture

pdfbeaver solves the problem of mapping input geometry to output streams incrementally, allowing state to be interrogated mid-stream.

graph LR
    A[Input Stream] --> B[StreamStateIterator];
    B --> C{State Tracker};
    C --> D[Handler Registry];
    D --> E[Stream Editor];
    E --> F[Optimizer];
    F --> G[Output Stream];

StreamStateIterator: Wraps pdfminer to interpret the stream byte-by-byte, updating a virtual graphics state (Matrices, Fonts).
HandlerRegistry: Intercepts specific operators defined by the user.
StreamEditor: Recompiles the stream. It injects modified operators or passes original raw bytes for maximum speed and fidelity.
Optimizer: Runs a post-processing pass to clean up redundant operators (e.g., 1 0 0 rg followed immediately by 0 1 0 rg).
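
The dead-store cleanup described for the Optimizer can be sketched as a small peephole pass. This is an illustration under simplifying assumptions, not pdfbeaver's optimizer: instructions are modeled as `(operands, operator)` tuples, only a handful of state-setting operators are recognized, and any other operator is conservatively treated as a possible reader of the state. `SLOT_WRITTEN` and `remove_dead_stores` are hypothetical names.

```python
# Which piece of graphics state each operator overwrites (illustrative subset).
SLOT_WRITTEN = {
    "rg": "fill", "g": "fill", "k": "fill",
    "RG": "stroke", "G": "stroke", "K": "stroke",
    "w": "line_width",
}


def remove_dead_stores(instructions):
    """Drop state updates that are overwritten before anything can use them."""
    output = []
    pending = {}  # slot -> index in output of the latest store not yet used
    for operands, operator in instructions:
        slot = SLOT_WRITTEN.get(operator)
        if slot is None:
            # Any other operator may read the state, so pending stores are live.
            pending.clear()
        else:
            if slot in pending:
                # The earlier store was never used: mark it for removal.
                output[pending[slot]] = None
            pending[slot] = len(output)
        output.append((operands, operator))
    return [ins for ins in output if ins is not None]


stream = [
    ([1, 0, 0], "rg"),   # dead: overwritten by the next rg
    ([0, 1, 0], "rg"),
    (["Hello"], "Tj"),
    ([0, 0, 1], "rg"),   # kept: later content may rely on it
]
optimized = remove_dead_stores(stream)
```

Treating every unrecognized operator as a reader keeps the pass safe at the cost of missing some removable stores.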

📚 Advanced Usage

The `StreamContext`

Every handler receives a context object containing:

context.tracker: The active state tracker (access gstate, textstate, get_current_user_pos()).
context.page: The pikepdf.Page object currently being processed.
context.container: The specific object being processed (could be a Page or a Form XObject).

See docs/ for documentation. (Hopefully this will appear on readthedocs some day.)

📄 License

MIT License. See LICENSE for details.

Release history

0.1.1 (this version): Dec 9, 2025
0.1.0: Dec 9, 2025

