PdfRedact

A robust PDF redaction library that securely removes sensitive information while preserving document integrity.

Features at a Glance

🔒 Comprehensive Redaction

Handles both visual content and searchable text layers
True removal of sensitive information

⚙️ Flexible Configuration

Adjustable image quality (DPI)
Configurable text handling (invisible or removed)
Cross-platform support (Windows & Unix)

🛠️ Professional-Grade Tools

High-resolution image processing
Precise coordinate-based redaction
Document searchability preservation

Installation

pip install pdfredact

Dependencies

Python 3.10+
Poppler (PDF to image conversion)
Pillow (Image processing)
ReportLab (PDF generation)

Windows Setup

Windows users must install Poppler and then do one of the following:

Set the POPPLER_PATH environment variable
Provide the Poppler path explicitly in function calls
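
The lookup order described above can be sketched as a small helper. This is only an illustration: `resolve_poppler_path` is a hypothetical name, not part of pdfredact, and the README does not state which argument the library functions accept for the path, so the sketch stops at resolving and checking the directory.

```python
import os
import shutil


def resolve_poppler_path(explicit=None):
    """Return a directory that contains Poppler's pdftoppm, or None to rely on PATH.

    Hypothetical helper: an explicit argument wins over POPPLER_PATH.
    """
    candidate = explicit or os.environ.get("POPPLER_PATH")
    if not candidate:
        # Nothing configured: let the system PATH decide (typical on Unix).
        return None
    # shutil.which also applies PATHEXT on Windows, so pdftoppm.exe is found.
    if shutil.which("pdftoppm", path=candidate) is None:
        raise FileNotFoundError(f"pdftoppm not found in {candidate!r}")
    return candidate
```

Failing early with a clear error is usually easier to diagnose than a conversion error raised deep inside the PDF-to-image step.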

How It Works

The redaction process follows a secure two-step approach:

1. Image Layer Redaction 📸
   - Converts PDF pages to high-resolution images
   - Applies black rectangles to specified areas
   - Maintains quality through configurable DPI
2. Text Layer Processing 📝
   - Generates a new searchable text layer
   - Removes or obscures text in redacted regions
   - Preserves document searchability
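
The text-layer step can be pictured as a rectangle-overlap filter. The sketch below is not pdfredact's implementation; it assumes PAWLS-style tokens with `x`, `y`, `width`, `height`, and `text` keys, and redaction bounds with a top-left origin (so `top` is smaller than `bottom`, as in the usage example). The function names are hypothetical.

```python
def overlaps(bounds, token):
    """True if the token box intersects the redaction box.

    Boxes that only touch along an edge are not counted as overlapping.
    """
    tok_left, tok_top = token["x"], token["y"]
    tok_right = tok_left + token["width"]
    tok_bottom = tok_top + token["height"]
    return (
        tok_left < bounds["right"]
        and tok_right > bounds["left"]
        and tok_top < bounds["bottom"]
        and tok_bottom > bounds["top"]
    )


def drop_redacted_tokens(tokens, redaction_bounds):
    """Keep only tokens that touch none of the redaction boxes."""
    return [
        tok for tok in tokens
        if not any(overlaps(b, tok) for b in redaction_bounds)
    ]


bounds = [{"left": 100, "right": 200, "top": 50, "bottom": 75}]
tokens = [
    {"x": 110, "y": 55, "width": 40, "height": 12, "text": "secret"},
    {"x": 300, "y": 55, "width": 40, "height": 12, "text": "public"},
]
kept = drop_redacted_tokens(tokens, bounds)
```

Dropping any token that overlaps a box at all, rather than only tokens fully inside it, errs on the side of removing too much, which is the safer failure mode for redaction.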

Usage Example

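from pdfredact import redact_pdf_to_images, build_text_redacted_pdf

# pdf_content (the raw bytes of the source PDF) and page_data (the PAWLS
# page/token data for that PDF) must be prepared before this snippet runs.

# Define redaction areas using page coordinates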
redactions = [{
    "bounds": {
        "left": 100,
        "right": 200,
        "top": 50,
        "bottom": 75
    },
    "tokensJsons": [...],  # Token information
    "rawText": "sensitive text"
}]

# Step 1: Generate redacted images
redacted_images = redact_pdf_to_images(
    pdf_bytes=pdf_content,
    pawls_pages=page_data,
    page_annotations=[redactions],
    dpi=300
)

# Step 2: Create final PDF with redacted text layer
build_text_redacted_pdf(
    output_pdf="redacted_document.pdf",
    redacted_images=redacted_images,
    pawls_pages=page_data,
    page_redactions=[redactions],
    dpi=300,
    hide_text=True  # Text remains copyable but invisible
)
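
Because the bounds are given in page coordinates while the images are rendered at a chosen DPI, a conversion between the two is implied. The sketch below shows one way such a conversion could work, assuming the page coordinates are PDF points (72 per inch); the README does not state the unit, and `bounds_to_pixels` is a hypothetical helper rather than a pdfredact function.

```python
import math


def bounds_to_pixels(bounds, dpi, units_per_inch=72):
    """Scale a page-coordinate box to pixel coordinates at the given DPI.

    Edges are rounded outward so the pixel box never shrinks below the
    requested region.
    """
    def scale(value):
        return value * dpi / units_per_inch

    return {
        "left": math.floor(scale(bounds["left"])),
        "top": math.floor(scale(bounds["top"])),
        "right": math.ceil(scale(bounds["right"])),
        "bottom": math.ceil(scale(bounds["bottom"])),
    }


pixel_box = bounds_to_pixels(
    {"left": 100, "right": 200, "top": 50, "bottom": 75}, dpi=300
)
# pixel_box == {"left": 416, "top": 208, "right": 834, "bottom": 313}
```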

Best Practices

📋 Quality Assurance

Use 300+ DPI for optimal output quality
Verify redactions through:
- Visual inspection
- Copy/paste testing
- Text extraction tool verification
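
The text extraction check can be automated. The sketch below is one possible approach, not a pdfredact feature: it assumes Poppler's `pdftotext` command is on the PATH, and `extract_text` and `find_leaks` are hypothetical helper names.

```python
import subprocess


def extract_text(pdf_path):
    """Extract all text from a PDF with Poppler's pdftotext ("-" means stdout)."""
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def find_leaks(extracted_text, sensitive_terms):
    """Return the sensitive terms that still appear in the extracted text.

    Whitespace is collapsed and case is ignored, so a term that was split
    across lines is still detected.
    """
    haystack = " ".join(extracted_text.split()).casefold()
    leaks = []
    for term in sensitive_terms:
        needle = " ".join(term.split()).casefold()
        if needle and needle in haystack:
            leaks.append(term)
    return leaks
```

A call such as `find_leaks(extract_text("redacted_document.pdf"), ["sensitive text"])` should return an empty list for a correctly redacted file. An empty result from one tool is not proof of a clean document, so it complements the visual and copy/paste checks rather than replacing them.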

⚠️ Known Limitations

Processing time scales with DPI and document size
Large documents require substantial memory
Image-based PDFs need special handling
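
To make the DPI and memory trade-off concrete, here is a rough back-of-the-envelope estimate. It assumes uncompressed 8-bit RGB rasters (3 bytes per pixel) and page sizes in PDF points; actual usage inside pdfredact is likely higher, and `page_raster_bytes` is a hypothetical helper.

```python
def page_raster_bytes(width_pt, height_pt, dpi, bytes_per_pixel=3):
    """Estimate the uncompressed size of one rendered page in bytes."""
    width_px = round(width_pt * dpi / 72)
    height_px = round(height_pt * dpi / 72)
    return width_px * height_px * bytes_per_pixel


# US Letter (612 x 792 pt) at 300 DPI is 2550 x 3300 pixels.
letter_300 = page_raster_bytes(612, 792, 300)  # 25,245,000 bytes, about 24 MiB
letter_600 = page_raster_bytes(612, 792, 600)  # four times as much
```

Doubling the DPI quadruples the pixel count, so a 100-page Letter document needs on the order of 2.4 GiB at 300 DPI if every page image is held in memory at once.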

Security Advisory

⚠️ Important: Always verify redacted documents thoroughly. The PDF format is complex, and viewers and extraction tools differ in how they handle it, so use more than one verification method for sensitive materials.

Disclaimer: We are not liable for consequences arising from improperly redacted PDFs.

License

pdfredact is distributed under the terms of the MIT license.

Release history

0.0.6 (Feb 13, 2025)
0.0.5 (Jan 19, 2025)
0.0.4 (Jan 19, 2025), this version
0.0.3 (Jan 19, 2025)
0.0.2 (Jan 11, 2025)
0.0.1 (Jan 11, 2025)

Download files


Source Distribution

pdfredact-0.0.4.tar.gz (401.3 kB)

Uploaded Jan 19, 2025 Source

Built Distribution


pdfredact-0.0.4-py3-none-any.whl (6.7 kB)

Uploaded Jan 19, 2025 Python 3

File details

Details for the file pdfredact-0.0.4.tar.gz.

File metadata

Download URL: pdfredact-0.0.4.tar.gz
Upload date: Jan 19, 2025
Size: 401.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: python-httpx/0.27.0

File hashes

Hashes for pdfredact-0.0.4.tar.gz
Algorithm	Hash digest
SHA256	`1c9aa01759fb0381b5d94b610a597a10f3182258ef3bb5feafb1d2a039aeaa30`
MD5	`16bbdc61fba3a51b1b1b02b85ef863c2`
BLAKE2b-256	`635cd26fdadf5429dead44ed3764166a3fef235ffb49c3e59e7c09ec3455d471`


File details

Details for the file pdfredact-0.0.4-py3-none-any.whl.

File metadata

Download URL: pdfredact-0.0.4-py3-none-any.whl
Upload date: Jan 19, 2025
Size: 6.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: python-httpx/0.27.0

File hashes

Hashes for pdfredact-0.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`06f4b0734480784bd8d6a81171fb5dfc7909bceaeb9956b85bbd56d0edd90c68`
MD5	`7cd2ff31e0bdb46aaa696cb11d341f89`
BLAKE2b-256	`ead07c245f211aecdab412cd0127b7a30986eced163b3ce0ef1729bee541b080`
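
The digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using only the standard library; `sha256_of_file` and `verify_sha256` are hypothetical helper names, and the local file path is whatever location the file was downloaded to.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """True if the file's SHA256 digest matches the published value."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Published SHA256 digest for pdfredact-0.0.4.tar.gz, copied from the table above.
SDIST_SHA256 = "1c9aa01759fb0381b5d94b610a597a10f3182258ef3bb5feafb1d2a039aeaa30"
```

For example, `verify_sha256("pdfredact-0.0.4.tar.gz", SDIST_SHA256)` should return `True` for an intact download of the source distribution.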

