PdfRedact

A robust PDF redaction library that securely removes sensitive information while preserving document integrity.

Features at a Glance

🔒 Comprehensive Redaction

  • Handles both visual content and searchable text layers
  • True removal of sensitive information

⚙️ Flexible Configuration

  • Adjustable image quality (DPI)
  • Configurable text handling (invisible or removed)
  • Cross-platform support (Windows & Unix)

🛠️ Professional-Grade Tools

  • High-resolution image processing
  • Precise coordinate-based redaction
  • Document searchability preservation

Installation

pip install pdfredact

Dependencies

  • Python 3.10+
  • Poppler (PDF to image conversion)
  • Pillow (Image processing)
  • ReportLab (PDF generation)

Windows Setup

Windows users must install Poppler and then either:

  • Set the POPPLER_PATH environment variable, or
  • Provide the path explicitly in function calls
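
The two options above can be combined in calling code. The sketch below is only an illustration of one possible lookup order (explicit argument first, then the POPPLER_PATH environment variable); `resolve_poppler_path` is a hypothetical helper, not part of pdfredact, and how the library itself resolves the path is not documented here.

```python
import os
from pathlib import Path


def resolve_poppler_path(explicit=None, env=None):
    """Return the Poppler directory to use, or None if none is configured.

    Hypothetical helper: an explicit path wins over the POPPLER_PATH
    environment variable. This precedence is an assumption made for the
    example, not documented pdfredact behaviour.
    """
    env = os.environ if env is None else env
    candidate = explicit or env.get("POPPLER_PATH")
    return Path(candidate) if candidate else None


# Look up the configured location (None means nothing is configured).
poppler_dir = resolve_poppler_path()
if poppler_dir is None:
    print("Poppler path not configured; set POPPLER_PATH or pass a path.")
else:
    print(f"Using Poppler from {poppler_dir}")
```

Resolving the path once at startup makes a missing Poppler install fail early, before any document is processed.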

How It Works

The redaction process follows a secure two-step approach:

  1. Image Layer Redaction 📸

    • Converts PDF pages to high-resolution images
    • Applies black rectangles to specified areas
    • Maintains quality through configurable DPI
  2. Text Layer Processing 📝

    • Generates a new searchable text layer
    • Removes or obscures text in redacted regions
    • Preserves document searchability
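
To make the text layer step concrete, here is a minimal sketch of dropping tokens that fall inside a redacted region. It assumes each token is a dict with `x`, `y`, `width`, `height`, and `text` keys in the same coordinate space as the redaction `bounds`; that token shape and both function names are assumptions for illustration, not the library's internals.

```python
def intersects(token, bounds):
    """True if the token's box overlaps the redaction bounds at all."""
    token_right = token["x"] + token["width"]
    token_bottom = token["y"] + token["height"]
    return (
        token["x"] < bounds["right"]
        and token_right > bounds["left"]
        and token["y"] < bounds["bottom"]
        and token_bottom > bounds["top"]
    )


def drop_redacted_tokens(tokens, redactions):
    """Keep only tokens that overlap none of the redaction regions."""
    return [
        token
        for token in tokens
        if not any(intersects(token, r["bounds"]) for r in redactions)
    ]


tokens = [
    {"x": 110, "y": 55, "width": 40, "height": 12, "text": "secret"},
    {"x": 300, "y": 55, "width": 40, "height": 12, "text": "public"},
]
redactions = [{"bounds": {"left": 100, "right": 200, "top": 50, "bottom": 75}}]

kept = drop_redacted_tokens(tokens, redactions)
print([token["text"] for token in kept])  # ['public']
```

Treating any overlap as a hit is the conservative choice: a token that only partly enters the redacted box is removed rather than left in the searchable layer.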

Usage Example

from pdfredact import redact_pdf_to_images, build_text_redacted_pdf

# Define redaction areas using page coordinates
redactions = [{
    "bounds": {
        "left": 100,
        "right": 200,
        "top": 50,
        "bottom": 75
    },
    "tokensJsons": [...],  # Token information
    "rawText": "sensitive text"
}]

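# pdf_content (the source PDF as bytes) and page_data (the per-page token
# data passed as pawls_pages) must be prepared by the caller beforehand.

# Step 1: Generate redacted images
redacted_images = redact_pdf_to_images(
    pdf_bytes=pdf_content,
    pawls_pages=page_data,
    page_annotations=[redactions],
    dpi=300
)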

# Step 2: Create final PDF with redacted text layer
build_text_redacted_pdf(
    output_pdf="redacted_document.pdf",
    redacted_images=redacted_images,
    pawls_pages=page_data,
    page_redactions=[redactions],
    dpi=300,
    hide_text=True  # Text remains copyable but invisible
)
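
Because the redaction bounds are page coordinates while the images are rendered at a chosen DPI, the two have to be related somehow. The sketch below shows one way to do that, assuming the bounds are in PDF points (72 per inch) with a top-left origin; both that assumption and the `bounds_to_pixels` helper are illustrative and not confirmed by the library.

```python
import math

POINTS_PER_INCH = 72  # PDF user-space units per inch


def bounds_to_pixels(bounds, dpi):
    """Convert bounds in points to a (left, top, right, bottom) pixel box.

    Left/top are rounded down and right/bottom rounded up, so the pixel
    box never ends up smaller than the requested region.
    """
    def scale(value):
        return value * dpi / POINTS_PER_INCH

    return (
        math.floor(scale(bounds["left"])),
        math.floor(scale(bounds["top"])),
        math.ceil(scale(bounds["right"])),
        math.ceil(scale(bounds["bottom"])),
    )


box = bounds_to_pixels(
    {"left": 100, "right": 200, "top": 50, "bottom": 75}, dpi=300
)
print(box)  # (416, 208, 834, 313)
```

Rounding outward errs on the side of covering slightly more than requested, which is usually the safer direction for a redaction box.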

Best Practices

📋 Quality Assurance

  • Use 300+ DPI for optimal output quality
  • Verify redactions through:
    • Visual inspection
    • Copy/paste testing
    • Text extraction tool verification
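
The text extraction check can be partly automated. The sketch below assumes the text of the redacted PDF has already been extracted by some external tool (for example Poppler's `pdftotext`) and simply reports which sensitive strings still appear; `find_leaks` is a hypothetical helper, not part of pdfredact.

```python
import re


def normalize(text):
    """Collapse whitespace and lowercase so line breaks do not hide matches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_leaks(extracted_text, sensitive_strings):
    """Return the sensitive strings that still occur in the extracted text."""
    haystack = normalize(extracted_text)
    return [
        s for s in sensitive_strings
        if normalize(s) and normalize(s) in haystack
    ]


# Example with text that would normally come from an extraction tool.
extracted = "Quarterly report\nfor internal  use"
leaks = find_leaks(extracted, ["sensitive text"])
print("Leaked strings:", leaks)  # Leaked strings: []
```

A clean result here only covers the text layer; it complements visual inspection and copy/paste testing and does not replace them.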

⚠️ Known Limitations

  • Processing time scales with DPI and document size
  • Large documents require substantial memory, since each page is rasterized to an image at the chosen DPI
  • Image-based PDFs need special handling
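
As a rough guide to how memory grows with DPI, the arithmetic below estimates the uncompressed size of one rasterized page. It assumes 8-bit RGB (3 bytes per pixel) and a US Letter page; actual usage depends on how images are held in memory, so treat the numbers as an order-of-magnitude estimate only.

```python
def estimate_page_bytes(width_in, height_in, dpi, channels=3):
    """Approximate uncompressed size in bytes of one rasterized page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * channels


letter_300 = estimate_page_bytes(8.5, 11, 300)
letter_600 = estimate_page_bytes(8.5, 11, 600)

print(letter_300)               # 25245000 bytes, about 25 MB per page
print(letter_600 / letter_300)  # 4.0: doubling DPI quadruples the size
```

By this estimate, a 100-page Letter document at 300 DPI would need roughly 2.5 GB if every page were held uncompressed at once.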

Security Advisory

⚠️ Important: Always verify redacted documents thoroughly. PDF is a complex format, and support varies across viewers and tools. Use more than one verification method for sensitive materials.

Disclaimer: We are not liable for consequences arising from improperly redacted PDFs.

License

pdfredact is distributed under the terms of the MIT license.
