PdfRedact
A robust PDF redaction library that securely removes sensitive information while preserving document integrity.
Features at a Glance
🔒 Comprehensive Redaction
- Handles both visual content and searchable text layers
- True removal of sensitive information
⚙️ Flexible Configuration
- Adjustable image quality (DPI)
- Configurable text handling (invisible or removed)
- Cross-platform support (Windows & Unix)
🛠️ Professional-Grade Tools
- High-resolution image processing
- Precise coordinate-based redaction
- Document searchability preservation
Installation
pip install pdfredact
Dependencies
- Python 3.10+
- Poppler (PDF to image conversion)
- Pillow (Image processing)
- ReportLab (PDF generation)
Windows Setup
Windows users must install Poppler and either:
- Set the POPPLER_PATH environment variable
- Provide the path explicitly in function calls
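The lookup order described above can be sketched as a small helper. This is only an illustration: `resolve_poppler_path` is a hypothetical name, not part of pdfredact, and the README does not say which parameter of the library's functions receives the path.

```python
import os
from pathlib import Path


def resolve_poppler_path(explicit=None, environ=None):
    """Pick a Poppler directory: explicit argument first, then POPPLER_PATH.

    Returns None when neither is set, meaning "rely on the system PATH".
    Hypothetical helper for illustration; not part of pdfredact.
    """
    environ = os.environ if environ is None else environ
    candidate = explicit or environ.get("POPPLER_PATH")
    if not candidate:
        return None
    path = Path(candidate)
    if not path.is_dir():
        # Fail early with a clear message instead of a conversion error later.
        raise FileNotFoundError(f"Poppler directory not found: {path}")
    return str(path)
```

Checking the directory up front tends to give a clearer error than a failed page conversion further down the pipeline.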
How It Works
The redaction process follows a secure two-step approach:
- Image Layer Redaction 📸
  - Converts PDF pages to high-resolution images
  - Applies black rectangles to specified areas
  - Maintains quality through configurable DPI
- Text Layer Processing 📝
  - Generates a new searchable text layer
  - Removes or obscures text in redacted regions
  - Preserves document searchability
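The text layer step can be illustrated with a minimal sketch that drops every token whose box intersects a redaction region. The token layout used here (`x`, `y`, `width`, `height`, `text`) is an assumption in the style of PAWLS token data, and both function names are hypothetical; the library's internal logic may differ.

```python
def overlaps(token, bounds):
    """Return True if the token's box intersects the redaction bounds.

    Assumes token = {"x", "y", "width", "height", ...} and
    bounds = {"left", "right", "top", "bottom"} in the same coordinate space.
    """
    t_left, t_top = token["x"], token["y"]
    t_right = t_left + token["width"]
    t_bottom = t_top + token["height"]
    return (
        t_left < bounds["right"]
        and t_right > bounds["left"]
        and t_top < bounds["bottom"]
        and t_bottom > bounds["top"]
    )


def tokens_outside_redactions(tokens, redactions):
    """Keep only tokens that touch no redaction region."""
    return [
        token
        for token in tokens
        if not any(overlaps(token, r["bounds"]) for r in redactions)
    ]
```

Treating any partial overlap as a hit is the conservative choice: a token that only half falls under a black rectangle is still dropped from the new text layer.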
Usage Example
from pdfredact import redact_pdf_to_images, build_text_redacted_pdf

# pdf_content and page_data are assumed to be loaded elsewhere:
# the source PDF (passed as pdf_bytes) and the PAWLS page data for the same document.

# Define redaction areas using page coordinates
redactions = [{
    "bounds": {
        "left": 100,
        "right": 200,
        "top": 50,
        "bottom": 75
    },
    "tokensJsons": [...],  # Token information
    "rawText": "sensitive text"
}]

# Step 1: Generate redacted images
redacted_images = redact_pdf_to_images(
    pdf_bytes=pdf_content,
    pawls_pages=page_data,
    page_annotations=[redactions],
    dpi=300
)

# Step 2: Create final PDF with redacted text layer
build_text_redacted_pdf(
    output_pdf="redacted_document.pdf",
    redacted_images=redacted_images,
    pawls_pages=page_data,
    page_redactions=[redactions],
    dpi=300,
    hide_text=True  # Text remains copyable but invisible
)
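The bounds in the example are given in page coordinates, while the black rectangles are drawn on images rendered at the chosen DPI, so the two coordinate systems have to be reconciled. The sketch below shows one way to do that scaling. It assumes both systems have their origin at the top-left corner, and `bounds_to_pixels` is a hypothetical helper, not a pdfredact function.

```python
import math


def bounds_to_pixels(bounds, page_width, page_height, image_width, image_height):
    """Scale page-coordinate bounds to a pixel box (left, top, right, bottom).

    Rounds outward (floor for left/top, ceil for right/bottom) so the
    pixel box never ends up smaller than the requested region.
    """
    left = math.floor(bounds["left"] * image_width / page_width)
    top = math.floor(bounds["top"] * image_height / page_height)
    right = math.ceil(bounds["right"] * image_width / page_width)
    bottom = math.ceil(bounds["bottom"] * image_height / page_height)
    return (left, top, right, bottom)


# A 612 x 792 page rendered at 300 DPI gives a 2550 x 3300 pixel image.
box = bounds_to_pixels(
    {"left": 100, "right": 200, "top": 50, "bottom": 75},
    page_width=612, page_height=792,
    image_width=2550, image_height=3300,
)
print(box)  # (416, 208, 834, 313)
```

A box in this form can be passed to Pillow's `ImageDraw.Draw(image).rectangle(box, fill="black")`. Rounding outward errs on the side of covering slightly more than requested rather than leaving a sliver of the sensitive region visible.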
Best Practices
📋 Quality Assurance
- Use 300+ DPI for optimal output quality
- Verify redactions through:
  - Visual inspection
  - Copy/paste testing
  - Text extraction tool verification
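The text extraction check can be partly automated. The following is a minimal sketch, not part of pdfredact: `find_leaks` is a hypothetical helper that searches extracted text for terms that should have been removed, and `extract_text_with_pdftotext` assumes Poppler's `pdftotext` command-line tool is available on the PATH.

```python
import re
import subprocess


def normalize(text):
    """Collapse whitespace and ignore case so line breaks don't hide a match."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def find_leaks(extracted_text, sensitive_terms):
    """Return the sensitive terms that still appear in the extracted text."""
    haystack = normalize(extracted_text)
    return [
        term
        for term in sensitive_terms
        if normalize(term) and normalize(term) in haystack
    ]


def extract_text_with_pdftotext(pdf_path):
    """Extract text with Poppler's pdftotext; "-" sends the output to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

An empty result from `find_leaks(extract_text_with_pdftotext("redacted_document.pdf"), ["sensitive text"])` is a necessary check, not a sufficient one; it complements visual inspection and copy/paste testing rather than replacing them.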
⚠️ Known Limitations
- Processing time scales with DPI and document size
- Large documents require substantial memory
- Image-based PDFs need special handling
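To make the DPI and memory trade-off concrete, here is a rough estimate of the uncompressed size of one rendered page. It assumes 8-bit RGB pixels (3 bytes each); the library's actual memory use is not documented here and may be higher because of intermediate copies.

```python
def raster_bytes(width_in, height_in, dpi, channels=3):
    """Uncompressed size in bytes of a page rendered at the given DPI."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * channels


# US Letter page (8.5 x 11 inches)
print(raster_bytes(8.5, 11, 300))  # 25245000 bytes, roughly 24 MiB
print(raster_bytes(8.5, 11, 600))  # 100980000 bytes, roughly 96 MiB
```

Doubling the DPI quadruples the pixel count, so a 100-page document at 600 DPI would need close to 10 GB if every page image were held in memory at once.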
Security Advisory
⚠️ Important: Always verify redacted documents thoroughly. The PDF format is complex, and viewers and extraction tools do not all handle it the same way, so check sensitive materials with more than one verification method.
Disclaimer: We are not liable for consequences arising from improperly redacted PDFs.
License
pdfredact is distributed under the terms of the MIT license.
File details
Details for the file pdfredact-0.0.4.tar.gz.
File metadata
- Download URL: pdfredact-0.0.4.tar.gz
- Upload date:
- Size: 401.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.27.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1c9aa01759fb0381b5d94b610a597a10f3182258ef3bb5feafb1d2a039aeaa30 |
| MD5 | 16bbdc61fba3a51b1b1b02b85ef863c2 |
| BLAKE2b-256 | 635cd26fdadf5429dead44ed3764166a3fef235ffb49c3e59e7c09ec3455d471 |
File details
Details for the file pdfredact-0.0.4-py3-none-any.whl.
File metadata
- Download URL: pdfredact-0.0.4-py3-none-any.whl
- Upload date:
- Size: 6.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.27.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 06f4b0734480784bd8d6a81171fb5dfc7909bceaeb9956b85bbd56d0edd90c68 |
| MD5 | 7cd2ff31e0bdb46aaa696cb11d341f89 |
| BLAKE2b-256 | ead07c245f211aecdab412cd0127b7a30986eced163b3ce0ef1729bee541b080 |