A library for processing PDFs with OCR and masking sensitive information
PDF Masking Library
pdf-masking-library is a Python library that uses Optical Character Recognition (OCR) to find and mask sensitive information in PDF files. It supports masking predefined patterns such as Aadhaar numbers and PAN numbers, as well as custom patterns provided by the user.
A Simple Example
import base64
from pdf_masking_library import process_pdf
base64_pdf_input = "Your base64 here"
custom_pattern = [r"\b\d{2}\b"]
base64_pdf_output = process_pdf(base64_pdf_input, custom_pattern=custom_pattern)
# Save the masked PDF to a file
with open("masked_output.pdf", "wb") as output_file:
    output_file.write(base64.b64decode(base64_pdf_output))
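The example above works with base64 strings, so a PDF stored on disk has to be encoded before it is passed in. The following is a minimal sketch using only the standard library, assuming process_pdf accepts a plain base64 text string as the placeholder suggests. The helper names pdf_file_to_base64 and base64_to_pdf_file are hypothetical and are not part of pdf_masking_library.

```python
import base64
from pathlib import Path


def pdf_file_to_base64(path):
    """Read a file and return its contents as a base64 text string."""
    # Hypothetical helper, not provided by pdf_masking_library.
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def base64_to_pdf_file(data, path):
    """Decode a base64 string and write the resulting bytes to a file."""
    # Hypothetical helper, not provided by pdf_masking_library.
    Path(path).write_bytes(base64.b64decode(data))
```

The string returned by pdf_file_to_base64("input.pdf") could then be used as base64_pdf_input, and base64_to_pdf_file could replace the manual write step at the end.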
Masking Information
The library detects and masks the following kinds of sensitive information:
- Aadhaar Numbers: 12-digit Indian identification numbers.
- PAN Numbers: 10-character alphanumeric Permanent Account Numbers.
- Custom Patterns: User-defined patterns using regular expressions.
To mask custom patterns, pass a list of regular expression strings as the custom_pattern parameter, as shown above.
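It can help to try a regular expression against plain text before running it on a PDF. The sketch below is illustrative only: AADHAAR_LIKE and PAN_LIKE are assumed patterns based on the formats described above (12 digits, and five letters, four digits, one letter), not the patterns the library uses internally, and find_matches is a hypothetical helper.

```python
import re

# Illustrative patterns only; the library's built-in patterns may differ.
# Aadhaar-like: 12 digits, optionally written as three groups of four.
AADHAAR_LIKE = r"\b\d{4}[ -]?\d{4}[ -]?\d{4}\b"
# PAN-like: five uppercase letters, four digits, one uppercase letter.
PAN_LIKE = r"\b[A-Z]{5}\d{4}[A-Z]\b"


def find_matches(text, patterns):
    """Return every substring of text matched by any of the patterns."""
    matches = []
    for pattern in patterns:
        matches.extend(m.group(0) for m in re.finditer(pattern, text))
    return matches


sample = "ID 1234 5678 9012, PAN ABCDE1234F, age 42"
print(find_matches(sample, [AADHAAR_LIKE, PAN_LIKE, r"\b\d{2}\b"]))
# → ['1234 5678 9012', 'ABCDE1234F', '42']
```

Note that a broad pattern such as \b\d{2}\b matches every standalone two-digit number, so it may mask more than intended. OCR output can also differ from the visible text, so results on a real PDF may vary.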
Command-Line Interface (CLI)
The library includes a CLI tool for easy integration into scripts and workflows.
- Mask a Custom Pattern:
python -m pdf_masking_library input.pdf output.pdf --custom-pattern "\b\d{2}\b"
- Mask Aadhaar and PAN Numbers (Default Behavior):
python -m pdf_masking_library input.pdf output.pdf
If you do not provide a custom pattern, the library will automatically mask Aadhaar numbers and PAN numbers in the PDF.
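For workflows that process many files, the CLI can be driven from a script. This is a rough sketch that reuses only the command form and the --custom-pattern flag shown above. The functions build_command and mask_folder are hypothetical, and the sketch assumes the CLI exits with a non-zero status on failure, which the source does not confirm.

```python
import subprocess
import sys
from pathlib import Path


def build_command(input_pdf, output_pdf, custom_pattern=None):
    """Build the argument list for one CLI invocation."""
    cmd = [sys.executable, "-m", "pdf_masking_library",
           str(input_pdf), str(output_pdf)]
    if custom_pattern is not None:
        cmd += ["--custom-pattern", custom_pattern]
    return cmd


def mask_folder(input_dir, output_dir, custom_pattern=None):
    """Run the CLI once for every PDF found directly in input_dir."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(input_dir).glob("*.pdf")):
        cmd = build_command(pdf, output_dir / pdf.name, custom_pattern)
        # check=True raises CalledProcessError on a non-zero exit status
        # (assumed here to signal a failed run).
        subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting, so the backslashes in a pattern such as \b\d{2}\b reach the program unchanged.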
File details
Details for the file pdf_masking_library-0.1.3-py3-none-any.whl.
File metadata
- Download URL: pdf_masking_library-0.1.3-py3-none-any.whl
- Upload date:
- Size: 10.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 451beeec9861a9940c1fae49f8b1e5e6322c718d3370f4034188d322025448b0 |
| MD5 | c35c2f0664abbbe3f2e9c10e825b9dfe |
| BLAKE2b-256 | 7623896b2ecbe9ccd858b20311f71291b06569b57f584de9edb6ca00ed6f3b97 |
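A downloaded wheel can be checked against the digests listed above. The following is a minimal sketch using hashlib from the standard library; file_digest is a hypothetical helper, and the local file path is an assumption about where the wheel was saved.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# SHA256 digest copied from the table above.
EXPECTED_SHA256 = "451beeec9861a9940c1fae49f8b1e5e6322c718d3370f4034188d322025448b0"

# Example check, assuming the wheel is in the current directory:
# assert file_digest("pdf_masking_library-0.1.3-py3-none-any.whl") == EXPECTED_SHA256
```

Passing algorithm="md5" computes the MD5 digest in the same way.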