PDF Sentinel
PDF Sentinel is a lightweight safety inspection library for PDF documents. It detects oversized, vector-heavy, or otherwise resource-intensive pages (such as blueprints) that could slow down or crash OCR and other document-processing pipelines.
Features
- Detects dangerous or heavy PDF pages:
  - Large page dimensions (A0, engineering blueprints, etc.)
  - Massive embedded images
  - Vector-heavy drawings (architectural plans)
  - Pages that exceed safe rasterization thresholds
- Returns simplified or detailed analysis for pages and files
- Configurable limits (page size, pixel thresholds, etc.)
- Optional JSON response for easy API integration
Installation
pip install PDFSentinel
This will install the library from PyPI and make it available to import in your project.
Usage
Simplified file safety check
from pdfsentinel import PDFSentinel
sentinel = PDFSentinel()
result = sentinel.is_file_safe("samples/test.pdf")
print(result)
Output (Python dict, formatted for readability):
{
    "file_name": "test.pdf",
    "pages": 23,
    "is_file_safety": True,
    "unsafety_pages": []
}
You can also get JSON-formatted output directly:
print(sentinel.is_file_safe("samples/test.pdf", json_response=True))
Simplified page safety check
print(sentinel.is_page_safe("samples/test1.pdf", 2, json_response=True))
Example output:
{
  "file_name": "test1.pdf",
  "page": 2,
  "is_page_safety": false,
  "errors": [
    "page_too_large:2592.0x1728.0",
    "too_many_vector_ops:33035",
    "raster_estimate_too_big:77760000"
  ]
}
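The error strings in this output appear to follow a `code:detail` pattern. As a minimal sketch, assuming that pattern holds for every error, they can be split into structured values for logging or routing. The helper `parse_error` is hypothetical and not part of PDF Sentinel.

```python
def parse_error(error: str) -> tuple[str, list[float]]:
    """Split 'code:detail' into the code and its numeric values.

    Hypothetical helper. Assumes the detail is either a single number
    or 'WIDTHxHEIGHT', as in the sample output.
    """
    code, _, detail = error.partition(":")
    values = [float(part) for part in detail.split("x")] if detail else []
    return code, values


errors = [
    "page_too_large:2592.0x1728.0",
    "too_many_vector_ops:33035",
    "raster_estimate_too_big:77760000",
]
parsed = dict(parse_error(e) for e in errors)
print(parsed["page_too_large"])  # [2592.0, 1728.0]
```

Keeping the code and the measured values separate makes it easier to decide, for example, whether a page should be downscaled (too large) or skipped entirely (too many vector operations).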
Full file analysis
print(sentinel.file_analysis("samples/test1.pdf", json_response=True))
Output:
{
  "file_name": "test1.pdf",
  "pages": 2,
  "is_file_safety": false,
  "results": [
    {
      "page": 1,
      "is_page_safety": true,
      "errors": [],
      "page_width": 612.0,
      "page_height": 792.0,
      "max_image_pixels": 0,
      "max_vectors_operations": 58,
      "max_raster_pixels": 8415000
    },
    {
      "page": 2,
      "is_page_safety": false,
      "errors": [
        "page_too_large:2592.0x1728.0",
        "too_many_vector_ops:33035",
        "raster_estimate_too_big:77760000"
      ],
      "page_width": 2592.0,
      "page_height": 1728.0,
      "max_image_pixels": 354652,
      "max_vectors_operations": 33035,
      "max_raster_pixels": 77760000
    }
  ]
}
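One way to use the detailed report is to route pages individually, sending only the safe ones to OCR. The sketch below assumes `json_response=True` yields a JSON string shaped like the output above. The `report_json` payload is a trimmed copy of that sample, and `split_pages` is a hypothetical helper rather than part of the library.

```python
import json

# Trimmed copy of the sample output, standing in for the string
# returned by sentinel.file_analysis(..., json_response=True).
report_json = """
{
  "file_name": "test1.pdf",
  "pages": 2,
  "results": [
    {"page": 1, "is_page_safety": true, "errors": []},
    {"page": 2, "is_page_safety": false,
     "errors": ["page_too_large:2592.0x1728.0"]}
  ]
}
"""


def split_pages(report: dict) -> tuple[list[int], dict[int, list[str]]]:
    """Return (safe page numbers, {unsafe page number: errors})."""
    safe, unsafe = [], {}
    for page in report["results"]:
        if page["is_page_safety"]:
            safe.append(page["page"])
        else:
            unsafe[page["page"]] = page["errors"]
    return safe, unsafe


safe_pages, unsafe_pages = split_pages(json.loads(report_json))
print(safe_pages)    # [1]
print(unsafe_pages)  # {2: ['page_too_large:2592.0x1728.0']}
```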
Single page detailed analysis
print(sentinel.page_analysis("samples/test.pdf", 3, json_response=True))
Configuration
You can override safety thresholds per call:
sentinel.is_file_safe(
    "samples/test.pdf",
    config={
        "max_page_size": 1800,
        "max_image_pixels": 10_000_000,
        "max_vectors_operations": 1000,
        "max_raster_pixels": 20_000_000
    }
)
| Parameter | Default | Description |
|---|---|---|
| max_page_size | 2000 | Max page dimension in points |
| max_image_pixels | 20,000,000 | Max pixel count of an embedded image (w × h) |
| max_vectors_operations | 1500 | Max allowed vector drawing operations |
| max_raster_pixels | 30,000,000 | Max estimated rasterized page size in pixels (at 300 dpi) |
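The sample outputs in the Usage section are consistent with a raster estimate that converts each page dimension from points (1/72 inch) to pixels at 300 dpi and multiplies the two. The sketch below reproduces that arithmetic. `estimate_raster_pixels` is a hypothetical helper, and the library's internal formula may differ.

```python
POINTS_PER_INCH = 72  # PDF user space: 1 point = 1/72 inch


def estimate_raster_pixels(width_pt: float, height_pt: float, dpi: int = 300) -> int:
    """Estimate the pixel count of a page rasterized at `dpi`."""
    width_px = width_pt / POINTS_PER_INCH * dpi
    height_px = height_pt / POINTS_PER_INCH * dpi
    return round(width_px * height_px)


print(estimate_raster_pixels(612.0, 792.0))    # 8415000
print(estimate_raster_pixels(2592.0, 1728.0))  # 77760000
```

A US Letter page (612 × 792 pt) comes out at 8,415,000 pixels, well under the 30,000,000 default, while the 2592 × 1728 pt page reaches 77,760,000 and would be flagged.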
License
MIT License © 2025 — Not Empty Foundation