PDFCanon Python SDK
Official Python SDK for the PDFCanon API — normalize, sanitize, and validate PDF files at scale. No third-party dependencies; requires Python 3.9+.
Requirements
- Python 3.9 or later
- No external dependencies (uses only the standard library)
Installation
pip install pdfcanon
Authentication
Obtain an API key from the PDFCanon portal. Pass it directly or set the PDFCANON_API_KEY environment variable:
export PDFCANON_API_KEY="pdfn_your_key_here"
The SDK sends the key as X-Api-Key: pdfn_… in every request. If you call the REST API directly, use the same header:
curl -H "X-Api-Key: pdfn_your_key_here" https://api.pdfcanon.com/api/submissions
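For direct REST calls from Python without the SDK, the same header can be attached using only the standard library. This is a minimal sketch: `resolve_api_key` and `build_request` are hypothetical helper names, not part of the SDK, and the sketch only builds the request object rather than sending it. The commented-out call assumes the endpoint accepts a GET, as the curl example suggests.

```python
import os
import urllib.request

API_URL = "https://api.pdfcanon.com/api/submissions"  # URL from the curl example


def resolve_api_key(explicit=None, environ=None):
    """Return the explicit key if given, else PDFCANON_API_KEY from the environment."""
    environ = os.environ if environ is None else environ
    key = explicit or environ.get("PDFCANON_API_KEY")
    if not key:
        raise ValueError("No API key: pass one or set PDFCANON_API_KEY")
    return key


def build_request(api_key, url=API_URL):
    """Build a request carrying the X-Api-Key header (nothing is sent here)."""
    return urllib.request.Request(url, headers={"X-Api-Key": api_key})


request = build_request(resolve_api_key("pdfn_your_key_here"))
# To actually send it: urllib.request.urlopen(request, timeout=10)
```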
Quickstart (Synchronous)
import pdfcanon

# API key is read from PDFCANON_API_KEY if not passed
client = pdfcanon.Client(api_key="pdfn_your_key_here")

with open("/path/to/document.pdf", "rb") as f:
    response = client.normalize(f, file_name="document.pdf")

print(f"Status: {response.status}")
print(f"Submission ID: {response.submission_id}")
Quickstart (Async)
import asyncio
import pdfcanon

async def main():
    client = pdfcanon.AsyncClient(api_key="pdfn_your_key_here")
    # The built-in open() is synchronous, so it needs a plain `with`, not `async with`
    with open("/path/to/document.pdf", "rb") as f:
        response = await client.normalize(f, file_name="document.pdf")
    print(f"Status: {response.status}")
    print(f"Submission ID: {response.submission_id}")

asyncio.run(main())
Async / Poll Flow
Large PDFs are processed asynchronously. Use wait_for_completion to poll until done:
import pdfcanon

client = pdfcanon.Client()  # reads PDFCANON_API_KEY from environment

with open("/path/to/document.pdf", "rb") as f:
    initial = client.normalize(f, file_name="document.pdf")

# Poll until processing completes (up to 120 seconds)
result = client.wait_for_completion(initial.submission_id, timeout=120.0)

if result.status == "SUCCESS":
    # Download the normalized PDF
    pdf_bytes = client.download_artifact(result.normalized.sha256)
    with open("/path/to/normalized.pdf", "wb") as out:
        out.write(pdf_bytes)
    print(f"Original size: {result.original.size_bytes:,} bytes")
    print(f"Normalized size: {result.normalized.size_bytes:,} bytes")
    print(f"JavaScript removed: {result.security.javascript_removed}")
else:
    print(f"Failed: [{result.failure.code}] {result.failure.message}")
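How `wait_for_completion` polls internally is not described here. As a rough illustration of the general pattern, the sketch below repeats a status check until the status is no longer `PENDING` or `IN_PROGRESS`, or until a deadline passes. `poll_until_done` and its parameters are hypothetical, and the fixed polling interval is an assumption.

```python
import time

# Statuses this README shows for work that is still in flight.
PENDING_STATUSES = {"PENDING", "IN_PROGRESS"}


def poll_until_done(fetch_status, timeout=120.0, interval=2.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Call fetch_status() until it returns a non-pending status or time runs out."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status not in PENDING_STATUSES:
            return status
        # Give up if the next wait would run past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"still {status} after {timeout} seconds")
        sleep(interval)


# Usage with a fake status source standing in for the API.
statuses = iter(["PENDING", "IN_PROGRESS", "SUCCESS"])
final = poll_until_done(lambda: next(statuses), timeout=5.0, interval=0.0)
print(final)
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting.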
Webhook Flow
For production use, register a webhook endpoint instead of polling:
import pdfcanon

client = pdfcanon.Client()  # reads PDFCANON_API_KEY from environment

with open("/path/to/document.pdf", "rb") as f:
    response = client.normalize(
        f,
        file_name="document.pdf",
        webhook_url="https://your-app.example.com/webhooks/pdfcanon",
        remove_annotations=True,
        idempotency_key="unique-key-per-document",
    )

# Returns a response with status PENDING or IN_PROGRESS;
# webhook fires when processing completes.
print(f"Queued with submission ID: {response.submission_id}")
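The example uses a placeholder for `idempotency_key`. One possible way to get a key that is unique per document yet stable across retries is to derive it from the file contents, as sketched below. `idempotency_key_for` is a hypothetical helper, and how the server treats repeated keys is not described here, so treat this as an assumption rather than a documented requirement.

```python
import hashlib


def idempotency_key_for(pdf_bytes, prefix="doc"):
    """Derive a stable key from file contents so a retried upload reuses the same key."""
    digest = hashlib.sha256(pdf_bytes).hexdigest()
    return f"{prefix}-{digest[:32]}"


key_a = idempotency_key_for(b"%PDF-1.7 example bytes")
key_b = idempotency_key_for(b"%PDF-1.7 example bytes")
print(key_a == key_b)  # identical content yields an identical key
```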
Webhook Signature Verification
Verify incoming webhook signatures in your web framework:
# Flask example
from flask import Flask, request, abort
from pdfcanon.webhooks import verify_signature, InvalidSignatureError
import os

app = Flask(__name__)
WEBHOOK_SECRET = os.environ["PDFCANON_WEBHOOK_SECRET"]

@app.post("/webhooks/pdfcanon")
def handle_pdfcanon_webhook():
    raw_body = request.get_data(as_text=True)
    signature = request.headers.get("X-PDFCanon-Signature", "")
    try:
        verify_signature(raw_body, signature, WEBHOOK_SECRET)
    except InvalidSignatureError:
        abort(401, "Invalid webhook signature")

    event = request.get_json(force=True)
    event_type = event.get("event_type")
    if event_type == "pdf.normalized":
        sha256 = event["normalized_sha256"]
        print(f"PDF ready: {sha256}")
        # Download and store the normalized PDF...
    elif event_type == "pdf.failed":
        print(f"PDF failed: {event['failure']['message']}")
    return {"ok": True}
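The signing scheme behind `verify_signature` is not specified here, so prefer the SDK helper in real code. For readers unfamiliar with webhook signing in general, the sketch below shows a common pattern and nothing more: it assumes a hex-encoded HMAC-SHA256 of the raw request body, which may differ from what PDFCanon actually uses. `sign` and `is_valid` are hypothetical names.

```python
import hashlib
import hmac


def sign(body: bytes, secret: str) -> str:
    """Hex-encoded HMAC-SHA256 of the raw body (an assumed scheme, not confirmed)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def is_valid(body: bytes, signature: str, secret: str) -> bool:
    """Compare in constant time so timing does not reveal how much matched."""
    expected = sign(body, secret).encode("ascii")
    return hmac.compare_digest(expected, signature.encode("utf-8"))


body = b'{"event_type": "pdf.normalized"}'
good = sign(body, "example-secret")
print(is_valid(body, good, "example-secret"))
print(is_valid(body + b" ", good, "example-secret"))
```

Whatever the scheme, the check has to run against the exact bytes received, before any JSON parsing or re-serialization.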
Configuration
import pdfcanon

client = pdfcanon.Client(
    api_key="pdfn_your_key_here",
    base_url="https://api.pdfcanon.com/api",  # Default
    connect_timeout=5.0,                      # Seconds; default: 5
    read_timeout=120.0,                       # Seconds; default: 120
    max_retries=3,                            # Default: 3
)
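In deployments it can be convenient to keep these options out of the code. The sketch below collects keyword arguments from environment variables and leaves unset options at the SDK defaults. Only `PDFCANON_API_KEY` is documented in this README; the other variable names and the `client_kwargs_from_env` helper are hypothetical.

```python
import os

# Hypothetical variable names: this README only documents PDFCANON_API_KEY.
ENV_OPTIONS = {
    "PDFCANON_BASE_URL": ("base_url", str),
    "PDFCANON_CONNECT_TIMEOUT": ("connect_timeout", float),
    "PDFCANON_READ_TIMEOUT": ("read_timeout", float),
    "PDFCANON_MAX_RETRIES": ("max_retries", int),
}


def client_kwargs_from_env(environ=None):
    """Collect only the options that are set, so unset ones keep SDK defaults."""
    environ = os.environ if environ is None else environ
    kwargs = {}
    for variable, (name, convert) in ENV_OPTIONS.items():
        raw = environ.get(variable)
        if raw is not None and raw != "":
            kwargs[name] = convert(raw)
    return kwargs


kwargs = client_kwargs_from_env(
    {"PDFCANON_READ_TIMEOUT": "300", "PDFCANON_MAX_RETRIES": "5"}
)
print(kwargs)
# With the SDK installed: client = pdfcanon.Client(**kwargs)
```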
Error Handling
import pdfcanon
from pdfcanon import (
    AuthenticationError,
    PolicyRejectionError,
    RateLimitError,
    ToolchainError,
    NetworkError,
    PDFCanonError,
)

client = pdfcanon.Client()  # reads PDFCANON_API_KEY from environment

try:
    with open("/path/to/document.pdf", "rb") as f:
        result = client.normalize(f)
except AuthenticationError:
    print("Invalid API key or expired token")
except PolicyRejectionError as e:
    # 422: the PDF violates intake policy (encrypted, too large, etc.)
    print(f"PDF rejected: {e}")
except RateLimitError as e:
    # 429: monthly quota or rate limit exceeded
    print(f"Rate limited. Retry after {e.retry_after} seconds")
except ToolchainError as e:
    # 5xx: server-side processing failure
    print(f"Server error: {e}")
except NetworkError as e:
    # Timeout, DNS failure, etc.
    print(f"Network error: {e}")
except PDFCanonError as e:
    # Base class: catches any remaining SDK error
    print(f"Unexpected SDK error: {e}")
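Since `RateLimitError` exposes `retry_after`, a caller can wait and try again instead of only printing. The sketch below shows one way to do that. `call_with_retry` is a hypothetical helper, and `FakeRateLimitError` is a stand-in so the example runs without the SDK. Whether the client's own `max_retries` already covers 429 responses is not stated here, so this may be redundant in practice.

```python
import time


class FakeRateLimitError(Exception):
    """Stand-in for pdfcanon.RateLimitError so the sketch runs without the SDK."""

    def __init__(self, retry_after):
        super().__init__("rate limited")
        self.retry_after = retry_after


def call_with_retry(operation, retry_on, max_attempts=3, default_wait=1.0,
                    sleep=time.sleep):
    """Run operation(); on retry_on, wait retry_after seconds and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            wait = getattr(exc, "retry_after", None)
            sleep(default_wait if wait is None else wait)


# Demo: fail twice, then succeed; record the waits instead of sleeping.
attempts = {"count": 0}
waits = []


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeRateLimitError(retry_after=2)
    return "ok"


result = call_with_retry(flaky, FakeRateLimitError, sleep=waits.append)
print(result, waits)
```

With the SDK, the call would look like `call_with_retry(lambda: client.normalize(f), RateLimitError)`, keeping in mind that the file object must be rewound between attempts.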
Error Reference
| Exception | HTTP Status | When |
|---|---|---|
| AuthenticationError | 401 | Invalid or missing API key |
| PolicyRejectionError | 422 | PDF rejected by intake policy |
| RateLimitError | 429 | Monthly quota or rate limit exceeded |
| ToolchainError | 5xx | Server-side processing failure |
| NetworkError | n/a | Network / timeout error |
| PDFCanonError | n/a | Base class for all SDK errors |
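When calling the REST API directly rather than through the SDK, the table can be read as a status-to-exception mapping. The function below is a small sketch of that reading. `exception_name_for_status` is a hypothetical helper, and falling back to `PDFCanonError` for unlisted status codes is an assumption, since the table does not say what other codes map to.

```python
def exception_name_for_status(status):
    """Return the exception name the table associates with an HTTP status code."""
    if status == 401:
        return "AuthenticationError"
    if status == 422:
        return "PolicyRejectionError"
    if status == 429:
        return "RateLimitError"
    if 500 <= status <= 599:
        return "ToolchainError"
    # No dedicated class is listed for other codes; the base class is assumed.
    return "PDFCanonError"


for code in (401, 422, 429, 503):
    print(code, exception_name_for_status(code))
```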
Models Reference
| Model | Key Fields |
|---|---|
| NormalizeResponse | status, submission_id, original, normalized, security, validation, warnings, failure |
| OriginalInfo | sha256, size_bytes |
| NormalizedInfo | sha256, size_bytes, pdf_version, linearized, download_url |
| SecurityInfo | javascript_removed, open_actions_removed, embedded_files_removed, ... |
| ValidationInfo | xref_rebuilt, object_streams_regenerated, pdfa_compliant, ... |
| WarningInfo | code, message |
| FailureInfo | code, message, stage |
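The `sha256` fields make it possible to check a downloaded artifact before storing it. The sketch below assumes the field holds the hex-encoded SHA-256 digest of the file bytes, which the field name suggests but this README does not state outright. `matches_sha256` is a hypothetical helper.

```python
import hashlib


def matches_sha256(data: bytes, expected_hex: str) -> bool:
    """True if data hashes to expected_hex (case-insensitive hex comparison)."""
    return hashlib.sha256(data).hexdigest() == expected_hex.strip().lower()


payload = b"example artifact bytes"
expected = hashlib.sha256(payload).hexdigest()
print(matches_sha256(payload, expected))
print(matches_sha256(payload + b"!", expected))
```

With the SDK, the check would be `matches_sha256(pdf_bytes, result.normalized.sha256)` right after `download_artifact`.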