
Official Python SDK for the PDFCanon API


PDFCanon Python SDK

Official Python SDK for the PDFCanon API — normalize, sanitize, and validate PDF files at scale. No third-party dependencies; requires Python 3.9+.

Requirements

  • Python 3.9 or later
  • No external dependencies (uses only the standard library)

Installation

pip install pdfcanon

Authentication

Obtain an API key from the PDFCanon portal. Pass it to the client as api_key, or set the PDFCANON_API_KEY environment variable, which the client reads when no key is passed:

export PDFCANON_API_KEY="pdfn_your_key_here"

The SDK sends the key in an X-Api-Key header (X-Api-Key: pdfn_…) on every request. If you call the REST API directly, send the same header:

curl -H "X-Api-Key: pdfn_your_key_here" https://api.pdfcanon.com/api/submissions
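If you call the REST API without the SDK, the standard library is enough to attach the header. The following is a minimal sketch, not part of the SDK: build_submissions_request is a hypothetical helper that falls back to PDFCANON_API_KEY the way the client is described as doing, and it only builds the GET request from the curl example without sending it.

```python
import os
import urllib.request
from typing import Optional

SUBMISSIONS_URL = "https://api.pdfcanon.com/api/submissions"


def build_submissions_request(api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a GET request that carries the X-Api-Key header.

    Hypothetical helper: uses PDFCANON_API_KEY when no key is passed.
    """
    key = api_key or os.environ.get("PDFCANON_API_KEY")
    if not key:
        raise ValueError("Pass api_key or set PDFCANON_API_KEY")
    return urllib.request.Request(
        SUBMISSIONS_URL,
        headers={"X-Api-Key": key},
        method="GET",
    )


# To send it: urllib.request.urlopen(build_submissions_request(), timeout=30)
```

urllib stores header names in capitalized form (X-api-key), so look it up that way with req.get_header; HTTP header names are case-insensitive, so the request is equivalent to the curl example.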

Quickstart (Synchronous)

import pdfcanon

# API key is read from PDFCANON_API_KEY if not passed
client = pdfcanon.Client(api_key="pdfn_your_key_here")

with open("/path/to/document.pdf", "rb") as f:
    response = client.normalize(f, file_name="document.pdf")

print(f"Status: {response.status}")
print(f"Submission ID: {response.submission_id}")

Quickstart (Async)

import asyncio
import pdfcanon

async def main():
    client = pdfcanon.AsyncClient(api_key="pdfn_your_key_here")

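    # Built-in open() is synchronous and has no async context manager,
    # so use a plain `with`; only the API call itself is awaited.
    with open("/path/to/document.pdf", "rb") as f:
        response = await client.normalize(f, file_name="document.pdf")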

    print(f"Status: {response.status}")
    print(f"Submission ID: {response.submission_id}")

asyncio.run(main())
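To normalize many files with AsyncClient, a common pattern is to bound concurrency with a semaphore so you do not open too many connections or run into rate limits. The sketch below assumes nothing about the SDK beyond client.normalize being awaitable, as in the quickstart; normalize_all and normalize_one are hypothetical names, and limit=4 is an arbitrary default, not a documented PDFCanon limit.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def normalize_all(
    paths: Iterable[str],
    normalize_one: Callable[[str], Awaitable[T]],
    limit: int = 4,
) -> List[T]:
    """Await normalize_one(path) for every path, at most `limit` at a time.

    Results are returned in the same order as `paths`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(path: str) -> T:
        # At most `limit` coroutines hold the semaphore at any moment.
        async with semaphore:
            return await normalize_one(path)

    return await asyncio.gather(*(guarded(p) for p in paths))


# Hypothetical wiring with the AsyncClient from the quickstart:
# async def normalize_one(path):
#     with open(path, "rb") as f:
#         return await client.normalize(f, file_name=os.path.basename(path))
# results = await normalize_all(paths, normalize_one)
```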

Async / Poll Flow

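Large PDFs are processed asynchronously, so the initial normalize call may return before processing finishes. Use wait_for_completion to poll until the submission reaches a final status: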

import pdfcanon

client = pdfcanon.Client()  # reads PDFCANON_API_KEY from environment

with open("/path/to/document.pdf", "rb") as f:
    initial = client.normalize(f, file_name="document.pdf")

# Poll until processing completes (up to 120 seconds)
result = client.wait_for_completion(initial.submission_id, timeout=120.0)

if result.status == "SUCCESS":
    # Download the normalized PDF
    pdf_bytes = client.download_artifact(result.normalized.sha256)
    with open("/path/to/normalized.pdf", "wb") as out:
        out.write(pdf_bytes)

    print(f"Original size:   {result.original.size_bytes:,} bytes")
    print(f"Normalized size: {result.normalized.size_bytes:,} bytes")
    print(f"JavaScript removed: {result.security.javascript_removed}")
else:
    print(f"Failed: [{result.failure.code}] {result.failure.message}")
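This page does not describe how wait_for_completion works internally. If you poll the REST API yourself, a loop of the following shape is typical. It is a sketch under assumptions: poll_until_done and fetch_status are hypothetical (fetch_status stands for whatever call returns the submission's current status string), and PENDING and IN_PROGRESS are treated as the only non-final statuses because those are the ones named in the webhook section of this page.

```python
import time
from typing import Callable

# Statuses treated as "still running"; anything else is considered final.
PENDING_STATUSES = {"PENDING", "IN_PROGRESS"}


def poll_until_done(
    fetch_status: Callable[[], str],
    timeout: float = 120.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status until it returns a final status or `timeout` elapses.

    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status not in PENDING_STATUSES:
            return status
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"still {status} after {timeout} seconds")
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```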

Webhook Flow

For production use, pass a webhook_url so PDFCanon notifies your endpoint when processing finishes, instead of polling:

import pdfcanon

client = pdfcanon.Client()  # reads PDFCANON_API_KEY from environment

with open("/path/to/document.pdf", "rb") as f:
    response = client.normalize(
        f,
        file_name="document.pdf",
        webhook_url="https://your-app.example.com/webhooks/pdfcanon",
        remove_annotations=True,
        idempotency_key="unique-key-per-document",
    )
# Returns a response with status PENDING or IN_PROGRESS;
# webhook fires when processing completes.
print(f"Queued with submission ID: {response.submission_id}")
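The idempotency key only needs to be unique per document. One option, sketched here, is to derive it from the file's SHA-256 so that resubmitting the same bytes reuses the same key. idempotency_key_for is a hypothetical helper, and it assumes the service treats repeated requests with the same key as duplicates; if you send the same file with different options (for example remove_annotations), fold those options into the key as well.

```python
import hashlib
from typing import BinaryIO


def idempotency_key_for(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of the stream's remaining bytes as an idempotency key.

    The stream is rewound to where it started, so the same file handle can be
    passed to client.normalize afterwards.
    """
    start = stream.tell()
    digest = hashlib.sha256()
    # Read in chunks so large PDFs are not loaded into memory all at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    stream.seek(start)
    return digest.hexdigest()


# Hypothetical use with the call above:
# with open("/path/to/document.pdf", "rb") as f:
#     response = client.normalize(f, file_name="document.pdf",
#                                 idempotency_key=idempotency_key_for(f))
```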

Webhook Signature Verification

Verify the signature of every incoming webhook against the raw request body before trusting its contents. For example, with Flask:

# Flask example
import os

from flask import Flask, abort, request
from pdfcanon.webhooks import InvalidSignatureError, verify_signature

app = Flask(__name__)
WEBHOOK_SECRET = os.environ["PDFCANON_WEBHOOK_SECRET"]

@app.post("/webhooks/pdfcanon")
def handle_pdfcanon_webhook():
    raw_body = request.get_data(as_text=True)
    signature = request.headers.get("X-PDFCanon-Signature", "")

    try:
        verify_signature(raw_body, signature, WEBHOOK_SECRET)
    except InvalidSignatureError:
        abort(401, "Invalid webhook signature")

    event = request.get_json(force=True)
    event_type = event.get("event_type")

    if event_type == "pdf.normalized":
        sha256 = event["normalized_sha256"]
        print(f"PDF ready: {sha256}")
        # Download and store the normalized PDF...
    elif event_type == "pdf.failed":
        print(f"PDF failed: {event['failure']['message']}")

    return {"ok": True}
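verify_signature does this work for you, and its exact signature scheme is not documented on this page. For background, webhook signatures are commonly an HMAC-SHA256 of the raw request body, checked with a constant-time comparison. The sketch below shows that general technique; verify_hmac_sha256 is a hypothetical helper and the hex encoding is an assumption, so use the SDK's verify_signature for real PDFCanon webhooks.

```python
import hashlib
import hmac


def verify_hmac_sha256(raw_body: bytes, signature: str, secret: str) -> bool:
    """Check `signature` against a hex HMAC-SHA256 of raw_body keyed by secret.

    ASSUMPTION: hex-encoded HMAC-SHA256 is used for illustration only;
    PDFCanon's real scheme is whatever verify_signature implements.
    """
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    provided = signature.strip().lower()
    # compare_digest is designed not to leak, through timing, where the two
    # values first differ, which a plain == comparison could.
    return hmac.compare_digest(expected.encode("ascii"), provided.encode("utf-8"))
```

Hash the exact bytes received (in Flask, request.get_data()) before parsing any JSON; re-serializing a parsed payload can change whitespace or key order and break the check.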

Configuration

import pdfcanon

client = pdfcanon.Client(
    api_key="pdfn_your_key_here",
    base_url="https://api.pdfcanon.com/api",  # Default
    connect_timeout=5.0,                      # Seconds; default: 5
    read_timeout=120.0,                       # Seconds; default: 120
    max_retries=3,                            # Default: 3
)
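The retry policy behind max_retries is not documented here. A common approach for transient failures is exponential backoff with jitter, sketched below as a hypothetical call_with_retries helper. In this sketch max_retries counts retries, so 3 means up to four attempts; the delays, cap, and retried exception types are illustrative choices, not SDK defaults.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying up to max_retries times when it raises one of retry_on.

    The delay cap doubles each retry (0.5 s, 1 s, 2 s, ... up to max_delay);
    each actual wait is drawn uniformly from [0, cap] ("full jitter") so that
    many clients retrying at once do not all hit the server together.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on:
            if attempt >= max_retries:
                raise  # out of retries: surface the last error
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
            attempt += 1
```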

Error Handling

import pdfcanon
from pdfcanon import (
    AuthenticationError,
    PolicyRejectionError,
    RateLimitError,
    ToolchainError,
    NetworkError,
    PDFCanonError,
)

client = pdfcanon.Client()  # reads PDFCANON_API_KEY from environment

try:
    with open("/path/to/document.pdf", "rb") as f:
        result = client.normalize(f)
except AuthenticationError:
    print("Invalid or missing API key")
except PolicyRejectionError as e:
    # 422: the PDF violates intake policy (encrypted, too large, etc.)
    print(f"PDF rejected: {e}")
except RateLimitError as e:
    # 429: monthly quota or rate limit exceeded
    print(f"Rate limited. Retry after {e.retry_after} seconds")
except ToolchainError as e:
    # 5xx: server-side processing failure
    print(f"Server error: {e}")
except NetworkError as e:
    # Timeout, DNS failure, etc.
    print(f"Network error: {e}")
except PDFCanonError as e:
    # Base class — catch all SDK errors
    print(f"Unexpected SDK error: {e}")

Error Reference

| Exception | HTTP status | When |
|---|---|---|
| AuthenticationError | 401 | Invalid or missing API key |
| PolicyRejectionError | 422 | PDF rejected by intake policy |
| RateLimitError | 429 | Monthly quota or rate limit exceeded |
| ToolchainError | 5xx | Server-side processing failure |
| NetworkError | N/A | Network failure or timeout |
| PDFCanonError | Any | Base class for all SDK errors |
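The SDK raises these exceptions for you. If you call the REST API directly, you can mirror the table with a small mapping. In the sketch below, exception_name_for_status and parse_retry_after are hypothetical helpers; treating other 4xx codes as the base PDFCanonError is an assumption, and parse_retry_after only helps if the API sends a standard Retry-After header (delay-seconds or HTTP-date), which this page does not state.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def exception_name_for_status(status: int) -> Optional[str]:
    """Return the SDK exception name that matches an HTTP status code."""
    if status == 401:
        return "AuthenticationError"
    if status == 422:
        return "PolicyRejectionError"
    if status == 429:
        return "RateLimitError"
    if 500 <= status <= 599:
        return "ToolchainError"
    if 400 <= status <= 499:
        # ASSUMPTION: other client errors fall back to the base class.
        return "PDFCanonError"
    return None  # not an error status


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Convert a Retry-After value (delay-seconds or HTTP-date) to seconds to wait."""
    if not value:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)  # delay-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        # Depending on the Python version, unparseable dates raise either error.
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```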

Models Reference

| Model | Key fields |
|---|---|
| NormalizeResponse | status, submission_id, original, normalized, security, validation, warnings, failure |
| OriginalInfo | sha256, size_bytes |
| NormalizedInfo | sha256, size_bytes, pdf_version, linearized, download_url |
| SecurityInfo | javascript_removed, open_actions_removed, embedded_files_removed, ... |
| ValidationInfo | xref_rebuilt, object_streams_regenerated, pdfa_compliant, ... |
| WarningInfo | code, message |
| FailureInfo | code, message, stage |
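download_artifact is keyed by the normalized file's sha256, which suggests the value is the hex SHA-256 digest of the normalized PDF. Assuming that holds (the page does not say so explicitly), you can check a download's integrity before storing it. matches_sha256 is a hypothetical helper.

```python
import hashlib


def matches_sha256(data: bytes, expected_hex: str) -> bool:
    """Return True if data hashes to expected_hex, a hex-encoded SHA-256 digest.

    ASSUMPTION: NormalizedInfo.sha256 is the hex SHA-256 of the normalized PDF.
    """
    return hashlib.sha256(data).hexdigest() == expected_hex.strip().lower()


# Hypothetical wiring with the objects from the poll example:
# pdf_bytes = client.download_artifact(result.normalized.sha256)
# if not matches_sha256(pdf_bytes, result.normalized.sha256):
#     raise RuntimeError("Downloaded artifact failed its SHA-256 check")
```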




Download files


Source Distribution

pdfcanon-1.0.1.tar.gz (18.8 kB)

Built Distribution


pdfcanon-1.0.1-py3-none-any.whl (18.3 kB)

File details

Details for the file pdfcanon-1.0.1.tar.gz.

File metadata

  • Download URL: pdfcanon-1.0.1.tar.gz
  • Upload date:
  • Size: 18.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for pdfcanon-1.0.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 14252b2e8c7f560690fbfc18389429d8f8de30dc74ac5b401d82df5182629344 |
| MD5 | 6695f1725b4be609177641a890108fd8 |
| BLAKE2b-256 | 488316a1c4a2afd82ab859a95790e89692b98030d4a70b5d6262468316ea00cf |


File details

Details for the file pdfcanon-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: pdfcanon-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 18.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for pdfcanon-1.0.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3c7f622b434475b9bd05387e8b17502643f09b1ba8fbaa3554394ff1e2bb6f9b |
| MD5 | 3358e6ded7cbb068cd2800cee34b08a9 |
| BLAKE2b-256 | acab411c98dab329f4943c0454f0af0e6697cd088b568e20585ca542569b2c61 |

