Extend Python Library


The Extend Python library provides convenient, typed access to the Extend API — enabling you to parse, extract, classify, split, and edit documents with a few lines of code.

Installation

pip install extend-ai

Requires Python 3.8+

Quick start

Parse any document in a few lines:

from extend_ai import Extend

client = Extend(token="YOUR_API_KEY")

result = client.parse(file={"url": "https://example.com/invoice.pdf"})

for chunk in result.output.chunks:
    print(chunk.content)

client.parse is synchronous — it sends the file, waits for processing, and returns a fully populated ParseRun with parsed chunks ready to use. The same pattern works for every capability:

# Extract structured data
extract_run = client.extract(
    file={"url": "https://example.com/invoice.pdf"},
    extractor={"id": "ex_abc123"},
)

# Classify a document
classify_run = client.classify(
    file={"url": "https://example.com/document.pdf"},
    classifier={"id": "cls_abc123"},
)

# Split a multi-document file
split_run = client.split(
    file={"url": "https://example.com/packet.pdf"},
    splitter={"id": "spl_abc123"},
)

# Edit a PDF with instructions
edit_run = client.edit(
    file={"url": "https://example.com/form.pdf"},
    config={"instructions": "Fill out the applicant name as Jane Doe"},
)

Note: The synchronous methods above have a 5-minute timeout and are best suited for onboarding and testing. For production workloads, use polling helpers or webhooks instead.

Polling helpers

Every run resource exposes a create_and_poll() method that creates the run and automatically polls until it reaches a terminal state (PROCESSED, FAILED, or CANCELLED):

from extend_ai import Extend

client = Extend(token="YOUR_API_KEY")

result = client.extract_runs.create_and_poll(
    file={"url": "https://example.com/invoice.pdf"},
    extractor={"id": "ex_abc123"},
)

if result.status == "PROCESSED":
    print(result.output)
else:
    print(f"Failed: {result.failure_message}")

This works across all run types:

parse_run     = client.parse_runs.create_and_poll(file={"url": "..."})
extract_run   = client.extract_runs.create_and_poll(file={"url": "..."}, extractor={"id": "..."})
classify_run  = client.classify_runs.create_and_poll(file={"url": "..."}, classifier={"id": "..."})
split_run     = client.split_runs.create_and_poll(file={"url": "..."}, splitter={"id": "..."})
workflow_run  = client.workflow_runs.create_and_poll(file={"url": "..."}, workflow={"id": "..."})
edit_run      = client.edit_runs.create_and_poll(file={"url": "..."})

Custom polling options

from extend_ai import Extend, PollingOptions

result = client.extract_runs.create_and_poll(
    file={"url": "https://example.com/invoice.pdf"},
    extractor={"id": "ex_abc123"},
    polling_options=PollingOptions(
        max_wait_ms=300_000,       # 5 minute timeout (default: no timeout)
        initial_delay_ms=1_000,    # start with 1s delay (default)
        max_delay_ms=60_000,       # cap at 60s delay (default: 30s)
    ),
)
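With these options, the delay between polls grows from initial_delay_ms up to the max_delay_ms cap until max_wait_ms elapses. The sketch below is my own illustration of such a schedule, assuming exponential doubling (a common choice), not the SDK's actual internals:

```python
def backoff_schedule(initial_delay_ms=1_000, max_delay_ms=30_000,
                     max_wait_ms=300_000, factor=2.0):
    """Yield poll delays: exponential growth capped at max_delay_ms,
    stopping once the cumulative wait would exceed max_wait_ms."""
    delay, elapsed = initial_delay_ms, 0
    while elapsed + delay <= max_wait_ms:
        yield delay
        elapsed += delay
        delay = min(int(delay * factor), max_delay_ms)

delays = list(backoff_schedule(max_wait_ms=10_000))
# delays of 1s, 2s, 4s fit within a 10s budget
```

Front-loading short delays keeps fast runs responsive, while the cap prevents hammering the API on long-running documents.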

Running workflows

Workflows chain multiple processing steps (extraction, classification, splitting, etc.) into a single pipeline. Run a workflow by passing a workflow ID and a file:

result = client.workflow_runs.create_and_poll(
    file={"url": "https://example.com/invoice.pdf"},
    workflow={"id": "workflow_abc123"},
)

print(result.status)  # "PROCESSED"

for step_run in result.step_runs or []:
    print(step_run.step.type)   # "EXTRACT", "CLASSIFY", etc.
    print(step_run.result)

Webhook verification

Verify and parse incoming webhook events using the built-in utilities. Known event types are returned as typed Pydantic models; unknown or future event types fall back to a plain dict so your handler keeps working without SDK updates.

from extend_ai import Extend

client = Extend(token="YOUR_API_KEY")

def handle_webhook(request):
    event = client.webhooks.verify_and_parse(
        body=request.body.decode(),
        headers=dict(request.headers),
        signing_secret="wss_your_signing_secret",
    )

    # Works for both typed model and dict fallback
    event_type = getattr(event, "event_type", None) or event.get("eventType")
    payload = getattr(event, "payload", None) or event.get("payload")

    match event_type:
        case "extract_run.processed":
            run_id = getattr(payload, "id", None) or payload.get("id")
            print(f"Extraction complete: {run_id}")
        case "workflow_run.completed":
            run_id = getattr(payload, "id", None) or payload.get("id")
            print(f"Workflow complete: {run_id}")
        case _:
            print(f"Received event: {event_type}")

Manual verification & parsing

# Verify signature without parsing
is_valid = client.webhooks.verify(body, headers, signing_secret)

# Parse without verification (not recommended for production)
event = client.webhooks.parse(body)
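Webhook verification schemes like this are typically an HMAC-SHA256 of the raw request body keyed by the signing secret, compared in constant time against a signature header. The SDK's verify handles Extend's exact header names and encoding; the sketch below only illustrates the general pattern, and the hex encoding is an assumption:

```python
import hashlib
import hmac

def verify_signature(body: str, signature: str, signing_secret: str) -> bool:
    """Illustrative HMAC-SHA256 check; the real header name and
    encoding are defined by Extend, not by this sketch."""
    expected = hmac.new(
        signing_secret.encode(), body.encode(), hashlib.sha256
    ).hexdigest()
    # compare_digest avoids leaking information via timing side channels
    return hmac.compare_digest(expected, signature)

secret = "wss_example"
body = '{"eventType": "extract_run.processed"}'
sig = hmac.new(secret.encode(), body.encode(), hashlib.sha256).hexdigest()
assert verify_signature(body, sig, secret)
```

Always verify against the raw, undecoded body bytes your framework received; re-serializing parsed JSON can change whitespace and break the signature.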

Signed URL payloads

For large payloads, Extend may send a signed URL instead of the full payload. Use allow_signed_url=True, then check and fetch when needed:

event = client.webhooks.verify_and_parse(
    body=body,
    headers=headers,
    signing_secret=signing_secret,
    allow_signed_url=True,
)

if client.webhooks.is_signed_url_event(event):
    full_event = client.webhooks.fetch_signed_payload_sync(event)
    # full_event is typed or dict; use getattr(..., None) or .get() as in the example above
else:
    # Normal inline payload — handle event directly
    ...

Async support

Every method has an async counterpart via AsyncExtend:

import asyncio
from extend_ai import AsyncExtend

client = AsyncExtend(token="YOUR_API_KEY")

async def main():
    result = await client.parse(file={"url": "https://example.com/invoice.pdf"})

    for chunk in result.output.chunks:
        print(chunk.content)

asyncio.run(main())

Async polling works the same way:

result = await client.extract_runs.create_and_poll(
    file={"url": "https://example.com/invoice.pdf"},
    extractor={"id": "ex_abc123"},
)

Exception handling

The SDK raises typed exceptions for API errors:

from extend_ai.core.api_error import ApiError

try:
    result = client.parse(file={"url": "https://example.com/invoice.pdf"})
except ApiError as e:
    print(e.status_code)  # 400, 401, 404, 429, etc.
    print(e.body)

Specific error classes are available for fine-grained handling:

from extend_ai.errors import (
    BadRequestError,         # 400
    UnauthorizedError,       # 401
    PaymentRequiredError,    # 402
    ForbiddenError,          # 403
    NotFoundError,           # 404
    UnprocessableEntityError, # 422
    TooManyRequestsError,    # 429
    InternalServerError,     # 500
)

Polling timeout

When create_and_poll() exceeds its timeout, a PollingTimeoutError is raised:

from extend_ai import PollingOptions, PollingTimeoutError

try:
    result = client.extract_runs.create_and_poll(
        file={"url": "..."},
        extractor={"id": "..."},
        polling_options=PollingOptions(max_wait_ms=60_000),
    )
except PollingTimeoutError as e:
    print(f"Timed out after {e.elapsed_ms}ms (limit: {e.max_wait_ms}ms)")

Pagination

List endpoints return paginated results using next_page_token:

# First page
response = client.extract_runs.list(max_page_size=10)

for run in response.data:
    print(f"{run.id}: {run.status}")

# Next page
if response.next_page_token:
    next_page = client.extract_runs.list(
        max_page_size=10,
        next_page_token=response.next_page_token,
    )
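The token-based scheme above extends naturally to a loop that walks every page. A hedged sketch, where list_page stands in for a call like client.extract_runs.list and the dict shape is assumed for illustration:

```python
def iter_all(list_page, max_page_size=10):
    """Yield items from every page, following next_page_token until exhausted."""
    token = None
    while True:
        page = list_page(max_page_size=max_page_size, next_page_token=token)
        yield from page["data"]
        token = page.get("next_page_token")
        if not token:
            break

# Stub pages standing in for the API, purely for illustration:
pages = {
    None: {"data": [1, 2], "next_page_token": "t1"},
    "t1": {"data": [3], "next_page_token": None},
}
items = list(iter_all(lambda max_page_size, next_page_token: pages[next_page_token]))
# items == [1, 2, 3]
```

Making this a generator means callers can stop early without fetching pages they never consume.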

Environments

The SDK defaults to the US production environment. Other regions are available:

from extend_ai import Extend, ExtendEnvironment

# US (default)
client = Extend(token="YOUR_API_KEY")

# US2 (HIPAA)
client = Extend(token="YOUR_API_KEY", environment=ExtendEnvironment.PRODUCTION_US2)

# EU
client = Extend(token="YOUR_API_KEY", environment=ExtendEnvironment.PRODUCTION_EU1)

# Custom base URL
client = Extend(token="YOUR_API_KEY", base_url="https://custom-api.example.com")

Advanced

Retries

The SDK automatically retries failed requests with exponential backoff. Retries are triggered for:

  • 408 Timeout
  • 429 Too Many Requests
  • 5xx Server Errors

# Override retries for a single request
client.extract_runs.create(..., request_options={"max_retries": 0})
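The policy above can be approximated as: retry on 408, 429, and 5xx with exponentially growing sleeps, and return anything else immediately. A minimal sketch under those assumptions, not the SDK's actual implementation:

```python
import time

RETRYABLE = {408, 429} | set(range(500, 600))

def with_retries(send, max_retries=2, base_delay=0.5):
    """Call send() until it returns a non-retryable status or retries run out."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status not in RETRYABLE or attempt == max_retries:
            return status, body
        time.sleep(base_delay * (2 ** attempt))  # exponential backoff

# Simulated endpoint: fails once with 429, then succeeds
responses = iter([(429, None), (200, "ok")])
status, body = with_retries(lambda: next(responses), base_delay=0)
# status == 200, body == "ok"
```

Note that 4xx errors other than 408/429 are returned immediately, since repeating a bad request will not make it valid.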

Timeouts

The default timeout is 300 seconds. Override globally or per-request:

# Global timeout
client = Extend(token="YOUR_API_KEY", timeout=30.0)

# Per-request timeout
client.extract_runs.create(..., request_options={"timeout_in_seconds": 60})

Custom headers

client = Extend(
    token="YOUR_API_KEY",
    headers={"X-Custom-Header": "value"},
)

Custom HTTP client

Pass a pre-configured httpx.Client for full control over transport:

import httpx
from extend_ai import Extend

client = Extend(
    token="YOUR_API_KEY",
    httpx_client=httpx.Client(
        proxy="http://my.test.proxy.example.com",
        transport=httpx.HTTPTransport(local_address="0.0.0.0"),
    ),
)

Raw responses

Access the underlying HTTP response for any request:

raw_response = client.with_raw_response.parse(file={"url": "https://example.com/invoice.pdf"})

print(raw_response.status_code)
print(raw_response.headers)
print(raw_response.data)  # ParseRun

Documentation

Full API reference documentation is available at docs.extend.ai.

A complete SDK reference is available in reference.md.

Contributing

While we value open-source contributions to this SDK, this library is generated programmatically. Additions made directly to this library would have to be moved over to our generation code; otherwise they would be overwritten by the next generated release. Feel free to open a PR as a proof of concept, but know that we will not be able to merge it as-is. We suggest opening an issue first to discuss your idea with us!

On the other hand, contributions to the README are always very welcome!

