
Extend Python Library


The Extend Python library provides convenient, typed access to the Extend API — enabling you to parse, extract, classify, split, and edit documents with a few lines of code.

Installation

pip install extend-ai

Requires Python 3.8+

Quick start

Parse any document in just a few lines:

from extend_ai import Extend

client = Extend(token="YOUR_API_KEY")

result = client.parse(file={"url": "https://example.com/invoice.pdf"})

for chunk in result.output.chunks:
    print(chunk.content)

client.parse is synchronous — it sends the file, waits for processing, and returns a fully populated ParseRun with parsed chunks ready to use. The same pattern works for every capability:

# Extract structured data
extract_run = client.extract(
    file={"url": "https://example.com/invoice.pdf"},
    extractor={"id": "ex_abc123"},
)

# Classify a document
classify_run = client.classify(
    file={"url": "https://example.com/document.pdf"},
    classifier={"id": "cls_abc123"},
)

# Split a multi-document file
split_run = client.split(
    file={"url": "https://example.com/packet.pdf"},
    splitter={"id": "spl_abc123"},
)

# Edit a PDF with instructions
edit_run = client.edit(
    file={"url": "https://example.com/form.pdf"},
    config={"instructions": "Fill out the applicant name as Jane Doe"},
)

Note: The synchronous methods above have a 5-minute timeout and are best suited for onboarding and testing. For production workloads, use polling helpers or webhooks instead.

Polling helpers

Every run resource exposes a create_and_poll() method that creates the run and automatically polls until it reaches a terminal state (PROCESSED, FAILED, or CANCELLED):

from extend_ai import Extend

client = Extend(token="YOUR_API_KEY")

result = client.extract_runs.create_and_poll(
    file={"url": "https://example.com/invoice.pdf"},
    extractor={"id": "ex_abc123"},
)

if result.status == "PROCESSED":
    print(result.output)
else:
    print(f"Failed: {result.failure_message}")

This works across all run types:

parse_run     = client.parse_runs.create_and_poll(file={"url": "..."})
extract_run   = client.extract_runs.create_and_poll(file={"url": "..."}, extractor={"id": "..."})
classify_run  = client.classify_runs.create_and_poll(file={"url": "..."}, classifier={"id": "..."})
split_run     = client.split_runs.create_and_poll(file={"url": "..."}, splitter={"id": "..."})
workflow_run  = client.workflow_runs.create_and_poll(file={"url": "..."}, workflow={"id": "..."})
edit_run      = client.edit_runs.create_and_poll(file={"url": "..."})

Custom polling options

from extend_ai import Extend, PollingOptions

client = Extend(token="YOUR_API_KEY")

result = client.extract_runs.create_and_poll(
    file={"url": "https://example.com/invoice.pdf"},
    extractor={"id": "ex_abc123"},
    polling_options=PollingOptions(
        max_wait_ms=300_000,       # 5 minute timeout (default: no timeout)
        initial_delay_ms=1_000,    # start with 1s delay (default)
        max_delay_ms=60_000,       # cap at 60s delay (default: 30s)
    ),
)

Running workflows

Workflows chain multiple processing steps (extraction, classification, splitting, etc.) into a single pipeline. Run a workflow by passing a workflow ID and a file:

result = client.workflow_runs.create_and_poll(
    file={"url": "https://example.com/invoice.pdf"},
    workflow={"id": "workflow_abc123"},
)

print(result.status)  # "PROCESSED"

for step_run in result.step_runs or []:
    print(step_run.step.type)   # "EXTRACT", "CLASSIFY", etc.
    print(step_run.result)

Webhook verification

Verify and parse incoming webhook events using the built-in utilities. Known event types are returned as typed Pydantic models; unknown or future event types fall back to a plain dict so your handler keeps working without SDK updates.

from extend_ai import Extend

client = Extend(token="YOUR_API_KEY")

def handle_webhook(request):
    event = client.webhooks.verify_and_parse(
        body=request.body.decode(),
        headers=dict(request.headers),
        signing_secret="wss_your_signing_secret",
    )

    # Works for both typed model and dict fallback
    event_type = getattr(event, "event_type", None) or event.get("eventType")
    payload = getattr(event, "payload", None) or event.get("payload")

    match event_type:
        case "extract_run.processed":
            run_id = getattr(payload, "id", None) or payload.get("id")
            print(f"Extraction complete: {run_id}")
        case "workflow_run.completed":
            run_id = getattr(payload, "id", None) or payload.get("id")
            print(f"Workflow complete: {run_id}")
        case _:
            print(f"Received event: {event_type}")
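Because an event may arrive as either a typed model or a plain dict, the repeated getattr/.get pattern above can be factored into a small helper. This is a generic sketch, not part of the SDK; the camelCase `eventType` key mirrors the dict fallback shown above:

```python
def event_field(obj, attr, key=None):
    """Read a field from a typed model (attribute access) or a dict fallback.

    `attr` is the snake_case model attribute; `key` is the dict key when it
    differs (e.g. camelCase "eventType").
    """
    if isinstance(obj, dict):
        return obj.get(key or attr)
    return getattr(obj, attr, None)

# event_type = event_field(event, "event_type", key="eventType")
# run_id = event_field(payload, "id")
```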

Manual verification & parsing

# Verify signature without parsing
is_valid = client.webhooks.verify(body, headers, signing_secret)

# Parse without verification (not recommended for production)
event = client.webhooks.parse(body)

Signed URL payloads

For large payloads, Extend may send a signed URL instead of the full payload. Use allow_signed_url=True, then check and fetch when needed:

event = client.webhooks.verify_and_parse(
    body=body,
    headers=headers,
    signing_secret=signing_secret,
    allow_signed_url=True,
)

if client.webhooks.is_signed_url_event(event):
    full_event = client.webhooks.fetch_signed_payload_sync(event)
    # full_event is typed or dict; use getattr(..., None) or .get() as in the example above
else:
    # Normal inline payload — handle event directly
    ...

Async support

Every method has an async counterpart via AsyncExtend:

import asyncio
from extend_ai import AsyncExtend

client = AsyncExtend(token="YOUR_API_KEY")

async def main():
    result = await client.parse(file={"url": "https://example.com/invoice.pdf"})

    for chunk in result.output.chunks:
        print(chunk.content)

asyncio.run(main())

Async polling works the same way:

result = await client.extract_runs.create_and_poll(
    file={"url": "https://example.com/invoice.pdf"},
    extractor={"id": "ex_abc123"},
)

Exception handling

The SDK raises typed exceptions for API errors:

from extend_ai.core.api_error import ApiError

try:
    result = client.parse(file={"url": "https://example.com/invoice.pdf"})
except ApiError as e:
    print(e.status_code)  # 400, 401, 404, 429, etc.
    print(e.body)

Specific error classes are available for fine-grained handling:

from extend_ai.errors import (
    BadRequestError,         # 400
    UnauthorizedError,       # 401
    PaymentRequiredError,    # 402
    ForbiddenError,          # 403
    NotFoundError,           # 404
    UnprocessableEntityError,# 422
    TooManyRequestsError,    # 429
    InternalServerError,     # 500
)

Polling timeout

When create_and_poll() exceeds its timeout, a PollingTimeoutError is raised:

from extend_ai import PollingOptions, PollingTimeoutError

try:
    result = client.extract_runs.create_and_poll(
        file={"url": "..."},
        extractor={"id": "..."},
        polling_options=PollingOptions(max_wait_ms=60_000),
    )
except PollingTimeoutError as e:
    print(f"Timed out after {e.elapsed_ms}ms (limit: {e.max_wait_ms}ms)")

Pagination

List endpoints return paginated results using next_page_token:

# First page
response = client.extract_runs.list(max_page_size=10)

for run in response.data:
    print(f"{run.id}: {run.status}")

# Next page
if response.next_page_token:
    next_page = client.extract_runs.list(
        max_page_size=10,
        next_page_token=response.next_page_token,
    )

Environments

The SDK defaults to the US production environment. Other regions are available:

from extend_ai import Extend, ExtendEnvironment

# US (default)
client = Extend(token="YOUR_API_KEY")

# US2 (HIPAA)
client = Extend(token="YOUR_API_KEY", environment=ExtendEnvironment.PRODUCTION_US2)

# EU
client = Extend(token="YOUR_API_KEY", environment=ExtendEnvironment.PRODUCTION_EU1)

# Custom base URL
client = Extend(token="YOUR_API_KEY", base_url="https://custom-api.example.com")

Advanced

Retries

The SDK automatically retries failed requests with exponential backoff. Retries are triggered for:

  • 408 Timeout
  • 429 Too Many Requests
  • 5xx Server Errors

# Override retries for a single request
client.extract_runs.create(..., request_options={"max_retries": 0})
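The SDK's exact retry schedule is internal, but capped exponential backoff generally doubles the delay per attempt up to a ceiling, often with jitter to avoid synchronized retries. A generic illustration, not the SDK's implementation:

```python
import random

def backoff_delays(base=1.0, cap=30.0, attempts=5, jitter=False):
    """Return the per-attempt delays (seconds) for capped exponential backoff."""
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        if jitter:
            # "full jitter": pick uniformly between 0 and the capped delay
            delay = random.uniform(0, delay)
        delays.append(delay)
    return delays

print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```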

Timeouts

The default timeout is 300 seconds. Override globally or per-request:

# Global timeout
client = Extend(token="YOUR_API_KEY", timeout=30.0)

# Per-request timeout
client.extract_runs.create(..., request_options={"timeout_in_seconds": 60})

Custom headers

client = Extend(
    token="YOUR_API_KEY",
    headers={"X-Custom-Header": "value"},
)

Custom HTTP client

Pass a pre-configured httpx.Client for full control over transport:

import httpx
from extend_ai import Extend

client = Extend(
    token="YOUR_API_KEY",
    httpx_client=httpx.Client(
        proxy="http://my.test.proxy.example.com",
        transport=httpx.HTTPTransport(local_address="0.0.0.0"),
    ),
)

Raw responses

Access the underlying HTTP response for any request:

raw_response = client.with_raw_response.parse(file={"url": "https://example.com/invoice.pdf"})

print(raw_response.status_code)
print(raw_response.headers)
print(raw_response.data)  # ParseRun

Documentation

Full API reference documentation is available at docs.extend.ai.

A complete SDK reference is available in reference.md.

Custom patches

This SDK includes patches to Fern-generated core files that fix bugs not yet addressed upstream. These files are listed in .fernignore so Fern does not overwrite them during generation.

  • src/extend_ai/core/serialization.py: circular TypedDict alias resolution on Python 3.10+ (field aliases like extend_edit:bbox were sent with underscores)
  • src/extend_ai/core/unchecked_base_model.py: ForwardRef resolution for Chunk.blocks, strict union discriminant matching for BlockDetails, enum serialization warnings

Each patch has regression tests in tests/custom/. If a Fern update accidentally overwrites a patched file, CI will fail.

Maintaining patches

  1. Make your fix on a branch, add regression tests in tests/custom/
  2. Add the patched file to .fernignore if not already listed
  3. If the fix applies to the v0 API, cherry-pick it onto the v0.x branch and update .fernignore there too
  4. Note: .fernignore means Fern won't auto-update the file — if Fern releases upstream improvements, merge them manually

Contributing

While we value open-source contributions to this SDK, this library is generated programmatically. Changes made directly to this library would need to be ported into our generation code, or they would be overwritten by the next generated release. Feel free to open a PR as a proof of concept, but know that we will not be able to merge it as-is. We suggest opening an issue first to discuss your idea with us!

On the other hand, contributions to the README are always very welcome!

Download files


Source Distribution

extend_ai-1.9.0.tar.gz (396.7 kB)


Built Distribution


extend_ai-1.9.0-py3-none-any.whl (942.0 kB)


File details

Details for the file extend_ai-1.9.0.tar.gz.

File metadata

  • Download URL: extend_ai-1.9.0.tar.gz
  • Upload date:
  • Size: 396.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.9.25 Linux/6.17.0-1010-azure

File hashes

Hashes for extend_ai-1.9.0.tar.gz:

  • SHA256: 538bc0552d31e109d346f951a4a1d647486d8a0f445e569467bf8e33559cecc0
  • MD5: 9531c1fd4b4c5c127f1708e7f2f9c108
  • BLAKE2b-256: ab2245420c8fb1d10d96e2ee53806de3426b455e74219945a45073c6d1e2416f


File details

Details for the file extend_ai-1.9.0-py3-none-any.whl.

File metadata

  • Download URL: extend_ai-1.9.0-py3-none-any.whl
  • Upload date:
  • Size: 942.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.9.25 Linux/6.17.0-1010-azure

File hashes

Hashes for extend_ai-1.9.0-py3-none-any.whl:

  • SHA256: 3dc3a58641ed0613c157af3f6bda618ba6364521ccab4e19a2ae8cf3adadb832
  • MD5: df449a736646922e3370609b7b93cac3
  • BLAKE2b-256: 39170779f1baf95eec43388a2a16a07018b6322cd4a8124669982a6ad37900a8

