Extend Python Library
The Extend Python library provides convenient, typed access to the Extend API — enabling you to parse, extract, classify, split, and edit documents with a few lines of code.
Installation
pip install extend-ai
Requires Python 3.8+
Quick start
Parse any document in a few lines:
from extend_ai import Extend
client = Extend(token="YOUR_API_KEY")
result = client.parse(file={"url": "https://example.com/invoice.pdf"})
for chunk in result.output.chunks:
    print(chunk.content)
client.parse is synchronous — it sends the file, waits for processing, and returns a fully populated ParseRun with parsed chunks ready to use. The same pattern works for every capability:
# Extract structured data
extract_run = client.extract(
    file={"url": "https://example.com/invoice.pdf"},
    extractor={"id": "ex_abc123"},
)
# Classify a document
classify_run = client.classify(
    file={"url": "https://example.com/document.pdf"},
    classifier={"id": "cls_abc123"},
)
# Split a multi-document file
split_run = client.split(
    file={"url": "https://example.com/packet.pdf"},
    splitter={"id": "spl_abc123"},
)
# Edit a PDF with instructions
edit_run = client.edit(
    file={"url": "https://example.com/form.pdf"},
    config={"instructions": "Fill out the applicant name as Jane Doe"},
)
Note: The synchronous methods above have a 5-minute timeout and are best suited for onboarding and testing. For production workloads, use polling helpers or webhooks instead.
Polling helpers
Every run resource exposes a create_and_poll() method that creates the run and automatically polls until it reaches a terminal state (PROCESSED, FAILED, or CANCELLED):
from extend_ai import Extend
client = Extend(token="YOUR_API_KEY")
result = client.extract_runs.create_and_poll(
    file={"url": "https://example.com/invoice.pdf"},
    extractor={"id": "ex_abc123"},
)
if result.status == "PROCESSED":
    print(result.output)
else:
    print(f"Failed: {result.failure_message}")
This works across all run types:
parse_run = client.parse_runs.create_and_poll(file={"url": "..."})
extract_run = client.extract_runs.create_and_poll(file={"url": "..."}, extractor={"id": "..."})
classify_run = client.classify_runs.create_and_poll(file={"url": "..."}, classifier={"id": "..."})
split_run = client.split_runs.create_and_poll(file={"url": "..."}, splitter={"id": "..."})
workflow_run = client.workflow_runs.create_and_poll(file={"url": "..."}, workflow={"id": "..."})
edit_run = client.edit_runs.create_and_poll(file={"url": "..."})
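This README does not describe how create_and_poll() is implemented internally, so the following is only a minimal sketch of the general poll-until-terminal pattern, not the SDK's code. get_status is a hypothetical callable standing in for "retrieve the run and read its status"; the terminal statuses are the three listed above.

```python
import time
from typing import Callable, Iterable, Optional

# Terminal statuses listed in this README for create_and_poll().
TERMINAL_STATUSES = {"PROCESSED", "FAILED", "CANCELLED"}


def poll_until_terminal(
    get_status: Callable[[], str],
    delays_s: Iterable[float],
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call get_status() until it returns a terminal status.

    get_status is a hypothetical stand-in for fetching the run's status.
    delays_s is the wait before each re-check. Returns the terminal status,
    or None if the delays run out before a terminal status is seen.
    """
    status = get_status()
    for delay in delays_s:
        if status in TERMINAL_STATUSES:
            return status
        sleep(delay)  # wait before checking again
        status = get_status()
    return status if status in TERMINAL_STATUSES else None
```

Injecting sleep keeps the loop testable without real waiting; in practice create_and_poll() already does this bookkeeping for you.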
Custom polling options
from extend_ai import Extend, PollingOptions
result = client.extract_runs.create_and_poll(
    file={"url": "https://example.com/invoice.pdf"},
    extractor={"id": "ex_abc123"},
    polling_options=PollingOptions(
        max_wait_ms=300_000,    # 5 minute timeout (default: no timeout)
        initial_delay_ms=1_000, # start with 1s delay (default)
        max_delay_ms=60_000,    # cap at 60s delay (default: 30s)
    ),
)
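To see how these three options might interact, here is an illustrative delay schedule. It assumes the delay doubles after each check (the growth factor is not documented here) and that polling stops before the next delay would exceed max_wait_ms; the SDK's real schedule may differ.

```python
from typing import Iterator, Optional


def backoff_delays_ms(
    initial_delay_ms: int = 1_000,
    max_delay_ms: int = 30_000,
    max_wait_ms: Optional[int] = None,
    factor: float = 2.0,  # ASSUMPTION: the growth factor is not documented
) -> Iterator[int]:
    """Yield capped exponential delays (ms) whose running total stays within max_wait_ms.

    With max_wait_ms=None the generator never ends, mirroring "no timeout".
    """
    delay = min(initial_delay_ms, max_delay_ms)
    waited = 0
    # Stop before a delay would push the total wait past max_wait_ms.
    while max_wait_ms is None or waited + delay <= max_wait_ms:
        yield delay
        waited += delay
        delay = min(int(delay * factor), max_delay_ms)
```

Under these assumptions, a larger max_delay_ms means fewer, sparser checks late in a long run, while initial_delay_ms governs how quickly short runs are noticed.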
Running workflows
Workflows chain multiple processing steps (extraction, classification, splitting, etc.) into a single pipeline. Run a workflow by passing a workflow ID and a file:
result = client.workflow_runs.create_and_poll(
    file={"url": "https://example.com/invoice.pdf"},
    workflow={"id": "workflow_abc123"},
)
print(result.status)  # "PROCESSED"
for step_run in result.step_runs or []:
    print(step_run.step.type)  # "EXTRACT", "CLASSIFY", etc.
    print(step_run.result)
Webhook verification
Verify and parse incoming webhook events using the built-in utilities. Known event types are returned as typed Pydantic models; unknown or future event types fall back to a plain dict so your handler keeps working without SDK updates.
from extend_ai import Extend
client = Extend(token="YOUR_API_KEY")
# Note: match/case requires Python 3.10+; use if/elif on older versions.
def handle_webhook(request):
    event = client.webhooks.verify_and_parse(
        body=request.body.decode(),
        headers=dict(request.headers),
        signing_secret="wss_your_signing_secret",
    )
    # Works for both typed model and dict fallback
    event_type = getattr(event, "event_type", None) or event.get("eventType")
    payload = getattr(event, "payload", None) or event.get("payload")
    match event_type:
        case "extract_run.processed":
            run_id = getattr(payload, "id", None) or payload.get("id")
            print(f"Extraction complete: {run_id}")
        case "workflow_run.completed":
            run_id = getattr(payload, "id", None) or payload.get("id")
            print(f"Workflow complete: {run_id}")
        case _:
            print(f"Received event: {event_type}")
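The getattr(...) or obj.get(...) pattern above falls through to .get() whenever the attribute is missing or falsy, which raises AttributeError on a typed model. One way to avoid that is a small helper that branches on the object's type. get_field is a hypothetical helper, not part of the SDK; the attribute/key pairs are the ones used in the example above.

```python
from typing import Any


def get_field(obj: Any, attr: str, key: str, default: Any = None) -> Any:
    """Read a field from a typed model (by attribute) or a dict (by key).

    attr is the Python attribute name (e.g. "event_type"); key is the raw
    JSON key used by the dict fallback (e.g. "eventType").
    """
    if isinstance(obj, dict):
        return obj.get(key, default)
    return getattr(obj, attr, default)
```

Inside the handler this would read as event_type = get_field(event, "event_type", "eventType") and run_id = get_field(payload, "id", "id").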
Manual verification & parsing
# Verify signature without parsing
is_valid = client.webhooks.verify(body, headers, signing_secret)
# Parse without verification (not recommended for production)
event = client.webhooks.parse(body)
Signed URL payloads
For large payloads, Extend may send a signed URL instead of the full payload. Use allow_signed_url=True, then check and fetch when needed:
event = client.webhooks.verify_and_parse(
    body=body,
    headers=headers,
    signing_secret=signing_secret,
    allow_signed_url=True,
)
if client.webhooks.is_signed_url_event(event):
    full_event = client.webhooks.fetch_signed_payload_sync(event)
    # full_event is typed or dict; use getattr(..., None) or .get() as in the example above
else:
    # Normal inline payload; handle the event directly
    ...
Async support
Every method has an async counterpart via AsyncExtend:
import asyncio
from extend_ai import AsyncExtend
client = AsyncExtend(token="YOUR_API_KEY")
async def main():
    result = await client.parse(file={"url": "https://example.com/invoice.pdf"})
    for chunk in result.output.chunks:
        print(chunk.content)
asyncio.run(main())
Async polling works the same way:
# Inside an async function, using the AsyncExtend client from above
result = await client.extract_runs.create_and_poll(
    file={"url": "https://example.com/invoice.pdf"},
    extractor={"id": "ex_abc123"},
)
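A common reason to use AsyncExtend is processing many files concurrently. The sketch below caps the number of in-flight calls with asyncio.Semaphore and keeps results in input order via asyncio.gather. run_bounded is a hypothetical helper, and the limit of 5 is an arbitrary illustration, not a documented rate limit.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def run_bounded(
    make_call: Callable[[str], Awaitable[T]],
    urls: List[str],
    limit: int = 5,
) -> List[T]:
    """Run make_call(url) for every URL with at most `limit` calls in flight.

    Results are returned in the same order as `urls`.
    """
    semaphore = asyncio.Semaphore(limit)  # created inside the running loop

    async def guarded(url: str) -> T:
        async with semaphore:  # wait for a free slot before calling
            return await make_call(url)

    return await asyncio.gather(*(guarded(url) for url in urls))
```

With the client above, a call could look like await run_bounded(lambda url: client.parse(file={"url": url}), urls). Bounding concurrency also reduces the chance of hitting 429 responses.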
Exception handling
The SDK raises typed exceptions for API errors:
from extend_ai.core.api_error import ApiError
try:
    result = client.parse(file={"url": "https://example.com/invoice.pdf"})
except ApiError as e:
    print(e.status_code)  # 400, 401, 404, 429, etc.
    print(e.body)
Specific error classes are available for fine-grained handling:
from extend_ai.errors import (
    BadRequestError,           # 400
    UnauthorizedError,         # 401
    PaymentRequiredError,      # 402
    ForbiddenError,            # 403
    NotFoundError,             # 404
    UnprocessableEntityError,  # 422
    TooManyRequestsError,      # 429
    InternalServerError,       # 500
)
Polling timeout
When create_and_poll() exceeds its timeout, a PollingTimeoutError is raised:
from extend_ai import PollingOptions, PollingTimeoutError
try:
    result = client.extract_runs.create_and_poll(
        file={"url": "..."},
        extractor={"id": "..."},
        polling_options=PollingOptions(max_wait_ms=60_000),
    )
except PollingTimeoutError as e:
    print(f"Timed out after {e.elapsed_ms}ms (limit: {e.max_wait_ms}ms)")
Pagination
List endpoints return paginated results using next_page_token:
# First page
response = client.extract_runs.list(max_page_size=10)
for run in response.data:
    print(f"{run.id}: {run.status}")
# Next page
if response.next_page_token:
    next_page = client.extract_runs.list(
        max_page_size=10,
        next_page_token=response.next_page_token,
    )
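To walk every page rather than just the next one, the token can be followed in a loop until it comes back empty. The sketch below is a generic generator; iterate_all and extract_runs_page are hypothetical helper names, and the adapter uses only the list() call, data, and next_page_token shown above.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

# A page is modeled as (items, next_page_token); the token is empty on the last page.
Page = Tuple[List[Any], Optional[str]]


def iterate_all(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Any]:
    """Yield every item across all pages, following next_page_token until it is empty."""
    token: Optional[str] = None
    while True:
        items, token = fetch_page(token)
        yield from items
        if not token:
            break


def extract_runs_page(client: Any, token: Optional[str]) -> Page:
    """Hypothetical adapter around client.extract_runs.list as used above."""
    kwargs: Dict[str, Any] = {"max_page_size": 10}
    if token:
        kwargs["next_page_token"] = token
    response = client.extract_runs.list(**kwargs)
    return response.data, response.next_page_token
```

Usage would look like for run in iterate_all(lambda token: extract_runs_page(client, token)): print(run.id). Because it is a generator, pages are fetched lazily, only as items are consumed.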
Environments
The SDK defaults to the US production environment. Other regions are available:
from extend_ai import Extend, ExtendEnvironment
# US (default)
client = Extend(token="YOUR_API_KEY")
# US2 (HIPAA)
client = Extend(token="YOUR_API_KEY", environment=ExtendEnvironment.PRODUCTION_US2)
# EU
client = Extend(token="YOUR_API_KEY", environment=ExtendEnvironment.PRODUCTION_EU1)
# Custom base URL
client = Extend(token="YOUR_API_KEY", base_url="https://custom-api.example.com")
Advanced
Retries
The SDK automatically retries failed requests with exponential backoff. Retries are triggered for:
- 408 Timeout
- 429 Too Many Requests
- 5xx Server Errors
# Override retries for a single request
client.extract_runs.create(..., request_options={"max_retries": 0})
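If you disable the built-in retries (max_retries=0) and handle them yourself, the retryable statuses listed above can be encoded directly. This is a sketch only: the base delay, cap, and "full jitter" strategy are illustrative assumptions, not the SDK's documented backoff parameters.

```python
import random
from typing import Optional


def is_retryable(status_code: int) -> bool:
    """True for the statuses this README lists as retried: 408, 429, and 5xx."""
    return status_code in (408, 429) or 500 <= status_code <= 599


def backoff_delay_s(
    attempt: int,
    base_s: float = 0.5,   # ASSUMPTION: illustrative base delay
    cap_s: float = 10.0,   # ASSUMPTION: illustrative upper bound
    rng: Optional[random.Random] = None,
) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    rng = rng or random.Random()
    return rng.uniform(0, min(cap_s, base_s * (2 ** attempt)))
```

Jitter spreads retries from many clients over time, which helps avoid synchronized bursts after a 429.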
Timeouts
The default timeout is 300 seconds. Override globally or per-request:
# Global timeout
client = Extend(token="YOUR_API_KEY", timeout=30.0)
# Per-request timeout
client.extract_runs.create(..., request_options={"timeout_in_seconds": 60})
Custom headers
client = Extend(
    token="YOUR_API_KEY",
    headers={"X-Custom-Header": "value"},
)
Custom HTTP client
Pass a pre-configured httpx.Client for full control over transport:
import httpx
from extend_ai import Extend
client = Extend(
    token="YOUR_API_KEY",
    httpx_client=httpx.Client(
        proxy="http://my.test.proxy.example.com",
        transport=httpx.HTTPTransport(local_address="0.0.0.0"),
    ),
)
Raw responses
Access the underlying HTTP response for any request:
raw_response = client.with_raw_response.parse(file={"url": "https://example.com/invoice.pdf"})
print(raw_response.status_code)
print(raw_response.headers)
print(raw_response.data) # ParseRun
Documentation
Full API reference documentation is available at docs.extend.ai.
A complete SDK reference is available in reference.md.
Contributing
While we value open-source contributions to this SDK, this library is generated programmatically. Additions made directly to this library would have to be moved over to our generation code; otherwise, they would be overwritten in the next generated release. Feel free to open a PR as a proof of concept, but know that we will not be able to merge it as-is. We suggest opening an issue first to discuss with us!
On the other hand, contributions to the README are always very welcome!
File details
Details for the file extend_ai-1.2.0.tar.gz.
File metadata
- Download URL: extend_ai-1.2.0.tar.gz
- Upload date:
- Size: 307.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.9.25 Linux/6.14.0-1017-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 588e092bb47a65041060849c161d6924adfdb2f745252d72c1d2f38e53e92a66 |
| MD5 | ccb19a2910bc02e5f8496eae7a314b9b |
| BLAKE2b-256 | 7cdb7b400d94d64f6a78e9cb1679fa297c0cc1a6b06d9cee0743d7e417d1a5e3 |
File details
Details for the file extend_ai-1.2.0-py3-none-any.whl.
File metadata
- Download URL: extend_ai-1.2.0-py3-none-any.whl
- Upload date:
- Size: 739.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.9.25 Linux/6.14.0-1017-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7aeadea2c7adb0d8dac5f883d70ed28ed12cbcf07de24d435a6c5be974ff165a |
| MD5 | 65cbff1a41faab77b20f9516f886e5c1 |
| BLAKE2b-256 | 29db69b119cfb57962d1bcb5a9175ee30714c366b81d7f7399f9b0aa01b13e69 |