Official Kadoa SDK for Python - Web data extraction and automation

Kadoa SDK for Python

Official Python SDK for the Kadoa API, providing easy integration with Kadoa's web data extraction platform.

Installation

We recommend using uv, a fast and modern Python package manager:

uv add kadoa-sdk
# or
uv pip install kadoa-sdk

Alternatively, you can use traditional pip:

pip install kadoa-sdk

Requirements: Python 3.11 or higher
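
Because the package requires Python 3.11 or higher, a script can check the interpreter version up front and fail with a clear message. The following is a minimal sketch using only the standard library; `meets_minimum` and `MIN_PYTHON` are hypothetical names chosen for this example and are not part of the SDK.

```python
import sys

MIN_PYTHON = (3, 11)  # minimum version stated in the requirements


def meets_minimum(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


def require_python():
    """Raise RuntimeError when the running interpreter is too old."""
    if not meets_minimum():
        raise RuntimeError(
            f"kadoa-sdk needs Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or higher, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
```

Calling `require_python()` at the top of a script stops early on an unsupported interpreter instead of failing later during import.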

Quick Start

from kadoa_sdk import KadoaClient, KadoaClientConfig
from kadoa_sdk.extraction.types import ExtractionOptions

client = KadoaClient(
    KadoaClientConfig(
        api_key='your-api-key'
    )
)

# AI automatically detects and extracts data
result = client.extraction.run(
    ExtractionOptions(
        urls=['https://sandbox.kadoa.com/ecommerce'],
        name='My First Extraction'
    )
)

print(f"Extracted {len(result.data)} items")

That's it! When no schema is given, the SDK detects and extracts the data automatically. For more control, use the builder API to specify exactly which fields to extract.
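
The Quick Start passes `api_key='your-api-key'` as a placeholder. One common way to keep a real key out of source code is to read it from an environment variable. This is a sketch under stated assumptions: the variable name `KADOA_API_KEY` and the helper `load_api_key` are hypothetical, and the source does not say whether the SDK reads any environment variable on its own.

```python
import os


def load_api_key(env_var="KADOA_API_KEY", environ=None):
    """Return the API key stored in `env_var`, or raise if it is missing.

    `KADOA_API_KEY` is a hypothetical variable name chosen for this example.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Usage with the Quick Start client (not executed here):
# client = KadoaClient(KadoaClientConfig(api_key=load_api_key()))
```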

Advanced Examples

Builder API with Custom Schema

Define exactly which fields to extract using the fluent builder API:

from kadoa_sdk import KadoaClient, KadoaClientConfig
from kadoa_sdk.extraction.types import ExtractOptions
from kadoa_sdk.schemas.schema_builder import SchemaBuilder, FieldOptions

client = KadoaClient(KadoaClientConfig(api_key='your-api-key'))

# Define custom schema
extraction = client.extract(
    ExtractOptions(
        urls=['https://example.com/products'],
        name='Product Extraction',
        extraction=lambda schema: (
            schema.entity('Product')
            .field('title', 'Product title', 'STRING')
            .field('price', 'Product price', 'MONEY', FieldOptions(example='$99.99'))
            .field('description', 'Product description', 'STRING')
            .field('image', 'Product image URL', 'IMAGE', FieldOptions(example='https://example.com/image.jpg'))
        )
    )
).create()

# Run and wait for completion
finished = extraction.run()
print(f"Extracted {len(finished.fetch_data().data)} products")

Notifications Setup

Configure notifications to be alerted when workflows complete:

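from kadoa_sdk.extraction.types import ExtractOptions
from kadoa_sdk.notifications import NotificationOptions

# Reuses the `client` created in the previous example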

extraction = client.extract(
    ExtractOptions(
        urls=['https://example.com'],
        name='Monitored Extraction'
    )
).with_notifications(
    NotificationOptions(
        events=['workflow_finished', 'workflow_failed'],
        channels={'email': True}
    )
).create()

finished = extraction.run()

Error Handling

Handle errors gracefully with proper exception types:

from kadoa_sdk import KadoaClient, KadoaClientConfig
from kadoa_sdk.core import KadoaSdkError, KadoaHttpError
from kadoa_sdk.extraction.types import ExtractionOptions

client = KadoaClient(KadoaClientConfig(api_key='your-api-key'))

try:
    result = client.extraction.run(
        ExtractionOptions(
            urls=['https://example.com'],
            name='My Extraction'
        )
    )
# Catch the more specific HTTP error first, so this branch stays reachable
# even if KadoaHttpError is a subclass of KadoaSdkError
except KadoaHttpError as e:
    print(f"HTTP Error: {e.message}")
    print(f"Status: {e.http_status}")
    print(f"Endpoint: {e.endpoint}")
except KadoaSdkError as e:
    print(f"SDK Error: {e.message}")
    print(f"Error Code: {e.code}")
    if e.details:
        print(f"Details: {e.details}")
except Exception as e:
    print(f"Unexpected error: {e}")
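
Once errors are caught by type, a caller may want to retry failures that are likely to be temporary. The helper below is a generic sketch of retrying with exponential backoff. It is not part of the SDK: `call_with_retries` is a hypothetical name, and which Kadoa errors are actually safe to retry is not stated in the source, so the choice of `retry_on` is left to the caller.

```python
import time


def call_with_retries(func, retry_on=(Exception,), attempts=3,
                      base_delay=1.0, sleep=time.sleep):
    """Call `func()`; on an exception listed in `retry_on`, wait and retry.

    The wait doubles after each failed attempt (base_delay, 2x, 4x, ...).
    The exception from the final attempt is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

Assuming HTTP errors are the ones worth retrying, a call could look like `call_with_retries(lambda: client.extraction.run(options), retry_on=(KadoaHttpError,))`, where `options` is an `ExtractionOptions` instance as in the example above.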

Paginated Data Fetching

Fetch data in pages for large datasets:

from kadoa_sdk.extraction.types import FetchDataOptions

# Fetch first page
result = client.extraction.fetch_data(
    FetchDataOptions(
        workflow_id='workflow-123',
        page=1,
        limit=50
    )
)

print(f"Page {result.pagination.page} of {result.pagination.total_pages}")
print(f"Total records: {result.pagination.total_count}")

# Fetch all data automatically
all_data = client.extraction.fetch_all_data(
    FetchDataOptions(workflow_id='workflow-123', limit=100)
)
print(f"Fetched {len(all_data)} total records")
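
To make the page-by-page flow concrete, the sketch below collects records from any paged source until the last page is reached. It is a generic illustration and not the SDK's `fetch_all_data` implementation: `fetch_page` is a hypothetical callable that returns a dict with `data` and `total_pages` keys, and `fake_fetch_page` stands in for a real API call.

```python
def collect_all(fetch_page, limit=100):
    """Request pages 1, 2, ... and return all records in one list."""
    records = []
    page = 1
    while True:
        result = fetch_page(page=page, limit=limit)
        records.extend(result["data"])
        if page >= result["total_pages"]:
            break
        page += 1
    return records


def fake_fetch_page(page, limit, total=250):
    """Hypothetical stand-in for an API call: serves integers 0..total-1."""
    start = (page - 1) * limit
    data = list(range(start, min(start + limit, total)))
    total_pages = max(1, -(-total // limit))  # ceiling division
    return {"data": data, "total_pages": total_pages}
```

With the fake source, `collect_all(fake_fetch_page, limit=100)` makes three requests and returns 250 records.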

Async Data Fetching

Process large datasets efficiently with async generators:

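import asyncio
from kadoa_sdk.extraction.types import FetchDataOptions

# Reuses the `client` created earlier

def process_record(record):
    # Placeholder: replace with your own per-record handling
    print(record)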

async def process_all_pages():
    async for page in client.extraction.fetch_data_pages(
        FetchDataOptions(workflow_id='workflow-123', limit=100)
    ):
        print(f"Processing page {page.pagination.page}")
        for record in page.data:
            # Process each record
            process_record(record)

asyncio.run(process_all_pages())
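
The benefit of an async generator is that pages are handled one at a time rather than loaded all at once. The self-contained sketch below shows the same `async for` pattern without the SDK: `fake_pages` is a hypothetical stand-in for `fetch_data_pages` that yields plain lists, and `count_records` consumes it page by page.

```python
import asyncio


async def fake_pages(total=5, limit=2):
    """Yield lists of records page by page (hypothetical data source)."""
    for start in range(0, total, limit):
        await asyncio.sleep(0)  # give control back to the event loop
        yield list(range(start, min(start + limit, total)))


async def count_records(pages):
    """Consume an async generator of pages, keeping one page in memory at a time."""
    seen = 0
    async for page in pages:
        seen += len(page)
    return seen
```

Running `asyncio.run(count_records(fake_pages()))` iterates over three pages and returns 5.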

Documentation

For comprehensive documentation, examples, and API reference, visit:

Requirements

  • Python 3.11 or higher
  • Dependencies are automatically installed

Support

License

MIT

Download files

Source Distribution

kadoa_sdk-0.8.0rc7.tar.gz (233.9 kB)

Uploaded Source

Built Distribution

kadoa_sdk-0.8.0rc7-py3-none-any.whl (776.1 kB)

Uploaded Python 3

File details

Details for the file kadoa_sdk-0.8.0rc7.tar.gz.

File metadata

  • Download URL: kadoa_sdk-0.8.0rc7.tar.gz
  • Upload date:
  • Size: 233.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.7

File hashes

Hashes for kadoa_sdk-0.8.0rc7.tar.gz
Algorithm Hash digest
SHA256 5d94c8eb3e062ff94df6edc660c9da1a96c61ff80085af737ee201003bce4de0
MD5 2a44dce4cf0bbcba33257d38a6061701
BLAKE2b-256 2db8a74a2bb79ed27f2633697a41e44478127d95201520b0611d2aea7bdd3ebd
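
A downloaded file can be checked against a published SHA256 digest with the standard library. This is a minimal sketch: `sha256_of` and `matches` are hypothetical helper names, and the file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's digest with the published value, ignoring case and spaces."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches('kadoa_sdk-0.8.0rc7.tar.gz', '<SHA256 value listed for that file>')` returns `True` only when the local file is byte-for-byte identical to the published one.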

File details

Details for the file kadoa_sdk-0.8.0rc7-py3-none-any.whl.

File hashes

Hashes for kadoa_sdk-0.8.0rc7-py3-none-any.whl
Algorithm Hash digest
SHA256 2ba298db2c98de420936f4b453112535339895bcbf4d2621ae27f0a8c943d77b
MD5 35ad7a693849f8c7cf7947664bc0b56a
BLAKE2b-256 b83a55b7a119246b474d27b913d3a1a2b8798a2458aa0a3fb5a3b899e11996e7
