Python SDK for the Geonode Scraper API

These details have not been verified by PyPI

Project description

Geonode Scraper SDK

Python SDK for the Geonode Scraper API. Supports single-URL extraction, batch extraction, site crawling, URL mapping, job polling, and usage statistics.

Requirements

Python 3.10+

Installation

pip install geonode-scraper-sdk

Configuration And Authentication

Create a client configuration with your API base URL and API key.

from geonode_scraper_sdk import Configuration

configuration = Configuration(
    host="https://api.example.com",
    api_key={"APIKeyHeader": "your-api-key"},
)

If you do not set host, the generated client defaults to http://localhost.
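
In practice the host and key usually come from the environment rather than from source code. The following is a minimal sketch of that pattern, not part of the SDK: the helper name configuration_kwargs and the variable names GEONODE_API_HOST and GEONODE_API_KEY are assumptions made for illustration.

```python
import os
from typing import Mapping


def configuration_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments for Configuration from environment variables.

    GEONODE_API_HOST and GEONODE_API_KEY are hypothetical names chosen for
    this sketch; the SDK itself does not read them.
    """
    api_key = env.get("GEONODE_API_KEY")
    if not api_key:
        raise RuntimeError("GEONODE_API_KEY is not set")

    kwargs = {"api_key": {"APIKeyHeader": api_key}}
    host = env.get("GEONODE_API_HOST")
    if host:
        # Only pass host when it is set; otherwise the client default applies.
        kwargs["host"] = host
    return kwargs


# Example with an explicit mapping instead of the real environment.
example = configuration_kwargs(
    {"GEONODE_API_HOST": "https://api.example.com", "GEONODE_API_KEY": "your-api-key"}
)
print(example)
```

With the SDK installed, the result can be unpacked directly: Configuration(**configuration_kwargs()).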

Quick Start

Synchronous extraction: the call blocks until the result is ready.

from geonode_scraper_sdk import (
    ApiClient,
    ApiException,
    Configuration,
    ExtractRequest,
    ExtractionApi,
    OutputFormat,
    ProcessingMode,
)

configuration = Configuration(
    host="https://api.example.com",
    api_key={"APIKeyHeader": "your-api-key"},
)

with ApiClient(configuration) as api_client:
    api = ExtractionApi(api_client)

    try:
        response = api.extract_v1_extract_post(
            ExtractRequest(
                url="https://example.com",
                formats=[OutputFormat.MARKDOWN],
                processing_mode=ProcessingMode.SYNC,
            )
        )
        print(response.data.markdown)
        print(response.tokens_charged)
    except ApiException as exc:
        print(exc.status)
        print(exc.body)

Async Extraction Workflow

When processing_mode=ProcessingMode.ASYNC, the extract call returns an async job response with a job ID and status URL.

from geonode_scraper_sdk import ApiClient, Configuration, ExtractRequest, ExtractionApi, ProcessingMode

configuration = Configuration(
    host="https://api.example.com",
    api_key={"APIKeyHeader": "your-api-key"},
)

with ApiClient(configuration) as api_client:
    api = ExtractionApi(api_client)

    submit = api.extract_v1_extract_post(
        ExtractRequest(
            url="https://example.com",
            processing_mode=ProcessingMode.ASYNC,
        )
    )

    job = api.get_job_result_v1_extract_job_id_get(submit.job_id)
    print(job.status)
    if job.data and job.data.markdown:
        print(job.data.markdown)

Use get_job_result_v1_extract_job_id_get(job_id) to poll a single job, or list_jobs_v1_extract_jobs_get(...) to inspect and filter job history.
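
A single call to the job endpoint may return before the job has finished, so polling normally needs a loop with an interval and a timeout. The sketch below shows one way to write that loop. The helper wait_for_job is hypothetical, and the terminal status values "completed" and "failed" are assumptions; check the status model of your SDK version. A stand-in fetch function replaces the real API call so the example runs without the SDK.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable

# Assumed terminal status values; check the job status model of your SDK version.
TERMINAL_STATUSES = frozenset({"completed", "failed"})


def wait_for_job(
    fetch: Callable[[str], Any],
    job_id: str,
    interval: float = 2.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch(job_id) until the job reaches a terminal status or times out."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        # Accept either a plain string or an enum member with a .value.
        status = getattr(job.status, "value", job.status)
        if str(status).lower() in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout}s")
        sleep(interval)


# Stand-in for api.get_job_result_v1_extract_job_id_get: pending twice, then done.
_responses = iter(["pending", "processing", "completed"])


def fake_fetch(job_id: str) -> SimpleNamespace:
    return SimpleNamespace(job_id=job_id, status=next(_responses))


job = wait_for_job(fake_fetch, "job-123", interval=0.0, sleep=lambda _: None)
print(job.status)
```

With a real client, pass api.get_job_result_v1_extract_job_id_get as fetch and submit.job_id as job_id.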

Batch Extraction

Submit multiple URLs in one request and poll for results.

from geonode_scraper_sdk import ApiClient, BatchApi, BatchRequest, Configuration, OutputFormat

configuration = Configuration(
    host="https://api.example.com",
    api_key={"APIKeyHeader": "your-api-key"},
)

with ApiClient(configuration) as api_client:
    api = BatchApi(api_client)

    accepted = api.create_batch_v1_batch_post(
        BatchRequest(
            urls=["https://example.com", "https://example.org"],
            formats=[OutputFormat.MARKDOWN],
        )
    )
    print(accepted.job_id, accepted.accepted_urls)

    status = api.get_batch_status_v1_batch_job_id_get(
        job_id=accepted.job_id, page=1, page_size=10
    )
    print(status.status, status.completed_urls, status.total_urls)
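
The completed_urls and total_urls fields printed above are enough to report progress while a batch runs. A small sketch, assuming both fields are integers or None; the helper batch_progress and the "processing" status value are hypothetical, and a SimpleNamespace stands in for the real status object.

```python
from types import SimpleNamespace


def batch_progress(status) -> float:
    """Return the completed fraction (0.0 to 1.0) of a batch status object."""
    total = status.total_urls or 0
    if total == 0:
        # Avoid dividing by zero before any URL has been accepted.
        return 0.0
    return (status.completed_urls or 0) / total


# Stand-in for the object returned by get_batch_status_v1_batch_job_id_get.
status = SimpleNamespace(status="processing", completed_urls=1, total_urls=4)
print(f"{status.status}: {batch_progress(status):.0%}")
```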

Site Crawling

Crawl a website from a seed URL up to a configurable depth and page limit.

from geonode_scraper_sdk import ApiClient, Configuration, CrawlApi, CrawlRequest, OutputFormat

configuration = Configuration(
    host="https://api.example.com",
    api_key={"APIKeyHeader": "your-api-key"},
)

with ApiClient(configuration) as api_client:
    api = CrawlApi(api_client)

    accepted = api.create_crawl_v1_crawl_post(
        CrawlRequest(
            url="https://example.com",
            depth=2,
            limit=50,
            formats=[OutputFormat.MARKDOWN],
        )
    )
    print(accepted.job_id, accepted.estimated_pages)

    status = api.get_crawl_status_v1_crawl_job_id_get(
        job_id=accepted.job_id, page=1, page_size=10
    )
    print(status.status, status.completed_pages, status.total_pages)

URL Mapping

Discover all URLs under a base URL by combining sitemap parsing with HTML link extraction. Returns synchronously.

from geonode_scraper_sdk import ApiClient, Configuration, MapApi, MapRequest

configuration = Configuration(
    host="https://api.example.com",
    api_key={"APIKeyHeader": "your-api-key"},
)

with ApiClient(configuration) as api_client:
    api = MapApi(api_client)

    result = api.map_urls_v1_map_post(MapRequest(url="https://example.com"))
    for link in result.links:
        print(link.url, link.source)
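
Because each link carries both url and source, the result can be grouped to see how each URL was discovered. A minimal sketch: the helper group_by_source is hypothetical, and the source values "sitemap" and "html" are assumed for illustration, since the actual values are not listed here.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_source(links) -> dict[str, list[str]]:
    """Group link URLs by the source that discovered them."""
    grouped = defaultdict(list)
    for link in links:
        # Accept either a plain string or an enum member with a .value.
        source = getattr(link.source, "value", link.source)
        grouped[str(source)].append(link.url)
    return dict(grouped)


# Stand-ins for result.links; the source values are assumed.
links = [
    SimpleNamespace(url="https://example.com/", source="sitemap"),
    SimpleNamespace(url="https://example.com/about", source="html"),
    SimpleNamespace(url="https://example.com/blog", source="sitemap"),
]
grouped = group_by_source(links)
for source, urls in grouped.items():
    print(source, len(urls))
```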

Error Handling

Non-2xx responses raise ApiException or one of its subclasses. The exception includes the HTTP status, response body, and any deserialized error model in exc.data.

from geonode_scraper_sdk import ApiClient, ApiException, Configuration, ExtractionApi, ExtractRequest

configuration = Configuration(
    host="https://api.example.com",
    api_key={"APIKeyHeader": "your-api-key"},
)

with ApiClient(configuration) as api_client:
    api = ExtractionApi(api_client)

    try:
        api.extract_v1_extract_post(ExtractRequest(url="https://example.com"))
    except ApiException as exc:
        print(exc.status)
        print(exc.body)
        print(exc.data)
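
Since the exception exposes the HTTP status, transient failures can be retried while other errors propagate immediately. The sketch below is one possible approach, not SDK behavior: the helper call_with_retry is hypothetical, and the set of retryable statuses (429, 502, 503, 504) is an assumption. FakeApiError stands in for ApiException so the example runs without the SDK.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumed set of transient statuses; adjust to what the API actually returns.
RETRYABLE_STATUSES = frozenset({429, 502, 503, 504})


def call_with_retry(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying with exponential backoff on retryable HTTP statuses."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status", None)
            last_attempt = attempt == attempts - 1
            if status not in RETRYABLE_STATUSES or last_attempt:
                raise
            # Delay doubles on each retry: base_delay, 2 * base_delay, ...
            sleep(base_delay * 2**attempt)
    raise ValueError("attempts must be at least 1")


class FakeApiError(Exception):
    """Stand-in for ApiException so the sketch runs without the SDK."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


# Two transient failures followed by a success.
_outcomes = iter([FakeApiError(429), FakeApiError(503), "ok"])


def flaky_call() -> str:
    outcome = next(_outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


delays: list[float] = []
result = call_with_retry(flaky_call, sleep=delays.append)
print(result, delays)
```

With a real client, wrap the API call in a lambda, for example call_with_retry(lambda: api.extract_v1_extract_post(request)), and consider narrowing the except clause to ApiException.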

Request Options

ExtractRequest supports the following fields:

formats: output formats to return; defaults to [OutputFormat.HTML]
render_js: use a headless browser for JavaScript-rendered pages; defaults to False
processing_mode: ProcessingMode.SYNC or ProcessingMode.ASYNC; defaults to ProcessingMode.SYNC
extract_links: extract all links found on the page; defaults to False
proxy: optional ProxySettings for country and proxy type selection
headers: optional request headers dictionary
wait_config: optional WaitConfig for explicit browser wait policy (wait_until, wait_for, wait_timeout)

Example with additional options:

from geonode_scraper_sdk import ExtractRequest, OutputFormat, ProcessingMode, ProxySettings, ProxyType, WaitConfig, WaitUntil

request = ExtractRequest(
    url="https://example.com",
    formats=[OutputFormat.HTML, OutputFormat.MARKDOWN],
    render_js=True,
    processing_mode=ProcessingMode.SYNC,
    extract_links=True,
    proxy=ProxySettings(country="US", type=ProxyType.RESIDENTIAL),
    headers={"User-Agent": "geonode-scraper-sdk-demo"},
    wait_config=WaitConfig(
        wait_until=WaitUntil.NETWORKIDLE,
        wait_for="#content",
        wait_timeout=2000,
    ),
)

API Reference

ExtractionApi (/v1/extract)

extract_v1_extract_post(extract_request)
get_job_result_v1_extract_job_id_get(job_id)
list_jobs_v1_extract_jobs_get(job_id, url, status, output, start_date, end_date, page, page_size)

BatchApi (/v1/batch)

create_batch_v1_batch_post(batch_request)
get_batch_status_v1_batch_job_id_get(job_id, page, page_size)
cancel_batch_v1_batch_job_id_delete(job_id)

CrawlApi (/v1/crawl)

create_crawl_v1_crawl_post(crawl_request)
get_crawl_status_v1_crawl_job_id_get(job_id, page, page_size)
cancel_crawl_v1_crawl_job_id_delete(job_id)

MapApi (/v1/map)

map_urls_v1_map_post(map_request)

StatisticsApi (/v1/statistics)

get_statistics_v1_statistics_get(start_date, end_date)

SystemApi (/health)

health_check_health_get()

WebhooksApi (/v1/webhooks)

list_webhooks_v1_webhooks_get(page, page_size)
create_webhook_v1_webhooks_post(webhook_create)
get_webhook_v1_webhooks_webhook_id_get(webhook_id)
update_webhook_v1_webhooks_webhook_id_patch(webhook_id, webhook_update)
delete_webhook_v1_webhooks_webhook_id_delete(webhook_id)
list_deliveries_v1_webhooks_webhook_id_deliveries_get(webhook_id, page, page_size, status)
rotate_secret_v1_webhooks_webhook_id_rotate_secret_post(webhook_id)

Advanced Usage

Each generated API method also exposes:

*_with_http_info() to get the deserialized payload together with status and headers
*_without_preload_content() to work with the raw HTTP response directly

Project details

Release history

0.3.0, Jun 12, 2026
0.2.0, Jun 5, 2026 (this version)
0.1.0, Apr 10, 2026

Download files

Source Distribution

geonode_scraper_sdk-0.2.0.tar.gz (49.4 kB)

Uploaded Jun 5, 2026 Source

Built Distribution

geonode_scraper_sdk-0.2.0-py3-none-any.whl (122.9 kB)

Uploaded Jun 5, 2026 Python 3

File details

Details for the file geonode_scraper_sdk-0.2.0.tar.gz.

File metadata

Download URL: geonode_scraper_sdk-0.2.0.tar.gz
Upload date: Jun 5, 2026
Size: 49.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for geonode_scraper_sdk-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`4c5ddf38d341b879f6154a6d3ab25fe71b71c94d09b61c169dbfec2d96c5e5e6`
MD5	`984010a40e9144f6f382812239be1d31`
BLAKE2b-256	`a5ec16355b64354934147f7ac9290ef68ed7d8db9efffbb5a44c071e8aca2f81`

Provenance

The following attestation bundles were made for geonode_scraper_sdk-0.2.0.tar.gz:

Publisher: python-sdk-publish.yml on geonodecom/scraper-api-sdks

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: geonode_scraper_sdk-0.2.0.tar.gz
- Subject digest: 4c5ddf38d341b879f6154a6d3ab25fe71b71c94d09b61c169dbfec2d96c5e5e6
- Sigstore transparency entry: 1732265869
- Sigstore integration time: Jun 5, 2026
Source repository:
- Permalink: geonodecom/scraper-api-sdks@98a8eb610cd6a525fccb7dfecc6a768463f03e83
- Branch / Tag: refs/heads/main
- Owner: https://github.com/geonodecom
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-sdk-publish.yml@98a8eb610cd6a525fccb7dfecc6a768463f03e83
- Trigger Event: workflow_dispatch

File details

Details for the file geonode_scraper_sdk-0.2.0-py3-none-any.whl.

File metadata

Download URL: geonode_scraper_sdk-0.2.0-py3-none-any.whl
Upload date: Jun 5, 2026
Size: 122.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for geonode_scraper_sdk-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`07349e4f7f623e8c3e0dadf875686b8291f57b556daae5b5db84f8e7c3bb2248`
MD5	`d0f701b7708eb925931fd9b6f619e4b8`
BLAKE2b-256	`5e4efe36ce9bb86790eeb7994a79e767a23a6dbb6753a3cd014fc777623db341`

Provenance

The following attestation bundles were made for geonode_scraper_sdk-0.2.0-py3-none-any.whl:

Publisher: python-sdk-publish.yml on geonodecom/scraper-api-sdks

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: geonode_scraper_sdk-0.2.0-py3-none-any.whl
- Subject digest: 07349e4f7f623e8c3e0dadf875686b8291f57b556daae5b5db84f8e7c3bb2248
- Sigstore transparency entry: 1732265899
- Sigstore integration time: Jun 5, 2026
Source repository:
- Permalink: geonodecom/scraper-api-sdks@98a8eb610cd6a525fccb7dfecc6a768463f03e83
- Branch / Tag: refs/heads/main
- Owner: https://github.com/geonodecom
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-sdk-publish.yml@98a8eb610cd6a525fccb7dfecc6a768463f03e83
- Trigger Event: workflow_dispatch
