Geonode Scraper SDK

Python SDK for the Geonode Scraper API. It supports synchronous and asynchronous content extraction, job polling, usage statistics, and service health checks.

Requirements

  • Python 3.10+

Installation

pip install geonode-scraper-sdk

Configuration and Authentication

Create a client configuration with your API base URL and API key.

from geonode_scraper_sdk import Configuration

configuration = Configuration(
    host="https://api.example.com",
    api_key={"APIKeyHeader": "your-api-key"},
)

If you do not set host, the generated client defaults to http://localhost, so set it explicitly whenever you target a real deployment. You normally do not need api_key_prefix for this API.
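
To keep the key out of source code, the configuration values can be read from the environment. The following is a minimal sketch, not part of the SDK: the helper load_settings and the variable names GEONODE_SCRAPER_HOST and GEONODE_SCRAPER_API_KEY are hypothetical and chosen only for illustration.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable names; the SDK does not define or read them itself.
HOST_VAR = "GEONODE_SCRAPER_HOST"
API_KEY_VAR = "GEONODE_SCRAPER_API_KEY"


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for Configuration from environment variables."""
    env = os.environ if env is None else env
    api_key = env.get(API_KEY_VAR, "").strip()
    if not api_key:
        raise RuntimeError(f"{API_KEY_VAR} is not set")
    settings = {"api_key": {"APIKeyHeader": api_key}}
    host = env.get(HOST_VAR, "").strip()
    if host:
        # Drop a trailing slash so the base URL has one canonical form.
        settings["host"] = host.rstrip("/")
    return settings


# Intended use with the SDK:
#   from geonode_scraper_sdk import Configuration
#   configuration = Configuration(**load_settings())
```

When the host variable is unset, the sketch leaves host out entirely, so the client default described above applies.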

Quick Start

This example performs a synchronous extraction, then prints the Markdown result and the number of tokens charged.

from geonode_scraper_sdk import (
    ApiClient,
    ApiException,
    Configuration,
    ExtractRequest,
    ExtractionApi,
    OutputFormat,
    ProcessingMode,
)

configuration = Configuration(
    host="https://api.example.com",
    api_key={"APIKeyHeader": "your-api-key"},
)

with ApiClient(configuration) as api_client:
    api = ExtractionApi(api_client)

    try:
        response = api.extract_v1_extract_post(
            ExtractRequest(
                url="https://example.com",
                formats=[OutputFormat.MARKDOWN],
                processing_mode=ProcessingMode.SYNC,
            )
        )
        print(response.data.markdown)
        print(response.tokens_charged)
    except ApiException as exc:
        print(exc.status)
        print(exc.body)

Async Workflow

When processing_mode=ProcessingMode.ASYNC, the extract call returns an async job response with a job ID and status URL.

from geonode_scraper_sdk import (
    ApiClient,
    Configuration,
    ExtractRequest,
    ExtractionApi,
    OutputFormat,
    ProcessingMode,
)

configuration = Configuration(
    host="https://api.example.com",
    api_key={"APIKeyHeader": "your-api-key"},
)

with ApiClient(configuration) as api_client:
    api = ExtractionApi(api_client)

    # Submit the job; the response carries the job ID used for polling.
    submit = api.extract_v1_extract_post(
        ExtractRequest(
            url="https://example.com",
            # Request Markdown explicitly, since formats defaults to HTML.
            formats=[OutputFormat.MARKDOWN],
            processing_mode=ProcessingMode.ASYNC,
        )
    )

    # Fetch the job once. It may still be running, so data can be empty here.
    job = api.get_job_result_v1_extract_job_id_get(submit.job_id)
    print(job.status)
    if job.data and job.data.markdown:
        print(job.data.markdown)

Use get_job_result_v1_extract_job_id_get(job_id) to poll a single job, or list_jobs_v1_extract_jobs_get(...) to inspect and filter job history.
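
A single fetch right after submission will often find the job unfinished, so in practice polling means repeating the call until the job settles. The sketch below shows one way to do that with exponential backoff and a timeout. It is not part of the SDK: wait_for_job is a hypothetical helper, and the terminal status names "completed" and "failed" are assumptions, since the actual status values are not listed here. Pass your own set through the terminal argument once you know them.

```python
import time
from typing import Any, Callable, Collection

# Assumed terminal status names; replace them with the API's real values.
ASSUMED_TERMINAL = frozenset({"completed", "failed"})


def wait_for_job(
    fetch_job: Callable[[str], Any],
    job_id: str,
    terminal: Collection[str] = ASSUMED_TERMINAL,
    interval: float = 1.0,
    max_interval: float = 15.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll fetch_job(job_id) until job.status is terminal or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        # The status may be an enum or a plain string; normalize to lowercase text.
        status = str(getattr(job.status, "value", job.status)).lower()
        if status in terminal:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout}s")
        sleep(interval)
        # Double the wait after each attempt, up to max_interval.
        interval = min(interval * 2, max_interval)


# Intended use with the SDK:
#   job = wait_for_job(api.get_job_result_v1_extract_job_id_get, submit.job_id)
```

Taking the fetch function as a parameter keeps the helper independent of the client object and easy to exercise with a stub.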

Error Handling

Non-2xx responses raise ApiException or one of its subclasses. The exception includes the HTTP status, response body, and any deserialized error model in exc.data.

from geonode_scraper_sdk import ApiClient, ApiException, Configuration, ExtractionApi, ExtractRequest

configuration = Configuration(
    host="https://api.example.com",
    api_key={"APIKeyHeader": "your-api-key"},
)

with ApiClient(configuration) as api_client:
    api = ExtractionApi(api_client)

    try:
        api.extract_v1_extract_post(ExtractRequest(url="https://example.com"))
    except ApiException as exc:
        print(exc.status)
        print(exc.body)
        print(exc.data)
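
Because the exception exposes the HTTP status, callers can separate transient failures from permanent ones. The following is a hedged sketch of a retry wrapper with exponential backoff. call_with_retries is a hypothetical helper, not an SDK function, and the set of retryable status codes is an assumption; check the API's own guidance before retrying requests that may be billed.

```python
import random
import time
from typing import Callable, Collection, TypeVar

T = TypeVar("T")

# Assumed transient statuses; the API's actual retry guidance may differ.
RETRYABLE_STATUSES = frozenset({429, 500, 502, 503, 504})


def call_with_retries(
    func: Callable[[], T],
    retry_on: type = Exception,
    statuses: Collection[int] = RETRYABLE_STATUSES,
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying when it raises retry_on with a retryable .status."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on as exc:
            status = getattr(exc, "status", None)
            # Re-raise permanent errors, and the last failure once attempts run out.
            if status not in statuses or attempt == attempts:
                raise
            # Exponential backoff with a little jitter to avoid synchronized retries.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))
    raise ValueError("attempts must be at least 1")


# Intended use with the SDK:
#   request = ExtractRequest(url="https://example.com")
#   response = call_with_retries(
#       lambda: api.extract_v1_extract_post(request),
#       retry_on=ApiException,
#   )
```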

Request Options

ExtractRequest supports the main extraction controls:

  • formats: output formats to return; defaults to [OutputFormat.HTML]
  • render_js: use a headless browser for JavaScript-rendered pages; defaults to False
  • processing_mode: ProcessingMode.SYNC or ProcessingMode.ASYNC; defaults to ProcessingMode.SYNC
  • proxy: optional ProxySettings for country and proxy type selection
  • headers: optional request headers dictionary

Example with additional options:

from geonode_scraper_sdk import ExtractRequest, OutputFormat, ProcessingMode, ProxySettings, ProxyType

request = ExtractRequest(
    url="https://example.com",
    formats=[OutputFormat.HTML, OutputFormat.MARKDOWN],
    render_js=True,
    processing_mode=ProcessingMode.SYNC,
    proxy=ProxySettings(country="US", type=ProxyType.RESIDENTIAL),
    headers={"User-Agent": "geonode-scraper-sdk-demo"},
)

API Reference

  • ExtractionApi.extract_v1_extract_post(extract_request)
  • ExtractionApi.get_job_result_v1_extract_job_id_get(job_id)
  • ExtractionApi.list_jobs_v1_extract_jobs_get(job_id=None, url=None, status=None, output=None, start_date=None, end_date=None, page=None, page_size=None)
  • StatisticsApi.get_statistics_v1_statistics_get(start_date=None, end_date=None)
  • SystemApi.health_check_health_get()
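
The page and page_size parameters of list_jobs_v1_extract_jobs_get suggest paged results. As a rough sketch of walking through all pages, the hypothetical helper below takes a fetch function and stops at the first short or empty page. It assumes 1-based page numbers and that a page shorter than page_size marks the end; neither is confirmed here, and the shape of the list response is not documented in this README, so the attribute name in the usage comment is a placeholder.

```python
from typing import Any, Callable, Iterator, Sequence


def iter_pages(
    fetch_page: Callable[[int, int], Sequence[Any]],
    page_size: int = 50,
    first_page: int = 1,  # assumption: pages are 1-based
    max_pages: int = 1000,  # safety cap so a misbehaving endpoint cannot loop forever
) -> Iterator[Any]:
    """Yield items page by page until a page comes back short or empty."""
    for page in range(first_page, first_page + max_pages):
        items = fetch_page(page, page_size)
        yield from items
        if len(items) < page_size:
            return


# Intended use with the SDK (the .jobs attribute is a hypothetical name):
#   def fetch_page(page, page_size):
#       result = api.list_jobs_v1_extract_jobs_get(page=page, page_size=page_size)
#       return result.jobs
#
#   for job in iter_pages(fetch_page, page_size=25):
#       print(job.status)
```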

Advanced Usage

Each generated API method also exposes:

  • *_with_http_info() to get the deserialized payload together with status and headers
  • *_without_preload_content() to work with the raw HTTP response directly
