Geonode Scraper SDK
Python SDK for the Geonode Scraper API. It supports synchronous and asynchronous content extraction, job polling, usage statistics, and service health checks.
Requirements
- Python 3.10+
Installation
pip install geonode-scraper-sdk
Configuration And Authentication
Create a client configuration with your API base URL and API key.
from geonode_scraper_sdk import Configuration

configuration = Configuration(
    host="https://api.example.com",
    api_key={"APIKeyHeader": "your-api-key"},
)
If you do not set host, the generated client defaults to http://localhost.
You normally do not need api_key_prefix for this API.
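To keep the API key out of source code, the configuration values can be read from environment variables. The following is a minimal sketch only: the helper load_client_settings and the variable names GEONODE_API_HOST and GEONODE_API_KEY are assumptions made for illustration and are not part of the SDK.

```python
import os


def load_client_settings(env=None):
    """Build keyword arguments for Configuration from environment variables.

    GEONODE_API_HOST and GEONODE_API_KEY are hypothetical names chosen for
    this sketch; the SDK itself does not read them.
    """
    env = os.environ if env is None else env
    api_key = env.get("GEONODE_API_KEY")
    if not api_key:
        raise RuntimeError("GEONODE_API_KEY is not set")
    return {
        # Fall back to the placeholder host used throughout this README.
        "host": env.get("GEONODE_API_HOST", "https://api.example.com"),
        "api_key": {"APIKeyHeader": api_key},
    }


# Usage (requires the SDK to be installed):
#   from geonode_scraper_sdk import Configuration
#   configuration = Configuration(**load_client_settings())
```

Failing fast on a missing key avoids sending unauthenticated requests that would only surface later as an ApiException.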
Quick Start
This example performs a synchronous extraction and prints the markdown result.
from geonode_scraper_sdk import (
    ApiClient,
    ApiException,
    Configuration,
    ExtractRequest,
    ExtractionApi,
    OutputFormat,
    ProcessingMode,
)

configuration = Configuration(
    host="https://api.example.com",
    api_key={"APIKeyHeader": "your-api-key"},
)

with ApiClient(configuration) as api_client:
    api = ExtractionApi(api_client)
    try:
        response = api.extract_v1_extract_post(
            ExtractRequest(
                url="https://example.com",
                formats=[OutputFormat.MARKDOWN],
                processing_mode=ProcessingMode.SYNC,
            )
        )
        print(response.data.markdown)
        print(response.tokens_charged)
    except ApiException as exc:
        print(exc.status)
        print(exc.body)
Async Workflow
When processing_mode=ProcessingMode.ASYNC, the extract call returns an async
job response with a job ID and status URL.
from geonode_scraper_sdk import ApiClient, Configuration, ExtractRequest, ExtractionApi, ProcessingMode

configuration = Configuration(
    host="https://api.example.com",
    api_key={"APIKeyHeader": "your-api-key"},
)

with ApiClient(configuration) as api_client:
    api = ExtractionApi(api_client)
    submit = api.extract_v1_extract_post(
        ExtractRequest(
            url="https://example.com",
            processing_mode=ProcessingMode.ASYNC,
        )
    )
    job = api.get_job_result_v1_extract_job_id_get(submit.job_id)
    print(job.status)
    if job.data and job.data.markdown:
        print(job.data.markdown)
Use get_job_result_v1_extract_job_id_get(job_id) to poll a single job, or
list_jobs_v1_extract_jobs_get(...) to inspect and filter job history.
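An async job is usually not finished on the first fetch, so polling typically means repeating the call until the job reaches a final state. The helper below is a sketch of that loop, not part of the SDK: wait_for_job is a hypothetical name, and the status values "completed" and "failed" are assumptions that should be checked against what the API actually returns.

```python
import time


def wait_for_job(fetch_job, job_id, *, terminal_statuses=("completed", "failed"),
                 interval=2.0, timeout=120.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch_job(job_id) until the job reaches a terminal status.

    The names in terminal_statuses are assumptions made for this sketch.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_job(job_id)
        # Enum-like statuses expose .value; plain strings are used as-is.
        status = getattr(job.status, "value", job.status)
        if str(status).lower() in terminal_statuses:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout}s")
        sleep(interval)
```

With the SDK installed, this could be called as wait_for_job(api.get_job_result_v1_extract_job_id_get, submit.job_id). Passing sleep and clock as parameters keeps the loop easy to test without real delays.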
Error Handling
Non-2xx responses raise ApiException or one of its subclasses.
The exception includes the HTTP status, response body, and any deserialized
error model in exc.data.
from geonode_scraper_sdk import ApiClient, ApiException, Configuration, ExtractionApi, ExtractRequest

configuration = Configuration(
    host="https://api.example.com",
    api_key={"APIKeyHeader": "your-api-key"},
)

with ApiClient(configuration) as api_client:
    api = ExtractionApi(api_client)
    try:
        api.extract_v1_extract_post(ExtractRequest(url="https://example.com"))
    except ApiException as exc:
        print(exc.status)
        print(exc.body)
        print(exc.data)
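Because the exception carries the HTTP status, transient failures can be retried while other errors are re-raised immediately. The sketch below shows one way to do this with exponential backoff. call_with_retry is a hypothetical helper, and the list of retryable status codes is an assumption; the SDK does not document which errors are safe to retry.

```python
import time


def call_with_retry(call, *, retry_statuses=(429, 502, 503, 504), attempts=4,
                    base_delay=1.0, sleep=time.sleep):
    """Run call() and retry when the raised exception has a retryable status.

    Works with any exception exposing a .status attribute, as ApiException
    does. attempts must be at least 1.
    """
    for attempt in range(attempts):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status", None)
            last_attempt = attempt == attempts - 1
            if status not in retry_statuses or last_attempt:
                raise
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
```

With the SDK installed, a call could be wrapped as call_with_retry(lambda: api.extract_v1_extract_post(request)), where request is an ExtractRequest.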
Request Options
ExtractRequest supports the main extraction controls:
- formats: output formats to return; defaults to [OutputFormat.HTML]
- render_js: use a headless browser for JavaScript-rendered pages; defaults to False
- processing_mode: ProcessingMode.SYNC or ProcessingMode.ASYNC; defaults to sync
- proxy: optional ProxySettings for country and proxy type selection
- headers: optional request headers dictionary
Example with additional options:
from geonode_scraper_sdk import ExtractRequest, OutputFormat, ProcessingMode, ProxySettings, ProxyType

request = ExtractRequest(
    url="https://example.com",
    formats=[OutputFormat.HTML, OutputFormat.MARKDOWN],
    render_js=True,
    processing_mode=ProcessingMode.SYNC,
    proxy=ProxySettings(country="US", type=ProxyType.RESIDENTIAL),
    headers={"User-Agent": "geonode-scraper-sdk-demo"},
)
API Reference
- ExtractionApi.extract_v1_extract_post(extract_request)
- ExtractionApi.get_job_result_v1_extract_job_id_get(job_id)
- ExtractionApi.list_jobs_v1_extract_jobs_get(job_id=None, url=None, status=None, output=None, start_date=None, end_date=None, page=None, page_size=None)
- StatisticsApi.get_statistics_v1_statistics_get(start_date=None, end_date=None)
- SystemApi.health_check_health_get()
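The page and page_size parameters of list_jobs_v1_extract_jobs_get suggest that job history is paginated. The generic sketch below walks pages until a short or empty page is returned. iter_pages is a hypothetical helper, page numbering starting at 1 is an assumption, and extracting the list of jobs from the SDK response is left to the caller because the response model is not described here.

```python
def iter_pages(fetch_page, *, page_size=50, first_page=1, max_pages=1000):
    """Yield items from successive pages until a short or empty page appears.

    fetch_page(page, page_size) must return a list of items. max_pages is a
    safety cap so a misbehaving endpoint cannot cause an endless loop.
    """
    for page in range(first_page, first_page + max_pages):
        items = fetch_page(page, page_size)
        yield from items
        if len(items) < page_size:
            break
```

In practice, fetch_page would be a small function that calls list_jobs_v1_extract_jobs_get(page=page, page_size=page_size) and returns the list of jobs from its response.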
Advanced Usage
Each generated API method also exposes:
- *_with_http_info() to get the deserialized payload together with status and headers
- *_without_preload_content() to work with the raw HTTP response directly