Command-line interface for the Geonode Scraper API
Geonode Scraper CLI
gscraper is the command-line interface for the Geonode Scraper API. It is a thin
presentation layer over geonode-scraper-tools-core:
commands parse flags, resolve configuration, call a stable service method, and
render the result. All domain logic (validation, polling, retries) lives in the
service layer, not in the CLI.
Requirements
- Python 3.10+
- Works on Linux, macOS, and Windows
Installation
Recommended — install as a standalone tool with pipx:
pipx install geonode-scraper-cli
pipx installs gscraper into its own isolated virtual environment and puts it
on your PATH, so it never conflicts with other Python projects. This is the
preferred way to install CLI tools globally.
Alternative — install with pip:
pip install geonode-scraper-cli
Windows note: on Windows the gscraper command is placed in the Python Scripts folder (e.g. %APPDATA%\Python\Python3xx\Scripts). If the command is not found after installation, add that folder to your PATH, or use python -m geonode_scraper_cli as a fallback. pipx handles this automatically and is the simpler choice on Windows.
Configuration
Configuration is resolved with the following precedence (highest first):
- Command-line flags (--api-key, --host, ...)
- Environment variables (GEONODE_SCRAPER_API_KEY, GEONODE_SCRAPER_HOST, GEONODE_SCRAPER_VERIFY_SSL, GEONODE_SCRAPER_TIMEOUT, GEONODE_SCRAPER_PROFILE)
- A TOML config file at ~/.config/geonode-scraper/config.toml
- Built-in defaults
Prefer environment variables or the config file for your API key — passing
--api-key on the command line can leak it into your shell history.
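The precedence order above can be sketched in a few lines of Python. This is an illustration only, not the CLI's actual implementation: the function name resolve, the DEFAULTS values, and the way option names map onto GEONODE_SCRAPER_* variables are assumptions made for the example.

```python
from typing import Mapping, Optional

# Hypothetical built-in defaults; the real default values are not documented here.
DEFAULTS = {"host": "https://api.example.com", "timeout": "30"}


def resolve(
    key: str,
    flags: Mapping[str, Optional[str]],
    env: Mapping[str, str],
    file_profile: Mapping[str, str],
    defaults: Mapping[str, str] = DEFAULTS,
) -> Optional[str]:
    """Return the first value found: flags, then env, then config file, then defaults."""
    # Assumed naming rule: "api_key" maps to GEONODE_SCRAPER_API_KEY, and so on.
    env_name = "GEONODE_SCRAPER_" + key.upper()
    candidates = (
        flags.get(key),
        env.get(env_name),
        file_profile.get(key),
        defaults.get(key),
    )
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return None
```

In real use, env would be os.environ and file_profile the selected table from the TOML file. A flag always wins when it is present, and the built-in default is used only when nothing else sets the option.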
Example ~/.config/geonode-scraper/config.toml:
[default]
host = "https://api.example.com"
api_key = "your-api-key"
verify_ssl = true
[staging]
host = "https://staging.example.com"
api_key = "your-staging-key"
Select a non-default profile with --profile staging or
GEONODE_SCRAPER_PROFILE=staging. Inspect the active configuration with:
gscraper config path # print the config file location
gscraper config show # show profiles (API keys masked)
Output
Commands print a human-readable summary by default. Use --json or --yaml
to print the raw result envelope for scripting. These flags can appear either
before the subcommand (global position) or after it (per-command position) —
both work:
gscraper extract https://example.com --json | jq -r .result.data.markdown
gscraper --json extract https://example.com | jq -r .result.data.markdown
The JSON/YAML envelope has the shape { "ok": bool, "operation": str, "result": {...} }
on success, or { "ok": false, "operation": str, "error": {...} } on failure.
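A script that consumes the --json output can branch on the ok field. The sketch below assumes only the envelope shape described above; the function name unwrap and the contents of the sample result and error objects are hypothetical.

```python
import json


def unwrap(envelope_text: str) -> dict:
    """Return the result object on success, or raise with the error details."""
    envelope = json.loads(envelope_text)
    if envelope.get("ok"):
        return envelope["result"]
    operation = envelope.get("operation", "unknown")
    raise RuntimeError(f"{operation} failed: {envelope.get('error')}")


# Sample envelopes; the fields inside "result" and "error" are illustrative only.
success = '{"ok": true, "operation": "extract", "result": {"data": {"markdown": "# Hi"}}}'
failure = '{"ok": false, "operation": "extract", "error": {"message": "bad url"}}'

print(unwrap(success)["data"]["markdown"])
```

Checking ok before touching result avoids a KeyError on failure envelopes, which carry error instead of result.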
Commands
gscraper extract URL [--format markdown|html] [--render-js] [--async] \
[--proxy-country US] [--proxy-type residential] \
[--header "K: V"] [--output out.md]
gscraper jobs get JOB_ID
gscraper jobs list [--status completed] [--url ...] [--page N]
gscraper jobs wait JOB_ID [--timeout S] [--interval S]
gscraper batch create URL [URL ...] [--format markdown]
gscraper batch status JOB_ID
gscraper batch wait JOB_ID [--timeout S] [--interval S]
gscraper batch list [--status ...]
gscraper batch cancel JOB_ID
gscraper crawl create URL [--depth 2] [--limit 50] [--include-subdomains]
gscraper crawl status JOB_ID
gscraper crawl wait JOB_ID
gscraper crawl list [--url ...]
gscraper crawl cancel JOB_ID
gscraper map run URL [--search term] [--no-subdomains] # primary action
gscraper map jobs list # inspect past map jobs
gscraper map jobs get JOB_ID
gscraper stats [--start-date ISO] [--end-date ISO]
gscraper health
Run gscraper --help or gscraper <command> --help for full details.
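When driving the CLI from another Python program, it can help to build the argument list in one place. The following sketch covers a subset of the extract flags listed above; the helper name build_extract_args is hypothetical, and it only assembles arguments without running anything.

```python
from typing import List, Optional

ALLOWED_FORMATS = ("markdown", "html")


def build_extract_args(
    url: str,
    fmt: str = "markdown",
    render_js: bool = False,
    output: Optional[str] = None,
    as_json: bool = False,
) -> List[str]:
    """Assemble an argv list for `gscraper extract` from documented flags."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}")
    args = ["gscraper", "extract", url, "--format", fmt]
    if render_js:
        args.append("--render-js")
    if output is not None:
        args += ["--output", output]
    if as_json:
        args.append("--json")
    return args
```

The returned list can be passed to subprocess.run. Using a list rather than a shell string avoids quoting problems with URLs that contain characters such as & or ?.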
Exit codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Generic error |
| 2 | Usage / invalid arguments |
| 4 | Authentication / authorization (401, 403) |
| 5 | Not found (404) |
| 6 | Validation error (422) |
| 7 | Network / connection error |
| 8 | Polling timeout (wait commands) |
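For wrapper scripts, the table above can be encoded as a lookup. This is a sketch under assumptions: the names EXIT_MEANINGS, HTTP_TO_EXIT, describe_exit, and is_transient are hypothetical, and treating codes 7 and 8 as worth retrying is a judgment made for the example, not documented behavior.

```python
# Exit codes and meanings copied from the table above.
EXIT_MEANINGS = {
    0: "Success",
    1: "Generic error",
    2: "Usage / invalid arguments",
    4: "Authentication / authorization (401, 403)",
    5: "Not found (404)",
    6: "Validation error (422)",
    7: "Network / connection error",
    8: "Polling timeout (wait commands)",
}

# HTTP statuses named in the table, mapped to their exit codes.
HTTP_TO_EXIT = {401: 4, 403: 4, 404: 5, 422: 6}


def describe_exit(code: int) -> str:
    """Return the documented meaning, or a placeholder for unlisted codes."""
    return EXIT_MEANINGS.get(code, f"Unknown exit code {code}")


def is_transient(code: int) -> bool:
    """Assumption: network errors and polling timeouts may succeed on retry."""
    return code in (7, 8)
```

A caller would typically pass subprocess.run(...).returncode to describe_exit for logging, and retry only when is_transient returns True.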
Shell completion
gscraper --install-completion