Command-line interface for the Geonode Scraper API

Geonode Scraper CLI

gscraper is the command-line interface for the Geonode Scraper API. It is a thin presentation layer over geonode-scraper-tools-core: commands parse flags, resolve configuration, call a stable service method, and render the result. All domain logic (validation, polling, retries) lives in the service layer, not in the CLI.

Requirements

  • Python 3.10+
  • Works on Linux, macOS, and Windows

Installation

Recommended — install as a standalone tool with pipx:

pipx install geonode-scraper-cli

pipx installs gscraper into its own isolated virtual environment and puts it on your PATH, so it never conflicts with other Python projects. This is the preferred way to install CLI tools globally.

Alternative — install with pip:

pip install geonode-scraper-cli

Windows note: on Windows the gscraper command is placed in the Python Scripts folder (e.g. %APPDATA%\Python\Python3xx\Scripts). If the command is not found after installation, add that folder to your PATH, or use python -m geonode_scraper_cli as a fallback. pipx handles this automatically and is the simpler choice on Windows.

Configuration

Configuration is resolved with the following precedence (highest first):

  1. Command-line flags (--api-key, --host, ...)
  2. Environment variables (GEONODE_SCRAPER_API_KEY, GEONODE_SCRAPER_HOST, GEONODE_SCRAPER_VERIFY_SSL, GEONODE_SCRAPER_TIMEOUT, GEONODE_SCRAPER_PROFILE)
  3. A TOML config file at ~/.config/geonode-scraper/config.toml
  4. Built-in defaults
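The precedence order above amounts to a first-match lookup across four layers. The Python sketch below illustrates the idea only and is not gscraper's actual resolver: the function name resolve_setting, the layer dictionaries, and the default values are assumptions made for the example. Only the environment variable names come from the list above.

```python
import os

# Hypothetical built-in defaults; the real defaults are not documented here.
DEFAULTS = {"host": "https://api.example.com", "verify_ssl": True, "timeout": 30}

# Environment variable names from the list above, keyed by setting name.
ENV_VARS = {
    "api_key": "GEONODE_SCRAPER_API_KEY",
    "host": "GEONODE_SCRAPER_HOST",
    "verify_ssl": "GEONODE_SCRAPER_VERIFY_SSL",
    "timeout": "GEONODE_SCRAPER_TIMEOUT",
}


def resolve_setting(name, flags, file_profile, environ=None):
    """Return the first value found: flag, env var, config file, default.

    Values from the environment are returned as raw strings; a real
    resolver would also need to coerce types (e.g. "false" -> False).
    """
    environ = os.environ if environ is None else environ
    # 1. Command-line flags win when they were actually supplied.
    if flags.get(name) is not None:
        return flags[name]
    # 2. Environment variables come next.
    env_name = ENV_VARS.get(name)
    if env_name and env_name in environ:
        return environ[env_name]
    # 3. Then the selected profile from the TOML config file.
    if name in file_profile:
        return file_profile[name]
    # 4. Finally the built-in default (None if there is none).
    return DEFAULTS.get(name)
```

With a flag, an environment variable, and a config file all setting host, the flag value is returned; drop the flag and the environment variable takes over.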

Prefer environment variables or the config file for your API key — passing --api-key on the command line can leak it into your shell history.

Example ~/.config/geonode-scraper/config.toml:

[default]
host = "https://api.example.com"
api_key = "your-api-key"
verify_ssl = true

[staging]
host = "https://staging.example.com"
api_key = "your-staging-key"

Select a non-default profile with --profile staging or GEONODE_SCRAPER_PROFILE=staging. Inspect the active configuration with:

gscraper config path     # print the config file location
gscraper config show     # show profiles (API keys masked)

Output

Commands print a human-readable summary by default. Use --json or --yaml to print the raw result envelope for scripting. Either flag can be given before the subcommand (global position) or after it (per-command position):

gscraper extract https://example.com --json | jq -r .result.data.markdown
gscraper --json extract https://example.com | jq -r .result.data.markdown

The JSON/YAML envelope has the shape { "ok": bool, "operation": str, "result": {...} } on success, or { "ok": false, "operation": str, "error": {...} } on failure.
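For scripts that consume the --json output from Python rather than jq, the envelope can be unwrapped with a small helper. This is a minimal sketch based only on the envelope shape described above; the names unwrap and ScraperError are hypothetical and are not part of gscraper or its core library.

```python
import json


class ScraperError(Exception):
    """Raised when an envelope reports "ok": false (hypothetical helper)."""

    def __init__(self, operation, error):
        super().__init__(f"{operation} failed: {error}")
        self.operation = operation
        self.error = error


def unwrap(envelope_text):
    """Parse a --json envelope and return its "result", or raise on failure."""
    envelope = json.loads(envelope_text)
    if envelope.get("ok"):
        return envelope["result"]
    # Failure envelopes carry "error" instead of "result".
    raise ScraperError(envelope.get("operation"), envelope.get("error"))
```

A caller could feed it the captured standard output of gscraper extract https://example.com --json and then read result["data"]["markdown"], mirroring the jq path shown above.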

Commands

gscraper extract URL [--format markdown|html] [--render-js] [--async] \
                     [--proxy-country US] [--proxy-type residential] \
                     [--header "K: V"] [--output out.md]

gscraper jobs get JOB_ID
gscraper jobs list [--status completed] [--url ...] [--page N]
gscraper jobs wait JOB_ID [--timeout S] [--interval S]

gscraper batch create URL [URL ...] [--format markdown]
gscraper batch status JOB_ID
gscraper batch wait JOB_ID [--timeout S] [--interval S]
gscraper batch list [--status ...]
gscraper batch cancel JOB_ID

gscraper crawl create URL [--depth 2] [--limit 50] [--include-subdomains]
gscraper crawl status JOB_ID
gscraper crawl wait JOB_ID
gscraper crawl list [--url ...]
gscraper crawl cancel JOB_ID

gscraper map run URL [--search term] [--no-subdomains]   # primary action
gscraper map jobs list                                   # inspect past map jobs
gscraper map jobs get JOB_ID

gscraper stats [--start-date ISO] [--end-date ISO]
gscraper health

Run gscraper --help or gscraper <command> --help for full details.
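The wait subcommands above take --timeout and --interval. A polling loop with those two knobs might look like the sketch below. It illustrates the general technique and is not the service layer's implementation: the function wait_for_job, the fetch_status callable, and the status names other than completed are assumptions.

```python
import time


class PollTimeout(Exception):
    """Raised when the job does not finish before the timeout elapses."""


def wait_for_job(fetch_status, timeout=60.0, interval=2.0,
                 terminal=("completed", "failed"),
                 clock=time.monotonic, sleep=time.sleep):
    """Call fetch_status() every `interval` seconds until a terminal status.

    `fetch_status` is a hypothetical callable returning the job's current
    status string. `clock` and `sleep` are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        # Give up if the next poll would land past the deadline.
        if clock() + interval > deadline:
            raise PollTimeout(f"job still {status!r} after {timeout}s")
        sleep(interval)
```

Using a monotonic clock keeps the deadline stable if the system time changes during the wait. A CLI built on such a loop would translate the timeout exception into exit code 8.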

Exit codes

Code  Meaning
0     Success
1     Generic error
2     Usage / invalid arguments
4     Authentication / authorization (401, 403)
5     Not found (404)
6     Validation error (422)
7     Network / connection error
8     Polling timeout (wait commands)
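A wrapper script can branch on these exit codes instead of parsing error text. The sketch below maps the codes listed above to their meanings. The helper names are hypothetical, and treating codes 7 and 8 as retryable is an assumption of this example, not documented gscraper behavior.

```python
import subprocess

# Exit codes and meanings as listed above.
EXIT_MEANINGS = {
    0: "Success",
    1: "Generic error",
    2: "Usage / invalid arguments",
    4: "Authentication / authorization (401, 403)",
    5: "Not found (404)",
    6: "Validation error (422)",
    7: "Network / connection error",
    8: "Polling timeout (wait commands)",
}

# Assumption: network errors and polling timeouts are transient and
# worth retrying; the other failures need a change on the caller's side.
RETRYABLE = frozenset({7, 8})


def describe_exit(code):
    """Return the documented meaning of an exit code."""
    return EXIT_MEANINGS.get(code, f"Unknown exit code {code}")


def should_retry(code):
    """Return True if the failure looks transient (see assumption above)."""
    return code in RETRYABLE


def run_gscraper(args):
    """Run gscraper with the given arguments; return (exit code, stdout)."""
    proc = subprocess.run(["gscraper", *args], capture_output=True, text=True)
    return proc.returncode, proc.stdout
```

For example, a caller could run run_gscraper(["health"]) and log describe_exit(code) whenever the code is nonzero.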

Shell completion

gscraper --install-completion
