Universal satellite data download pipeline with unified access to 20+ repositories

PyGeoFetch 🛰️

Universal satellite data pipeline: unified access to 22+ satellite repositories with one CLI or Python API.


Why PyGeoFetch?

| Feature | PyGeoFetch | EODAG | pystac-client | satpy | sentinelsat |
|---|---|---|---|---|---|
| Providers | 22+ | 10+ | STAC only | Limited | Sentinel only |
| CLI | ✅ Full | ❌ | ❌ | ❌ | ✅ Basic |
| Pipeline Orchestration | ✅ YAML | ❌ | ❌ | ❌ | ❌ |
| Auth Management | ✅ Keyring | Partial | ❌ | ❌ | ✅ |
| Parallel Downloads | ✅ Adaptive | ✅ | ❌ | ❌ | ❌ |
| STAC Output | ✅ Native | ❌ | ✅ | ❌ | ❌ |
| GeoParquet | ✅ | ❌ | ❌ | ❌ | ❌ |
| Docker | ✅ | ❌ | ❌ | ❌ | ❌ |
| Scheduler | ✅ Cron | ❌ | ❌ | ❌ | ❌ |
| Webhook Notifications | ✅ | ❌ | ❌ | ❌ | ❌ |
| Commercial Providers | ✅ Planet/Maxar | ❌ | ❌ | ❌ | ❌ |

Installation

pip install PyGeoFetch

# With raster processing support
pip install "PyGeoFetch[geo]"

# Full installation (all optional deps)
pip install "PyGeoFetch[all]"

Requirements: Python 3.9+


Supported Providers

| Provider | ID | Auth | Satellites | SAR | <1m | STAC |
|---|---|---|---|---|---|---|
| USGS Earth Explorer | usgs | 🔐 User/Pass | Landsat 1-9, ASTER, MODIS | ❌ | ❌ | ❌ |
| Copernicus CDSE | copernicus | 🔐 OAuth2 | Sentinel-1/2/3/5P | ✅ | ❌ | ✅ |
| NASA Earthdata CMR | nasa_earthdata | 🔐 OAuth2 | MODIS, VIIRS, ICESat-2, GEDI | ❌ | ❌ | ✅ |
| NASA Earthdata Cloud | nasa_earthdata_cloud | 🔐 OAuth2+S3 | Cloud-hosted NASA data | ❌ | ❌ | ✅ |
| OpenTopography | opentopography | 🔐 API Key | SRTM, Copernicus DEM, LiDAR | ❌ | ❌ | ❌ |
| Planet Labs | planet | 🔐 API Key | PlanetScope, SkySat, RapidEye | ❌ | ✅ | ✅ |
| Sentinel Hub | sentinel_hub | 🔐 OAuth2 | Sentinel-1/2/3, Landsat, MODIS | ✅ | ❌ | ❌ |
| Maxar GBDX | maxar_gbdx | 🔐 Token | WorldView 1-4, GeoEye-1 | ❌ | ✅ | ❌ |
| Airbus OneAtlas | airbus_oneatlas | 🔐 API Key | Pléiades, SPOT 6/7 | ❌ | ✅ | ✅ |
| Alaska Satellite Facility | alaska_satellite_facility | 🔐 Earthdata | Sentinel-1, ALOS PALSAR | ✅ | ❌ | ❌ |
| NOAA Big Data | noaa_big_data | 🌐 None | GOES-16/17/18, NEXRAD | ❌ | ❌ | ❌ |
| Google Earth Engine | google_earth_engine | 🔐 Service Acct | Global multi-petabyte catalog | ✅ | ❌ | ❌ |
| TerraBotics | terrabotics | 🔐 API Key | Archive + Tasking | ❌ | ✅ | ❌ |
| AWS Earth Open Data | aws_earth | 🌐 None | Sentinel-2, Landsat, NAIP | ❌ | ❌ | ✅ |
| Microsoft Planetary Computer | planetary_computer | 🌐 None | Sentinel-1/2, Landsat, MODIS, NAIP | ✅ | ❌ | ✅ |
| Element 84 Earth Search | element84 | 🌐 None | Sentinel-2 COGs, Landsat Col 2 | ❌ | ❌ | ✅ |
| ESA SciHub Mirror | esa_scihub | 🌐 None | Copernicus public mirrors | ✅ | ❌ | ❌ |
| JAXA ALOS World | jaxa_earth | 🌐 None | ALOS World 3D DSM, PALSAR | ✅ | ❌ | ❌ |
| ISRO Bhuvan | isro_bhuvan | 🌐 None | ResourceSat, Cartosat, Oceansat | ❌ | ❌ | ❌ |
| INPE CBERS | inpe_cbers | 🌐 None | CBERS-4/4A | ❌ | ❌ | ❌ |
| DigitalGlobe Open Data | digitalglobe | 🌐 None | Disaster response imagery | ❌ | ✅ | ❌ |
| GeoServer Generic | geoserver_generic | 🌐 Configurable | Any OGC service | ❌ | ❌ | ❌ |

🔐 = Authentication Required | 🌐 = Open Access (No Login)


Quick Start (5 minutes)

1. Add credentials

PyGeoFetch auth add usgs --username USER --password PASS
PyGeoFetch auth add planet --api-key YOUR_KEY
PyGeoFetch auth login copernicus  # interactive

2. Search

PyGeoFetch search run \
    --bbox "-74.1,40.6,-73.7,40.9" \
    --start-date 2024-01-01 \
    --cloud-cover 0-15 \
    --providers planetary_computer \
    --output results.geojson

PyGeoFetch search demo

3. Download

PyGeoFetch download run \
    --from-search results.geojson \
    --output ./my_data/ \
    --parallel 4 \
    --verify-checksum \
    --post-process "unzip,reproject:EPSG:4326,compress:lzw"

PyGeoFetch download demo


Python API

from pathlib import Path
from PyGeoFetch import PyGeoFetch
from PyGeoFetch.models import SearchQuery, DownloadOptions

sb = PyGeoFetch()
sb.add_credentials("usgs", username="user", password="pass")
sb.add_credentials("planet", api_key="PL_KEY")

results = sb.search(
    SearchQuery(
        bbox=(-74.1, 40.6, -73.7, 40.9),
        start_date="2024-01-01",
        end_date="2024-06-01",
        cloud_cover_max=20,
    ),
    providers=["usgs", "copernicus", "planetary_computer", "aws_earth"],
)

print(f"Found {len(results)} scenes")

download_results = sb.download(
    results[:5],
    destination=Path("./data/"),
    options=DownloadOptions(
        parallel=4,
        verify_checksum=True,
        resume=True,
        post_process=[],
    ),
)

for dr in download_results:
    if dr.success:
        # True division keeps the fractional megabytes that // would discard
        size_mb = dr.bytes_downloaded / 1024 / 1024
        print(f"  ✓ {dr.data_id} ({size_mb:.1f} MB)")
    else:
        print(f"  ✗ {dr.data_id}: {dr.error}")

CLI Reference

Global Options

PyGeoFetch [--log-level LEVEL] [--log-file FILE] [--log-format console|json]
                [--config FILE] [--version] [--help] COMMAND [ARGS]

Authentication

PyGeoFetch auth add PROVIDER --username USER --password PASS
PyGeoFetch auth add PROVIDER --api-key KEY
PyGeoFetch auth add PROVIDER --client-id ID --client-secret SECRET
PyGeoFetch auth login PROVIDER          # interactive
PyGeoFetch auth list [--json]
PyGeoFetch auth test PROVIDER
PyGeoFetch auth remove PROVIDER [--yes]
PyGeoFetch auth export [--output FILE]

Providers

PyGeoFetch providers list [--auth|--no-auth] [--capabilities sar,optical,stac] [--region global] [--satellite Landsat] [--json]
PyGeoFetch providers info PROVIDER [--json]
PyGeoFetch providers search "landsat" [--json]

Search

PyGeoFetch search run
    --bbox "minx,miny,maxx,maxy"       # Bounding box
    --geometry-file area.geojson        # AOI from GeoJSON file
    --start-date YYYY-MM-DD
    --end-date YYYY-MM-DD
    --cloud-cover MIN-MAX               # e.g. 0-20
    --resolution MIN-MAX                # metres e.g. 10-30
    --processing-level LEVEL            # e.g. L2A
    --providers P1,P2,...
    --satellites S1,S2,...
    --max-results N
    --sort-by datetime|cloud_cover|score|satellite
    --sort-order asc|desc
    --cql2 "EXPRESSION"                 # CQL2 filter expression
    --output FILE
    --format table|json|stac|geojson|geoparquet|csv|ids
    --on-provider-failure skip|abort|retry
    --timeout SECONDS
    --no-cache
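
The --bbox and --cloud-cover values above are plain strings, so it can help to validate them before building a command. The following is a minimal sketch of such validation. The helper names parse_bbox and parse_range are hypothetical and are not part of PyGeoFetch; the sketch assumes bounding boxes are given in longitude/latitude degrees.

```python
from typing import Tuple


def parse_bbox(text: str) -> Tuple[float, float, float, float]:
    """Parse 'minx,miny,maxx,maxy' into floats (hypothetical helper)."""
    parts = [float(p) for p in text.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated numbers, got {len(parts)}")
    minx, miny, maxx, maxy = parts
    if not (-180 <= minx <= 180 and -180 <= maxx <= 180):
        raise ValueError("longitude must be within [-180, 180]")
    if not (-90 <= miny <= maxy <= 90):
        raise ValueError("latitude must satisfy -90 <= miny <= maxy <= 90")
    # minx > maxx is left alone: it can describe a box crossing the antimeridian.
    return (minx, miny, maxx, maxy)


def parse_range(text: str) -> Tuple[float, float]:
    """Parse a non-negative 'MIN-MAX' range such as '0-15' (hypothetical helper)."""
    low_text, sep, high_text = text.partition("-")
    if not sep:
        raise ValueError("expected MIN-MAX, e.g. 0-20")
    low, high = float(low_text), float(high_text)
    if low > high:
        raise ValueError("MIN must not exceed MAX")
    return (low, high)
```

For example, parse_bbox("-74.1,40.6,-73.7,40.9") yields the same tuple that the Python API example passes as bbox.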

Download

PyGeoFetch download run
    --from-search FILE                  # GeoJSON from search run
    --scene-ids ID1,ID2,...             # Direct scene IDs
    --output DIRECTORY
    --parallel N                        # Concurrent downloads
    --retry N                           # Retry attempts
    --verify-checksum                   # SHA256 verification
    --resume                            # Resume interrupted downloads
    --bandwidth-limit MB                # e.g. 10MB, 500KB
    --priority high|normal|low
    --notify webhook:URL                # Slack/Teams webhook
    --notify email:ADDRESS
    --post-process "ACTION1,ACTION2"    # Processing chain
    --on-failure skip|abort|retry
    --max-items N
    --overwrite
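
Size values such as 10MB and 500KB appear in --bandwidth-limit above. As a rough illustration of how such strings can be turned into byte counts, here is a small sketch. The function parse_size is hypothetical, and the choice of binary units (1 KB = 1024 bytes) is an assumption; PyGeoFetch may interpret units differently.

```python
import re

# Assumption: binary multiples. Swap in powers of 1000 if decimal units are intended.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a string like '10MB' or '500KB' to a byte count (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMG]?B)\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit.upper()])
```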

Cache

PyGeoFetch cache stats [--json]
PyGeoFetch cache clear [--provider PROVIDER] [--older-than 7d] [--dry-run]
PyGeoFetch cache ttl show
PyGeoFetch cache ttl set SECONDS
PyGeoFetch cache location
PyGeoFetch cache prune --max-size 1GB

Pipeline

PyGeoFetch pipeline run FILE [--step STEP_NAME]
PyGeoFetch pipeline validate FILE
PyGeoFetch pipeline schedule FILE [--name NAME] [--cron "0 6 * * 1"]
PyGeoFetch pipeline list-scheduled [--json]
PyGeoFetch pipeline unschedule NAME
PyGeoFetch pipeline logs NAME [--tail 50] [--follow]
PyGeoFetch pipeline history [--limit 20]
PyGeoFetch pipeline retry RUN_ID

Configuration

PyGeoFetch config show [--json]
PyGeoFetch config get KEY
PyGeoFetch config set KEY VALUE
PyGeoFetch config path
PyGeoFetch config reset

System

PyGeoFetch status [--json]
PyGeoFetch version [--json]
PyGeoFetch doctor                  # Diagnose installation and connectivity
PyGeoFetch --install-completion bash|zsh|fish

Output Formats

| Format | Flag | Description |
|---|---|---|
| Table | --format table | Pretty-printed terminal table (default) |
| JSON | --format json | Full JSON response array |
| STAC | --format stac | STAC ItemCollection (a GeoJSON FeatureCollection) |
| GeoJSON | --format geojson | GeoJSON FeatureCollection |
| GeoParquet | --format geoparquet | GeoParquet file (requires geopandas) |
| CSV | --format csv | CSV with id, provider, satellite, datetime, cloud_cover, score, bbox |
| IDs | --format ids | Scene IDs only, one per line |

Post-Processing Actions

Chain post-processing actions on downloaded data:

PyGeoFetch download run \
    --from-search results.geojson --output ./data/ \
    --post-process "unzip,reproject:EPSG:4326,compress:lzw,ndvi,cog"

| Action | Syntax | Description |
|---|---|---|
| unzip | unzip | Extract downloaded ZIP/TAR archives |
| reproject | reproject:EPSG:4326 | Reproject to target CRS |
| compress | compress:lzw | Apply compression (lzw, deflate, zstd) |
| ndvi | ndvi | Calculate NDVI from multispectral bands |
| ndwi | ndwi | Calculate NDWI water index |
| composite | composite | Create temporal composite |
| atmospheric | atmospheric:sen2cor | Atmospheric correction |
| clip | clip:file.geojson | Clip to geometry |
| resample | resample:30 | Resample to target resolution (metres) |
| cog | cog | Convert to Cloud Optimized GeoTIFF |
| merge | merge | Merge overlapping scenes |
| pan-sharpen | pan-sharpen | Pan-sharpen multispectral with panchromatic |

Multiple actions execute in order: "unzip,reproject:EPSG:4326,compress:lzw,cog"
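
To make the chain syntax concrete, the sketch below splits a --post-process string into ordered (action, argument) pairs. It is an illustration only: parse_post_process is a hypothetical name, not a PyGeoFetch function. The one subtlety it shows is that an argument such as EPSG:4326 contains a colon itself, so each item is split only at its first colon.

```python
from typing import List, Optional, Tuple


def parse_post_process(chain: str) -> List[Tuple[str, Optional[str]]]:
    """Split 'a,b:arg,c' into ordered (action, argument) pairs (hypothetical helper)."""
    steps: List[Tuple[str, Optional[str]]] = []
    for item in chain.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        # partition() splits at the first colon only, so 'EPSG:4326' stays whole.
        name, sep, arg = item.partition(":")
        steps.append((name, arg if sep else None))
    return steps


steps = parse_post_process("unzip,reproject:EPSG:4326,compress:lzw,cog")
# steps == [("unzip", None), ("reproject", "EPSG:4326"), ("compress", "lzw"), ("cog", None)]
```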


Pipeline Orchestration

Define recurring workflows in YAML:

# weekly-sentinel2.yaml
name: weekly-sentinel2-ndvi
schedule: "0 6 * * 1"  # Every Monday at 06:00 UTC
description: Weekly Sentinel-2 acquisition for NDVI monitoring

steps:
  - search:
      providers: [copernicus, aws_earth, planetary_computer]
      date_range: last_7_days
      cloud_cover: 0-10
      bbox: "-74.1,40.6,-73.7,40.9"
      max_results: 20

  - filter:
      expression: "data.cloud_cover < 5"

  - download:
      parallel: 4
      output: ./raw/
      verify_checksum: true

  - export:
      format: cloud_optimized_geotiff
      destination: s3://my-bucket/ndvi/

# One-shot execution
PyGeoFetch pipeline run weekly-sentinel2.yaml

# Schedule for recurring execution
PyGeoFetch pipeline schedule weekly-sentinel2.yaml

# Validate without running
PyGeoFetch pipeline validate weekly-sentinel2.yaml
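
The pipeline above uses date_range: last_7_days rather than fixed dates. How PyGeoFetch resolves that value is not documented here; the sketch below shows one plausible interpretation, assuming the window ends on the current day. The function resolve_date_range is hypothetical.

```python
import re
from datetime import date, timedelta
from typing import Optional, Tuple


def resolve_date_range(spec: str, today: Optional[date] = None) -> Tuple[str, str]:
    """Turn 'last_N_days' into (start, end) ISO dates (hypothetical helper)."""
    match = re.fullmatch(r"last_(\d+)_days", spec)
    if match is None:
        raise ValueError(f"unsupported date_range: {spec!r}")
    end = today if today is not None else date.today()
    start = end - timedelta(days=int(match.group(1)))
    return (start.isoformat(), end.isoformat())
```

Passing today explicitly keeps the function deterministic, which is convenient when testing a scheduled pipeline.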

Configuration

PyGeoFetch uses a hierarchical config system. Sources are listed from lowest to highest precedence, so later entries override earlier ones:

  1. Built-in defaults
  2. User config (~/.PyGeoFetch/config.yaml)
  3. Project config (.PyGeoFetch.yaml)
  4. Environment variables (SATELLITE_BRIDGE_*)
  5. CLI arguments
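
A layered configuration like this is commonly resolved by deep-merging dictionaries in order. The sketch below illustrates the idea under the assumption that later layers in the list above override earlier ones, key by key. deep_merge and resolve_config are hypothetical names, not PyGeoFetch internals.

```python
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with override applied, recursing into nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(*layers: Dict[str, Any]) -> Dict[str, Any]:
    """Merge layers from lowest to highest precedence."""
    result: Dict[str, Any] = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


defaults = {"download": {"parallel": 4, "resume": True}}
user_config = {"download": {"parallel": 8}}
cli_args = {"download": {"parallel": 2}}
config = resolve_config(defaults, user_config, cli_args)
# config == {"download": {"parallel": 2, "resume": True}}
```

Merging per key rather than replacing whole sections means a project file can change one setting without restating the rest of its section.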

Full Configuration Reference

# ~/.PyGeoFetch/config.yaml

download:
  parallel: 4                        # Default concurrent downloads
  retry_attempts: 5                  # Max retries per file
  retry_delay_seconds: 1.0           # Initial delay (doubles each retry)
  retry_max_delay_seconds: 60.0      # Maximum delay cap
  retry_jitter: true                 # Add randomness to delays
  verify_checksum: false             # SHA256 verification
  checksum_algorithm: sha256         # md5, sha256, sha512
  chunk_size_mb: 10                  # Download chunk size
  resume: true                       # Auto-resume interrupted downloads
  bandwidth_limit_mbps: null         # Throttle (null = unlimited)
  overwrite_existing: false
  notify_on_completion: null         # webhook URL or email
  notify_on_failure: null
  on_failure: skip                   # skip, abort, retry

cache:
  enabled: true
  ttl_seconds: 3600                  # 1 hour default
  max_size_gb: 10                    # Auto-prune when exceeded
  location: ~/.PyGeoFetch/cache

search:
  default_providers: []
  max_results: 100
  timeout_seconds: 60
  on_provider_failure: skip          # skip, abort, retry
  sort_by: datetime
  sort_order: desc

auth:
  storage_backend: keyring           # keyring or file
  keyring_service: PyGeoFetch
  file_path: ~/.PyGeoFetch/credentials.enc

proxy:
  http_proxy: null
  https_proxy: null
  no_proxy: []

logging:
  level: INFO
  format: console                    # console, json
  file: null
  max_file_size_mb: 10
  backup_count: 3

providers:
  usgs:
    endpoint: https://m2m.cr.usgs.gov/api/api/json/stable/
    timeout: 60
  copernicus:
    endpoint: https://catalogue.dataspace.copernicus.eu/resto/api/
    timeout: 45
  planet:
    endpoint: https://api.planet.com/data/v1/
    rate_limit: 100
  planetary_computer:
    endpoint: https://planetarycomputer.microsoft.com/api/stac/v1
    timeout: 60
  element84:
    endpoint: https://earth-search.aws.element84.com/v1
    timeout: 60

Environment Variables

export SATELLITE_BRIDGE_LOG_LEVEL=DEBUG
export SATELLITE_BRIDGE_DOWNLOAD__PARALLEL=8
export SATELLITE_BRIDGE_CACHE__TTL_SECONDS=7200
export SATELLITE_BRIDGE_USGS_USERNAME=myuser
export SATELLITE_BRIDGE_USGS_PASSWORD=mypass
export SATELLITE_BRIDGE_PLANET_API_KEY=PL_KEY
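
The double underscore in names such as SATELLITE_BRIDGE_DOWNLOAD__PARALLEL suggests a nesting delimiter, mapping to download.parallel in the config file. The sketch below shows how such a mapping could work. It is an assumption about the convention, and env_to_config is a hypothetical helper; values are left as strings because type coercion rules are not documented here.

```python
from typing import Any, Dict, Mapping


def env_to_config(environ: Mapping[str, str], prefix: str = "SATELLITE_BRIDGE_") -> Dict[str, Any]:
    """Build a nested dict from prefixed variables, splitting keys on '__'."""
    config: Dict[str, Any] = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue  # ignore unrelated variables such as PATH
        path = name[len(prefix):].lower().split("__")
        node = config
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return config


example = env_to_config({
    "SATELLITE_BRIDGE_DOWNLOAD__PARALLEL": "8",
    "SATELLITE_BRIDGE_CACHE__TTL_SECONDS": "7200",
    "SATELLITE_BRIDGE_LOG_LEVEL": "DEBUG",
    "PATH": "/usr/bin",
})
# example == {"download": {"parallel": "8"}, "cache": {"ttl_seconds": "7200"}, "log_level": "DEBUG"}
```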

Error Handling & Resilience

PyGeoFetch handles failures at every layer:

Provider Failures

  • Circuit breaker: A failing provider is disabled after 5 consecutive failures
  • Automatic recovery: Disabled providers are retried after a 60-second cooldown
  • Partial results: If one provider fails, results from the others are still returned
  • On-failure policy: skip, abort, or retry, set per search and per download

# Skip failing providers, return what's available
PyGeoFetch search run --providers copernicus,usgs,planet --on-provider-failure skip

# Abort entirely if any provider fails
PyGeoFetch search run --providers copernicus,usgs --on-provider-failure abort
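
The circuit-breaker behaviour described above (disable after 5 consecutive failures, retry after a 60-second cooldown) can be sketched in a few lines. This is a generic illustration of the pattern, not PyGeoFetch's implementation; the CircuitBreaker class and its method names are hypothetical.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Blocks calls to a provider after repeated failures, then allows a retry."""

    def __init__(
        self,
        failure_threshold: int = 5,
        cooldown_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock  # injectable so tests do not have to sleep
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        """True if the provider may be called right now."""
        if self._opened_at is None:
            return True
        return self._clock() - self._opened_at >= self.cooldown_seconds

    def record_success(self) -> None:
        """A success resets the consecutive-failure count and closes the breaker."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        """Count a failure; at the threshold, (re)start the cooldown."""
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

A caller would check allow_request() before each provider call and report the outcome with record_success() or record_failure(). A failed trial after the cooldown restarts the cooldown.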

Download Resilience

  • Exponential backoff: Retries with 1s, 2s, 4s, 8s, 16s + jitter
  • Resume support: Interrupted downloads resume from last byte received
  • Checksum verification: SHA256 verified post-download, auto-retry on mismatch
  • Atomic writes: Files are written to .tmp and then renamed, so no partial files are left behind
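
The retry schedule above follows from the download settings retry_attempts, retry_delay_seconds, and retry_max_delay_seconds. A minimal sketch of that arithmetic is shown below. The function backoff_delays is hypothetical, and the jitter shape (up to 10% added on top of each delay) is an assumption, since the source only says that jitter is applied.

```python
import random
from typing import List, Optional


def backoff_delays(
    attempts: int = 5,
    initial: float = 1.0,
    max_delay: float = 60.0,
    jitter: bool = True,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the wait before each retry: doubling delays, capped at max_delay."""
    if rng is None:
        rng = random.Random()
    delays = []
    for attempt in range(attempts):
        delay = min(initial * (2 ** attempt), max_delay)
        if jitter:
            # Assumed jitter: add up to 10% so clients do not retry in lockstep.
            delay = min(delay + rng.uniform(0, 0.1 * delay), max_delay)
        delays.append(delay)
    return delays


base_schedule = backoff_delays(jitter=False)
# base_schedule == [1.0, 2.0, 4.0, 8.0, 16.0]
```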

Search Caching

  • Results cached per query (configurable TTL, default 1 hour)
  • Cache hits return instantly without API calls
  • Cache entries expire automatically after the TTL; run PyGeoFetch cache clear to purge manually
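
Per-query caching with a TTL can be illustrated with an in-memory sketch. SearchCache below is hypothetical and much simpler than an on-disk cache; it assumes a query can be represented as a JSON-serialisable dict, and it hashes a canonical form so that key order does not matter.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


class SearchCache:
    """In-memory result cache keyed by a hash of the query, with expiry."""

    def __init__(self, ttl_seconds: float = 3600.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def make_key(query: Dict[str, Any]) -> str:
        # sort_keys gives the same key regardless of dict ordering
        canonical = json.dumps(query, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, query: Dict[str, Any]) -> Optional[Any]:
        key = self.make_key(query)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, results = entry
        if self._clock() - stored_at >= self.ttl_seconds:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return results

    def put(self, query: Dict[str, Any], results: Any) -> None:
        self._entries[self.make_key(query)] = (self._clock(), results)
```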

Logging

PyGeoFetch --log-level DEBUG search run ...
PyGeoFetch --log-file PyGeoFetch.log search run ...
PyGeoFetch --log-format json search run ...

Security

Credential Handling

  • Credentials are never logged: log filters redact passwords, tokens, and API keys
  • Authentication tokens stored in system keyring (macOS Keychain, Windows Credential Manager, Linux Secret Service)
  • Encrypted file fallback at ~/.PyGeoFetch/credentials.enc using Fernet symmetric encryption
  • Environment variables supported: SATELLITE_BRIDGE_USGS_USERNAME, SATELLITE_BRIDGE_PLANET_API_KEY, etc.
  • Credentials cleared from memory immediately after authentication
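
Log redaction of the kind mentioned in the first bullet is typically done with a logging.Filter from the standard library. The sketch below shows the general approach. RedactingFilter, redact, and the regular expression are illustrative assumptions; they only cover key=value and key: value forms and are not PyGeoFetch's actual filter.

```python
import logging
import re

# Assumed set of sensitive key names; extend as needed.
_SECRET_PATTERN = re.compile(
    r"\b(password|token|api[_-]?key|client[_-]?secret)\b(\s*[=:]\s*)(\S+)",
    re.IGNORECASE,
)


def redact(text: str) -> str:
    """Replace the value after a sensitive key with ***."""
    return _SECRET_PATTERN.sub(r"\1\2***", text)


class RedactingFilter(logging.Filter):
    """Rewrites each record so secrets never reach any handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first so secrets passed as %-style args are caught too.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True  # keep the record, now redacted
```

Attaching the filter with logger.addFilter(RedactingFilter()) applies it to records created through that logger before they are passed to handlers.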

Network Security

  • TLS 1.2+ enforced on all connections
  • SSL certificate verification is always on: no verify=False anywhere in the codebase
  • Certificate pinning available for enterprise deployments
  • Proxy support respecting HTTP_PROXY/HTTPS_PROXY environment variables

Data Integrity

  • SHA256 checksum verification on downloads when verify_checksum is enabled (algorithm configurable: MD5, SHA256, SHA512)
  • Atomic file writes: partial downloads never corrupt existing data
  • Download resume tokens prevent data duplication
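
The combination of checksum verification and atomic writes can be sketched with the standard library alone. The function write_atomically below is hypothetical and illustrates the general technique: stream to a temporary file in the destination directory, verify the SHA256 digest, and only then rename over the target. It does not cover resume support.

```python
import contextlib
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def write_atomically(
    destination: Path,
    chunks: Iterable[bytes],
    expected_sha256: Optional[str] = None,
) -> str:
    """Write chunks to destination via a temp file; return the SHA256 hex digest."""
    destination = Path(destination)
    digest = hashlib.sha256()
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(destination.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                digest.update(chunk)
                tmp.write(chunk)
        actual = digest.hexdigest()
        if expected_sha256 is not None and actual != expected_sha256.lower():
            raise ValueError(f"checksum mismatch: expected {expected_sha256}, got {actual}")
        os.replace(tmp_name, destination)  # atomic replacement of the target
    except BaseException:
        # Remove the temp file so a failed download leaves nothing behind.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
    return actual
```

If the digest does not match, the existing file at destination is left untouched and the temporary file is removed.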

Privacy

  • No telemetry: PyGeoFetch does not phone home
  • No analytics: zero data collection
  • No third-party requests beyond configured providers
  • Usage data stays local unless you configure webhook notifications

Reporting Vulnerabilities

Do not open public issues for security vulnerabilities.

  • Email: security@PyGeoFetch.dev
  • Response time: Within 48 hours
  • Responsible disclosure policy: 90-day disclosure window

Docker

Quick Start

# Docker image references must be lowercase
docker pull pygeofetch/pygeofetch:latest

docker run -v ~/.PyGeoFetch:/root/.PyGeoFetch \
    -v "$(pwd)/data:/data" \
    pygeofetch/pygeofetch search run \
    --bbox "-74.1,40.6,-73.7,40.9" \
    --providers aws_earth \
    --output /data/results.geojson

Docker Compose for Scheduled Pipelines

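version: '3.8'
services:
  PyGeoFetch-scheduler:
    image: pygeofetch/pygeofetch:latest   # image references must be lowercase
    volumes:
      - ~/.PyGeoFetch:/root/.PyGeoFetch
      - ./pipelines:/pipelines
      - ./data:/data
    # The docker run examples pass subcommands directly, so the image entrypoint
    # is assumed to be the PyGeoFetch CLI and only its arguments are given here.
    command: pipeline run /pipelines/weekly-ndvi.yaml
    restart: unless-stopped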

Build Locally

docker build -t pygeofetch:local .
docker run pygeofetch:local status

Available on Docker Hub and GitHub Container Registry.


Testing

# Run all tests
pytest tests/ -v

# With coverage
pytest tests/ -v --cov=PyGeoFetch --cov-report=html

# Unit tests only (fast)
pytest tests/unit/ -v

# Integration tests (requires credentials)
pytest tests/integration/ -v --run-integration

Testing Strategy

  • Unit tests: Every provider, utility, and model has dedicated unit tests
  • VCR recordings: HTTP interactions recorded with pytest-vcr for deterministic replay
  • Mock servers: Provider APIs simulated with responses and httpx mocks
  • Integration tests: Optional real-API tests flagged with --run-integration
  • Property-based testing: Edge cases generated with hypothesis
  • CLI tests: Full workflow tests with Click's CliRunner

Coverage minimum: 80% line coverage. CI fails if coverage drops below this threshold.


Roadmap

v0.2.0 (Q4 2024)

  • BlackSky provider
  • SI Imaging Services (KOMPSAT) provider
  • Interactive search mode (--interactive)
  • Webhook integrations: Slack, Discord, Teams built-in templates
  • Streaming COG partial reads (no full download needed)

v0.3.0 (Q1 2025)

  • Web dashboard for pipeline monitoring (PyGeoFetch dashboard)
  • REST API server mode (PyGeoFetch serve)
  • Automatic provider health monitoring
  • Download bandwidth scheduling (limit during business hours)

v1.0.0 (Q2 2025)

  • PyGeoFetch Cloud (hosted API, zero setup)
  • Enterprise SSO (Okta, Azure AD)
  • Team workspaces for credential sharing
  • SLA-backed production support tier

Vote on features at github.com/PyGeoFetch/PyGeoFetch/discussions


Contributing

We welcome contributions! See CONTRIBUTING.md for full guidelines.

Good first issues: turning stub providers into full API integrations, improving test coverage, adding new post-processing actions.

git clone https://github.com/appiahkubis14/PyGeoFetch
cd PyGeoFetch
pip install -e ".[dev,all]"
make test

License

MIT. See LICENSE.
