Universal satellite data download pipeline with unified access to 20+ repositories
PyGeoFetch 🛰️
Universal satellite data pipeline: unified access to 22+ satellite repositories through one CLI or Python API.
Why PyGeoFetch?
| Feature | PyGeoFetch | EODAG | pystac-client | satpy | sentinelsat |
|---|---|---|---|---|---|
| Providers | 22+ | 10+ | STAC only | Limited | Sentinel only |
| CLI | ✅ Full | ? | ? | ? | ✅ Basic |
| Pipeline Orchestration | ✅ YAML | ? | ? | ? | ? |
| Auth Management | ✅ Keyring | Partial | ? | ? | ? |
| Parallel Downloads | ✅ Adaptive | ? | ? | ? | ? |
| STAC Output | ✅ Native | ? | ? | ? | ? |
| GeoParquet | ✅ | ? | ? | ? | ? |
| Docker | ✅ | ? | ? | ? | ? |
| Scheduler | ✅ Cron | ? | ? | ? | ? |
| Webhook Notifications | ✅ | ? | ? | ? | ? |
| Commercial Providers | ✅ Planet/Maxar | ? | ? | ? | ? |
Installation
```bash
pip install PyGeoFetch

# With raster processing support
pip install "PyGeoFetch[geo]"

# Full installation (all optional deps)
pip install "PyGeoFetch[all]"
```
Requirements: Python 3.9+
Supported Providers
| Provider | ID | Auth | Satellites | SAR | <1m | STAC |
|---|---|---|---|---|---|---|
| USGS Earth Explorer | `usgs` | 🔒 User/Pass | Landsat 1-9, ASTER, MODIS | ? | ? | ? |
| Copernicus CDSE | `copernicus` | 🔒 OAuth2 | Sentinel-1/2/3/5P | ? | ? | ? |
| NASA Earthdata CMR | `nasa_earthdata` | 🔒 OAuth2 | MODIS, VIIRS, ICESat-2, GEDI | ? | ? | ? |
| NASA Earthdata Cloud | `nasa_earthdata_cloud` | 🔒 OAuth2+S3 | Cloud-hosted NASA data | ? | ? | ? |
| OpenTopography | `opentopography` | 🔒 API Key | SRTM, Copernicus DEM, LiDAR | ? | ? | ? |
| Planet Labs | `planet` | 🔒 API Key | PlanetScope, SkySat, RapidEye | ? | ? | ? |
| Sentinel Hub | `sentinel_hub` | 🔒 OAuth2 | Sentinel-1/2/3, Landsat, MODIS | ? | ? | ? |
| Maxar GBDX | `maxar_gbdx` | 🔒 Token | WorldView 1-4, GeoEye-1 | ? | ? | ? |
| Airbus OneAtlas | `airbus_oneatlas` | 🔒 API Key | Pléiades, SPOT 6/7 | ? | ? | ? |
| Alaska Satellite Facility | `alaska_satellite_facility` | 🔒 Earthdata | Sentinel-1, ALOS PALSAR | ? | ? | ? |
| NOAA Big Data | `noaa_big_data` | 🔓 None | GOES-16/17/18, NEXRAD | ? | ? | ? |
| Google Earth Engine | `google_earth_engine` | 🔒 Service Acct | Global multi-petabyte catalog | ? | ? | ? |
| TerraBotics | `terrabotics` | 🔒 API Key | Archive + Tasking | ? | ? | ? |
| AWS Earth Open Data | `aws_earth` | 🔓 None | Sentinel-2, Landsat, NAIP | ? | ? | ? |
| Microsoft Planetary Computer | `planetary_computer` | 🔓 None | Sentinel-1/2, Landsat, MODIS, NAIP | ? | ? | ? |
| Element 84 Earth Search | `element84` | 🔓 None | Sentinel-2 COGs, Landsat Col 2 | ? | ? | ? |
| ESA SciHub Mirror | `esa_scihub` | 🔓 None | Copernicus public mirrors | ? | ? | ? |
| JAXA ALOS World | `jaxa_earth` | 🔓 None | ALOS World 3D DSM, PALSAR | ? | ? | ? |
| ISRO Bhuvan | `isro_bhuvan` | 🔓 None | ResourceSat, Cartosat, Oceansat | ? | ? | ? |
| INPE CBERS | `inpe_cbers` | 🔓 None | CBERS-4/4A | ? | ? | ? |
| DigitalGlobe Open Data | `digitalglobe` | 🔓 None | Disaster response imagery | ? | ? | ? |
| GeoServer Generic | `geoserver_generic` | Configurable | Any OGC service | ? | ? | ? |

🔒 = Authentication required | 🔓 = Open access (no login)
Quick Start (5 minutes)
1. Add credentials
```bash
PyGeoFetch auth add usgs --username USER --password PASS
PyGeoFetch auth add planet --api-key YOUR_KEY
PyGeoFetch auth login copernicus  # interactive
```
2. Search
```bash
PyGeoFetch search run \
  --bbox "-74.1,40.6,-73.7,40.9" \
  --start-date 2024-01-01 \
  --cloud-cover 0-15 \
  --providers planetary_computer \
  --output results.geojson
```
3. Download
```bash
PyGeoFetch download run \
  --from-search results.geojson \
  --output ./my_data/ \
  --parallel 4 \
  --verify-checksum \
  --post-process "unzip,reproject:EPSG:4326,compress:lzw"
```
Python API
```python
from pathlib import Path

from PyGeoFetch import PyGeoFetch
from PyGeoFetch.models import SearchQuery, DownloadOptions

sb = PyGeoFetch()
sb.add_credentials("usgs", username="user", password="pass")
sb.add_credentials("planet", api_key="PL_KEY")

# Query several providers at once; bbox is (minx, miny, maxx, maxy)
results = sb.search(
    SearchQuery(
        bbox=(-74.1, 40.6, -73.7, 40.9),
        start_date="2024-01-01",
        end_date="2024-06-01",
        cloud_cover_max=20,
    ),
    providers=["usgs", "copernicus", "planetary_computer", "aws_earth"],
)
print(f"Found {len(results)} scenes")

# Download the first five scenes into ./data/
download_results = sb.download(
    results[:5],
    destination=Path("./data/"),
    options=DownloadOptions(
        parallel=4,
        verify_checksum=True,
        resume=True,
        post_process=[],
    ),
)

for dr in download_results:
    if dr.success:
        # True division keeps the fractional part for the :.1f format
        print(f"  ✓ {dr.data_id} ({dr.bytes_downloaded / 1024 / 1024:.1f} MB)")
    else:
        print(f"  ✗ {dr.data_id}: {dr.error}")
```
CLI Reference
Global Options
```text
PyGeoFetch [--log-level LEVEL] [--log-file FILE] [--log-format console|json]
           [--config FILE] [--version] [--help] COMMAND [ARGS]
```
Authentication
```text
PyGeoFetch auth add PROVIDER --username USER --password PASS
PyGeoFetch auth add PROVIDER --api-key KEY
PyGeoFetch auth add PROVIDER --client-id ID --client-secret SECRET
PyGeoFetch auth login PROVIDER  # interactive
PyGeoFetch auth list [--json]
PyGeoFetch auth test PROVIDER
PyGeoFetch auth remove PROVIDER [--yes]
PyGeoFetch auth export [--output FILE]
```
Providers
```text
PyGeoFetch providers list [--auth|--no-auth] [--capabilities sar,optical,stac] [--region global] [--satellite Landsat] [--json]
PyGeoFetch providers info PROVIDER [--json]
PyGeoFetch providers search "landsat" [--json]
```
Search
```text
PyGeoFetch search run
  --bbox "minx,miny,maxx,maxy"        # Bounding box
  --geometry-file area.geojson        # AOI from GeoJSON file
  --start-date YYYY-MM-DD
  --end-date YYYY-MM-DD
  --cloud-cover MIN-MAX               # e.g. 0-20
  --resolution MIN-MAX                # metres, e.g. 10-30
  --processing-level LEVEL            # e.g. L2A
  --providers P1,P2,...
  --satellites S1,S2,...
  --max-results N
  --sort-by datetime|cloud_cover|score|satellite
  --sort-order asc|desc
  --cql2 "EXPRESSION"                 # CQL2 filter expression
  --output FILE
  --format table|json|stac|geojson|geoparquet|csv|ids
  --on-provider-failure skip|abort|retry
  --timeout SECONDS
  --no-cache
```
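Both `--bbox` and the `MIN-MAX` options (`--cloud-cover`, `--resolution`) arrive as plain strings. A minimal sketch of parsing and validating them before a query is sent, assuming comma-separated numbers in `minx,miny,maxx,maxy` order and a single hyphen between the two bounds. `parse_bbox` and `parse_range` are hypothetical helpers, not PyGeoFetch's API, and boxes crossing the antimeridian (where `minx > maxx`) are rejected to keep the check simple.

```python
from typing import Tuple

# Hypothetical helpers for illustration; not part of PyGeoFetch.


def parse_bbox(text: str) -> Tuple[float, float, float, float]:
    """Parse 'minx,miny,maxx,maxy' into a validated tuple of floats."""
    parts = [float(p) for p in text.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated numbers, got {len(parts)}")
    minx, miny, maxx, maxy = parts
    # Simple check: antimeridian-crossing boxes (minx > maxx) are not supported here.
    if not (minx < maxx and miny < maxy):
        raise ValueError("bbox must satisfy minx < maxx and miny < maxy")
    return minx, miny, maxx, maxy


def parse_range(text: str) -> Tuple[float, float]:
    """Parse 'MIN-MAX' (e.g. '0-20') into a (min, max) tuple."""
    low, sep, high = text.partition("-")
    if not sep:
        raise ValueError("expected MIN-MAX")
    lo, hi = float(low), float(high)
    if lo > hi:
        raise ValueError("MIN must not exceed MAX")
    return lo, hi
```

The tuple returned by `parse_bbox` has the same shape as the `bbox=` argument used in the Python API example.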
Download
```text
PyGeoFetch download run
  --from-search FILE                  # GeoJSON from search run
  --scene-ids ID1,ID2,...             # Direct scene IDs
  --output DIRECTORY
  --parallel N                        # Concurrent downloads
  --retry N                           # Retry attempts
  --verify-checksum                   # SHA256 verification
  --resume                            # Resume interrupted downloads
  --bandwidth-limit RATE              # e.g. 10MB, 500KB
  --priority high|normal|low
  --notify webhook:URL                # Slack/Teams webhook
  --notify email:ADDRESS
  --post-process "ACTION1,ACTION2"    # Processing chain
  --on-failure skip|abort|retry
  --max-items N
  --overwrite
```
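Size arguments such as `--bandwidth-limit 10MB` or `cache prune --max-size 1GB` have to be converted to byte counts at some point. A sketch of one way to do it, assuming binary multiples (1 KB = 1024 bytes); PyGeoFetch may use decimal units instead, and `parse_size` is a hypothetical helper.

```python
import re

# Hypothetical helper for illustration; binary multiples are an assumption.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}
_SIZE_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([KMG]?B)\s*")


def parse_size(text: str) -> int:
    """Convert '10MB', '500KB' or '1GB' (case-insensitive) into a byte count."""
    match = _SIZE_RE.fullmatch(text.upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])
```

A throttled downloader could compare the bytes written per second against `parse_size("10MB")` and sleep whenever it gets ahead of that budget.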
Cache
```text
PyGeoFetch cache stats [--json]
PyGeoFetch cache clear [--provider PROVIDER] [--older-than 7d] [--dry-run]
PyGeoFetch cache ttl show
PyGeoFetch cache ttl set SECONDS
PyGeoFetch cache location
PyGeoFetch cache prune --max-size 1GB
```
Pipeline
```text
PyGeoFetch pipeline run FILE [--step STEP_NAME]
PyGeoFetch pipeline validate FILE
PyGeoFetch pipeline schedule FILE [--name NAME] [--cron "0 6 * * 1"]
PyGeoFetch pipeline list-scheduled [--json]
PyGeoFetch pipeline unschedule NAME
PyGeoFetch pipeline logs NAME [--tail 50] [--follow]
PyGeoFetch pipeline history [--limit 20]
PyGeoFetch pipeline retry RUN_ID
```
Configuration
```text
PyGeoFetch config show [--json]
PyGeoFetch config get KEY
PyGeoFetch config set KEY VALUE
PyGeoFetch config path
PyGeoFetch config reset
```
System
```text
PyGeoFetch status [--json]
PyGeoFetch version [--json]
PyGeoFetch doctor  # Diagnose installation and connectivity
PyGeoFetch --install-completion bash|zsh|fish
```
Output Formats
| Format | Flag | Description |
|---|---|---|
| Table | `--format table` | Pretty-printed terminal table (default) |
| JSON | `--format json` | Full JSON response array |
| STAC | `--format stac` | STAC ItemCollection (a GeoJSON FeatureCollection of STAC Items) |
| GeoJSON | `--format geojson` | GeoJSON FeatureCollection |
| GeoParquet | `--format geoparquet` | GeoParquet file (requires geopandas) |
| CSV | `--format csv` | CSV with id, provider, satellite, datetime, cloud_cover, score, bbox |
| IDs | `--format ids` | Scene IDs only, one per line |
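The CSV format writes one scene per row using the columns listed in the table. A sketch of producing that layout with Python's standard `csv` module, assuming each search result is available as a plain dictionary; `results_to_csv` and the space-separated encoding of `bbox` are illustrative choices, not PyGeoFetch's documented output.

```python
import csv
import io
from typing import Any, Iterable, Mapping

# Hypothetical helper for illustration; column order follows the output-format table.
CSV_COLUMNS = ["id", "provider", "satellite", "datetime", "cloud_cover", "score", "bbox"]


def results_to_csv(results: Iterable[Mapping[str, Any]]) -> str:
    """Render search results as CSV text with a fixed header row."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys outside CSV_COLUMNS; missing keys become "".
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for item in results:
        row = dict(item)
        if row.get("bbox") is not None:
            # Space-separated so the four numbers stay in one unquoted field
            row["bbox"] = " ".join(str(v) for v in row["bbox"])
        writer.writerow(row)
    return buffer.getvalue()
```

Joining the bbox with spaces avoids embedded commas; letting the `csv` module quote a comma-joined value would work just as well.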
Post-Processing Actions
Chain post-processing actions on downloaded data:
```bash
PyGeoFetch download run \
  --from-search results.geojson --output ./data/ \
  --post-process "unzip,reproject:EPSG:4326,compress:lzw,ndvi,cog"
```
| Action | Syntax | Description |
|---|---|---|
| unzip | `unzip` | Extract downloaded ZIP/TAR archives |
| reproject | `reproject:EPSG:4326` | Reproject to target CRS |
| compress | `compress:lzw` | Apply compression (lzw, deflate, zstd) |
| ndvi | `ndvi` | Calculate NDVI from multispectral bands |
| ndwi | `ndwi` | Calculate NDWI water index |
| composite | `composite` | Create temporal composite |
| atmospheric | `atmospheric:sen2cor` | Atmospheric correction |
| clip | `clip:file.geojson` | Clip to geometry |
| resample | `resample:30` | Resample to target resolution (metres) |
| cog | `cog` | Convert to Cloud Optimized GeoTIFF |
| merge | `merge` | Merge overlapping scenes |
| pan-sharpen | `pan-sharpen` | Pan-sharpen multispectral with panchromatic |
Multiple actions execute in order: "unzip,reproject:EPSG:4326,compress:lzw,cog"
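Because arguments such as `EPSG:4326` contain colons themselves, a chain string has to be split on commas first and then on the first colon of each action only. A sketch of that parsing under those assumptions; `parse_post_process` and `KNOWN_ACTIONS` (the action names from the table above) are hypothetical names, not PyGeoFetch internals.

```python
from typing import List, Optional, Tuple

# Hypothetical names for illustration; action list copied from the table above.
KNOWN_ACTIONS = {
    "unzip", "reproject", "compress", "ndvi", "ndwi", "composite",
    "atmospheric", "clip", "resample", "cog", "merge", "pan-sharpen",
}


def parse_post_process(spec: str) -> List[Tuple[str, Optional[str]]]:
    """Split 'a,b:arg,...' into ordered (action, argument) pairs."""
    steps: List[Tuple[str, Optional[str]]] = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate trailing or doubled commas
        # partition() splits on the first ':' only, so 'reproject:EPSG:4326'
        # becomes ('reproject', 'EPSG:4326').
        name, sep, arg = token.partition(":")
        if name not in KNOWN_ACTIONS:
            raise ValueError(f"unknown post-processing action: {name!r}")
        steps.append((name, arg if sep else None))
    return steps
```

The returned list preserves order, which matches the rule that actions run in the sequence given.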
Pipeline Orchestration
Define recurring workflows in YAML:
```yaml
# weekly-sentinel2.yaml
name: weekly-sentinel2-ndvi
schedule: "0 6 * * 1"  # Every Monday at 06:00 UTC
description: Weekly Sentinel-2 acquisition for NDVI monitoring
steps:
  - search:
      providers: [copernicus, aws_earth, planetary_computer]
      date_range: last_7_days
      cloud_cover: 0-10
      bbox: "-74.1,40.6,-73.7,40.9"
      max_results: 20
  - filter:
      expression: "data.cloud_cover < 5"
  - download:
      parallel: 4
      output: ./raw/
      verify_checksum: true
  - export:
      format: cloud_optimized_geotiff
      destination: s3://my-bucket/ndvi/
```
```bash
# One-shot execution
PyGeoFetch pipeline run weekly-sentinel2.yaml

# Schedule for recurring execution
PyGeoFetch pipeline schedule weekly-sentinel2.yaml

# Validate without running
PyGeoFetch pipeline validate weekly-sentinel2.yaml
```
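The `date_range: last_7_days` key in the pipeline file is relative to the day a run starts. A sketch of how such a value might be resolved into concrete dates, assuming a `last_<N>_days` pattern and a window that ends on the run date; `resolve_date_range` is hypothetical, and passing `today` explicitly keeps scheduled runs reproducible in tests.

```python
import re
from datetime import date, timedelta
from typing import Optional, Tuple

# Hypothetical helper for illustration; the window semantics are an assumption.


def resolve_date_range(spec: str, today: Optional[date] = None) -> Tuple[date, date]:
    """Turn a relative spec such as 'last_7_days' into a (start, end) pair of dates."""
    end = today or date.today()
    match = re.fullmatch(r"last_(\d+)_days", spec)
    if match is None:
        raise ValueError(f"unsupported date_range: {spec!r}")
    days = int(match.group(1))
    return end - timedelta(days=days), end
```

With the Monday 06:00 schedule above, each run would then cover the previous Monday up to the run date.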
Configuration
PyGeoFetch uses a hierarchical config system:
- Built-in defaults
- User config (`~/.PyGeoFetch/config.yaml`)
- Project config (`.PyGeoFetch.yaml`)
- Environment variables (`SATELLITE_BRIDGE_*`)
- CLI arguments
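One way to implement a layered configuration like this is a recursive dictionary merge in which each later layer overrides matching keys of earlier ones while leaving unrelated keys untouched. The sketch assumes that precedence (built-in defaults lowest, CLI arguments highest) and uses hypothetical helper names; it is not PyGeoFetch's config loader.

```python
from typing import Any, Dict, Iterable

# Hypothetical helpers for illustration; precedence order is an assumption.


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where keys in `override` win, merging nested dicts key by key."""
    merged = dict(base)  # shallow copy: `base` itself is never modified
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(layers: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold layers given from lowest to highest precedence."""
    result: Dict[str, Any] = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result
```

Merging key by key means a user config that only sets `download.parallel` keeps every other `download` default.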
Full Configuration Reference
```yaml
# ~/.PyGeoFetch/config.yaml
download:
  parallel: 4                    # Default concurrent downloads
  retry_attempts: 5              # Max retries per file
  retry_delay_seconds: 1.0       # Initial delay (doubles each retry)
  retry_max_delay_seconds: 60.0  # Maximum delay cap
  retry_jitter: true             # Add randomness to delays
  verify_checksum: false         # SHA256 verification
  checksum_algorithm: sha256     # md5, sha256, sha512
  chunk_size_mb: 10              # Download chunk size
  resume: true                   # Auto-resume interrupted downloads
  bandwidth_limit_mbps: null     # Throttle (null = unlimited)
  overwrite_existing: false
  notify_on_completion: null     # Webhook URL or email
  notify_on_failure: null
  on_failure: skip               # skip, abort, retry
cache:
  enabled: true
  ttl_seconds: 3600              # 1 hour default
  max_size_gb: 10                # Auto-prune when exceeded
  location: ~/.PyGeoFetch/cache
search:
  default_providers: []
  max_results: 100
  timeout_seconds: 60
  on_provider_failure: skip      # skip, abort, retry
  sort_by: datetime
  sort_order: desc
auth:
  storage_backend: keyring       # keyring or file
  keyring_service: PyGeoFetch
  file_path: ~/.PyGeoFetch/credentials.enc
proxy:
  http_proxy: null
  https_proxy: null
  no_proxy: []
logging:
  level: INFO
  format: console                # console, json
  file: null
  max_file_size_mb: 10
  backup_count: 3
providers:
  usgs:
    endpoint: https://m2m.cr.usgs.gov/api/api/json/stable/
    timeout: 60
  copernicus:
    endpoint: https://catalogue.dataspace.copernicus.eu/resto/api/
    timeout: 45
  planet:
    endpoint: https://api.planet.com/data/v1/
    rate_limit: 100
  planetary_computer:
    endpoint: https://planetarycomputer.microsoft.com/api/stac/v1
    timeout: 60
  element84:
    endpoint: https://earth-search.aws.element84.com/v1
    timeout: 60
```
Environment Variables
```bash
export SATELLITE_BRIDGE_LOG_LEVEL=DEBUG
export SATELLITE_BRIDGE_DOWNLOAD__PARALLEL=8
export SATELLITE_BRIDGE_CACHE__TTL_SECONDS=7200
export SATELLITE_BRIDGE_USGS_USERNAME=myuser
export SATELLITE_BRIDGE_USGS_PASSWORD=mypass
export SATELLITE_BRIDGE_PLANET_API_KEY=PL_KEY
```
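The double underscore in `SATELLITE_BRIDGE_DOWNLOAD__PARALLEL` suggests it separates nesting levels, so that variable would map to `download.parallel` in the config file. A sketch of that mapping, assuming `__` is the only nesting separator and leaving values as strings; `config_from_env` is a hypothetical name, and single-underscore names such as `SATELLITE_BRIDGE_LOG_LEVEL` or the credential variables are presumably handled specially by PyGeoFetch (in this sketch they would simply become top-level keys).

```python
import os
from typing import Any, Dict, Mapping, Optional

# Hypothetical helper for illustration; the "__" nesting rule is inferred, not documented.
PREFIX = "SATELLITE_BRIDGE_"


def config_from_env(environ: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build a nested config dict from SATELLITE_BRIDGE_* variables."""
    source = os.environ if environ is None else environ
    config: Dict[str, Any] = {}
    for name, raw in source.items():
        if not name.startswith(PREFIX):
            continue  # unrelated variable
        # DOWNLOAD__PARALLEL -> ["download", "parallel"]
        path = name[len(PREFIX):].lower().split("__")
        node = config
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = raw  # values stay strings; type coercion is left out
    return config
```

The resulting dictionary has the same shape as a config-file layer, so it could be merged on top of the file-based settings.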
Error Handling & Resilience
PyGeoFetch handles failures at every layer:
Provider Failures
- Circuit breaker: Failing providers disabled after 5 consecutive failures
- Automatic recovery: Providers retried after 60-second cooldown
- Partial results: If one provider fails, results from others still returned
- On-failure policy: `skip`, `abort`, or `retry` per search and download
```bash
# Skip failing providers, return what's available
PyGeoFetch search run --providers copernicus,usgs,planet --on-provider-failure skip

# Abort entirely if any provider fails
PyGeoFetch search run --providers copernicus,usgs --on-provider-failure abort
```
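The circuit-breaker behaviour described above (disable a provider after five consecutive failures, try it again after a 60-second cooldown) can be sketched as a small per-provider state machine. This illustration uses only those two parameters; the `CircuitBreaker` class and its methods are hypothetical, and the injectable `clock` exists so the cooldown can be tested without sleeping.

```python
import time
from typing import Callable, Dict

# Hypothetical class for illustration; not PyGeoFetch's implementation.


class CircuitBreaker:
    """Per-provider breaker: opens after `threshold` consecutive failures and
    lets trial requests through again once `cooldown` seconds have passed."""

    def __init__(self, threshold: int = 5, cooldown: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures: Dict[str, int] = {}     # consecutive failures per provider
        self.opened_at: Dict[str, float] = {}  # when each open breaker tripped

    def allow(self, provider: str) -> bool:
        """True if a request to `provider` may be attempted now."""
        opened = self.opened_at.get(provider)
        if opened is None:
            return True  # closed: normal operation
        # Open until the cooldown elapses, then half-open so a trial can go through.
        return self.clock() - opened >= self.cooldown

    def record_success(self, provider: str) -> None:
        self.failures[provider] = 0
        self.opened_at.pop(provider, None)  # close the breaker

    def record_failure(self, provider: str) -> None:
        count = self.failures.get(provider, 0) + 1
        self.failures[provider] = count
        if count >= self.threshold:
            # Trip (or re-trip after a failed trial) and restart the cooldown.
            self.opened_at[provider] = self.clock()
```

A search loop would call `allow()` before querying each provider and record the outcome afterwards; with the `skip` policy, a provider whose breaker is open is simply left out and the other providers' results are still returned.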
Download Resilience
- Exponential backoff: Retries with 1s, 2s, 4s, 8s, 16s + jitter
- Resume support: Interrupted downloads resume from last byte received
- Checksum verification: SHA256 verified post-download, auto-retry on mismatch
- Atomic writes: Files are written to a `.tmp` file and then renamed, so no partial files are left behind
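The retry delays listed above follow from the `download` defaults in the configuration reference: `retry_delay_seconds: 1.0`, doubling on each retry, capped by `retry_max_delay_seconds: 60.0`, with `retry_jitter: true`. A sketch that computes that schedule; `backoff_delays` is hypothetical, and the 10% jitter bound is an assumption because the README does not say how much randomness is added.

```python
import random
from typing import List, Optional

# Hypothetical helper for illustration; the jitter fraction is an assumption.


def backoff_delays(attempts: int = 5, base: float = 1.0, cap: float = 60.0,
                   jitter: bool = True,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Delays (seconds) before each retry: base * 2**n, plus optional jitter, capped."""
    rng = rng or random.Random()
    delays: List[float] = []
    for attempt in range(attempts):
        delay = base * (2 ** attempt)  # 1, 2, 4, 8, 16, ...
        if jitter:
            # Up to 10% extra so many clients do not retry in lockstep.
            delay += rng.uniform(0, 0.1 * delay)
        delays.append(min(cap, delay))  # never wait longer than the cap
    return delays
```

A download loop would `time.sleep()` for each delay in turn between failed attempts.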
Search Caching
- Results cached per query (configurable TTL, default 1 hour)
- Cache hits are served locally without making any API calls
- Entries expire automatically after the TTL; use `PyGeoFetch cache clear` for a manual purge
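A query-keyed TTL cache needs three pieces: a deterministic serialisation of the query, a hash of it as the key, and a timestamp stored with each result. This in-memory sketch illustrates the idea only (PyGeoFetch keeps its cache on disk under `~/.PyGeoFetch/cache`); `SearchCache` is a hypothetical class, and its 3600-second default mirrors `cache.ttl_seconds`.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical class for illustration; not PyGeoFetch's cache.


class SearchCache:
    """In-memory TTL cache keyed by a hash of the normalised query."""

    def __init__(self, ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def key_for(query: Dict[str, Any]) -> str:
        # sort_keys=True makes logically equal queries serialise identically.
        canonical = json.dumps(query, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, query: Dict[str, Any]) -> Optional[Any]:
        key = self.key_for(query)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value

    def put(self, query: Dict[str, Any], value: Any) -> None:
        self._entries[self.key_for(query)] = (self.clock(), value)

    def clear(self) -> None:
        self._entries.clear()
```

Hashing the canonical JSON rather than the raw CLI string means `--bbox` given before or after `--cloud-cover` still hits the same entry.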
Logging
```bash
PyGeoFetch --log-level DEBUG search run ...
PyGeoFetch --log-file PyGeoFetch.log search run ...
PyGeoFetch --log-format json search run ...
```
Security
Credential Handling
- Credentials are never logged: log filters redact passwords, tokens, and API keys
- Authentication tokens are stored in the system keyring (macOS Keychain, Windows Credential Manager, Linux Secret Service)
- Encrypted file fallback at `~/.PyGeoFetch/credentials.enc` using Fernet symmetric encryption
- Environment variables supported: `SATELLITE_BRIDGE_USGS_USERNAME`, `SATELLITE_BRIDGE_PLANET_API_KEY`, etc.
- Credentials cleared from memory immediately after authentication
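Redaction of this kind is commonly done with a `logging.Filter` that rewrites each record before it is formatted. A minimal standard-library sketch, assuming secrets appear as `key=value` or `key: value` pairs; `RedactSecretsFilter` and its pattern are illustrative, and a production filter would also need to cover headers, URLs and JSON bodies.

```python
import logging
import re

# Hypothetical class for illustration; not PyGeoFetch's actual filter.


class RedactSecretsFilter(logging.Filter):
    """Mask values that follow secret-looking keys before a record is emitted."""

    # Matches e.g. 'password=hunter2', 'api_key: PL_KEY', 'access_token=abc'
    PATTERN = re.compile(
        r"(?i)(password|passwd|token|api[_-]?key|client[_-]?secret)(\s*[=:]\s*)([^\s,&]+)"
    )

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the final message first so secrets passed as %-args are caught too.
        message = record.getMessage()
        record.msg = self.PATTERN.sub(r"\1\2***", message)
        record.args = None  # message is already fully formatted
        return True  # never drop the record, only rewrite it
```

Attach it with `handler.addFilter(RedactSecretsFilter())`. Filters on a handler see every record that handler emits, whereas a filter on a logger only applies to records logged directly through that logger, not to records propagated from its children.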
Network Security
- TLS 1.2+ enforced on all connections
- SSL certificate verification always on: there is no `verify=False` anywhere in the codebase
- Certificate pinning available for enterprise deployments
- Proxy support respecting the `HTTP_PROXY`/`HTTPS_PROXY` environment variables
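With Python's standard library, enforcing TLS 1.2 or newer while keeping certificate and hostname verification on takes only a few lines of `ssl` configuration. This sketch shows the general technique; `make_tls_context` is a hypothetical name, not PyGeoFetch's own HTTP client setup.

```python
import ssl

# Hypothetical helper for illustration; not PyGeoFetch's client configuration.


def make_tls_context() -> ssl.SSLContext:
    """Client-side context: TLS 1.2+, certificate and hostname checks enabled."""
    # create_default_context() loads the system CA store and sets
    # verify_mode=CERT_REQUIRED and check_hostname=True.
    context = ssl.create_default_context()
    # Refuse to negotiate anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The context can be passed to clients that accept one, for example `urllib.request.urlopen(url, context=make_tls_context())`.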
Data Integrity
- SHA256 checksum verification on all downloads (configurable: MD5, SHA256, SHA512)
- Atomic file writes: partial downloads never corrupt existing data
- Download resume tokens prevent data duplication
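Atomic writes and checksum verification combine naturally: stream the data into a temporary file in the destination directory, hash it on the way through, and only `os.replace()` it into place if the digest matches. A standard-library sketch of that pattern; `write_atomically` is a hypothetical helper, and the expected SHA256 digest is assumed to come from the provider's metadata.

```python
import contextlib
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Iterable

# Hypothetical helper for illustration; not PyGeoFetch's downloader.


def write_atomically(chunks: Iterable[bytes], destination: Path, expected_sha256: str) -> Path:
    """Write `chunks` to `destination` only if their SHA256 matches `expected_sha256`."""
    destination = Path(destination)
    digest = hashlib.sha256()
    # Temp file next to the target so os.replace() stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=destination.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                digest.update(chunk)
                tmp.write(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())  # bytes reach the disk before the rename
        actual = digest.hexdigest()
        if actual != expected_sha256.lower():
            raise ValueError(f"checksum mismatch: expected {expected_sha256}, got {actual}")
        # Atomic rename: readers see either the old file or the complete new one.
        os.replace(tmp_name, destination)
    except BaseException:
        # Remove the partial temp file on any failure, then re-raise.
        with contextlib.suppress(FileNotFoundError):
            os.remove(tmp_name)
        raise
    return destination
```

On a mismatch the destination is left exactly as it was, which is what allows an automatic retry to start cleanly.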
Privacy
- No telemetry: PyGeoFetch does not phone home
- No analytics: zero data collection
- No third-party requests beyond configured providers
- Usage data stays local unless you configure webhook notifications
Reporting Vulnerabilities
Do not open public issues for security vulnerabilities.
- Email: security@PyGeoFetch.dev
- Response time: Within 48 hours
- Responsible disclosure policy: 90-day disclosure window
Docker
Quick Start
```bash
docker pull pygeofetch/pygeofetch:latest

# Docker image names must be lowercase
docker run -v ~/.PyGeoFetch:/root/.PyGeoFetch \
  -v "$(pwd)/data:/data" \
  pygeofetch/pygeofetch search run \
  --bbox "-74.1,40.6,-73.7,40.9" \
  --providers aws_earth \
  --output /data/results.geojson
```
Docker Compose for Scheduled Pipelines
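```yaml
version: '3.8'
services:
  PyGeoFetch-scheduler:
    image: pygeofetch/pygeofetch:latest  # image names must be lowercase
    volumes:
      - ~/.PyGeoFetch:/root/.PyGeoFetch
      - ./pipelines:/pipelines
      - ./data:/data
    # `pipeline run` executes once; `unless-stopped` restarts the container
    # every time it exits, so the pipeline re-runs after each exit.
    command: PyGeoFetch pipeline run /pipelines/weekly-ndvi.yaml
    restart: unless-stopped
```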
Build Locally
```bash
docker build -t pygeofetch:local .
docker run pygeofetch:local status
```
Available on Docker Hub and GitHub Container Registry.
Testing
```bash
# Run all tests
pytest tests/ -v

# With coverage
pytest tests/ -v --cov=PyGeoFetch --cov-report=html

# Unit tests only (fast)
pytest tests/unit/ -v

# Integration tests (requires credentials)
pytest tests/integration/ -v --run-integration
```
Testing Strategy
- Unit tests: Every provider, utility, and model has dedicated unit tests
- VCR recordings: HTTP interactions recorded with `pytest-vcr` for deterministic replay
- Mock servers: Provider APIs simulated with `responses` and `httpx` mocks
- Integration tests: Optional real-API tests flagged with `--run-integration`
- Property-based testing: Edge cases generated with `hypothesis`
- CLI tests: Full workflow tests with Click's `CliRunner`

Coverage minimum: 80% line coverage; CI fails if coverage drops below this threshold.
Roadmap
v0.2.0 (Q4 2024)
- BlackSky provider
- SI Imaging Services (KOMPSAT) provider
- Interactive search mode (`--interactive`)
- Webhook integrations: Slack, Discord, Teams built-in templates
- Streaming COG partial reads (no full download needed)
v0.3.0 (Q1 2025)
- Web dashboard for pipeline monitoring (`PyGeoFetch dashboard`)
- REST API server mode (`PyGeoFetch serve`)
- Automatic provider health monitoring
- Download bandwidth scheduling (limit during business hours)
v1.0.0 (Q2 2025)
- PyGeoFetch Cloud (hosted API, zero setup)
- Enterprise SSO (Okta, Azure AD)
- Team workspaces for credential sharing
- SLA-backed production support tier
Vote on features at github.com/PyGeoFetch/PyGeoFetch/discussions
Contributing
We welcome contributions! See CONTRIBUTING.md for full guidelines.
Good first issues: turning stub providers into full API integrations, improving test coverage, and adding new post-processing actions.
```bash
git clone https://github.com/appiahkubis14/PyGeoFetch
cd PyGeoFetch
pip install -e ".[dev,all]"
make test
```
License
MIT. See LICENSE.
Download files
File details
Details for the file pygeofetch-0.1.0.tar.gz.
File metadata
- Download URL: pygeofetch-0.1.0.tar.gz
- Size: 18.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1b29cf6000eebce276cf7125fa33af6fcd70ed6a124c12b74ac7249911dbae52 |
| MD5 | 68ea2810693f955a34a8afd67ea05e1b |
| BLAKE2b-256 | ac7a69d1bc56079dfcab5490497e82337847ccfec8d2108b54bd30b40ae11eb0 |
File details
Details for the file pygeofetch-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pygeofetch-0.1.0-py3-none-any.whl
- Size: 11.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c508aa25934f92d42754afd58b564645d89961e6ecc5609a62554fbaa4d1c495 |
| MD5 | d9a92fd02a7a72b9e9c089001a26d4f6 |
| BLAKE2b-256 | 768e09429310340ff399d444113731e639a2659da55c732d05b5fde3bf913c29 |