
Cloud-native geospatial developer toolkit


EarthForge


License: GPL v3 | Python 3.11+

Working with cloud-native geospatial data means juggling gdalinfo for COGs, stac-client for discovery, geopandas for GeoParquet, xarray for Zarr, and a collection of one-off scripts to glue them together. Each tool has its own CLI conventions, its own output format, and its own assumptions about how you authenticate to cloud storage.

EarthForge is a single composable toolkit that unifies these workflows. One CLI. One config system. One output contract. Every command works on local files as well as on S3, GCS, and Azure storage, and every command produces both human-readable tables and machine-parseable JSON.

# Inspect any cloud-native geospatial file — format auto-detected
earthforge info s3://bucket/image.tif
earthforge info buildings.parquet
earthforge info climate.zarr

# Search STAC catalogs
earthforge stac search sentinel-2-l2a --bbox -85,37,-84,38 --datetime 2025-06/2025-09

# Generate a quicklook preview from a remote COG without downloading it
earthforge raster preview s3://bucket/scene.tif -o preview.png

# Convert legacy formats to cloud-native
earthforge vector convert buildings.shp --to geoparquet
earthforge raster convert image.tif --to cog

# Query GeoParquet with spatial predicate pushdown
earthforge vector query buildings.parquet --bbox -85,37,-84,38

# Inspect and slice Zarr datacubes
earthforge cube info s3://era5-pds/zarr/2025/01/data/air_temperature_at_2_metres.zarr
earthforge cube slice s3://era5-pds/zarr/ --var t2m --bbox -85,37,-84,38 --time 2025-06/2025-06 -o ky_june.zarr

# Pipe structured JSON into other tools
earthforge stac search sentinel-2-l2a --output json | jq '.items[].assets.B04.href'
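The `vector query --bbox` example above relies on spatial predicate pushdown: a reader compares the query box against per-row-group bounding boxes and skips row groups that cannot match, so their bytes are never fetched. Below is a minimal sketch of that pruning step in plain Python. It assumes the per-row-group extents are already known, for example from the min/max statistics of a GeoParquet 1.1 bbox covering column. `RowGroupBBox` and `prune_row_groups` are hypothetical names, not EarthForge's API, and the row-group extents are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowGroupBBox:
    """Hypothetical per-row-group extent, e.g. derived from column statistics."""
    index: int
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(query: tuple[float, float, float, float], g: RowGroupBBox) -> bool:
    """True if the query box (xmin, ymin, xmax, ymax) overlaps the row group's box."""
    qxmin, qymin, qxmax, qymax = query
    return not (g.xmax < qxmin or g.xmin > qxmax or g.ymax < qymin or g.ymin > qymax)


def prune_row_groups(
    query: tuple[float, float, float, float], groups: list[RowGroupBBox]
) -> list[int]:
    """Return indices of row groups that must be read; the rest can be skipped."""
    return [g.index for g in groups if intersects(query, g)]


# Illustrative extents for three row groups across Kentucky.
groups = [
    RowGroupBBox(0, -89.5, 36.5, -87.0, 38.0),  # western KY
    RowGroupBBox(1, -85.5, 36.8, -84.2, 37.6),  # central KY
    RowGroupBBox(2, -83.5, 37.0, -82.0, 38.5),  # eastern KY
]
print(prune_row_groups((-85, 37, -84, 38), groups))  # → [1]
```

Touching edges count as intersecting here. This errs toward reading a row group rather than missing features. An exact geometry test would still be needed on the rows that survive pruning.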

What EarthForge Is

EarthForge is a library-first, CLI-first developer toolkit. Install it as a Python library and call functions directly, or use the CLI from shell scripts and pipelines. Every CLI command is a thin wrapper around a library function, so anything you can do from the terminal you can also do from Python, a Jupyter notebook, or a pipeline runner.

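import asyncio
from earthforge.raster.info import inspect_raster
from earthforge.stac.search import search_catalog

async def main():
    # Library usage: same logic as the CLI, no subprocess needed
    items = await search_catalog("sentinel-2-l2a", bbox=(-85, 37, -84, 38))
    metadata = await inspect_raster("s3://bucket/scene.tif")
    return items, metadata

# Top-level `await` works in Jupyter; a plain script needs an event loop:
items, metadata = asyncio.run(main())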
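The thin-wrapper design and the single output contract fit together. The library function returns a plain data object, and the CLI layer only chooses how to render it. Below is a minimal sketch of that split, assuming a dataclass result. `RasterInfo`, `raster_info`, and `render` are hypothetical stand-ins. The stub returns values copied from the raster info sample in this README rather than reading a file.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RasterInfo:
    """Hypothetical result object returned by a library function."""
    format: str
    width: int
    height: int
    crs: str


def raster_info(path: str) -> RasterInfo:
    """Hypothetical library function (stub); a real one would read the file header."""
    return RasterInfo(format="COG", width=10980, height=10980, crs="EPSG:32618")


def render(result: RasterInfo, output: str = "table") -> str:
    """CLI-side renderer: one result object, two output formats."""
    data = asdict(result)
    if output == "json":
        return json.dumps(data, indent=2)
    # Left-align keys so the values form a readable column.
    width = max(len(key) for key in data)
    return "\n".join(f"{key:<{width}}  {value}" for key, value in data.items())


print(render(raster_info("scene.tif")))
print(render(raster_info("scene.tif"), output="json"))
```

Keeping rendering out of the library function is what lets the same call serve a terminal user, a `jq` pipeline, and a notebook.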

Real-World Output

The samples below are actual outputs from EarthForge commands run against public geospatial data. Sample files live in data/samples/.

KyFromAbove 3-inch Orthoimagery — fetched thumbnail

earthforge stac fetch \
  https://spved5ihrl.execute-api.us-west-2.amazonaws.com/collections/orthos-phase3/items/N097E305_2024_Season1_3IN_cog \
  --assets thumbnail --output-dir data/kyfromabove_fetch
# → 78,026 bytes in 2.34s

KyFromAbove 3-inch orthoimagery thumbnail — rural central Kentucky

3-inch orthoimagery, KyFromAbove Phase 3 (2024). Public domain. Full COG available at kyfromabove.s3.us-west-2.amazonaws.com.


Sentinel-2 STAC Search — --output json

earthforge stac search sentinel-2-l2a \
  --bbox -85,37,-84,38 --datetime 2025-06/2025-09 --max-items 5 \
  --output json
{
  "collection": "sentinel-2-l2a",
  "matched": 47,
  "returned": 5,
  "elapsed_seconds": 1.243,
  "items": [
    {
      "id": "S2A_18SYJ_20250914_0_L2A",
      "datetime": "2025-09-14T16:28:43Z",
      "properties": { "eo:cloud_cover": 4.2, "platform": "sentinel-2a" }
    }
  ]
}

Full sample: data/samples/stac_search.json
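Because `--output json` emits a predictable structure, downstream filtering does not have to happen in jq. Below is a minimal Python sketch that assumes the result shape shown in the sample above (`items[].properties["eo:cloud_cover"]`). `low_cloud_ids` is a hypothetical helper. In practice the JSON text would come from a saved file such as data/samples/stac_search.json or from the command's stdout.

```python
import json

# Shape copied from the sample output above; real results carry more items and fields.
raw = """
{
  "collection": "sentinel-2-l2a",
  "matched": 47,
  "returned": 5,
  "items": [
    {"id": "S2A_18SYJ_20250914_0_L2A",
     "datetime": "2025-09-14T16:28:43Z",
     "properties": {"eo:cloud_cover": 4.2, "platform": "sentinel-2a"}}
  ]
}
"""


def low_cloud_ids(payload: str, max_cloud: float = 10.0) -> list[str]:
    """Return IDs of items whose eo:cloud_cover is at or below max_cloud."""
    result = json.loads(payload)
    return [
        item["id"]
        for item in result["items"]
        # Items with no cloud-cover property are treated as unknown and skipped.
        if item.get("properties", {}).get("eo:cloud_cover", float("inf")) <= max_cloud
    ]


print(low_cloud_ids(raw))  # → ['S2A_18SYJ_20250914_0_L2A']
```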


COG Metadata — earthforge raster info

earthforge raster info \
  https://sentinel-cogs.s3.us-west-2.amazonaws.com/.../B04.tif \
  --output json
{
  "format": "COG",
  "width": 10980, "height": 10980,
  "crs": "EPSG:32618",
  "is_tiled": true, "tile_width": 512, "tile_height": 512,
  "overview_count": 6,
  "compression": "deflate"
}

Full sample: data/samples/raster_info.json
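The tiling and overview fields are what make partial reads possible. The worked example below assumes each overview halves the previous level on both axes (the usual power-of-two decimation; the sample reports only the count). Under that assumption, the 10980 x 10980 image with 512 x 512 tiles and 6 overviews breaks down as follows. `pyramid_levels` is an illustrative helper, not part of EarthForge.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return -(-a // b)


def pyramid_levels(
    width: int, height: int, tile: int, overviews: int
) -> list[tuple[int, int, int]]:
    """Return (width, height, tile_count) for full resolution plus each overview.

    Assumes power-of-two decimation: overview k is 2**k times smaller per axis.
    """
    levels = []
    for k in range(overviews + 1):
        factor = 2**k
        w, h = ceil_div(width, factor), ceil_div(height, factor)
        levels.append((w, h, ceil_div(w, tile) * ceil_div(h, tile)))
    return levels


for w, h, tiles in pyramid_levels(10980, 10980, 512, 6):
    print(f"{w:>5} x {h:<5} -> {tiles:>3} tiles")
```

Under these assumptions the two coarsest levels (344 and 172 pixels on a side) each fit in a single tile. A quicklook can therefore be drawn from the file header plus one small tile instead of all 484 full-resolution tiles.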


GeoParquet Metadata — earthforge vector info

earthforge vector info ky_wildlife_management_areas.parquet --output json
{
  "format": "geoparquet",
  "row_count": 83,
  "geometry_types": ["MultiPolygon"],
  "crs": "EPSG:4326",
  "bbox": [-89.57, 36.49, -81.96, 39.15],
  "compression": "SNAPPY",
  "file_size_bytes": 142863
}

Full sample: data/samples/vector_info.json


Output Gallery

All images below were generated from real-world data using EarthForge example scripts. No synthetic or simulated data. Each output includes a .txt sidecar with alt text, data provenance, and generation metadata. See examples/outputs/ for full details.

Grand Canyon — DEM with Hillshade + Cross-Section

SRTM 30m elevation data via OpenTopography API. Shows 1,844m of relief from river to rim with elevation cross-section profile.

Elevation map of the Grand Canyon from SRTM 30m DEM with cividis palette and hillshade overlay. Top panel shows terrain from 728m at river level to 2572m at the rim. A high-contrast orange cross-section line with black outline is drawn at the midpoint. Bottom panel shows the east-west elevation profile revealing the canyon's V-shaped depth.

Swiss Alps — Matterhorn/Zermatt Elevation Analysis

Copernicus DEM 30m via OpenTopography. Elevations from 1,868m (valley) to 4,330m (peaks) with statistics sidebar.

Elevation map of the Swiss Alps near the Matterhorn and Zermatt from Copernicus DEM 30m. Viridis palette shows valleys at 1868m in dark purple and alpine peaks at 4330m in bright yellow. Hillshade overlay reveals glacial valleys and ridgelines. Statistics sidebar lists min, max, mean, median, and standard deviation.

Colorado Front Range — Sentinel-2 NDVI

Vegetation gradient from plains to alpine tundra, showing elevation-driven ecology. BrBG colorblind-safe diverging palette.

NDVI map of the Colorado Front Range from Boulder to Rocky Mountain National Park. Brown-white-teal BrBG diverging palette shows urban and bare areas in brown, transitional zones in white, and dense montane forest in teal. The elevation-driven vegetation gradient is clearly visible from east (plains) to west (alpine).

Netherlands — Urban/Water/Vegetation NDVI

Sentinel-2 scene over Rotterdam/Delft showing water (NDVI < 0), urban (low NDVI), and agricultural areas (high NDVI).

NDVI map of Rotterdam and Delft in the Netherlands showing three distinct land cover classes: water bodies with negative NDVI in brown, urban areas with low NDVI in light brown, and agricultural fields with high NDVI in teal. BrBG diverging palette. Water covers 7.8 percent, urban 18.9 percent, vegetation 63.3 percent of the scene.

Amazon Rainforest — Tropical NDVI

Sentinel-2 scene near Manaus, Brazil showing dense tropical forest canopy with uniformly high NDVI.

NDVI map of the Amazon rainforest near Manaus, Brazil from Sentinel-2 imagery. The dense tropical canopy shows uniformly high NDVI values (mean 0.42) in teal and dark teal. River channels and cleared areas appear in brown. BrBG colorblind-safe diverging palette.

Copernicus DEM — Elevation Statistics + Histogram

Raster statistics computed from a Copernicus DEM 30m tile with elevation distribution histogram. Viridis colorblind-safe palette.

Elevation histogram and summary statistics from a Copernicus DEM 30m tile. Left panel shows the distribution of elevation values from 94m to 377m with viridis-colored bars. Right panel lists summary statistics: min 94m, max 377m, mean 216m, median 221m, std 43m, computed from 12.96 million valid pixels.

Yellowstone — Landsat STAC Search Footprints

Landsat Collection 2 Level-2 scene footprints from Earth Search, color-coded by cloud cover percentage.

Map of 40 Landsat Collection 2 Level-2 scene footprints over Yellowstone National Park from a STAC search. Footprints are rectangles colored by cloud cover percentage using a reversed viridis palette where bright colors indicate low cloud cover and dark colors indicate high cloud cover.

Yosemite — Multi-Collection STAC Query

Two-panel figure querying both Sentinel-2 scenes and Copernicus DEM tiles from a single STAC API.

Two-panel figure showing a multi-collection STAC query over Yosemite National Park. Left panel displays Sentinel-2 scene footprints colored by cloud cover. Right panel shows a Copernicus DEM elevation map with hillshade, elevations from 543m to 3547m in viridis palette.

STAC-to-NDVI Pipeline

Complete pipeline workflow: STAC search, range-read Sentinel-2 bands, NDVI computation via safe expression evaluator, rendered output with pipeline summary.

NDVI map produced by an automated STAC-to-NDVI pipeline workflow. Left panel shows the computed NDVI using BrBG diverging palette with brown for bare areas and teal for vegetation. Right panel lists the 4-step pipeline: STAC search, range-read B04 and B08 bands, NDVI computation, and render output, with NDVI statistics.
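A safe expression evaluator for band math usually parses the expression into a syntax tree and evaluates only an allow-list of node types, rather than passing user input to Python's `eval`. Below is a minimal sketch of that approach using the standard-library `ast` module. The `evaluate` name and the set of allowed operators are assumptions, not EarthForge's implementation. The example computes NDVI = (NIR - Red) / (NIR + Red) with Sentinel-2 B08 as NIR and B04 as red, on illustrative scalar values. NumPy arrays of band values would pass through unchanged, because only `+`, `-`, `*`, `/`, and unary signs are applied.

```python
import ast
import operator

# Allow-list of operators; any other syntax is rejected before evaluation.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def evaluate(expression: str, bands: dict):
    """Evaluate a band-math expression such as "(B08 - B04) / (B08 + B04)"."""

    def visit(node: ast.AST):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](visit(node.operand))
        if isinstance(node, ast.Name) and node.id in bands:
            return bands[node.id]
        # Plain numeric literals only (bool is excluded even though it subclasses int).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        # Calls, attribute access, subscripts, unknown names, etc. are never executed.
        raise ValueError(f"disallowed expression element: {ast.dump(node)}")

    return visit(ast.parse(expression, mode="eval"))


ndvi = evaluate("(B08 - B04) / (B08 + B04)", {"B04": 500, "B08": 4500})
print(ndvi)  # → 0.8
```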

Format Detection Matrix

EarthForge's three-stage format detection chain (magic bytes, extension, content inspection) tested across 12 geospatial file formats.

Table showing EarthForge format detection results for 12 geospatial file types. Each row shows the file type, extension, detected format, detection method, and pass or miss status. Seven of twelve formats are correctly detected via magic bytes. Results use ColorBrewer Set2 palette with teal for PASS and orange for MISS.
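The detection chain can be modeled as three fall-through stages. The byte signatures below are the published ones: TIFF byte-order marks, `PAR1` for Parquet, `LASF` for LAS/LAZ (and therefore COPC), and the `fgb` prefix for FlatGeobuf. The function name, format labels, extension table, and the crude GeoJSON check in stage 3 are illustrative assumptions. According to DL-005, EarthForge does format detection in Rust, so treat this as a model of the logic rather than its code.

```python
from pathlib import PurePosixPath

# Stage 1: leading byte signatures.
MAGIC = [
    (b"II*\x00", "tiff"),      # little-endian TIFF (a COG is a TIFF layout)
    (b"MM\x00*", "tiff"),      # big-endian TIFF
    (b"II+\x00", "bigtiff"),
    (b"MM\x00+", "bigtiff"),
    (b"PAR1", "parquet"),      # GeoParquet is Parquet plus "geo" metadata
    (b"fgb\x03fgb", "flatgeobuf"),
    (b"LASF", "las"),          # LAS/LAZ; COPC files share this header
]

# Stage 2: extension fallback, e.g. when no header bytes are available.
EXTENSIONS = {
    ".tif": "tiff",
    ".tiff": "tiff",
    ".parquet": "parquet",
    ".fgb": "flatgeobuf",
    ".zarr": "zarr",
    ".geojson": "geojson",
}


def detect_format(path: str, header: bytes) -> tuple[str, str]:
    """Return (format, method) trying magic bytes, then extension, then content."""
    for signature, name in MAGIC:
        if header.startswith(signature):
            return name, "magic"
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in EXTENSIONS:
        return EXTENSIONS[suffix], "extension"
    # Stage 3: content inspection; here only a rough GeoJSON sniff of the header text.
    text = header.lstrip().decode("utf-8", errors="ignore")
    if text.startswith("{") and '"type"' in text:
        return "geojson", "content"
    return "unknown", "none"


print(detect_format("scene.tif", b"II*\x00\x08\x00\x00\x00"))          # → ('tiff', 'magic')
print(detect_format("s3://bucket/cube.zarr", b""))                     # → ('zarr', 'extension')
print(detect_format("roads.json", b'{"type": "FeatureCollection"}'))   # → ('geojson', 'content')
```

A Zarr store is a directory or prefix rather than a single file, so it has no leading magic bytes. That is one reason the chain needs extension and content stages at all.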


What EarthForge Is Not

EarthForge is not a platform. It does not include a web server, a tile cache, a database, an ML pipeline, or a Kubernetes deployment. It is not a replacement for QGIS, ArcGIS, or Google Earth Engine. It does not try to be everything — it is a focused set of tools that integrate with existing workflows via structured output, stdin/stdout piping, and Python imports.

If you need a tile server, use TiTiler. If you need a STAC API, use stac-fastapi. If you need a geospatial database, use PostGIS. EarthForge is the CLI toolkit you reach for alongside those tools, not instead of them.

Install

# Full toolkit (quote the extras so shells such as zsh don't expand the brackets)
pip install "earthforge[all]"

# Just what you need
pip install "earthforge[stac]"    # STAC discovery only
pip install "earthforge[raster]"  # COG operations only
pip install "earthforge[vector]"  # GeoParquet operations only
pip install "earthforge[cube]"    # Zarr datacube operations only
pip install "earthforge[cli]"     # CLI framework only

Cloud Storage

EarthForge uses named profiles for cloud storage authentication, similar to AWS CLI profiles:

# Initialize config
earthforge config init

# Search with a specific profile
earthforge stac search sentinel-2-l2a --profile planetary

Profiles are defined in ~/.earthforge/config.toml:

[profiles.default]
stac_api = "https://earth-search.aws.element84.com/v1"
storage = "s3"

[profiles.planetary]
stac_api = "https://planetarycomputer.microsoft.com/api/stac/v1"
storage = "azure"

Architecture

EarthForge is built as a monorepo with independently installable workspace packages. The architecture is documented in detail — not as an afterthought, but as the foundation the implementation is built on.

  • ARCHITECTURE.md — System design, dependency graph, module interfaces
  • ai-dev/decisions/ — Architectural decision records with alternatives considered and tradeoffs acknowledged
  • ai-dev/spec.md — Requirements and acceptance criteria per milestone

Key architectural decisions:

  • Monorepo structure (DL-001): single repo with Hatch workspace packages, not 15 separate repos
  • Async-first I/O (DL-002): all network I/O is async via httpx; sync wrappers for convenience
  • obstore for storage (DL-003): Rust-backed S3/GCS/Azure abstraction, chosen over fsspec
  • Rust extension boundary (DL-005): Rust for format detection and range reads; Python for everything else
  • Engineering credibility (DL-006): nothing ships empty; decisions before code; scope boundaries enforced
  • promptfoo evaluation (DL-007): agent prompts and guardrails regression-tested in CI via promptfoo

Formats

  • COG (Cloud Optimized GeoTIFF): full support (info, validate, convert, preview, band math, tile)
  • GeoParquet: full support (info, validate, convert, query, clip, tile)
  • Zarr: full support (info, validate, convert, slice, stats)
  • FlatGeobuf: read/write (info, validate, convert)
  • COPC (Cloud Optimized Point Cloud): info only
  • STAC (SpatioTemporal Asset Catalog): full support (search, info, validate, fetch, publish)

Contributing

See CONTRIBUTING.md. EarthForge has specific engineering standards — please read the contribution guide before opening a PR.

Code of Conduct

See CODE_OF_CONDUCT.

Security

See SECURITY.

License

GNU General Public License v3.0. See LICENSE.



Download files


Source Distribution

earthforge-1.0.0.tar.gz (17.3 MB)

Built Distribution

earthforge-1.0.0-py3-none-any.whl (31.8 kB)

File details

Details for the file earthforge-1.0.0.tar.gz.

File metadata

  • Download URL: earthforge-1.0.0.tar.gz
  • Size: 17.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for earthforge-1.0.0.tar.gz
  • SHA256: a15901423017409928ce52b8a6f33a6ad9043ec4627ce431e125bb1dced86ffe
  • MD5: 878ff94066e4abae3bee4acb6827e8f4
  • BLAKE2b-256: 5d75e73060c0cd2ffe5323cbb89868aeb826eea3193294a72558cd92688345f9


File details

Details for the file earthforge-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: earthforge-1.0.0-py3-none-any.whl
  • Size: 31.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for earthforge-1.0.0-py3-none-any.whl
  • SHA256: 51aa0ee31fdfd2d136f484a38897f1d335518d0c66dc30e1fd84d751357b1efd
  • MD5: b35dc65b17f69257f16a8164bc6f5427
  • BLAKE2b-256: 48c628872367fa5bf8f495b6e33e5795870fefd3700588461d46e6e028dc7f2a

