
Support-aware machine learning for Earth observation


asterra


Motivation

Earth observation (EO) machine learning workflows routinely mix data sources and label types with mismatched spatial supports:

  • sensors with different pixel sizes (e.g., Sentinel-2 vs PlanetScope)
  • labels defined on parcels/fields, tiles, scenes, or time windows (not per-pixel)
  • coarse-to-fine (and fine-to-coarse) supervision
  • patch overlap leakage and neighborhood dependence in evaluation

Ignoring these mismatches often leads to:

  • biased features/labels due to incorrect aggregation
  • silent leakage (overlapping patches, neighboring pixels, same-tile same-date)
  • metrics that do not correspond to the true label support

What Asterra does

Asterra is a NumPy-first, scikit-learn-compatible package for building support-aware pipelines.

The core abstraction is a sparse, overlap-based SupportMatrix that maps one support to another. It powers:

  • mixed-resolution aggregation and projection (grid ↔ grid, samples → groups)
  • support-aware feature and label projection
  • leakage-safe splitting utilities (buffers, tile/time grouping)
  • support-aware metrics

Installation

# Once published to PyPI:
# python -m pip install asterra

# Install from source (GitHub):
python -m pip install "asterra @ git+https://github.com/ArnaBannonymus/asterra.git@main"

Optional geospatial extras (not required for pixel-space workflows):

python -m pip install "asterra[geo] @ git+https://github.com/ArnaBannonymus/asterra.git@main"

Quickstart

import numpy as np
from asterra.data import EOData
from asterra.io import sensors
from asterra.support import SupportMatrix

# Synthetic Sentinel-2-like grid (H, W, B)
arr_s2 = np.random.RandomState(0).randn(32, 32, 4).astype("float32")
e_s2 = EOData.from_array(
    arr_s2,
    band_schema=sensors.sentinel2_rgbn(),
    support={"kind": "grid", "resolution": (10.0, 10.0), "origin": (0.0, 0.0)},
)

# Synthetic PlanetScope-like grid at a finer resolution (5 m) over the same 320 m extent
arr_ps = np.random.RandomState(1).randn(64, 64, 4).astype("float32")
e_ps = EOData.from_array(
    arr_ps,
    band_schema=sensors.planetscope_4band(),
    support={"kind": "grid", "resolution": (5.0, 5.0), "origin": (0.0, 0.0)},
)

# Map PlanetScope pixels (source) onto Sentinel-2 pixels (target)
M = SupportMatrix.from_grid_to_grid(source=e_ps.support, target=e_s2.support)
X_ps_on_s2 = M.project_features(e_ps.as_samples())
print(X_ps_on_s2.shape)  # (32*32, 4)
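To show what an overlap-based projection computes, here is a plain NumPy sketch of the idea. It is not asterra's implementation and makes its own assumptions: both grids are axis-aligned, share the origin (0, 0), and are described only by shape and pixel size, and each target value is the overlap-area-weighted mean of the source pixels it touches. The function names (`interval_overlaps`, `grid_overlap_triplets`, `project_mean`) are hypothetical. For the 2:1 case in the Quickstart, each Sentinel-2-like pixel reduces to the mean of a 2×2 block of PlanetScope-like pixels, but the same triplets also handle non-integer ratios such as 3 m onto 10 m.

```python
import numpy as np


def interval_overlaps(n_src, res_src, n_tgt, res_tgt):
    """1-D (target, source, overlap length) triplets; both grids start at 0 (assumption)."""
    src_lo = res_src * np.arange(n_src)
    tgt_lo = res_tgt * np.arange(n_tgt)
    # Overlap length between every target interval and every source interval.
    lo = np.maximum(tgt_lo[:, None], src_lo[None, :])
    hi = np.minimum(tgt_lo[:, None] + res_tgt, src_lo[None, :] + res_src)
    length = hi - lo
    t, s = np.nonzero(length > 0)
    return t, s, length[t, s]


def grid_overlap_triplets(src_shape, src_res, tgt_shape, tgt_res):
    """2-D sparse triplets: flat target index, flat source index, overlap area."""
    ty, sy, ly = interval_overlaps(src_shape[0], src_res[0], tgt_shape[0], tgt_res[0])
    tx, sx, lx = interval_overlaps(src_shape[1], src_res[1], tgt_shape[1], tgt_res[1])
    # A 2-D pixel overlap is a row overlap combined with a column overlap.
    rows = (ty[:, None] * tgt_shape[1] + tx[None, :]).ravel()
    cols = (sy[:, None] * src_shape[1] + sx[None, :]).ravel()
    area = (ly[:, None] * lx[None, :]).ravel()
    return rows, cols, area


def project_mean(X_src, rows, cols, area, n_tgt):
    """Area-weighted mean of source samples (N_src, B) for each target pixel."""
    acc = np.zeros((n_tgt, X_src.shape[1]))
    np.add.at(acc, rows, area[:, None] * X_src[cols])
    weight = np.zeros(n_tgt)
    np.add.at(weight, rows, area)
    # Target pixels without any source coverage are left as NaN.
    return np.divide(acc, weight[:, None], out=np.full_like(acc, np.nan), where=weight[:, None] > 0)


# Same shapes as the Quickstart: 64x64 px at 5 m onto 32x32 px at 10 m.
arr_ps = np.random.RandomState(1).randn(64, 64, 4)
rows, cols, area = grid_overlap_triplets((64, 64), (5.0, 5.0), (32, 32), (10.0, 10.0))
X_ps_on_s2 = project_mean(arr_ps.reshape(-1, 4), rows, cols, area, 32 * 32)
print(X_ps_on_s2.shape)  # (1024, 4)
```

The 1-D step compares all interval pairs for simplicity; the 2-D operator itself is kept as sparse triplets, whose count grows with the number of overlapping pixel pairs rather than with n_source × n_target.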

Visual sanity checks (local datasets)

Asterra is NumPy-first and does not ship heavy geospatial file I/O. For GeoTIFF, SAFE, or NetCDF products, you typically read the data with tools such as rasterio or xarray and then construct EOData with a BandSchema and a SupportSpec.

The figures below are generated from local datasets (not included in this repo) to sanity-check the support-aware operators on non-synthetic inputs:

Planet (3 m PF-SR) NDVI → Sentinel-2 (10 m) NDVI window (SupportMatrix overlap projection)

Sentinel-1 VV/VH (dB) window + label map (leakage-aware spatial CV demo data)

CVDL complex SAR patch example (HH/HV magnitudes)
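NDVI, projected in the Planet→Sentinel-2 check, is a nonlinear band ratio, so the order of operations matters: averaging fine-pixel NDVI over a coarse pixel is generally not the same as computing NDVI from averaged reflectances. The plain NumPy sketch below is not asterra code, uses made-up reflectance values, and does not describe how the figure itself was generated; it only makes the difference concrete for four fine pixels inside one coarse pixel. The `ndvi` helper is hypothetical.

```python
import numpy as np


def ndvi(red, nir):
    """NDVI = (NIR - Red) / (NIR + Red); NaN where the denominator is zero."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = nir + red
    return np.divide(nir - red, denom, out=np.full_like(denom, np.nan), where=denom != 0)


# Four fine pixels that fall inside one coarse pixel (made-up reflectances).
red = np.array([0.05, 0.05, 0.20, 0.20])
nir = np.array([0.45, 0.45, 0.25, 0.25])

mean_of_ndvi = ndvi(red, nir).mean()         # transform, then aggregate
ndvi_of_mean = ndvi(red.mean(), nir.mean())  # aggregate, then transform
print(round(float(mean_of_ndvi), 3), round(float(ndvi_of_mean), 3))  # 0.456 0.474
```

Which order is appropriate depends on the support the target NDVI is meant to represent, which is exactly the kind of choice a support-aware pipeline should make explicit.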

To regenerate these visuals on your machine (with your own file paths):

python scripts/generate_readme_visuals.py \
  --planet-pf-sr /path/to/planet_pf_sr.tif \
  --s2-lr-ndvi /path/to/s2_lr_ndvi.tif \
  --s1-vv /path/to/s1_vv.tif \
  --s1-vh /path/to/s1_vh.tif \
  --label-map /path/to/label_map.tif \
  --cvdl-city-dir /path/to/S1SLC_CVDL/City

Supported inputs

  • .npy arrays with shapes (H, W, B), (T, H, W, B), (N, B)
  • generic EO arrays with user-provided metadata:
    • band_names
    • georeferencing (resolution/origin or affine transform/crs) when available
    • pixel-space coordinates when georeferencing is not available
    • explicit group identifiers for parcel/tile/time supports
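To make the sample layout concrete, the NumPy sketch below flattens a (T, H, W, B) stack into (N, B) samples and then aggregates those samples to parcel-level features using explicit group identifiers. It assumes row-major (C-order) flattening, which matches the (32*32, 4) shape in the Quickstart but is not confirmed as asterra's ordering; `to_samples`, `group_mean`, and the parcel map are hypothetical.

```python
import numpy as np


def to_samples(arr):
    """Flatten (H, W, B) or (T, H, W, B) to (N, B) in row-major order."""
    return arr.reshape(-1, arr.shape[-1])


def group_mean(X, groups):
    """Mean of samples X (N, B) for each distinct id in `groups` (N,)."""
    ids, inverse = np.unique(groups, return_inverse=True)
    sums = np.zeros((ids.size, X.shape[1]))
    np.add.at(sums, inverse, X)
    counts = np.bincount(inverse, minlength=ids.size)
    return ids, sums / counts[:, None]


rng = np.random.RandomState(0)
stack = rng.randn(3, 4, 4, 2)  # (T, H, W, B)
X = to_samples(stack)          # (3*4*4, 2) = (48, 2)

# Hypothetical parcel map: left half of every frame is parcel 7, right half parcel 9.
parcels = np.tile(np.where(np.arange(4) < 2, 7, 9), (3, 4, 1))  # (T, H, W)
ids, X_parcel = group_mean(X, parcels.ravel())
print(ids, X_parcel.shape)  # [7 9] (2, 2)
```

Flattening the group map with the same ordering as the data is what keeps each sample paired with its parcel id.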

Built-in sensor presets

Sensor helpers are convenience presets; the core library is sensor-agnostic.

  • Sentinel-2 (common optical bands)
  • Sentinel-1 (VV/VH-style SAR schema)
  • PlanetScope (4-band and 8-band styles)
  • NISAR-style configurable SAR schemas

Architecture

The project is organized to keep EO-specific functionality separate from potentially generic sparse support logic:

  • asterra.data: EO data model (array + band schema + support metadata)
  • asterra.support: sparse support operators (SupportMatrix, projection)
  • asterra.preprocessing: reshape/masking and band-aware transformers
  • asterra.model_selection: leakage-aware splitters/utilities
  • asterra.metrics: support-aware metrics
  • asterra.io: .npy loader + sensor presets

See DESIGN_BOUNDARIES.md and UPSTREAMING.md for boundary notes and candidate generic components.
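As an illustration of what a leakage-aware spatial split can look like, the NumPy sketch below holds out one square block of pixels for testing and discards every training pixel within a buffer of that block, so neighboring pixels cannot end up on both sides of the split. It is not the asterra.model_selection API; `buffered_block_split` and its parameters are hypothetical, and distances are measured in pixels (Chebyshev distance).

```python
import numpy as np


def buffered_block_split(shape, block, test_block, buffer):
    """One spatial CV fold: (train_idx, test_idx) as flat row-major pixel indices.

    Pixels are grouped into `block` x `block` tiles numbered row by row;
    tile `test_block` is the test set, and training pixels within `buffer`
    pixels (Chebyshev distance) of it are discarded.
    """
    H, W = shape
    rr, cc = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    tiles_per_row = -(-W // block)  # ceil(W / block)
    tile_id = (rr // block) * tiles_per_row + (cc // block)
    test = tile_id == test_block
    # Distance from each pixel to the test tile's bounding box (exact for a rectangle).
    tr, tc = rr[test], cc[test]
    dr = np.maximum(np.maximum(tr.min() - rr, rr - tr.max()), 0)
    dc = np.maximum(np.maximum(tc.min() - cc, cc - tc.max()), 0)
    near = np.maximum(dr, dc) <= buffer
    train = ~near  # `near` already contains the test tile itself (distance 0)
    return np.flatnonzero(train), np.flatnonzero(test)


train_idx, test_idx = buffered_block_split((8, 8), block=4, test_block=0, buffer=1)
print(len(train_idx), len(test_idx))  # 39 16
```

With patch-based models, a buffer at least as wide as the patch half-width keeps patches centered on training pixels from overlapping test pixels.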

Release status

0.1.x is an early, focused release line. The API is intentionally narrow and may evolve based on user feedback and scientific validation.

Roadmap (high level)

  • richer support specifications (polygons/parcels via optional geo extras)
  • additional support-aware scorers and splitters
  • integration examples with local EO stacks (while keeping the core sensor-agnostic)

Download files


Source Distribution

asterra-0.1.1.tar.gz (1.2 MB)

Uploaded Source

Built Distribution


asterra-0.1.1-py3-none-any.whl (28.4 kB)

Uploaded Python 3

File details

Details for the file asterra-0.1.1.tar.gz.

File metadata

  • Download URL: asterra-0.1.1.tar.gz
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for asterra-0.1.1.tar.gz
  • SHA256: 48ca2495a5bacbebea2441e432e30a20cf55adb32f021b8939a001c697ea9796
  • MD5: 46483d508c2b2771500880cb0b65a529
  • BLAKE2b-256: 3c20ec1f0c68a5b8c317d7ceeb4569cd3603bda1f5256daa9f2c0901aa0941c7


Provenance

The following attestation bundles were made for asterra-0.1.1.tar.gz:

Publisher: release.yml on ArnaBannonymus/asterra

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file asterra-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: asterra-0.1.1-py3-none-any.whl
  • Size: 28.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for asterra-0.1.1-py3-none-any.whl
  • SHA256: 23915cde352c59e87b4137fe39c54f67082376ef644a85fe479a4b3b082ae105
  • MD5: 3b4f9f9f7cfe661c2f7a2c1195e399a5
  • BLAKE2b-256: ec921485738f2f5d2b375fc8cf10838cae48be8aff780fe4bb8b45f780e90642


Provenance

The following attestation bundles were made for asterra-0.1.1-py3-none-any.whl:

Publisher: release.yml on ArnaBannonymus/asterra

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
