asterra

Support-aware machine learning for Earth observation.

Motivation

Earth observation (EO) machine learning workflows routinely mix data sources and label types whose spatial or temporal supports do not match:

  • sensors with different pixel sizes (e.g., Sentinel-2 vs PlanetScope)
  • labels defined on parcels/fields, tiles, scenes, or time windows (not per-pixel)
  • coarse-to-fine (and fine-to-coarse) supervision
  • patch overlap leakage and neighborhood dependence in evaluation

Ignoring these mismatches often leads to:

  • biased features/labels due to incorrect aggregation
  • silent leakage (overlapping patches, neighboring pixels, same-tile same-date)
  • metrics that do not correspond to the true label support
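
As a small illustration of the last point, the sketch below compares pixel-level and parcel-level accuracy on toy data. The parcel sizes, labels, and the majority-vote rule are all hypothetical and chosen only to make the gap visible; nothing here uses Asterra's own API.

```python
import numpy as np

# Hypothetical toy data: 3 parcels of very different sizes, one label per parcel.
parcel_id = np.repeat([0, 1, 2], [90, 5, 5])   # parcel membership of each pixel
parcel_label = np.array([1, 0, 0])             # true label of each parcel

# A classifier that predicts class 1 for every pixel.
pixel_pred = np.ones(parcel_id.size, dtype=int)

# Pixel-level accuracy: each parcel label is broadcast to its pixels first.
pixel_acc = np.mean(pixel_pred == parcel_label[parcel_id])

# Parcel-level accuracy: majority vote within each parcel, then compare.
votes = np.bincount(parcel_id, weights=pixel_pred, minlength=3)
sizes = np.bincount(parcel_id, minlength=3)
parcel_pred = (votes / sizes >= 0.5).astype(int)
parcel_acc = np.mean(parcel_pred == parcel_label)

print(pixel_acc, parcel_acc)
```

The large parcel dominates the pixel-level score, while the parcel-level score counts each label exactly once, which is the support the labels were defined on.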

What Asterra does

Asterra is a NumPy-first, scikit-learn-compatible package for building support-aware pipelines.

The core abstraction is a sparse, overlap-based SupportMatrix that maps one support to another. It powers:

  • mixed-resolution aggregation and projection (grid ↔ grid, samples → groups)
  • support-aware feature and label projection
  • leakage-safe splitting utilities (buffers, tile/time grouping)
  • support-aware metrics
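
To make the idea of an overlap-based mapping concrete, here is a minimal NumPy sketch for 1-D cells. It is not Asterra's implementation: the function name `overlap_matrix_1d` is hypothetical, the matrix is dense for readability (a real operator would be stored sparsely), and row normalization (area-weighted mean) is an assumed convention.

```python
import numpy as np

def overlap_matrix_1d(src_edges, tgt_edges):
    """Row-normalized overlap weights between two sets of 1-D cells.

    Entry (i, j) is the fraction of target cell i covered by source cell j.
    """
    src_lo, src_hi = src_edges[:-1], src_edges[1:]
    tgt_lo, tgt_hi = tgt_edges[:-1], tgt_edges[1:]
    # Pairwise overlap length between each target cell and each source cell.
    lo = np.maximum(tgt_lo[:, None], src_lo[None, :])
    hi = np.minimum(tgt_hi[:, None], src_hi[None, :])
    overlap = np.clip(hi - lo, 0.0, None)
    return overlap / overlap.sum(axis=1, keepdims=True)

# 3 m source cells and 10 m target cells over the same 30 m extent.
src_edges = np.arange(0.0, 31.0, 3.0)    # 11 edges -> 10 source cells
tgt_edges = np.arange(0.0, 31.0, 10.0)   # 4 edges  -> 3 target cells
W = overlap_matrix_1d(src_edges, tgt_edges)

# Projecting source values onto the target cells is a matrix product.
values = np.arange(10.0)
projected = W @ values

print(W.shape)
print(np.round(W[0], 2))
```

Most entries are zero because each target cell touches only a few source cells, which is why a sparse representation fits. For axis-aligned 2-D grids flattened in row-major order, the 2-D weights factor as `np.kron(W_rows, W_cols)`.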

Installation

# Install from PyPI:
python -m pip install asterra

# Install from source (GitHub):
python -m pip install "asterra @ git+https://github.com/ArnaBannonymus/asterra.git@main"

Optional geospatial extras (not required for pixel-space workflows):

python -m pip install "asterra[geo] @ git+https://github.com/ArnaBannonymus/asterra.git@main"

Quickstart

import numpy as np
from asterra.data import EOData
from asterra.io import sensors
from asterra.support import SupportMatrix

# Synthetic Sentinel-2-like grid (H, W, B)
arr_s2 = np.random.RandomState(0).randn(32, 32, 4).astype("float32")
e_s2 = EOData.from_array(
    arr_s2,
    band_schema=sensors.sentinel2_rgbn(),
    support={"kind": "grid", "resolution": (10.0, 10.0), "origin": (0.0, 0.0)},
)

# Synthetic PlanetScope-like grid on a different resolution
arr_ps = np.random.RandomState(1).randn(64, 64, 4).astype("float32")
e_ps = EOData.from_array(
    arr_ps,
    band_schema=sensors.planetscope_4band(),
    support={"kind": "grid", "resolution": (5.0, 5.0), "origin": (0.0, 0.0)},
)

# Map PlanetScope pixels (source) onto Sentinel-2 pixels (target)
M = SupportMatrix.from_grid_to_grid(source=e_ps.support, target=e_s2.support)
X_ps_on_s2 = M.project_features(e_ps.as_samples())
print(X_ps_on_s2.shape)  # (32*32, 4)
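
For intuition about what this projection should return, the two grids in the Quickstart nest exactly: each 10 m pixel covers a 2x2 block of 5 m pixels. Assuming the projection is an area-weighted mean and that samples are ordered row-major (both are assumptions, not confirmed by the package documentation), the result can be cross-checked with plain NumPy:

```python
import numpy as np

# Same synthetic PlanetScope-like array as in the Quickstart.
arr_ps = np.random.RandomState(1).randn(64, 64, 4).astype("float32")

# Each 10 m target pixel covers an exact 2x2 block of 5 m source pixels,
# so an area-weighted mean reduces to a plain block mean.
H, W, B = arr_ps.shape
block_mean = arr_ps.reshape(H // 2, 2, W // 2, 2, B).mean(axis=(1, 3))

# Flatten to (N, B) samples in row-major pixel order.
expected = block_mean.reshape(-1, B)
print(expected.shape)  # (1024, 4)
```

If the library normalizes differently (for example, sums instead of means), the values would differ by a constant factor per target pixel.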

Visual proofs (local datasets)

Asterra is NumPy-first and does not ship heavy geospatial file I/O. For GeoTIFF/SAFE/NetCDF products you typically read data with tools like rasterio/xarray and then construct EOData with a BandSchema and SupportSpec.
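
A rough sketch of that hand-off is shown below, using a stand-in array instead of a real file so it runs without rasterio. The affine coefficients are made up, and the way they are mapped to `resolution` and `origin` (axis order, sign, which corner counts as the origin) is an assumption to check against the SupportSpec documentation.

```python
import numpy as np

# Stand-in for a raster reader's output: rasterio's dataset.read() returns a
# band-first (B, H, W) array, and dataset.transform holds affine coefficients.
bands_first = np.zeros((4, 32, 32), dtype="float32")

# Hypothetical north-up affine coefficients (a, b, c, d, e, f), where
# x = a * col + b * row + c and y = d * col + e * row + f.
a, b, c, d, e, f = 10.0, 0.0, 500000.0, 0.0, -10.0, 4100000.0

# Asterra's documented array layout is (H, W, B), so move the band axis last.
arr = np.moveaxis(bands_first, 0, -1)

# Support metadata in the same dict form as the Quickstart.
# ASSUMPTION: positive pixel sizes and the transform's corner as the origin.
support = {
    "kind": "grid",
    "resolution": (abs(a), abs(e)),
    "origin": (c, f),
}
print(arr.shape, support["resolution"])
```

The resulting `arr` and `support` can then be passed to `EOData.from_array` together with a band schema, as in the Quickstart.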

The figures below are generated from local datasets (not included in this repo) to sanity-check the support-aware operators on non-synthetic inputs:

  • Planet (3 m PF-SR) NDVI → Sentinel-2 (10 m) NDVI window (SupportMatrix overlap projection + sparse structure view)
  • Sentinel-1 VV/VH (dB) window + label map (leakage-aware spatial CV demo data)
  • CVDL complex SAR patch example (HH/HV magnitudes)

To regenerate these visuals on your machine (with your own file paths):

python scripts/generate_readme_visuals.py \
  --planet-pf-sr /path/to/planet_pf_sr.tif \
  --s2-lr-ndvi /path/to/s2_lr_ndvi.tif \
  --s1-vv /path/to/s1_vv.tif \
  --s1-vh /path/to/s1_vh.tif \
  --label-map /path/to/label_map.tif \
  --cvdl-city-dir /path/to/S1SLC_CVDL/City

Supported inputs

  • .npy arrays with shapes (H, W, B), (T, H, W, B), (N, B)
  • generic EO arrays with user-provided metadata:
    • band_names
    • georeferencing (resolution/origin or affine transform/crs) when available
    • pixel-space coordinates when georeferencing is not available
    • explicit group identifiers for parcel/tile/time supports
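
The relationship between these shapes can be sketched in plain NumPy. The example below flattens a (T, H, W, B) cube to (N, B) samples and builds one explicit group identifier per sample; the variable names are illustrative, and grouping by time step is only one possible choice.

```python
import numpy as np

# Toy time series cube with shape (T, H, W, B).
T, H, W, B = 2, 3, 4, 5
cube = np.arange(T * H * W * B, dtype="float32").reshape(T, H, W, B)

# Flatten every pixel-date into one row: (N, B) with N = T * H * W.
samples = cube.reshape(-1, B)

# Row-major flattening keeps a simple mapping back to (t, row, col).
t, r, c = np.unravel_index(17, (T, H, W))
same_pixel = np.array_equal(samples[17], cube[t, r, c])

# Explicit group identifiers: here one group per time step.
time_group = np.repeat(np.arange(T), H * W)

# Aggregate samples to their groups (mean of each band per time step).
group_mean = np.stack([samples[time_group == g].mean(axis=0) for g in range(T)])
print(samples.shape, group_mean.shape)  # (24, 5) (2, 5)
```

Parcel or tile identifiers work the same way: any integer array of length N that assigns each sample to a group.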

Built-in sensor presets

Sensor helpers are convenience presets; the core library is sensor-agnostic.

  • Sentinel-2 (common optical bands)
  • Sentinel-1 (VV/VH-style SAR schema)
  • PlanetScope (4-band and 8-band styles)
  • NISAR-style configurable SAR schemas

Architecture

The project is organized to keep EO-specific functionality separate from potentially generic sparse support logic:

  • asterra.data: EO data model (array + band schema + support metadata)
  • asterra.support: sparse support operators (SupportMatrix, projection)
  • asterra.preprocessing: reshape/masking and band-aware transformers
  • asterra.model_selection: leakage-aware splitters/utilities
  • asterra.metrics: support-aware metrics
  • asterra.io: .npy loader + sensor presets
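
To show what leakage-aware splitting means in practice, here is a minimal NumPy sketch of tile grouping with a spatial buffer. It does not use the asterra.model_selection API; the tile size, buffer width, and variable names are hypothetical.

```python
import numpy as np

H, W, tile, buffer = 8, 8, 4, 1

# Assign every pixel to a square tile (row-major tile numbering).
rows, cols = np.indices((H, W))
n_tile_cols = -(-W // tile)  # ceiling division
tile_id = ((rows // tile) * n_tile_cols + (cols // tile)).ravel()

# Hold out whole tiles, never individual pixels.
rng = np.random.RandomState(0)
test_tiles = rng.permutation(np.unique(tile_id))[:1]
test_mask = np.isin(tile_id, test_tiles)

# Grow the test region by `buffer` pixels in every direction so that
# training pixels adjacent to test pixels are excluded as well.
padded = np.pad(test_mask.reshape(H, W), buffer)  # pads with False
near_test = np.zeros((H, W), dtype=bool)
for dr in range(2 * buffer + 1):
    for dc in range(2 * buffer + 1):
        near_test |= padded[dr:dr + H, dc:dc + W]

test_idx = np.flatnonzero(test_mask)
train_idx = np.flatnonzero(~near_test.ravel())
print(len(train_idx), len(test_idx))
```

Pixels inside the buffer belong to neither set, which trades a little training data for an evaluation that is less affected by spatial autocorrelation between neighboring pixels.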

See DESIGN_BOUNDARIES.md and UPSTREAMING.md for boundary notes and candidate generic components.

Release status

0.1.x is an early, focused release line. The API is intentionally narrow and may evolve based on user feedback and scientific validation.

Roadmap (high level)

  • richer support specifications (polygons/parcels via optional geo extras)
  • additional support-aware scorers and splitters
  • integration examples with local EO stacks (while keeping the core sensor-agnostic)
