Support-aware machine learning for Earth observation
asterra
Motivation
Earth observation (EO) machine learning workflows routinely mix data sources and label types with mismatched spatial supports:
- sensors with different pixel sizes (e.g., Sentinel-2 vs PlanetScope)
- labels defined on parcels/fields, tiles, scenes, or time windows (not per-pixel)
- coarse-to-fine (and fine-to-coarse) supervision
- patch overlap leakage and neighborhood dependence in evaluation
Ignoring these mismatches often leads to:
- biased features/labels due to incorrect aggregation
- silent leakage (overlapping patches, neighboring pixels, same-tile same-date)
- metrics that do not correspond to the true label support
What Asterra does
Asterra is a NumPy-first, scikit-learn-compatible package for building support-aware pipelines.
The core abstraction is a sparse, overlap-based SupportMatrix that maps one support to another. It powers:
- mixed-resolution aggregation and projection (grid ↔ grid, samples → groups)
- support-aware feature and label projection
- leakage-safe splitting utilities (buffers, tile/time grouping)
- support-aware metrics
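To illustrate what an overlap-based support matrix encodes, here is a minimal NumPy-only sketch. It is not Asterra's SupportMatrix implementation, and its internal representation may differ. It assumes two axis-aligned regular grids that share an origin at coordinate 0. The function names `axis_overlaps`, `grid_overlap_matrix` and `project_mean` are hypothetical. The key idea is that each nonzero entry stores how much area a source pixel shares with a target pixel, and that projection becomes an area-weighted mean over those entries.

```python
import numpy as np

def axis_overlaps(n_src, res_src, n_tgt, res_tgt):
    """Overlap lengths between source and target cells along one axis.

    Assumes both grids start at coordinate 0 (hypothetical helper).
    Returns COO-style (target_index, source_index, overlap_length) triplets.
    """
    s_lo = res_src * np.arange(n_src)
    t_lo = res_tgt * np.arange(n_tgt)
    # Dense (n_tgt, n_src) intersection lengths; fine for a sketch, not for huge grids.
    inter = (np.minimum(t_lo[:, None] + res_tgt, s_lo[None, :] + res_src)
             - np.maximum(t_lo[:, None], s_lo[None, :]))
    t_idx, s_idx = np.nonzero(inter > 0)
    return t_idx, s_idx, inter[t_idx, s_idx]

def grid_overlap_matrix(src_shape, src_res, tgt_shape, tgt_res):
    """Sparse triplets mapping flattened source pixels onto flattened target pixels.

    Weights are overlap areas; pixel (row, col) is flattened as row * width + col.
    For axis-aligned rectangles, the intersection area is the product of the
    per-axis intersection lengths, so the 2D matrix is built from the 1D ones.
    """
    ty, sy, ly = axis_overlaps(src_shape[0], src_res[0], tgt_shape[0], tgt_res[0])
    tx, sx, lx = axis_overlaps(src_shape[1], src_res[1], tgt_shape[1], tgt_res[1])
    rows = (ty[:, None] * tgt_shape[1] + tx[None, :]).ravel()
    cols = (sy[:, None] * src_shape[1] + sx[None, :]).ravel()
    area = (ly[:, None] * lx[None, :]).ravel()
    return rows, cols, area

def project_mean(rows, cols, area, X_src, n_tgt):
    """Area-weighted mean of source samples (N_src, B) on each target pixel."""
    num = np.zeros((n_tgt, X_src.shape[1]))
    np.add.at(num, rows, area[:, None] * X_src[cols])
    den = np.bincount(rows, weights=area, minlength=n_tgt)
    with np.errstate(invalid="ignore"):
        return num / den[:, None]  # NaN where a target pixel has no source coverage

# 3 m source pixels (10 x 10) onto 10 m target pixels (3 x 3): a non-integer ratio.
rows, cols, area = grid_overlap_matrix((10, 10), (3.0, 3.0), (3, 3), (10.0, 10.0))
X_src = np.random.default_rng(0).normal(size=(100, 4))
X_tgt = project_mean(rows, cols, area, X_src, n_tgt=9)
print(X_tgt.shape)
# Total source area assigned to each 10 m target pixel (should be its full 100 m^2).
print(np.bincount(rows, weights=area, minlength=9))
```

Storing only the overlapping pairs keeps the operator sparse: each target pixel touches a handful of source pixels even when both grids are large, and fractional edge overlaps (as in 3 m to 10 m) are weighted correctly instead of being snapped to whole pixels.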
Installation
# From PyPI:
python -m pip install asterra
# Install from source (GitHub):
python -m pip install "asterra @ git+https://github.com/ArnaBannonymus/asterra.git@main"
Optional geospatial extras (not required for pixel-space workflows):
python -m pip install "asterra[geo] @ git+https://github.com/ArnaBannonymus/asterra.git@main"
Quickstart
import numpy as np
from asterra.data import EOData
from asterra.io import sensors
from asterra.support import SupportMatrix
# Synthetic Sentinel-2-like grid (H, W, B)
arr_s2 = np.random.RandomState(0).randn(32, 32, 4).astype("float32")
e_s2 = EOData.from_array(
    arr_s2,
    band_schema=sensors.sentinel2_rgbn(),
    support={"kind": "grid", "resolution": (10.0, 10.0), "origin": (0.0, 0.0)},
)
# Synthetic PlanetScope-like grid on a different resolution
arr_ps = np.random.RandomState(1).randn(64, 64, 4).astype("float32")
e_ps = EOData.from_array(
    arr_ps,
    band_schema=sensors.planetscope_4band(),
    support={"kind": "grid", "resolution": (5.0, 5.0), "origin": (0.0, 0.0)},
)
# Map PlanetScope pixels (source) onto Sentinel-2 pixels (target)
M = SupportMatrix.from_grid_to_grid(source=e_ps.support, target=e_s2.support)
X_ps_on_s2 = M.project_features(e_ps.as_samples())
print(X_ps_on_s2.shape) # (32*32, 4)
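In the Quickstart the two grids share an origin and 10 m / 5 m = 2, so every Sentinel-2-like pixel covers exactly 2 x 2 PlanetScope-like pixels. The sketch below computes the plain NumPy reference for that case, assuming (not confirmed by this README) that project_features returns an area-weighted mean and that as_samples flattens pixels in row-major order. The helper `block_mean` is hypothetical.

```python
import numpy as np

# Same synthetic PlanetScope-like array as in the Quickstart (64 x 64 pixels at 5 m).
arr_ps = np.random.RandomState(1).randn(64, 64, 4).astype("float32")

def block_mean(arr, factor):
    """Average non-overlapping factor x factor pixel blocks of an (H, W, B) array."""
    h, w, b = arr.shape
    if h % factor or w % factor:
        raise ValueError("grid sizes must divide evenly by the factor")
    blocks = arr.reshape(h // factor, factor, w // factor, factor, b)
    return blocks.mean(axis=(1, 3))

# Each 10 m pixel averages a 2 x 2 block of 5 m pixels; flatten row-major to (H*W, B).
expected = block_mean(arr_ps, 2).reshape(-1, 4)
print(expected.shape)
```

If both assumptions hold for your installed version, `np.allclose(X_ps_on_s2, expected, atol=1e-5)` should be true; a mismatch would suggest a different weighting (for example, a sum instead of a mean) or a different sample ordering.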
Visual proofs (local datasets)
Asterra is NumPy-first and does not ship heavy geospatial file I/O. For GeoTIFF/SAFE/NetCDF products you
typically read data with tools like rasterio/xarray and then construct EOData with a BandSchema and
SupportSpec.
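As a sketch of that hand-off: rasterio's `DatasetReader.read()` returns band-first arrays of shape (B, H, W), and `dataset.transform` is an `affine.Affine` whose first six coefficients (a, b, c, d, e, f) map (col, row) to (x, y). The hypothetical helper below moves bands last and builds a support dict with the same keys as the Quickstart. The resolution ordering, the positive sign convention and the upper-left-corner origin are assumptions about what Asterra expects, not documented behavior.

```python
import numpy as np

def to_hwb_and_support(band_first, transform):
    """Convert a band-first raster (B, H, W) to (H, W, B) plus a grid support dict.

    `transform` holds affine coefficients in the `affine`/rasterio ordering:
    x = a*col + b*row + c,  y = d*col + e*row + f. A 6-tuple or an
    affine.Affine (9 coefficients) both work, since only the first six are used.
    """
    a, b, c, d, e, f = tuple(transform)[:6]
    if b != 0 or d != 0:
        raise ValueError("rotated or sheared grids are not handled by this sketch")
    arr_hwb = np.moveaxis(np.asarray(band_first), 0, -1)  # (B, H, W) -> (H, W, B)
    support = {
        "kind": "grid",
        # Assumed convention: positive pixel sizes, (x, y) ordering,
        # origin at the upper-left corner of the upper-left pixel.
        "resolution": (abs(a), abs(e)),
        "origin": (c, f),
    }
    return arr_hwb, support

# North-up 10 m grid (e is negative in north-up rasters), 4 bands of 3 x 5 pixels.
arr, support = to_hwb_and_support(
    np.zeros((4, 3, 5), dtype="float32"),
    (10.0, 0.0, 500000.0, 0.0, -10.0, 4200000.0),
)
print(arr.shape, support)
```

The resulting array and dict can then be passed to EOData.from_array together with a band schema, as in the Quickstart.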
The figures below are generated from local datasets (not included in this repo) to sanity-check the support-aware operators on non-synthetic inputs:
- Planet (3 m PF-SR) NDVI → Sentinel-2 (10 m) NDVI window (SupportMatrix overlap projection + sparse structure view)
- Sentinel-1 VV/VH (dB) window + label map (leakage-aware spatial CV demo data)
- Complex SAR patch example (HH/HV magnitudes)
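The quantities shown in these figures follow standard definitions. The visuals script itself is not reproduced here, so the helpers below (hypothetical names) are a sketch of those textbook formulas, not of its exact preprocessing: NDVI = (NIR − Red) / (NIR + Red), backscatter in dB = 10·log10(linear power), and SAR magnitude = |z| for complex samples.

```python
import numpy as np

def ndvi(red, nir):
    """Normalized difference vegetation index; NaN where NIR + Red == 0."""
    red = np.asarray(red, dtype="float64")
    nir = np.asarray(nir, dtype="float64")
    denom = nir + red
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom != 0, (nir - red) / denom, np.nan)

def power_to_db(power, floor=1e-10):
    """Linear backscatter power (e.g. Sentinel-1 sigma0) to decibels.

    Values below `floor` are clipped to avoid log10(0). For amplitude
    (not power) data the conversion would be 20 * log10 instead.
    """
    return 10.0 * np.log10(np.maximum(np.asarray(power, dtype="float64"), floor))

def slc_magnitude(slc):
    """Magnitude |z| of complex single-look complex (SLC) samples, e.g. HH or HV."""
    return np.abs(slc)

print(ndvi(0.1, 0.5), power_to_db(0.01), slc_magnitude(3 + 4j))
```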
To regenerate these visuals on your machine (with your own file paths):
python scripts/generate_readme_visuals.py \
--planet-pf-sr /path/to/planet_pf_sr.tif \
--s2-lr-ndvi /path/to/s2_lr_ndvi.tif \
--s1-vv /path/to/s1_vv.tif \
--s1-vh /path/to/s1_vh.tif \
--label-map /path/to/label_map.tif \
--cvdl-city-dir /path/to/S1SLC_CVDL/City
Supported inputs
- .npy arrays with shapes (H, W, B), (T, H, W, B), or (N, B)
- generic EO arrays with user-provided metadata:
  - band_names
  - georeferencing (resolution/origin or affine transform/crs) when available
  - pixel-space coordinates when georeferencing is not available
- explicit group identifiers for parcel/tile/time supports
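A minimal sketch of how these input shapes and group identifiers fit together: every layout is flattened to (N, B) samples, and per-pixel predictions are then aggregated to the support on which labels are defined (here, parcels) before scoring. The helpers `to_samples` and `group_mean` are hypothetical, and averaging class probabilities per parcel is one illustrative aggregation choice (a majority vote would be another); neither is taken from Asterra's API.

```python
import numpy as np

def to_samples(arr):
    """Flatten (H, W, B), (T, H, W, B) or (N, B) arrays to row-major (N, B) samples."""
    arr = np.asarray(arr)
    if arr.ndim not in (2, 3, 4):
        raise ValueError(f"expected 2 to 4 dimensions, got shape {arr.shape}")
    return arr.reshape(-1, arr.shape[-1])

def group_mean(X, groups):
    """Mean of samples X (N, B) per group id (e.g. parcel, tile or time window)."""
    ids, inverse = np.unique(np.asarray(groups).ravel(), return_inverse=True)
    sums = np.zeros((ids.size, X.shape[1]))
    np.add.at(sums, inverse, X)
    counts = np.bincount(inverse, minlength=ids.size)
    return ids, sums / counts[:, None]

print(to_samples(np.zeros((2, 8, 8, 4))).shape)  # (T*H*W, B)

# Toy example: 6 pixels in 3 parcels with per-pixel class-1 probabilities.
proba = np.array([[0.9], [0.8], [0.2], [0.4], [0.7], [0.1]])
parcel_id = np.array([10, 10, 20, 20, 30, 30])
parcel_label = {10: 1, 20: 0, 30: 1}  # labels live on parcels, not pixels

ids, parcel_proba = group_mean(proba, parcel_id)
pred = (parcel_proba[:, 0] >= 0.5).astype(int)
truth = np.array([parcel_label[i] for i in ids])
accuracy = (pred == truth).mean()  # scored once per parcel, not once per pixel
print(ids, pred, accuracy)
```

Scoring per parcel keeps large parcels from dominating the metric simply because they contain more pixels, which is the kind of support mismatch described under Motivation.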
Built-in sensor presets
Sensor helpers are convenience presets; the core library is sensor-agnostic.
- Sentinel-2 (common optical bands)
- Sentinel-1 (VV/VH-style SAR schema)
- PlanetScope (4-band and 8-band styles)
- NISAR-style configurable SAR schemas
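As an illustration of why name-based presets help, the sketch below defines a hypothetical `BandPreset` (not Asterra's BandSchema) that selects bands by name rather than by position. The Sentinel-2 codes B02/B03/B04/B08 are the standard 10 m blue/green/red/NIR bands; the PlanetScope names are illustrative labels, not official identifiers.

```python
from dataclasses import dataclass

import numpy as np

@dataclass(frozen=True)
class BandPreset:
    """Hypothetical stand-in for a sensor band schema: an ordered tuple of band names."""
    sensor: str
    band_names: tuple

    def index(self, name):
        """Position of a band on the last (band) axis; raises ValueError if unknown."""
        return self.band_names.index(name)

    def select(self, arr, names):
        """Pick bands by name from an array whose last axis is bands."""
        idx = [self.index(n) for n in names]
        return np.asarray(arr)[..., idx]

# Sentinel-2 10 m bands: blue, green, red, NIR.
S2_RGBN = BandPreset("sentinel-2", ("B02", "B03", "B04", "B08"))
# Illustrative names for a 4-band PlanetScope product.
PS_4BAND = BandPreset("planetscope", ("blue", "green", "red", "nir"))

arr = np.random.default_rng(0).normal(size=(32, 32, 4))
nir_red = S2_RGBN.select(arr, ["B08", "B04"])
print(nir_red.shape)
```

Keying by name means a red/NIR computation written once can be pointed at either sensor's schema without silently mixing up band orders.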
Architecture
The project is organized to keep EO-specific functionality separate from potentially generic sparse support logic:
- asterra.data: EO data model (array + band schema + support metadata)
- asterra.support: sparse support operators (SupportMatrix, projection)
- asterra.preprocessing: reshape/masking and band-aware transformers
- asterra.model_selection: leakage-aware splitters/utilities
- asterra.metrics: support-aware metrics
- asterra.io: .npy loader + sensor presets
See DESIGN_BOUNDARIES.md and UPSTREAMING.md for boundary notes and candidate generic components.
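To make "leakage-aware splitting" concrete, here is a minimal sketch of a buffered spatial block hold-out. It is not the asterra.model_selection API; the function name and parameters are hypothetical. Pixels are grouped into square tiles, selected tiles form the test set, and training pixels within `buffer_px` pixels (Chebyshev distance) of any test pixel are dropped.

```python
import numpy as np

def buffered_block_split(height, width, block, test_blocks, buffer_px):
    """Return (train_idx, test_idx) as row-major flat pixel indices.

    test_blocks: list of (block_row, block_col) tiles held out for testing.
    """
    n_bcols = -(-width // block)  # ceil(width / block)
    rows, cols = np.mgrid[0:height, 0:width]
    block_id = (rows // block) * n_bcols + cols // block
    is_test = np.isin(block_id, [br * n_bcols + bc for br, bc in test_blocks])

    # Dilate the test mask: a pixel is "near" if a test pixel lies within
    # buffer_px rows and buffer_px columns of it. Padding is False.
    padded = np.pad(is_test, buffer_px)
    near_test = np.zeros_like(is_test)
    for dy in range(2 * buffer_px + 1):
        for dx in range(2 * buffer_px + 1):
            near_test |= padded[dy:dy + height, dx:dx + width]

    train_idx = np.flatnonzero(~near_test)  # excludes test pixels and their buffer
    test_idx = np.flatnonzero(is_test)
    return train_idx, test_idx

train, test = buffered_block_split(8, 8, block=4, test_blocks=[(0, 0)], buffer_px=1)
print(len(train), len(test))
```

Grouping by tile alone (for example, scikit-learn's GroupKFold on the block ids) keeps tiles disjoint but still lets training pixels sit directly beside test pixels; the buffer removes that immediate neighborhood, which is where spatial autocorrelation leaks the most.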
Release status
0.1.x is an early, focused release line. The API is intentionally narrow and may evolve based on user feedback
and scientific validation.
Roadmap (high level)
- richer support specifications (polygons/parcels via optional geo extras)
- additional support-aware scorers and splitters
- integration examples with local EO stacks (while keeping the core sensor-agnostic)
Download files
File details
Details for the file asterra-0.1.3.tar.gz.
File metadata
- Download URL: asterra-0.1.3.tar.gz
- Size: 1.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | acde1ab094ed21df225b97236edfaf22612286794107e67464846a1b4bc0bb02 |
| MD5 | ec38b7c9c93990011981665ec3730d18 |
| BLAKE2b-256 | f8b5f0f4d8562fe5d7c7eef131e88d4438908cda18a2f4e192cba6d9242065fd |
Provenance
The following attestation bundles were made for asterra-0.1.3.tar.gz:
- Publisher: release.yml on ArnaBannonymus/asterra
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: asterra-0.1.3.tar.gz
- Subject digest: acde1ab094ed21df225b97236edfaf22612286794107e67464846a1b4bc0bb02
- Sigstore transparency entry: 1329170626
- Permalink: ArnaBannonymus/asterra@8188f6a71db0715c2341b281fc7620dcd3eba313
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/ArnaBannonymus
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@8188f6a71db0715c2341b281fc7620dcd3eba313
- Trigger Event: push
File details
Details for the file asterra-0.1.3-py3-none-any.whl.
File metadata
- Download URL: asterra-0.1.3-py3-none-any.whl
- Size: 28.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b8af91f612f084ff216e9a342b6233766b4838a2a69da4085c1926d80bc2a9d7 |
| MD5 | 67054f84473f8a23e7c447edc73b9938 |
| BLAKE2b-256 | 60009920a02b1fb7cdcb0ab52df13b505141f3cc6a703b32b446b94cb27235ee |
Provenance
The following attestation bundles were made for asterra-0.1.3-py3-none-any.whl:
- Publisher: release.yml on ArnaBannonymus/asterra
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: asterra-0.1.3-py3-none-any.whl
- Subject digest: b8af91f612f084ff216e9a342b6233766b4838a2a69da4085c1926d80bc2a9d7
- Sigstore transparency entry: 1329170650
- Permalink: ArnaBannonymus/asterra@8188f6a71db0715c2341b281fc7620dcd3eba313
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/ArnaBannonymus
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@8188f6a71db0715c2341b281fc7620dcd3eba313
- Trigger Event: push