
Support-aware machine learning for Earth observation


asterra


Motivation

Earth observation (EO) machine learning workflows routinely mix data sources and label types with mismatched spatial supports:

  • sensors with different pixel sizes (e.g., Sentinel-2 vs PlanetScope)
  • labels defined on parcels/fields, tiles, scenes, or time windows (not per-pixel)
  • coarse-to-fine (and fine-to-coarse) supervision
  • patch overlap leakage and neighborhood dependence in evaluation

Ignoring these mismatches often leads to:

  • biased features/labels due to incorrect aggregation
  • silent leakage (overlapping patches, neighboring pixels, same-tile same-date)
  • metrics that do not correspond to the true label support

What Asterra does

Asterra is a NumPy-first, scikit-learn-compatible package for building support-aware pipelines.

The core abstraction is a sparse, overlap-based SupportMatrix that maps one support to another. It powers:

  • mixed-resolution aggregation and projection (grid ↔ grid, samples → groups)
  • support-aware feature and label projection
  • leakage-safe splitting utilities (buffers, tile/time grouping)
  • support-aware metrics
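To make the overlap idea concrete, here is a minimal NumPy sketch of an overlap-based mapping between two axis-aligned grids that share an origin. It is not Asterra's implementation: the helper names (`interval_overlap`, `grid_overlap_matrix`), the (row, column) ordering of resolutions, the dense matrices, and the choice to normalize each row into an area-weighted mean are assumptions made for illustration. Because it builds dense matrices, it is only suitable for small grids; a real SupportMatrix is described as sparse.

```python
import numpy as np

def interval_overlap(n_src, res_src, n_tgt, res_tgt, origin=0.0):
    """Hypothetical helper: 1-D overlap lengths between target and source pixels.

    Entry [j, i] is the length shared by target pixel j and source pixel i,
    assuming both grids start at the same origin.
    """
    src_lo = origin + res_src * np.arange(n_src)
    tgt_lo = origin + res_tgt * np.arange(n_tgt)
    lo = np.maximum(tgt_lo[:, None], src_lo[None, :])
    hi = np.minimum(tgt_lo[:, None] + res_tgt, src_lo[None, :] + res_src)
    return np.clip(hi - lo, 0.0, None)

def grid_overlap_matrix(src_shape, src_res, tgt_shape, tgt_res):
    """Hypothetical helper: row-normalized (Ht*Wt, Hs*Ws) overlap weights.

    Rows index target pixels and columns index source pixels, both flattened
    in row-major (H, W) order. Each row with any overlap sums to 1, so
    W @ X gives area-weighted means of the source samples X.
    """
    oy = interval_overlap(src_shape[0], src_res[0], tgt_shape[0], tgt_res[0])
    ox = interval_overlap(src_shape[1], src_res[1], tgt_shape[1], tgt_res[1])
    # The overlap area of two 2-D pixels is the product of the 1-D overlaps,
    # and np.kron lays those products out in row-major pixel order.
    area = np.kron(oy, ox)
    row_sum = area.sum(axis=1, keepdims=True)
    return np.divide(area, row_sum, out=np.zeros_like(area), where=row_sum > 0)

# Non-integer resolution ratios give fractional weights:
# four 3 m source pixels against one 10 m target pixel overlap by 3, 3, 3 and 1 m.
w = interval_overlap(n_src=4, res_src=3.0, n_tgt=1, res_tgt=10.0)
print(w / w.sum())  # [[0.3 0.3 0.3 0.1]]
```

For an integer ratio such as 5 m to 10 m, each row of the matrix simply averages a 2 x 2 block of source pixels; the fractional case is where an explicit overlap matrix pays off.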

Installation

# From PyPI:
python -m pip install asterra

# Install from source (GitHub):
python -m pip install "asterra @ git+https://github.com/ArnaBannonymus/asterra.git@main"

Optional geospatial extras (not required for pixel-space workflows):

python -m pip install "asterra[geo] @ git+https://github.com/ArnaBannonymus/asterra.git@main"

Quickstart

import numpy as np
from asterra.data import EOData
from asterra.io import sensors
from asterra.support import SupportMatrix

# Synthetic Sentinel-2-like grid (H, W, B)
arr_s2 = np.random.RandomState(0).randn(32, 32, 4).astype("float32")
e_s2 = EOData.from_array(
    arr_s2,
    band_schema=sensors.sentinel2_rgbn(),
    support={"kind": "grid", "resolution": (10.0, 10.0), "origin": (0.0, 0.0)},
)

# Synthetic PlanetScope-like grid at a finer resolution (5 m), covering the same 320 m x 320 m extent
arr_ps = np.random.RandomState(1).randn(64, 64, 4).astype("float32")
e_ps = EOData.from_array(
    arr_ps,
    band_schema=sensors.planetscope_4band(),
    support={"kind": "grid", "resolution": (5.0, 5.0), "origin": (0.0, 0.0)},
)

# Map PlanetScope pixels (source) onto Sentinel-2 pixels (target)
M = SupportMatrix.from_grid_to_grid(source=e_ps.support, target=e_s2.support)
X_ps_on_s2 = M.project_features(e_ps.as_samples())
print(X_ps_on_s2.shape)  # (32*32, 4)
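Because 64 x 5 m and 32 x 10 m both span 320 m, every Sentinel-2-like target pixel in the quickstart covers exactly a 2 x 2 block of PlanetScope-like pixels. If `project_features` computes area-weighted means (an assumption; the quickstart only shows the output shape) and `as_samples()` flattens pixels in row-major order (also an assumption), its output should match a plain NumPy block mean. The hypothetical `block_mean` helper below gives that reference:

```python
import numpy as np

def block_mean(arr_hwb, fy, fx):
    """Hypothetical helper: average non-overlapping fy x fx blocks of an (H, W, B) array.

    Assumes H and W are exact multiples of the block size, as in the quickstart
    (64 / 2 = 32).
    """
    h, w, b = arr_hwb.shape
    if h % fy or w % fx:
        raise ValueError("grid shape must be divisible by the block size")
    return arr_hwb.reshape(h // fy, fy, w // fx, fx, b).mean(axis=(1, 3))

arr_ps = np.random.RandomState(1).randn(64, 64, 4).astype("float32")
ref = block_mean(arr_ps, 2, 2).reshape(-1, 4)  # row-major (H, W) flattening
print(ref.shape)  # (1024, 4)

# With asterra installed, and under the assumptions above, one could check:
# np.testing.assert_allclose(X_ps_on_s2, ref, rtol=1e-5)
```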

Visual sanity checks (local datasets)

Asterra is NumPy-first and does not ship heavy geospatial file I/O. For GeoTIFF/SAFE/NetCDF products you typically read data with tools like rasterio/xarray and then construct EOData with a BandSchema and SupportSpec.
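For a north-up GeoTIFF opened with rasterio, the grid metadata needed for a quickstart-style `support` dict can be read off the affine transform. This is a hedged sketch: the helper names are hypothetical, and it assumes Asterra's `origin` means the upper-left corner in map units and `resolution` is a positive (x, y) pixel size, which the quickstart does not confirm. It uses only NumPy so it runs without rasterio; the comments show where rasterio would plug in.

```python
import numpy as np

def grid_support_from_affine(transform):
    """Hypothetical helper: quickstart-style grid support from an affine transform.

    `transform` is (a, b, c, d, e, f, ...) in rasterio/affine order, where
    x = a*col + b*row + c and y = d*col + e*row + f.
    """
    a, b, c, d, e, f = tuple(transform)[:6]
    if b != 0 or d != 0:
        raise ValueError("rotated or sheared grids are not handled in this sketch")
    # Assumed convention: positive pixel sizes and an upper-left origin.
    return {"kind": "grid", "resolution": (abs(a), abs(e)), "origin": (c, f)}

def bands_last(arr_bhw):
    """rasterio's read() returns (bands, rows, cols); the quickstart uses (H, W, B)."""
    return np.moveaxis(np.asarray(arr_bhw), 0, -1)

# With rasterio (not run here):
# import rasterio
# with rasterio.open("/path/to/scene.tif") as src:
#     arr = bands_last(src.read()).astype("float32")
#     support = grid_support_from_affine(src.transform)
# e = EOData.from_array(arr, band_schema=..., support=support)

example = grid_support_from_affine((10.0, 0.0, 500000.0, 0.0, -10.0, 4200000.0))
print(example)
# {'kind': 'grid', 'resolution': (10.0, 10.0), 'origin': (500000.0, 4200000.0)}
```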

The figures below are generated from local datasets (not included in this repo) to sanity-check the support-aware operators on non-synthetic inputs:

Planet (3 m PF-SR) NDVI → Sentinel-2 (10 m) NDVI window, showing the SupportMatrix overlap projection and its sparse structure.

[Figure: Planet→Sentinel-2 NDVI projection sanity check]

Sentinel-1 VV/VH (dB) window and label map (demo data for leakage-aware spatial CV).

[Figure: Sentinel-1 VV/VH and label map window]

Complex SAR patch example (HH/HV magnitudes).

[Figure: CVDL complex SAR patch example]

To regenerate these visuals on your machine (with your own file paths):

python scripts/generate_readme_visuals.py \
  --planet-pf-sr /path/to/planet_pf_sr.tif \
  --s2-lr-ndvi /path/to/s2_lr_ndvi.tif \
  --s1-vv /path/to/s1_vv.tif \
  --s1-vh /path/to/s1_vh.tif \
  --label-map /path/to/label_map.tif \
  --cvdl-city-dir /path/to/S1SLC_CVDL/City

Supported inputs

  • .npy arrays with shapes (H, W, B), (T, H, W, B), (N, B)
  • generic EO arrays with user-provided metadata:
    • band_names
    • georeferencing (resolution/origin or affine transform/crs) when available
    • pixel-space coordinates when georeferencing is not available
    • explicit group identifiers for parcel/tile/time supports
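The three array layouts above differ only in how many leading axes index locations. One common way to turn them into an (N, B) sample matrix, while keeping an explicit group identifier for time supports, is sketched below. `to_samples` and the returned `time_id` are hypothetical names, and whether Asterra's `as_samples()` uses this exact row-major order is an assumption.

```python
import numpy as np

def to_samples(arr):
    """Hypothetical helper: flatten (H, W, B), (T, H, W, B) or (N, B) to (N, B).

    Returns the sample matrix plus a per-sample time index (all zeros when
    there is no time axis), usable as a group identifier for time supports.
    """
    arr = np.asarray(arr)
    if arr.ndim == 2:  # (N, B): already samples
        return arr, np.zeros(arr.shape[0], dtype=int)
    if arr.ndim == 3:  # (H, W, B): a single time step
        h, w, b = arr.shape
        return arr.reshape(h * w, b), np.zeros(h * w, dtype=int)
    if arr.ndim == 4:  # (T, H, W, B): time steps stacked one after another
        t, h, w, b = arr.shape
        time_id = np.repeat(np.arange(t), h * w)
        return arr.reshape(t * h * w, b), time_id
    raise ValueError(f"unsupported shape {arr.shape}")

# A .npy cube would typically come from np.load("/path/to/cube.npy").
X, time_id = to_samples(np.zeros((3, 4, 5, 2)))
print(X.shape, np.bincount(time_id))  # (60, 2) [20 20 20]
```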

Built-in sensor presets

Sensor helpers are convenience presets; the core library is sensor-agnostic.

  • Sentinel-2 (common optical bands)
  • Sentinel-1 (VV/VH-style SAR schema)
  • PlanetScope (4-band and 8-band styles)
  • NISAR-style configurable SAR schemas
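At minimum, a band schema is an ordered mapping from band names to array indices, which lets band-aware transformers select bands by name rather than position. The dataclass below is a hypothetical stand-in, not Asterra's BandSchema. The Sentinel-2 names B2, B3, B4 and B8 are the standard 10 m blue, green, red and near-infrared bands, but whether `sensors.sentinel2_rgbn()` uses these exact names is an assumption, and the PlanetScope labels are purely illustrative.

```python
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True)
class SimpleBandSchema:
    """Hypothetical minimal band schema: a sensor label plus ordered band names."""
    sensor: str
    band_names: tuple

    def index(self, name):
        return self.band_names.index(name)

    def select(self, arr, names):
        """Select bands by name from an array whose last axis is bands."""
        return arr[..., [self.index(n) for n in names]]

s2_rgbn = SimpleBandSchema("sentinel-2", ("B2", "B3", "B4", "B8"))
ps_4band = SimpleBandSchema("planetscope", ("blue", "green", "red", "nir"))

arr = np.arange(2 * 2 * 4, dtype="float32").reshape(2, 2, 4)
red, nir = np.moveaxis(s2_rgbn.select(arr, ["B4", "B8"]), -1, 0)
ndvi = (nir - red) / (nir + red)  # NDVI = (NIR - Red) / (NIR + Red)
print(ndvi.shape)  # (2, 2)
```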

Architecture

The project is organized to keep EO-specific functionality separate from potentially generic sparse support logic:

  • asterra.data: EO data model (array + band schema + support metadata)
  • asterra.support: sparse support operators (SupportMatrix, projection)
  • asterra.preprocessing: reshape/masking and band-aware transformers
  • asterra.model_selection: leakage-aware splitters/utilities
  • asterra.metrics: support-aware metrics
  • asterra.io: .npy loader + sensor presets

See DESIGN_BOUNDARIES.md and UPSTREAMING.md for boundary notes and candidate generic components.
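One common way to make a spatial split leakage-aware is to hold out a block of pixels and drop training pixels within a buffer distance of it, so that neighbouring, spatially correlated pixels never straddle train and test. The function below is a self-contained sketch of that idea in pixel space. It is not Asterra's `model_selection` API; the function name, the vertical-strip layout, and measuring the buffer in columns are assumptions.

```python
import numpy as np

def buffered_block_split(h, w, n_blocks, test_block, buffer_px):
    """Hypothetical sketch of a buffered spatial split on an H x W pixel grid.

    Columns are divided into `n_blocks` vertical strips. Pixels in strip
    `test_block` form the test set; training pixels must lie more than
    `buffer_px` columns away from every test column. Indices are flat,
    row-major positions into the grid.
    """
    cols = np.arange(w)
    block_of_col = cols * n_blocks // w  # contiguous strip id per column
    is_test_col = block_of_col == test_block
    lo, hi = cols[is_test_col].min(), cols[is_test_col].max()
    # Column distance to the nearest test column (0 inside the test strip).
    dist = np.maximum(0, np.maximum(lo - cols, cols - hi))
    flat = np.arange(h * w).reshape(h, w)
    test_idx = flat[:, is_test_col].ravel()
    train_idx = flat[:, dist > buffer_px].ravel()
    return train_idx, test_idx

train_idx, test_idx = buffered_block_split(h=4, w=12, n_blocks=3, test_block=1, buffer_px=1)
print(len(train_idx), len(test_idx))  # 24 16
```

Here the two buffer columns on either side of the test strip (8 pixels) are used for neither training nor testing. For tile or date grouping rather than spatial buffers, scikit-learn's `GroupKFold` with tile or acquisition-date identifiers passed as `groups` keeps each group entirely in train or in test.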

Release status

0.1.x is an early, focused release line. The API is intentionally narrow and may evolve based on user feedback and scientific validation.

Roadmap (high level)

  • richer support specifications (polygons/parcels via optional geo extras)
  • additional support-aware scorers and splitters
  • integration examples with local EO stacks (while keeping the core sensor-agnostic)



Download files


Source Distribution

asterra-0.1.2.tar.gz (1.3 MB)

Built Distribution

asterra-0.1.2-py3-none-any.whl (28.5 kB)

File details

Details for the file asterra-0.1.2.tar.gz.

File metadata

  • Download URL: asterra-0.1.2.tar.gz
  • Upload date:
  • Size: 1.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for asterra-0.1.2.tar.gz:

  • SHA256: 74541d73f06c453d66899c24924dc94f0c439da20930015bb8ad9a00b80ce5c2
  • MD5: 2eeffc1f3064e94c0e6763e31e1d8a90
  • BLAKE2b-256: 651c0d677e9c63917d464d56c41bf8cfe8ab61c3b44f4cd2973f8367dce2b5bf


Provenance

The following attestation bundles were made for asterra-0.1.2.tar.gz:

Publisher: release.yml on ArnaBannonymus/asterra


File details

Details for the file asterra-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: asterra-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 28.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for asterra-0.1.2-py3-none-any.whl:

  • SHA256: 2ceb85621ad38e73953712d40d72b41ca4c782d15d37905100388fcb4f6bdd5f
  • MD5: dcae641d25eb6dbc0f3d10150df086f7
  • BLAKE2b-256: 9a190114a04aad7fd7cdd0848a52b94170ee720347acdc5f2d5432b29cc7b5b4


Provenance

The following attestation bundles were made for asterra-0.1.2-py3-none-any.whl:

Publisher: release.yml on ArnaBannonymus/asterra

