
Multi-resolution aggregation for ICESat-2 ATL06 data using Morton/HEALPix indexing


zagg - Multi-resolution Aggregation

Aggregate point observations to multi-resolution grids using HEALPix spatial indexing and serverless compute.

Overview

zagg aggregates sparse point data (e.g., ICESat-2 ATL06 elevation measurements) to gridded products using HEALPix/Morton spatial indexing. Processing runs in parallel on AWS Lambda — each worker handles one spatial cell independently, writing to a shared Zarr v3 store following the DGGS convention.
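The nested (Morton) scheme is what makes hierarchical grids cheap to work with: each cell at order k splits into four children at order k+1, so parent/child lookups reduce to bit shifts. A minimal pure-Python sketch (the function names are illustrative, not part of zagg's API):

```python
def parent_cell(cell: int, child_order: int, parent_order: int) -> int:
    """Nested-scheme parent of `cell` at a coarser order.

    In HEALPix nested (Morton) indexing each cell splits into 4 children
    per order, so coarsening is a right shift of 2 bits per order.
    """
    if parent_order > child_order:
        raise ValueError("parent_order must be <= child_order")
    return cell >> (2 * (child_order - parent_order))


def children_of(cell: int, parent_order: int, child_order: int) -> range:
    """All descendants of `cell` at a finer order (a contiguous range,
    another consequence of the Morton ordering)."""
    shift = 2 * (child_order - parent_order)
    return range(cell << shift, (cell + 1) << shift)
```

The contiguity of `children_of` is why a worker assigned one parent cell can address all of its fine-resolution output cells as a single slice of the Zarr store.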

Features

  • Pre-computed granule catalogs — query CMR once, process many times
  • Morton-based spatial indexing — HEALPix nested scheme for hierarchical grids
  • Massive parallelism — tested with up to 1,700 concurrent Lambda workers
  • Direct S3 access — h5coro reads HDF5 via byte-range requests, no downloads
  • Cost-effective — $0.006/cell ($2 per full Antarctica run on ARM64)

End-to-End Workflow

Step 1: Build a Granule Catalog

Query NASA's CMR to build a mapping of spatial cells to granule S3 URLs.

# ICESat-2 convenience — cycle number computes dates automatically:
uv run python -m zagg.catalog --cycle 22 --parent-order 6

# General — explicit date range and spatial polygon:
uv run python -m zagg.catalog \
    --start-date 2024-01-06 --end-date 2024-04-07 \
    --short-name ATL06 \
    --polygon my_region.geojson \
    --parent-order 6

When --polygon is provided, the bounding box for the CMR query is computed automatically from the polygon's extent, and morton_coverage uses the polygon for cell discovery. When no polygon is given, Antarctic drainage basins are used as the default.
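Deriving the CMR bounding box from a polygon's extent can be sketched with the standard library alone. This assumes a single GeoJSON Polygon geometry and ignores MultiPolygons, interior rings, and antimeridian crossings, all of which a real implementation would need to handle:

```python
def polygon_bbox(geometry: dict) -> tuple[float, float, float, float]:
    """Extent of a GeoJSON Polygon's exterior ring as
    (min_lon, min_lat, max_lon, max_lat) — lower-left corner first,
    then upper-right, which is the order CMR's bounding_box
    parameter expects.
    """
    ring = geometry["coordinates"][0]  # exterior ring: [[lon, lat], ...]
    lons = [pt[0] for pt in ring]
    lats = [pt[1] for pt in ring]
    return (min(lons), min(lats), max(lons), max(lats))
```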

Output: catalog_ATL06_2024-01-06_2024-04-07_order6.json

See Catalog API for full options.

Step 2: Deploy the Lambda Function

Build and deploy the Lambda function and its dependency layer.

# Build the function package
bash deployment/aws/build_function.sh

# Build the dependency layer (ARM64)
bash deployment/aws/build_arm64_layer.sh

# Deploy
bash deployment/aws/deploy.sh

See Lambda Deployment and ARM64 Build Guide.

Step 3: Run Processing

Processing reads a pipeline config YAML (data source, aggregation, output store) and a granule catalog. Run locally or dispatch to Lambda.

# Local processing (write to local Zarr):
uv run python -m zagg --config atl06.yaml --catalog catalog.json --store ./output.zarr

# Local processing (write to S3):
uv run python -m zagg --config atl06.yaml --catalog catalog.json --store s3://bucket/output.zarr

# Lambda dispatch (requires deployed Lambda function):
uv run python deployment/aws/invoke_lambda.py \
    --config atl06.yaml --catalog catalog.json

# Test with a few cells:
uv run python -m zagg --config atl06.yaml --catalog catalog.json --max-cells 5

# Dry run:
uv run python -m zagg --config atl06.yaml --catalog catalog.json --dry-run

The store path and output grid parameters are defined in the YAML config (output.store, output.grid.child_order) and can be overridden via --store on the command line.
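A sketch of the relevant part of a pipeline config. Only `output.store` and `output.grid.child_order` are named above; the values shown are assumptions, not the real defaults — see the built-in `src/zagg/configs/atl06.yaml` for the authoritative version.

```yaml
output:
  store: s3://my-bucket/atl06.zarr   # overridable at runtime with --store
  grid:
    child_order: 11                  # HEALPix order of the output grid (assumed value)
```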

Step 4: Visualize Results

The output Zarr is a public DGGS dataset. The included notebook rasterizes HEALPix cells to a polar stereographic grid for fast rendering with imshow.

uv run jupyter notebook notebooks/rasterized_zarr.ipynb

Adjust GRID_SPACING in the notebook to control output resolution (default 2 km).

Project Structure

zagg/
├── src/zagg/              # Main package (cloud-agnostic)
│   ├── __main__.py        # Local processing runner (python -m zagg)
│   ├── config.py          # YAML pipeline configuration
│   ├── processing.py      # Core aggregation pipeline
│   ├── catalog.py         # CMR query + catalog building
│   ├── schema.py          # Output schema + Zarr template
│   ├── store.py           # Store factory (local or S3)
│   ├── auth.py            # NASA Earthdata authentication
│   └── configs/           # Built-in pipeline configs (atl06.yaml)
├── deployment/            # Cloud-specific deployment
│   └── aws/               # Lambda handler, orchestrator, build scripts
├── notebooks/             # Visualization
├── docs/                  # Documentation
└── tests/                 # Test suite

Documentation

Development

# Install
uv sync --all-groups

# Run tests
uv run pytest

# Lint
uv run ruff check src/

Requires Python >= 3.12, uv, AWS credentials (for Lambda), and a NASA Earthdata account (for data access).

Performance

Metric          Value
Execution time  2–3 min average per cell
Memory          2 GB configured, 1–1.5 GB typical
Throughput      Tested with up to 1,700 concurrent workers
Cost            $0.006/cell ($2 per full Antarctica run on ARM64)

License

MIT — see LICENSE file.
