# zagg - Multi-resolution Aggregation

Multi-resolution aggregation for ICESat-2 ATL06 data using Morton/HEALPix indexing.

Aggregate point observations to multi-resolution grids using HEALPix spatial indexing and serverless compute.
## Overview

zagg aggregates sparse point data (e.g., ICESat-2 ATL06 elevation measurements) to gridded products using HEALPix/Morton spatial indexing. Processing runs in parallel on AWS Lambda — each worker handles one spatial cell independently, writing to a shared Zarr v3 store following the DGGS convention.
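To illustrate why a nested (Morton-ordered) index makes hierarchical grids cheap — this is a minimal sketch of the general idea, not zagg's implementation — each cell at one order has four children at the next order, so parent/child lookups are plain bit shifts:

```python
def morton_parent(cell: int, levels: int = 1) -> int:
    """Parent cell in a nested (Morton) ordering: each level up drops 2 bits."""
    return cell >> (2 * levels)

def morton_children(cell: int) -> list[int]:
    """The four children of a cell, one level down."""
    return [(cell << 2) | i for i in range(4)]

cell = 733  # arbitrary example index
assert morton_parent(cell, levels=2) == 45          # grandparent, two orders up
assert all(morton_parent(c) == cell for c in morton_children(cell))
```

This is the property the "parent order" / "child order" split relies on: all points in a parent cell can be binned to child cells without any geometric tests.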
## Features

- Pre-computed granule catalogs — query CMR once, process many times
- Morton-based spatial indexing — HEALPix nested scheme for hierarchical grids
- Massive parallelism — tested with up to 1,700 concurrent Lambda workers
- Direct S3 access — h5coro reads HDF5 via byte-range requests, no downloads
- Cost-effective — $0.006/cell ($2 per full Antarctica run on ARM64)
## End-to-End Workflow

### Step 1: Build a Granule Catalog

Query NASA's CMR to build a mapping of spatial cells to granule S3 URLs.
```bash
# ICESat-2 convenience — cycle number computes dates automatically:
uv run python -m zagg.catalog --cycle 22 --parent-order 6

# General — explicit date range and spatial polygon:
uv run python -m zagg.catalog \
    --start-date 2024-01-06 --end-date 2024-04-07 \
    --short-name ATL06 \
    --polygon my_region.geojson \
    --parent-order 6
```
When `--polygon` is provided, the bounding box for the CMR query is computed automatically from the polygon's extent, and `morton_coverage` uses the polygon for cell discovery. When no polygon is given, Antarctic drainage basins are used as the default.

Output: `catalog_ATL06_2024-01-06_2024-04-07_order6.json`
See Catalog API for full options.
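The authoritative schema lives in `zagg.catalog`; purely as a sketch of the idea (the field names and URLs below are hypothetical, not zagg's actual format), a catalog maps each parent cell to the granules that intersect it, so workers never re-query CMR:

```python
import json

# Hypothetical catalog layout: parent cell ID -> granule S3 URLs.
catalog = {
    "short_name": "ATL06",
    "parent_order": 6,
    "cells": {
        "1042": [
            "s3://example-bucket/ATL06/granule_0001.h5",
            "s3://example-bucket/ATL06/granule_0002.h5",
        ],
        "1043": ["s3://example-bucket/ATL06/granule_0002.h5"],
    },
}

# Each worker would receive one cell's entry and process it independently.
payload = json.dumps(catalog, indent=2)
assert json.loads(payload)["cells"]["1042"][0].startswith("s3://")
```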
### Step 2: Deploy the Lambda Function

Build and deploy the Lambda function and its dependency layer.

```bash
# Build the function package
bash deployment/aws/build_function.sh

# Build the dependency layer (ARM64)
bash deployment/aws/build_arm64_layer.sh

# Deploy
bash deployment/aws/deploy.sh
```
See Lambda Deployment and ARM64 Build Guide.
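Fan-out amounts to invoking the deployed function once per spatial cell. A rough sketch of how an orchestrator might do this with boto3 — the event fields here are hypothetical, not zagg's actual payload schema:

```python
import json

def build_event(cell_id: str, granule_urls: list[str], config: str) -> dict:
    """One Lambda event per spatial cell (hypothetical field names)."""
    return {"cell_id": cell_id, "granules": granule_urls, "config": config}

event = build_event("1042", ["s3://example-bucket/g1.h5"], "atl06.yaml")
payload = json.dumps(event)

# With boto3 (not run here), each event would be dispatched asynchronously:
# import boto3
# boto3.client("lambda").invoke(
#     FunctionName="zagg-worker",      # hypothetical function name
#     InvocationType="Event",          # fire-and-forget enables massive fan-out
#     Payload=payload,
# )
```

Asynchronous (`Event`) invocation is what makes thousand-worker concurrency cheap: the orchestrator only builds and posts payloads, and never waits on results.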
### Step 3: Run Processing

Processing reads a pipeline config YAML (data source, aggregation, output store) and a granule catalog. Run locally or dispatch to Lambda.

```bash
# Local processing (write to local Zarr):
uv run python -m zagg --config atl06.yaml --catalog catalog.json --store ./output.zarr

# Local processing (write to S3):
uv run python -m zagg --config atl06.yaml --catalog catalog.json --store s3://bucket/output.zarr

# Lambda dispatch (requires deployed Lambda function):
uv run python deployment/aws/invoke_lambda.py \
    --config atl06.yaml --catalog catalog.json

# Test with a few cells:
uv run python -m zagg --config atl06.yaml --catalog catalog.json --max-cells 5

# Dry run:
uv run python -m zagg --config atl06.yaml --catalog catalog.json --dry-run
```

The store path and output grid parameters are defined in the YAML config (`output.store`, `output.grid.child_order`) and can be overridden via `--store` on the command line.
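As an illustration of the shape such a config might take — apart from `output.store` and `output.grid.child_order`, which are named above, the keys below are guesses, not zagg's actual schema (see the built-in `atl06.yaml` for the real one):

```yaml
# Hypothetical pipeline config sketch; key names other than
# output.store and output.grid.child_order are illustrative.
data:
  short_name: ATL06
  variables: [h_li]
aggregation:
  statistics: [mean, count]
output:
  store: s3://bucket/output.zarr
  grid:
    child_order: 12
```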
### Step 4: Visualize Results

The output Zarr is a public DGGS dataset. The included notebook rasterizes HEALPix cells to a polar stereographic grid for fast rendering with `imshow`.

```bash
uv run jupyter notebook notebooks/rasterized_zarr.ipynb
```

Adjust `GRID_SPACING` in the notebook to control output resolution (default 2 km).
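The notebook presumably uses a proper projection library (e.g. pyproj with an Antarctic polar stereographic CRS); purely to illustrate the geometry involved, a spherical south-polar stereographic projection looks like this:

```python
import math

R = 6371000.0  # mean Earth radius in metres (spherical approximation)

def south_polar_stereographic(lat_deg: float, lon_deg: float) -> tuple[float, float]:
    """Project (lat, lon) to x/y metres on a plane tangent at the south pole."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    # Distance from the projection centre grows with colatitude from the pole.
    rho = 2 * R * math.tan((math.pi / 2 + lat) / 2)
    return rho * math.sin(lon), rho * math.cos(lon)

x, y = south_polar_stereographic(-90.0, 0.0)
assert abs(x) < 1e-6 and abs(y) < 1e-6  # the pole maps to the origin
```

Rasterizing then means sampling each output pixel's projected centre and looking up its HEALPix cell value, which is what makes `imshow` rendering fast.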
## Project Structure

```
zagg/
├── src/zagg/            # Main package (cloud-agnostic)
│   ├── __main__.py      # Local processing runner (python -m zagg)
│   ├── config.py        # YAML pipeline configuration
│   ├── processing.py    # Core aggregation pipeline
│   ├── catalog.py       # CMR query + catalog building
│   ├── schema.py        # Output schema + Zarr template
│   ├── store.py         # Store factory (local or S3)
│   ├── auth.py          # NASA Earthdata authentication
│   └── configs/         # Built-in pipeline configs (atl06.yaml)
├── deployment/          # Cloud-specific deployment
│   └── aws/             # Lambda handler, orchestrator, build scripts
├── notebooks/           # Visualization
├── docs/                # Documentation
└── tests/               # Test suite
```
## Documentation
- Architecture — design philosophy, end-to-end flow diagram, key decisions
- Schema — aggregation dispatch, extending with new statistics
- API Reference — catalog, processing, schema, auth modules
- Lambda Deployment — AWS setup and production use
- ARM64 Build Guide — building Lambda layers for ARM64
## Development

```bash
# Install
uv sync --all-groups

# Run tests
uv run pytest

# Lint
uv run ruff check src/
```

Requires Python >= 3.12, uv, AWS credentials (for Lambda), and a NASA Earthdata account (for data access).
## Performance

| Metric | Value |
|---|---|
| Execution time | 2–3 min average per cell |
| Memory | 2 GB configured, 1–1.5 GB typical |
| Throughput | Tested with up to 1,700 concurrent workers |
| Cost | $0.006 per cell ($2 per full Antarctica run on ARM64) |
## License
MIT — see LICENSE file.