Efficient loader for Alpha Earth Foundations embeddings using virtual-tiff and obstore
aef-loader
VirtualiZarr-based access to AEF embeddings as an analysis-ready data cube, alongside rapid querying of the GCS and Source Cooperative indexes. 2x quicker than rioxarray for single-tile downloads.
What is AEF?
Alpha Earth Foundations (AEF) embeddings are a dataset produced by Google DeepMind, providing yearly 64-channel embeddings derived from numerous satellite image sources, with many downstream applications. The embeddings are stored as multi-band Cloud-Optimised GeoTIFFs (COGs), alongside a Parquet index file.
AEF is stored by two hosts:
- Google Cloud Storage (official support) - requester pays (requires gcp_project)
- Source Cooperative - AWS hosted and free to access
More in the docs.
What does aef-loader do?
aef-loader provides two key functionalities:
- Rapid download and querying of the Source Cooperative and GCS indexes, using obstore and geopandas
- Lazy loading of the COGs through VirtualiZarr as a DataTree grouped by UTM zone; COG headers are cached, so repeated reads are cheaper
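To illustrate why the tiles are grouped by UTM zone, here is a minimal sketch of how a lon/lat bounding box maps to one or more UTM zones, each with its own EPSG code. The function names `utm_zone` and `utm_epsgs_for_bbox` are hypothetical and not part of aef-loader. The sketch assumes the standard 6-degree zone layout, ignores the Norway/Svalbard exceptions, and assumes the bbox does not cross the antimeridian.

```python
def utm_zone(lon: float) -> int:
    """Standard 6-degree UTM zone number (1..60) for a longitude in degrees."""
    return min(int((lon + 180.0) // 6) + 1, 60)


def utm_epsgs_for_bbox(bbox: tuple[float, float, float, float]) -> list[int]:
    """EPSG codes of every WGS 84 / UTM zone touched by (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    bases = set()
    if max_lat >= 0:
        bases.add(32600)  # northern hemisphere: EPSG:326xx
    if min_lat < 0:
        bases.add(32700)  # southern hemisphere: EPSG:327xx
    zones = range(utm_zone(min_lon), utm_zone(max_lon) + 1)
    return sorted(base + zone for base in bases for zone in zones)


# The Quick Start bbox sits entirely inside UTM zone 10N.
print(utm_epsgs_for_bbox((-122.5, 37.5, -122.0, 38.0)))
# A bbox straddling the 126°W meridian touches two zones, so it needs two CRSs.
print(utm_epsgs_for_bbox((-126.5, 37.0, -125.5, 38.0)))
```

A query that touches more than one zone cannot be stacked into a single array without resampling, which is why each zone is kept as its own node until an explicit reprojection step.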
As additional utilities:
- Dequantize and requantize the embeddings
- Split the "embeddings" dataset into 64 datasets
- Use odc-geo GeoBoxes for dask-aware reprojection when creating multi-zone composites
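As a rough illustration of the dequantize and requantize step, the sketch below uses plain NumPy. It is not aef-loader's own API: the function names are hypothetical. It assumes the mapping described for the published dataset, namely int8 values with -128 as the nodata sentinel, decoded as `sign(q) * (q / 127.5) ** 2`. Check the dataset documentation before relying on these constants.

```python
import numpy as np

NODATA = -128  # assumed int8 nodata sentinel


def dequantize(q) -> np.ndarray:
    """Map int8 codes to float32; nodata becomes NaN."""
    q = np.asarray(q)
    x = q.astype(np.float32) / 127.5
    values = np.sign(x) * x**2
    return np.where(q == NODATA, np.nan, values).astype(np.float32)


def requantize(v) -> np.ndarray:
    """Inverse mapping: signed square root, scale, round, clamp to [-127, 127]."""
    v = np.asarray(v, dtype=np.float32)
    filled = np.nan_to_num(v, nan=0.0)  # placeholder for NaN, overwritten below
    scaled = np.sign(filled) * np.sqrt(np.abs(filled)) * 127.5
    q = np.clip(np.rint(scaled), -127, 127).astype(np.int8)
    return np.where(np.isnan(v), NODATA, q).astype(np.int8)


codes = np.array([-128, -127, -1, 0, 1, 64, 127], dtype=np.int8)
decoded = dequantize(codes)
# The round trip should reproduce the original codes, including nodata.
print(np.array_equal(requantize(decoded), codes))
```

The squared mapping spends more of the 8-bit range on small magnitudes, so requantizing after arithmetic on the floats is lossy in general even though a pure round trip is exact.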
Overview
Alpha Earth Foundations (AEF) embeddings are a dataset produced by Google DeepMind, providing yearly 64-channel embeddings derived from numerous satellite image sources, with many downstream applications. The embeddings are stored as multi-band Cloud-Optimised GeoTIFFs (COGs).
aef-loader supports two dataset hosts, both having tradeoffs:
- Google Cloud Storage - maintained by the Earth Engine team, more up to date but requiring authentication and "requester pays", meaning users must pay egress and other charges.
- Source Cooperative - Hosted on AWS S3 and free to access, but generally less up to date (currently missing 2017 and 2025)
Installation
pip install aef-loader
or:
uv add aef-loader
Quick Start
import asyncio

from aef_loader import AEFIndex, VirtualTiffReader, DataSource
from aef_loader.utils import reproject_datatree
from odc.geo.geobox import GeoBox


async def main():
    # Initialize index (Source Cooperative - no auth needed)
    index = AEFIndex(source=DataSource.SOURCE_COOP)
    await index.download()
    index.load()  # returns a gdf for alternative use

    # Query for tiles
    tiles = await index.query(
        bbox=(-122.5, 37.5, -122.0, 38.0),
        years=(2020, 2023),
    )

    # Load tiles organized by UTM zone
    async with VirtualTiffReader() as reader:
        tree = await reader.open_tiles_by_zone(tiles)

        # Each zone is a separate Dataset with its native CRS
        for zone in tree.children:
            ds = tree[zone].ds
            print(f"{zone}: {ds.odc.crs}, {dict(ds.sizes)}")

        # Optionally reproject all zones to a common CRS
        target = GeoBox.from_bbox(
            bbox=(-122.5, 37.5, -122.0, 38.0),
            crs="EPSG:4326",
            resolution=0.0001,
        )
        combined = reproject_datatree(tree, target)


asyncio.run(main())
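The reprojection target in the Quick Start is large once the band and year dimensions are counted, which is the practical reason to keep it lazy. The back-of-envelope sketch below estimates the size of the combined cube. The helper names are hypothetical, and it assumes 64 bands stored as 1-byte integers, four years for `years=(2020, 2023)`, and a grid shape equal to extent divided by resolution (the actual GeoBox may snap slightly differently).

```python
def grid_shape(bbox: tuple[float, float, float, float], resolution: float) -> tuple[int, int]:
    """Approximate (height, width) of a regular grid covering bbox at the given resolution."""
    min_x, min_y, max_x, max_y = bbox
    width = round((max_x - min_x) / resolution)
    height = round((max_y - min_y) / resolution)
    return height, width


def cube_bytes(shape: tuple[int, int], n_years: int, n_bands: int = 64, itemsize: int = 1) -> int:
    """Uncompressed size in bytes of a (year, band, y, x) cube."""
    height, width = shape
    return height * width * n_bands * n_years * itemsize


shape = grid_shape((-122.5, 37.5, -122.0, 38.0), 0.0001)
print(shape)                                       # grid dimensions in pixels
print(cube_bytes(shape, n_years=4) / 1e9)          # GB if kept as 1-byte integers
print(cube_bytes(shape, n_years=4, itemsize=4) / 1e9)  # GB if dequantized to float32
```

Under these assumptions the float32 cube runs to tens of gigabytes, so computing it chunk by chunk with dask, or coarsening the resolution, is usually preferable to loading it eagerly.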
Attribution and Dataset License
This dataset is licensed under CC-BY 4.0 and requires the following attribution text: "The AlphaEarth Foundations Satellite Embedding dataset is produced by Google and Google DeepMind."
Special notes
Thanks to Max Jones, virtual-tiff and VirtualiZarr.