
Lazily load COG assets from STAC items into xarray DataArrays using async-geotiff


lazycogs


Open a lazy (band, time, y, x) xarray DataArray from thousands of Cloud Optimized GeoTIFFs (COGs). No GDAL required.

What is lazycogs?

stackstac and odc-stac established the pattern that lazycogs builds on: take a STAC item collection and expose it as a spatially aligned xarray DataArray ready for dask-parallel computation. Both are excellent tools that cover most satellite imagery workflows well. They rely on the trusty combination of rasterio and GDAL for data I/O and warping operations.

lazycogs takes the same approach but replaces GDAL and rasterio with a Rust-native stack: rustac for STAC queries over stac-geoparquet files, async-geotiff for COG I/O, and obstore for cloud storage access.

The result is a tool that can open a lazy xarray DataArray view of a massive STAC item archive, in any CRS and at any resolution, without reading any COGs up front. Each array operation triggers a targeted spatial query on the stac-geoparquet file to find only the assets needed for that specific chunk, so no upfront scan of every item is required.
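
To make the per-chunk lookup concrete, here is a minimal pure-Python sketch. It is not how lazycogs or rustac perform the query (that runs against the stac-geoparquet file); the Item records and the items_for_chunk helper are hypothetical, and a linear bounding-box test stands in for the real spatial query.

```python
from typing import NamedTuple

BBox = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


class Item(NamedTuple):
    """Hypothetical stand-in for one STAC item footprint."""

    id: str
    bbox: BBox


def intersects(a: BBox, b: BBox) -> bool:
    """True when two boxes share interior area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def items_for_chunk(items: list[Item], chunk_bbox: BBox) -> list[str]:
    """Return the ids of the items whose footprint overlaps one chunk."""
    return [item.id for item in items if intersects(item.bbox, chunk_bbox)]


items = [
    Item("west", (0.0, 0.0, 10.0, 10.0)),
    Item("east", (10.0, 0.0, 20.0, 10.0)),
    Item("north", (0.0, 10.0, 20.0, 20.0)),
]

# A chunk that sits entirely inside the "west" footprint needs one item only.
chunk_bbox = (2.0, 2.0, 6.0, 6.0)
needed = items_for_chunk(items, chunk_bbox)
print(needed)  # ['west']
```

A chunk that straddles several footprints would return several ids; the point is that the lookup is scoped to the chunk being computed rather than to the whole archive.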

One constraint worth naming: lazycogs only reads Cloud Optimized GeoTIFFs. If your assets are in another format, this is not the right tool.

Task                            Library
STAC search + spatial indexing  rustac (DuckDB + geoparquet)
COG I/O                         async-geotiff (Rust, no GDAL)
Cloud storage                   obstore
Reprojection                    pyproj + numpy
Lazy dataset construction       xarray BackendEntrypoint + LazilyIndexedArray

Installation

pip install lazycogs

Coordinate convention

lazycogs.open() returns a DataArray whose y coordinates follow the standard north-up raster convention, with the origin at the top left rather than the bottom left. The y coordinates descend from north to south: y[0] is the northernmost pixel row and y[-1] is the southernmost. This matches the affine transform and is consistent with odc-stac, rioxarray, and GDAL.

Use sel(y=slice(north, south)) (high to low) for spatial subsetting.
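
As a worked illustration of this convention, the sketch below derives y labels for a north-up grid from a bounding box and a resolution, then finds the rows covered by a high-to-low range. It is plain Python rather than the lazycogs implementation; the helper names are hypothetical, and placing the labels at pixel centers is an assumption made for this example.

```python
def y_labels(ymin: float, ymax: float, resolution: float) -> list[float]:
    """Pixel-center y labels for a north-up grid, listed north to south."""
    n_rows = round((ymax - ymin) / resolution)
    return [ymax - (row + 0.5) * resolution for row in range(n_rows)]


def rows_between(labels: list[float], north: float, south: float) -> list[int]:
    """Indices of the rows whose label lies between south and north."""
    return [i for i, y in enumerate(labels) if south <= y <= north]


# A small grid: 40 m tall at 10 m resolution, so four rows.
labels = y_labels(ymin=4928000.0, ymax=4928040.0, resolution=10.0)
print(labels)  # [4928035.0, 4928025.0, 4928015.0, 4928005.0]

# The range is given high to low, mirroring slice(north, south).
rows = rows_between(labels, north=4928030.0, south=4928010.0)
print(rows)  # [1, 2]
```

Row 0 carries the largest y value, which is why the bounds are passed as north first, south second.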

x and y keep their RasterIndex-based spatial selection behavior, but the coordinate variables themselves are materialized eagerly so chunked nearest-neighbor spatial selections compute cleanly.

Example

import rustac
import lazycogs
from pyproj import Transformer

# Set the target CRS and extent (xmin, ymin, xmax, ymax in the target CRS).
dst_crs = "EPSG:32615"
dst_bbox = (380000.0, 4928000.0, 420000.0, 4984000.0)

# Transform the bounding box to EPSG:4326 (lon/lat) for the STAC search.
transformer = Transformer.from_crs(dst_crs, "epsg:4326", always_xy=True)
bbox_4326 = transformer.transform_bounds(*dst_bbox)

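# Search a STAC API and cache results to a local stac-geoparquet file.
# Top-level await works in Jupyter/IPython; in a plain script, wrap this call
# in an async function and run it with asyncio.run().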
await rustac.search_to(
    "items.parquet",
    "https://earth-search.aws.element84.com/v1",
    collections=["sentinel-2-l2a"],
    datetime="2023-06-01/2023-08-31",
    bbox=bbox_4326,
)

# Open a fully lazy (band, time, y, x) DataArray. No COGs are read yet.
da = lazycogs.open(
    "items.parquet",
    bbox=dst_bbox,
    crs=dst_crs,
    resolution=10.0,
)

Async loading

When you are already inside an async context (for example, a Jupyter notebook running on an asyncio loop), you can trigger chunk reads without blocking the event loop:

# Fetch data asynchronously and load into memory in-place.
subset = await da.isel(x=slice(0, 10), y=slice(0, 10), time=slice(0, 10)).load_async()

load_async uses xarray's async protocol, which dispatches through MultiBandStacBackendArray.async_getitem and stays on the caller's event loop. Multiple concurrent chunk reads overlap naturally, so the async path can be faster than the synchronous da.compute() when reading many chunks inside an already-running loop.
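
The benefit of overlapping reads can be illustrated with plain asyncio. This sketch does not call lazycogs: read_chunk is a hypothetical stand-in that simulates I/O latency with asyncio.sleep. It only shows why awaiting several reads together on one event loop takes roughly as long as the slowest read rather than the sum of all of them.

```python
import asyncio
import time


async def read_chunk(chunk_id: int, latency: float = 0.1) -> int:
    """Hypothetical chunk read: waits like a network request, returns its id."""
    await asyncio.sleep(latency)
    return chunk_id


async def read_sequential(n: int) -> list[int]:
    """Await each read before starting the next one."""
    return [await read_chunk(i) for i in range(n)]


async def read_concurrent(n: int) -> list[int]:
    """Start all reads at once and let their waits overlap."""
    return list(await asyncio.gather(*(read_chunk(i) for i in range(n))))


def timed(coro_fn, n: int) -> tuple[list[int], float]:
    """Run a coroutine function on a fresh event loop and time it."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(n))
    return result, time.perf_counter() - start


# asyncio.run() is for scripts; in a notebook, await the coroutines directly.
seq_result, seq_seconds = timed(read_sequential, 5)
con_result, con_seconds = timed(read_concurrent, 5)
print(f"sequential: {seq_seconds:.2f}s, concurrent: {con_seconds:.2f}s")
```

Both variants return the same values in the same order; only the elapsed time differs.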


