
A high-performance library for intelligent loading and caching of remote geospatial raster data, built with xarray and zarr.


Smart Geocubes

A high-performance library for intelligent loading and caching of remote geospatial raster data, built with xarray, zarr and icechunk.

The concept of this package is heavily inspired by Earthmover's implementation of serverless datacube generation.

Quickstart

Install the package with uv or pip:

pip install smart-geocubes
uv add smart-geocubes

Open data for your region of interest:

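import smart_geocubes
from odc.geo.geobox import GeoBox

# Accessor for the ArcticDEM 32m mosaic, cached in a local icechunk store at this path
accessor = smart_geocubes.ArcticDEM32m("datacubes/arcticdem_32m.icechunk")

# Region of interest: a 1000 x 1000 pixel grid over 150-151°E, 65-65.5°N (WGS 84)
roi = GeoBox.from_bbox((150, 65, 151, 65.5), shape=(1000, 1000), crs="EPSG:4326")

# Load the data for the region; create=True creates the local datacube if it does not exist yet
arcticdem_at_roi = accessor.load(roi, create=True)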

Datasets included out of the box

Dataset | Quick use | Source | Link
--- | --- | --- | ---
ArcticDEM Mosaic 2m | smart_geocubes.ArcticDEM2m | STAC | PGC
ArcticDEM Mosaic 10m | smart_geocubes.ArcticDEM10m | STAC | PGC
ArcticDEM Mosaic 32m | smart_geocubes.ArcticDEM32m | STAC | PGC
Tasseled Cap Trend | smart_geocubes.TCTrend | Google Earth Engine | AWI

Implemented Remote Accessors

Accessor | Description
--- | ---
smart_geocubes.accessors.STAC | Accessor for STAC APIs; downloads data from a STAC API.
smart_geocubes.accessors.GEE | Accessor for Google Earth Engine; downloads data from Google Earth Engine.

What is the purpose of this package?

This package solves a specific problem that most people who work with Earth observation data don't need to worry about. When you're creating new data from existing data (for example, doing image segmentation with machine learning on Sentinel-2 images), people usually:

  1. Download all the data
  2. Run the algorithms and data science on it
  3. Delete the data afterwards

This "batch processing" approach works well if you have a big computer with lots of storage space, such as a cluster.

But if you're working on a smaller computer (like a laptop with a few hundred GB of storage and 16GB of RAM), this approach creates problems. Because there isn't enough space to hold all the data at once, it becomes really hard to test and improve your programs. Frameworks like Ray are also awkward to use with this approach: they work better with "concurrent processing", where each step of your pipeline runs on each element (for example, each region) separately, instead of one step running over all your data at once. And if you only need to look at certain areas but don't know which ones ahead of time, downloading everything is wasteful.
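The difference between the two styles can be sketched with the standard library. This is a toy illustration only: `download` and `segment` are hypothetical stand-ins for real download and machine-learning steps, and `concurrent.futures` takes the place of a framework like Ray. The batched version holds every region's data before processing any of it, while the concurrent version runs the whole pipeline per region.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical pipeline steps standing in for real download and ML code.
def download(region: str) -> list[float]:
    """Pretend to fetch the pixels for one region."""
    return [float(len(region)), 1.0, 2.0]


def segment(pixels: list[float]) -> float:
    """Pretend to run a model on one region's pixels."""
    return sum(pixels)


def run_batched(regions: list[str]) -> dict[str, float]:
    # Step 1: download everything; all regions occupy storage at the same time.
    data = {region: download(region) for region in regions}
    # Step 2: run the processing step over all data at once.
    results = {region: segment(pixels) for region, pixels in data.items()}
    # Step 3: delete the data afterwards.
    data.clear()
    return results


def process_region(region: str) -> float:
    # Every step runs for a single region, independently of the others.
    return segment(download(region))


def run_concurrent(regions: list[str], max_workers: int = 4) -> dict[str, float]:
    # At most max_workers regions hold data at any moment.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(regions, pool.map(process_region, regions)))
```

Both functions produce the same results; only the concurrent one lets a scheduler interleave regions and keep storage use bounded.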

So instead, this package downloads the data only when you need it. But downloading the same thing over and over is inefficient, so the downloaded data is saved ("cached") on your computer's hard drive in the form of zarr datacubes. We call this way of working "procedural download" because you download pieces as you need them.
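The "download once, then read from disk" idea is a read-through cache. The following is a minimal sketch of that general pattern, not the package's implementation: it assumes a hypothetical `fetch` callable that downloads one tile, and it uses JSON files in a local directory as a stand-in for the zarr/icechunk datacube. The class name `TileCache` is likewise hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable


class TileCache:
    """Read-through cache: a tile is downloaded only the first time it is requested."""

    def __init__(self, cache_dir: str | Path, fetch: Callable[[str], list[float]]):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch  # hypothetical downloader: tile id -> pixel values
        self.downloads = 0  # number of remote fetches performed by this instance

    def _path(self, tile_id: str) -> Path:
        return self.cache_dir / f"{tile_id}.json"

    def get(self, tile_id: str) -> list[float]:
        path = self._path(tile_id)
        if path.exists():
            # Cache hit: read from local disk, no network access.
            return json.loads(path.read_text())
        # Cache miss: download once and persist for later calls.
        data = self.fetch(tile_id)
        self.downloads += 1
        path.write_text(json.dumps(data))
        return data

    def load_region(self, tile_ids: list[str]) -> list[list[float]]:
        """Load only the tiles that cover the requested region."""
        return [self.get(tile_id) for tile_id in tile_ids]
```

Because the cache lives on disk, a second `TileCache` pointed at the same directory (for example in a later run) reuses tiles that were already downloaded.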

This package therefore handles:

  1. The on-demand download ("procedural download") of the data
  2. The caching of the data on your computer's hard drive
  3. The loading of the data into memory for user-specified regions
  4. Thread safety, so you can use it with any scaling framework you like
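Item 4 in the list above matters because two workers may request the same missing tile at the same time. One common pattern is a per-tile lock, so that concurrent requests for the same tile trigger a single download. This is a generic in-memory sketch for threads within one process, assuming a hypothetical `fetch` callable and a hypothetical `ThreadSafeTileCache` class; it is not how smart_geocubes implements its own locking, and separate processes would need an inter-process mechanism (such as file locks) instead.

```python
from __future__ import annotations

import threading
from collections import defaultdict
from typing import Callable


class ThreadSafeTileCache:
    """Cache where each missing tile is fetched exactly once, even under concurrency."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch  # hypothetical downloader: tile id -> raw bytes
        self._store: dict[str, bytes] = {}
        self._tile_locks: defaultdict[str, threading.Lock] = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects _tile_locks and the counter
        self.downloads = 0

    def _lock_for(self, tile_id: str) -> threading.Lock:
        # Guard the defaultdict so two threads never create different locks for one tile.
        with self._guard:
            return self._tile_locks[tile_id]

    def get(self, tile_id: str) -> bytes:
        with self._lock_for(tile_id):
            # Only one thread per tile can be inside this block at a time,
            # so a missing tile is fetched once; later callers see it in _store.
            if tile_id not in self._store:
                self._store[tile_id] = self._fetch(tile_id)
                with self._guard:
                    self.downloads += 1
            return self._store[tile_id]
```

Locking per tile rather than globally lets downloads of different tiles proceed in parallel while still deduplicating requests for the same tile.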

Danger! On Linux systems it is necessary to set the multiprocessing start method to spawn or forkserver. Read more about this here, here and here.
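If you parallelize with Python's standard multiprocessing module, one way to follow this is to request a context that uses forkserver or spawn and create your pools from it, rather than relying on the platform default. A minimal sketch; the helper `safe_mp_context` is hypothetical and not part of smart_geocubes:

```python
import multiprocessing as mp
from multiprocessing.context import BaseContext

# Fork-free start methods, in order of preference.
# (safe_mp_context is a hypothetical helper, not a smart_geocubes API.)
PREFERRED_METHODS = ("forkserver", "spawn")


def safe_mp_context() -> BaseContext:
    """Return a multiprocessing context that never uses the "fork" start method."""
    available = mp.get_all_start_methods()
    for method in PREFERRED_METHODS:
        if method in available:
            return mp.get_context(method)
    raise RuntimeError("neither 'forkserver' nor 'spawn' is available on this platform")
```

In your entry point, create pools from this context inside an `if __name__ == "__main__":` guard, for example `with safe_mp_context().Pool(4) as pool: pool.map(process_tile, tiles)`, where `process_tile` is your own module-level worker function. Alternatively, calling `mp.set_start_method("spawn")` once at program start changes the default globally.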

The approach itself is already implemented in one of the pipelines we develop at the AWI; you can read more about it in that pipeline's docs.

This won't help if your computer has no fast local storage available, for example if you're working on a cloud cluster that can't save files locally.

Contribute

Please read the contribution guidelines for more information on how to contribute to this project.


Download files


Source Distribution

smart_geocubes-0.1.4.tar.gz (16.9 MB)

Built Distribution


smart_geocubes-0.1.4-py3-none-any.whl (36.9 kB)

File details

Details for the file smart_geocubes-0.1.4.tar.gz.

File metadata

  • Download URL: smart_geocubes-0.1.4.tar.gz
  • Size: 16.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.6.7

File hashes

Hashes for smart_geocubes-0.1.4.tar.gz
Algorithm | Hash digest
--- | ---
SHA256 | 3ec1ae256f53bac3eb6ff3c78f17976b86deb422edf408f2845ee808b8ef3993
MD5 | 925fae216b0d30e3efd476630057c617
BLAKE2b-256 | 6f2244eef2d61c417617020e4491cc72bda9583e2c64b63f9858cdae08de7285


File details

Details for the file smart_geocubes-0.1.4-py3-none-any.whl.

File hashes

Hashes for smart_geocubes-0.1.4-py3-none-any.whl
Algorithm | Hash digest
--- | ---
SHA256 | 11028794870318b5f61c345198bcb540c7915e0159c2cb94a5ced6ec957a2d56
MD5 | 539e0d8243997ac2d79759c2056274a3
BLAKE2b-256 | 920798ed754c7e32b9beab598329421472b3d60c8c74baa41cc49059bb6032fe
