
topozarr - lightweight multiscale zarr pyramids

A Python library for creating multiscale Zarr pyramids for use with zarr-layer.

It attempts to follow the GeoZarr spec:

  • multiscales — pyramid structure and resolution levels
  • proj: — coordinate reference system (CRS)
  • spatial: — affine transform, bounding box, and dimension names
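As a rough illustration, these conventions surface as attributes on the pyramid's groups. The sketch below is hypothetical; the keys and values are placeholders showing the general shape, not topozarr's exact output:

```python
# Hypothetical sketch of GeoZarr-style attributes; values are illustrative,
# not topozarr's actual output.
attrs = {
    "multiscales": [{"levels": 2, "resampling_method": "mean"}],  # pyramid structure
    "proj:code": "EPSG:4326",  # coordinate reference system
    "spatial:transform": [2.5, 0.0, -180.0, 0.0, -2.5, 90.0],  # affine transform
    "spatial:bbox": [-180.0, -90.0, 180.0, 90.0],  # bounding box
    "spatial:dimensions": ["lat", "lon"],  # dimension names
}
print(sorted(attrs))
```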

Warning: experimental

Usage

Installation

Install the tutorial optional dependency group to run the example below:

uv add 'topozarr[tutorial]'
# or
pip install 'topozarr[tutorial]'

Example

import xarray as xr
import xproj  # registers the .proj accessor used for CRS assignment
from topozarr.coarsen import create_pyramid

# Load the air_temperature Xarray tutorial dataset
ds = xr.tutorial.open_dataset('air_temperature', chunks="auto")
ds = ds.proj.assign_crs(spatial_ref="EPSG:4326")
print(ds)
pyramid = create_pyramid(
    ds,
    levels=2,
    x_dim="lon",
    y_dim="lat",
    method="mean",  # "mean" (default) | "max" | "min" | "sum"
)
print(pyramid.encoding)
print(pyramid.dt)

Visualization hints

Use layer_hints to embed colormap and color range hints for zarr-layer directly in the pyramid metadata:

from topozarr.metadata import ZarrLayerVarConfig

pyramid = create_pyramid(
    ds,
    levels=2,
    x_dim="lon",
    y_dim="lat",
    layer_hints={"air": ZarrLayerVarConfig(colormap="blues", clim=[230, 310])},
)

These are written into the root zarr-layer metadata key and are optional — omitting layer_hints has no effect on the pyramid structure or encoding.
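For a sense of what this looks like, the hint above could serialize into the root metadata roughly as follows. This is an assumption about the layout; the actual schema is defined by zarr-layer:

```python
import json

# Assumed serialization of a layer hint into the root "zarr-layer" metadata
# key; the real schema is defined by zarr-layer, not by this sketch.
hints = {"air": {"colormap": "blues", "clim": [230, 310]}}
root_metadata = {"zarr-layer": hints}
print(json.dumps(root_metadata, indent=2))
```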

Chunking

create_pyramid returns a Pyramid with two attributes: pyramid.dt (the DataTree) and pyramid.encoding (recommended chunk and shard sizes per variable per level). Always pass pyramid.encoding as the encoding argument when writing — this is what applies the chunking strategy to the output store.

# Inspect the recommended encoding before writing
print(pyramid.encoding)
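The encoding is a nested mapping from level group to per-variable chunk and shard sizes. Its rough shape, with hypothetical values for a 2-level pyramid, might look like:

```python
# Hypothetical shape of pyramid.encoding; the actual chunk/shard sizes come
# from topozarr's heuristics, so treat these numbers as placeholders.
encoding = {
    "/0": {"air": {"chunks": (1, 128, 128), "shards": (1, 512, 512)}},
    "/1": {"air": {"chunks": (1, 128, 128), "shards": (1, 512, 512)}},
}
# With chunks_per_shard=4, each shard spans 4 chunks per spatial dimension.
for level, variables in encoding.items():
    for name, enc in variables.items():
        assert enc["shards"][1] == 4 * enc["chunks"][1]
        assert enc["shards"][2] == 4 * enc["chunks"][2]
```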

The heuristics target ~500KB chunks for web visualization. You can tune shard size with chunks_per_shard (default: 4, giving 16 chunks per shard and ~8MB shards). Valid values are powers of 2: 1, 2, 4, 8, 16, 32. Larger shards reduce task graph overhead when using Dask but increase memory usage.

chunks_per_shard   chunks/shard   approx. shard size
1                  1              ~500 KB
4                  16             ~8 MB (default)
8                  64             ~32 MB
16                 256            ~128 MB

Pass chunks_per_shard=None to disable sharding entirely.
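The table follows from the ~500 KB chunk target: sharding appears to be square over the two spatial dimensions, so a shard holds chunks_per_shard ** 2 chunks (an assumption consistent with the numbers above). A quick sanity check:

```python
CHUNK_BYTES = 500_000  # ~500 KB chunk target used by the heuristics

def shard_size(chunks_per_shard: int) -> tuple[int, int]:
    """Return (chunks per shard, approximate shard size in bytes)."""
    n = chunks_per_shard**2  # shards tile both spatial dimensions
    return n, n * CHUNK_BYTES

for cps in (1, 4, 8, 16):
    n, size = shard_size(cps)
    print(f"chunks_per_shard={cps:<2} -> {n:>3} chunks/shard, ~{size / 1e6:g} MB")
```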

Writing

Always pass pyramid.encoding to apply the recommended chunking:

# Write to Zarr
from obstore.store import from_url
from zarr.storage import ObjectStore

store = ObjectStore(from_url(url="<your_bucket_url>", region="<your_region>"))
pyramid.dt.to_zarr(store, mode="w", encoding=pyramid.encoding, zarr_format=3)

# Write to Icechunk
import icechunk

storage = icechunk.s3_storage(bucket="<your_bucket>", prefix="<your_prefix>", from_env=True)
repo = icechunk.Repository.create(storage)
session = repo.writable_session("main")
pyramid.dt.to_zarr(session.store, mode="w", encoding=pyramid.encoding, consolidated=False)

Contributing

Clone the repo and install with the test dependency group:

git clone https://github.com/carbonplan/topozarr
cd topozarr
uv sync --group test

Run tests:

uv run pytest -n auto

Run conformance tests against the GeoZarr spec (requires the conformance group):

uv sync --group conformance
uv run pytest -n auto -m conformance

Lint and format:

uv run pre-commit run --all-files

To regenerate the demo datasets in S3 (requires AWS credentials), install the demo extra and run the build script:

uv sync --extra demo
uv run python scripts/build_demo_data.py --help

License

This code is licensed under the MIT License; see the LICENSE file for details.

About Us

CarbonPlan is a nonprofit organization that uses data and science for climate action. We aim to improve the transparency and scientific integrity of climate solutions through open data and tools. Find out more at carbonplan.org, or get in touch by opening an issue or sending us an email.
