Create virtual Zarr stores from archival data using xarray API

VirtualiZarr

Cloud-Optimize your Scientific Data as a Virtual Zarr Datacube, using Xarray syntax.

The best way to distribute large scientific datasets is via the Cloud, in Cloud-Optimized formats[^1]. But often this data is stuck in archival pre-Cloud file formats such as netCDF.

VirtualiZarr[^2] makes it easy to create "Virtual" Zarr datacubes, allowing performant access to archival data as if it were in the Cloud-Optimized Zarr format, without duplicating any data.
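To make the "without duplicating any data" idea concrete, here is a minimal conceptual sketch using only the Python standard library. It is not VirtualiZarr's API: the file layout and the names `write_archive`, `build_manifest`, and `read_chunk` are hypothetical, chosen only to illustrate how a "virtual" store can hold byte-range references into an existing file instead of copies of its chunks.

```python
import os
import tempfile

# Hypothetical stand-in for an archival file: a small header followed by
# two "chunks" stored back to back, as they might sit inside a container
# format such as netCDF.
HEADER = b"HDR!"
CHUNK_A = b"chunk-zero-bytes"
CHUNK_B = b"chunk-one-bytes!"


def write_archive(path):
    """Write the toy archival file to disk."""
    with open(path, "wb") as f:
        f.write(HEADER + CHUNK_A + CHUNK_B)


def build_manifest(path):
    """Map chunk keys to (path, offset, length) byte ranges.

    Only these references are stored; the chunk bytes stay in the
    original file.
    """
    return {
        "0": (path, len(HEADER), len(CHUNK_A)),
        "1": (path, len(HEADER) + len(CHUNK_A), len(CHUNK_B)),
    }


def read_chunk(manifest, key):
    """Fetch one chunk by seeking to its byte range in the original file."""
    path, offset, length = manifest[key]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    archive = os.path.join(tempfile.mkdtemp(), "archive.bin")
    write_archive(archive)
    manifest = build_manifest(archive)
    print(read_chunk(manifest, "1"))
```

The manifest is tiny compared with the data it points at, which is why a virtual datacube can be created and shared without rewriting the archive.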

Please see the documentation.

Features

Inspired by Kerchunk

VirtualiZarr grew out of discussions on the Kerchunk repository, and is an attempt to provide the game-changing power of Kerchunk in a Zarr-native way, with a familiar array-like API.

You now have a choice between using VirtualiZarr and Kerchunk: VirtualiZarr provides almost all the same features as Kerchunk.

Development Status and Roadmap

VirtualiZarr version 1 (mostly) achieves feature parity with Kerchunk's logic for combining datasets, providing an easier way to manipulate Kerchunk references in memory and generate Kerchunk reference files on disk.
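As a rough illustration of what "combining references in memory and writing them to disk" can mean, the sketch below concatenates two one-dimensional chunk manifests and serializes the result as JSON. This is an assumption-laden toy, not VirtualiZarr's or Kerchunk's implementation: `concat_manifests` and `to_reference_json` are hypothetical helpers, the file names are made up, and the JSON layout is only loosely modeled on Kerchunk-style references (it omits the Zarr metadata keys a real reference file would need).

```python
import json


def concat_manifests(manifests):
    """Concatenate 1-D chunk manifests along their single axis.

    Each manifest maps a chunk index (as a string) to a
    (path, offset, length) reference. Concatenation only renumbers
    the chunk keys; no data is read or copied.
    """
    combined = {}
    next_index = 0
    for manifest in manifests:
        for key in sorted(manifest, key=int):
            combined[str(next_index)] = manifest[key]
            next_index += 1
    return combined


def to_reference_json(array_name, manifest):
    """Serialize a manifest as a JSON document of byte-range references."""
    refs = {
        f"{array_name}/{key}": [path, offset, length]
        for key, (path, offset, length) in manifest.items()
    }
    return json.dumps({"version": 1, "refs": refs}, indent=2)


if __name__ == "__main__":
    # Hypothetical per-file manifests, e.g. one archival file per day.
    day1 = {"0": ("day1.nc", 100, 50), "1": ("day1.nc", 150, 50)}
    day2 = {"0": ("day2.nc", 100, 50)}
    print(to_reference_json("temp", concat_manifests([day1, day2])))
```

Because only the keys change, combining many files is a cheap metadata operation regardless of how large the underlying data is.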

Future VirtualiZarr development will focus on generalizing and upstreaming useful concepts into the Zarr specification, the Zarr-Python library, Xarray, and possibly some new packages.

We have a lot of ideas, including:

If you see other opportunities then we would love to hear your ideas!

Talks and Presentations

  • 2024/11/21 - MET Office Architecture Guild - Tom Nicholas - Slides
  • 2024/11/13 - Cloud-Native Geospatial conference - Raphael Hagen - Slides
  • 2024/07/24 - ESIP Meeting - Sean Harkins - Event / Recording
  • 2024/05/15 - Pangeo showcase - Tom Nicholas - Event / Recording / Slides

Credits

This package was originally developed by Tom Nicholas whilst working at [C]Worthy, who deserve credit for allowing him to prioritise a generalizable open-source solution to the dataset virtualization problem. VirtualiZarr is now a community-owned multi-stakeholder project.

Licence

Apache 2.0

References

[^1]: Cloud-Native Repositories for Big Scientific Data, Abernathey et al., Computing in Science & Engineering.

[^2]: (Pronounced "Virtual-Eye-Zarr" - like "virtualizer" but more piratey 🦜)
