A CodecPipeline for zarr-python backed by the zarrs Rust crate

zarrs-python

This project serves as a bridge between zarrs (Rust) and zarr (zarr-python) via PyO3. Its main goal is to speed up I/O (see zarr_benchmarks).

To use the project, install our package (which depends on zarr-python>=3.0.0) and run:

import zarr
zarr.config.set({"codec_pipeline.path": "zarrs.ZarrsCodecPipeline"})

You can then use your Zarr arrays as usual, with some caveats described below.

API

We export a ZarrsCodecPipeline class so that zarr-python can use it. It is not meant to be instantiated directly, and we do not guarantee the stability of its API beyond what zarr-python requires, so it is not documented here.

At the moment, we only support a subset of the zarr-python stores:

A NotImplementedError will be raised if a store is not supported.

Configuration

ZarrsCodecPipeline options are exposed through zarr.config.

Standard zarr.config options control some functionality (see the defaults in the config.py of zarr-python):

  • threading.max_workers: the maximum number of threads used internally by the ZarrsCodecPipeline on the Rust side.
  • array.write_empty_chunks: whether or not to store empty chunks.
    • Defaults to False if None. Note that checking for emptiness (i.e., whether a chunk consists entirely of the fill value) has some overhead.
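The emptiness check amounts to comparing every element of a chunk against the array's fill value, which is where the overhead comes from. Below is a minimal NumPy sketch. is_empty_chunk is a hypothetical helper, not zarr-python's or zarrs' actual implementation.

```python
import math

import numpy as np


def is_empty_chunk(chunk: np.ndarray, fill_value) -> bool:
    """Return True if every element of ``chunk`` equals ``fill_value``.

    Hypothetical helper: it inspects every element, so its cost grows with
    the chunk size.
    """
    if isinstance(fill_value, float) and math.isnan(fill_value):
        # NaN != NaN, so a NaN fill value needs an explicit isnan test.
        return bool(np.isnan(chunk).all())
    return bool((chunk == fill_value).all())
```

With array.write_empty_chunks set to False, a chunk for which such a check returns True is simply not stored.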

The ZarrsCodecPipeline specific options are:

  • codec_pipeline.chunk_concurrent_maximum: the maximum number of chunks stored/retrieved concurrently.
    • Defaults to the number of logical CPUs if None. It is constrained by threading.max_workers as well.
  • codec_pipeline.chunk_concurrent_minimum: the minimum number of chunks retrieved/stored concurrently when balancing chunk/codec concurrency.
    • Defaults to 4 if None.
  • codec_pipeline.validate_checksums: enable checksum validation (e.g., with the CRC32C codec).
    • Defaults to True.
  • codec_pipeline.direct_io: enable O_DIRECT reads/writes; this requires support from the operating system (currently Linux only) and the file system.
    • Defaults to False.
  • codec_pipeline.strict: raise exceptions for unsupported operations instead of falling back to the default codec pipeline of zarr-python.
    • Defaults to False.

For example:

import zarr

zarr.config.set({
    "threading.max_workers": None,
    "array.write_empty_chunks": False,
    "codec_pipeline": {
        "path": "zarrs.ZarrsCodecPipeline",
        "validate_checksums": True,
        "chunk_concurrent_maximum": None,
        "chunk_concurrent_minimum": 4,
        "direct_io": False,
        "strict": False
    }
})

If a ZarrsCodecPipeline is pickled and later unpickled, and chunk_concurrent_minimum, chunk_concurrent_maximum, or num_threads has changed in the meantime, the unpickled pipeline picks up the new values. Once a ZarrsCodecPipeline object has been instantiated, however, these values are fixed. This may change in the future as guidance from the zarr community becomes clear.

Concurrency

Concurrency can be classified into two types:

  • chunk (outer) concurrency: the number of chunks retrieved/stored concurrently.
    • This is chosen automatically based on various factors, such as the chunk size and codecs.
    • It is constrained between codec_pipeline.chunk_concurrent_minimum and codec_pipeline.chunk_concurrent_maximum for operations involving multiple chunks.
  • codec (inner) concurrency: the number of threads encoding/decoding a chunk.
    • This is chosen automatically in combination with the chunk concurrency.

The product of the chunk and codec concurrency will approximately match threading.max_workers.
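To make the trade-off concrete, here is a toy heuristic that splits threading.max_workers into chunk and codec concurrency. The function split_concurrency and its preferred_codec_concurrency parameter are hypothetical. zarrs chooses these values with its own logic, which also accounts for chunk size and codecs.

```python
from __future__ import annotations

import os


def split_concurrency(
    max_workers: int,
    num_chunks: int,
    preferred_codec_concurrency: int,
    chunk_concurrent_minimum: int = 4,
    chunk_concurrent_maximum: int | None = None,
) -> tuple[int, int]:
    """Split ``max_workers`` into (chunk_concurrency, codec_concurrency).

    Hypothetical heuristic: start from how many chunks fit if each gets its
    preferred number of codec threads, clamp that to the configured bounds,
    then hand the leftover threads to the codecs.
    """
    # None means "number of logical CPUs"; the maximum is also capped by max_workers.
    maximum = chunk_concurrent_maximum or os.cpu_count() or 1
    maximum = min(maximum, max_workers)
    minimum = min(chunk_concurrent_minimum, maximum)

    chunk = max_workers // max(1, preferred_codec_concurrency)
    chunk = min(max(chunk, minimum), maximum)  # clamp to [minimum, maximum]
    chunk = max(1, min(chunk, num_chunks))     # never more workers than chunks
    codec = max(1, max_workers // chunk)       # chunk * codec ~= max_workers
    return chunk, codec
```

With 16 workers and 100 chunks, a cheap codec (preference 1) yields 16 chunks × 1 codec thread. A codec preferring 8 threads, as a sharded array might, yields 4 × 4 because the chunk minimum of 4 outweighs the 2 chunks the preference alone would allow. A single chunk gets all 16 threads for its codecs.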

Chunk concurrency is typically favored because:

  • parallel encoding/decoding can have a high overhead with some codecs, especially with small chunks, and
  • it is advantageous to retrieve/store multiple chunks concurrently, especially with high latency stores.

zarrs-python will often favor codec concurrency with sharded arrays, since each shard holds many inner chunks that can be encoded/decoded in parallel.

Supported Indexing Methods

The following indexing methods fall back to zarr-python's default codec pipeline:

  1. Any oindex or vindex integer np.ndarray indexing with dimensionality >= 3, e.g.,

    arr[np.array([...]), :, np.array([...])]
    arr[np.array([...]), np.array([...]), np.array([...])]
    arr[np.array([...]), np.array([...]), np.array([...])] = ...
    arr.oindex[np.array([...]), np.array([...]), np.array([...])] = ...
    
  2. Any vindex or oindex discontinuous integer np.ndarray indexing for writes in 2D, e.g.,

    arr[np.array([0, 5]), :] = ...
    arr.oindex[np.array([0, 5]), :] = ...
    
  3. vindex writes in 2D where both indexers are integer np.ndarray indices, e.g.,

    arr[np.array([...]), np.array([...])] = ...
    
  4. Ellipsis indexing. We have tested some cases, but others fail even with zarr-python's default codec pipeline, so for now we advise proceeding with caution, e.g.,

    arr[0:10, ..., 0:5]
    

Furthermore, indexing numeric data with anything other than contiguous selections (i.e., slices or np.ndarray of consecutive integers) falls back to the default zarr-python implementation.
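The rules above can be approximated in a few lines of Python, which may help when auditing the selections in your own code. This is a hypothetical helper that only mirrors the documented rules. It interprets "dimensionality" as the number of array dimensions and treats any non-contiguous or boolean array as a fallback. It does not call zarrs-python, and it ignores Ellipsis indexing and codec support.

```python
import numpy as np


def is_contiguous_int_array(idx) -> bool:
    """True for a 1-D integer array of consecutive ascending values, e.g. [3, 4, 5]."""
    return (
        isinstance(idx, np.ndarray)
        and idx.ndim == 1
        and np.issubdtype(idx.dtype, np.integer)
        and (idx.size <= 1 or bool(np.all(np.diff(idx) == 1)))
    )


def likely_uses_default_pipeline(selection: tuple, array_ndim: int, is_write: bool) -> bool:
    """Rough guess whether a selection falls back to zarr-python's pipeline."""
    arrays = [s for s in selection if isinstance(s, np.ndarray)]
    if not arrays:
        return False  # slices and plain integers stay on the zarrs path
    if array_ndim >= 3:
        return True   # rule 1: array indexing in 3+ dimensions
    if is_write and len(arrays) >= 2:
        return True   # rule 3: 2D writes with two integer-array indexers
    # Rule 2 and the general note: only contiguous integer arrays stay on zarrs.
    return not all(is_contiguous_int_array(a) for a in arrays)
```

For example, writing through `(np.array([0, 5]), slice(None))` on a 2D array is flagged, while `(np.array([2, 3, 4]), slice(None))` is not.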

Please file an issue if you find gaps in our coverage that we are not aware of, or if you wish to contribute! For example, there is an open issue in zarrs for integer-array indexing whose resolution would unblock much of the use of the Rust pipeline for that use case (perhaps very useful for mini-batch training!).

Finally, any codec not supported by zarrs automatically falls back to the Python implementation.


Download files

Source Distribution

zarrs-0.2.2.tar.gz (63.1 kB)

Uploaded Source

Built Distributions

zarrs-0.2.2-cp311-abi3-win_amd64.whl (5.8 MB)

Uploaded CPython 3.11+, Windows x86-64

zarrs-0.2.2-cp311-abi3-musllinux_1_2_x86_64.whl (13.0 MB)

Uploaded CPython 3.11+, musllinux: musl 1.2+ x86-64

zarrs-0.2.2-cp311-abi3-musllinux_1_2_armv7l.whl (12.1 MB)

Uploaded CPython 3.11+, musllinux: musl 1.2+ ARMv7l

zarrs-0.2.2-cp311-abi3-musllinux_1_2_aarch64.whl (12.4 MB)

Uploaded CPython 3.11+, musllinux: musl 1.2+ ARM64

zarrs-0.2.2-cp311-abi3-manylinux_2_28_x86_64.whl (6.4 MB)

Uploaded CPython 3.11+, manylinux: glibc 2.28+ x86-64

zarrs-0.2.2-cp311-abi3-manylinux_2_28_ppc64le.whl (6.7 MB)

Uploaded CPython 3.11+, manylinux: glibc 2.28+ ppc64le

zarrs-0.2.2-cp311-abi3-manylinux_2_28_armv7l.whl (5.9 MB)

Uploaded CPython 3.11+, manylinux: glibc 2.28+ ARMv7l

zarrs-0.2.2-cp311-abi3-manylinux_2_28_aarch64.whl (6.1 MB)

Uploaded CPython 3.11+, manylinux: glibc 2.28+ ARM64

zarrs-0.2.2-cp311-abi3-macosx_11_0_arm64.whl (5.7 MB)

Uploaded CPython 3.11+, macOS 11.0+ ARM64

zarrs-0.2.2-cp311-abi3-macosx_10_12_x86_64.whl (6.2 MB)

Uploaded CPython 3.11+, macOS 10.12+ x86-64

File details

Details for the file zarrs-0.2.2.tar.gz.

File metadata

  • Download URL: zarrs-0.2.2.tar.gz
  • Upload date:
  • Size: 63.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for zarrs-0.2.2.tar.gz
Algorithm Hash digest
SHA256 772787d1a410c36639767dbf2f1479fbb8c8180b14af80b97048cba13f4571ab
MD5 c3482ef48fc3c21b9ecd2be9e0ca2ad1
BLAKE2b-256 557425da74199453d3be671cbd0b19479a4efd1ea4aaf92185fb626a7cfe3ec3

Provenance

The following attestation bundles were made for zarrs-0.2.2.tar.gz:

Publisher: cd.yml on zarrs/zarrs-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file zarrs-0.2.2-cp311-abi3-win_amd64.whl.

File metadata

  • Download URL: zarrs-0.2.2-cp311-abi3-win_amd64.whl
  • Upload date:
  • Size: 5.8 MB
  • Tags: CPython 3.11+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for zarrs-0.2.2-cp311-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 b6dedb5eb5267fb23e53121f9c3272bb4f83dabe989a778bfc226d89c60e9c70
MD5 fbe7b18deb78412a462aa046e4080819
BLAKE2b-256 e15827f5fff62c366a3de96a3808d176fb76a918bfa40666c0aa5f9cd029ab91

Provenance

The following attestation bundles were made for zarrs-0.2.2-cp311-abi3-win_amd64.whl:

Publisher: cd.yml on zarrs/zarrs-python

File details

Details for the file zarrs-0.2.2-cp311-abi3-musllinux_1_2_x86_64.whl.

File hashes

Hashes for zarrs-0.2.2-cp311-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 aa373c66434da2b27b97455ded5d9a9cee707c8248148e407f93ad9ea016d9fb
MD5 6f38467882e9cca472a678460373e784
BLAKE2b-256 7092ffba78c8e67befdd060a2b4e5647c468ed824bd63181b0c4533300785aca

Provenance

The following attestation bundles were made for zarrs-0.2.2-cp311-abi3-musllinux_1_2_x86_64.whl:

Publisher: cd.yml on zarrs/zarrs-python

File details

Details for the file zarrs-0.2.2-cp311-abi3-musllinux_1_2_armv7l.whl.

File hashes

Hashes for zarrs-0.2.2-cp311-abi3-musllinux_1_2_armv7l.whl
Algorithm Hash digest
SHA256 98dd24aea3d815a76735068cc0c2481223ab481a67c449471d560852cac6b7b6
MD5 95b36ce95591d74a81e5b9498b5774ad
BLAKE2b-256 d8b0dbcd3c9c40bc187070197c201f1b7913a8a58825ff87b07284e5d6225e75

Provenance

The following attestation bundles were made for zarrs-0.2.2-cp311-abi3-musllinux_1_2_armv7l.whl:

Publisher: cd.yml on zarrs/zarrs-python

File details

Details for the file zarrs-0.2.2-cp311-abi3-musllinux_1_2_aarch64.whl.

File hashes

Hashes for zarrs-0.2.2-cp311-abi3-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 5a4bd54490749696735164ece2e942b718f291a8ace103e0d7e784561a169977
MD5 8d499777b9e43d5d987e1609bd0d55b4
BLAKE2b-256 75de8d574f132cfa1d64799fe9034413f37df6bcb1d2137936e3e3d198763118

Provenance

The following attestation bundles were made for zarrs-0.2.2-cp311-abi3-musllinux_1_2_aarch64.whl:

Publisher: cd.yml on zarrs/zarrs-python

File details

Details for the file zarrs-0.2.2-cp311-abi3-manylinux_2_28_x86_64.whl.

File hashes

Hashes for zarrs-0.2.2-cp311-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 c29fd1fa0a91a0cb41326975e11a4a0ffabf51533592396100a6b1ba2f784349
MD5 a229d215017a25a8be24208ceec60e24
BLAKE2b-256 71323c257e7e4f83c5ccc69af2c1656b5bf91f198bbba00b7a6a8ac4ee7e09b9

Provenance

The following attestation bundles were made for zarrs-0.2.2-cp311-abi3-manylinux_2_28_x86_64.whl:

Publisher: cd.yml on zarrs/zarrs-python

File details

Details for the file zarrs-0.2.2-cp311-abi3-manylinux_2_28_ppc64le.whl.

File hashes

Hashes for zarrs-0.2.2-cp311-abi3-manylinux_2_28_ppc64le.whl
Algorithm Hash digest
SHA256 9bed3eafd68750bed890da117006a6feed240a4b40c328e92b1746c253d9c582
MD5 9d04d04893e722c849df2d6a35cf17ad
BLAKE2b-256 a1f410fa874ab2b76ef512fbbd810fa56a8eccfffd8df390b4df841b62cd48a7

Provenance

The following attestation bundles were made for zarrs-0.2.2-cp311-abi3-manylinux_2_28_ppc64le.whl:

Publisher: cd.yml on zarrs/zarrs-python

File details

Details for the file zarrs-0.2.2-cp311-abi3-manylinux_2_28_armv7l.whl.

File hashes

Hashes for zarrs-0.2.2-cp311-abi3-manylinux_2_28_armv7l.whl
Algorithm Hash digest
SHA256 2c79354a73d4524c4deb9072a1f0ebf077ca1243472bbce0c0c8a0cd1421a5c0
MD5 2796d4b656e91bc17b2456776218c6a0
BLAKE2b-256 ff46e22c6fcb3445a875528866760df7c13801eacae2a12530bd43324d1a1f29

Provenance

The following attestation bundles were made for zarrs-0.2.2-cp311-abi3-manylinux_2_28_armv7l.whl:

Publisher: cd.yml on zarrs/zarrs-python

File details

Details for the file zarrs-0.2.2-cp311-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for zarrs-0.2.2-cp311-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 e00f25dea65e8e9be8f923249e05722db6d96256354ce69de2f4c6f6987219b7
MD5 4373315497a6bbfb3ce59b69ff2377cb
BLAKE2b-256 39456e6ee11c9ae9d91226c1bcd3016f915dc9165459318e90ff335d8d6348ff

Provenance

The following attestation bundles were made for zarrs-0.2.2-cp311-abi3-manylinux_2_28_aarch64.whl:

Publisher: cd.yml on zarrs/zarrs-python

File details

Details for the file zarrs-0.2.2-cp311-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for zarrs-0.2.2-cp311-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 96e88c01dc60f415cbd1cc284ef4636d959184a7d82d0ba4264aa0ca61d7c0b8
MD5 894d881a7571e4c1fc96666676616bb5
BLAKE2b-256 eb2c16d29f5d8ae5538854b4f8d6e0b929b197afea934c89a0ea141ec8a4ef07

Provenance

The following attestation bundles were made for zarrs-0.2.2-cp311-abi3-macosx_11_0_arm64.whl:

Publisher: cd.yml on zarrs/zarrs-python

File details

Details for the file zarrs-0.2.2-cp311-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for zarrs-0.2.2-cp311-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 599b124b143c85b48c98eceb7227cd52bfa4b5d36792428e647b588f41a456dd
MD5 49beb110e3ddf2107fb65f78add5c27a
BLAKE2b-256 1a0f9e8124caa914df3a67ac7651a86c98de2dfe8fa7caf697f89cb55128795b

Provenance

The following attestation bundles were made for zarrs-0.2.2-cp311-abi3-macosx_10_12_x86_64.whl:

Publisher: cd.yml on zarrs/zarrs-python