A CodecPipeline for zarr-python backed by the zarrs Rust crate

zarrs-python

This project serves as a bridge between zarrs (Rust) and zarr (zarr-python) via PyO3. The main goal of the project is to speed up I/O (see zarr_benchmarks).

To use the project, simply install our package (which depends on zarr-python>=3.0.0), and run:

import zarr
zarr.config.set({"codec_pipeline.path": "zarrs.ZarrsCodecPipeline"})

You can then use zarr as normal (with some caveats)!

API

We export a ZarrsCodecPipeline class so that zarr-python can use it, but it is not meant to be instantiated directly, and we do not guarantee the stability of its API beyond what zarr-python requires. Therefore, it is not documented here.

At the moment, we only support a subset of the zarr-python stores. A NotImplementedError will be raised if a store is not supported.

Configuration

ZarrsCodecPipeline options are exposed through zarr.config.

Standard zarr.config options control some functionality (see the defaults in the config.py of zarr-python):

  • threading.max_workers: the maximum number of threads used internally by the ZarrsCodecPipeline on the Rust side.
  • array.write_empty_chunks: whether or not to store empty chunks.
    • Defaults to False if None. Note that checking for emptiness has some overhead, see here for more info.

The ZarrsCodecPipeline specific options are:

  • codec_pipeline.chunk_concurrent_maximum: the maximum number of chunks stored/retrieved concurrently.
    • Defaults to the number of logical CPUs if None. It is constrained by threading.max_workers as well.
  • codec_pipeline.chunk_concurrent_minimum: the minimum number of chunks retrieved/stored concurrently when balancing chunk/codec concurrency.
    • Defaults to 4 if None. See here for more info.
  • codec_pipeline.validate_checksums: enable checksum validation (e.g. with the CRC32C codec).
    • Defaults to True. See here for more info.
  • codec_pipeline.direct_io: enable O_DIRECT read/write, needs support from the operating system (currently only Linux) and file system.
    • Defaults to False.
  • codec_pipeline.strict: raise exceptions for unsupported operations instead of falling back to the default codec pipeline of zarr-python.
    • Defaults to False.

For example:

zarr.config.set({
    "threading.max_workers": None,
    "array.write_empty_chunks": False,
    "codec_pipeline": {
        "path": "zarrs.ZarrsCodecPipeline",
        "validate_checksums": True,
        "chunk_concurrent_maximum": None,
        "chunk_concurrent_minimum": 4,
        "direct_io": False,
        "strict": False
    }
})

If a ZarrsCodecPipeline is pickled and later unpickled, and one of chunk_concurrent_minimum, chunk_concurrent_maximum, or num_threads has changed in the meantime, the unpickled instance will pick up the new value. Once a ZarrsCodecPipeline object has been instantiated, however, these values are fixed. This may change in the future as guidance from the zarr community becomes clear.

Concurrency

Concurrency can be classified into two types:

  • chunk (outer) concurrency: the number of chunks retrieved/stored concurrently.
    • This is chosen automatically based on various factors, such as the chunk size and codecs.
    • It is constrained between codec_pipeline.chunk_concurrent_minimum and codec_pipeline.chunk_concurrent_maximum for operations involving multiple chunks.
  • codec (inner) concurrency: the number of threads encoding/decoding a chunk.
    • This is chosen automatically in combination with the chunk concurrency.

The product of the chunk and codec concurrency will approximately match threading.max_workers.
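
The relationship between the two kinds of concurrency can be illustrated with a simplified model. The function below is only a sketch under stated assumptions: balance_concurrency and its codec_preferred parameter (how many threads a codec could usefully use per chunk) are hypothetical, and this is not the algorithm zarrs uses internally. It only shows how a thread budget might be split so that the product stays close to threading.max_workers while the chunk concurrency respects the configured minimum and maximum.

```python
import os


def balance_concurrency(max_workers, num_chunks, codec_preferred=1,
                        chunk_min=4, chunk_max=None):
    """Illustrative model only: split a thread budget between chunk (outer)
    and codec (inner) concurrency. Not the algorithm used by zarrs."""
    if min(max_workers, num_chunks, codec_preferred, chunk_min) < 1:
        raise ValueError("all arguments must be >= 1")
    if chunk_max is None:
        chunk_max = os.cpu_count() or 1  # assumed stand-in for "logical CPUs"
    # The chunk maximum is also constrained by the overall thread budget.
    upper = max(1, min(chunk_max, max_workers))
    lower = min(chunk_min, upper)
    # Start from what remains after giving each chunk its preferred number
    # of codec threads, then clamp into [lower, upper].
    chunk = min(max(max_workers // codec_preferred, lower), upper)
    # Never run more chunks concurrently than the operation touches.
    chunk = min(chunk, num_chunks)
    codec = max(1, max_workers // chunk)
    return chunk, codec


# Codecs with little internal parallelism: favor chunk concurrency.
print(balance_concurrency(16, 100, codec_preferred=1, chunk_max=16))  # (16, 1)
# Codecs that parallelize well (e.g. sharding): favor codec concurrency.
print(balance_concurrency(16, 100, codec_preferred=8, chunk_max=16))  # (4, 4)
```

In this model the chunk minimum only takes effect when the codec could otherwise absorb most of the thread budget, and a single-chunk operation gives the whole budget to the codec.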

Chunk concurrency is typically favored because:

  • parallel encoding/decoding can have a high overhead with some codecs, especially with small chunks, and
  • it is advantageous to retrieve/store multiple chunks concurrently, especially with high latency stores.

zarrs-python will often favor codec concurrency with sharded arrays, as they are well suited to codec concurrency.

Supported Indexing Methods

The following indexing methods will fall back to the default zarr-python pipeline:

  1. Any oindex or vindex integer np.ndarray indexing with dimensionality >=3 i.e.,

    arr[np.array([...]), :, np.array([...])]
    arr[np.array([...]), np.array([...]), np.array([...])]
    arr[np.array([...]), np.array([...]), np.array([...])] = ...
    arr.oindex[np.array([...]), np.array([...]), np.array([...])] = ...
    
  2. Any vindex or oindex discontinuous integer np.ndarray indexing for writes in 2D

    arr[np.array([0, 5]), :] = ...
    arr.oindex[np.array([0, 5]), :] = ...
    
  3. vindex writes in 2D where both indexers are integer np.ndarray indices i.e.,

    arr[np.array([...]), np.array([...])] = ...
    
  4. Ellipsis indexing. We have tested some cases, but others fail even with zarr-python's default codec pipeline, so for now we advise proceeding with caution here.

    arr[0:10, ..., 0:5]
    

Furthermore, indexing numeric data with anything other than contiguous indexers (i.e., slices or np.ndarray of consecutive integers) will fall back to the default zarr-python implementation.
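
To make "contiguous" concrete, here is a minimal sketch of a check for consecutive integers. The helper is_contiguous is hypothetical and not part of zarrs-python, and treating only increasing runs as contiguous is an assumption. The actual fallback decision may take more into account than this check. It accepts any iterable of integers, including a 1D np.ndarray.

```python
def is_contiguous(indices):
    """Return True if `indices` is a run of consecutive, increasing integers.

    Hypothetical helper for illustration; not the check used by zarrs-python.
    """
    values = [int(i) for i in indices]
    # Every neighbouring pair must differ by exactly one.
    return all(b - a == 1 for a, b in zip(values, values[1:]))


print(is_contiguous([3, 4, 5]))  # True: equivalent to the slice 3:6
print(is_contiguous([0, 5]))     # False: discontinuous, as in arr[np.array([0, 5]), :]
```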

Please file an issue if you believe we have more holes in our coverage than we are aware of, or if you wish to contribute! For example, we have an issue in zarrs for integer-array indexing that would do a lot to unblock use of the Rust pipeline for that use case (perhaps very useful for mini-batch training!).

Further, any codecs not supported by zarrs will also automatically fall back to the Python implementation.
