zarr · PyPI

A minimal implementation of chunked, compressed, N-dimensional arrays for Python.

These details have been verified by PyPI

Owner

Zarr

Maintainers

aliman jakirkham rabernat zarr_dev

Project description

Source code: https://github.com/alimanfoo/zarr
Download: https://pypi.python.org/pypi/zarr

Installation

Installation currently requires NumPy and Cython to be pre-installed. zarr is currently only compatible with Python >= 3.4.

Install from PyPI:

$ pip install -U zarr

Install from GitHub:

$ pip install -U git+https://github.com/alimanfoo/zarr.git@master

Status

Highly experimental, pre-alpha. Bug reports and pull requests are very welcome.

Design goals

Chunking in multiple dimensions
Resize any dimension
Concurrent reads
Concurrent writes
Release the GIL during compression and decompression

Usage

Create an array:

>>> import numpy as np
>>> import zarr
>>> z = zarr.empty((10000, 1000), dtype='i4', chunks=(1000, 100))
>>> z
zarr.ext.Array((10000, 1000), int32, chunks=(1000, 100), cname='blosclz', clevel=5, shuffle=1)
  nbytes: 38.1M; cbytes: 0

Fill it with some data:

>>> z[:] = np.arange(10000000, dtype='i4').reshape(10000, 1000)
>>> z
zarr.ext.Array((10000, 1000), int32, chunks=(1000, 100), cname='blosclz', clevel=5, shuffle=1)
  nbytes: 38.1M; cbytes: 2.0M; ratio: 19.3

Obtain a NumPy array by slicing:

>>> z[:]
array([[      0,       1,       2, ...,     997,     998,     999],
       [   1000,    1001,    1002, ...,    1997,    1998,    1999],
       [   2000,    2001,    2002, ...,    2997,    2998,    2999],
       ...,
       [9997000, 9997001, 9997002, ..., 9997997, 9997998, 9997999],
       [9998000, 9998001, 9998002, ..., 9998997, 9998998, 9998999],
       [9999000, 9999001, 9999002, ..., 9999997, 9999998, 9999999]], dtype=int32)
>>> z[:100]
array([[    0,     1,     2, ...,   997,   998,   999],
       [ 1000,  1001,  1002, ...,  1997,  1998,  1999],
       [ 2000,  2001,  2002, ...,  2997,  2998,  2999],
       ...,
       [97000, 97001, 97002, ..., 97997, 97998, 97999],
       [98000, 98001, 98002, ..., 98997, 98998, 98999],
       [99000, 99001, 99002, ..., 99997, 99998, 99999]], dtype=int32)
>>> z[:, :100]
array([[      0,       1,       2, ...,      97,      98,      99],
       [   1000,    1001,    1002, ...,    1097,    1098,    1099],
       [   2000,    2001,    2002, ...,    2097,    2098,    2099],
       ...,
       [9997000, 9997001, 9997002, ..., 9997097, 9997098, 9997099],
       [9998000, 9998001, 9998002, ..., 9998097, 9998098, 9998099],
       [9999000, 9999001, 9999002, ..., 9999097, 9999098, 9999099]], dtype=int32)

Resize the array and add more data:

>>> z.resize(20000, 1000)
>>> z
zarr.ext.Array((20000, 1000), int32, chunks=(1000, 100), cname='blosclz', clevel=5, shuffle=1)
  nbytes: 76.3M; cbytes: 2.0M; ratio: 38.5
>>> z[10000:, :] = np.arange(10000000, dtype='i4').reshape(10000, 1000)
>>> z
zarr.ext.Array((20000, 1000), int32, chunks=(1000, 100), cname='blosclz', clevel=5, shuffle=1)
  nbytes: 76.3M; cbytes: 4.0M; ratio: 19.3

For convenience, an append() method is also available, which can be used to append data along any axis:

>>> a = np.arange(10000000, dtype='i4').reshape(10000, 1000)
>>> z = zarr.array(a, chunks=(1000, 100))
>>> z
zarr.ext.Array((10000, 1000), int32, chunks=(1000, 100), cname='blosclz', clevel=5, shuffle=1)
  nbytes: 38.1M; cbytes: 2.0M; ratio: 19.3
>>> z.append(a+a)
>>> z
zarr.ext.Array((20000, 1000), int32, chunks=(1000, 100), cname='blosclz', clevel=5, shuffle=1)
  nbytes: 76.3M; cbytes: 3.6M; ratio: 21.2
>>> z.append(np.vstack([a, a]), axis=1)
>>> z
zarr.ext.Array((20000, 2000), int32, chunks=(1000, 100), cname='blosclz', clevel=5, shuffle=1)
  nbytes: 152.6M; cbytes: 7.6M; ratio: 20.2
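
The nbytes figures in the reprs above can be checked by hand. The sketch below is plain Python and does not call zarr; the helper names are hypothetical, and it assumes that "M" in the repr means 2**20 bytes, which is consistent with all three sizes shown but is not stated in this description.

```python
from math import prod


def uncompressed_nbytes(shape, itemsize):
    """Number of bytes needed to hold the array without compression."""
    return prod(shape) * itemsize


def format_mebibytes(nbytes):
    """Format a byte count the way the reprs above appear to (assumed: M = 2**20 bytes)."""
    return f"{nbytes / 2**20:.1f}M"


# 'i4' is a 4-byte integer, so itemsize is 4.
for shape in [(10000, 1000), (20000, 1000), (20000, 2000)]:
    print(shape, format_mebibytes(uncompressed_nbytes(shape, 4)))
# (10000, 1000) 38.1M
# (20000, 1000) 76.3M
# (20000, 2000) 152.6M
```

The ratio reported next to cbytes is nbytes divided by cbytes, so resizing without writing new data raises the ratio (38.5 above) because nbytes doubles while cbytes stays the same.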

Tuning

zarr is designed for use in parallel computations working chunk-wise over data. Try it with dask.array.
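
As a rough illustration of what working chunk-wise means, the sketch below enumerates one tuple of slices per chunk for a given array shape and chunk shape. It is plain Python and does not use zarr or dask; chunk_slices is a hypothetical helper, not part of either library.

```python
from itertools import product


def chunk_slices(shape, chunks):
    """Yield one tuple of slices per chunk, together covering the whole array."""
    per_axis = [
        # Slices along one axis; the last one is clipped to the array edge.
        [slice(start, min(start + c, n)) for start in range(0, n, c)]
        for n, c in zip(shape, chunks)
    ]
    yield from product(*per_axis)


blocks = list(chunk_slices((10000, 1000), (1000, 100)))
print(len(blocks))  # 100
print(blocks[0])    # (slice(0, 1000, None), slice(0, 100, None))
```

Each tuple lines up with exactly one chunk, so a worker handed `z[block]` for one of these tuples would read or write a single chunk and not share chunks with other workers.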

zarr is optimised for accessing and storing data in contiguous slices that are the same size as, or larger than, the chunks. It is not, and will never be, optimised for single-item access.
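
To see why small selections are costly, the following sketch counts how many chunks a selection overlaps. It is plain Python with a hypothetical helper, and it assumes that reading any part of a chunk means decompressing the whole chunk; that is an assumption made for illustration, not a statement about zarr's internals.

```python
def chunks_touched(selection, chunks):
    """Count chunks overlapped by a selection given as (start, stop) per axis."""
    count = 1
    for (start, stop), c in zip(selection, chunks):
        first = start // c        # index of the first chunk along this axis
        last = (stop - 1) // c    # index of the last chunk along this axis
        count *= last - first + 1
    return count


chunks = (1000, 100)
items_per_chunk = chunks[0] * chunks[1]  # 100000 items in each chunk

# z[:100] on a (10000, 1000) array: 100 rows, all 1000 columns.
n = chunks_touched(((0, 100), (0, 1000)), chunks)
print(n, n * items_per_chunk)  # 10 1000000

# A single item, z[5, 5].
n = chunks_touched(((5, 6), (5, 6)), chunks)
print(n, n * items_per_chunk)  # 1 100000
```

Under that assumption, z[:100] handles 1,000,000 items to return 100,000, while a single-item read handles 100,000 items to return one.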

Chunk sizes >= 1M are generally good. The optimal chunk shape will depend on the correlation structure in your data.

Acknowledgments

zarr uses c-blosc internally for compression and decompression and borrows code heavily from bcolz.

Release history

3.1.5

Nov 21, 2025

3.1.4

Nov 21, 2025

3.1.3

Sep 18, 2025

3.1.2

Aug 25, 2025

3.1.1

Jul 30, 2025

3.1.0

Jul 15, 2025

3.0.10

Jul 3, 2025

3.0.9

Jul 1, 2025

3.0.8

May 19, 2025

3.0.7 yanked

Apr 21, 2025