rechunkit

Functions to efficiently rechunk multidimensional arrays


Documentation: https://mullenkamp.github.io/rechunkit/

Source Code: https://github.com/mullenkamp/rechunkit


Introduction

Rechunkit is a Python library for efficiently rechunking multidimensional numpy arrays stored as chunks. It uses a generator-based approach for on-the-fly rechunking without requiring the full target array in memory.
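To illustrate the generator-based idea, here is a minimal, naive sketch in plain NumPy. It is not rechunkit's implementation. It walks the target chunk grid and yields one (slices, data) pair per target chunk, reading each region through a __getitem__-style callable. The helper names chunk_slices and naive_rechunk are hypothetical. The sketch deliberately ignores source chunk boundaries and memory limits, which are the problems that rechunkit's LCM minimization and max_mem scaling are described as addressing.

```python
from itertools import product
from math import ceil

import numpy as np


def chunk_slices(shape, chunk_shape):
    """Yield tuples of slices that tile `shape` with blocks of `chunk_shape`."""
    # Number of chunks per dimension; the last chunk along an axis may be partial
    counts = [ceil(s / c) for s, c in zip(shape, chunk_shape)]
    for idx in product(*(range(n) for n in counts)):
        yield tuple(
            slice(i * c, min((i + 1) * c, s))
            for i, c, s in zip(idx, chunk_shape, shape)
        )


def naive_rechunk(source, shape, target_chunk_shape):
    """Yield (target_slices, data) pairs, one target chunk at a time.

    `source` is any callable taking a tuple of slices, e.g. ndarray.__getitem__.
    Hypothetical helper: it reads each target region directly, so with a chunked
    on-disk source one read may touch several source chunks.
    """
    for target_slices in chunk_slices(shape, target_chunk_shape):
        yield target_slices, source(target_slices)


# Usage: rebuild a (31, 31, 31) array chunk by chunk with (4, 5, 3) target chunks
shape = (31, 31, 31)
source_data = np.arange(1, 31**3 + 1, dtype="int32").reshape(shape)
target = np.zeros(shape, dtype="int32")
for write_chunk, data in naive_rechunk(source_data.__getitem__, shape, (4, 5, 3)):
    target[write_chunk] = data
assert np.array_equal(source_data, target)
```

Because naive_rechunk is a generator, the caller decides where each chunk goes (a NumPy array, an HDF5 dataset, a Zarr array), and only one target chunk is handled per iteration.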

Key Features

  • Efficient On-the-Fly Rechunking: Uses Python generators to yield rechunked data without requiring the full target array to be stored in memory.
  • Memory-Aware Optimization: Employs a scaling algorithm to maximize performance within a user-defined memory limit (max_mem).
  • LCM Minimization: Uses highly composite numbers when guessing chunk shapes to minimize the Least Common Multiple (LCM) between the source and target chunk shapes, significantly reducing redundant reads.
  • Flexible Data Access: Supports subset selection (sel) and works with any source exposed as a NumPy-style __getitem__ callable, whether a method or a plain function.
  • Preprocessing Utilities: Includes tools for estimating ideal chunk shapes, calculating memory requirements, and predicting the number of required read operations.
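The LCM idea and the read-count estimate can be made concrete with a few lines of Python. The helpers below (lcm_block_shape, block_nbytes, naive_read_count, source_chunk_count) are hypothetical illustrations, not rechunkit's preprocessing API. They compute the per-dimension LCM block that lines up with both the source and target chunk grids, its size in bytes, and how many source-chunk reads a naive strategy needs (every target chunk re-reads every source chunk it overlaps) compared with reading each source chunk exactly once. math.lcm requires Python 3.9 or newer.

```python
from math import ceil, lcm, prod

import numpy as np


def lcm_block_shape(source_chunk_shape, target_chunk_shape):
    """Per-dimension LCM: the smallest block aligned to both chunk grids."""
    return tuple(lcm(s, t) for s, t in zip(source_chunk_shape, target_chunk_shape))


def block_nbytes(block_shape, dtype):
    """Bytes needed to hold one block of `block_shape` elements of `dtype`."""
    return prod(block_shape) * np.dtype(dtype).itemsize


def naive_read_count(shape, source_chunk_shape, target_chunk_shape):
    """Source-chunk reads if each target chunk re-reads every source chunk it overlaps.

    The total factorises per dimension, so the overlaps are summed along each
    axis and the per-axis sums are multiplied together.
    """
    total = 1
    for size, sc, tc in zip(shape, source_chunk_shape, target_chunk_shape):
        axis_sum = 0
        for start in range(0, size, tc):
            stop = min(start + tc, size)
            # Indices of the first and last source chunk touched by [start, stop)
            axis_sum += (stop - 1) // sc - start // sc + 1
        total *= axis_sum
    return total


def source_chunk_count(shape, source_chunk_shape):
    """Reads needed if every source chunk is read exactly once."""
    return prod(ceil(s / c) for s, c in zip(shape, source_chunk_shape))


shape, src_chunks, tgt_chunks = (31, 31, 31), (5, 2, 4), (4, 5, 3)
block = lcm_block_shape(src_chunks, tgt_chunks)
print(block)                                            # (20, 10, 12)
print(block_nbytes(block, "int32"))                     # 9600
print(naive_read_count(shape, src_chunks, tgt_chunks))  # 3952
print(source_chunk_count(shape, src_chunks))            # 896
```

With the chunk shapes used in the Quick Example, the fully aligned LCM block needs 9600 bytes, above max_mem=2000 (assuming max_mem is measured in bytes). That suggests why a memory-aware strategy cannot always read whole LCM blocks. The naive strategy also performs roughly 4.4 times as many source-chunk reads as the 896 needed to read each source chunk once.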

Installation

pip install rechunkit

Quick Example

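import numpy as np
from math import prod
from rechunkit import rechunker

# Source: a 31 x 31 x 31 int32 array, read through its __getitem__ method
shape = (31, 31, 31)
dtype = np.dtype('int32')
source_data = np.arange(1, prod(shape) + 1, dtype=dtype).reshape(shape)
source = source_data.__getitem__

# Rechunk from source chunks of (5, 2, 4) to target chunks of (4, 5, 3),
# writing each yielded (write_chunk, data) pair into the preallocated target
target = np.zeros(shape, dtype=dtype)
for write_chunk, data in rechunker(source, shape, dtype, (5, 2, 4), (4, 5, 3), max_mem=2000):
    target[write_chunk] = data

assert np.all(source_data == target)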

See the documentation for detailed guides, integration examples (h5py, zarr), and the full API reference.

License

This project is licensed under the terms of the Apache Software License 2.0.
