Functions to efficiently rechunk multidimensional arrays
rechunkit
Documentation: https://mullenkamp.github.io/rechunkit/
Source Code: https://github.com/mullenkamp/rechunkit
Introduction
Rechunkit is a Python library for efficiently rechunking multidimensional NumPy arrays stored as chunks. It uses generators to rechunk on the fly, so the full target array never has to be held in memory.
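To make the generator idea concrete, here is a minimal sketch of walking a chunk grid lazily. It is not rechunkit's implementation; `iter_chunk_slices` is a hypothetical helper written for illustration. It yields one tuple of slices per chunk, the kind of index that can be used directly on a NumPy array.

```python
from itertools import product


def iter_chunk_slices(shape, chunk_shape):
    """Yield one tuple of slices per chunk of a regular chunk grid.

    Hypothetical helper for illustration only; not part of rechunkit.
    Edge chunks are clipped to the array shape.
    """
    per_axis = [
        [slice(start, min(start + size, length)) for start in range(0, length, size)]
        for length, size in zip(shape, chunk_shape)
    ]
    # product() hands out one chunk index at a time instead of building
    # the full list of chunks up front.
    yield from product(*per_axis)


for chunk in iter_chunk_slices((5, 3), (2, 2)):
    print(chunk)
```

Because the chunks are produced one at a time, a consumer only ever needs to hold the data for the chunk it is currently processing.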
Key Features
- Efficient On-the-Fly Rechunking: Uses Python generators to yield rechunked data without requiring the full target array to be stored in memory.
- Memory-Aware Optimization: Employs a smart scaling algorithm to maximize performance within a user-defined memory limit (`max_mem`).
- LCM Minimization: Utilizes highly composite numbers for chunk guessing to minimize the Least Common Multiple (LCM) between source and target, significantly reducing redundant reads.
- Flexible Data Access: Supports subset selection (`sel`) and works with any source that implements a numpy `__getitem__`-style callable (method or function).
- Preprocessing Utilities: Includes tools for estimating ideal chunk shapes, calculating memory requirements, and predicting the number of required read operations.
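The role of the LCM can be illustrated with a short sketch. This is not rechunkit's algorithm, and `aligned_block_shape` and `block_nbytes` are hypothetical helpers. The per-axis LCM of two chunk shapes gives the smallest block whose edges line up with both chunk grids, so a block of that size can be read once and written out without touching any source chunk twice. The chunk shapes and the 4-byte item size below are example values.

```python
from math import lcm, prod  # math.lcm requires Python 3.9+


def aligned_block_shape(source_chunks, target_chunks):
    """Smallest block shape aligned with both chunk grids (per-axis LCM).

    Hypothetical helper for illustration only; not part of rechunkit.
    """
    return tuple(lcm(s, t) for s, t in zip(source_chunks, target_chunks))


def block_nbytes(block_shape, itemsize):
    """Memory needed to hold one block of the given shape."""
    return prod(block_shape) * itemsize


block = aligned_block_shape((5, 2, 4), (4, 5, 3))
print(block)                   # (20, 10, 12)
print(block_nbytes(block, 4))  # 9600 bytes for a 4-byte dtype

# A highly composite chunk length shares more factors with other sizes,
# which keeps the LCM small:
print(lcm(8, 12))  # 24
print(lcm(8, 13))  # 104
```

When the aligned block does not fit in the available memory, some source chunks have to be read more than once, which is why a smaller LCM tends to mean fewer redundant reads.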
Installation
pip install rechunkit
Quick Example
import numpy as np
from math import prod
from rechunkit import rechunker

# Build a small 3-D source array and expose it through its __getitem__ method
shape = (31, 31, 31)
dtype = np.dtype('int32')
source_data = np.arange(1, prod(shape) + 1, dtype=dtype).reshape(shape)
source = source_data.__getitem__

# Write each yielded chunk into the target array as it arrives
target = np.zeros(shape, dtype=dtype)
for write_chunk, data in rechunker(source, shape, dtype, (5, 2, 4), (4, 5, 3), max_mem=2000):
    target[write_chunk] = data

# The rechunked target must match the source exactly
assert np.all(source_data == target)
See the documentation for detailed guides, integration examples (h5py, zarr), and the full API reference.
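For a sense of what redundant reads cost, the sketch below counts the source-chunk reads of a naive strategy that fills one target chunk at a time and re-reads every source chunk it overlaps. It is an illustration only, not rechunkit's read-count utilities; `n_chunks` and `naive_reads` are hypothetical helpers, and treating (5, 2, 4) as the source chunk shape and (4, 5, 3) as the target is an assumption about the argument order in the example above.

```python
from math import ceil, prod


def n_chunks(shape, chunk_shape):
    """Number of chunks in a regular grid covering the array."""
    return prod(ceil(n / c) for n, c in zip(shape, chunk_shape))


def _overlaps_along_axis(length, source_chunk, target_chunk):
    """Sum, over the target chunks on one axis, of source chunks overlapped."""
    total = 0
    for start in range(0, length, target_chunk):
        stop = min(start + target_chunk, length)
        first = start // source_chunk        # first overlapped source chunk
        last = (stop - 1) // source_chunk    # last overlapped source chunk
        total += last - first + 1
    return total


def naive_reads(shape, source_chunks, target_chunks):
    """Source-chunk reads when each target chunk is filled independently.

    Hypothetical helper for illustration only; not part of rechunkit.
    The per-axis overlap counts multiply because the grid is separable.
    """
    return prod(
        _overlaps_along_axis(n, s, t)
        for n, s, t in zip(shape, source_chunks, target_chunks)
    )


shape = (31, 31, 31)
print(n_chunks(shape, (5, 2, 4)))                # 896 source chunks
print(n_chunks(shape, (4, 5, 3)))                # 616 target chunks
print(naive_reads(shape, (5, 2, 4), (4, 5, 3)))  # 3952 reads
```

In this setup the naive strategy reads each source chunk more than four times on average (3952 reads for 896 chunks), which is the overhead that memory-aware rechunking aims to reduce.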
License
This project is licensed under the terms of the Apache Software License 2.0.
File details
Details for the file rechunkit-0.5.0.tar.gz.
File metadata
- Download URL: rechunkit-0.5.0.tar.gz
- Upload date:
- Size: 9.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1e982af84bb801d6426e60706e86765522f1ffb2def7493f616ba339f190f290 |
| MD5 | 9f82d578ff2921188484c7481e1b032b |
| BLAKE2b-256 | b4a73012bec981ab9a6632f3b86986157af54b48d6a0391c2f696063483ebd4c |
File details
Details for the file rechunkit-0.5.0-py3-none-any.whl.
File metadata
- Download URL: rechunkit-0.5.0-py3-none-any.whl
- Upload date:
- Size: 8.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a9514039f0a209984382da6f8d1655260cff96444a86bf5c2ef60fc217ed523a |
| MD5 | a0b1f42616a759389d68f0da4ded9051 |
| BLAKE2b-256 | 962bad9a9845e90c0ec8f41c5e86f6c0ff1cee4e6f75f7f729b1a7092ae3540a |