Create Numpy NPY files that are larger than the main memory
NpyAppendArray
Create NumPy NPY files by appending along the zero axis. The main application is to efficiently create arrays that are larger than the main memory:
- Embedded devices might have limited memory
- Certain workflows (e.g. Deep Learning) may require handling large amounts of data
After creation, the file can then be read with memory mapping, e.g. by adding mmap_mode="r".
Installation
conda install -c conda-forge npy-append-array
or
pip install npy-append-array
Usage
from npy_append_array import NpyAppendArray
import numpy as np

arr1 = np.array([[1,2],[3,4]])
arr2 = np.array([[1,2],[3,4],[5,6]])

filename = 'out.npy'

# Each append adds rows along the zero axis of the array stored in the file.
with NpyAppendArray(filename) as npaa:
    npaa.append(arr1)
    npaa.append(arr2)
    npaa.append(arr2)

# Memory-map the result instead of loading it into memory at once.
data = np.load(filename, mmap_mode="r")

print(data)
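Once such a file exists, the memory-mapped array can be consumed in slices so that only the current slice needs to be resident in memory. The following is a minimal sketch using NumPy alone; the helper name `batched_column_sums` and the batch size are hypothetical, and the small file is written with `np.save` only as a stand-in for a file produced by appending.

```python
import os
import tempfile

import numpy as np


def batched_column_sums(path, batch_rows):
    """Sum each column of a 2-D .npy file, reading batch_rows rows at a time.

    Hypothetical helper for illustration; not part of npy-append-array.
    """
    # mmap_mode="r" returns a read-only numpy.memmap instead of reading the file.
    data = np.load(path, mmap_mode="r")
    totals = np.zeros(data.shape[1], dtype=np.float64)
    for start in range(0, data.shape[0], batch_rows):
        # Slicing the memmap and converting it copies only this batch into memory.
        batch = np.asarray(data[start:start + batch_rows])
        totals += batch.sum(axis=0)
    return totals


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.npy")
    # Stand-in for a large file; a real one would be far bigger than this.
    np.save(path, np.arange(20, dtype=np.int64).reshape(10, 2))
    totals = batched_column_sums(path, batch_rows=3)

print(totals)
```

The batch size trades memory use against the number of read operations; any value that fits comfortably in memory should work.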
Implementation Details
Appending to an array created by np.save might be possible under certain circumstances, since the total .npy header byte count is required to be evenly divisible by 64. Thus, there might be some spare space to grow the shape entry in the array descriptor. However, this is not guaranteed and might randomly fail. Initialize the array with NpyAppendArray(filename) directly (see above) so the header will be created with 64 bytes of spare header space for growth.
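To see the header padding described above, the header of a file written by `np.save` can be inspected with the public `numpy.lib.format` functions. This is a rough sketch, not the library's own code: the helper name `describe_npy_header` is hypothetical, an in-memory buffer stands in for a file on disk, and a reasonably recent NumPy (one that pads headers to 64 bytes) is assumed.

```python
import io

import numpy as np
from numpy.lib import format as npy_format


def describe_npy_header(fileobj):
    """Return (version, shape, fortran_order, header_bytes) for an open .npy file.

    Hypothetical helper for illustration; not part of npy-append-array.
    """
    version = npy_format.read_magic(fileobj)
    if version == (1, 0):
        shape, fortran_order, _dtype = npy_format.read_array_header_1_0(fileobj)
    elif version == (2, 0):
        shape, fortran_order, _dtype = npy_format.read_array_header_2_0(fileobj)
    else:
        raise NotImplementedError(f"unsupported NPY version {version}")
    # After the header is parsed, the position marks where the array data starts.
    header_bytes = fileobj.tell()
    return version, shape, fortran_order, header_bytes


buf = io.BytesIO()
np.save(buf, np.array([[1, 2], [3, 4]]))
buf.seek(0)

version, shape, fortran_order, header_bytes = describe_npy_header(buf)
print(version, shape, fortran_order, header_bytes)
```

The shape entry is stored as text inside the header, so it can only grow in place while padding remains; how much padding `np.save` leaves depends on the length of the header text.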
Will 64 bytes of extra header space cover my needs?
It allows for up to 10^64 >= 2^212 array entries or data bits. Indeed, this is less than the number of atoms in the universe. However, fully populating such an array, due to limits imposed by quantum mechanics, would require more energy than would be needed to boil the oceans, compare
https://hbfs.wordpress.com/2009/02/10/to-boil-the-oceans
Therefore, the extra header space might cover your needs.
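The inequality quoted above can be checked with Python's arbitrary-precision integers. The reading that each spare header byte holds one more decimal digit of the shape entry is an assumption made here to explain where 10^64 comes from.

```python
# Assumption: 64 spare header bytes allow 64 additional decimal digits
# in the shape entry, i.e. up to 10**64 entries.
spare_bytes = 64
max_entries = 10 ** spare_bytes

# Python integers are exact, so the comparison involves no rounding.
claim_holds = max_entries >= 2 ** 212

# bit_length() is the number of bits needed to represent the value.
bits_needed = max_entries.bit_length()

print(claim_holds, bits_needed)
```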
Limitations
- Only tested with Linux. For Windows, consider using WSL (version 2 or above).
- NotImplementedError is raised when Fortran order is used.
- NPY version 3 is unsupported because there is no numpy.lib.format.read_array_header_3_0 function defined in https://numpy.org/devdocs/reference/generated/numpy.lib.format.html
- Just like with numpy.load/numpy.save, multithreaded read/write is unsupported.
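Regarding the Fortran-order limitation, one possible workaround is to convert an array to C order before appending it. The sketch below uses NumPy only; the helper name `to_c_order` is hypothetical, and how the library itself detects Fortran order is not stated here, so the flag check is an assumption.

```python
import numpy as np


def to_c_order(arr):
    """Return arr unchanged if it is C-contiguous, otherwise a C-ordered copy.

    Hypothetical helper for illustration; not part of npy-append-array.
    """
    if arr.flags.c_contiguous:
        return arr
    # ascontiguousarray copies the data into row-major (C) memory layout.
    return np.ascontiguousarray(arr)


fortran_arr = np.asfortranarray(np.array([[1, 2], [3, 4], [5, 6]]))
c_arr = to_c_order(fortran_arr)

print(fortran_arr.flags.c_contiguous, c_arr.flags.c_contiguous)
```

The conversion copies the array, so it temporarily needs memory for both layouts of the chunk being appended.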