Create NumPy NPY files that are larger than main memory

Project description

NpyAppendArray

Create NumPy NPY files by appending along axis 0. The main application is to efficiently create arrays that are larger than main memory:

  1. Embedded devices might have limited memory.
  2. Certain workflows (e.g. Deep Learning) may require handling large amounts of data.

After creation, the file can be read with memory mapping, e.g. by passing mmap_mode="r" to np.load.
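As a minimal sketch of why memory mapping matters here, the following reads an .npy file in row chunks so that only one chunk is resident at a time. It uses only NumPy; the helper name chunked_column_sums, the chunk size and the demo file are illustrative assumptions and not part of this package.

```python
import os
import tempfile

import numpy as np


def chunked_column_sums(path, chunk_rows=1024):
    """Sum over axis 0 of an .npy file without loading it all at once."""
    # mmap_mode="r" maps the file read-only; data is paged in on access.
    data = np.load(path, mmap_mode="r")
    totals = np.zeros(data.shape[1:], dtype=np.float64)
    for start in range(0, data.shape[0], chunk_rows):
        # Only this slice of rows is read from disk in each iteration.
        totals += data[start:start + chunk_rows].sum(axis=0)
    return totals


# Small demo file (hypothetical path) standing in for a large array.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.npy")
np.save(demo_path, np.arange(12, dtype=np.int64).reshape(6, 2))

sums = chunked_column_sums(demo_path, chunk_rows=4)
print(sums)
```

The same pattern applies to a file written with NpyAppendArray, since the result is a regular .npy file.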

Installation

conda install -c conda-forge npy-append-array

or

pip install npy-append-array

Usage

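from npy_append_array import NpyAppendArray
import numpy as np

arr1 = np.array([[1,2],[3,4]])
arr2 = np.array([[1,2],[3,4],[5,6]])

filename = 'out.npy'

# Each append extends the array along axis 0; all other dimensions must match.
with NpyAppendArray(filename) as npaa:
    npaa.append(arr1)
    npaa.append(arr2)
    npaa.append(arr2)

# Memory-map the result instead of loading it fully; its shape is (8, 2).
data = np.load(filename, mmap_mode="r")

print(data)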

Implementation Details

Appending to an array created by np.save might be possible under certain circumstances, since the total byte count of the .npy header is required to be evenly divisible by 64. Thus, there might be some spare padding into which the shape entry of the array descriptor can grow. However, this is not guaranteed, and an append may fail unpredictably once the padding is used up. Initialize the array with NpyAppendArray(filename) directly (see above) so that the header is created with 64 bytes of spare space for growth.
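To make the padding concrete, here is a small sketch that inspects the header np.save writes. It parses the header by hand according to the NPY format (6 magic bytes, 2 version bytes, then a little-endian header length); the function name header_layout is an illustrative assumption and not part of this package or of NumPy.

```python
import io
import struct

import numpy as np


def header_layout(arr):
    """Return (total header bytes, spare padding bytes, header dict text)."""
    buf = io.BytesIO()
    np.save(buf, arr)
    raw = buf.getvalue()

    assert raw[:6] == b"\x93NUMPY"  # magic string
    major = raw[6]
    if major == 1:
        (hlen,) = struct.unpack("<H", raw[8:10])  # 2-byte length in v1.0
        start = 10
    else:
        (hlen,) = struct.unpack("<I", raw[8:12])  # 4-byte length in v2.0+
        start = 12

    header = raw[start:start + hlen].decode("latin1")
    # The dict literal is padded with spaces and terminated by a newline.
    dict_text = header.rstrip("\n").rstrip(" ")
    spare = len(header) - 1 - len(dict_text)
    return start + hlen, spare, dict_text


total, spare, dict_text = header_layout(np.array([[1, 2], [3, 4]]))
print(total, spare, dict_text)
```

The spare count is whatever padding happened to be needed for alignment, which is why growing the shape entry of a file written by np.save is not guaranteed to work.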

Will 64 bytes of extra header space cover my needs?

The extra space allows for up to 10^64 >= 2^212 array entries or data bits. Admittedly, this is less than the number of atoms in the universe. However, due to limits imposed by quantum mechanics, fully populating such an array would require more energy than it would take to boil the oceans; see

https://hbfs.wordpress.com/2009/02/10/to-boil-the-oceans

Therefore, the extra header space might cover your needs.
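The bound above can be checked with a few lines of arithmetic. This sketch assumes that each spare header byte holds one additional decimal digit of the shape entry, so 64 bytes give 64 digits.

```python
import math

spare_bytes = 64

# Assumption: one ASCII decimal digit per spare byte, so the entry count
# can grow by a factor of up to 10**64.
max_growth = 10 ** spare_bytes

# Equivalent number of bits, to compare against the 2^212 figure.
bits = math.log2(max_growth)
whole_bits = math.floor(bits)

print(whole_bits)
```

Since 10^64 lies between 2^212 and 2^213, the stated bound 10^64 >= 2^212 holds.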

Limitations

  1. Only tested on Linux. On Windows, consider using WSL (version 2 or above).
  2. A NotImplementedError is raised when Fortran order is used.
  3. NPY version 3 is unsupported because there is no numpy.lib.format.read_array_header_3_0 function defined in https://numpy.org/devdocs/reference/generated/numpy.lib.format.html
  4. Just like with numpy.load/numpy.save, multithreaded read/write is unsupported.
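One possible way to work around the Fortran-order limitation is to convert such arrays to C order before appending. The sketch below uses only NumPy; the helper name to_c_order is hypothetical, and it assumes the limitation concerns arrays whose memory layout is Fortran-contiguous but not C-contiguous.

```python
import numpy as np


def to_c_order(arr):
    """Return arr in C (row-major) layout, copying only if needed."""
    # flags.fnc is True for Fortran-contiguous arrays that are not
    # also C-contiguous.
    if arr.flags.fnc:
        return np.ascontiguousarray(arr)
    return arr


fortran_arr = np.asfortranarray(np.arange(6).reshape(2, 3))
c_arr = to_c_order(fortran_arr)

print(fortran_arr.flags.c_contiguous, c_arr.flags.c_contiguous)
```

Note that the conversion makes a copy of the chunk, so it is best applied per appended chunk rather than to the full dataset.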
