NumPack

NumPack is a high-performance array storage and manipulation library designed to handle large NumPy arrays efficiently. Built with Rust for performance and exposed to Python through PyO3, NumPack provides a simple interface for storing, loading, and modifying large numerical arrays, and is often faster than NumPy's built-in NPY and NPZ storage formats (see the benchmarks below).

Features

  • High Performance: Optimized for both reading and writing large numerical arrays
  • Memory Mapping Support: Efficient memory usage through memory mapping capabilities
  • Selective Loading: Load only the arrays you need, when you need them
  • In-place Operations: Support for in-place array modifications without full file rewrite
  • Parallel I/O: Utilizes parallel processing for improved performance
  • Multiple Data Types: Supports various numerical data types including:
    • Boolean
    • Unsigned integers (8-bit to 64-bit)
    • Signed integers (8-bit to 64-bit)
    • Floating point (32-bit and 64-bit)
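
As a rough guide, the categories above line up with the NumPy dtypes below. The mapping is an assumption drawn from the feature list, not from NumPack's source, and the names `SUPPORTED_DTYPES` and `is_listed_dtype` are hypothetical helpers for checking an array before saving it.

```python
import numpy as np

# Assumed mapping from the feature list to NumPy dtypes (not taken from NumPack's source).
SUPPORTED_DTYPES = [
    np.bool_,
    np.uint8, np.uint16, np.uint32, np.uint64,
    np.int8, np.int16, np.int32, np.int64,
    np.float32, np.float64,
]

def is_listed_dtype(array: np.ndarray) -> bool:
    """Return True if the array's dtype falls in one of the listed categories."""
    return any(array.dtype == np.dtype(t) for t in SUPPORTED_DTYPES)

# Print each dtype with its per-element size in bytes.
for t in SUPPORTED_DTYPES:
    dt = np.dtype(t)
    print(f"{dt.name:8s} {dt.itemsize} byte(s) per element")
```

A check like this can catch unsupported inputs, such as complex or float16 arrays, before they reach the storage layer.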

Installation

pip install numpack

Requirements

  • Python >= 3.10
  • NumPy

Usage

Basic Operations

import numpy as np
from numpack import NumPack

# Create a NumPack instance
npk = NumPack("data_directory")

# Save arrays
arrays = {
    'array1': np.random.rand(1000, 100).astype(np.float32),
    'array2': np.random.rand(500, 200).astype(np.float32)
}
npk.save(arrays)

# Load arrays
# Normal mode
loaded = npk.load("array1")

# Memory mapping mode for large arrays
with npk.mmap_mode() as mmap_npk:
    # Access specific arrays
    array1 = mmap_npk.load('array1')
    array2 = mmap_npk.load('array2')

Advanced Operations

# Replace specific rows
replacement = np.random.rand(10, 100).astype(np.float32)
npk.replace({'array1': replacement}, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])  # Using list indices
npk.replace({'array1': replacement}, slice(0, 10))  # Using slice notation

# Append new arrays
new_arrays = {
    'array3': np.random.rand(200, 100).astype(np.float32)
}
npk.append(new_arrays)

# Drop arrays or specific rows (each call below is an independent example)
npk.drop('array1')  # Drop entire array
npk.drop(['array1', 'array2'])  # Drop multiple arrays
npk.drop('array2', [0, 1, 2])  # Drop specific rows

# Random access operations
data = npk.getitem('array1', [0, 1, 2])  # Access specific rows
data = npk.getitem('array1', slice(0, 10))  # Access using slice
data = npk['array1']  # Dictionary-style access for entire array

# Metadata operations
shapes = npk.get_shape()  # Get shapes of all arrays
shapes = npk.get_shape('array1')  # Get shape of specific array
members = npk.get_member_list()  # Get list of array names
mtime = npk.get_modify_time('array1')  # Get modification time
metadata = npk.get_metadata()  # Get complete metadata

# Stream loading for large arrays
for batch in npk.stream_load('array1', buffer_size=1000):
    # Process 1000 rows at a time; process_batch is a user-defined function
    process_batch(batch)

# Reset/clear storage
npk.reset()  # Clear all arrays

# Iterate over all arrays
for array_name in npk:
    data = npk[array_name]
    print(f"{array_name} shape: {data.shape}")
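
The stream loading example leaves the per-batch work undefined. The sketch below shows one possible per-batch reduction: column means computed while holding only one batch at a time. `iter_row_batches` is a hypothetical stand-in that yields row blocks from an in-memory array; it mimics the shape of a batch iterator and is not NumPack's implementation.

```python
import numpy as np
from typing import Iterator

def iter_row_batches(array: np.ndarray, buffer_size: int) -> Iterator[np.ndarray]:
    """Hypothetical stand-in for a batch iterator: yields consecutive row blocks."""
    for start in range(0, array.shape[0], buffer_size):
        yield array[start:start + buffer_size]

def column_means(batches: Iterator[np.ndarray]) -> np.ndarray:
    """Compute per-column means while holding only one batch in memory."""
    total = None
    rows = 0
    for batch in batches:
        batch_sum = batch.sum(axis=0, dtype=np.float64)  # accumulate in float64
        total = batch_sum if total is None else total + batch_sum
        rows += batch.shape[0]
    if total is None:
        raise ValueError("no batches were provided")
    return total / rows

data = np.arange(12, dtype=np.float32).reshape(6, 2)
means = column_means(iter_row_batches(data, buffer_size=4))
```

With NumPack, the same function would presumably be fed the iterator returned by `npk.stream_load('array1', buffer_size=1000)` instead of the stand-in.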

Memory Mapping Mode

For large arrays, memory mapping mode provides more efficient memory usage:

# Using memory mapping mode
with npk.mmap_mode() as mmap_npk:
    # Access specific arrays
    array1 = mmap_npk.load('array1')  # Array is not fully loaded into memory
    array2 = mmap_npk.load('array2')
    
    # Perform operations on memory-mapped arrays
    result = array1[0:1000] + array2[0:1000]
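
For comparison, plain NumPy offers memory mapping for NPY files through `np.load(..., mmap_mode=...)`. The sketch below uses only NumPy and a temporary file; it illustrates the general idea of touching only the rows you slice and says nothing about how NumPack implements its own mmap mode.

```python
import os
import tempfile
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "array1.npy")
    np.save(path, np.arange(20, dtype=np.float32).reshape(10, 2))

    # mmap_mode="r" maps the file read-only instead of reading it all at once.
    mapped = np.load(path, mmap_mode="r")
    is_memmap = isinstance(mapped, np.memmap)

    # Slicing and copying materializes only the selected rows in memory.
    first_rows = np.array(mapped[0:3])

    # Drop the reference so the mapping is released before the directory is removed.
    del mapped
```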

Performance Optimization Tips

  1. Batch Operations:

    • Prefer batch replacements over row-by-row operations when modifying multiple rows
    • Use stream_load for processing large arrays to control memory usage
  2. Memory Management:

    • Use memory mapping mode for large arrays
    • Release array references when no longer needed
  3. Storage Optimization:

    • Organize data structures efficiently to minimize modification frequency
    • Use reset() appropriately to clean up unnecessary data
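
The first tip can be sketched as follows: gather pending row updates and merge them into a single index list and a single stacked array, so that one replacement call covers all of them. `collect_updates` is a hypothetical helper, and the final call is shown as a comment that follows the `replace()` form from Advanced Operations.

```python
import numpy as np

def collect_updates(updates: dict[int, np.ndarray]) -> tuple[list[int], np.ndarray]:
    """Merge per-row updates into one index list and one stacked array."""
    indices = sorted(updates)                       # deterministic row order
    rows = np.stack([updates[i] for i in indices])  # shape: (len(indices), n_cols)
    return indices, rows

# Three pending single-row updates, keyed by row index.
pending = {
    7: np.full(4, 7.0, dtype=np.float32),
    2: np.full(4, 2.0, dtype=np.float32),
    5: np.full(4, 5.0, dtype=np.float32),
}
indices, rows = collect_updates(pending)

# One call instead of three (assumes an existing NumPack instance named npk):
# npk.replace({'array1': rows}, indices)
```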

Performance

NumPack offers significant performance improvements compared to traditional NumPy storage methods, especially in data modification operations and random access. Below are detailed benchmark results:

Benchmark Results

The following benchmarks were performed on a MacBook Pro (M1, 2020, 32 GB memory) with arrays of size 1M x 10 and 500K x 5 (float32).
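
The benchmark script itself is not shown here. As a minimal sketch of how the NPZ baseline could be timed, the code below measures save, full load, and selective load with `time.perf_counter`. It is not the authors' harness: `time_call` is a hypothetical helper, and the arrays are smaller than the benchmark's so the sketch runs quickly.

```python
import os
import tempfile
import time
import numpy as np

def time_call(fn, repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

# Smaller arrays than the 1M x 10 and 500K x 5 benchmark arrays, to keep the sketch quick.
arrays = {
    "array1": np.random.rand(10_000, 10).astype(np.float32),
    "array2": np.random.rand(5_000, 5).astype(np.float32),
}

with tempfile.TemporaryDirectory() as tmp:
    npz_path = os.path.join(tmp, "data.npz")

    save_time = time_call(lambda: np.savez(npz_path, **arrays))

    def full_load():
        with np.load(npz_path) as npz:
            return {name: npz[name] for name in npz.files}

    def selective_load():
        with np.load(npz_path) as npz:
            return npz["array1"]  # only this member is read from the archive

    full_time = time_call(full_load)
    selective_time = time_call(selective_load)
    restored = full_load()

print(f"save {save_time:.4f}s, full load {full_time:.4f}s, selective load {selective_time:.4f}s")
```

Taking the best of several runs reduces noise from disk caching and other processes; absolute numbers will differ by machine.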

Storage Operations

Operation      | NumPack                       | NumPy NPZ | NumPy NPY
Save           | 0.014s (0.93x NPZ, 0.57x NPY) | 0.013s    | 0.008s
Full Load      | 0.008s (1.75x NPZ, 1.00x NPY) | 0.014s    | 0.008s
Selective Load | 0.005s (2.00x NPZ, -)         | 0.010s    | -
Mmap Load      | 0.006s (2.17x NPZ, 0.00x NPY) | 0.013s    | 0.000s

Data Modification Operations

Operation                 | NumPack                         | NumPy NPZ | NumPy NPY
Single Row Replace        | 0.000s (23.00x NPZ, 14.00x NPY) | 0.023s    | 0.014s
Continuous Rows (10K)     | 0.001s (23.00x NPZ, 12.00x NPY) | 0.023s    | 0.012s
Random Rows (10K)         | 0.015s (1.53x NPZ, 0.87x NPY)   | 0.023s    | 0.013s
Large Data Replace (500K) | 0.019s (1.16x NPZ, 0.79x NPY)   | 0.022s    | 0.015s
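
For context on the NPY column: one way to modify rows of an NPY file without rewriting it is to open it with `np.load(..., mmap_mode="r+")` and assign into the mapping. Whether the benchmark's NPY baseline works this way is not stated, so treat the following as an illustrative sketch using only NumPy.

```python
import os
import tempfile
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "array1.npy")
    np.save(path, np.zeros((100, 4), dtype=np.float32))

    # "r+" maps the file read-write, so assignments go to the file itself.
    mapped = np.load(path, mmap_mode="r+")
    mapped[10:20] = 1.0          # contiguous rows
    mapped[[3, 50, 97]] = 2.0    # scattered rows
    mapped.flush()               # push pending changes to disk
    del mapped                   # release the mapping before re-reading

    reloaded = np.load(path)
```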

Drop Operations

Operation        | NumPack                       | NumPy NPZ | NumPy NPY
Drop Array       | 0.001s (24.00x NPZ, 1.00x NPY) | 0.024s    | 0.001s
Drop Rows (500K) | 0.036s (1.36x NPZ, 0.86x NPY) | 0.049s    | 0.031s

Append Operations

Operation | NumPack            | NumPy NPZ
Append    | 0.003s (5.33x NPZ) | 0.016s

Random Access Performance (10K indices)

Operation     | NumPack                       | NumPy NPZ | NumPy NPY
Random Access | 0.008s (1.88x NPZ, 1.13x NPY) | 0.015s    | 0.009s

File Size Comparison

Format  | Size     | Ratio
NumPack | 47.68 MB | 1.0x
NPZ     | 47.68 MB | 1.0x
NPY     | 47.68 MB | 1.0x
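
The reported size can be checked against the raw payload of the benchmark arrays. The sketch below assumes the files hold the 1M x 10 and 500K x 5 float32 arrays described earlier and that "MB" here means 2**20 bytes; `raw_size_mib` is a hypothetical helper that ignores file headers.

```python
import math

FLOAT32_BYTES = 4

def raw_size_mib(shapes):
    """Size of the raw array data in MiB (2**20 bytes), ignoring file headers."""
    total_bytes = sum(math.prod(shape) * FLOAT32_BYTES for shape in shapes)
    return total_bytes / 2**20

size = raw_size_mib([(1_000_000, 10), (500_000, 5)])
print(f"{size:.2f} MB")  # prints "47.68 MB"
```

Under those assumptions the raw data alone accounts for the whole 47.68 MB, which suggests that none of the three formats compresses the data in this benchmark.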

Key Performance Highlights

  1. Data Modification:

    • Single row replacement: NumPack is 23x faster than NPZ and 14x faster than NPY
    • Continuous rows: NumPack is 23x faster than NPZ and 12x faster than NPY
    • Random rows: NumPack is 1.53x faster than NPZ but slower than NPY (0.87x NPY's speed)
    • Large data replacement: NumPack is 1.16x faster than NPZ but slower than NPY (0.79x NPY's speed)
  2. Drop Operations:

    • Drop array: NumPack is 24x faster than NPZ and comparable to NPY
    • Drop rows: NumPack is 1.36x faster than NPZ but slower than NPY (0.86x NPY's speed)
    • NumPack provides efficient in-place row deletion without full file rewrite
  3. Loading Performance:

    • Full load: NumPack is 1.75x faster than NPZ and comparable to NPY
    • Memory-mapped load: NumPack is 2.17x faster than NPZ but slower than NPY
    • Selective load: NumPack is 2.00x faster than NPZ
  4. Random Access:

    • NumPack is 1.88x faster than NPZ and 1.13x faster than NPY for random index access
  5. Storage Efficiency:

    • All three formats produce files of the same size (47.68 MB)
    • NumPack maintains high performance while keeping file sizes competitive

Note: All benchmarks were performed with float32 arrays. Performance may vary depending on data types, array sizes, and system configurations. Ratios greater than 1.0x mean NumPack was faster than the format it is compared with; ratios below 1.0x mean it was slower.
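
The ratios in the tables appear to be the baseline time divided by the NumPack time. The sketch below recomputes two rows of the Storage Operations table under that assumption; `speedup` is a hypothetical helper name.

```python
def speedup(baseline_seconds: float, numpack_seconds: float) -> float:
    """Ratio > 1.0 means NumPack took less time than the baseline."""
    return baseline_seconds / numpack_seconds

# Timings from the Storage Operations table: (NumPack, NPZ, NPY)
save = (0.014, 0.013, 0.008)
full_load = (0.008, 0.014, 0.008)

for name, (npk_t, npz_t, npy_t) in {"Save": save, "Full Load": full_load}.items():
    print(f"{name}: {speedup(npz_t, npk_t):.2f}x NPZ, {speedup(npy_t, npk_t):.2f}x NPY")
```

Because the published timings are rounded to milliseconds, rows such as Single Row Replace (0.000s) cannot be recomputed from the table; their ratios presumably come from unrounded measurements.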

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the Apache License, Version 2.0 - see the LICENSE file for details.

Copyright 2024 NumPack Contributors

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
