NumPack

NumPack is a high-performance array storage and manipulation library designed to efficiently handle large NumPy arrays. Built with Rust for performance and exposed to Python through PyO3, NumPack provides a seamless interface for storing, loading, and manipulating large numerical arrays with better performance compared to traditional NumPy storage methods.

Features

  • High Performance: Optimized for both reading and writing large numerical arrays
  • Memory Mapping Support: Efficient memory usage through memory mapping capabilities
  • Selective Loading: Load only the arrays you need, when you need them
  • In-place Operations: Support for in-place array modifications without full file rewrite
  • Parallel I/O: Utilizes parallel processing for improved performance
  • Multiple Data Types: Supports various numerical data types including:
    • Boolean
    • Unsigned integers (8-bit to 64-bit)
    • Signed integers (8-bit to 64-bit)
    • Floating point (32-bit and 64-bit)

Installation

pip install numpack

Requirements

  • Python >= 3.9
  • NumPy

Usage

Basic Operations

import numpy as np
from numpack import NumPack

# Create a NumPack instance
npk = NumPack("data_directory")

# Save arrays
arrays = {
    'array1': np.random.rand(1000, 100).astype(np.float32),
    'array2': np.random.rand(500, 200).astype(np.float32)
}
npk.save(arrays)

# Load arrays
# Normal mode
loaded = npk.load(mmap_mode=False)

# Memory mapping mode for large arrays
lazy_loaded = npk.load(mmap_mode=True)

# Access specific arrays
array1 = loaded['array1']
array2 = loaded['array2']

Advanced Operations

# Replace specific rows
replacement = np.random.rand(10, 100).astype(np.float32)
npk.replace({'array1': replacement}, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Append new arrays
new_arrays = {
    'array3': np.random.rand(200, 100).astype(np.float32)
}
npk.append(new_arrays)

# Drop arrays or specific rows
npk.drop('array1')  # Drop entire array
npk.drop('array2', [0, 1, 2])  # Drop specific rows

# Get metadata
shapes = npk.get_shape()  # Get shapes of all arrays
members = npk.get_member_list()  # Get list of array names
mtime = npk.get_modify_time('array2')  # Get modification time ('array1' was dropped above)

Performance

NumPack offers significant performance improvements compared to traditional NumPy storage methods, especially in data modification operations and random access. Below are detailed benchmark results:

Benchmark Results

The following benchmarks were performed on a MacBook Pro (M1, 2020, 32GB Memory) with two float32 arrays of size 1M x 10 and 500K x 5.

Storage Operations

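| Operation      | NumPack                       | NumPy NPZ | NumPy NPY |
|----------------|-------------------------------|-----------|-----------|
| Save           | 0.014s (0.93x NPZ, 0.57x NPY) | 0.013s    | 0.008s    |
| Full Load      | 0.008s (1.75x NPZ, 1.00x NPY) | 0.014s    | 0.008s    |
| Selective Load | 0.005s (2.00x NPZ, -)         | 0.010s    | -         |
| Mmap Load      | 0.006s (2.17x NPZ, 0.00x NPY) | 0.013s    | 0.000s    |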
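
The NPZ and NPY columns are NumPy's own storage formats. As a rough sketch of how such baseline save and load timings could be collected (this is not the project's benchmark script; the `timed` helper, the file names, and the smaller array sizes are assumptions made for illustration):

```python
import tempfile
import time
from pathlib import Path

import numpy as np


def timed(fn):
    """Run fn() once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Smaller than the 1M x 10 and 500K x 5 arrays quoted above, to keep the sketch quick.
arrays = {
    "array1": np.random.rand(100_000, 10).astype(np.float32),
    "array2": np.random.rand(50_000, 5).astype(np.float32),
}

with tempfile.TemporaryDirectory() as tmp:
    npz_path = Path(tmp) / "data.npz"
    npy_path = Path(tmp) / "array1.npy"

    _, npz_save = timed(lambda: np.savez(npz_path, **arrays))
    _, npy_save = timed(lambda: np.save(npy_path, arrays["array1"]))

    def load_npz():
        # NpzFile reads each member lazily, so materialize every array inside the timer.
        with np.load(npz_path) as npz:
            return {name: npz[name] for name in npz.files}

    npz_loaded, npz_load = timed(load_npz)
    npy_loaded, npy_load = timed(lambda: np.load(npy_path))

print(f"NPZ save {npz_save:.3f}s, load {npz_load:.3f}s")
print(f"NPY save {npy_save:.3f}s, load {npy_load:.3f}s")
```

A single run like this is noisy; repeating each measurement and taking the median would give steadier numbers.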

Data Modification Operations

| Operation                 | NumPack                         | NumPy NPZ | NumPy NPY |
|---------------------------|---------------------------------|-----------|-----------|
| Single Row Replace        | 0.000s (23.00x NPZ, 14.00x NPY) | 0.023s    | 0.014s    |
| Continuous Rows (10K)     | 0.001s (23.00x NPZ, 12.00x NPY) | 0.023s    | 0.012s    |
| Random Rows (10K)         | 0.015s (1.53x NPZ, 0.87x NPY)   | 0.023s    | 0.013s    |
| Large Data Replace (500K) | 0.019s (1.16x NPZ, 0.79x NPY)   | 0.022s    | 0.015s    |
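
For context on what the baselines have to do, here is a minimal sketch of replacing a few rows with plain NumPy. It does not show NumPack's internals; the file names and array sizes are illustrative assumptions. An NPY file can be opened as a writable memory map and patched in place, while NumPy offers no call to patch one member of an NPZ archive, so the sketch loads, modifies, and rewrites the archive.

```python
import tempfile
from pathlib import Path

import numpy as np

rows = [0, 1, 2]
original = np.zeros((1000, 100), dtype=np.float32)
replacement = np.ones((len(rows), 100), dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    npy_path = Path(tmp) / "array1.npy"
    npz_path = Path(tmp) / "data.npz"
    np.save(npy_path, original)
    np.savez(npz_path, array1=original)

    # NPY: open read-write as a memory map and overwrite only the target rows.
    mapped = np.load(npy_path, mmap_mode="r+")
    mapped[rows] = replacement
    mapped.flush()
    del mapped  # release the mapping before the file is read again or removed

    # NPZ: load every member, modify in memory, then rewrite the whole archive.
    with np.load(npz_path) as npz:
        contents = {name: npz[name] for name in npz.files}
    contents["array1"][rows] = replacement
    np.savez(npz_path, **contents)

    npy_result = np.load(npy_path)
    with np.load(npz_path) as npz:
        npz_result = npz["array1"]

print(np.array_equal(npy_result, npz_result))
```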

Drop Operations

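| Operation        | NumPack                       | NumPy NPZ | NumPy NPY |
|------------------|-------------------------------|-----------|-----------|
| Drop Array       | 0.001s (24.00x NPZ, 1.00x NPY) | 0.024s    | 0.001s    |
| Drop Rows (500K) | 0.036s (1.36x NPZ, 0.86x NPY)  | 0.049s    | 0.031s    |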

Append Operations

| Operation | NumPack            | NumPy NPZ |
|-----------|--------------------|-----------|
| Append    | 0.003s (5.33x NPZ) | 0.016s    |

Random Access Performance (10K indices)

| Operation     | NumPack                       | NumPy NPZ | NumPy NPY |
|---------------|-------------------------------|-----------|-----------|
| Random Access | 0.008s (1.88x NPZ, 1.13x NPY) | 0.015s    | 0.009s    |
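
As an illustration of the kind of access pattern this row measures, the sketch below reads 10K random rows from a memory-mapped NPY file using only NumPy. How the project's benchmark actually performs the access is not stated, so the approach, the seed, and the array size here are assumptions.

```python
import tempfile
from pathlib import Path

import numpy as np

rng = np.random.default_rng(0)
data = rng.random((100_000, 10), dtype=np.float32)
indices = rng.integers(0, data.shape[0], size=10_000)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "array1.npy"
    np.save(path, data)

    mapped = np.load(path, mmap_mode="r")
    # Fancy indexing on the memory map copies just the requested rows into memory.
    picked = np.array(mapped[indices])
    del mapped  # release the mapping before the temporary file is removed

print(picked.shape)
```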

File Size Comparison

| Format  | Size     | Ratio |
|---------|----------|-------|
| NumPack | 47.68 MB | 1.0x  |
| NPZ     | 47.68 MB | 1.0x  |
| NPY     | 47.68 MB | 1.0x  |
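
The identical sizes are consistent with uncompressed storage. A quick check of the raw payload for the two benchmark arrays, assuming 4 bytes per float32 element and ignoring per-file headers, reproduces the reported figure when "MB" is read as 1024 x 1024 bytes:

```python
# Shapes of the two benchmark arrays described above.
shapes = [(1_000_000, 10), (500_000, 5)]
bytes_per_float32 = 4

total_elements = sum(rows * cols for rows, cols in shapes)
raw_bytes = total_elements * bytes_per_float32
raw_mib = raw_bytes / 1024**2

print(total_elements)    # 12500000
print(raw_bytes)         # 50000000
print(f"{raw_mib:.2f}")  # 47.68
```

Note that `np.savez` stores arrays uncompressed; `np.savez_compressed` is the NumPy call that applies compression.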

Key Performance Highlights

  1. Data Modification:

    • Single row replacement: NumPack is 23x faster than NPZ and 14x faster than NPY
    • Continuous rows: NumPack is 23x faster than NPZ and 12x faster than NPY
    • Random rows: NumPack is 1.53x faster than NPZ but slower than NPY (0.87x NPY's speed)
    • Large data replacement: NumPack is 1.16x faster than NPZ but slower than NPY (0.79x NPY's speed)
  2. Drop Operations:

    • Drop array: NumPack is 24x faster than NPZ and comparable to NPY
    • Drop rows: NumPack is 1.36x faster than NPZ but slower than NPY (0.86x NPY's speed)
    • NumPack provides efficient in-place row deletion without full file rewrite
  3. Loading Performance:

    • Full load: NumPack is 1.75x faster than NPZ and comparable to NPY
    • Memory-mapped load: NumPack is 2.17x faster than NPZ but slower than NPY
    • Selective load: NumPack is 2.00x faster than NPZ
  4. Random Access:

    • NumPack is 1.88x faster than NPZ and 1.13x faster than NPY for random index access
  5. Storage Efficiency:

    • All formats produce identical file sizes (47.68 MB); none of them compresses the data in this benchmark
    • NumPack maintains high performance while keeping file sizes competitive

Note: All benchmarks were performed with float32 arrays. Performance may vary depending on data types, array sizes, and system configurations. Each ratio expresses NumPack's speed relative to the named format: values greater than 1.0x mean NumPack is faster, and values less than 1.0x mean it is slower.
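
The ratios in the tables appear to be the baseline time divided by the NumPack time. A small sketch of that calculation, using the rounded timings from the Storage Operations table (the `speedup` helper is a hypothetical name, not part of NumPack):

```python
def speedup(baseline_seconds: float, numpack_seconds: float) -> float:
    """Return how many times faster NumPack is than the baseline (below 1.0 means slower)."""
    return baseline_seconds / numpack_seconds


print(f"{speedup(0.014, 0.008):.2f}x")  # Full Load vs NPZ: 1.75x
print(f"{speedup(0.010, 0.005):.2f}x")  # Selective Load vs NPZ: 2.00x
print(f"{speedup(0.013, 0.014):.2f}x")  # Save vs NPZ: 0.93x
```

Rows whose NumPack time is shown as 0.000s still list a ratio, which suggests the published ratios were computed from unrounded timings.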

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the Apache License, Version 2.0 - see the LICENSE file for details.

Copyright 2024 NumPack Contributors

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
