A high-performance array storage and manipulation library

NumPack

NumPack is a high-performance array storage library that combines Rust's performance with Python's ease of use. It provides exceptional performance for both reading and writing large NumPy arrays, with special optimizations for in-place modifications.

Key Features

🚀 397x faster row replacement than NPY
⚡ 405x faster data append than NPY
💨 54x faster lazy loading than NPY mmap
📖 1.3x faster full data loading than NPY
🔄 174x speedup with Writable Batch Mode for frequent modifications
💾 Zero-copy operations with minimal memory footprint
🛠 Seamless integration with existing NumPy workflows

Features

High Performance: Optimized for both reading and writing large numerical arrays
Lazy Loading Support: Efficient memory usage through on-demand data loading
In-place Operations: Support for in-place array modifications without full file rewrite
Batch Processing Modes:
- Batch Mode: 25-37x speedup for batch operations
- Writable Batch Mode: 174x speedup for frequent modifications
Multiple Data Types: Supports various numerical data types including:
- Boolean
- Unsigned integers (8-bit to 64-bit)
- Signed integers (8-bit to 64-bit)
- Floating point (16-bit, 32-bit and 64-bit)
- Complex numbers (64-bit and 128-bit)
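
The categories above correspond to standard NumPy dtypes. The sketch below uses only NumPy to list them with their element widths. The mapping from each category to specific dtype names is an assumption drawn from the list above and has not been checked against NumPack itself.

```python
import numpy as np

# NumPy dtypes matching the categories listed above (assumed mapping,
# not verified against NumPack's own dtype table).
SUPPORTED_DTYPES = {
    "boolean": [np.bool_],
    "unsigned": [np.uint8, np.uint16, np.uint32, np.uint64],
    "signed": [np.int8, np.int16, np.int32, np.int64],
    "float": [np.float16, np.float32, np.float64],
    "complex": [np.complex64, np.complex128],
}


def bits(dtype) -> int:
    """Return the width of one element of `dtype` in bits."""
    return np.dtype(dtype).itemsize * 8


for family, dtypes in SUPPORTED_DTYPES.items():
    print(family, [f"{np.dtype(d).name} ({bits(d)}-bit)" for d in dtypes])
```

Note that NumPy stores a boolean in one byte, so `bits(np.bool_)` reports 8 even though the value carries a single bit of information.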

Installation

From PyPI (Recommended)

Prerequisites

Python >= 3.9
NumPy >= 1.26.0

pip install numpack
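
Before installing, it can help to confirm that the environment meets the prerequisites above. This is a minimal sketch using only the standard library; `parse_version` and `meets_minimum` are hypothetical helper names introduced here, and the parsing is deliberately simple (it keeps the leading digits of the first three components).

```python
import re
import sys
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Turn '1.26.4' or '2.1.0rc1' into a 3-tuple of ints, padding with zeros."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


python_ok = sys.version_info >= (3, 9)
try:
    numpy_ok = meets_minimum(metadata.version("numpy"), "1.26.0")
except metadata.PackageNotFoundError:
    numpy_ok = False  # NumPy is not installed in this environment

print(f"Python >= 3.9: {python_ok}, NumPy >= 1.26.0: {numpy_ok}")
```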

From Source

Prerequisites (All Platforms including Windows)

Python >= 3.9
Rust >= 1.70.0 (Required on all platforms, install from rustup.rs)
NumPy >= 1.26.0
Appropriate C/C++ compiler
- Windows: Microsoft C++ Build Tools
- macOS: Xcode Command Line Tools (xcode-select --install)
- Linux: GCC/Clang (build-essential on Ubuntu/Debian)

Build Steps

Clone the repository:

git clone https://github.com/BirchKwok/NumPack.git
cd NumPack

Install maturin:

# Quote the requirement so the shell does not treat > and < as redirections
pip install "maturin>=1.0,<2.0"

Build and install:

# Install in development mode
maturin develop

# Or build wheel package
maturin build --release
pip install target/wheels/numpack-*.whl

Usage

Basic Operations

import numpy as np
from numpack import NumPack

# Using context manager (Recommended)
with NumPack("data_directory") as npk:
    # Save arrays
    arrays = {
        'array1': np.random.rand(1000, 100).astype(np.float32),
        'array2': np.random.rand(500, 200).astype(np.float32)
    }
    npk.save(arrays)
    
    # Load arrays - Normal mode
    loaded = npk.load("array1")
    
    # Load arrays - Lazy mode
    lazy_array = npk.load("array1", lazy=True)

Advanced Operations

import numpy as np
from numpack import NumPack

with NumPack("data_directory") as npk:
    # Replace specific rows
    replacement = np.random.rand(10, 100).astype(np.float32)
    npk.replace({'array1': replacement}, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    
    # Append new data
    new_data = {'array1': np.random.rand(100, 100).astype(np.float32)}
    npk.append(new_data)
    
    # Random access operations
    data = npk.getitem('array1', [0, 1, 2])
    data = npk['array1']  # Dictionary-style access
    
    # Stream loading for large arrays
    for batch in npk.stream_load('array1', buffer_size=1000):
        process_batch(batch)  # process_batch is a placeholder for your own function
    
    # Drop specific rows or entire arrays
    # (done last, because a dropped array can no longer be read)
    npk.drop('array2', [0, 1, 2])  # Drop specific rows
    npk.drop('array1')  # Drop entire array

Batch Processing Modes

NumPack provides two high-performance batch modes for scenarios with frequent modifications:

Batch Mode (25-37x speedup)

with NumPack("data.npk") as npk:
    with npk.batch_mode():
        for i in range(1000):
            arr = npk.load('data')      # Load from cache
            arr[:10] *= 2.0
            npk.save({'data': arr})     # Save to cache
# All changes written to disk on exit
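
The comments in the Batch Mode snippet ("Load from cache", "Save to cache", changes written on exit) describe a write-back cache. The toy class below illustrates that general idea with plain Python lists and a dictionary standing in for the disk. It is a conceptual sketch only: `WriteBackCache` is a hypothetical name and nothing here reflects NumPack's actual implementation.

```python
class WriteBackCache:
    """Toy write-back cache: loads and saves hit memory, disk is touched on flush."""

    def __init__(self, disk: dict):
        self.disk = disk        # stands in for files on disk
        self.cache = {}         # in-memory copies of loaded arrays
        self.dirty = set()      # names modified since the last flush
        self.disk_reads = 0
        self.disk_writes = 0

    def load(self, name):
        # Read from "disk" only the first time a name is requested.
        if name not in self.cache:
            self.cache[name] = list(self.disk[name])
            self.disk_reads += 1
        return self.cache[name]

    def save(self, name, values):
        # Record the change in memory; defer the disk write.
        self.cache[name] = values
        self.dirty.add(name)

    def flush(self):
        # Write each modified name back exactly once.
        for name in sorted(self.dirty):
            self.disk[name] = list(self.cache[name])
            self.disk_writes += 1
        self.dirty.clear()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.flush()
        return False


disk = {"data": [1.0, 2.0, 3.0]}
with WriteBackCache(disk) as cache:
    for _ in range(3):
        arr = cache.load("data")
        arr[0] *= 2.0
        cache.save("data", arr)

print(disk["data"], cache.disk_reads, cache.disk_writes)
```

Three load/modify/save cycles cost one read and one write in this toy model, which is the kind of saving a batch mode aims for.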

Writable Batch Mode (174x speedup)

with NumPack("data.npk") as npk:
    with npk.writable_batch_mode() as wb:
        for i in range(1000):
            arr = wb.load('data')   # Memory-mapped view
            arr[:10] *= 2.0         # Direct modification
            # No save needed - changes are automatic

Performance

All benchmarks were conducted on macOS (Apple Silicon) using the Rust backend with precise timeit measurements.

Performance Comparison (1M rows × 10 columns, Float32, 38.1MB)

Operation	NumPack	NPY	NPZ	Zarr	HDF5	Parquet	NumPack Advantage
Full Load	8.27ms 🥇	10.51ms	181.62ms	41.40ms	58.39ms	23.74ms	1.3x vs NPY
Lazy Load	0.002ms 🥇	0.107ms	N/A	0.397ms	0.080ms	N/A	54x vs NPY
Replace 100 rows	0.047ms 🥇	18.51ms	1574ms	9.08ms	0.299ms	187.65ms	397x vs NPY
Append 100 rows	0.067ms 🥇	27.09ms	1582ms	9.98ms	0.212ms	204.74ms	405x vs NPY
Random Access (1K)	0.051ms	0.010ms 🥇	183.16ms	3.46ms	4.91ms	22.80ms	5.1x slower
Save	16.15ms	7.19ms 🥇	1378ms	80.91ms	55.66ms	159.14ms	2.2x slower
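
The "NumPack Advantage" column is the NPY time divided by the NumPack time. The snippet below recomputes a few of those ratios from the rounded timings printed in the table. The published factors were presumably derived from unrounded measurements, so small differences (for example about 404x here against the stated 405x for append) are to be expected.

```python
# Timings in milliseconds, copied from the 1M-row table above.
timings_ms = {
    "Full Load": {"NumPack": 8.27, "NPY": 10.51},
    "Lazy Load": {"NumPack": 0.002, "NPY": 0.107},
    "Append 100 rows": {"NumPack": 0.067, "NPY": 27.09},
    "Save": {"NumPack": 16.15, "NPY": 7.19},
}


def speedup_vs_npy(operation: str) -> float:
    """NPY time divided by NumPack time; values below 1 mean NumPack is slower."""
    row = timings_ms[operation]
    return row["NPY"] / row["NumPack"]


for op in timings_ms:
    print(f"{op}: {speedup_vs_npy(op):.2f}x")
```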

Performance Comparison (100K rows × 10 columns, Float32, 3.8MB)

Operation	NumPack	NPY	NPZ	Zarr	HDF5	NumPack Advantage
Full Load	0.98ms	0.66ms 🥇	18.65ms	6.24ms	6.35ms	1.5x slower
Lazy Load	0.002ms 🥇	0.103ms	N/A	0.444ms	0.085ms	51x vs NPY
Replace 100 rows	0.039ms 🥇	2.13ms	159.19ms	4.39ms	0.208ms	55x vs NPY
Append 100 rows	0.059ms 🥇	3.29ms	159.19ms	4.59ms	0.206ms	56x vs NPY
Random Access (1K)	0.116ms	0.010ms 🥇	18.73ms	1.89ms	4.82ms	12x slower
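
The dataset sizes in the two table headings follow directly from the array shape and dtype: rows × columns × 4 bytes per float32. The quick check below reproduces them and shows that the "MB" figures are in units of 2^20 bytes. `array_size_mib` is a helper name introduced here for illustration.

```python
def array_size_mib(rows: int, cols: int, itemsize: int = 4) -> float:
    """Raw size of a rows x cols array in MiB (2**20 bytes); float32 by default."""
    return rows * cols * itemsize / 2**20


large = array_size_mib(1_000_000, 10)   # 40,000,000 bytes
small = array_size_mib(100_000, 10)     # 4,000,000 bytes

print(f"1M x 10 float32:   {large:.1f} MiB")
print(f"100K x 10 float32: {small:.1f} MiB")
```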

Batch Mode Performance (1M rows × 10 columns)

100 consecutive modify operations:

Mode	Time	Speedup
Normal Mode	856ms	1.0x
Batch Mode	34ms	25x faster 🔥
Writable Batch Mode	4.9ms	174x faster 🔥🔥

Key Performance Highlights

Data Modification - Exceptional Performance 🏆
- Replace operations: 397x faster than NPY (large dataset)
- Append operations: 405x faster than NPY (large dataset)
- Supports efficient in-place modification without full file rewrite
- NumPack's core advantage
Data Loading - Industry Leading
- Full load: Fastest for large datasets (8.27ms)
- Lazy load: 54x faster than NPY mmap (0.002ms)
- Optimized batch data transfer with SIMD acceleration
Batch Processing - Revolutionary Performance
- Batch Mode: 25-37x speedup for batch operations
- Writable Batch Mode: 174x speedup for frequent modifications
- Ideal for machine learning pipelines and data processing workflows
Storage Efficiency
- File size identical to NPY
- ~10% larger than Zarr/NPZ (compressed formats)

When to Use NumPack

✅ Strongly Recommended (90% of use cases):

Machine learning and deep learning pipelines
Real-time data stream processing
Data annotation and correction workflows
Feature stores with dynamic updates
Any scenario requiring frequent data modifications
Fast data loading requirements

⚠️ Consider Alternatives (10% of use cases):

Write-once, never modify → Use NPY (faster initial write)
Frequent single-row access → Use NPY mmap
Extreme compression requirements → Use NPZ (10% smaller, but about 22x slower to load and over 1000x slower to modify in the benchmarks above)

Best Practices

1. Use Writable Batch Mode for Frequent Modifications

# 174x speedup for frequent modifications
with NumPack("data.npk") as npk:
    with npk.writable_batch_mode() as wb:
        for i in range(1000):
            arr = wb.load('data')
            arr[:10] *= 2.0
# Automatic persistence on exit

2. Use Batch Mode for Batch Operations

# 25-37x speedup for batch processing
with NumPack("data.npk") as npk:
    with npk.batch_mode():
        for i in range(1000):
            arr = npk.load('data')
            arr[:10] *= 2.0
            npk.save({'data': arr})
# Single write on exit

3. Use Lazy Loading for Large Datasets

with NumPack("large_data.npk") as npk:
    # Only 0.002ms to initialize
    lazy_array = npk.load("array", lazy=True)
    # Data loaded on demand
    subset = lazy_array[1000:2000]
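
For comparison, the "NPY mmap" baseline named in the benchmarks is available in NumPy itself through `np.load(..., mmap_mode="r")`. The sketch below uses only NumPy (not NumPack) on a small scratch file to show the same load-on-demand pattern; the file name and array shape are arbitrary.

```python
import os
import tempfile

import numpy as np

# Create a small .npy file to read back lazily.
path = os.path.join(tempfile.mkdtemp(), "array.npy")
np.save(path, np.arange(5000 * 4, dtype=np.float32).reshape(5000, 4))

# mmap_mode="r" maps the file instead of reading it all into memory.
lazy = np.load(path, mmap_mode="r")

# Slicing selects a window of the mapping; np.asarray here only wraps it
# as a plain ndarray, and the data is read from disk when accessed.
subset = np.asarray(lazy[1000:2000])

print(type(lazy).__name__, lazy.shape, subset.shape)
```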

4. Reuse NumPack Instances

# ✅ Efficient: Reuse instance
with NumPack("data.npk") as npk:
    for i in range(100):
        data = npk.load('array')

# ❌ Inefficient: Create new instance each time
for i in range(100):
    with NumPack("data.npk") as npk:
        data = npk.load('array')

Benchmark Methodology

All benchmarks use:

timeit for precise timing
Multiple repeats, best time selected
Pure operation time (excluding file open/close overhead)
Float32 arrays
macOS Apple Silicon (results may vary by platform)
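
The "multiple repeats, best time selected" approach can be reproduced with the standard library's `timeit.repeat`. This is a minimal sketch of that pattern and not the project's benchmark code: `best_time_ms` is a hypothetical helper, and the timed workload is a stand-in.

```python
import timeit


def best_time_ms(func, repeat: int = 5, number: int = 100) -> float:
    """Run `func` `number` times per trial for `repeat` trials.

    Returns the best per-call time in milliseconds.
    """
    trials = timeit.repeat(func, repeat=repeat, number=number)
    return min(trials) / number * 1000.0


data = list(range(10_000))
elapsed = best_time_ms(lambda: sum(data))
print(f"sum over 10,000 ints: {elapsed:.4f} ms per call")
```

Taking the minimum rather than the mean filters out trials slowed by unrelated system activity, which is why it is a common choice for micro-benchmarks.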

For complete benchmark code, see comprehensive_format_benchmark.py.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the Apache License, Version 2.0 - see the LICENSE file for details.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Release history

Version	Date
0.6.0	Mar 20, 2026
0.5.2	Mar 19, 2026
0.5.1	Feb 19, 2026
0.5.0	Dec 21, 2025
0.4.5	Dec 19, 2025
0.4.4	Nov 4, 2025
0.4.3	Nov 3, 2025
0.4.2	Oct 31, 2025
0.4.1	Oct 23, 2025
0.4.0 (this version)	Oct 22, 2025
0.3.0	Jul 28, 2025
0.2.1	Jul 22, 2025
0.2.0	Jul 20, 2025
0.1.8	Jul 12, 2025
0.1.7	Jul 11, 2025
0.1.6	Jan 20, 2025
0.1.5	Jan 16, 2025
0.1.4	Jan 16, 2025
0.1.3	Jan 15, 2025
0.1.2	Jan 14, 2025
0.1.1	Jan 13, 2025
0.1.0	Jan 13, 2025

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

File	Size	Uploaded	Interpreter	Platform
numpack-0.4.0-cp310-cp310-manylinux_2_34_x86_64.whl	616.3 kB	Oct 22, 2025	CPython 3.10	manylinux: glibc 2.34+ x86-64
numpack-0.4.0-cp310-cp310-macosx_11_0_arm64.whl	530.5 kB	Oct 22, 2025	CPython 3.10	macOS 11.0+ ARM64
numpack-0.4.0-cp310-cp310-macosx_10_12_x86_64.whl	587.7 kB	Oct 22, 2025	CPython 3.10	macOS 10.12+ x86-64

File details

Details for the file numpack-0.4.0-cp310-cp310-manylinux_2_34_x86_64.whl.

File metadata

Download URL: numpack-0.4.0-cp310-cp310-manylinux_2_34_x86_64.whl
Upload date: Oct 22, 2025
Size: 616.3 kB
Tags: CPython 3.10, manylinux: glibc 2.34+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for numpack-0.4.0-cp310-cp310-manylinux_2_34_x86_64.whl
Algorithm	Hash digest
SHA256	`027c84fb054bc7d95dd8b6c0c50be0ec549e389e153c401f166f6b750d49d889`
MD5	`0f2d765f67e5d9338025a6a7c35682e4`
BLAKE2b-256	`b21d0c54074e6cf979c7616e96cefbafa79a1dd6ef56aa3a4590155b81220a10`

File details

Details for the file numpack-0.4.0-cp310-cp310-macosx_11_0_arm64.whl.

File metadata

Download URL: numpack-0.4.0-cp310-cp310-macosx_11_0_arm64.whl
Upload date: Oct 22, 2025
Size: 530.5 kB
Tags: CPython 3.10, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for numpack-0.4.0-cp310-cp310-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`9396afba7e33d9e23835f48f47c6f196d66ae9db6409cc53932968c59ccd559f`
MD5	`84ef953b4b9e77cbf4e7e18d618dd060`
BLAKE2b-256	`3d27157235bd29c2701a9e0f80925523847b46aaed0478a0e3951edc3ae3342a`

File details

Details for the file numpack-0.4.0-cp310-cp310-macosx_10_12_x86_64.whl.

File metadata

Download URL: numpack-0.4.0-cp310-cp310-macosx_10_12_x86_64.whl
Upload date: Oct 22, 2025
Size: 587.7 kB
Tags: CPython 3.10, macOS 10.12+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for numpack-0.4.0-cp310-cp310-macosx_10_12_x86_64.whl
Algorithm	Hash digest
SHA256	`05638976ceb089ada092d309385468777d6d7be9567949d6bd0c1f40bfd2c6ea`
MD5	`9a087956dc84750e8a089f8835aa11c1`
BLAKE2b-256	`f5f909d847a71b56e32a09de794fafbc0129fcb3d1ef575586fdc8d55f3a57b7`
