A high-performance array storage and manipulation library

NumPack

A high-performance NumPy array storage library combining Rust's speed with Python's simplicity. Optimized for frequent read/write operations on large arrays, with built-in SIMD-accelerated vector similarity search.

Highlights

| Feature | Performance |
| --- | --- |
| Row Replacement | 344x faster than NPY |
| Data Append | 338x faster than NPY |
| Lazy Loading | 51x faster than NPY mmap |
| Full Load | 1.64x faster than NPY |
| Batch Mode | 21x speedup |
| Writable Batch | 92x speedup |

Core Capabilities:

  • Zero-copy mmap operations with minimal memory footprint
  • SIMD-accelerated Vector Engine (AVX2, AVX-512, NEON, SVE)
  • Batch & Writable Batch modes for high-frequency modifications
  • Supports the main NumPy numeric dtypes: bool, int8-64, uint8-64, float16/32/64, complex64/128

Installation

pip install numpack

Requirements: Python ≥ 3.9, NumPy ≥ 1.26.0

Build from Source
# Prerequisites: Rust >= 1.70.0 (rustup.rs), C/C++ compiler
git clone https://github.com/BirchKwok/NumPack.git
cd NumPack
pip install "maturin>=1.0,<2.0"  # quoted so the shell does not treat > and < as redirections
maturin develop  # or: maturin build --release

Quick Start

import numpy as np
from numpack import NumPack

with NumPack("data.npk") as npk:
    # Save
    npk.save({'embeddings': np.random.rand(10000, 128).astype(np.float32)})
    
    # Load (normal or lazy)
    data = npk.load("embeddings")
    lazy = npk.load("embeddings", lazy=True)
    
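    # Modify (new rows must have the same number of columns as the stored array)
    new_rows = np.random.rand(3, 128).astype(np.float32)
    more_rows = np.random.rand(100, 128).astype(np.float32)
    npk.replace({'embeddings': new_rows}, indices=[0, 1, 2])
    npk.append({'embeddings': more_rows})
    npk.drop('embeddings', [0, 1, 2])  # drop rows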
    
    # Random access
    subset = npk.getitem('embeddings', [100, 200, 300])
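For contrast, here is a sketch of what appending rows to a plain .npy file involves. np.save has no append mode, so a common approach is to load the array, concatenate, and write the whole file again. This is presumably the kind of baseline the append figures are compared against, though that is an assumption, since the benchmark script is not reproduced here. The helper name append_rows_npy is hypothetical.

```python
import os
import tempfile

import numpy as np


def append_rows_npy(path, new_rows):
    """Append rows to a .npy file by rewriting it (hypothetical helper).

    np.save cannot extend a file in place, so the cost grows with the
    size of the existing array, not with the number of rows appended.
    """
    existing = np.load(path)                                  # reads the full array
    combined = np.concatenate([existing, new_rows], axis=0)   # copies it in memory
    np.save(path, combined)                                   # rewrites the full file
    return combined.shape


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "embeddings.npy")
    np.save(path, np.zeros((1000, 128), dtype=np.float32))
    extra_rows = np.ones((3, 128), dtype=np.float32)
    new_shape = append_rows_npy(path, extra_rows)
    reloaded = np.load(path)
```

Every step in the helper touches all existing rows, which is why a format that can extend its data in place has room to be much faster for small appends.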

Batch Modes

# Batch Mode - cached writes (21x speedup)
with npk.batch_mode():
    for i in range(1000):
        arr = npk.load('data')
        arr[:10] *= 2.0
        npk.save({'data': arr})

# Writable Batch Mode - direct mmap (92x speedup)
with npk.writable_batch_mode() as wb:
    arr = wb.load('data')
    arr[:10] *= 2.0  # Auto-persisted
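The "direct mmap" idea can be illustrated with the standard library alone. The sketch below is not NumPack's implementation or file layout (neither is described here). It only shows, on a raw float32 file, how a memory map lets you modify a few rows in place without rewriting the rest. All names (ROW_BYTES, write_rows, scale_rows, read_row) are hypothetical.

```python
import mmap
import os
import struct
import tempfile

COLS = 4
ROW_BYTES = COLS * 4  # four float32 values per row


def write_rows(path, n_rows):
    """Create a raw little-endian float32 file where row i holds the value i."""
    with open(path, "wb") as f:
        for i in range(n_rows):
            f.write(struct.pack(f"<{COLS}f", *([float(i)] * COLS)))


def scale_rows(path, n_rows, factor):
    """Multiply the first n_rows rows by factor, in place, through a memory map."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        for i in range(n_rows):
            start = i * ROW_BYTES
            row = struct.unpack(f"<{COLS}f", mm[start:start + ROW_BYTES])
            mm[start:start + ROW_BYTES] = struct.pack(
                f"<{COLS}f", *(v * factor for v in row)
            )
        mm.flush()  # push the modified pages back to the file


def read_row(path, index):
    """Read a single row by seeking to its offset; no other rows are read."""
    with open(path, "rb") as f:
        f.seek(index * ROW_BYTES)
        return struct.unpack(f"<{COLS}f", f.read(ROW_BYTES))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    write_rows(path, 100)
    size_before = os.path.getsize(path)
    scale_rows(path, 10, 2.0)      # mirrors arr[:10] *= 2.0
    row_5 = read_row(path, 5)      # inside the modified range, doubled
    row_50 = read_row(path, 50)    # outside the modified range, unchanged
    size_after = os.path.getsize(path)
```

Only the bytes of the first ten rows are written, and the file size does not change, so the cost depends on how much is modified and not on how large the file is.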

Vector Engine

SIMD-accelerated similarity search (AVX2, AVX-512, NEON, SVE).

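import numpy as np
from numpack.vector_engine import VectorEngine, StreamingVectorEngine

# Example inputs (shapes and dtype are illustrative)
query = np.random.rand(128).astype(np.float32)
candidates = np.random.rand(10000, 128).astype(np.float32)
queries = np.random.rand(32, 128).astype(np.float32)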

# In-memory search
engine = VectorEngine()
indices, scores = engine.top_k_search(query, candidates, 'cosine', k=10)

# Multi-query batch (30-50% faster)
all_indices, all_scores = engine.multi_query_top_k(queries, candidates, 'cosine', k=10)

# Streaming from file (for large datasets)
streaming = StreamingVectorEngine()
indices, scores = streaming.streaming_top_k_from_file(
    query, 'vectors.npk', 'embeddings', 'cosine', k=10
)

Supported Metrics: cosine, dot, l2, l2sq, hamming, jaccard, kl, js
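To make the search semantics concrete, here is a dependency-free reference for a cosine top-k query. It is a plain-Python sketch for checking results on small inputs, not the SIMD engine's algorithm. The function names, the zero-vector convention, and the assumption that results are ordered best first are choices made for this example.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def top_k_cosine(query, candidates, k):
    """Return (indices, scores) of the k most similar candidates, best first."""
    scored = ((cosine_similarity(query, c), i) for i, c in enumerate(candidates))
    best = heapq.nlargest(k, scored)
    indices = [i for _, i in best]
    scores = [s for s, _ in best]
    return indices, scores


query = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
indices, scores = top_k_cosine(query, candidates, k=2)
```

For similarity metrics such as cosine and dot, larger scores are better, so the sketch keeps the largest values. For distance metrics such as l2 or l2sq, the same structure would keep the smallest values instead (heapq.nsmallest).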

Format Conversion

Convert between NumPack and other formats (PyTorch, Arrow, Parquet, SafeTensors).

from numpack.io import from_tensor, to_tensor, from_table, to_table

# Memory <-> .npk (zero-copy when possible)
from_tensor(tensor, 'output.npk', array_name='embeddings')  # tensor -> .npk
tensor = to_tensor('input.npk', array_name='embeddings')     # .npk -> tensor

from_table(table, 'output.npk')  # PyArrow Table -> .npk
table = to_table('input.npk')     # .npk -> PyArrow Table

# File <-> File (streaming for large files)
from numpack.io import from_pt, to_pt
from_pt('model.pt', 'output.npk')  # .pt -> .npk
to_pt('input.npk', 'output.pt')    # .npk -> .pt

Supported formats: PyTorch (.pt), Feather, Parquet, SafeTensors, NumPy (.npy), HDF5, Zarr, CSV

Pack & Unpack

Portable .npkg format for easy migration and sharing.

from numpack import pack, unpack, get_package_info

# Pack NumPack directory into a single .npkg file
pack('data.npk')                          # -> data.npkg (with Zstd compression)
pack('data.npk', 'backup/data.npkg')      # Custom output path

# Unpack .npkg back to NumPack directory
unpack('data.npkg')                       # -> data.npk
unpack('data.npkg', 'restored/')          # Custom restore path

# View package info without extracting
info = get_package_info('data.npkg')
print(f"Files: {info['file_count']}, Compression: {info['compression_ratio']:.1%}")

Benchmarks

Tested on macOS Apple Silicon, 1M rows × 10 columns, Float32 (38.1MB)

| Operation | NumPack | NPY | Advantage |
| --- | --- | --- | --- |
| Full Load | 4.00ms | 6.56ms | 1.64x |
| Lazy Load | 0.002ms | 0.102ms | 51x |
| Replace 100 rows | 0.040ms | 13.74ms | 344x |
| Append 100 rows | 0.054ms | 18.26ms | 338x |
| Random Access (100) | 0.004ms | 0.002ms | ~equal |

Multi-Format Comparison

Core Operations (1M × 10, Float32, ~38.1MB):

| Operation | NumPack | NPY | Zarr | HDF5 | Parquet | Arrow |
| --- | --- | --- | --- | --- | --- | --- |
| Save | 11.94ms | 6.48ms | 70.91ms | 58.07ms | 142.11ms | 16.85ms |
| Full Load | 4.00ms | 6.56ms | 32.86ms | 53.99ms | 16.49ms | 12.39ms |
| Lazy Load | 0.002ms | 0.102ms | 0.374ms | 0.082ms | N/A | N/A |
| Replace 100 | 0.040ms | 13.74ms | 7.61ms | 0.29ms | 162.48ms | 26.93ms |
| Append 100 | 0.054ms | 18.26ms | 9.05ms | 0.39ms | 173.45ms | 42.46ms |

Random Access Performance:

| Batch Size | NumPack | NPY (mmap) | Zarr | HDF5 | Parquet | Arrow |
| --- | --- | --- | --- | --- | --- | --- |
| 100 rows | 0.004ms | 0.002ms | 2.66ms | 0.66ms | 16.25ms | 12.43ms |
| 1K rows | 0.025ms | 0.021ms | 2.86ms | 5.02ms | 16.48ms | 12.61ms |
| 10K rows | 0.118ms | 0.112ms | 16.63ms | 505.71ms | 17.45ms | 12.81ms |

Batch Mode Performance (100 consecutive operations):

| Mode | Time | Speedup |
| --- | --- | --- |
| Normal | 414ms | - |
| Batch Mode | 20.1ms | 21x |
| Writable Batch | 4.5ms | 92x |
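The speedup column follows directly from the timings. A short check, using only the numbers in the table above (the variable names are illustrative):

```python
# Timings copied from the batch mode table, in milliseconds.
timings_ms = {
    "Normal": 414.0,
    "Batch Mode": 20.1,
    "Writable Batch": 4.5,
}

baseline = timings_ms["Normal"]
speedups = {
    mode: baseline / t for mode, t in timings_ms.items() if mode != "Normal"
}

for mode, factor in speedups.items():
    print(f"{mode}: {factor:.1f}x")
```

This gives roughly 20.6x for Batch Mode and 92x for Writable Batch, matching the rounded figures reported.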

File Size:

| Format | Size | Compression |
| --- | --- | --- |
| NumPack | 38.15MB | - |
| NPY | 38.15MB | - |
| NPZ | 34.25MB | |
| Zarr | 34.13MB | |
| HDF5 | 38.18MB | - |
| Parquet | 44.09MB | |
| Arrow | 38.16MB | - |
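As a sanity check on the uncompressed sizes, the raw payload of the benchmark array can be computed from its shape and dtype. The calculation below suggests that the "MB" figures are binary megabytes (MiB), which is an inference from the arithmetic and is not stated in the benchmark description.

```python
ROWS = 1_000_000
COLS = 10
BYTES_PER_FLOAT32 = 4

raw_bytes = ROWS * COLS * BYTES_PER_FLOAT32
raw_mib = raw_bytes / 2**20   # binary megabytes (MiB)
raw_mb = raw_bytes / 10**6    # decimal megabytes

print(f"{raw_bytes:,} bytes = {raw_mib:.2f} MiB = {raw_mb:.2f} MB")
```

The raw data comes to 40,000,000 bytes, or about 38.15 MiB, so the uncompressed formats in the table add very little overhead on top of the array itself.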

When to Use NumPack

| Use Case | Recommendation |
| --- | --- |
| Frequent modifications | NumPack (344x faster) |
| ML/DL pipelines | NumPack (zero-copy random access, no full load) |
| Vector similarity search | NumPack (SIMD) |
| Write-once, read-many | NumPack (1.64x faster read) |
| Extreme compression | NumPack .npkg (better ratio, streaming, high I/O) |
| RAG/Embedding storage | NumPack (fast retrieval + SIMD search) |
| Feature store | NumPack (real-time updates + low latency) |
| Memory-constrained environments | NumPack (mmap + lazy loading) |
| Multi-process data sharing | NumPack (zero-copy mmap) |
| Incremental data pipelines | NumPack (338x faster append) |
| Real-time feature updates | NumPack (ms-level replace) |

Documentation

See docs/ for detailed guides and unified_benchmark.py for benchmark code.

Contributing

Contributions welcome! Please submit a Pull Request.

License

Apache License 2.0 - see LICENSE for details.
