A high-performance array storage and manipulation library

NumPack

A high-performance NumPy array storage library combining Rust's speed with Python's simplicity. Optimized for frequent read/write operations on large arrays, with built-in SIMD-accelerated vector similarity search.

Highlights

Feature          Performance
Row Replacement  344x faster than NPY
Data Append      338x faster than NPY
Lazy Loading     51x faster than NPY mmap
Full Load        1.64x faster than NPY
Batch Mode       21x speedup
Writable Batch   92x speedup

Core Capabilities:

  • Zero-copy mmap operations with minimal memory footprint
  • SIMD-accelerated Vector Engine (AVX2, AVX-512, NEON, SVE)
  • Batch & Writable Batch modes for high-frequency modifications
  • Supports NumPy boolean, integer, floating-point, and complex dtypes: bool, int8-64, uint8-64, float16/32/64, complex64/128
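
To make the mmap bullet concrete, here is a minimal sketch of an in-place row update through a memory map, using only the Python standard library. It is not NumPack's implementation or file format: the raw float32 layout, the file name, and the helper names (write_raw, replace_row_inplace, read_row) are assumptions made for illustration. The point is that replacing a row touches only that row's bytes instead of rewriting the file.

```python
import mmap
import os
import struct
import tempfile

ROWS, COLS = 1000, 10          # small stand-in for a large 2-D float32 array
ROW_FMT = f"<{COLS}f"          # one row of little-endian float32 values
ROW_BYTES = struct.calcsize(ROW_FMT)

def write_raw(path):
    """Write ROWS x COLS float32 values as raw bytes (row i is filled with i)."""
    with open(path, "wb") as f:
        for i in range(ROWS):
            f.write(struct.pack(ROW_FMT, *([float(i)] * COLS)))

def replace_row_inplace(path, row, values):
    """Overwrite one row through a memory map; only ROW_BYTES bytes change."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        struct.pack_into(ROW_FMT, mm, row * ROW_BYTES, *values)
        mm.flush()

def read_row(path, row):
    """Read one row back without loading the rest of the file."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return struct.unpack_from(ROW_FMT, mm, row * ROW_BYTES)

path = os.path.join(tempfile.mkdtemp(), "raw_float32.bin")
write_raw(path)
replace_row_inplace(path, 42, [1.5] * COLS)

row_42 = read_row(path, 42)    # the replaced row
row_41 = read_row(path, 41)    # a neighbouring row, left untouched
file_size = os.path.getsize(path)
```

The file size is unchanged after the update, and neighbouring rows are never read or rewritten, which is the general reason mmap-backed row replacement can be much cheaper than rewriting a whole file.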

Installation

pip install numpack

Requirements: Python ≥ 3.9, NumPy ≥ 1.26.0

Build from Source
# Prerequisites: Rust >= 1.70.0 (rustup.rs), C/C++ compiler
git clone https://github.com/BirchKwok/NumPack.git
cd NumPack
pip install "maturin>=1.0,<2.0"
maturin develop  # or: maturin build --release

Quick Start

import numpy as np
from numpack import NumPack

with NumPack("data.npk") as npk:
    # Save
    npk.save({'embeddings': np.random.rand(10000, 128).astype(np.float32)})
    
    # Load (normal or lazy)
    data = npk.load("embeddings")
    lazy = npk.load("embeddings", lazy=True)
    
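    # Modify (example rows with the same width and dtype as the stored array)
    new_rows = np.zeros((3, 128), dtype=np.float32)
    more_rows = np.random.rand(100, 128).astype(np.float32)
    npk.replace({'embeddings': new_rows}, indices=[0, 1, 2])
    npk.append({'embeddings': more_rows})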
    npk.drop('embeddings', [0, 1, 2])  # drop rows
    
    # Random access
    subset = npk.getitem('embeddings', [100, 200, 300])

Batch Modes

# Assumes `npk` is an open NumPack instance that already holds an array named 'data'
# Batch Mode - cached writes (21x speedup)
with npk.batch_mode():
    for i in range(1000):
        arr = npk.load('data')
        arr[:10] *= 2.0
        npk.save({'data': arr})

# Writable Batch Mode - direct mmap (92x speedup)
with npk.writable_batch_mode() as wb:
    arr = wb.load('data')
    arr[:10] *= 2.0  # Auto-persisted
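
The idea behind "cached writes" can be illustrated with a toy write-back cache. This is a sketch under stated assumptions, not NumPack's code: ToyStore, its dict-based "disk", and the disk_writes counter are hypothetical, and the real batch mode may buffer and flush differently. It only shows why deferring writes until the end of a batch reduces the number of disk operations.

```python
from contextlib import contextmanager

class ToyStore:
    """Hypothetical in-memory stand-in for an on-disk array store."""

    def __init__(self):
        self.disk = {}          # pretend this dict is the file on disk
        self.disk_writes = 0    # how many times the "disk" was written
        self._cache = None      # active write-back cache, if any

    def save(self, name, values):
        if self._cache is not None:
            self._cache[name] = list(values)   # buffered, no disk write yet
        else:
            self.disk[name] = list(values)
            self.disk_writes += 1

    def load(self, name):
        if self._cache is not None and name in self._cache:
            return list(self._cache[name])     # serve the pending version
        return list(self.disk[name])

    @contextmanager
    def batch_mode(self):
        self._cache = {}
        try:
            yield self
        finally:
            # Flush each modified array once when the batch ends.
            pending, self._cache = self._cache, None
            for name, values in pending.items():
                self.disk[name] = values
                self.disk_writes += 1

store = ToyStore()
store.save("data", [1.0, 2.0, 3.0])      # first disk write
with store.batch_mode():
    for _ in range(100):
        arr = store.load("data")
        arr[0] *= 2.0
        store.save("data", arr)          # buffered in the cache
# Leaving the block flushes "data" to the "disk" once.
```

In this toy model, 100 load/modify/save cycles cost a single flush rather than 100 separate writes.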

Vector Engine

SIMD-accelerated similarity search (AVX2, AVX-512, NEON, SVE).

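import numpy as np
from numpack.vector_engine import VectorEngine, StreamingVectorEngine

# Example inputs (shapes and dtype are illustrative assumptions)
query = np.random.rand(128).astype(np.float32)
queries = np.random.rand(8, 128).astype(np.float32)
candidates = np.random.rand(10000, 128).astype(np.float32)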

# In-memory search
engine = VectorEngine()
indices, scores = engine.top_k_search(query, candidates, 'cosine', k=10)

# Multi-query batch (30-50% faster)
all_indices, all_scores = engine.multi_query_top_k(queries, candidates, 'cosine', k=10)

# Streaming from file (for large datasets)
streaming = StreamingVectorEngine()
indices, scores = streaming.streaming_top_k_from_file(
    query, 'vectors.npk', 'embeddings', 'cosine', k=10
)

Supported Metrics: cosine, dot, l2, l2sq, hamming, jaccard, kl, js
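
For readers who want to sanity-check search results, the following is a plain-Python reference for four of the listed metrics and a brute-force top-k. It is a sketch, not the Vector Engine's implementation: reference_top_k and the METRICS table are hypothetical names, and it assumes that cosine and dot are ranked highest-first while l2 and l2sq are ranked lowest-first. NumPack's actual score conventions and return types may differ.

```python
import heapq
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0

def l2sq(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def l2(a, b):
    return math.sqrt(l2sq(a, b))

# metric name -> (function, True if a larger score means "more similar")
METRICS = {
    "cosine": (cosine, True),
    "dot": (dot, True),
    "l2": (l2, False),
    "l2sq": (l2sq, False),
}

def reference_top_k(query, candidates, metric, k):
    """Brute-force top-k: score every candidate, then keep the best k."""
    fn, higher_is_better = METRICS[metric]
    scored = [(fn(query, c), i) for i, c in enumerate(candidates)]
    pick = heapq.nlargest if higher_is_better else heapq.nsmallest
    best = pick(k, scored, key=lambda s: s[0])
    return [i for _, i in best], [s for s, _ in best]

candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.0]
cos_idx, cos_scores = reference_top_k(query, candidates, "cosine", k=2)
l2_idx, l2_scores = reference_top_k(query, candidates, "l2sq", k=2)
```

On small inputs, comparing an engine's output against a reference like this is a quick way to confirm which direction each metric is sorted in.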

Format Conversion

Convert between NumPack and other formats (PyTorch, Arrow, Parquet, SafeTensors).

from numpack.io import from_torch, to_torch, from_arrow, to_arrow

# Memory ↔ .npk (zero-copy when possible)
from_torch(tensor, 'output.npk', array_name='embeddings')  # tensor → .npk
tensor = to_torch('input.npk', array_name='embeddings')     # .npk → tensor

from_arrow(table, 'output.npk')  # PyArrow Table → .npk
table = to_arrow('input.npk')     # .npk → PyArrow Table

# File ↔ File (streaming for large files)
from numpack.io import from_torch_file, to_torch_file
from_torch_file('model.pt', 'output.npk')  # .pt → .npk
to_torch_file('input.npk', 'output.pt')    # .npk → .pt

Supported formats: PyTorch (.pt), Feather, Parquet, SafeTensors, NumPy (.npy), HDF5, Zarr, CSV

Pack & Unpack

Portable .npkg format for easy migration and sharing.

from numpack import pack, unpack, get_package_info

# Pack NumPack directory into a single .npkg file
pack('data.npk')                          # -> data.npkg (with Zstd compression)
pack('data.npk', 'backup/data.npkg')      # Custom output path

# Unpack .npkg back to NumPack directory
unpack('data.npkg')                       # -> data.npk
unpack('data.npkg', 'restored/')          # Custom restore path

# View package info without extracting
info = get_package_info('data.npkg')
print(f"Files: {info['file_count']}, Compression: {info['compression_ratio']:.1%}")

Benchmarks

Tested on macOS Apple Silicon, 1M rows × 10 columns, Float32 (38.1MB)

Operation            NumPack   NPY       Advantage
Full Load            4.00ms    6.56ms    1.64x
Lazy Load            0.002ms   0.102ms   51x
Replace 100 rows     0.040ms   13.74ms   344x
Append 100 rows      0.054ms   18.26ms   338x
Random Access (100)  0.004ms   0.002ms   ~equal

Multi-Format Comparison

Core Operations (1M × 10, Float32, ~38.1MB):

Operation    NumPack   NPY       Zarr      HDF5      Parquet    Arrow
Save         11.94ms   6.48ms    70.91ms   58.07ms   142.11ms   16.85ms
Full Load    4.00ms    6.56ms    32.86ms   53.99ms   16.49ms    12.39ms
Lazy Load    0.002ms   0.102ms   0.374ms   0.082ms   N/A        N/A
Replace 100  0.040ms   13.74ms   7.61ms    0.29ms    162.48ms   26.93ms
Append 100   0.054ms   18.26ms   9.05ms    0.39ms    173.45ms   42.46ms

Random Access Performance:

Batch Size  NumPack   NPY (mmap)  Zarr      HDF5       Parquet   Arrow
100 rows    0.004ms   0.002ms     2.66ms    0.66ms     16.25ms   12.43ms
1K rows     0.025ms   0.021ms     2.86ms    5.02ms     16.48ms   12.61ms
10K rows    0.118ms   0.112ms     16.63ms   505.71ms   17.45ms   12.81ms

Batch Mode Performance (100 consecutive operations):

Mode            Time     Speedup
Normal          414ms    -
Batch Mode      20.1ms   21x
Writable Batch  4.5ms    92x
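
The Advantage and Speedup columns are ratios of the listed timings (baseline time divided by NumPack time). The short check below recomputes them from the numbers in the tables above; the variable names are only for illustration.

```python
# Timings in milliseconds, copied from the benchmark tables: (NPY, NumPack).
npy_vs_numpack = {
    "Full Load": (6.56, 4.00),
    "Lazy Load": (0.102, 0.002),
    "Replace 100 rows": (13.74, 0.040),
    "Append 100 rows": (18.26, 0.054),
}
# Batch Mode Performance table (100 consecutive operations).
batch = {"Normal": 414.0, "Batch Mode": 20.1, "Writable Batch": 4.5}

advantage = {op: npy / numpack for op, (npy, numpack) in npy_vs_numpack.items()}
speedup = {mode: batch["Normal"] / ms for mode, ms in batch.items()}

for op, ratio in advantage.items():
    print(f"{op:18s} {ratio:8.2f}x")
for mode, ratio in speedup.items():
    print(f"{mode:18s} {ratio:8.2f}x")
```

Rounded to whole numbers, the ratios agree with the tables, including 92x for Writable Batch (414ms / 4.5ms).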

File Size:

Format   Size      Compression
NumPack  38.15MB   -
NPY      38.15MB   -
NPZ      34.25MB
Zarr     34.13MB
HDF5     38.18MB   -
Parquet  44.09MB
Arrow    38.16MB   -
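
As a cross-check on the sizes, the raw payload of the benchmark array can be computed directly: 1M rows × 10 columns × 4 bytes per float32. The sketch below also expresses the other formats relative to the NumPack size; it assumes the "MB" figures in the table are binary megabytes (MiB), which is what the arithmetic suggests.

```python
rows, cols, itemsize = 1_000_000, 10, 4   # float32 uses 4 bytes per element
raw_bytes = rows * cols * itemsize
raw_mib = raw_bytes / 2**20               # 1 MiB = 1,048,576 bytes

# Sizes as listed in the File Size table.
sizes = {"NumPack": 38.15, "NPZ": 34.25, "Zarr": 34.13, "Parquet": 44.09}
relative = {fmt: size / sizes["NumPack"] for fmt, size in sizes.items()}

print(f"raw payload: {raw_bytes:,} bytes = {raw_mib:.2f} MiB")
for fmt, ratio in relative.items():
    print(f"{fmt:8s} {ratio:.1%} of the NumPack size")
```

The raw payload comes to about 38.15 MiB, so the NumPack and NPY files are essentially the uncompressed data plus a small amount of metadata.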

When to Use NumPack

Use Case                          Recommendation
Frequent modifications            NumPack (344x faster)
ML/DL pipelines                   NumPack (zero-copy random access, no full load)
Vector similarity search          NumPack (SIMD)
Write-once, read-many             NumPack (1.64x faster read)
Extreme compression               NumPack .npkg (better ratio, streaming, high I/O)
RAG/Embedding storage             NumPack (fast retrieval + SIMD search)
Feature store                     NumPack (real-time updates + low latency)
Memory-constrained environments   NumPack (mmap + lazy loading)
Multi-process data sharing        NumPack (zero-copy mmap)
Incremental data pipelines        NumPack (338x faster append)
Real-time feature updates         NumPack (ms-level replace)

Documentation

See docs/ for detailed guides and unified_benchmark.py for benchmark code.

Contributing

Contributions welcome! Please submit a Pull Request.

License

Apache License 2.0 - see LICENSE for details.
