Memory-mapped columnar binary format for fast random-access I/O on structured arrays.

ColStore

A memory-mapped columnar binary format for fast, memory-efficient I/O on structured arrays. colstore lets you write a tabular dataset to a single .cstore file once and then load arbitrary row/column subsets without materializing the rest. Internally, columns are stored back-to-back as raw NumPy bytes, reads use np.memmap, and fancy-index gathers run through a parallel C++ kernel (OpenMP + software prefetching) bound via Cython. Process memory stays bounded by the size of the output you ask for; the source file is never fully read into RAM.
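
The access pattern described above can be illustrated with plain NumPy. This is a minimal sketch, not colstore's implementation: it writes one raw column to a temporary file, maps it with np.memmap, and gathers a few rows by fancy indexing. The file name, the column contents, and the use of NumPy's own indexing in place of the C++ kernel are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical column: one million float32 values stored as raw bytes.
n_rows = 1_000_000
column = np.arange(n_rows, dtype=np.float32)

path = os.path.join(tempfile.mkdtemp(), "price.bin")
column.tofile(path)  # raw bytes only, no header

# Map the file read-only; no column data is read at this point.
mapped = np.memmap(path, dtype=np.float32, mode="r", shape=(n_rows,))

# Fancy indexing copies only the requested rows into a new in-memory array.
indices = np.array([1, 5, 9, 999_999])
subset = np.asarray(mapped[indices])

print(subset.shape, subset.dtype)
```

Only the pages that hold the requested rows need to be faulted in by the operating system, so the resulting array is the size of the selection rather than the size of the file.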

Install

pip install colstore

Building from source needs a C++17 compiler and CMake ≥ 3.18. On macOS, install libomp (brew install libomp) to get the parallel kernel; without it, the build still succeeds but the kernel runs single-threaded.

Quick start

import numpy as np
import pandas as pd
from colstore import ColStore

# Sample data with fixed-size numeric columns (see Supported dtypes).
df = pd.DataFrame({
    "price": np.random.rand(1000).astype(np.float32),
    "qty": np.arange(1000, dtype=np.int32),
})
indices = np.array([1, 5, 9])

# Write and open in one call. `.cstore` is the canonical extension.
ds = ColStore.from_dataframe(df, "data.cstore")

# Indexing returns lazy views; no data is read yet.
ds['price']                          # ColumnView
ds[100:200]                          # TableView
ds[100:200, 'price']                 # ColumnView
ds[100:200, ['price', 'qty']]        # TableView
ds[[1, 5, 9], ['price', 'qty']]      # TableView (fancy rows + cols)

# Materialize through one of the to_* methods.
ds['price'].to_array()                          # 1D ndarray
ds[indices, ['price', 'qty']].to_dict()         # dict of 1D arrays
ds[indices, ['price', 'qty']].to_record()       # structured ndarray
ds[indices, ['price', 'qty']].to_dataframe()    # pandas DataFrame

Writing from other sources

from colstore import ColStore
import numpy as np

# From a dict of 1D arrays.
ColStore.from_dict(
    {"x": np.arange(100, dtype=np.float32), "y": np.arange(100, dtype=np.int64)},
    "data.cstore",
)

# From a structured (record) array.
records = np.empty(100, dtype=[("price", np.float32), ("qty", np.int32)])
ColStore.from_records(records, "data.cstore")

Each factory returns an opened ColStore ready to read from.

Configuration

from colstore import set_max_workers, set_default_madvise, set_default_backend

set_max_workers(8)                # parallel gathers across columns
set_default_madvise("sequential") # OS read-ahead hint for sorted-index reads
set_default_backend("cpp")        # gather kernel: cpp | numpy | numba
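
For context on the madvise setting: the source does not say how colstore applies the hint internally, so the following is only a sketch of what a sequential read-ahead hint looks like at the operating-system level, using Python's standard mmap module. The file name and size are hypothetical, and the assumption that "sequential" corresponds to MADV_SEQUENTIAL is mine.

```python
import mmap
import os
import tempfile

# Hypothetical data file: 1 MiB of zeros.
path = os.path.join(tempfile.mkdtemp(), "column.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * (1 << 20))

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # MADV_SEQUENTIAL asks the kernel for aggressive read-ahead. Both the
    # constant and mmap.madvise exist only on platforms that provide madvise().
    if hasattr(mm, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
        mm.madvise(mmap.MADV_SEQUENTIAL)
    first_byte = mm[0]  # touching the mapping triggers the actual read
    mm.close()
```

A sequential hint tends to help when the row indices are sorted, since pages are then touched in increasing order; for scattered indices a read-ahead hint may fetch pages that are never used.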

On-disk format

[magic 8B = b"CSTORE\x00\x01"]
[manifest_len 8B (u64 little-endian)]
[manifest_json]
[zero-padding to 64-byte alignment]
[column_0 raw bytes][column_1 raw bytes]...[column_n raw bytes]

The manifest is a small JSON object recording format_version, n_rows, and per-column {name, dtype}. Column dtypes are preserved byte-for-byte; columns are stored back-to-back with no per-row overhead.
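
The header layout above can be sketched with the standard library alone. This is an illustrative reading of the layout, not the package's own format.py: the function names, the "columns" key that holds the per-column list, the value 1 for format_version, and the UTF-8 encoding of the JSON are all assumptions. The padding is interpreted as rounding the header up to the next multiple of 64 bytes, measured from the start of the file.

```python
import json
import struct

MAGIC = b"CSTORE\x00\x01"  # 8 bytes, as listed in the layout above
ALIGN = 64                  # column data starts on a 64-byte boundary


def build_header(manifest: dict) -> bytes:
    """Pack magic, manifest length, manifest JSON, and zero padding."""
    body = json.dumps(manifest).encode("utf-8")
    header = MAGIC + struct.pack("<Q", len(body)) + body
    padding = -len(header) % ALIGN  # bytes needed to reach a multiple of 64
    return header + b"\x00" * padding


def parse_header(buf: bytes) -> tuple[dict, int]:
    """Return (manifest, data_offset) for a buffer that starts with a header."""
    if buf[:8] != MAGIC:
        raise ValueError("not a .cstore file (bad magic)")
    (manifest_len,) = struct.unpack_from("<Q", buf, 8)
    end = 16 + manifest_len
    manifest = json.loads(buf[16:end].decode("utf-8"))
    data_offset = end + (-end % ALIGN)
    return manifest, data_offset


# Hypothetical manifest; the "columns" key name is an assumption.
manifest = {
    "format_version": 1,
    "n_rows": 100,
    "columns": [
        {"name": "price", "dtype": "float32"},
        {"name": "qty", "dtype": "int32"},
    ],
}
header = build_header(manifest)
parsed, data_offset = parse_header(header)
print(data_offset, parsed["n_rows"])
```

Because columns are stored back-to-back, a reader can presumably locate column k at data_offset plus the byte sizes (n_rows * itemsize) of the columns before it, provided no extra per-column padding is used.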

Supported dtypes

Fixed-size only: float32, float64, int8/16/32/64, uint8/16/32/64, bool. Object dtype (strings, Python objects) is rejected at write time, because the design point is zero-copy random access, which requires a fixed stride.
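
A write-time check of this kind could look like the sketch below. The helper name check_columns and the choice of TypeError are hypothetical; colstore's actual validation and error type may differ.

```python
import numpy as np

# The fixed-size dtypes listed above, in native byte order.
SUPPORTED = {
    np.dtype(t)
    for t in (
        "float32", "float64",
        "int8", "int16", "int32", "int64",
        "uint8", "uint16", "uint32", "uint64",
        "bool",
    )
}


def check_columns(columns: dict) -> None:
    """Raise TypeError for any column whose dtype is not in SUPPORTED."""
    for name, values in columns.items():
        dtype = np.asarray(values).dtype
        if dtype not in SUPPORTED:
            raise TypeError(f"column {name!r} has unsupported dtype {dtype}")


# Fixed-size columns pass silently.
check_columns({"x": np.arange(3, dtype=np.float32), "flag": np.array([True, False])})

# Object columns are rejected.
try:
    check_columns({"label": np.array(["a", "b"], dtype=object)})
    rejected = False
except TypeError:
    rejected = True
```

With a fixed itemsize, row i of a column sits at a byte offset of i * itemsize from the column start, which is what makes random access possible without an index.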

Layout

colstore/
├── pyproject.toml              # scikit-build-core build
├── CMakeLists.txt              # Cython + C++ build
├── include/colstore/
│   └── gather.hpp              # public C++ header
├── src/
│   ├── cpp/gather.cpp          # OpenMP + prefetch kernel
│   ├── cython/_gather.pyx      # dtype-dispatched binding
│   └── colstore/               # Python package
│       ├── __init__.py
│       ├── config.py
│       ├── format.py
│       ├── kernels.py
│       ├── view.py             # ColumnView + TableView
│       └── store.py
└── tests/                      # pytest suite

License

MIT.
