
Python bindings for genegraph-storage. Store your numpy data in Lance format


pygenestore

Store your NumPy arrays at scale using the Lance format. Handles millions of rows, limited only by available memory.
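Because the examples in this README pass a whole in-memory array to store, a rough size estimate can help decide whether a dataset fits. The sketch below is plain arithmetic and does not use genestore; the helper name dense_array_bytes is hypothetical, and the default item size of 8 bytes assumes float64 as used in the examples.

```python
def dense_array_bytes(rows: int, cols: int, itemsize: int = 8) -> int:
    """Bytes needed to hold a dense rows x cols array in memory.

    itemsize defaults to 8, the size of one float64 value.
    """
    return rows * cols * itemsize


# Example: 5 million rows of 128 float64 values each
size = dense_array_bytes(5_000_000, 128)
print(f"{size / 1e9:.2f} GB")  # prints "5.12 GB"
```

This counts only the raw array buffer; any extra copies made while storing or loading would add to it.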

Usage

To create multiple storages, pass a different directory to each create_storage call.

A single storage can hold several arrays; give each one a distinct name.
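As a minimal sketch of keeping several arrays in one storage, the helpers below rely only on the store(array, name) and load(name) calls shown in this README. The helper names store_named and load_named are hypothetical and not part of genestore; the storage argument is assumed to be whatever builder.build() returns.

```python
from typing import Any, Dict, Iterable, Mapping


def store_named(storage: Any, arrays: Mapping[str, Any]) -> Dict[str, Any]:
    """Store each array under its own name.

    Returns a dict mapping each name to whatever storage.store() returned.
    """
    paths = {}
    for name, array in arrays.items():
        paths[name] = storage.store(array, name)
    return paths


def load_named(storage: Any, names: Iterable[str]) -> Dict[str, Any]:
    """Load each named array back; returns a dict of name -> loaded array."""
    return {name: storage.load(name) for name in names}
```

With a storage built for "./lance_data", store_named(storage, {"train": x_train, "test": x_test}) would write both arrays side by side. A second storage built for another directory could reuse the same names, assuming storages in different directories are independent.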

Default API (blocking)

import numpy as np
import genestore

# Configure storage
builder = genestore.create_storage("./lance_data")
builder.with_max_rows_per_file(500_000)
builder.with_compression("zstd")

# Build storage instance
storage = builder.build()

# Create data (2D float64 numpy array)
np.random.seed(42)
x = np.random.randn(1000, 128).astype(np.float64)

# Store (blocking)
path = storage.store(x, "my_dataset")
print("Stored at:", path)

# Load (blocking)
y = storage.load("my_dataset")
print("Loaded shape:", y.shape)

# Verify roundtrip
assert np.allclose(x, y)
assert np.array_equal(x, y)
print("✓ Data verification passed")

Async API

import numpy as np
import genestore
import asyncio

async def main():
    # Create a storage builder and configure it
    builder = genestore.create_storage("./lance_data")
    builder.with_max_rows_per_file(500000)
    builder.with_compression("zstd")

    # Build the storage instance
    storage = builder.build()

    # Create a numpy array (dense matrix)
    np.random.seed(42)  # For reproducibility
    data = np.random.randn(1000, 128).astype(np.float64)

    # Store the array (await the async call)
    path = await storage.aio.store(data, "my_dataset")
    print(f"Stored at: {path}")

    # Load the array back using the NAME (not path)
    loaded_data = await storage.aio.load("my_dataset")
    print(f"Loaded shape: {loaded_data.shape}")

    # Verify the data
    assert np.allclose(data, loaded_data)
    print("✓ Data verification passed!")

if __name__ == "__main__":
    asyncio.run(main())
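One reason to reach for the async API is to overlap several writes. The sketch below uses asyncio.gather from the standard library together with the aio.store(array, name) call shown above. The helper name store_all is hypothetical. Whether genestore allows concurrent writes into one storage is not stated here, so treat that as an assumption to verify before relying on it.

```python
import asyncio
from typing import Any, List, Mapping


async def store_all(aio: Any, arrays: Mapping[str, Any]) -> List[Any]:
    """Await aio.store(array, name) for every entry concurrently.

    Assumes aio is an object like storage.aio with an awaitable store().
    Results come back in the same order as the entries in arrays.
    """
    return await asyncio.gather(
        *(aio.store(array, name) for name, array in arrays.items())
    )
```

Inside main() this could be called as paths = await store_all(storage.aio, {"a": a, "b": b}).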

Tests

pip install -r requirements-dev.txt
pytest tests/
