Python bindings for genegraph-storage. Store your numpy data in Lance format

pygenestore

Store your numpy arrays at scale using the Lance format. Handles millions of rows, limited only by available memory.
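For a rough sense of what "limited by memory" means, here is a back-of-the-envelope estimate. It assumes the whole array sits in RAM as dense float64 (8 bytes per value), which is what the examples in this README use. `dense_float64_bytes` is a hypothetical helper, not part of genestore.

```python
def dense_float64_bytes(rows: int, cols: int) -> int:
    """Bytes needed to hold a dense rows x cols float64 array in memory."""
    return rows * cols * 8  # float64 takes 8 bytes per element


# One million rows of 128 features each.
size = dense_float64_bytes(1_000_000, 128)
print(f"{size / 2**30:.2f} GiB")  # prints "0.95 GiB"
```

A roundtrip check that keeps both the original and the loaded copy alive needs roughly twice that amount.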

Usage

To create multiple storages, pass a different directory to create_storage for each one.

To store several arrays in the same storage, give each one a different name.
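The two points above can be sketched as follows. Only the `store(array, name)` and `load(name)` call shapes are taken from the examples in this README. `store_named`, `load_named` and `InMemoryStorage` are hypothetical names introduced for illustration, and the stand-in class exists only so the sketch runs without genestore installed.

```python
def store_named(storage, arrays):
    """Store every array under its own name; return {name: stored path}."""
    return {name: storage.store(array, name) for name, array in arrays.items()}


def load_named(storage, names):
    """Load several arrays back by name; return {name: array}."""
    return {name: storage.load(name) for name in names}


class InMemoryStorage:
    """Hypothetical stand-in that mimics store(array, name) / load(name)."""

    def __init__(self, directory):
        self.directory = directory
        self._arrays = {}

    def store(self, array, name):
        self._arrays[name] = array
        return f"{self.directory}/{name}"  # made-up path format, an assumption

    def load(self, name):
        return self._arrays[name]


# Two independent storages, one per directory.
train = InMemoryStorage("./lance_train")
holdout = InMemoryStorage("./lance_holdout")

# Several arrays in the same storage, told apart by name.
store_named(train, {"features": [[1.0, 2.0]], "labels": [[0.0]]})
store_named(holdout, {"features": [[3.0, 4.0]]})

loaded = load_named(train, ["features", "labels"])
print(sorted(loaded))  # prints "['features', 'labels']"
```

With the real package, the stand-in would presumably be replaced by `genestore.create_storage(directory).build()` and the nested lists by NumPy arrays.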

Default API (blocking)

import numpy as np
import genestore

# Configure storage
builder = genestore.create_storage("./lance_data")
builder.with_max_rows_per_file(500_000)
builder.with_compression("zstd")

# Build storage instance
storage = builder.build()

# Create data (2D float64 numpy array)
np.random.seed(42)
x = np.random.randn(1000, 128).astype(np.float64)

# Store (blocking)
path = storage.store(x, "my_dataset")
print("Stored at:", path)

# Load (blocking)
y = storage.load("my_dataset")
print("Loaded shape:", y.shape)

# Verify roundtrip
assert np.allclose(x, y)
assert np.array_equal(x, y)
print("✓ Data verification passed")
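The example above stores a 2D float64 array. This README does not say whether other dtypes or shapes are accepted, so a cautious caller may want to normalize input before calling `store`. The sketch below uses only NumPy. `as_dense_float64` is a hypothetical helper, not part of genestore.

```python
import numpy as np


def as_dense_float64(array) -> np.ndarray:
    """Return a C-contiguous 2D float64 array, or raise ValueError."""
    # Converts dtype and memory layout; copies only when a change is needed.
    out = np.ascontiguousarray(array, dtype=np.float64)
    if out.ndim != 2:
        raise ValueError(f"expected a 2D array, got {out.ndim} dimension(s)")
    return out


x32 = np.arange(6, dtype=np.float32).reshape(2, 3)
x64 = as_dense_float64(x32)
print(x64.dtype, x64.shape)  # prints "float64 (2, 3)"
```

Note that converting from a narrower dtype such as float32 allocates a new array, which counts against available memory.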

Async API

import numpy as np
import genestore
import asyncio

async def main():
    # Create a storage builder and configure it
    builder = genestore.create_storage("./lance_data")
    builder.with_max_rows_per_file(500000)
    builder.with_compression("zstd")

    # Build the storage instance
    storage = builder.build()

    # Create a numpy array (dense matrix)
    np.random.seed(42)  # For reproducibility
    data = np.random.randn(1000, 128).astype(np.float64)

    # Store the array (await the async call)
    path = await storage.aio.store(data, "my_dataset")
    print(f"Stored at: {path}")

    # Load the array back using the NAME (not path)
    loaded_data = await storage.aio.load("my_dataset")
    print(f"Loaded shape: {loaded_data.shape}")

    # Verify the data
    assert np.allclose(data, loaded_data)
    print("✓ Data verification passed!")

if __name__ == "__main__":
    asyncio.run(main())
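One reason to reach for the async API is to overlap several stores. The sketch below shows the pattern with `asyncio.gather`. Only the `storage.aio.store(array, name)` and `storage.aio.load(name)` call shapes come from the example above. Whether genestore supports concurrent writes to a single storage is not stated in this README, so treat that as an assumption. `store_all`, `FakeStorage` and `_FakeAio` are hypothetical, and the fakes exist only so the sketch runs without genestore installed.

```python
import asyncio


async def store_all(storage, arrays):
    """Store all arrays concurrently; return {name: stored path}."""
    names = list(arrays)
    # gather returns results in the same order as the awaitables passed in.
    paths = await asyncio.gather(
        *(storage.aio.store(arrays[name], name) for name in names)
    )
    return dict(zip(names, paths))


class _FakeAio:
    """Hypothetical stand-in for storage.aio."""

    def __init__(self):
        self._arrays = {}

    async def store(self, array, name):
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        self._arrays[name] = array
        return f"./lance_data/{name}"  # made-up path format, an assumption

    async def load(self, name):
        await asyncio.sleep(0)
        return self._arrays[name]


class FakeStorage:
    """Hypothetical stand-in exposing the same .aio attribute."""

    def __init__(self):
        self.aio = _FakeAio()


async def demo():
    storage = FakeStorage()
    paths = await store_all(storage, {"a": [[1.0]], "b": [[2.0]]})
    loaded = await storage.aio.load("b")
    return paths, loaded


paths, loaded = asyncio.run(demo())
print(sorted(paths))  # prints "['a', 'b']"
```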

Tests

pip install -r requirements-dev.txt
pytest tests/
