Python bindings for genegraph-storage. Store your NumPy data in the Lance format.

pygenestore

Store your NumPy arrays at scale using the Lance format. Handles millions of rows, limited only by available memory.
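
Since capacity is bounded by memory, it can help to estimate an array's in-memory size before storing it. The sketch below is plain arithmetic and not part of genestore; the helper name `dense_nbytes` is hypothetical, and it assumes a dense array where each float64 value occupies 8 bytes.

```python
def dense_nbytes(rows: int, cols: int, itemsize: int = 8) -> int:
    """Bytes needed to hold a dense rows x cols array in memory.

    itemsize defaults to 8, the size of one float64 value.
    """
    return rows * cols * itemsize


# The 1000 x 128 float64 array used in the examples on this page
small = dense_nbytes(1000, 128)
print(small)  # 1024000 bytes, about 1 MB

# Ten million rows of the same width
large = dense_nbytes(10_000_000, 128)
print(large / 1e9)  # 10.24 (GB)
```

This only covers the raw array; any extra copies made while storing or loading would add to it.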

Usage

To create multiple storages, pass a different directory to each create_storage call.

To store several arrays in the same storage, give each one a different name.
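
As a minimal sketch of storing several arrays under different names, the hypothetical helper below loops over a name-to-array mapping. It assumes only that `storage` exposes the `store(array, name)` method shown in the examples on this page; `store_named_arrays` itself is not part of genestore.

```python
from typing import Any, Dict


def store_named_arrays(storage: Any, arrays: Dict[str, Any]) -> Dict[str, Any]:
    """Store each array under its own name; return {name: store() result}.

    `storage` is anything with a store(array, name) method, such as the
    object returned by builder.build() in the examples on this page.
    """
    return {name: storage.store(array, name) for name, array in arrays.items()}
```

With a storage built as in the examples on this page, a call such as `store_named_arrays(storage, {"train": x_train, "test": x_test})` would write two datasets, each of which can then be loaded back by its own name.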

Default API (blocking)

import numpy as np
import genestore

# Configure storage
builder = genestore.create_storage("./lance_data")
builder.with_max_rows_per_file(500_000)
builder.with_compression("zstd")

# Build storage instance
storage = builder.build()

# Create data (2D float64 numpy array)
np.random.seed(42)
x = np.random.randn(1000, 128).astype(np.float64)

# Store (blocking)
path = storage.store(x, "my_dataset")
print("Stored at:", path)

# Load (blocking)
y = storage.load("my_dataset")
print("Loaded shape:", y.shape)

# Verify roundtrip
assert np.allclose(x, y)
assert np.array_equal(x, y)
print("✓ Data verification passed")

Async API

import numpy as np
import genestore
import asyncio

async def main():
    # Create a storage builder and configure it
    builder = genestore.create_storage("./lance_data")
    builder.with_max_rows_per_file(500000)
    builder.with_compression("zstd")

    # Build the storage instance
    storage = builder.build()

    # Create a numpy array (dense matrix)
    np.random.seed(42)  # For reproducibility
    data = np.random.randn(1000, 128).astype(np.float64)

    # Store the array (await the async call)
    path = await storage.aio.store(data, "my_dataset")
    print(f"Stored at: {path}")

    # Load the array back using the NAME (not path)
    loaded_data = await storage.aio.load("my_dataset")
    print(f"Loaded shape: {loaded_data.shape}")

    # Verify the data
    assert np.allclose(data, loaded_data)
    print("✓ Data verification passed!")

if __name__ == "__main__":
    asyncio.run(main())
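
One reason to prefer the async API is to have several store calls in flight at once. The sketch below is a hypothetical helper, not part of genestore. It assumes that `storage.aio.store(array, name)` returns an awaitable, as in the example above, and that concurrent stores on the same storage are safe, which the source does not state.

```python
import asyncio
from typing import Any, Dict, List


async def store_all(storage: Any, arrays: Dict[str, Any]) -> List[Any]:
    """Start one storage.aio.store call per array and wait for all of them.

    Assumption: concurrent aio.store calls on one storage are safe.
    Results come back in the same order as the dict's items.
    """
    tasks = [storage.aio.store(array, name) for name, array in arrays.items()]
    return await asyncio.gather(*tasks)
```

Inside an async function, `paths = await store_all(storage, {"a": a, "b": b})` would return one result per array. If concurrent writes turn out not to be supported, awaiting each call in a plain loop is the safer choice.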

Tests

pip install -r requirements-dev.txt
pytest tests/

Download files

Source Distribution

genestore-0.5.0.tar.gz (47.0 kB)

Built Distribution

genestore-0.5.0-cp312-cp312-manylinux_2_39_x86_64.whl (52.5 MB), for CPython 3.12 on manylinux (glibc 2.39+), x86-64
