Python bindings for genegraph-storage. Store your numpy data in Lance format

pygenestore

Store your NumPy arrays at scale using the Lance format. Handles millions of rows, as long as the data fits in memory.

Usage

To create multiple storages, pass a different directory to each create_storage call.

A single storage can hold several arrays: give each one a distinct name.
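
Both points can be sketched with a few small helpers. This is a minimal sketch that assumes only the calls shown elsewhere in this README (create_storage, with_max_rows_per_file, with_compression, build, store, load). The helper names open_storage, store_named, and load_named are hypothetical and not part of genestore.

```python
def open_storage(directory):
    """Build a storage rooted at `directory`, configured as in the examples in this README."""
    # Imported lazily so the helpers below work with any object
    # that exposes store(array, name) and load(name).
    import genestore

    builder = genestore.create_storage(directory)
    builder.with_max_rows_per_file(500_000)
    builder.with_compression("zstd")
    return builder.build()


def store_named(storage, arrays):
    """Store each array under its own name; return {name: stored path}."""
    return {name: storage.store(array, name) for name, array in arrays.items()}


def load_named(storage, names):
    """Load arrays back by name; return {name: array}."""
    return {name: storage.load(name) for name in names}
```

For instance, store_named(open_storage("./run_a"), {"train": x_train, "test": x_test}) would keep two arrays in one storage, while a second open_storage("./run_b") call would give an independent storage in another directory.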

Default API (blocking)

import numpy as np
import genestore

# Configure storage
builder = genestore.create_storage("./lance_data")
builder.with_max_rows_per_file(500_000)
builder.with_compression("zstd")

# Build storage instance
storage = builder.build()

# Create data (2D float64 numpy array)
np.random.seed(42)
x = np.random.randn(1000, 128).astype(np.float64)

# Store (blocking)
path = storage.store(x, "my_dataset")
print("Stored at:", path)

# Load (blocking)
y = storage.load("my_dataset")
print("Loaded shape:", y.shape)

# Verify roundtrip
assert np.allclose(x, y)
assert np.array_equal(x, y)
print("✓ Data verification passed")

Async API

import numpy as np
import genestore
import asyncio

async def main():
    # Create a storage builder and configure it
    builder = genestore.create_storage("./lance_data")
    builder.with_max_rows_per_file(500000)
    builder.with_compression("zstd")

    # Build the storage instance
    storage = builder.build()

    # Create a numpy array (dense matrix)
    np.random.seed(42)  # For reproducibility
    data = np.random.randn(1000, 128).astype(np.float64)

    # Store the array (await the async call)
    path = await storage.aio.store(data, "my_dataset")
    print(f"Stored at: {path}")

    # Load the array back using the NAME (not path)
    loaded_data = await storage.aio.load("my_dataset")
    print(f"Loaded shape: {loaded_data.shape}")

    # Verify the data
    assert np.allclose(data, loaded_data)
    print("✓ Data verification passed!")

if __name__ == "__main__":
    asyncio.run(main())
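
When several arrays need to be written, the async calls can be awaited together. The sketch below assumes that storage.aio.store returns an awaitable, as in the example above, and that the storage tolerates concurrent writes under different names; that second point is an assumption, not something this README states. The helper name store_all is hypothetical.

```python
import asyncio


async def store_all(storage, arrays):
    """Store every array concurrently; return {name: stored path}."""
    names = list(arrays)
    # gather() preserves argument order, so paths line up with names.
    paths = await asyncio.gather(
        *(storage.aio.store(arrays[name], name) for name in names)
    )
    return dict(zip(names, paths))
```

Inside an async function this would be called as paths = await store_all(storage, {"train": x_train, "test": x_test}). If concurrent writes turn out not to be supported, awaiting each storage.aio.store call in a plain loop gives the same result sequentially.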

Tests

pip install -r requirements-dev.txt
pytest tests/

Download files

Source Distribution

genestore-0.10.0.tar.gz (46.9 kB view details)

Uploaded Source

Built Distribution

genestore-0.10.0-cp312-cp312-manylinux_2_39_x86_64.whl (52.4 MB view details)

Uploaded CPython 3.12, manylinux: glibc 2.39+ x86-64

File details

Details for the file genestore-0.10.0.tar.gz.

File metadata

  • Download URL: genestore-0.10.0.tar.gz
  • Upload date:
  • Size: 46.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.10.2

File hashes

Hashes for genestore-0.10.0.tar.gz
Algorithm Hash digest
SHA256 7b5a3f45ba739f0660f9edca916c9e33a7494f6e0f2f560ecbaf6b7ce5ca78dc
MD5 519dc52ecc184524c07ae461f4aecfe7
BLAKE2b-256 4dfb8084f79fab52b2036be472631e6de7e73304bf4711863feb9c78821fbbc7

File details

Details for the file genestore-0.10.0-cp312-cp312-manylinux_2_39_x86_64.whl.

File hashes

Hashes for genestore-0.10.0-cp312-cp312-manylinux_2_39_x86_64.whl
Algorithm Hash digest
SHA256 93944a786a93474f8c4b42ddf75da27b6bab734e6fa6841dbe05d2ceef76e1db
MD5 644bca93a04cfa3c4168d2061c213d2a
BLAKE2b-256 95f5916d4042f5424db8dbe5c36a51933cc718440291069df0ef465b59eb19ce
