
Python bindings for genegraph-storage. Store your numpy data in Lance format

Project description

pygenestore

Store your numpy arrays at scale using the Lance format. Handles millions of rows, limited only by available memory.

Usage

To create multiple storages, pass a different directory to create_storage for each one.

To keep several arrays in the same storage, store each one under a different name.
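The two points above can be sketched as follows. This is a minimal illustration, not part of the package: the helper names `store_named_arrays` and `load_named_arrays`, the directory names, and the dataset names are all hypothetical, and the only genestore calls used are the ones shown in the examples on this page (`create_storage`, `with_max_rows_per_file`, `with_compression`, `build`, `store`, `load`).

```python
from typing import Any, Dict, Iterable


def store_named_arrays(storage: Any, arrays: Dict[str, Any]) -> Dict[str, Any]:
    """Store each array under its own name and return {name: stored path}."""
    return {name: storage.store(array, name) for name, array in arrays.items()}


def load_named_arrays(storage: Any, names: Iterable[str]) -> Dict[str, Any]:
    """Load each named array back from the same storage."""
    return {name: storage.load(name) for name in names}


def demo() -> None:
    # Imported here so the helpers above stay importable without genestore.
    import numpy as np
    import genestore

    # One storage per directory (directory names are illustrative).
    storages = {}
    for directory in ("./lance_train", "./lance_test"):
        builder = genestore.create_storage(directory)
        builder.with_max_rows_per_file(500_000)
        builder.with_compression("zstd")
        storages[directory] = builder.build()

    # Two differently named arrays kept in the same storage.
    arrays = {
        "embeddings": np.random.randn(1000, 128).astype(np.float64),
        "scores": np.random.randn(1000, 4).astype(np.float64),
    }
    train = storages["./lance_train"]
    store_named_arrays(train, arrays)
    loaded = load_named_arrays(train, arrays)
    for name, array in arrays.items():
        assert np.array_equal(array, loaded[name])
```

Calling `demo()` requires genestore and numpy to be installed. The helpers themselves only assume an object with `store(array, name)` and `load(name)` methods.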

Default API (blocking)

import numpy as np
import genestore

# Configure storage
builder = genestore.create_storage("./lance_data")
builder.with_max_rows_per_file(500_000)
builder.with_compression("zstd")

# Build storage instance
storage = builder.build()

# Create data (2D float64 numpy array)
np.random.seed(42)
x = np.random.randn(1000, 128).astype(np.float64)

# Store (blocking)
path = storage.store(x, "my_dataset")
print("Stored at:", path)

# Load (blocking)
y = storage.load("my_dataset")
print("Loaded shape:", y.shape)

# Verify roundtrip
assert np.allclose(x, y)
assert np.array_equal(x, y)
print("✓ Data verification passed")

Async API

import numpy as np
import genestore
import asyncio

async def main():
    # Create a storage builder and configure it
    builder = genestore.create_storage("./lance_data")
    builder.with_max_rows_per_file(500000)
    builder.with_compression("zstd")

    # Build the storage instance
    storage = builder.build()

    # Create a numpy array (dense matrix)
    np.random.seed(42)  # For reproducibility
    data = np.random.randn(1000, 128).astype(np.float64)

    # Store the array (await the async call)
    path = await storage.aio.store(data, "my_dataset")
    print(f"Stored at: {path}")

    # Load the array back using the NAME (not path)
    loaded_data = await storage.aio.load("my_dataset")
    print(f"Loaded shape: {loaded_data.shape}")

    # Verify the data
    assert np.allclose(data, loaded_data)
    print("✓ Data verification passed!")

if __name__ == "__main__":
    asyncio.run(main())
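
Because the async calls are awaitable, several named arrays can in principle be stored or loaded in one step with `asyncio.gather`. The sketch below shows that pattern. The helper names `store_many` and `load_many` are hypothetical, and it is an assumption, not something this page confirms, that `storage.aio.store` and `storage.aio.load` can safely run concurrently against the same storage.

```python
import asyncio
from typing import Any, Dict, Iterable


async def store_many(storage: Any, arrays: Dict[str, Any]) -> Dict[str, Any]:
    """Store several named arrays via storage.aio.store; return {name: path}."""
    names = list(arrays)
    # Assumption: concurrent aio.store calls on one storage are safe.
    paths = await asyncio.gather(
        *(storage.aio.store(arrays[name], name) for name in names)
    )
    return dict(zip(names, paths))


async def load_many(storage: Any, names: Iterable[str]) -> Dict[str, Any]:
    """Load several named arrays via storage.aio.load; return {name: array}."""
    names = list(names)
    loaded = await asyncio.gather(*(storage.aio.load(name) for name in names))
    return dict(zip(names, loaded))
```

Inside a coroutine such as `main()` above, this would be used as `paths = await store_many(storage, {"first": a, "second": b})`. If concurrent writes turn out to be unsupported, awaiting the calls one after another in a plain loop is the conservative alternative.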

Tests

pip install -r requirements-dev.txt
pytest tests/

Download files


Source Distribution

genestore-0.3.0.tar.gz (44.1 kB)

Uploaded Source

Built Distribution


genestore-0.3.0-cp312-cp312-manylinux_2_39_x86_64.whl (46.0 MB)

Uploaded CPython 3.12 manylinux: glibc 2.39+ x86-64

File details

Details for the file genestore-0.3.0.tar.gz.

File metadata

  • Download URL: genestore-0.3.0.tar.gz
  • Size: 44.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.10.2

File hashes

Hashes for genestore-0.3.0.tar.gz
Algorithm Hash digest
SHA256 10f0068b4e6cefd201ba0aa3a7ab46ab7230f946d19898685cb9300589739923
MD5 948a2726b5f2655b1dff2cd4c37fa708
BLAKE2b-256 d8fc7fcd340fc9fac9304775c3ff002aff9857c0cca41f91425be19952fc9d59
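
A downloaded file can be checked against the SHA256 digest listed above using only the Python standard library. This is a minimal sketch. The function names `sha256_of` and `matches_sha256` are hypothetical helpers, not part of genestore.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks so large wheels need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_sha256("genestore-0.3.0.tar.gz", "<SHA256 value listed above>")` should return True for an intact download of the source distribution.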


File details

Details for the file genestore-0.3.0-cp312-cp312-manylinux_2_39_x86_64.whl.

File hashes

Hashes for genestore-0.3.0-cp312-cp312-manylinux_2_39_x86_64.whl
Algorithm Hash digest
SHA256 b38203c9639ea44eede2263746cca34c8423cb6248785c3ba070f8f80b95d061
MD5 99edcb821776d08f48d201a2d89aa787
BLAKE2b-256 e44dd67bdea01e3c0343d4788ddd287fba36ec29c70e53383f9d36a7bd416057
