
Base class for handling TileDB-backed arrays.


cellarr-array

This package provides high-level wrappers around TileDB arrays for handling genomic data matrices.

Install

To get started, install the package from PyPI:

pip install cellarr-array

Quick Start

Creating Arrays

import numpy as np
from scipy import sparse
from cellarr_array import create_cellarray, CellArrConfig

# Create a dense 2D array
dense_array = create_cellarray(
    uri="dense_matrix.tdb",
    shape=(10000, 5000),
    attr_dtype=np.float32,
    sparse=False,
    dim_names=["cells", "genes"]
)

# Create a sparse 2D array with custom compression
config = CellArrConfig(
    tile_capacity=1000,
    attrs_filters={"data": [{"name": "zstd", "level": 7}]}
)
sparse_array = create_cellarray(
    uri="sparse_matrix.tdb",
    shape=(10000, 5000),
    attr_dtype=np.float32,
    sparse=True,
    config=config,
    dim_names=["cells", "genes"]
)

# Create a 1D array
array_1d = create_cellarray(
    uri="vector.tdb",
    shape=(1000,),
    attr_dtype=np.float32,
    sparse=False
)

Writing Data

# Writing to dense arrays
data = np.random.random((1000, 5000)).astype(np.float32)
dense_array.write_batch(data, start_row=0)

# Writing to sparse arrays
sparse_data = sparse.random(1000, 5000, density=0.1, format="csr", dtype=np.float32)
sparse_array.write_batch(sparse_data, start_row=0)

# Writing to 1D arrays
data_1d = np.random.random(100).astype(np.float32)
array_1d.write_batch(data_1d, start_row=0)
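Because write_batch takes a start_row, a matrix that does not fit comfortably in memory can be written one block of rows at a time. The sketch below shows that call pattern only. write_in_chunks and _RecordingArray are hypothetical names introduced for illustration, not part of cellarr-array; the only thing assumed about the real array is the write_batch(data, start_row=...) call shown above.

```python
import numpy as np


def write_in_chunks(array, data, chunk_rows=1000):
    """Hypothetical helper: write `data` to `array` in blocks of rows.

    Assumes only that `array` exposes write_batch(data, start_row=...),
    as in the examples above.
    """
    n_rows = data.shape[0]
    for start in range(0, n_rows, chunk_rows):
        stop = min(start + chunk_rows, n_rows)
        # Each block lands at the row offset it was taken from.
        array.write_batch(data[start:stop], start_row=start)
    return n_rows


class _RecordingArray:
    """Stand-in that records calls, so the sketch runs without TileDB."""

    def __init__(self):
        self.calls = []

    def write_batch(self, data, start_row):
        self.calls.append((start_row, data.shape))


recorder = _RecordingArray()
data = np.random.random((2500, 50)).astype(np.float32)
rows_written = write_in_chunks(recorder, data, chunk_rows=1000)
print(recorder.calls)
```

With a real array, passing dense_array or sparse_array in place of the recorder should follow the same pattern, provided the row offsets stay within the shape given at creation time.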

Reading Data

# Slicing operations (similar to NumPy)

# Full slice
full_data = dense_array[:]

# Partial slice
subset = dense_array[100:200, 1000:2000]

# Using lists of indices
cells = [10, 20, 30]
genes = [5, 15, 25]
subset = dense_array[cells, genes]

# Mixed slicing
subset = dense_array[100:200, genes]
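When passing two index lists, it is worth checking the shape of what comes back. In plain NumPy, a pair of lists selects elements pairwise, while np.ix_ selects the full sub-block. This README does not state which convention the array classes follow, so the sketch below uses NumPy only, as a reference point for comparing against subset.shape.

```python
import numpy as np

# A small stand-in matrix; values encode their position (row * 30 + col).
matrix = np.arange(40 * 30).reshape(40, 30)

cells = [10, 20, 30]
genes = [5, 15, 25]

# Pairwise selection: elements (10, 5), (20, 15), (30, 25).
pointwise = matrix[cells, genes]

# Block selection: every listed cell crossed with every listed gene.
block = matrix[np.ix_(cells, genes)]

print(pointwise.shape)  # (3,)
print(block.shape)      # (3, 3)
```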

Working with Sparse Arrays

from cellarr_array import SparseCellArray

# Create a sparse array with CSR output format
csr_array = SparseCellArray(
    uri="sparse_matrix.tdb",
    return_sparse=True
)

# Get result as CSR matrix
result = csr_array[100:200, 500:1000]

# Result is scipy.sparse.csr_matrix
assert sparse.isspmatrix_csr(result)

# Perform sparse operations
nnz = result.nnz
density = result.nnz / (result.shape[0] * result.shape[1])

# Convert to other sparse formats if needed
result_csc = result.tocsc()
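The choice between CSR and CSC mostly comes down to access pattern: CSR is efficient for selecting rows (cells), and CSC for selecting columns (genes). The sketch below uses SciPy only, with a random matrix standing in for a query result, to show that the conversion changes the storage layout but not the stored values.

```python
import numpy as np
from scipy import sparse

# Random stand-in for a query result; not read from a TileDB array.
result = sparse.random(100, 500, density=0.1, format="csr", dtype=np.float32)

nnz = result.nnz
density = nnz / (result.shape[0] * result.shape[1])

# Same values, column-oriented storage.
result_csc = result.tocsc()

# Row slicing suits CSR; column slicing suits CSC.
first_cells = result[:10, :]
first_genes = result_csc[:, :10]

# Number of entries that differ between the two layouts (expected: none).
n_different = (result != result_csc).nnz
print(result.format, result_csc.format, n_different)
```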

Likewise, to get results in CSC format, pass sparse_coerce:

from scipy import sparse

# Create a sparse array with CSC output format
csc_array = SparseCellArray(
    uri="sparse_matrix.tdb",
    return_sparse=True,
    sparse_coerce=sparse.csc_matrix
)

# Get result as CSC matrix
result = csc_array[100:200, 500:1000]
print(result)

Array Maintenance

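# Assumption: ConsolidationConfig is importable from the package top level, like CellArrConfig
from cellarr_array import ConsolidationConfig

# `array` stands for any array created or opened above, e.g. dense_array
array = dense_array

# Consolidate fragments
array.consolidate()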

# Custom consolidation
config = ConsolidationConfig(
    steps=2,
    vacuum_after=True
)
array.consolidate(config)

# Vacuum
array.vacuum()

Note

This project has been set up using BiocSetup and PyScaffold.
