Base class for handling TileDB-backed arrays.

cellarr-array

This package provides high-level wrappers around TileDB arrays for handling genomic data matrices.

Install

To get started, install the package from PyPI:

pip install cellarr-array

Quick Start

Creating Arrays

import numpy as np
from scipy import sparse
from cellarr_array import create_cellarray, CellArrConfig

# Create a dense 2D array
dense_array = create_cellarray(
    uri="dense_matrix.tdb",
    shape=(10000, 5000),
    attr_dtype=np.float32,
    sparse=False,
    dim_names=["cells", "genes"]
)

# Create a sparse 2D array with custom compression
config = CellArrConfig(
    tile_capacity=1000,
    attrs_filters={"data": [{"name": "zstd", "level": 7}]}
)
sparse_array = create_cellarray(
    uri="sparse_matrix.tdb",
    shape=(10000, 5000),
    attr_dtype=np.float32,
    sparse=True,
    config=config,
    dim_names=["cells", "genes"]
)

# Create a 1D array
array_1d = create_cellarray(
    uri="vector.tdb",
    shape=(1000,),
    attr_dtype=np.float32,
    sparse=False
)

Writing Data

# Writing to dense arrays
data = np.random.random((1000, 5000)).astype(np.float32)
dense_array.write_batch(data, start_row=0)

# Writing to sparse arrays
sparse_data = sparse.random(1000, 5000, density=0.1, format="csr", dtype=np.float32)
sparse_array.write_batch(sparse_data, start_row=0)

# Writing to 1D arrays
data_1d = np.random.random(100).astype(np.float32)
array_1d.write_batch(data_1d, start_row=0)
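Since `write_batch` takes a `start_row`, a large matrix can be written in several pieces rather than in one call. The sketch below shows only the bookkeeping for that: `iter_row_batches` is a hypothetical helper, not part of cellarr-array, and it uses nothing beyond the standard library.

```python
def iter_row_batches(n_rows, batch_size):
    """Yield (start_row, end_row) pairs that cover range(n_rows) in order.

    Hypothetical helper for illustration; not part of cellarr-array.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_rows, batch_size):
        # The last batch may be shorter than batch_size.
        yield start, min(start + batch_size, n_rows)


batches = list(iter_row_batches(10000, 4000))
print(batches)  # [(0, 4000), (4000, 8000), (8000, 10000)]
```

Each pair could then drive a call such as `dense_array.write_batch(data[start:end], start_row=start)`, assuming the block passed in has the same number of columns as the array.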

Reading Data

# Slicing operations (similar to NumPy)

# Full slice
full_data = dense_array[:]

# Partial slice
subset = dense_array[100:200, 1000:2000]

# Using lists of indices
cells = [10, 20, 30]
genes = [5, 15, 25]
subset = dense_array[cells, genes]

# Mixed slicing
subset = dense_array[100:200, genes]
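NumPy itself has two ways of combining a list of row indices with a list of column indices: pairing them element by element, or taking the full rows-by-columns block. Which of the two `dense_array[cells, genes]` follows is not stated here, so checking the shape of the result is a sensible precaution. The sketch below uses plain NumPy only (no TileDB array involved) to show the difference.

```python
import numpy as np

matrix = np.arange(12).reshape(3, 4)
rows = [0, 2]
cols = [1, 3]

# Element-wise pairing: picks (0, 1) and (2, 3).
pairwise = matrix[rows, cols]

# Outer (block) selection: every listed row crossed with every listed column.
outer = matrix[np.ix_(rows, cols)]

print(pairwise.shape)  # (2,)
print(outer.shape)     # (2, 2)
```

If `subset.shape` for three cells and three genes is `(3, 3)`, the array returned the block; a shape of `(3,)` would indicate element-wise pairing.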

Working with Sparse Arrays

from cellarr_array import SparseCellArray

# Create a sparse array with CSR output format
csr_array = SparseCellArray(
    uri="sparse_matrix.tdb",
    return_sparse=True
)

# Get result as CSR matrix
result = csr_array[100:200, 500:1000]

# Result is a scipy.sparse.csr_matrix
assert sparse.isspmatrix_csr(result)

# Perform sparse operations
nnz = result.nnz
density = result.nnz / (result.shape[0] * result.shape[1])

# Convert to other sparse formats if needed
result_csc = result.tocsc()

To get CSC output instead, pass the target class via sparse_coerce:

from scipy import sparse

# Create a sparse array with CSC output format
csc_array = SparseCellArray(
    uri="sparse_matrix.tdb",
    return_sparse=True,
    sparse_coerce=sparse.csc_matrix
)

# Get result as CSC matrix
result = csc_array[100:200, 500:1000]
print(result)
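The `nnz`, density, and format-conversion steps in this section are ordinary SciPy operations, so they can be tried without a TileDB array. The sketch below builds a small stand-in CSR matrix by hand (the values are made up for illustration) and applies the same calculations.

```python
import numpy as np
from scipy import sparse

# Stand-in for a slice returned with return_sparse=True.
dense = np.array(
    [[0, 1, 0, 0],
     [2, 0, 0, 3],
     [0, 0, 0, 0]],
    dtype=np.float32,
)
result = sparse.csr_matrix(dense)

# Number of stored non-zero entries and the fraction of cells they occupy.
nnz = result.nnz
density = nnz / (result.shape[0] * result.shape[1])

# Converting between formats keeps the values and changes only the layout.
result_csc = result.tocsc()

print(nnz, density)        # 3 0.25
print(result_csc.format)   # csc
```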

Array Maintenance

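# ConsolidationConfig is assumed to be importable from the package
# top level, like CellArrConfig above.
from cellarr_array import ConsolidationConfig

# `array` stands for any array created above
array = dense_array

# Consolidate fragments
array.consolidate()

# Custom consolidation
config = ConsolidationConfig(
    steps=2,
    vacuum_after=True
)
array.consolidate(config)

# Vacuum
array.vacuum()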

Note

This project has been set up using BiocSetup and PyScaffold.
