Base class for handling TileDB-backed arrays.

cellarr-array

This package provides high-level wrappers for TileDB arrays, designed for handling genomic data matrices.

Install

To get started, install the package from PyPI:

pip install cellarr-array

Quick Start

Creating Arrays

import numpy as np
from scipy import sparse
from cellarr_array import create_cellarray, CellArrConfig

# Create a dense 2D array
dense_array = create_cellarray(
    uri="dense_matrix.tdb",
    shape=(10000, 5000),
    attr_dtype=np.float32,
    sparse=False,
    dim_names=["cells", "genes"]
)

# Create a sparse 2D array with custom compression
config = CellArrConfig(
    tile_capacity=1000,
    attrs_filters={"data": [{"name": "zstd", "level": 7}]}
)
sparse_array = create_cellarray(
    uri="sparse_matrix.tdb",
    shape=(10000, 5000),
    attr_dtype=np.float32,
    sparse=True,
    config=config,
    dim_names=["cells", "genes"]
)

# Create a 1D array
array_1d = create_cellarray(
    uri="vector.tdb",
    shape=(1000,),
    attr_dtype=np.float32,
    sparse=False
)

Writing Data

# Writing to dense arrays
data = np.random.random((1000, 5000)).astype(np.float32)
dense_array.write_batch(data, start_row=0)

# Writing to sparse arrays
sparse_data = sparse.random(1000, 5000, density=0.1, format="csr", dtype=np.float32)
sparse_array.write_batch(sparse_data, start_row=0)

# Writing to 1D arrays
data_1d = np.random.random(100).astype(np.float32)
array_1d.write_batch(data_1d, start_row=0)
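
Because `write_batch` takes a `start_row`, a matrix can be written in consecutive row chunks rather than in a single call. The sketch below only computes the chunk boundaries. `iter_row_batches` is a hypothetical helper, not part of cellarr-array, and the `write_batch` call appears only as a comment, on the assumption that each call writes a block of rows beginning at `start_row`.

```python
def iter_row_batches(n_rows, batch_size):
    """Yield (start_row, stop_row) pairs that cover range(n_rows) in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_rows, batch_size):
        # The last chunk may be shorter than batch_size.
        yield start, min(start + batch_size, n_rows)


# Hypothetical usage: cover a 10,000-row array in chunks of 4,000 rows.
batches = list(iter_row_batches(10000, 4000))
for start_row, stop_row in batches:
    # chunk = data[start_row:stop_row]
    # dense_array.write_batch(chunk, start_row=start_row)  # assumed usage
    print(start_row, stop_row)
```

The chunk size here is arbitrary; in practice it would be chosen to fit the available memory.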

Reading Data

# Slicing operations (similar to NumPy)

# Full slice
full_data = dense_array[:]

# Partial slice
subset = dense_array[100:200, 1000:2000]

# Using lists of indices
cells = [10, 20, 30]
genes = [5, 15, 25]
subset = dense_array[cells, genes]

# Mixed slicing
subset = dense_array[100:200, genes]
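
If slicing follows NumPy conventions, as the comment above suggests, the shape of a slice-only result can be predicted with Python's built-in `slice.indices`. `sliced_shape` below is a hypothetical helper for illustration only. It handles slices, not index lists, and makes no claim about how cellarr-array resolves indices internally.

```python
def sliced_shape(shape, key):
    """Return the NumPy-style result shape for a tuple of slices."""
    if len(key) != len(shape):
        raise ValueError("expected one slice per dimension")
    result = []
    for dim_len, sl in zip(shape, key):
        # slice.indices clips the slice to the dimension length.
        start, stop, step = sl.indices(dim_len)
        result.append(len(range(start, stop, step)))
    return tuple(result)


# Shapes for the full and partial slices of a (10000, 5000) array.
full = sliced_shape((10000, 5000), (slice(None), slice(None)))
partial = sliced_shape((10000, 5000), (slice(100, 200), slice(1000, 2000)))
print(full, partial)
```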

Working with Sparse Arrays

from cellarr_array import SparseCellArray

# Create a sparse array with CSR output format
csr_array = SparseCellArray(
    uri="sparse_matrix.tdb",
    return_sparse=True
)

# Get result as CSR matrix
result = csr_array[100:200, 500:1000]

# Result is a scipy.sparse.csr_matrix
assert sparse.isspmatrix_csr(result)

# Perform sparse operations
nnz = result.nnz
density = result.nnz / (result.shape[0] * result.shape[1])

# Convert to other sparse formats if needed
result_csc = result.tocsc()
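
For readers unfamiliar with the CSR layout, the sketch below builds the three CSR arrays (`data`, `indices`, `indptr`) for a small matrix in plain Python and derives `nnz` and density with the same formula as above. `to_csr` is a hypothetical helper written for illustration; it is not how SciPy or cellarr-array build their matrices.

```python
def to_csr(rows):
    """Convert a list of equal-length rows into CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # non-zero values, row by row
                indices.append(col)   # column index of each stored value
        indptr.append(len(data))      # where the next row starts in data
    return data, indices, indptr


matrix = [
    [0, 2, 0, 0],
    [1, 0, 0, 3],
    [0, 0, 0, 0],
]
data, indices, indptr = to_csr(matrix)
nnz = len(data)
density = nnz / (len(matrix) * len(matrix[0]))
print(data, indices, indptr, nnz, density)
```

Row `i` occupies `data[indptr[i]:indptr[i + 1]]`, which is why row slicing is cheap in CSR, while CSC stores the same information column by column.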

Likewise, to get results in CSC format:

from scipy import sparse

# Create a sparse array with CSC output format
csc_array = SparseCellArray(
    uri="sparse_matrix.tdb",
    return_sparse=True,
    sparse_coerce=sparse.csc_matrix
)

# Get result as CSC matrix
result = csc_array[100:200, 500:1000]
print(result)

Array Maintenance

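# Assumption: ConsolidationConfig is exported alongside CellArrConfig
from cellarr_array import ConsolidationConfig

# `array` is any array created above, e.g. dense_array or sparse_array
array = dense_array

# Consolidate fragments with the default settings
array.consolidate()

# Custom consolidation
config = ConsolidationConfig(
    steps=2,
    vacuum_after=True
)
array.consolidate(config)

# Vacuum
array.vacuum()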

Note

This project has been set up using BiocSetup and PyScaffold.

