Base class for handling TileDB backed arrays.

cellarr-array

This package provides high-level wrappers around TileDB arrays for handling genomic data matrices.

Install

To get started, install the package from PyPI:

pip install cellarr-array
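
To confirm the install succeeded, one option is to query the installed distribution's version through the standard library. This is a minimal sketch: the helper name `installed_version` is hypothetical and not part of cellarr-array; only `importlib.metadata` from the Python standard library is used.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("cellarr-array"))
```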

Quick Start

Creating Arrays

import numpy as np
from scipy import sparse
from cellarr_array import create_cellarray, CellArrConfig

# Create a dense 2D array
dense_array = create_cellarray(
    uri="dense_matrix.tdb",
    shape=(10000, 5000),
    attr_dtype=np.float32,
    sparse=False,
    dim_names=["cells", "genes"]
)

# Create a sparse 2D array with custom compression
config = CellArrConfig(
    tile_capacity=1000,
    attrs_filters={"data": [{"name": "zstd", "level": 7}]}
)
sparse_array = create_cellarray(
    uri="sparse_matrix.tdb",
    shape=(10000, 5000),
    attr_dtype=np.float32,
    sparse=True,
    config=config,
    dim_names=["cells", "genes"]
)

# Create a 1D array
array_1d = create_cellarray(
    uri="vector.tdb",
    shape=(1000,),
    attr_dtype=np.float32,
    sparse=False
)

Writing Data

# Writing to dense arrays
data = np.random.random((1000, 5000)).astype(np.float32)
dense_array.write_batch(data, start_row=0)

# Writing to sparse arrays
sparse_data = sparse.random(1000, 5000, density=0.1, format="csr", dtype=np.float32)
sparse_array.write_batch(sparse_data, start_row=0)

# Writing to 1D arrays
data_1d = np.random.random(100).astype(np.float32)
array_1d.write_batch(data_1d, start_row=0)
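
When a matrix is too large to write in one call, the `start_row` argument shown above suggests writing it in consecutive row batches. The sketch below only computes the row bounds for such batches; `iter_row_batches` is a hypothetical helper, not part of cellarr-array, and the commented usage assumes `start_row` is the row offset at which the batch's first row is stored.

```python
def iter_row_batches(n_rows, batch_size):
    """Yield (start, stop) row bounds that cover range(n_rows) in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_rows, batch_size):
        # The last batch may be shorter than batch_size.
        yield start, min(start + batch_size, n_rows)


bounds = list(iter_row_batches(2500, 1000))
print(bounds)  # [(0, 1000), (1000, 2000), (2000, 2500)]

# Hypothetical usage with the arrays created above (not executed here):
# for start, stop in iter_row_batches(data.shape[0], 1000):
#     dense_array.write_batch(data[start:stop], start_row=start)
```

Keeping the bounds computation separate from the write call makes the batching logic easy to check without touching a TileDB array.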

Reading Data

# Slicing operations (similar to NumPy)

# Full slice
full_data = dense_array[:]

# Partial slice
subset = dense_array[100:200, 1000:2000]

# Using lists of indices
cells = [10, 20, 30]
genes = [5, 15, 25]
subset = dense_array[cells, genes]

# Mixed slicing
subset = dense_array[100:200, genes]

Working with Sparse Arrays

from cellarr_array import SparseCellArray

# Open the sparse array created above, with CSR output format
csr_array = SparseCellArray(
    uri="sparse_matrix.tdb",
    return_sparse=True
)

# Get result as CSR matrix
result = csr_array[100:200, 500:1000]

# Result is a scipy.sparse.csr_matrix
assert sparse.isspmatrix_csr(result)

# Perform sparse operations
nnz = result.nnz
density = result.nnz / (result.shape[0] * result.shape[1])

# Convert to other sparse formats if needed
result_csc = result.tocsc()

Likewise, to get results in CSC format, pass `sparse_coerce`:

from scipy import sparse

# Open the same sparse array, with CSC output format
csc_array = SparseCellArray(
    uri="sparse_matrix.tdb",
    return_sparse=True,
    sparse_coerce=sparse.csc_matrix
)

# Get result as CSC matrix
result = csc_array[100:200, 500:1000]
print(result)

Array Maintenance

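# Assumption: ConsolidationConfig is importable from the package root,
# like CellArrConfig above.
from cellarr_array import ConsolidationConfig

# `array` stands for any array created above, for example the dense one
array = dense_array

# Consolidate fragments
array.consolidate()

# Custom consolidation
config = ConsolidationConfig(
    steps=2,
    vacuum_after=True
)
array.consolidate(config)

# Vacuum
array.vacuum()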

Note

This project has been set up using BiocSetup and PyScaffold.
