TileDB-backed SummarizedExperiment using cellarr objects

License: MIT

cellarr-se

cellarr-se is a read-only, out-of-core coordinator for TileDB-backed genomic datasets. It wraps the cellarr-array and cellarr-frame primitives into a lazy, SummarizedExperiment-compatible interface, so you can slice large genomics datasets stored on disk without loading them into memory.

Single-cell and bulk RNA-seq datasets frequently exceed available RAM. cellarr-se keeps assay matrices and metadata tables on disk as TileDB arrays, performing synchronized lazy slices across all components only when you request them. The result is always a standard in-memory SummarizedExperiment object.
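The idea of synchronized lazy slicing can be illustrated with a small, self-contained sketch. It does not use cellarr-se or TileDB: `LazyMatrix` and `LazyExperiment` are hypothetical names invented for illustration, and plain Python lists stand in for on-disk arrays. The only point is that nothing is read until a subset is requested, and that the same row and column positions are applied to every component.

```python
class LazyMatrix:
    """Hypothetical stand-in for an on-disk assay; counts how many cells are read."""

    def __init__(self, data):
        self._data = data      # pretend this lives on disk
        self.cells_read = 0

    def read(self, rows, cols):
        # Only the requested cells are touched.
        self.cells_read += len(rows) * len(cols)
        return [[self._data[r][c] for c in cols] for r in rows]


class LazyExperiment:
    """Hypothetical coordinator: applies one set of positions to all components."""

    def __init__(self, assays, row_names, col_names):
        self.assays = assays
        self.row_names = row_names
        self.col_names = col_names

    def subset(self, rows, cols):
        rows, cols = list(rows), list(cols)
        return {
            "assays": {name: m.read(rows, cols) for name, m in self.assays.items()},
            "row_names": [self.row_names[r] for r in rows],
            "col_names": [self.col_names[c] for c in cols],
        }


# A 3x4 toy matrix where cell (r, c) holds r * 10 + c.
counts = LazyMatrix([[r * 10 + c for c in range(4)] for r in range(3)])
exp = LazyExperiment({"counts": counts}, ["g0", "g1", "g2"], ["s0", "s1", "s2", "s3"])

print(counts.cells_read)              # 0: construction reads nothing
result = exp.subset(range(0, 2), [1, 3])
print(result["assays"]["counts"])     # [[1, 3], [11, 13]]
print(result["row_names"], result["col_names"])
print(counts.cells_read)              # 4: only the requested cells
```

In this toy version the row names, column names, and assay values always stay aligned because a single pair of position lists drives every read.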

Install

pip install cellarr-se

Usage

Construction

CellArraySE wraps existing TileDB arrays and frames; it does not create them. Use cellarr-array and cellarr-frame to build the backing stores first.

from cellarr_se import CellArraySE

se = CellArraySE(
    assays={"counts": my_cell_array, "tpm": my_tpm_array},
    row_data=my_row_frame,   # gene annotations (CellArrayFrame)
    col_data=my_col_frame,   # sample annotations (CellArrayFrame)
)

Inspection

se.shape          # (n_genes, n_samples)
se.assay_names    # ["counts", "tpm"]
se.row_names      # pd.Index of gene identifiers
se.col_names      # pd.Index of sample identifiers
se.row_columns    # list of gene metadata fields
se.col_columns    # list of sample metadata fields

se.show()         # print a summary with the first 5 rows of each metadata table
repr(se)          # <CellArraySE: 20000x500 | counts, tpm>

Slicing

Bracket notation supports integer indices, slices, name strings, and lists:

# Positional slice
subset = se[0:100, 0:50]

# Single row and column (the result is still a SummarizedExperiment, not a scalar)
single = se[5, 3]

# Lists of indices or names
subset = se[["BRCA1", "TP53"], ["sample_001", "sample_042"]]
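One way to picture how mixed index types can be reduced to a common form is to resolve each of them to a list of positions. The sketch below is an assumption-laden illustration in plain Python, not the cellarr-se implementation: `resolve_index` is a hypothetical helper, and its handling of negative integers is a choice made for this sketch rather than documented library behavior.

```python
def resolve_index(key, names):
    """Turn an int, slice, name, or list of ints/names into positional indices."""
    lookup = {name: pos for pos, name in enumerate(names)}
    n = len(names)

    def one(item):
        if isinstance(item, str):
            if item not in lookup:
                raise KeyError(f"unknown name: {item!r}")
            return lookup[item]
        if isinstance(item, int) and not isinstance(item, bool):
            if not -n <= item < n:
                raise IndexError(f"index {item} out of range for length {n}")
            return item % n   # sketch assumption: negatives count from the end
        raise TypeError(f"unsupported index type: {type(item).__name__}")

    if isinstance(key, slice):
        return list(range(*key.indices(n)))
    if isinstance(key, (list, tuple)):
        return [one(item) for item in key]
    return [one(key)]


genes = ["BRCA1", "TP53", "EGFR", "MYC"]

print(resolve_index(slice(0, 2), genes))      # [0, 1]
print(resolve_index(["TP53", "MYC"], genes))  # [1, 3]
print(resolve_index(2, genes))                # [2]
print(resolve_index("EGFR", genes))           # [2]
```

Once every index form is a list of positions, the same positions can be applied to the assays and to the matching metadata table.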

For attribute-filtered access, use slice() with TileDB query strings:

# Filter rows and columns by metadata attributes
subset = se.slice(
    row_query="gene_type == 'protein_coding'",
    col_query="tissue == 'liver'",
)

# Combine a row query with a positional column subset, and limit the
# assays and row metadata fields that are returned
subset = se.slice(
    row_query="gene_type == 'protein_coding'",
    col_subset=slice(0, 50),
    assays=["counts"],
    row_columns=["gene_id", "gene_name"],
)

Both se[...] and se.slice(...) return a standard in-memory SummarizedExperiment.
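Conceptually, an attribute-filtered slice can be thought of as two steps: evaluate the filter against a metadata table to get matching positions, then use those positions to subset the assays. The sketch below mimics that flow in plain Python under stated assumptions: Python predicates stand in for TileDB query strings, lists stand in for on-disk arrays, and `matching_positions` and `take` are hypothetical helpers that are not part of cellarr-se. How cellarr-se actually executes queries is not described in this README.

```python
def matching_positions(records, predicate):
    """Return the positions of metadata records that satisfy the predicate."""
    return [i for i, rec in enumerate(records) if predicate(rec)]


def take(matrix, rows, cols):
    """Subset a list-of-lists matrix by row and column positions."""
    return [[matrix[r][c] for c in cols] for r in rows]


# Toy gene annotations (rows) and sample annotations (columns).
row_data = [
    {"gene_name": "BRCA1", "gene_type": "protein_coding"},
    {"gene_name": "MALAT1", "gene_type": "lncRNA"},
    {"gene_name": "TP53", "gene_type": "protein_coding"},
]
col_data = [
    {"sample": "sample_001", "tissue": "liver"},
    {"sample": "sample_002", "tissue": "lung"},
    {"sample": "sample_003", "tissue": "liver"},
]
counts = [
    [5, 0, 7],
    [1, 2, 3],
    [9, 4, 6],
]

rows = matching_positions(row_data, lambda r: r["gene_type"] == "protein_coding")
cols = matching_positions(col_data, lambda c: c["tissue"] == "liver")
subset = take(counts, rows, cols)

print(rows, cols)   # [0, 2] [0, 2]
print(subset)       # [[5, 7], [9, 6]]
```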

Assay metadata

se.is_sparse("counts")        # True if backed by SparseCellArray
se.get_assay_type("counts")   # numpy dtype of the assay

Demo

A worked example covering construction, inspection, and slicing is available in the demo notebook.

Note

This project has been set up using BiocSetup and PyScaffold.
