TileDB-backed SummarizedExperiment using cellarr objects
cellarr-se
cellarr-se is a read-only, out-of-core coordinator for TileDB-backed genomic datasets. It wraps the cellarr-array and cellarr-frame primitives into a lazy, SummarizedExperiment-compatible interface, so you can slice large genomics datasets stored on disk without loading them into memory.
Single-cell and bulk RNA-seq datasets frequently exceed available RAM. cellarr-se keeps assay matrices and metadata tables on disk as TileDB arrays, performing synchronized lazy slices across all components only when you request them. The result is always a standard in-memory SummarizedExperiment object.
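To make the idea of a synchronized lazy slice concrete, here is a toy sketch in plain Python. It does not use cellarr-se or TileDB; `LazyMatrix` and `ToyCoordinator` are hypothetical names made up for illustration, and the nested lists only stand in for on-disk storage. The point is that nothing is read until a selection is requested, and that the same row and column positions are applied to the assay and to both metadata tables.

```python
# Toy illustration only: plain-Python stand-ins, not the cellarr-se API.

class LazyMatrix:
    """Pretends to be an on-disk matrix and counts how many cells were read."""

    def __init__(self, rows):
        self._rows = rows      # list of lists, standing in for disk storage
        self.cells_read = 0

    def read(self, row_idx, col_idx):
        # Only the requested cells are touched.
        block = [[self._rows[r][c] for c in col_idx] for r in row_idx]
        self.cells_read += len(row_idx) * len(col_idx)
        return block


class ToyCoordinator:
    """Applies one row/column selection to the assay and both metadata tables."""

    def __init__(self, assay, row_meta, col_meta):
        self.assay = assay
        self.row_meta = row_meta   # one dict per row (gene)
        self.col_meta = col_meta   # one dict per column (sample)

    def take(self, row_idx, col_idx):
        # Nothing is read until this method is called.
        return {
            "assay": self.assay.read(row_idx, col_idx),
            "row_data": [self.row_meta[r] for r in row_idx],
            "col_data": [self.col_meta[c] for c in col_idx],
        }


matrix = LazyMatrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
coord = ToyCoordinator(
    matrix,
    row_meta=[{"gene": "g0"}, {"gene": "g1"}, {"gene": "g2"}],
    col_meta=[{"sample": "s0"}, {"sample": "s1"}, {"sample": "s2"}],
)

result = coord.take(row_idx=[0, 2], col_idx=[1])
print(result["assay"])     # [[2], [8]]
print(matrix.cells_read)   # 2
```

Only two of the nine cells are read, and the returned metadata rows line up with the returned assay rows.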
Install
pip install cellarr-se
Usage
Construction
CellArraySE wraps existing TileDB arrays and frames; it does not create them. Use cellarr-array and cellarr-frame to build the backing stores first.
from cellarr_se import CellArraySE
se = CellArraySE(
    assays={"counts": my_cell_array, "tpm": my_tpm_array},
    row_data=my_row_frame,  # gene annotations (CellArrayFrame)
    col_data=my_col_frame,  # sample annotations (CellArrayFrame)
)
Inspection
se.shape # (n_genes, n_samples)
se.assay_names # ["counts", "tpm"]
se.row_names # pd.Index of gene identifiers
se.col_names # pd.Index of sample identifiers
se.row_columns # list of gene metadata fields
se.col_columns # list of sample metadata fields
se.show() # print a summary with the first 5 rows of each metadata table
repr(se) # <CellArraySE: 20000x500 | counts, tpm>
Slicing
Bracket notation supports integer indices, slices, name strings, and lists:
# Positional slice
subset = se[0:100, 0:50]
# Single element
gene = se[5, 3]
# Lists of indices or names
subset = se[["BRCA1", "TP53"], ["sample_001", "sample_042"]]
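One way to picture how mixed index types can be handled is to reduce every form to a list of integer positions before reading anything. The helper below is a minimal sketch of that idea in plain Python; `resolve_index` is a hypothetical function written for this illustration and is not part of cellarr-se, whose internal handling may differ.

```python
# Hypothetical helper, not the cellarr-se implementation.

def resolve_index(spec, names):
    """Turn an int, slice, name, or list of ints/names into integer positions."""
    lookup = {name: pos for pos, name in enumerate(names)}

    def one(item):
        if isinstance(item, str):
            return lookup[item]            # raises KeyError for unknown names
        if isinstance(item, int):
            if not -len(names) <= item < len(names):
                raise IndexError(f"index {item} out of range")
            return item % len(names)       # map negative indices to positions
        raise TypeError(f"unsupported index type: {type(item).__name__}")

    if isinstance(spec, slice):
        return list(range(*spec.indices(len(names))))
    if isinstance(spec, (list, tuple)):
        return [one(item) for item in spec]
    return [one(spec)]


genes = ["BRCA1", "TP53", "EGFR", "MYC"]

print(resolve_index(slice(0, 2), genes))       # [0, 1]
print(resolve_index(["TP53", "MYC"], genes))   # [1, 3]
print(resolve_index(-1, genes))                # [3]
```

Once both axes are expressed as positions, the same lists can be applied to every assay and to the matching metadata table.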
For attribute-filtered access, use slice() with TileDB query strings:
# Filter rows and columns by metadata attributes
subset = se.slice(
    row_query="gene_type == 'protein_coding'",
    col_query="tissue == 'liver'",
)
# Combine a row query with a positional column subset, and limit the
# assays and row metadata columns that are returned
subset = se.slice(
    row_query="gene_type == 'protein_coding'",
    col_subset=slice(0, 50),
    assays=["counts"],
    row_columns=["gene_id", "gene_name"],
)
Both se[...] and se.slice(...) return a standard in-memory SummarizedExperiment.
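Conceptually, an attribute filter can be thought of as two steps: evaluate a condition against a metadata table to find the matching positions, then read only those positions from the assay. The sketch below mimics this with plain Python lists and lambdas in place of TileDB query strings; `positions_where` and the sample data are hypothetical and say nothing about how cellarr-se evaluates queries internally.

```python
# Toy illustration only: lambdas stand in for TileDB query strings.

def positions_where(table, predicate):
    """Return the positions of the records that satisfy the predicate."""
    return [i for i, record in enumerate(table) if predicate(record)]


row_meta = [
    {"gene_name": "BRCA1", "gene_type": "protein_coding"},
    {"gene_name": "MALAT1", "gene_type": "lncRNA"},
    {"gene_name": "TP53", "gene_type": "protein_coding"},
]
col_meta = [{"tissue": "liver"}, {"tissue": "lung"}, {"tissue": "liver"}]
counts = [
    [10, 11, 12],
    [20, 21, 22],
    [30, 31, 32],
]

# Step 1: metadata filters become integer positions on each axis.
rows = positions_where(row_meta, lambda r: r["gene_type"] == "protein_coding")
cols = positions_where(col_meta, lambda c: c["tissue"] == "liver")

# Step 2: only the matching cells are read from the assay.
subset_counts = [[counts[r][c] for c in cols] for r in rows]

print(rows, cols)      # [0, 2] [0, 2]
print(subset_counts)   # [[10, 12], [30, 32]]
```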
Assay metadata
se.is_sparse("counts") # True if backed by SparseCellArray
se.get_assay_type("counts") # numpy dtype of the assay
Demo
A worked example covering construction, inspection, and slicing is available in the demo notebook.
Note
This project has been set up using BiocSetup and PyScaffold.