Base class for handling TileDB backed arrays.
cellarr-array
This package provides high-level wrappers around TileDB arrays for handling genomic data matrices.
Install
To get started, install the package from PyPI:
pip install cellarr-array
Quick Start
Creating Arrays
import numpy as np
from scipy import sparse
from cellarr_array import create_cellarray, CellArrConfig
# Create a dense 2D array
dense_array = create_cellarray(
    uri="dense_matrix.tdb",
    shape=(10000, 5000),
    attr_dtype=np.float32,
    sparse=False,
    dim_names=["cells", "genes"]
)
# Create a sparse 2D array with custom compression
config = CellArrConfig(
    tile_capacity=1000,
    attrs_filters={"data": [{"name": "zstd", "level": 7}]}
)
sparse_array = create_cellarray(
    uri="sparse_matrix.tdb",
    shape=(10000, 5000),
    attr_dtype=np.float32,
    sparse=True,
    config=config,
    dim_names=["cells", "genes"]
)
# Create a 1D array
array_1d = create_cellarray(
    uri="vector.tdb",
    shape=(1000,),
    attr_dtype=np.float32,
    sparse=False
)
Writing Data
# Writing to dense arrays
data = np.random.random((1000, 5000)).astype(np.float32)
dense_array.write_batch(data, start_row=0)
# Writing to sparse arrays
sparse_data = sparse.random(1000, 5000, density=0.1, format="csr", dtype=np.float32)
sparse_array.write_batch(sparse_data, start_row=0)
# Writing to 1D arrays
data_1d = np.random.random(100).astype(np.float32)
array_1d.write_batch(data_1d, start_row=0)
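The `start_row` argument above suggests that a large matrix can be written as consecutive row batches. A minimal sketch of computing the offsets, assuming each batch starts at the row where the previous one ended; `row_batches` is a hypothetical helper for illustration and is not part of cellarr-array:

```python
def row_batches(n_rows, batch_size):
    """Yield (start_row, end_row) pairs covering range(n_rows) in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_rows, batch_size):
        # The last batch may be shorter than batch_size.
        yield start, min(start + batch_size, n_rows)


batches = list(row_batches(10000, 3000))
print(batches)
# With the arrays from the examples above, each batch could then be written as:
#   dense_array.write_batch(data[start:end], start_row=start)
```

Keeping the offset logic separate from the write call makes it easy to check that the batches cover every row exactly once before any data is written.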
Reading Data
# Slicing operations (similar to NumPy)
# Full slice
full_data = dense_array[:]
# Partial slice
subset = dense_array[100:200, 1000:2000]
# Using lists of indices
cells = [10, 20, 30]
genes = [5, 15, 25]
subset = dense_array[cells, genes]
# Mixed slicing
subset = dense_array[100:200, genes]
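When two index lists are passed, plain NumPy pairs them element by element rather than selecting a sub-block, so it is worth checking which behavior an array class follows before relying on the result shape. The sketch below shows only the NumPy side of that distinction on an in-memory array; it makes no claim about which convention `dense_array[cells, genes]` uses.

```python
import numpy as np

# Small in-memory stand-in for a cells x genes matrix.
matrix = np.arange(40 * 30, dtype=np.float32).reshape(40, 30)

cells = [10, 20, 30]
genes = [5, 15, 25]

# Pointwise pairing: elements (10, 5), (20, 15), (30, 25).
pointwise = matrix[cells, genes]

# Outer selection: every listed cell crossed with every listed gene.
block = matrix[np.ix_(cells, genes)]

print(pointwise.shape)  # (3,)
print(block.shape)      # (3, 3)
```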
Working with Sparse Arrays
from cellarr_array import SparseCellArray
# Create a sparse array with CSR output format
csr_array = SparseCellArray(
    uri="sparse_matrix.tdb",
    return_sparse=True
)
# Get result as CSR matrix
result = csr_array[100:200, 500:1000]
# Result is a scipy.sparse.csr_matrix
assert sparse.isspmatrix_csr(result)
# Perform sparse operations
nnz = result.nnz
density = result.nnz / (result.shape[0] * result.shape[1])
# Convert to other sparse formats if needed
result_csc = result.tocsc()
Likewise, to get results in CSC format:
from scipy import sparse
# Create a sparse array with CSC output format
csc_array = SparseCellArray(
    uri="sparse_matrix.tdb",
    return_sparse=True,
    sparse_coerce=sparse.csc_matrix
)
# Get result as CSC matrix
result = csc_array[100:200, 500:1000]
print(result)
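The density calculation above can be checked by hand for the matrix written earlier (1000 x 5000 at density 0.1). The sketch below also estimates the in-memory footprint of CSR versus dense storage. The 4-byte index width is an assumption (SciPy typically uses 32-bit indices for matrices of this size), and on-disk TileDB sizes will differ because of compression.

```python
rows, cols = 1000, 5000
density = 0.1
value_bytes = 4   # float32
index_bytes = 4   # assumed 32-bit indices

nnz = round(rows * cols * density)

dense_bytes = rows * cols * value_bytes
# CSR stores one value and one column index per nonzero, plus rows + 1 row pointers.
csr_bytes = nnz * (value_bytes + index_bytes) + (rows + 1) * index_bytes

print(nnz)          # 500000
print(dense_bytes)  # 20000000
print(csr_bytes)    # 4004004
```

At this density the CSR form needs roughly a fifth of the dense footprint, which is the usual argument for setting `return_sparse=True` on sparse data.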
Array Maintenance
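# Assumption: ConsolidationConfig is exported from the package root, like CellArrConfig
from cellarr_array import ConsolidationConfig
# `array` stands for any array object created above
array = dense_array
# Consolidate fragments
array.consolidate()
# Custom consolidation
config = ConsolidationConfig(
    steps=2,
    vacuum_after=True
)
array.consolidate(config)
# Vacuum
array.vacuum()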
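As background for the calls above: in TileDB each write produces a new fragment, consolidation merges existing fragments into a new one, and vacuuming then deletes the fragments that were merged. The toy model below illustrates only that bookkeeping, in simplified form (it always merges every fragment). `FragmentLog` is a hypothetical class for illustration and does not call TileDB or cellarr-array.

```python
class FragmentLog:
    """Toy bookkeeping of fragments; no real storage involved."""

    def __init__(self):
        self.active = []      # fragments consulted on read
        self.obsolete = []    # merged fragments awaiting vacuum
        self._next_id = 0

    def _new_fragment(self):
        frag = f"frag-{self._next_id}"
        self._next_id += 1
        return frag

    def write(self):
        # Every write adds one fragment.
        self.active.append(self._new_fragment())

    def consolidate(self):
        # Merge all active fragments into one; the originals are kept for now.
        if len(self.active) > 1:
            self.obsolete.extend(self.active)
            self.active = [self._new_fragment()]

    def vacuum(self):
        # Remove fragments that consolidation made redundant.
        self.obsolete.clear()


log = FragmentLog()
for _ in range(3):
    log.write()
log.consolidate()
print(len(log.active), len(log.obsolete))  # 1 3
log.vacuum()
print(len(log.active), len(log.obsolete))  # 1 0
```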
Note
This project has been set up using BiocSetup and PyScaffold.