A high-level Python package for managing DataFrames using TileDB as a backing store

Maintainers

jkanche

cellarr-frame

A high-level Python package for managing DataFrames using TileDB as a backing store. This package provides two distinct storage strategies for your data.

- DenseCellArrayFrame: for standard DataFrames. Uses TileDB's native multi-attribute storage on a 1D array, which is efficient for DataFrames with columns of mixed types (e.g., numbers, strings, dates).
- SparseCellArrayFrame: for sparse DataFrames. Uses a 2D sparse cellarr-array to store data in coordinate (COO) format, which suits very large DataFrames where most values are NaN or 0 (e.g., gene-cell matrices).
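
One rough way to decide between the two strategies is to measure how much of a DataFrame is actually missing. The sketch below uses only pandas and NumPy; the helper names `missing_fraction` and `suggest_sparse` and the 0.9 threshold are hypothetical and not part of cellarr-frame.

```python
import numpy as np
import pandas as pd


def missing_fraction(df: pd.DataFrame) -> float:
    """Return the fraction of cells that are NaN (0.0 for an empty frame)."""
    if df.size == 0:
        return 0.0
    return float(df.isna().to_numpy().mean())


def suggest_sparse(df: pd.DataFrame, threshold: float = 0.9) -> bool:
    """Hypothetical heuristic: prefer sparse storage above the threshold."""
    return missing_fraction(df) >= threshold


# A 10x10 frame with a single stored value
mostly_empty = pd.DataFrame(np.nan, index=range(10), columns=range(10))
mostly_empty.iloc[0, 0] = 1.0

print(missing_fraction(mostly_empty))  # 0.99
print(suggest_sparse(mostly_empty))    # True
```

The heuristic counts only NaN cells; if zeros should also count as empty for your data, the mask would need to include them.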

Installation

To get started, install the package from PyPI:

pip install cellarr-frame

Factory Function: `create_cellarr_frame`

The easiest way to get started is with the create_cellarr_frame factory. It automatically builds the correct TileDB array schema based on an initial DataFrame or specified dim_dtypes.

import numpy as np
import pandas as pd
from cellarr_frame import create_cellarr_frame

# Example 1: Create a DENSE frame by providing an initial DataFrame
df = pd.DataFrame({'A': np.arange(5), 'B': [f'val_{i}' for i in range(5)]})
create_cellarr_frame("my_dense_frame.tdb", sparse=False, df=df)

# Example 2: Create an EMPTY SPARSE frame with integer-based dimensions
create_cellarr_frame("my_sparse_frame_int.tdb", sparse=True, dim_dtypes=[np.uint64, np.uint64])

# Example 3: Create an EMPTY SPARSE frame with string-based dimensions
create_cellarr_frame("my_sparse_frame_str.tdb", sparse=True, dim_dtypes=[str, str])

1. `DenseCellArrayFrame` (Native DataFrames)

This is the standard choice for typical dense DataFrames.

Writing and Appending

This class is designed for efficient appends. The create_cellarr_frame function (or write_dataframe) writes the first chunk, and append_dataframe adds new rows to the end.

import pandas as pd
import numpy as np
from cellarr_frame import create_cellarr_frame, DenseCellArrayFrame

# 1. Create and write the first DataFrame
df1 = pd.DataFrame({
    'A': np.arange(5, dtype=np.int32),
    'B': np.random.rand(5),
    'C': ['foo' + str(i) for i in range(5)]
})
create_cellarr_frame("dense.tdb", sparse=False, df=df1)

# 2. Open the frame and append a second DataFrame
cdf = DenseCellArrayFrame("dense.tdb")
print(f"Shape before append: {cdf.shape}")

df2 = pd.DataFrame({
    'A': np.arange(5, 10, dtype=np.int32),
    'B': np.random.rand(5),
    'C': ['bar' + str(i) for i in range(5)]
})
cdf.append_dataframe(df2)

print(f"Shape after append: {cdf.shape}")

# Shape before append: (5, 3)
# Shape after append: (10, 3)

Reading and Querying

You can read the full DataFrame or query it using standard Python slicing.

# 1. Read the full DataFrame
full_df = cdf.read_dataframe()
print(full_df)

#     A         B      C
# 0   0  0.123456   foo0
# 1   1  0.234567   foo1
# ...
# 8   8  0.456789   bar3
# 9   9  0.567890   bar4

# 2. Querying with __getitem__

# Get specific rows (end-exclusive slice, like standard Python slicing)
row_subset = cdf[5:8]
#    A         B      C
# 5  5  0.345678   bar0
# 6  6  0.456789   bar1
# 7  7  0.567890   bar2

# Get a single column
col_A = cdf['A']
#    A
# 0  0
# 1  1
# ...

# Get multiple columns
cols_AC = cdf[['A', 'C']]
#    A      C
# 0  0   foo0
# 1  1   foo1
# ...

# Get specific rows and columns
subset = cdf[1:3, ['A', 'C']]
#    A      C
# 1  1   foo1
# 2  2   foo2
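
For comparison, the same selections can be written against an in-memory pandas DataFrame. This is only an analogy for the slicing semantics shown above, not a description of how the TileDB-backed frame implements them; the data is illustrative.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for the stored frame (values are illustrative)
df = pd.DataFrame({
    "A": np.arange(10, dtype=np.int32),
    "C": [f"foo{i}" for i in range(5)] + [f"bar{i}" for i in range(5)],
})

rows_5_to_7 = df.iloc[5:8]                # like cdf[5:8]; row 8 is excluded
col_a = df[["A"]]                         # like cdf['A'], a one-column frame
rows_and_cols = df.iloc[1:3][["A", "C"]]  # like cdf[1:3, ['A', 'C']]

print(list(rows_5_to_7.index))   # [5, 6, 7]
print(list(rows_and_cols["C"]))  # ['foo1', 'foo2']
```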

Properties

print(f"Shape: {cdf.shape}")       # (10, 3)
print(f"Columns: {cdf.columns}")   # Index(['A', 'B', 'C'], dtype='object')
print(f"Index: {cdf.index}")       # RangeIndex(start=0, stop=10, step=1)

2. `SparseCellArrayFrame` (Sparse DataFrames)

This is the best choice for data that is mostly empty (NaN). It only stores the values that exist, saving significant space.

Writing and Appending

Writing to a sparse frame involves stack()-ing the DataFrame to find all non-NaN values and writing them to the 2D array.
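
To make the stacking step concrete, here is a minimal pandas-only sketch of turning a DataFrame into (row, column, value) triplets. It illustrates the idea rather than the package's internal code; the explicit `dropna()` is there because whether `stack()` drops NaN on its own depends on the pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [1.0, np.nan], 1: [np.nan, 2.0]})

# stack() moves the column labels into an inner index level;
# dropna() keeps only the cells that actually hold a value
stacked = df.stack().dropna()

rows = stacked.index.get_level_values(0).to_numpy()
cols = stacked.index.get_level_values(1).to_numpy()
values = stacked.to_numpy()

# One (row, column, value) triplet per stored cell
coo = list(zip(rows.tolist(), cols.tolist(), values.tolist()))
print(coo)  # [(0, 0, 1.0), (1, 1, 2.0)]
```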

import pandas as pd
import numpy as np
from cellarr_frame import create_cellarr_frame, SparseCellArrayFrame

# 1. Create a sparse DataFrame (most values are NaN)
df1 = pd.DataFrame({
    0: [1.0, np.nan],  # Index 0, 1
    1: [np.nan, 2.0]
})

# Create the array and write the data
# We specify integer dtypes for the dimensions (row/col labels)
create_cellarr_frame("sparse.tdb", sparse=True, df=df1, dim_dtypes=[np.uint64, np.uint64])

# 2. Open the frame and append new data
sdf = SparseCellArrayFrame("sparse.tdb")
print(f"Shape before append: {sdf.shape}")

# This new DataFrame will be appended starting at the next available row index
df2 = pd.DataFrame({
    1: [3.0, np.nan],  # Relative index 0, 1
    2: [np.nan, 4.0]
})
sdf.append_dataframe(df2) # Automatically appends at rows 2 and 3

print(f"Shape after append: {sdf.shape}")

# Shape before append: (2, 2)
# Shape after append: (4, 3)
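
The shapes above can be reproduced in plain pandas by shifting the incoming row labels past the existing ones. This sketch assumes that the "next available row index" is simply the current row count; it does not show how cellarr-frame computes the offset internally.

```python
import numpy as np
import pandas as pd

existing = pd.DataFrame({0: [1.0, np.nan], 1: [np.nan, 2.0]})
incoming = pd.DataFrame({1: [3.0, np.nan], 2: [np.nan, 4.0]})

# Assumption: the next free row is the number of rows already stored
offset = len(existing)

shifted = incoming.copy()
shifted.index = shifted.index + offset  # relative rows 0, 1 become 2, 3

# Columns are unioned, so the new column label 2 widens the frame
combined = pd.concat([existing, shifted])
print(combined.shape)        # (4, 3)
print(list(combined.index))  # [0, 1, 2, 3]
```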

Reading and Querying

Reading reconstructs the DataFrame from the sparse coordinates.

# 1. Read the full DataFrame
full_df = sdf.read_dataframe()
print(full_df)

#      0    1    2
# 0  1.0  NaN  NaN
# 1  NaN  2.0  NaN
# 2  NaN  3.0  NaN
# 3  NaN  NaN  4.0

# 2. Querying with __getitem__

# Get specific rows
row_subset = sdf[1:3]
#      0    1    2
# 1  NaN  2.0  NaN
# 2  NaN  3.0  NaN

# Get specific columns (by label)
col_subset = sdf[[0, 2]]
#      0    2
# 0  1.0  NaN
# 1  NaN  NaN
# 2  NaN  NaN
# 3  NaN  4.0

# Get specific rows and columns
subset = sdf[0:2, [1]]
#      1
# 0  NaN
# 1  2.0
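
Reconstruction from sparse coordinates can be sketched with pandas alone: build a Series keyed by (row, column) and unstack it. The triplets below mirror the example data; this is an illustration of the idea, not the package's read path.

```python
import pandas as pd

# Coordinate triplets matching the four stored cells in the example
rows = [0, 1, 2, 3]
cols = [0, 1, 1, 2]
values = [1.0, 2.0, 3.0, 4.0]

# A Series indexed by (row, column) unstacks into a 2D frame,
# with NaN filled in wherever no coordinate was stored
index = pd.MultiIndex.from_arrays([rows, cols])
dense = pd.Series(values, index=index).unstack()

print(dense.shape)                    # (4, 3)
print(int(dense.isna().sum().sum()))  # 8
```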

String Dimensions

SparseCellArrayFrame also fully supports string-based row and column labels.

# Create with string dimensions
create_cellarr_frame("sparse_str.tdb", sparse=True, dim_dtypes=[str, str])
sdf_str = SparseCellArrayFrame("sparse_str.tdb")

# Write DataFrame with string index/columns
df_str1 = pd.DataFrame({'col_A': [1.0, np.nan]}, index=['row_A', 'row_B'])
sdf_str.write_dataframe(df_str1)

# Appending with string dimensions just adds the new coordinates
df_str2 = pd.DataFrame({'col_B': [3.0]}, index=['row_C'])
sdf_str.append_dataframe(df_str2)

print(sdf_str.read_dataframe())
#        col_A  col_B
# row_A    1.0    NaN
# row_C    NaN    3.0

[!NOTE]

row_B is missing because every value in that row is NaN, so no coordinates are stored for it.
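
The same effect shows up in a pandas-only round trip through coordinate form: a row with no stored values leaves no trace to rebuild from. This is a sketch of the principle, not of the package's code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col_A": [1.0, np.nan]}, index=["row_A", "row_B"])

# Only non-NaN cells survive the trip into coordinate form
stacked = df.stack().dropna()

# Rebuilding from those coordinates cannot recover the all-NaN row
round_trip = stacked.unstack()
print(list(round_trip.index))  # ['row_A']
```

If all-NaN rows matter for your workflow, their labels would have to be tracked separately and restored after reading, for example with `reindex`.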

Properties

Properties on sparse frames query the array to find the unique dimension labels.

print(f"Shape: {sdf_str.shape}")       # (2, 2)
print(f"Columns: {sdf_str.columns}")   # Index(['col_A', 'col_B'], dtype='object')
print(f"Index: {sdf_str.index}")       # Index(['row_A', 'row_C'], dtype='object')

Note

This project has been set up using BiocSetup and PyScaffold.

Release history

- 0.0.8 (Feb 14, 2026)
- 0.0.7 (Feb 13, 2026)
- 0.0.6 (Feb 10, 2026)
- 0.0.5 (Feb 4, 2026)
- 0.0.4 (Jan 28, 2026)
- 0.0.3 (Jan 25, 2026)
- 0.0.2 (Nov 19, 2025), this version
- 0.0.1 (Nov 11, 2025)

Download files

Source Distribution

cellarr_frame-0.0.2.tar.gz (32.5 kB)

Uploaded Nov 19, 2025 Source

Built Distribution

cellarr_frame-0.0.2-py3-none-any.whl (14.7 kB)

Uploaded Nov 19, 2025 Python 3

File details

Details for the file cellarr_frame-0.0.2.tar.gz.

File metadata

Download URL: cellarr_frame-0.0.2.tar.gz
Upload date: Nov 19, 2025
Size: 32.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cellarr_frame-0.0.2.tar.gz
Algorithm	Hash digest
SHA256	`704fb21f471287623475fb9dfc5e8ec69479d4e07b7623c1b1a216fdcfa13cad`
MD5	`175e8f31c5949d54f3566235d849189c`
BLAKE2b-256	`c793f50eb0e8d85cb7b8c8b1ce72e2caf62f67fa3799565f3700608623336c05`
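
A downloaded file can be checked against the SHA256 digest listed above using only the standard library. The helper names `sha256_of` and `verify` are hypothetical, and the sketch assumes the archive has already been saved locally.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for cellarr_frame-0.0.2.tar.gz
EXPECTED_SHA256 = "704fb21f471287623475fb9dfc5e8ec69479d4e07b7623c1b1a216fdcfa13cad"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("cellarr_frame-0.0.2.tar.gz")` would return `True` for an intact copy of the source distribution in the current directory.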

Provenance

The following attestation bundles were made for cellarr_frame-0.0.2.tar.gz:

Publisher: publish-pypi.yml on CellArr/cellarr-frame

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cellarr_frame-0.0.2.tar.gz
- Subject digest: 704fb21f471287623475fb9dfc5e8ec69479d4e07b7623c1b1a216fdcfa13cad
- Sigstore transparency entry: 708827174
- Sigstore integration time: Nov 19, 2025
Source repository:
- Permalink: CellArr/cellarr-frame@30d5a0e9209b870bb619f965a9a5bef581e2e5d8
- Branch / Tag: refs/tags/0.0.2
- Owner: https://github.com/CellArr
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@30d5a0e9209b870bb619f965a9a5bef581e2e5d8
- Trigger Event: push

File details

Details for the file cellarr_frame-0.0.2-py3-none-any.whl.

File metadata

Download URL: cellarr_frame-0.0.2-py3-none-any.whl
Upload date: Nov 19, 2025
Size: 14.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cellarr_frame-0.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`31daf70f7041c04312295ba73d6a4acb365224f82f8134aad9bd7e59a47f2edb`
MD5	`c070ac3d9a12b98f494fa7801d949d37`
BLAKE2b-256	`c443de2f96bb78afb8b34945e5b8c8207186fe855f984e05ab46b5fd73a5cfd2`

Provenance

The following attestation bundles were made for cellarr_frame-0.0.2-py3-none-any.whl:

Publisher: publish-pypi.yml on CellArr/cellarr-frame

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cellarr_frame-0.0.2-py3-none-any.whl
- Subject digest: 31daf70f7041c04312295ba73d6a4acb365224f82f8134aad9bd7e59a47f2edb
- Sigstore transparency entry: 708827177
- Sigstore integration time: Nov 19, 2025
Source repository:
- Permalink: CellArr/cellarr-frame@30d5a0e9209b870bb619f965a9a5bef581e2e5d8
- Branch / Tag: refs/tags/0.0.2
- Owner: https://github.com/CellArr
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@30d5a0e9209b870bb619f965a9a5bef581e2e5d8
- Trigger Event: push
