A lightweight implementation of a multinomial Naive Bayes classifier for Annotation Transfer of single-cell data.

Project description

CAP-Naive-Bayes

A lightweight, extensible implementation of a multinomial Naive Bayes classifier in pure Python. It is designed for Annotation Transfer of single-cell data, allowing you to fit and predict on large datasets efficiently using out-of-core chunked processing.

Main Features:

Out-of-core chunked processing: Efficiently handle large datasets without loading everything into memory.
Support for missing features: Can handle datasets where some features are missing during prediction.
Flexible data formats: Supports dense NumPy arrays, SciPy sparse matrices, AnnData/HDF5-backed data, and Zarr arrays.

Installation

pip install -U cap-naive-bayes

Usage

Basic Usage

>> import numpy as np
>> import pandas as pd
>> from cap_naive_bayes import NaiveBayesModel

>> count_matrix = np.array([
    [2, 1, 0, 0],
    [2, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 1],
])
>> obs = pd.DataFrame({
    'cell_type': ['a', 'a', 'a', 'b'],
})
>> features = pd.Index(['g1', 'g2', 'g3', 'g4'])
>> model = NaiveBayesModel()
>> model.fit(
    X=count_matrix, 
    obs=obs, 
    features=features,
)
>> model  # contains log priors and per-feature log likelihoods for each label
                       g1        g2        g3        g4     prior
labelset  label                                                  
cell_type a     -0.510826 -1.609438 -2.302585 -2.302585 -0.287682
          b     -1.252763 -1.945910 -1.252763 -1.252763 -1.386294

>> pred = model.predict(
    X=count_matrix, 
    labelset="cell_type", 
    features=features,
)
>> pred
  cell_type  cell_type_conf
0         a        0.948776
1         a        0.929726
2         a        0.863014
3         b        0.564414
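
The numbers printed above can be reproduced by hand, which helps when checking what the model stores. The following is a minimal NumPy sketch, not the library's own code: it assumes add-one (Laplace) smoothing with `alpha = 1` and a softmax over the joint log probabilities, which is what the printed values are consistent with. All variable names are illustrative.

```python
import numpy as np

X = np.array([
    [2, 1, 0, 0],
    [2, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 1],
])
labels = np.array(["a", "a", "a", "b"])
classes = np.unique(labels)  # sorted unique labels: ['a', 'b']

# Membership matrix of shape (n_classes, n_cells): 1.0 where the cell has that label
onehot = (labels[None, :] == classes[:, None]).astype(float)

# Per-class feature counts with add-one smoothing (alpha = 1 is an assumption)
alpha = 1.0
counts = onehot @ X + alpha
log_likelihood = np.log(counts / counts.sum(axis=1, keepdims=True))
log_prior = np.log(onehot.sum(axis=1) / len(labels))

# Joint log probability of each cell under each class, then a stable softmax
joint = X @ log_likelihood.T + log_prior
posterior = np.exp(joint - joint.max(axis=1, keepdims=True))
posterior /= posterior.sum(axis=1, keepdims=True)

pred_labels = classes[posterior.argmax(axis=1)]
pred_conf = posterior.max(axis=1)
```

Under these assumptions, `log_likelihood` and `log_prior` match the rows of the model table, and `pred_conf` matches the `cell_type_conf` column.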

Chunked Processing

For very large X (e.g. Dask, Zarr, HDF5), pass an explicit chunk size or let the model infer it from X.chunks:

# inference of chunk size from .chunks attribute
model.fit(large_zarr_array, obs_df, feature_names, chunk=None)

# explicit chunking
model.predict(X_test, chunk=500)
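
To illustrate why chunking keeps memory bounded, here is a generic sketch of accumulating per-class feature counts one row block at a time. It is not the library's implementation; `chunked_class_counts` and its parameters are hypothetical names, and it assumes `X` supports NumPy-style row slicing and that labels are already encoded as integer codes.

```python
import numpy as np

def chunked_class_counts(X, codes, n_classes, chunk=2):
    """Sum feature counts per class, reading `chunk` rows at a time."""
    totals = np.zeros((n_classes, X.shape[1]), dtype=np.float64)
    for start in range(0, X.shape[0], chunk):
        # Only this slice of rows is materialized in memory
        block = np.asarray(X[start:start + chunk])
        block_codes = codes[start:start + chunk]
        # np.add.at accumulates correctly when a class code repeats within a block
        np.add.at(totals, block_codes, block)
    return totals

X = np.array([
    [2, 1, 0, 0],
    [2, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 1],
])
codes = np.array([0, 0, 0, 1])  # integer-encoded labels: 0 = 'a', 1 = 'b'
totals = chunked_class_counts(X, codes, n_classes=2, chunk=3)
```

Because the running totals have shape (n_classes, n_features), peak memory depends on the chunk size rather than on the number of rows in X.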

Feature space alignment

When the feature space of X does not match the model's feature space, you can specify the features to use during prediction:

fs_train = pd.Index(['f1', 'f2', 'f3', 'f4', 'f5'])
X_train = ...  # matrix with 5 columns
model.fit(X_train, obs=obs, features=fs_train)
fs_test = pd.Index(['f1', 'f4', 'f5', 'f6'])
X_test = ...  # matrix with 4 columns
# valid: the model uses only the shared features 'f1', 'f4', 'f5' from both the model and X_test
pred = model.predict(X_test, features=fs_test)
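
The alignment step can be pictured as finding the shared feature names and the column positions they occupy on each side. The sketch below uses plain pandas and NumPy and is only an illustration under that assumption; how the library itself selects columns (and whether it renormalizes probabilities afterwards) is not shown here. `X_test_demo` is a made-up matrix.

```python
import numpy as np
import pandas as pd

fs_train = pd.Index(['f1', 'f2', 'f3', 'f4', 'f5'])
fs_test = pd.Index(['f1', 'f4', 'f5', 'f6'])

# Features present on both sides, kept in the order they appear in fs_test
shared = fs_test[fs_test.isin(fs_train)]

# Column positions of the shared features in each feature space
train_cols = fs_train.get_indexer(shared)
test_cols = fs_test.get_indexer(shared)

# Hypothetical test matrix with 4 columns; 'f6' is dropped by the selection
X_test_demo = np.arange(8).reshape(2, 4)
X_test_aligned = X_test_demo[:, test_cols]
```

Here `train_cols` would index the model's per-feature parameters and `test_cols` the columns of the test matrix, so both sides end up with the same three features in the same order.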

Multiple labelsets

You can fit the model and make predictions on multiple labelsets by passing multiple columns in the obs DataFrame:

obs = pd.DataFrame({
    'cell_type': ['a', 'a', 'a', 'b'],
    'treatment': ['control', 'control', 'treatment', 'treatment']
})
model.fit(X_train, obs=obs, features=fs_train)
pred = model.predict(X_test, features=fs_test)
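
Each column of obs is treated as its own labelset, which matches the (labelset, label) index shown in the Basic Usage output. As a rough illustration of that layout, the pandas sketch below computes log priors per labelset into one series with a two-level index. It is an assumption-laden example, not the library's code, and `log_priors` is an illustrative name.

```python
import numpy as np
import pandas as pd

obs = pd.DataFrame({
    'cell_type': ['a', 'a', 'a', 'b'],
    'treatment': ['control', 'control', 'treatment', 'treatment'],
})

# One block of log priors per obs column, stacked under the column name
log_priors = pd.concat(
    {col: np.log(obs[col].value_counts(normalize=True)) for col in obs.columns}
).rename_axis(['labelset', 'label'])
```

Looking up `log_priors.loc[('cell_type', 'a')]` gives log(3/4), the same prior shown for label a in the Basic Usage table.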

License & Acknowledgments

This project is released under the BSD 3-Clause License.
It also incorporates code derived from scikit-learn, which is licensed under the BSD 3‑Clause “New” or “Revised” License.

scikit-learn
Copyright (C) 2007–2024 The scikit-learn developers
BSD 3‑Clause License

Project details

Release history

0.1.3 (this version): Aug 15, 2025
0.1.2: Jul 30, 2025
0.1.1: Jul 28, 2025

Download files

Source Distribution

cap_naive_bayes-0.1.3.tar.gz (41.8 kB)

Upload date: Aug 15, 2025
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.0

Hashes for cap_naive_bayes-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`4d1145517ca38327684879a2a9e9a3e0d4c517d75f6e6764ed8615238bb7c58b`
MD5	`11b021d7eaa73bb7255302d4ed035e5a`
BLAKE2b-256	`3256ea69dd557b4882d591fa9c3fea077e181fe22fb6e0be0d69f7ccbd17512e`

Built Distribution

cap_naive_bayes-0.1.3-py3-none-any.whl (6.9 kB)

Upload date: Aug 15, 2025
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.0

Hashes for cap_naive_bayes-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a984d15107af323def10b93afcddc1c88f9f1765c0d5c74572ac8c035ac5a55c`
MD5	`2acfe2a894d2b3bc1e7e45fcc03c2a5c`
BLAKE2b-256	`693be552817be12701df67c1a8b41f3000c36aa2d36726004edf5d5350f22cd2`
