A lightweight implementation of a multinomial Naive Bayes classifier for Annotation Transfer of single-cell data.

CAP-Naive-Bayes

A lightweight, extensible implementation of a multinomial Naive Bayes classifier in pure Python. It is designed for Annotation Transfer of single-cell data, allowing you to fit and predict on large datasets efficiently using out-of-core chunked processing.

Main Features:

  • Out-of-core chunked processing: Efficiently handle large datasets without loading everything into memory.
  • Support for missing features: Can handle datasets where some features are missing during prediction.
  • Flexible data formats: Supports dense NumPy arrays, SciPy sparse matrices, AnnData/HDF5-backed data, and Zarr arrays.

Installation

pip install -U cap-naive-bayes

Usage

Basic Usage

>> import numpy as np
>> import pandas as pd
>> from cap_naive_bayes import NaiveBayesModel

>> count_matrix = np.array([
    [2, 1, 0, 0],
    [2, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 1],
])
>> obs = pd.DataFrame({
    'cell_type': ['a', 'a', 'a', 'b'],
})
>> features = pd.Index(['g1', 'g2', 'g3', 'g4'])
>> model = NaiveBayesModel()
>> model.fit(
    X=count_matrix, 
    obs=obs, 
    features=features,
)
>> model  # holds the log prior and the per-feature log likelihoods for each label
                       g1        g2        g3        g4     prior
labelset  label                                                  
cell_type a     -0.510826 -1.609438 -2.302585 -2.302585 -0.287682
          b     -1.252763 -1.945910 -1.252763 -1.252763 -1.386294

>> pred = model.predict(
    X=count_matrix, 
    labelset="cell_type", 
    features=features,
)
>> pred
  cell_type  cell_type_conf
0         a        0.948776
1         a        0.929726
2         a        0.863014
3         b        0.564414
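
The numbers in the two tables above can be checked by hand. The following is a minimal NumPy sketch of multinomial Naive Bayes, assuming additive (Laplace) smoothing with alpha = 1; that value is an assumption chosen because it reproduces the printed log probabilities, not something the package documents here. The variable names are illustrative and are not part of the cap_naive_bayes API.

```python
import numpy as np

X = np.array([
    [2, 1, 0, 0],
    [2, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 1],
])
labels = np.array(["a", "a", "a", "b"])
alpha = 1.0  # assumed smoothing constant (not taken from the package docs)

# Encode labels as integer codes and build a one-hot membership matrix.
classes, y = np.unique(labels, return_inverse=True)
onehot = np.eye(len(classes))[y]                      # (n_cells, n_classes)

# Log prior: relative frequency of each label.
class_counts = onehot.sum(axis=0)
log_prior = np.log(class_counts / class_counts.sum())

# Log likelihood: smoothed per-class feature counts, normalized per class.
feature_counts = onehot.T @ X + alpha                 # (n_classes, n_features)
log_likelihood = np.log(
    feature_counts / feature_counts.sum(axis=1, keepdims=True)
)

# Joint log probability per cell and class, then a stable softmax.
joint = X @ log_likelihood.T + log_prior              # (n_cells, n_classes)
joint -= joint.max(axis=1, keepdims=True)
posterior = np.exp(joint)
posterior /= posterior.sum(axis=1, keepdims=True)

pred = classes[posterior.argmax(axis=1)]
conf = posterior.max(axis=1)
print(pred, conf.round(6))
```

Under this assumption, `log_prior` and `log_likelihood` agree with the rows printed for the fitted model, and `conf` agrees with the `cell_type_conf` column.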

Chunked Processing

For very large X (e.g. Dask, Zarr, or HDF5-backed arrays), pass an explicit chunk size, or pass chunk=None to let the model infer it from X.chunks:

# inference of chunk size from .chunks attribute
model.fit(large_zarr_array, obs_df, feature_names, chunk=None)

# explicit chunking
model.predict(X_test, chunk=500)
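
To illustrate why chunking keeps memory bounded, here is a hedged sketch of how per-class feature counts could be accumulated one row block at a time. It is not the package's implementation; `iter_row_chunks` and `chunked_feature_counts` are hypothetical helpers, and the sketch assumes X supports row slicing and that each block converts to a dense array.

```python
import numpy as np

def iter_row_chunks(X, chunk):
    """Yield (start, block) pairs covering the rows of X in order."""
    for start in range(0, X.shape[0], chunk):
        yield start, X[start:start + chunk]

def chunked_feature_counts(X, y, n_classes, chunk=500):
    """Sum the rows of X per class code in y, reading `chunk` rows at a time."""
    counts = np.zeros((n_classes, X.shape[1]))
    for start, block in iter_row_chunks(X, chunk):
        block = np.asarray(block)  # a SciPy sparse block would need .toarray()
        codes = y[start:start + block.shape[0]]
        # Unbuffered add, so repeated class codes within a block all count.
        np.add.at(counts, codes, block)
    return counts

rng = np.random.default_rng(0)
X_demo = rng.integers(0, 5, size=(10, 4))
y_demo = rng.integers(0, 3, size=10)
counts = chunked_feature_counts(X_demo, y_demo, n_classes=3, chunk=3)
```

Only one block plus the small (n_classes, n_features) count matrix is held in memory at any time, so the result is the same as a single pass over the full matrix.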

Feature space alignment

When the feature space of X does not match the model's feature space, you can specify the features to use during prediction:

fs_train = pd.Index(['f1', 'f2', 'f3', 'f4', 'f5'])
X_train = ...  # matrix with 5 columns
model.fit(X_train, obs=..., features=fs_train)
fs_test = pd.Index(['f1', 'f4', 'f5', 'f6'])
X_test = ...  # matrix with 4 columns
# Valid: only the shared features 'f1', 'f4', 'f5' are used, taken from both the model and X_test.
pred = model.predict(X_test, features=fs_test)
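
The alignment described above amounts to intersecting two feature indexes and selecting the matching columns on each side. Below is a minimal pandas sketch of that idea; `align_features` is a hypothetical helper and is not part of the cap_naive_bayes API.

```python
import numpy as np
import pandas as pd

def align_features(model_features, data_features):
    """Return shared features plus their column positions on each side."""
    shared = model_features.intersection(data_features, sort=False)
    model_idx = model_features.get_indexer(shared)
    data_idx = data_features.get_indexer(shared)
    return shared, model_idx, data_idx

fs_train = pd.Index(['f1', 'f2', 'f3', 'f4', 'f5'])
fs_test = pd.Index(['f1', 'f4', 'f5', 'f6'])
shared, model_idx, data_idx = align_features(fs_train, fs_test)

# Stand-in matrices: model parameters have 5 columns, test data has 4.
params = np.arange(10).reshape(2, 5)
X_test = np.arange(12).reshape(3, 4)
params_aligned = params[:, model_idx]   # columns f1, f4, f5 of the model
X_aligned = X_test[:, data_idx]         # columns f1, f4, f5 of the data
```

Features present on only one side ('f2', 'f3', and 'f6' here) drop out of the computation.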

Multiple labelsets

You can fit the model and make predictions on multiple labelsets by passing multiple columns in the obs DataFrame:

obs = pd.DataFrame({
    'cell_type': ['a', 'a', 'a', 'b'],
    'treatment': ['control', 'control', 'treatment', 'treatment']
})
model.fit(X_train, obs=obs, features=fs_train)
pred = model.predict(X_test, features=fs_test)
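
Conceptually, each obs column can be treated as an independent classification problem. As a rough sketch of that idea (an assumption about the approach, not the package's actual code), the log priors for every labelset can be collected into one Series indexed by (labelset, label), mirroring the index layout of the fitted model shown in Basic Usage:

```python
import numpy as np
import pandas as pd

obs = pd.DataFrame({
    'cell_type': ['a', 'a', 'a', 'b'],
    'treatment': ['control', 'control', 'treatment', 'treatment'],
})

# One entry per (labelset, label) pair: log of the label's relative frequency.
rows = {}
for labelset in obs.columns:
    freqs = obs[labelset].value_counts(normalize=True)
    for label, p in freqs.items():
        rows[(labelset, label)] = np.log(p)

priors = pd.Series(rows, name='prior')
priors.index.names = ['labelset', 'label']
print(priors)
```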

License & Acknowledgments

This project is released under the BSD 3-Clause License.
It also incorporates code derived from scikit-learn, which is licensed under the BSD 3‑Clause “New” or “Revised” License.

  • scikit-learn
    Copyright (C) 2007–2024 The scikit-learn developers
    BSD 3‑Clause License
