Scalable, Interpretable Deep Learning for Single-Cell RNA-seq Classification
SIMS: Scalable, Interpretable Modeling for Single-Cell RNA-Seq Data Classification
SIMS is a pipeline for building interpretable and accurate classifiers for identifying any target on single-cell RNA-seq data. The SIMS model is based on TabNet, a self-attention based model specifically built for large-scale tabular datasets.
SIMS takes in a list of arbitrarily many expression matrices along with their corresponding target variables. The expression matrices may be AnnData objects in h5ad format, or .csv files. They must be in the matrix form cell x gene, and NOT gene x cell, since our training samples are the transcriptomes of individual cells.
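If a matrix arrives as gene x cell, it has to be transposed before training. The following is a minimal sketch using only the Python standard library, not part of SIMS itself. It assumes a CSV whose first row is a header and whose first column holds row names; the helper names read_matrix and to_cell_by_gene, and the toy data, are hypothetical.

```python
import csv
import io


def read_matrix(text):
    """Parse CSV text into (column_names, row_names, values).

    Assumes the first row is a header and the first column holds row names.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0][1:], rows[1:]
    row_names = [r[0] for r in body]
    values = [[float(x) for x in r[1:]] for r in body]
    return header, row_names, values


def to_cell_by_gene(text, genes_in_rows):
    """Return (cells, genes, matrix) with one row per cell.

    `genes_in_rows` is supplied by the caller; the orientation cannot be
    reliably guessed from the numbers alone.
    """
    cols, rows, values = read_matrix(text)
    if not genes_in_rows:
        # Already cell x gene: rows are cells, columns are genes.
        return rows, cols, values
    # Transpose: zip(*values) turns each column into a row.
    transposed = [list(col) for col in zip(*values)]
    return cols, rows, transposed


# Toy gene x cell input (hypothetical data): 2 genes, 3 cells.
gene_by_cell = "gene,cell_1,cell_2,cell_3\nSOX2,0,5,1\nGFAP,3,0,2\n"
cells, genes, matrix = to_cell_by_gene(gene_by_cell, genes_in_rows=True)
print(cells)
print(genes)
print(matrix)
```

For real datasets the same step is usually a one-liner: a pandas DataFrame or an AnnData object can be transposed with its .T attribute.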
The data is formatted like so:
- All matrices are cell x expression
- All label files contain a common column, known as the class_label, on which to train the model
- datafiles and labelfiles are the absolute paths to the expression matrices and labels, respectively
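The requirements above can be checked before training. The sketch below is an illustration, not SIMS code: check_labels is a hypothetical helper, the label contents are made up, and it assumes each label file has exactly one row per cell of its matching expression matrix.

```python
import csv
import io


def check_labels(label_text, class_label, n_cells):
    """Return the labels from one label file after basic sanity checks."""
    reader = csv.DictReader(io.StringIO(label_text))
    # Every label file must share the column named by class_label.
    if class_label not in (reader.fieldnames or []):
        raise ValueError(f"label file has no column {class_label!r}")
    labels = [row[class_label] for row in reader]
    # Assumption: one label row per cell in the matching expression matrix.
    if len(labels) != n_cells:
        raise ValueError(f"{len(labels)} labels for {n_cells} cells")
    return labels


# Two toy label files (hypothetical). Other columns may differ between
# files, but both contain the shared 'cell_state' column.
label_files = [
    "cell_id,cell_state\nc1,progenitor\nc2,neuron\n",
    "barcode,batch,cell_state\nAAAC,1,neuron\nAAAG,1,astrocyte\nAAAT,2,neuron\n",
]
cells_per_matrix = [2, 3]  # row counts of the matching expression matrices

all_labels = [
    check_labels(text, "cell_state", n)
    for text, n in zip(label_files, cells_per_matrix)
]
classes = sorted({label for labels in all_labels for label in labels})
print(classes)
```

Running a check like this first surfaces a missing column or a mismatched row count as a clear error instead of a failure partway through training.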
A call to generate and train the SIMS model looks like the following:
import torch
from scsims import generate_trainer

# Placeholder hyperparameters; tune these for your own data
lr = 1e-3
weight_decay = 1e-4

trainer, model, data = generate_trainer(
    datafiles=['cortical_cells.csv', 'cortical_cells_2.csv', 'external/cortical_cells_3.h5ad'],  # Notice we can mix and match file types
    labelfiles=['l1.csv', 'l2.csv', 'l3.csv'],
    class_label='cell_state',  # Train to predict cell state!
    batch_size=4,
    optim_params={
        'optimizer': torch.optim.Adam,
        'lr': lr,
        'weight_decay': weight_decay,
    },
)

trainer.fit(model, datamodule=data)
This will train a derivation of the TabNet model on the given expression matrices with target variable given by the class_label column in each label file.
Built Distribution
Hashes for scsims-2.0.1-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 8c25511c66c99ae4604d58021dc99474c2e3bf652984a6aff1698e84c1c376c4
MD5 | 64406a93943e4906494eecec5f871fc2
BLAKE2b-256 | be25f91d111eba0a11f21c98cb4fd3bcff522f99d567db114dac4aa6c6b64ace