A small library for taking the transpose of arbitrarily large .csvs
SIMS: Scalable, Interpretable Modeling for Single-Cell RNA-Seq Data Classification
SIMS is a pipeline for building interpretable and accurate classifiers for identifying any target on single-cell RNA-seq data. The SIMS model is based on TabNet, a self-attention-based model built specifically for large-scale tabular datasets.
SIMS takes in a list of arbitrarily many expression matrices along with their corresponding target variables. The expression matrices may be AnnData objects with format h5ad, or .csv. They must be in the matrix form cell x gene, and NOT gene x cell, since our training samples are the transcriptomes of individual cells.
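If a matrix arrives as gene x cell, it has to be transposed before training. The following is a minimal sketch of that step using only the Python standard library; it is not the SIMS implementation. The function name transpose_csv_text and the sample data are hypothetical, and the sketch loads the whole file into memory, so it only suits small matrices.

```python
import csv
import io


def transpose_csv_text(text: str) -> str:
    """Transpose a small CSV held in memory (rows become columns)."""
    rows = list(csv.reader(io.StringIO(text)))
    # Every row must have the same width, or zip() would silently drop cells.
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError(f"ragged CSV: row widths {sorted(widths)}")
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(zip(*rows))
    return out.getvalue()


# Hypothetical gene x cell input: one row per gene, one column per cell.
gene_by_cell = "gene,cell_1,cell_2\nGAPDH,5,0\nSOX2,1,7\n"
cell_by_gene = transpose_csv_text(gene_by_cell)
print(cell_by_gene)
```

After the call, each row holds one cell and each column one gene, which is the orientation described above.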
The data is formatted like so:
- All matrices are cell x expression
- All label files contain a common column, known as the class_label, on which to train the model
- datafiles and labelfiles are the absolute paths to the expression matrices and labels, respectively
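Before training, it can help to confirm that every label file really contains the shared class_label column. The sketch below does this with the standard library and also tallies how many cells carry each label. The helper count_labels and the inlined label data are hypothetical and are not part of SIMS.

```python
import csv
import io


def count_labels(label_csv, class_label):
    """Return {label: count} for one label file, or raise if the column is missing."""
    reader = csv.DictReader(label_csv)
    if class_label not in (reader.fieldnames or []):
        raise KeyError(f"column {class_label!r} not found in {reader.fieldnames}")
    counts = {}
    for row in reader:
        counts[row[class_label]] = counts.get(row[class_label], 0) + 1
    return counts


# Hypothetical label files, inlined as text so the sketch runs on its own.
l1 = io.StringIO("cell,cell_state\nc1,progenitor\nc2,neuron\n")
l2 = io.StringIO("cell,cell_state\nc3,neuron\n")
label_counts = [count_labels(f, "cell_state") for f in (l1, l2)]
print(label_counts)
```

With real data, each io.StringIO object would be replaced by an open file handle for the corresponding label file.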
A call to generate and train the SIMS model looks like the following:
    from src.models.lib.lightning_train import generate_trainer

    trainer, model, data = generate_trainer(
        datafiles=['cortical_cells.csv', 'cortical_cells_2.csv', 'external/cortical_cells_3.h5ad'],  # Notice we can mix and match file types
        labelfiles=['l1.csv', 'l2.csv', 'l3.csv'],
        class_label='cell_state',  # Train to predict cell state!
        batch_size=4,
    )

    trainer.fit(model, datamodule=data)
This will train a derivation of the TabNet model on the given expression matrices with target variable given by the class_label column in each label file.
Hashes for scsims-0.0.1-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 6eab91f20b0499ffd7fa2af98310210f698ccc7d46d8146bbf0660cf2012049e
MD5 | ce96ec54f9eb7a96132788e842d5d096
BLAKE2b-256 | dfb2de453e6445cb720c4b258135f42bb966315e66a11522e174c50fb1333fe9