Scalable, Interpretable Deep Learning for Single-Cell RNA-seq Classification

SIMS: Scalable, Interpretable Modeling for Single-Cell RNA-Seq Data Classification

SIMS is a pipeline for building interpretable and accurate classifiers that identify any target variable in single-cell RNA-seq data. The SIMS model is based on a sequential transformer, a transformer model specifically built for large-scale tabular datasets.

SIMS takes in a list of arbitrarily many expression matrices along with their corresponding target variables. We assume the matrix form cell x gene, and NOT gene x cell, since our training samples are the transcriptomes of individual cells.
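
Because the orientation matters, a quick sanity check before training can catch a transposed matrix early. The sketch below uses only NumPy and a hypothetical helper, ensure_cell_by_gene, which is not part of SIMS and assumes you know how many genes to expect:

```python
import numpy as np


def ensure_cell_by_gene(matrix: np.ndarray, n_genes: int) -> np.ndarray:
    """Return `matrix` oriented as cell x gene (rows are cells).

    Hypothetical helper, not part of SIMS: it relies on the caller
    knowing the expected number of genes.
    """
    if matrix.ndim != 2:
        raise ValueError("expected a 2-D expression matrix")
    if matrix.shape[1] == n_genes:
        return matrix  # already cell x gene
    if matrix.shape[0] == n_genes:
        return matrix.T  # gene x cell, so transpose
    raise ValueError(f"neither axis has length {n_genes}: shape {matrix.shape}")


# Toy example: 3 genes measured in 5 cells, stored gene x cell.
gene_by_cell = np.arange(15).reshape(3, 5)
cell_by_gene = ensure_cell_by_gene(gene_by_cell, n_genes=3)
print(cell_by_gene.shape)  # (5, 3)
```

A square matrix (as many cells as genes) cannot be disambiguated this way and is returned unchanged, so check such cases by hand.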

The code runs on Python. We recommend installing the package in a virtual environment, such as one managed by Miniconda, so that you can install packages without affecting your system Python.

Installation

If using conda, run

  1. Create a new virtual environment with conda create --name=<NAME> python=3.9
  2. Activate the virtual environment with conda activate <NAME>

Otherwise, enter your virtual environment of choice and

  1. Install the SIMS package with pip install --use-pep517 git+https://github.com/braingeneers/SIMS.git
  2. Set up the model training code in a MYFILE.py file, and run it with python MYFILE.py. A tutorial on how to set up training code is shown below.
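
To confirm that the install landed in the environment you are actually running, a small standard-library check can help. This is a minimal sketch; is_installed is a hypothetical helper name, and "scsims" is the import name used in the examples in this document:

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("scsims"):
    print("scsims is available in this environment")
else:
    print("scsims not found; check that the right environment is active")
```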

Training and inference

To train a model, we can set up a SIMS class in the following way:

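import anndata as an
from scsims import SIMS
from pytorch_lightning.loggers import WandbLogger

logger = WandbLogger(offline=True)  # log locally without syncing to Weights and Biases

data = an.read_h5ad('mydata.h5ad')  # labeled training data, cell x gene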
sims = SIMS(data=data, class_label='class_label')
sims.setup_trainer(accelerator="gpu", devices=1, logger=logger)
sims.train()

This will set up the underlying dataloaders, model, model checkpointing, and everything else we need. Model checkpoints will be saved every training epoch.
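
Since a checkpoint is written every epoch, a run can leave many .ckpt files behind. The sketch below picks the most recently modified one using only the standard library. The directory name "checkpoints" and the helper latest_checkpoint are assumptions for illustration; where the files actually land depends on your trainer and logger configuration.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(directory: str) -> Optional[Path]:
    """Return the most recently modified .ckpt file under `directory`, or None."""
    root = Path(directory)
    if not root.is_dir():
        return None
    checkpoints = list(root.rglob("*.ckpt"))  # search subfolders too
    if not checkpoints:
        return None
    return max(checkpoints, key=lambda p: p.stat().st_mtime)


# "checkpoints" is a placeholder: point this at wherever your run wrote its files.
print(latest_checkpoint("checkpoints"))
```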

To infer cell types on an unlabeled dataset, we load the model checkpoint, point to the label file that we originally trained on, and run the predict method on the new data.

sims = SIMS(weights_path='myawesomemodel.ckpt')
cell_predictions = sims.predict('my/new/unlabeled.h5ad')

Finally, to look at the explainability of the model, we similarly run

explainability_matrix = sims.explain('my/new/unlabeled.h5ad') # this can also be labeled data, of course 
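
One way to read such a matrix is to rank genes by their average importance. The sketch below assumes, purely for illustration, that the importances are available as a plain cell x gene numeric array; the actual return type of explain may differ, so convert it first if needed. The helper top_genes and the gene names are hypothetical.

```python
import numpy as np


def top_genes(importances, gene_names, k=3):
    """Rank genes by mean importance across cells (assumes cell x gene input)."""
    mean_importance = np.asarray(importances, dtype=float).mean(axis=0)
    order = np.argsort(mean_importance)[::-1][:k]  # indices, most important first
    return [(gene_names[i], float(mean_importance[i])) for i in order]


# Toy stand-in for an explainability matrix: 2 cells x 4 genes.
toy = np.array([[0.1, 0.6, 0.0, 0.3],
                [0.2, 0.4, 0.1, 0.3]])
ranked = top_genes(toy, ["GeneA", "GeneB", "GeneC", "GeneD"], k=2)
print(ranked)
```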

Custom training jobs / logging

To customize the underlying pl.Trainer and the SIMS model parameters, we can initialize the SIMS model as follows:

import anndata as an
from pytorch_lightning.loggers import WandbLogger
from pytorch_lightning.callbacks import EarlyStopping, LearningRateMonitor
from scsims import SIMS

wandb_logger = WandbLogger(project="My Project", name="SIMS Model Training")  # set up the logger to log data to Weights and Biases

adata = an.read_h5ad('mydata.h5ad')  # labeled training data, cell x gene
sims = SIMS(data=adata, class_label='class_label')
sims.setup_model(n_a=64, n_d=64, weights=sims.weights)  # weight the loss inversely proportional to label frequency; helps learn rare cell types (recommended)
sims.setup_trainer(
    logger=wandb_logger,
    callbacks=[
        EarlyStopping(
            monitor="val_loss",
            patience=50,
        ),
        LearningRateMonitor(logging_interval="epoch"),
    ],
    max_epochs=100,
)
sims.train()

This will train the SIMS model on the given expression matrices with target variable given by the class_label column in each label file.
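
The inverse-frequency loss weighting mentioned in the setup_model comment can be illustrated with a small standalone sketch. It uses the common "balanced" heuristic, n_samples / (n_classes * class_count); this is an assumption chosen for illustration and is not necessarily the exact formula behind sims.weights. The cell type names are made up.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Map each class to n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Toy labels: one common, one intermediate, and one rare cell type.
labels = ["neuron"] * 6 + ["astrocyte"] * 3 + ["microglia"] * 1
weights = inverse_frequency_weights(labels)
print(weights)
```

Rare classes receive larger weights, so misclassifying a rare cell type costs more during training than misclassifying a common one.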

Using SIMS inside GitHub Codespaces

If you only need predictions from an already trained model, GitHub Codespaces is the recommended way to use this tool. You can also train on smaller datasets there, but the computing resources offered in Codespaces are modest. To get started, fork the repo into your GitHub account, then create a new codespace with the SIMS repo as the repository of choice. Once inside the newly created environment, pull the latest SIMS image:

docker pull jmlehrer/sims:latest

Run the Docker container, mounting the folder that contains your datasets and model checkpoints into the container's filesystem:

docker run -it -v /path/to/local/folder:/path/in/container jmlehrer/sims:latest /bin/bash

Run main.py to check if the installation has been completed. You can alter this file as shown above to perform the different tasks.

python main.py
