
IterPert


Sequential Optimal Experimental Design of Perturbation Screens

This repository hosts the code base for the paper

Sequential Optimal Experimental Design of Perturbation Screens Guided by Multimodal Priors
Kexin Huang, Romain Lopez, Jan-Christian Hütter, Takamasa Kudo, Antonio Rios, Aviv Regev

Overview

logo

Understanding a cell's transcriptional response to genetic perturbations helps answer vital biological questions in areas such as cell reprogramming and target discovery. Despite significant advances in Perturb-seq technology, the demand for vast experimental configurations surpasses the capacity of existing assays. Recent machine learning models, trained on existing Perturb-seq data sets, predict perturbation outcomes but face hurdles due to sub-optimal training set selection, resulting in weak predictions for unexplored regions of the perturbation space. In this study, we propose a sequential approach to the design of Perturb-seq experiments that uses the model to strategically select the most informative perturbations at each step for follow-up experiments. This enables a significantly more efficient exploration of the perturbation space, while predicting the effect of the remaining perturbations with high fidelity. A preliminary analysis of a large-scale Perturb-seq experiment reveals that our setting is severely restricted by the number of examples and rounds, falling into a non-conventional active learning regime called "active learning under budget". Motivated by this insight, we develop IterPert, which exploits rich, multimodal prior knowledge to efficiently guide the selection of perturbations. Making use of prior knowledge for this task is novel, and crucial for our setting of active learning under budget. We validate our method using in-silico benchmarking of active learning, constructed from a large-scale CRISPRi Perturb-seq data set. The benchmarking reveals that IterPert outperforms contemporary active learning strategies and delivers comparable accuracy with only a third of the perturbations profiled. All in all, these results demonstrate the potential of sequentially designing perturbation screens.

Installation

To use the API, install from PyPI:

conda create --name iterpert_env python=3.8
conda activate iterpert_env
conda install pyg -c pyg
pip install iterpert

Or, to work with the raw source code:

conda create --name iterpert_env python=3.8
conda activate iterpert_env
conda install pyg -c pyg
git clone https://github.com/Genentech/iterative-perturb-seq.git
cd iterative-perturb-seq
pip install -r requirements.txt
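
Either way, you can verify the installation with a minimal import check (run it from the repository root if you installed from source):

python -c "from iterpert.iterpert import IterPert"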

API interface

First, initialize the IterPert module:

from iterpert.iterpert import IterPert

strategy = 'IterPert'  # name of the active learning strategy, reused as the experiment name
interface = IterPert(weight_bias_track = True, 
                     exp_name = strategy,
                     device = 'cuda:0', 
                     seed = 1)

The arguments are:

  • weight_bias_track: True/False, whether to use Weights & Biases tracking
  • device: CUDA device
  • proj_name: Weights & Biases project name
  • exp_name: Weights & Biases experiment name
  • seed: random seed for the data split
  • run: random seed for the training run (a fuller constructor call is sketched below)
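
The snippet above omits proj_name and run; assuming both are accepted as keyword arguments like the others, a fuller call might look like:

from iterpert.iterpert import IterPert

strategy = 'IterPert'                            # also used as the experiment name
interface = IterPert(weight_bias_track = True,   # log runs to Weights & Biases
                     proj_name = 'iterpert',     # W&B project name (illustrative value)
                     exp_name = strategy,        # W&B experiment name
                     device = 'cuda:0',          # CUDA device
                     seed = 1,                   # random seed for the data split
                     run = 1)                    # random seed for the training run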

Then, initialize the data:

path = 'YOUR PATH'
interface.initialize_data(path = path,
                          dataset_name='replogle_k562_essential_1000hvg',
                          batch_size = 256)

The arguments are:

  • path: path to save the data
  • dataset_name: name of the dataset
  • batch_size: number of cells in a batch
  • test_fraction: fraction of data held out as the test set (see the sketch below)
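
If you want to control the size of the held-out test set, test_fraction can be passed alongside the other arguments; a minimal sketch (the value 0.1 is illustrative):

interface.initialize_data(path = path,
                          dataset_name = 'replogle_k562_essential_1000hvg',
                          batch_size = 256,
                          test_fraction = 0.1)   # hold out 10% of perturbations for evaluation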

Then, initialize the GEARS model:

interface.initialize_model(epochs = 20, hidden_size = 64)

The arguments are:

  • epochs: the number of training epochs
  • hidden_size: the hidden dimension of the model
  • retrain: True/False, whether to retrain the model in each round (see the sketch below)
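
Similarly, retrain can be set when initializing the model; a sketch assuming it is a keyword argument of initialize_model:

interface.initialize_model(epochs = 20,
                           hidden_size = 64,
                           retrain = True)   # retrain the model from scratch in each round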

Then, initialize the active learning strategy:

You can specify our method IterPert, or choose from the baselines Random, BALD, BatchBALD, BAIT, ACS-FW, Core-Set, BADGE, and LCMD:

interface.initialize_active_learning_strategy(strategy = 'IterPert')

Lastly, kick off the training:

interface.start(n_init_labeled = 100, n_round = 5, n_query = 100)

The arguments are:

  • n_init_labeled: the number of initially labeled samples
  • n_round: the number of rounds
  • n_query: the number of queries per round
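
Putting the steps together, a complete run might look like the following sketch (the path and hyperparameters are illustrative, not prescriptive):

from iterpert.iterpert import IterPert

strategy = 'IterPert'
interface = IterPert(weight_bias_track = False,  # disable W&B logging for a quick local run
                     exp_name = strategy,
                     device = 'cuda:0',
                     seed = 1)

# download / load the dataset and create the data splits
interface.initialize_data(path = './data',
                          dataset_name = 'replogle_k562_essential_1000hvg',
                          batch_size = 256)

# set up the GEARS model and the acquisition strategy
interface.initialize_model(epochs = 20, hidden_size = 64)
interface.initialize_active_learning_strategy(strategy = strategy)

# 5 rounds of active learning, starting from 100 labeled perturbations
# and querying 100 new perturbations per round
interface.start(n_init_labeled = 100, n_round = 5, n_query = 100)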

Demo

We provide tutorials to get started with iterative perturb-seq:

Name | Description
Data Tutorial | Introduces the data loader and how to use your own data
Training Tutorial | A demo on training IterPert
Knowledge Kernel Tutorial | A tutorial on creating a knowledge kernel for your own data

Reproduce experiments

Please refer to the reproduce_repo directory to reproduce each experiment. Notably, its README.md points to the sh scripts that generate all experiments, and the figX.ipynb notebooks produce the figures.

