Evaluating single-cell data integration methods

Project description

Benchmarking atlas-level data integration in single-cell genomics

This repository contains the code for our benchmarking study of data integration tools. In the study, we benchmark 16 methods (see here) with 4 combinations of preprocessing steps, leading to 68 method combinations on 85 batches of gene expression and chromatin accessibility data.

Workflow

Resources

  • On our website we visualise the results of the study.

  • The reusable pipeline we used in the study can be found in the separate scib pipeline repository. It is reproducible and automates the computation of preprocessing combinations, integration methods and benchmarking metrics.

  • For reproducibility and visualisation we have a dedicated repository: scib-reproducibility.

Please cite:

Benchmarking atlas-level data integration in single-cell genomics. MD Luecken, M Büttner, K Chaichoompu, A Danese, M Interlandi, MF Mueller, DC Strobl, L Zappia, M Dugas, M Colomé-Tatché, FJ Theis. bioRxiv 2020.05.22.111161; doi: https://doi.org/10.1101/2020.05.22.111161

Package: scib

We created the Python package scib, which uses scanpy to streamline the integration of single-cell datasets and to evaluate the results. To evaluate integration quality, it provides a number of metrics.

Requirements

  • Linux or UNIX system
  • Python >= 3.7
  • 3.6 <= R <= 4.0
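The version bounds above can be checked before installing. The following is a minimal sketch, not part of scib: the helper names `python_ok` and `r_ok` are hypothetical, and it assumes that only the major and minor version numbers matter, so that an R release such as 4.0.3 counts as satisfying `R <= 4.0`.

```python
import sys

MIN_PYTHON = (3, 7)         # "Python >= 3.7" from the requirements list
R_RANGE = ((3, 6), (4, 0))  # "3.6 <= R <= 4.0", inclusive on (major, minor)


def python_ok(version_info=sys.version_info):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def r_ok(version_string):
    """Return True if an R version string such as '4.0.3' is in range.

    Assumption: only major.minor is compared; patch levels are ignored.
    """
    major, minor = (int(part) for part in version_string.split(".")[:2])
    low, high = R_RANGE
    return low <= (major, minor) <= high


if __name__ == "__main__":
    print("Python version supported:", python_ok())
```

The R version string would have to come from your own R installation, for example from the output of `R --version`; obtaining it is left out of the sketch.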

We recommend working with environments such as Conda or virtualenv, so that Python and R dependencies are in one place. Please also check out scib pipeline for ready-to-use environments. Alternatively, install the package manually on your system using pip, as described in the next section.

Installation

The scib python package is in the folder scib. You can simply install it from the root of this repository using

pip install .

Alternatively, you can also install the package directly from GitHub via

pip install git+https://github.com/theislab/scib.git

Additionally, in order to run the R package kBET, you need to install it through R.

devtools::install_github('theislab/kBET')

Note: By default, dependencies for integration methods are not installed due to dependency clashes. In order to use integration methods, see the next section.

Installing additional packages

This package contains code for running integration methods as well as for evaluating their output. However, due to dependency clashes, scib is only installed with the packages needed for the metrics. To use the integration wrapper functions, we recommend working with a separate environment for each method, each with its own installation of scib. You can install optional Python dependencies via pip as follows:

pip install .[bbknn]  # using BBKNN
pip install .[scanorama]  # using Scanorama
pip install .[bbknn,scanorama]  # Multiple methods in one go

See setup.cfg for a full list of Python dependencies. For a comprehensive list of supported integration methods, including R packages, see the Tools section.

Usage

The package contains several modules for the different steps of the integration and benchmarking pipeline. Functions for the integration methods are in scib.integration, or scib.ig for short. The methods can be called using

scib.integration.<method>(adata, batch=<batch_key>)

where <method> is the name of the integration method and <batch_key> is the name of the batch column in adata.obs. For example, to run Scanorama on a dataset with batch key 'batch', call

scib.integration.scanorama(adata, batch='batch')
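Because every method follows the same `<method>(adata, batch=<batch_key>)` pattern, a method can also be selected by name at run time. The sketch below is an illustration only: `run_integration` is a hypothetical helper, not part of scib, and it uses a stand-in namespace in place of `scib.integration` so that it runs without scib installed.

```python
from types import SimpleNamespace


def run_integration(integration_module, method, adata, batch_key):
    """Look up `method` on the module and call it as method(adata, batch=...)."""
    try:
        func = getattr(integration_module, method)
    except AttributeError:
        raise ValueError(f"unknown integration method: {method!r}") from None
    return func(adata, batch=batch_key)


# Stand-in for scib.integration (assumption: used only to keep the sketch runnable).
fake_integration = SimpleNamespace(
    scanorama=lambda adata, batch: f"scanorama on {adata} with batch={batch}"
)

result = run_integration(fake_integration, "scanorama", "adata", "batch")
print(result)
```

With scib installed, passing `scib.integration` as the first argument and a real AnnData object as `adata` would make the same call as the example above. Methods that need cell type labels as well take an extra argument and are not covered by this helper.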

Warning: the following notation is deprecated.

scib.integration.run<method>(adata, batch=<batch_key>)

Please use the snake case naming without the run prefix.

Some integration methods (scgen, scanvi) also use cell type labels as input. For these, you need to additionally provide the corresponding label column.

scgen(adata, batch=<batch_key>, cell_type=<cell_type>)
scanvi(adata, batch=<batch_key>, labels=<cell_type>)

scib.preprocessing (or scib.pp) contains functions for normalising, scaling or selecting highly variable genes per batch. The metrics are under scib.metrics (or scib.me).

Metrics

For a detailed description of the metrics implemented in this package, please see the manuscript.

Batch removal metrics include:

  • Principal component regression pcr_comparison()
  • Batch ASW silhouette_batch()
  • K-nearest neighbour batch effect kBET()
  • Graph connectivity graph_connectivity()
  • Graph iLISI lisi_graph()

Biological conservation metrics include:

  • Normalised mutual information nmi()
  • Adjusted Rand Index ari()
  • Cell type ASW silhouette()
  • Isolated label score F1 isolated_labels()
  • Isolated label score ASW isolated_labels()
  • Cell cycle conservation cell_cycle()
  • Highly variable gene conservation hvg_overlap()
  • Trajectory conservation trajectory_conservation()
  • Graph cLISI lisi_graph()
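To give a feel for what the two clustering-based scores in the list above measure, here is a small pure-Python sketch of the Adjusted Rand Index and normalised mutual information computed from two label vectors. It is not scib's implementation: the function names are local to this example, and the NMI normalisation by the arithmetic mean of the two entropies is an assumption made for illustration.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_true, labels_pred):
    """ARI from pair counts of the contingency table of the two labelings."""
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: treat as trivially identical
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = same_true * same_pred / total_pairs
    maximum = (same_true + same_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (same_both - expected) / (maximum - expected)


def entropy(labels):
    """Shannon entropy (natural log) of a label vector."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalised_mutual_info(labels_true, labels_pred):
    """Mutual information divided by the mean of the two entropies (assumed)."""
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    mi = 0.0
    for (t, p), c in Counter(zip(labels_true, labels_pred)).items():
        mi += (c / n) * log(c * n / (count_true[t] * count_pred[p]))
    denom = (entropy(labels_true) + entropy(labels_pred)) / 2
    return mi / denom if denom > 0 else 1.0


cell_types = ["B", "B", "T", "T"]
clusters = [1, 1, 0, 0]  # same partition as cell_types, different cluster names
print(adjusted_rand_index(cell_types, clusters))
print(normalised_mutual_info(cell_types, clusters))
```

Both scores depend only on how cells are grouped, not on what the groups are called, which is why they suit comparing cluster assignments against cell type annotations.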

Metrics Wrapper Functions

We provide wrapper functions to run multiple metrics in one function call. The scib.metrics.metrics() function returns a pandas.DataFrame of all metrics specified as parameters.

scib.metrics.metrics(adata, adata_int, ari=True, nmi=True)

Furthermore, scib.metrics.metrics() is wrapped by convenience functions that only select certain metrics:

  • scib.me.metrics_fast() only computes metrics that require little preprocessing
  • scib.me.metrics_slim() includes all functions of scib.me.metrics_fast() and adds clustering-based metrics
  • scib.me.metrics_all() includes all metrics

Tools

Tools that are compared include:
