Evaluating single-cell data integration methods

Benchmarking atlas-level data integration in single-cell genomics

This repository contains the code for the scib package used in our benchmarking study of data integration tools. In the study, we benchmark 16 methods (see Tools) with 4 combinations of preprocessing steps, leading to 68 method combinations on 85 batches of gene expression and chromatin accessibility data.

Resources

  • The git repository of the scib package and its documentation.
  • The reusable pipeline we used in the study can be found in the separate scib pipeline repository. It is reproducible and automates the computation of preprocessing combinations, integration methods and benchmarking metrics.
  • On our website we visualise the results of the study.
  • For reproducibility and visualisation we have a dedicated repository: scib-reproducibility.

Please cite:

Luecken, M.D., Büttner, M., Chaichoompu, K. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 19, 41–50 (2022). https://doi.org/10.1038/s41592-021-01336-8

Package: scib

We created the Python package scib, which uses scanpy to streamline the integration of single-cell datasets and the evaluation of the results. The package contains several modules for preprocessing an anndata object, running integration methods and evaluating the resulting integration using a number of metrics. For preprocessing, scib.preprocessing (or scib.pp) contains functions for normalising, scaling or batch-aware selection of highly variable genes. Functions for the integration methods are in scib.integration (or scib.ig for short), and metrics are in scib.metrics (or scib.me).

The scib Python package is available on PyPI and can be installed with pip:

pip install scib

Import scib in Python:

import scib

Optional Dependencies

The package has optional dependencies that need to be installed manually if needed. These include the R interface dependencies (rpy2, anndata2ri), which also require the R packages of the integration methods to be installed. All optional dependencies are listed in setup.cfg under [options.extras_require] and can be installed through pip.

For example, to install the rpy2 and bbknn dependencies:

pip install 'scib[rpy2,bbknn]'

Optional dependencies outside of Python need to be installed separately. For instance, in order to run kBET, install it via the following commands in R:

install.packages('remotes')
remotes::install_github('theislab/kBET')
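
Because the optional Python dependencies are not pulled in by a plain pip install scib, it can be useful to check which of them are importable before running a method that needs them. The following is a minimal sketch using only the standard library. The helper name missing_modules is hypothetical, and it assumes that the importable module names match the dependency names mentioned above (rpy2, anndata2ri, bbknn). It does not check R packages such as kBET.

```python
from importlib.util import find_spec

# Assumption: these import names match the optional dependencies named above.
OPTIONAL_MODULES = ("rpy2", "anndata2ri", "bbknn")


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(("scib",) + OPTIONAL_MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("scib and the listed optional Python dependencies are importable.")
```

find_spec only looks a module up without importing it, so the check stays cheap even for heavy packages.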

Metrics

We implemented different metrics for evaluating batch correction and biological conservation in the scib.metrics module.

Biological Conservation

  • Cell type ASW
  • Cell cycle conservation
  • Graph cLISI
  • Adjusted Rand index (ARI) for cell label
  • Normalised mutual information (NMI) for cell label
  • Highly variable gene conservation
  • Isolated label ASW
  • Isolated label F1
  • Trajectory conservation

Batch Correction

  • Batch ASW
  • Principal component regression
  • Graph iLISI
  • Graph connectivity
  • kBET (k-nearest neighbour batch effect test)

For a detailed description of the metrics implemented in this package, please see our publication and the package documentation.
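
To give an intuition for one of the label-based metrics, the adjusted Rand index compares a clustering of the integrated data with the annotated cell labels and corrects the agreement for chance. The sketch below is a plain-Python illustration of the standard ARI formula, not the implementation used in scib.metrics. The function name adjusted_rand_index and the toy labels are made up for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected agreement between two partitions of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("Both label sequences must have the same length.")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one cell: the partitions trivially agree

    # Count pairs of cells that share a group in both, the first, or the second partition.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    pairs_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = pairs_true * pairs_pred / total_pairs  # agreement expected by chance
    maximum = (pairs_true + pairs_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every cell in one group in both partitions
    return (pairs_both - expected) / (maximum - expected)


if __name__ == "__main__":
    cell_types = ["T cell", "T cell", "B cell", "B cell"]  # toy annotation
    clusters = [1, 1, 0, 0]  # toy clustering that matches the annotation
    print(adjusted_rand_index(cell_types, clusters))  # prints 1.0
```

A value of 1 means the clustering reproduces the cell labels exactly, values near 0 indicate agreement no better than chance, and negative values are possible when agreement is worse than chance.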

Integration Tools

Tools that are compared include:

Development

For developing this package, please make sure to install additional dependencies so that you can use pytest and pre-commit.

pip install -e '.[test,dev]'

Please refer to the setup.cfg for more optional dependencies.

Install the pre-commit hooks in the repository so that they run automatically every time you commit in git.

pre-commit install
