Method to detect and enable removal of doublets from single-cell RNA-sequencing.

DoubletDetection

DoubletDetection is a Python 3 package to detect doublets (technical errors in which two or more cells are captured and sequenced as a single cell) in single-cell RNA-seq count matrices.

Installing DoubletDetection

Install from PyPI

pip install doubletdetection

Install from source

git clone https://github.com/JonathanShor/DoubletDetection.git
cd DoubletDetection
pip3 install .

If you are using pipenv as your virtual environment, it may struggle to install from setup.py because of our custom Phenograph requirement. If so, try the following in the cloned repo:

pipenv run pip3 install .

Running DoubletDetection

To run basic doublet classification:

import doubletdetection
clf = doubletdetection.BoostClassifier()
# raw_counts is a cells by genes count matrix
labels = clf.fit(raw_counts).predict()
  • raw_counts is a scRNA-seq count matrix (cells by genes), and is array-like
  • labels is a 1-dimensional numpy ndarray with the value 1 representing a detected doublet, 0 a singlet, and np.nan an ambiguous cell.
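Because labels mixes 0, 1, and np.nan, plain equality checks silently skip the ambiguous cells. The sketch below shows one way to count each category and build a mask of confident singlets. The helper names summarize_labels and singlet_mask are hypothetical and not part of the doubletdetection API, and the example labels array is made up for illustration.

```python
import numpy as np


def summarize_labels(labels):
    """Count doublets (1), singlets (0), and ambiguous cells (np.nan)."""
    labels = np.asarray(labels, dtype=float)
    return {
        "doublet": int(np.sum(labels == 1)),
        "singlet": int(np.sum(labels == 0)),
        # NaN never compares equal to anything, so it needs np.isnan.
        "ambiguous": int(np.sum(np.isnan(labels))),
    }


def singlet_mask(labels):
    """Boolean mask that is True only for cells confidently called singlets."""
    return np.asarray(labels, dtype=float) == 0


# Illustrative labels only; real labels come from clf.fit(raw_counts).predict().
example_labels = np.array([0.0, 1.0, np.nan, 0.0])
print(summarize_labels(example_labels))
print(singlet_mask(example_labels))
```

Whether ambiguous cells should be kept or dropped is an analysis choice; the mask above drops them along with the detected doublets.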

The classifier works best when:

  • There are several cell types present in the data
  • It is applied individually to each run in an aggregated count matrix
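Applying the classifier to each run separately could look like the sketch below. It assumes raw_counts is a dense cells-by-genes array and that a per-cell array of run identifiers is available; the names predict_per_run, run_ids, and make_classifier are hypothetical and not part of the doubletdetection API. The classifier is passed in as a factory so that each run gets a freshly constructed instance.

```python
import numpy as np


def predict_per_run(raw_counts, run_ids, make_classifier):
    """Fit a fresh classifier on each run's cells and stitch the labels together.

    raw_counts: dense cells-by-genes count array (assumption: not sparse).
    run_ids: one run identifier per cell, same length as raw_counts.
    make_classifier: zero-argument callable returning an object with
        fit(counts) -> self and predict() -> 1-D labels.
    """
    raw_counts = np.asarray(raw_counts)
    run_ids = np.asarray(run_ids)
    if run_ids.shape[0] != raw_counts.shape[0]:
        raise ValueError("run_ids must have one entry per cell")

    # Start with NaN so any cell that is never assigned stays ambiguous.
    labels = np.full(raw_counts.shape[0], np.nan)
    for run in np.unique(run_ids):
        mask = run_ids == run
        clf = make_classifier()
        labels[mask] = clf.fit(raw_counts[mask]).predict()
    return labels


# Usage, assuming doubletdetection is installed and run_ids is defined:
# import doubletdetection
# labels = predict_per_run(raw_counts, run_ids, doubletdetection.BoostClassifier)
```

The returned array keeps the original cell order, so it can be used directly alongside the aggregated matrix.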

v2.5 adds an experimental clustering method (scanpy's Louvain clustering) that is much faster than Phenograph. We are still validating results from this method; see the notebook below for an example of its use.

See our Jupyter notebook for an example on 8k PBMCs from 10x.

Obtaining data

Data can be downloaded from the 10x website.

Credits and citations

Gayoso, Adam, Shor, Jonathan, Carr, Ambrose J., Sharma, Roshan, Pe'er, Dana (2018, July 17). DoubletDetection (Version v2.4). Zenodo. http://doi.org/10.5281/zenodo.2678041

We also thank the participants of the 1st Human Cell Atlas Jamboree, Chun J. Ye for providing data useful in developing this method, and Itsik Pe'er for providing guidance in early development as part of the Computational Genomics class at Columbia University.

This project is licensed under the terms of the MIT license.

This version: 3.0
