Method to detect and enable removal of doublets from single-cell RNA-sequencing.

DoubletDetection

DoubletDetection is a Python 3 package for detecting doublets (technical errors in which two or more cells are captured as one) in single-cell RNA-seq count matrices.

Installing DoubletDetection

Install from PyPI

pip install doubletdetection

Install from source

git clone https://github.com/JonathanShor/DoubletDetection.git
cd DoubletDetection
pip3 install .

If you use pipenv to manage your virtual environment, it may fail to install from setup.py because of our custom Phenograph requirement. If so, run the following in the cloned repo:

pipenv run pip3 install .

Running DoubletDetection

To run basic doublet classification:

import doubletdetection
clf = doubletdetection.BoostClassifier()
# raw_counts is a cells by genes count matrix
labels = clf.fit(raw_counts).predict()
# higher means more likely to be doublet
scores = clf.doublet_score()
  • raw_counts is an array-like scRNA-seq count matrix (cells by genes).
  • labels is a 1-dimensional numpy ndarray with the value 1 representing a detected doublet, 0 a singlet, and np.nan an ambiguous cell.
  • scores is a 1-dimensional numpy ndarray representing a score for how likely a cell is to be a doublet. The score is used to create the labels.
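The three label values described above can be turned into boolean masks to keep only confident singlets. The following is a minimal sketch using made-up arrays in place of real classifier output; the helper name split_by_label is hypothetical and not part of DoubletDetection, and raw_counts is assumed to be a dense numpy array.

```python
import numpy as np


def split_by_label(labels):
    """Return boolean masks (singlet, doublet, ambiguous) for a label vector.

    Hypothetical helper, not part of DoubletDetection. Assumes the encoding
    described above: 1 = doublet, 0 = singlet, np.nan = ambiguous.
    """
    labels = np.asarray(labels, dtype=float)
    ambiguous = np.isnan(labels)
    doublet = labels == 1   # comparisons with NaN evaluate to False
    singlet = labels == 0
    return singlet, doublet, ambiguous


# Made-up values standing in for the output of clf.fit(raw_counts).predict()
labels = np.array([0.0, 1.0, np.nan, 0.0, 1.0, 0.0])
# Made-up dense count matrix: 6 cells by 3 genes
raw_counts = np.arange(18).reshape(6, 3)

singlet, doublet, ambiguous = split_by_label(labels)

# Keep only the rows (cells) labeled as singlets
singlet_counts = raw_counts[singlet]

print(singlet.sum(), doublet.sum(), ambiguous.sum())  # 3 2 1
```

Whether ambiguous cells should be dropped together with doublets or kept is an analysis decision; the sketch keeps them separate so either choice is possible.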

The classifier works best when:

  • There are several cell types present in the data
  • It is applied individually to each run in an aggregated count matrix
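One way to apply a classifier individually to each run of an aggregated matrix is sketched below. The function classify_per_run, the run_ids vector, and the ThresholdStandIn class are all assumptions made for illustration; the stand-in only mimics the fit(...).predict() call pattern shown above and is not the DoubletDetection algorithm. The sketch also assumes a dense numpy count matrix.

```python
import numpy as np


def classify_per_run(counts, run_ids, make_classifier):
    """Fit a fresh classifier on each run and collect labels in cell order.

    Hypothetical helper. `make_classifier` is any zero-argument callable
    returning an object that supports fit(counts).predict().
    """
    counts = np.asarray(counts)
    run_ids = np.asarray(run_ids)
    labels = np.full(counts.shape[0], np.nan)
    for run in np.unique(run_ids):
        mask = run_ids == run
        # A new classifier per run, so no state is shared between runs
        labels[mask] = make_classifier().fit(counts[mask]).predict()
    return labels


class ThresholdStandIn:
    """Toy stand-in, NOT the real classifier.

    Flags cells whose total count exceeds twice the median total of the
    cells it was fitted on.
    """

    def fit(self, counts):
        totals = counts.sum(axis=1)
        self._labels = (totals > 2 * np.median(totals)).astype(float)
        return self

    def predict(self):
        return self._labels


# Made-up aggregated matrix: 6 cells by 2 genes, from two runs "a" and "b"
counts = np.array([[1, 1], [1, 2], [9, 9], [10, 10], [11, 10], [50, 50]])
run_ids = np.array(["a", "a", "a", "b", "b", "b"])

labels = classify_per_run(counts, run_ids, ThresholdStandIn)
print(labels)
```

With the package installed, passing doubletdetection.BoostClassifier as make_classifier should follow the same pattern, assuming each run's submatrix is an acceptable input on its own.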

In v2.5 we added an experimental clustering method (scanpy's Louvain clustering) that is much faster than Phenograph. We are still validating its results. Please see the notebook below for an example of using this feature.

Tutorial

See our tutorial for an example on 10k PBMCs from 10x Genomics.

Obtaining data

Data can be downloaded from the 10x website.

Credits and citations

Gayoso, Adam, Shor, Jonathan, Carr, Ambrose J., Sharma, Roshan, Pe'er, Dana (2020, December 18). DoubletDetection (Version v3.0). Zenodo. http://doi.org/10.5281/zenodo.2678041

We also thank the participants of the 1st Human Cell Atlas Jamboree, Chun J. Ye for providing data useful in developing this method, and Itsik Pe'er for providing guidance in early development as part of the Computational Genomics class at Columbia University.

This project is licensed under the terms of the MIT license.
