A multiplet removal tool for processing cell hashing data

GMM-Demux

A Gaussian Mixture Model based software for processing cell hashing data.

The figure below shows an example classification result. Orange dots are multi-sample multiplets.

GMM-Demux example

Description

GMM-Demux removes Multi-Sample-Multiplets (MSMs) in a cell hashing dataset and estimates the fraction of Same-Sample-Multiplets (SSMs) and singlets in the remaining dataset.

Features

  • Remove cell-hashing-identifiable multiplets from the dataset.
  • Estimate the fraction of cell-hashing-unidentifiable multiplets in the remaining dataset (the RSSM value).

Example Dataset

  • An example cell hashing dataset is provided in the example_input folder. It contains the per-droplet HTO count matrix of a 4-sample cell hashing library prep.

Authors

Hongyi Xin, Qi Yan, Yale Jiang, Jiadi Luo, Carla Erb, Richard Duerr, Kong Chen* and Wei Chen*

Maintainer

Hongyi Xin xhongyi@pitt.edu

Requirement

GMM-Demux requires python3 (>3.5).

Install

GMM-Demux can be installed directly from PyPI, or it can be built and installed locally.

  • Install GMM-Demux from PyPI.
pip3 install --user GMM_Demux

If you choose to install from PyPI, it is unnecessary to download GMM-Demux from GitHub. However, we still recommend downloading the example dataset to try out GMM-Demux.

  • Install GMM-Demux locally using setuptools and pip3.
cd <GMM-Demux dir>
python3 setup.py sdist bdist_wheel
pip3 install --user . 
  • Post-installation steps

If this is the first time you have installed a python3 package through pip, make sure you add the pip binary folder to your PATH variable. Typically, the pip binary folder is located at ~/.local/bin.

To temporarily add the pip binary folder, run the following command:

export PATH=~/.local/bin:$PATH

To permanently add the pip binary folder to your PATH variable, append the following line to your .bashrc file:

PATH=~/.local/bin:$PATH
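
To confirm that the PATH change took effect, a small Python check such as the one below can be used. This is a minimal sketch; the helper name `dir_on_path` is hypothetical and not part of GMM-Demux. It compares a folder against the entries of a PATH-style string and then asks `shutil.which` where the GMM-demux command resolves.

```python
import os
import shutil


def dir_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries of a PATH-style string.

    Hypothetical helper for illustration; it is not part of GMM-Demux.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Expand "~" and normalize so equivalent spellings compare equal.
    target = os.path.realpath(os.path.expanduser(directory))
    entries = [p for p in path_value.split(os.pathsep) if p]
    return any(os.path.realpath(os.path.expanduser(p)) == target for p in entries)


if __name__ == "__main__":
    print("~/.local/bin on PATH:", dir_on_path("~/.local/bin"))
    # shutil.which returns None when the command cannot be found on PATH.
    print("GMM-demux found at:", shutil.which("GMM-demux"))
```

If the second line prints None, the command is not yet visible in the current shell session.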

Content

The source code of GMM-Demux is supplied in the GMM_Demux folder.

An example cell hashing dataset is also provided, located in the example_input/outs/filtered_feature_bc_matrix folder.

Usage

Once GMM-Demux is installed, the GitHub folder is no longer needed. GMM-Demux is directly accessible through the GMM-demux command.

GMM-demux <cell_hashing_path> <HTO_names> <estimated_cell_num>

<HTO_names> is a list of strings separated by ',' without whitespace. For example, there are four HTO tags in the example cell hashing dataset supplied in this repository: HTO_1, HTO_2, HTO_3 and HTO_4. The <HTO_names> argument is therefore HTO_1,HTO_2,HTO_3,HTO_4.

MSM-free droplets are stored in folder GMM_Demux_mtx under the current directory by default. The output path can also be specified through the -o flag.

Example Usage

An example cell hashing dataset is provided in example_input. <HTO_names> can be obtained from the features.tsv file.

GMM-demux example_input/outs/filtered_feature_bc_matrix HTO_1,HTO_2,HTO_3,HTO_4 35685

The features.tsv file of the example cell hashing dataset is shown below.

HTO names example
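
For datasets with many tags, the <HTO_names> string can also be assembled programmatically. The sketch below is not part of GMM-Demux, and the function name is hypothetical. It assumes a CellRanger 3 style features file with three tab-separated columns (feature ID, name, feature type), with HTO rows labeled "Antibody Capture" and the tag name in the second column. Check these assumptions against your own features.tsv before relying on the result.

```python
import csv
import gzip


def hto_names_from_features(path, feature_type="Antibody Capture", name_column=1):
    """Collect the names of all features of one type and join them with commas.

    Hypothetical helper. The three-column layout and the "Antibody Capture"
    label are assumptions about the input file, not GMM-Demux behavior.
    """
    # CellRanger may write the file gzipped (features.tsv.gz) or plain.
    opener = gzip.open if str(path).endswith(".gz") else open
    names = []
    with opener(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Keep only rows whose third column matches the requested type.
            if len(row) >= 3 and row[2] == feature_type:
                names.append(row[name_column])
    return ",".join(names)
```

The returned string has no whitespace between names, so it can be passed directly as the <HTO_names> argument.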

Optional Arguments

  • -h: Show help information.
  • -f FULL, --full FULL: Generate the full classification report. Requires a path argument.
  • -s SIMPLIFIED, --simplified SIMPLIFIED: Generate the simplified classification report. Requires a path argument.
  • -o OUTPUT, --output OUTPUT: Specify the folder to store the result. Requires a path argument.
  • -r REPORT, --report REPORT: Specify the file to store the summary report. Requires a file argument.
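
When GMM-Demux is called from a pipeline script, the command line can be assembled from the positional arguments and the flags listed above. The following is a minimal sketch: the function name and the report file name are hypothetical, only the flags documented in this section are used, and placing the flags after the positional arguments is an assumption. The command is only built and printed here, not executed.

```python
import shlex


def build_gmm_demux_command(cell_hashing_path, hto_names, estimated_cell_num,
                            output=None, full=None, simplified=None, report=None):
    """Return the GMM-demux argument list for the given inputs.

    Hypothetical helper; it uses only the -o, -f, -s and -r flags listed above.
    """
    command = [
        "GMM-demux",
        str(cell_hashing_path),
        ",".join(hto_names),  # comma-separated, no whitespace
        str(estimated_cell_num),
    ]
    # Optional flags are appended only when a value is supplied.
    for flag, value in (("-o", output), ("-f", full), ("-s", simplified), ("-r", report)):
        if value is not None:
            command += [flag, str(value)]
    return command


command = build_gmm_demux_command(
    "example_input/outs/filtered_feature_bc_matrix",
    ["HTO_1", "HTO_2", "HTO_3", "HTO_4"],
    35685,
    report="summary.txt",  # hypothetical file name
)
print(shlex.join(command))
```

To actually run the tool, the list can be handed to `subprocess.run(command, check=True)`.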

Output Values

  • CellRanger MSM-free droplets, in MTX format. Compatible with CellRanger 3.0.
  • Dataset summary. An example summary is shown below.

Summary example

Output Explanation

  • MSM denotes the percentage of identified and removed multiplets among all droplets.
  • SSM denotes the percentage of unidentifiable multiplets among all droplets.
  • RSSM denotes the percentage of multiplets among the output droplets (after removing identifiable multiplets). RSSM measures the quality of the cell hashing dataset.
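
The three values can be related with simple arithmetic. The sketch below assumes that the output droplets are exactly the input droplets minus the removed MSMs, so that RSSM = SSM / (100% - MSM). This is a reading of the definitions above rather than a formula taken from the GMM-Demux source, the function name is hypothetical, and the numbers are made up for illustration.

```python
def rssm_from_msm_ssm(msm_percent, ssm_percent):
    """Estimate RSSM (in percent) from MSM and SSM (both in percent of all droplets).

    Assumes the output contains every droplet except the removed MSMs.
    """
    remaining_percent = 100.0 - msm_percent
    if remaining_percent <= 0:
        raise ValueError("MSM must be below 100 percent")
    # SSMs stay in the output, so their share grows once MSMs are removed.
    return 100.0 * ssm_percent / remaining_percent


# Made-up example: 10% MSM and 4.5% SSM among all droplets.
print(rssm_from_msm_ssm(10.0, 4.5))
```

With these illustrative inputs, the SSM share rises from 4.5% of all droplets to 5% of the output droplets.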

Online Cell Hashing Experiment Planner

A GMM-Demux based online cell hashing experiment planner is publicly accessible here.

Online experiment planner example
