A multiplet removal tool for processing cell hashing data
GMM-Demux
GMM-Demux is Gaussian Mixture Model (GMM) based software for processing cell hashing data.
The figure below shows an example classification result. Orange dots are multi-sample multiplets (MSMs).
Description
GMM-Demux removes Multi-Sample-Multiplets (MSMs) from a cell hashing dataset and estimates the fraction of Same-Sample-Multiplets (SSMs) and singlets in the remaining dataset.
Features
- Remove cell-hashing-identifiable multiplets from the dataset.
- Estimate the fraction of cell-hashing-unidentifiable multiplets in the remaining dataset (the RSSM value).
Example Dataset
- An example cell hashing dataset is provided in the example_input folder. It contains the per-droplet HTO count matrix of a 4-sample cell hashing library prep.
Authors
Hongyi Xin, Qi Yan, Yale Jiang, Jiadi Luo, Carla Erb, Richard Duerr, Kong Chen* and Wei Chen*
Maintainer
Hongyi Xin xhongyi@pitt.edu
Requirement
GMM-Demux requires python3 (>3.5).
Install
GMM-Demux can be installed directly from PyPI, or built and installed locally.
- Install GMM-Demux from PyPI.
pip3 install --user GMM_Demux
If you install from PyPI, there is no need to download GMM-Demux from GitHub. We still recommend downloading the example dataset to try out GMM-Demux.
- Install GMM-Demux locally using setuptools and pip3.
cd <GMM-Demux dir>
python3 setup.py sdist bdist_wheel
pip3 install --user .
- Post-installation steps
If this is the first time you have installed Python 3 software through pip, make sure the pip binary folder is on your PATH variable.
Typically, the pip binary folder is located at ~/.local/bin.
To temporarily add the pip binary folder, run the following command:
export PATH=~/.local/bin:$PATH
To add the pip binary folder to your PATH variable permanently, append the following line to your .bashrc file.
PATH=~/.local/bin:$PATH
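The post-installation step can also be scripted. The sketch below assumes bash, a ~/.bashrc file, and the default ~/.local/bin location; it appends the PATH line only if it is not already present, applies it to the current shell, and checks whether the GMM-demux command is now found.

```shell
# Assumes bash and ~/.bashrc; adjust rc_file for other shells.
rc_file="$HOME/.bashrc"
line='export PATH="$HOME/.local/bin:$PATH"'

# Append the line only if it is not already present, so re-running
# this snippet does not add duplicate entries.
grep -qxF "$line" "$rc_file" 2>/dev/null || printf '%s\n' "$line" >> "$rc_file"

# Apply the same change to the current shell session.
export PATH="$HOME/.local/bin:$PATH"

# Check that the GMM-demux entry point can now be found.
if command -v GMM-demux >/dev/null 2>&1; then
  echo "GMM-demux found at $(command -v GMM-demux)"
else
  echo "GMM-demux not found on PATH; check where pip installed its scripts" >&2
fi
```

Using $HOME instead of ~ avoids relying on tilde expansion inside an export argument, which not every shell performs.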
Content
The source code of GMM-Demux is supplied in the GMM_Demux folder.
An example cell hashing dataset is also provided in the example_input/outs/filtered_feature_bc_matrix folder.
Usage
Once installed, the GitHub folder is no longer needed; GMM-Demux is run directly with the GMM-demux command.
GMM-demux <cell_hashing_path> <HTO_names> <estimated_cell_num>
<HTO_names> is a list of strings separated by ',' without whitespace.
For example, there are four HTO tags in the example cell hashing dataset supplied in this repository: HTO_1, HTO_2, HTO_3 and HTO_4. The <HTO_names> argument is therefore HTO_1,HTO_2,HTO_3,HTO_4.
By default, MSM-free droplets are stored in the GMM_Demux_mtx folder under the current directory.
A different output folder can be specified with the -o flag.
Example Usage
An example cell hashing dataset is provided in example_input.
GMM-demux example_input/outs/filtered_feature_bc_matrix HTO_1,HTO_2,HTO_3,HTO_4 35685
<HTO_names> are obtained from the features.tsv file. The features.tsv file of the example cell hashing dataset is shown below.
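Instead of typing <HTO_names> by hand, the list can be assembled from features.tsv with standard tools. This is a sketch under stated assumptions: the file is uncompressed and tab-separated, the HTO name sits in the first column, and each line is one tag (the example file's exact layout is not reproduced here, so check it first). If the file also lists non-HTO features, filter those lines out before joining. hto_names_from is a hypothetical helper name.

```shell
# Hypothetical helper: print column 1 of a features.tsv file as a single
# comma-separated line with no whitespace, the format GMM-demux expects
# for <HTO_names>.
# Assumptions: tab-separated, HTO name in column 1, one tag per line.
hto_names_from() {
  # cut keeps column 1; tr drops stray spaces and Windows carriage returns;
  # grep skips empty lines; paste -sd, - joins the rest with commas.
  cut -f1 "$1" | tr -d ' \r' | grep -v '^$' | paste -sd, -
}

# Usage with the example dataset:
#   mtx=example_input/outs/filtered_feature_bc_matrix
#   GMM-demux "$mtx" "$(hto_names_from "$mtx/features.tsv")" 35685
```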
Optional Arguments
- -h: show help information.
- -f FULL, --full FULL: Generate the full classification report. Requires a path argument.
- -s SIMPLIFIED, --simplified SIMPLIFIED: Generate the simplified classification report. Requires a path argument.
- -o OUTPUT, --output OUTPUT: Specify the folder to store the result. Requires a path argument.
- -r REPORT, --report REPORT: Specify the file to store the summary report. Requires a file argument.
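The optional flags can be combined with the positional arguments in a single call. A minimal sketch reusing the example dataset; the output folder, full-report path, and summary file name are hypothetical placeholders, and the snippet only runs GMM-demux if the command is found on PATH.

```shell
# Positional arguments from the example usage.
mtx_dir=example_input/outs/filtered_feature_bc_matrix
hto_names=HTO_1,HTO_2,HTO_3,HTO_4
cell_num=35685

# Hypothetical output locations; any writable paths should do.
out_dir=GMM_Demux_mtx_custom
full_report=full_report
summary_file=summary_report.txt

# Fail with a clear message if the entry point is not installed.
if ! command -v GMM-demux >/dev/null 2>&1; then
  echo "GMM-demux not found on PATH; see the installation steps above" >&2
else
  GMM-demux "$mtx_dir" "$hto_names" "$cell_num" \
    -o "$out_dir" -f "$full_report" -r "$summary_file"
fi
```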
Output Values
- MSM-free droplets in CellRanger MTX format, compatible with CellRanger 3.0.
- Dataset summary. An example summary is shown below.
Output Explanation
- MSM denotes the percentage of identified and removed multiplets among all droplets.
- SSM denotes the percentage of unidentifiable multiplets among all droplets.
- RSSM denotes the percentage of multiplets among the output droplets (after removing identifiable multiplets). RSSM therefore measures the quality of the cell hashing dataset: the lower the RSSM, the fewer multiplets remain in the output.
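Taking the three definitions above at face value, with MSM and SSM expressed as fractions of all droplets and the output droplets being all droplets minus the removed MSMs, RSSM follows from the other two values. The numbers in the worked example are made up for illustration and are not taken from a real summary.

```latex
\mathrm{RSSM} = \frac{\mathrm{SSM}}{1 - \mathrm{MSM}}
\qquad\text{e.g.}\qquad
\mathrm{MSM} = 0.20,\ \mathrm{SSM} = 0.05
\;\Rightarrow\;
\mathrm{RSSM} = \frac{0.05}{1 - 0.20} = 0.0625 = 6.25\%
```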
Online Cell Hashing Experiment Planner
A GMM-Demux-based online cell hashing experiment planner is publicly accessible here.
Hashes for GMM_Demux-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 2d0a4ae646cd8f8bec9a7e6889cbd1f6f6fed7ff5351614c50a0eab95bb17729
MD5 | cc7fc5894b030997c5f49f77b74f490f
BLAKE2b-256 | a7b1203852b80d7630fcfe480a6c3db2e0e445bae814573c9cf1a56428784b6b