Pre-print version of the GMMchi-based single cell postprocessing pipeline
GMMchi_scs_pipeline
GMMchi is a Python package for postprocessing single-cell RNA-seq data, specifically for mixtures of similar cell types. GMMchi_scs_pipeline leverages GMMchi to remove clustering and visualization bias caused by highly variable factors such as library size and library complexity, as well as by highly expressed genes, including housekeeping, mitochondrial and ribosomal genes.
Mixtures of similar cell types can be extremely challenging to separate during clustering and visualization because the expression profiles of the cells are highly similar. In addition, failure to remove confounding factors may lead to serious misinterpretation of the data. GMMchi_scs_pipeline is a first step toward better separation of mixtures of similar cell types. The package contains the necessary tools for postprocessing, dimensionality reduction and visualization. The postprocessing function returns its output in a simple format, so it can also be passed to other downstream visualization or clustering techniques.
This package builds on GMMchi, another pipeline we built, which can be downloaded here.
Download Package
Download the GMMchi_scs_pipeline package by:
pip install git+https://github.com/jeffliu6068/GMMchi_scs_pipeline.git
or
pip install GMMchi_scs_pipeline
Import
Once installed, import the package by:
import GMMchi_scs_pipeline
Intuition: How GMMchi_scs_pipeline Works in Postprocessing scRNA-seq Data
Here, we outline each step included in the postprocessing pipeline:
- Remove empty barcodes
- Remove doublets
- Remove non-expressing genes
- Remove barcodes (cells) with low library complexity
- Remove poor quality cells via the GMMchi-based filter using housekeeping genes
- Remove barcodes (cells) with high levels of mitochondrial RNA expression
- Normalize data
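The steps above can be sketched with plain pandas. This is a minimal illustration under stated assumptions, not the package's implementation: the function name `basic_qc`, every threshold, the `MT-` prefix used to spot mitochondrial genes, and the median-library-size normalization are all hypothetical choices. Doublet removal and the GMMchi-based housekeeping filter are left out because they need dedicated models.

```python
import numpy as np
import pandas as pd


def basic_qc(counts, min_counts=1, min_genes=2, max_mito_frac=0.2, mito_prefix="MT-"):
    """Illustrative QC on a barcodes (rows) x genes (columns) count table.

    All thresholds are hypothetical defaults chosen for the toy data below.
    """
    # Remove empty barcodes: cells with too few total counts
    counts = counts.loc[counts.sum(axis=1) >= min_counts]
    # Remove non-expressing genes: columns that are zero in every remaining cell
    counts = counts.loc[:, counts.sum(axis=0) > 0]
    # Remove low-complexity cells: too few distinct genes detected
    counts = counts.loc[(counts > 0).sum(axis=1) >= min_genes]
    # Remove cells whose counts are dominated by mitochondrial genes
    is_mito = counts.columns.str.startswith(mito_prefix)
    mito_frac = counts.loc[:, is_mito].sum(axis=1) / counts.sum(axis=1)
    counts = counts.loc[mito_frac <= max_mito_frac]
    # Normalize: scale each cell to the median library size, then log1p
    lib_size = counts.sum(axis=1)
    normalized = counts.div(lib_size, axis=0) * lib_size.median()
    return np.log1p(normalized)


# Toy data: cell_b is empty, cell_e has low complexity, cell_d is mostly mitochondrial
toy = pd.DataFrame(
    {
        "ACTB": [10, 0, 8, 1, 5],
        "GAPDH": [5, 0, 7, 0, 0],
        "MT-CO1": [1, 0, 2, 9, 0],
        "ALPI": [4, 0, 0, 0, 0],
        "XIST": [0, 0, 0, 0, 0],
    },
    index=["cell_a", "cell_b", "cell_c", "cell_d", "cell_e"],
)
filtered = basic_qc(toy)
print(filtered.index.tolist())
```

The order of the filters matters in this sketch: genes are dropped after empty barcodes are removed, and the mitochondrial fraction is computed only on cells that passed the complexity filter.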
Available Tools in the GMMchi_scs_pipeline Package
Postprocess Input scRNA-seq Data
GMMchi_scs.GMMchi_scs_pipeline is the one-step postprocessing pipeline. It takes in a dataframe with barcodes (rows) x genes (columns) and returns a postprocessed dataframe in the same format.
# run your raw scRNA-seq data through the GMMchi single cell postprocessing pipeline
import GMMchi_scs_pipeline as GMMchi_scs  # load the library
postprocessed_df = GMMchi_scs.GMMchi_scs_pipeline(cell_lines_scRNA)
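Count tables are often exported as genes (rows) x barcodes (columns), the opposite of the orientation expected here. The sketch below shows one way to bring such a table into barcodes x genes form before passing it to the pipeline. The gene names, barcodes and counts are made-up toy values.

```python
import pandas as pd

# Toy count matrix stored as genes (rows) x barcodes (columns)
genes_by_barcodes = pd.DataFrame(
    [[3, 0, 1], [0, 2, 5]],
    index=["ALPI", "ACTB"],
    columns=["AAACCTG-1", "AAACGGG-1", "AAAGATG-1"],
)

# Transpose so each row is a barcode and each column is a gene
cell_lines_scRNA = genes_by_barcodes.T
cell_lines_scRNA.index.name = "barcode"

print(cell_lines_scRNA.shape)  # (3, 2): 3 barcodes, 2 genes
```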
Map/Visualize Postprocessed Data Using UMAP
GMMchi_scs.UMAP_graph takes in a dataframe with barcodes (rows) x genes (columns) and returns a dataframe with barcodes (rows) x UMAP features (columns) ready for visualization. For more information on UMAP, please see here.
# map the postprocessed data with UMAP (a dimensionality reduction technique)
UMAP_df = GMMchi_scs.UMAP_graph(postprocessed_df)
Label UMAP for Downstream Visualization
GMMchi_scs.Label_graph is a built-in function that takes in a dataframe with barcodes (row) x features (columns). We've simplified the method so that users can quickly visualize their genes of interest easily.
# use this if you just want cells to be colored when they express the gene above threshold
GMMchi_scs.Label_graph(postprocessed_df, UMAP_df, label_list=['ALPI'])
# use this if you want cells to be colored according to the expression level of the gene
GMMchi_scs.Label_graph(postprocessed_df, UMAP_df, label_list=['ALPI'], boolean_visualization=False)
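To make the difference between the two coloring modes concrete, the sketch below computes the per-cell values that a plot could be colored by. It is an assumption-laden illustration, not how Label_graph works internally: the helper `color_values`, its `threshold` parameter and the toy numbers are all hypothetical.

```python
import pandas as pd

# Toy expression values and 2-D embedding sharing the same barcodes
expr = pd.DataFrame({"ALPI": [0.0, 2.5, 0.3, 4.1]}, index=["c1", "c2", "c3", "c4"])
embedding = pd.DataFrame(
    {"UMAP1": [0.1, 1.2, -0.4, 2.0], "UMAP2": [0.5, -0.3, 1.1, 0.9]},
    index=["c1", "c2", "c3", "c4"],
)


def color_values(expression, coords, gene, boolean=True, threshold=1.0):
    """Attach one color value per cell to the embedding (hypothetical helper)."""
    # Align expression to the embedding's barcodes before labeling
    values = expression.loc[coords.index, gene]
    labeled = coords.copy()
    # Boolean mode: on/off above a threshold; otherwise keep the expression level
    labeled[gene] = (values > threshold) if boolean else values
    return labeled


on_off = color_values(expr, embedding, "ALPI")
levels = color_values(expr, embedding, "ALPI", boolean=False)
print(on_off["ALPI"].tolist())  # [False, True, False, True]
```

The resulting column can be passed as the color argument of any scatter plot of the two embedding columns.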
Working Example
Please find a working example in the example folder.
Authors
- Ta-Chun (Jeff) Liu - jeffliu6068
- Sir Walter Fred Bodmer FRS FRSE - Supervision
License
This project is licensed under the MIT License. See the LICENSE.md file for details.
Acknowledgments
- Hat tip to anyone whose code was used
- Inspiration: Thank you to all who have contributed ideas and expertise to make this possible. Let's advance science together.