
Pre-print version of the GMMchi-based single cell postprocessing pipeline

Project description

GMMchi_scs_pipeline

GMMchi is a Python package for postprocessing single-cell RNA-seq data, specifically for mixtures of similar cell types. GMMchi_scs_pipeline leverages GMMchi to remove clustering and visualization bias caused by highly variable factors such as library size and library complexity, as well as by highly expressed genes, including housekeeping, mitochondrial, and ribosomal genes.

Mixtures of similar cell types can be extremely challenging to separate during clustering and visualization because the expression profiles of the cells are highly similar. In addition, failure to remove confounding factors may lead to serious misinterpretation of the data. GMMchi_scs_pipeline is a first step toward achieving better separation in mixtures of similar cell types. The package contains the necessary tools for postprocessing, dimensionality reduction, and visualization. The postprocessing function also returns its output in a simple format, so users can apply other downstream visualization or clustering techniques.

This package builds on GMMchi, another pipeline we built, which can be downloaded here.

Download Package

Download the GMMchi_scs_pipeline package by:

pip install git+https://github.com/jeffliu6068/GMMchi_scs_pipeline.git

or

pip install GMMchi_scs_pipeline

Import

Once installed, import the package by:

import GMMchi_scs_pipeline

Intuition: How GMMchi_scs_pipeline Works in Postprocessing scRNA-seq Data

Here, we outline each step of the postprocessing pipeline:

  1. Remove empty barcodes
  2. Remove doublets
  3. Remove non-expressing genes
  4. Remove barcodes (cells) with low library complexity
  5. Remove poor quality cells via the GMMchi-based filter using housekeeping genes
  6. Remove barcodes (cells) with high levels of mitochondrial RNA expression
  7. Normalize data

Available Tools in the GMMchi_scs_pipeline Package

Postprocess Input scRNA-seq Data

GMMchi_scs.GMMchi_scs_pipeline is the one-step postprocessing pipeline. It takes in a dataframe of barcodes (rows) x genes (columns) and returns a postprocessed dataframe in the same format.

# run your raw scRNA-seq data through the GMMchi single-cell postprocessing pipeline
import GMMchi_scs_pipeline as GMMchi_scs  # load the library

# cell_lines_scRNA is your raw dataframe of barcodes (rows) x genes (columns)
postprocessed_df = GMMchi_scs.GMMchi_scs_pipeline(cell_lines_scRNA)
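Count matrices are often stored as genes (rows) x barcodes (columns), which is the transpose of the layout described above. A minimal sketch of reorienting such a table with pandas is shown here; the toy barcodes and values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy matrix stored as genes (rows) x barcodes (columns).
genes_by_barcodes = pd.DataFrame(
    {"AAAC-1": [3, 0, 7], "AAAG-1": [0, 2, 1]},
    index=["ALPI", "ACTB", "MT-CO1"],
)

# Transpose so that each row is a barcode and each column is a gene.
cell_lines_scRNA = genes_by_barcodes.T
cell_lines_scRNA.index.name = "barcode"

# Sanity checks before handing the table to a postprocessing function.
assert cell_lines_scRNA.index.is_unique
assert cell_lines_scRNA.columns.is_unique
```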

Map/Visualize Postprocessed Data Using UMAP

GMMchi_scs.UMAP_graph takes in a dataframe of barcodes (rows) x genes (columns) and returns a dataframe of barcodes (rows) x UMAP features (columns) ready for visualization. For more information on UMAP, please see here.

#map the postprocessed data with UMAP (dimensionality reduction technique)
UMAP_df = GMMchi_scs.UMAP_graph(postprocessed_df)
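To make the shape of such an output concrete, the sketch below builds a barcodes x 2-D coordinates dataframe. It deliberately uses a plain PCA (via NumPy's SVD) as a stand-in for UMAP so that it runs without extra dependencies; it does not reproduce what UMAP_graph does internally, and the function name `embed_2d` and the column names `dim_1`/`dim_2` are assumptions.

```python
import numpy as np
import pandas as pd


def embed_2d(df):
    """Project a barcodes x genes table onto 2 components (PCA stand-in for UMAP)."""
    values = df.to_numpy(dtype=float)
    centered = values - values.mean(axis=0)
    # SVD of the centered matrix: left singular vectors scaled by the singular
    # values give the principal-component coordinates of each barcode.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    coords = u[:, :2] * s[:2]
    # Keep the barcode index so the embedding can be joined back to expression data.
    return pd.DataFrame(coords, index=df.index, columns=["dim_1", "dim_2"])


rng = np.random.default_rng(0)
toy = pd.DataFrame(
    rng.poisson(2.0, size=(6, 4)),
    index=[f"cell_{i}" for i in range(6)],
    columns=["g1", "g2", "g3", "g4"],
)
embedding = embed_2d(toy)
```

Preserving the barcode index in the result is what lets a later labeling step look up each cell's expression values.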

Label UMAP for Downstream Visualization

GMMchi_scs.Label_graph is a built-in function that takes in a dataframe of barcodes (rows) x features (columns). We've simplified the method so that users can quickly visualize their genes of interest.

# use this if you just want each cell colored by whether it expresses the gene above a threshold
GMMchi_scs.Label_graph(postprocessed_df, UMAP_df, label_list=['ALPI'])

# use this if you want each cell colored according to the expression level of the gene
GMMchi_scs.Label_graph(postprocessed_df, UMAP_df, label_list=['ALPI'], boolean_visualization=False)
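The difference between the two coloring modes can be illustrated with a small helper. This is only a sketch of the idea: the function `color_values` and its `threshold` argument are hypothetical, and the package may choose its cutoff differently.

```python
import pandas as pd


def color_values(expr, gene, boolean_visualization=True, threshold=0.0):
    """Return one color value per barcode for a single gene.

    `threshold` is a hypothetical cutoff; the package may choose it differently.
    """
    values = expr[gene]
    if boolean_visualization:
        # On/off coloring: True where expression exceeds the cutoff.
        return values > threshold
    # Continuous coloring: the expression level itself.
    return values.astype(float)


expr = pd.DataFrame({"ALPI": [0.0, 1.5, 3.2]}, index=["c1", "c2", "c3"])
on_off = color_values(expr, "ALPI")
graded = color_values(expr, "ALPI", boolean_visualization=False)
```

Either result is a per-barcode series that a scatter plot of the embedding could use as its color argument.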

Working Example

A working example is available in the example folder.

Authors

  • Ta-Chun (Jeff) Liu - jeffliu6068
  • Sir Walter Fred Bodmer FRS FRSE - Supervision

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Acknowledgments

  • Hat tip to anyone whose code was used
  • Inspiration: Thank you to all who have contributed ideas and expertise to make this possible. Let's advance science together.

Project details



This version

0.1

Download files


Source Distribution

GMMchi_scs_pipeline-0.1.tar.gz (6.4 kB)

Uploaded Source

Built Distribution

GMMchi_scs_pipeline-0.1-py3-none-any.whl (7.6 kB)

Uploaded Python 3

File details

Details for the file GMMchi_scs_pipeline-0.1.tar.gz.

File metadata

  • Download URL: GMMchi_scs_pipeline-0.1.tar.gz
  • Upload date:
  • Size: 6.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.5.0.1 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.7

File hashes

Hashes for GMMchi_scs_pipeline-0.1.tar.gz
Algorithm Hash digest
SHA256 a9419487db03138aa24a9005279b6992e7b7a22258f821902d225ec865d6b47a
MD5 649944030ab8781b4425012334eea9d8
BLAKE2b-256 4c467183c912d880c4106aee09b146af55e05ee5dcd8afb882238759665076b8


File details

Details for the file GMMchi_scs_pipeline-0.1-py3-none-any.whl.

File metadata

  • Download URL: GMMchi_scs_pipeline-0.1-py3-none-any.whl
  • Upload date:
  • Size: 7.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.5.0.1 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.7

File hashes

Hashes for GMMchi_scs_pipeline-0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1b39dbeed0ffffaab88c0781e43f4b2020c2bcac492fe175f86fd31d6ed72d01
MD5 bf247a56a330ae9529a854b0594f8a2c
BLAKE2b-256 38dfa3e8efcfdbed6bbb4209e920b12339182531039cdf8a9cbf88d38a4a162e

