Skip to main content

Classify scRNA-seq profiling with highly resolved cell cycle phases.

Project description

ccAFv2: Cell cycle classifier for Python and scanpy

standard-readme compliant

This repository is for the Python package for the cell cycle classifier ccAFv2. The input for the ccAFv2 classifier is single cell, nuclei, or spatial RNA-seq data. The features of this classifier are that it classifies six cell cycle states (G1, Late G1, S, S/G2, G2/M, and M/Early G1) and a quiescent-like G0 state, and it incorporates a tunable parameter to filter out less certain classifications. This package is implemented in Python so that it can be used in scanpy analysis workflows. We provide examples of how to install, run ccAFv2 on scanpy objects (sc/snRNA-seq), and plot and use results.

Table of Contents

Install

Requirements

It is strongly suggested that users utilize the docker images we provide on DockerHub as they contain all dependencies needed to run ccAFv2.

Dependencies

There are four dependencies that must be met for ccAF to classify cell cycle states:

  1. numpy - (install)
  2. scipy - (install)
  3. scanpy - (install)
  4. tensorflow - (install)
  5. keras - (install)
Python dependency installation commands

NOTE! pip may need to be replaced with pip3 depending upon your setup.

pip install numpy scipy scanpy tensorflow keras

Installation of ccAF classifier

The ccAFv2 classifier can be installed with the following command:

pip install ccAFv2

Docker image

We facilitate the use of ccAFv2 by providing a Docker Hub container cplaisier/ccafv2 which has all the dependencies and libraries required to run the ccAFv2 classifier. To see how the Docker container is configured plaese refer to the Dockerfile. Please install Docker and then from the command line run:

docker pull cplaisier/ccafv2_py

Then run the Docker container using the following command (replace with the directory where you have the scRNA-seq data to be classified):

docker run -it -v '<path to scRNA-seq profiles directory>:/files' cplaisier/ccafv2_py

This will start the Docker container in interactive mode and will leave you at a command prompt. You will then want to change directory to where you have your scRNA-seq or trasncriptome profiling data.

Classifying single cell or nuclei RNA-seq

Input for classification

It is expected that the input for the ccAFv2 classifier will be a scanpy AnnData object that has been thorougly quality controlled. Is is preferred that the data in the object be SCTransformed; however, the standard approach for normalization only applies to the highly variable genes. This can exclude genes needed for the accurate classification of the cell cycle. During the running of the ccAFv2 classifier it will tell you how many genes overlap with the classifier marker genes.

Test data

The human neural stem cells (hNSCs) from a human fetus 8 weeks post-conception (PCW8) (Zeng et al., 2023) is available for use as a testing dataset:

Download this file and place it into the directory in which you wish to run the ccAFv2 tutorial below. This data has been QC'd and normalized using SCTransform in Seurat following our best practices here.

Cell cycle classification

Classification is as easy as two lines that can be added to any Seurat workflow. First the library must be loaded and then the PredictCellCycle function is run:

# Load packages
import pandas as pd
import scanpy as sc
import ccAFv2

# Load up test dataset
PCW8 = sc.read_h5ad('W8-1_normalized_ensembl.h5ad')

# Run ccAFv2 to predict cell labels
PCW8_labels = ccAFv2.predict_labels(PCW8, species='human', gene_id='ensembl')

When the classifier is running it should look something like this:

Running ccAFv2:
    Preparing data for classification...
    Marker genes present in this dataset: 845
    Missing marker genes in this dataset: 16
  Predicting cell cycle state probabilities...
  Choosing cell cycle state...
Done.

It is important to look at how many marker genes were present in the dataset. We found that when less than 689 marker genes (or 80%) were found in the dataset that this led significantly less accurate predictions.

There are several options that can be passed to the PredictCellCycle function:

ccAFv2.predict_labels(scanpy_obj,
                      threshold=0.5,
                      species='human',
                      gene_id='ensembl')
  • scanpy_obj: a scanpy object must be supplied to classify, no default
  • threshold: the value used to threshold the likelihoods, default is 0.5
  • species: from which species did the samples originate, either 'human' or 'mouse', defaults to 'human'
  • gene_id: what type of gene ID is used, either 'ensembl' or 'symbol', defaults to 'ensembl'

Cell cycle classification results

The results of the cell cycle classification are stored in the first element of the 'ccAFv2.predict_labels' output, and the likelihoods are stored in the second element.

PCW8_labels

Which returns the following:

(array(['G1', 'S', 'qG0', ..., 'Late G1', 'qG0', 'S'],
      dtype='<U10'), array([[9.96540964e-01, 2.77950567e-05, 1.62392517e-03, ...,
        1.12998277e-04, 1.34928769e-03, 3.37415549e-04],
       [2.38446728e-03, 9.58865421e-05, 4.40378720e-03, ...,
        1.25416279e-01, 7.62154996e-01, 1.05463535e-01],
       [3.09040988e-05, 1.47282879e-06, 3.99237297e-06, ...,
        9.99962687e-01, 1.65011215e-07, 8.85220061e-07],
       ...,
       [2.72926106e-03, 3.59202386e-03, 9.85045612e-01, ...,
        8.02930258e-03, 1.00778125e-04, 4.46247839e-04],
       [1.43443659e-01, 1.61317177e-03, 2.99169007e-03, ...,
        8.51489604e-01, 1.09878434e-04, 3.27764486e-04],
       [7.05660739e-07, 1.27422639e-09, 1.13804369e-07, ...,
        1.86515393e-07, 9.99996066e-01, 2.80967629e-06]], dtype=float32))

In the code Below we demonstrate how the classifications can be added to the metadata. After adding the column to the .obs metadata, the classification for each cell would then found in the column 'ccAFv2', and is a categorical variable which helps with plotting.

# Save into scanpy object
PCW8.obs['ccAFv2'] = pd.Categorical(PCW8_labels[0], categories=['qG0', 'G1', 'Late G1', 'S', 'S/G2', 'G2/M', 'M/Early G1', 'Unknown'], ordered=True)

Plotting cell cycle states

We provide plotting functions that colorize the cell cycle states in the way used in our manuscripts. We strongly suggest using these functions when plotting if possible.

Plotting a UMAP with cell cycle states

Plotting cells using ther first two dimensions from a dimensionality reduction method (e.g., PCA, tSNE, or UMAP) is a common way to represent single cell or nuclei RNA-seq data. Below we provide code to plot the cells colorized based on their called cell cycle state.

# Run UMAP of U5 hNSCs
sc.pp.highly_variable_genes(PCW8, n_top_genes=2000)
sc.tl.pca(PCW8)
sc.pp.neighbors(PCW8)
sc.tl.umap(PCW8)

# Prepare a color mapping dictionary
cmap1 = {"qG0": "#d9a428", "G1": "#f37f73", "Late G1": "#1fb1a9",  "S": "#8571b2", "S/G2": "#db7092", "G2/M": "#3db270" ,"M/Early G1": "#6d90ca",  "Unknown": "#d3d3d3"}

# Plot UMAP of U5 hNSCs
sc.pl.umap(PCW8, color=['ccAFv2'], palette=cmap1, save='ccAFv2_UMAP_PCW8.pdf')

In the figures folder you will find the PDF 'umapccAFv2_UMAP_PCW8.pdf'. Below is the UMAP for the hNSCs from a human fetus 8 weeks post-conception colorized using the cell cycle states. The expected flow of the cell cycle states can be seen in the UMAP.

UMAP DimPlot colorized with ccAFv2 cell cycle states

Maintainers

For issues or comments please contact: Chris Plaisier

And for other great packages from the Plaisier Lab please check here: @plaisier-lab.

Contributing

Feel free to dive in! Open an issue or submit PRs.

Citation

  1. Citation for ccAFv2 (version 2):

Classifying cell cycle states and a quiescent-like G0 state using single-cell transcriptomics Samantha A. O’Connor, Leonor Garcia, Anoop P. Patel, Benjamin B. Bartelle, Jean-Philippe Hugnot, Patrick J. Paddison, Christopher L. Plaisier. bioRxiv [Preprint]. 2024 Apr 20:2024.04.16.589816. doi: 10.1101/2024.04.16.589816. PMID: 38659838

  1. Citation for ccAF (version 1):

Neural G0: a quiescent-like state found in neuroepithelial-derived cells and glioma. Samantha A. O'Connor, Heather M. Feldman, Chad M. Toledo, Sonali Arora, Pia Hoellerbauer, Philip Corrin, Lucas Carter, Megan Kufeld, Hamid Bolouri, Ryan Basom, Jeffrey Delrow, Jose L. McFaline-Figueroa, Cole Trapnell, Steven M. Pollard, Anoop Patel, Patrick J. Paddison, Christopher L. Plaisier. Mol Syst Biol. 2021 Jun;17(6):e9522. doi: 10.15252/msb.20209522. PMID: 34101353

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ccAFv2-2.0.6.tar.gz (5.2 MB view details)

Uploaded Source

File details

Details for the file ccAFv2-2.0.6.tar.gz.

File metadata

  • Download URL: ccAFv2-2.0.6.tar.gz
  • Upload date:
  • Size: 5.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.12

File hashes

Hashes for ccAFv2-2.0.6.tar.gz
Algorithm Hash digest
SHA256 bb0e82f153188a03ce5c3d524a6c6e69aa2eb0ee72bf63dc17e409e259d806cc
MD5 bff6a7a82da5ede4e2c1b9c4c0056dec
BLAKE2b-256 e9405f9e26932f6510a366e62186b31d08c0391341f2dd8ddf25867357be65fe

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page