Prediction of absolute developmental potential
using single-cell expression data

CytoTRACE 2 is a computational method for predicting cellular potency categories and absolute developmental potential from single-cell RNA-sequencing data.

In CytoTRACE 2, potency categories classify cells by developmental potential. They range from totipotent and pluripotent cells with broad differentiation potential, through lineage-restricted multipotent, oligopotent and unipotent cells capable of producing progressively fewer downstream cell types, to differentiated cells, which span mature to terminally differentiated phenotypes.

The predicted potency scores additionally provide a continuous measure of developmental potential, ranging from 0 (differentiated) to 1 (totipotent).

Underlying this method is a novel, interpretable deep learning framework trained and validated across 34 human and mouse scRNA-seq datasets encompassing 24 tissue types, collectively spanning the developmental spectrum.

This framework learns multivariate gene expression programs for each potency category and calibrates outputs across the full range of cellular ontogeny, facilitating direct cross-dataset comparison of developmental potential in an absolute space.

Installation instructions

Install using pip/pip3:

pip install cytotrace2-py

Optional: For faster data reading, you can install datatable (e.g., pip install datatable or conda install -c conda-forge datatable) if your OS and environment support it. If datatable is not installed, the code falls back to pandas for reading input files.

Input files

By default, CytoTRACE 2 requires only a single-cell gene expression file as input. A cell annotation file is also accepted and is used for output plotting. All files should be provided as tab-delimited tables (.txt) with no double quotation marks. Further formatting details for each input file are specified below.

scRNA-seq gene expression file:

  • The table must be genes (rows) by cells (columns).
  • The first row must contain the single cell IDs and the first column must contain the gene names.
  • The gene expression data can be represented as raw or normalized counts, as long as normalization preserves relative ranking of input gene values within a cell.
  • No instances of duplicate gene or cell names should be present.
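
The formatting rules above can be checked before a run. The following is a minimal sketch using only the Python standard library; it is not part of CytoTRACE 2. The function name check_expression_table is hypothetical, and the sketch assumes the header row has a leading field (possibly empty) above the gene-name column.

```python
import csv
import io


def check_expression_table(handle):
    """Check a genes-by-cells, tab-delimited table and return (genes, cells).

    Hypothetical helper, not part of the cytotrace2-py package.
    """
    # QUOTE_NONE keeps any double quotes as literal characters so they can be detected.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]  # ignore blank lines
    if len(rows) < 2:
        raise ValueError("expected a header row plus at least one gene row")
    header, body = rows[0], rows[1:]
    if any('"' in field for row in rows for field in row):
        raise ValueError("double quotation marks are not allowed")
    cells = header[1:]  # assumption: the first header field labels the gene column
    genes = [row[0] for row in body]
    if len(set(cells)) != len(cells):
        raise ValueError("duplicate cell IDs in the first row")
    if len(set(genes)) != len(genes):
        raise ValueError("duplicate gene names in the first column")
    for row in body:
        if len(row) != len(header):
            raise ValueError(f"gene {row[0]!r}: expected {len(header)} fields, got {len(row)}")
        for value in row[1:]:
            float(value)  # raises ValueError for non-numeric expression values
    return genes, cells


# Tiny in-memory example; a real file would be opened with open(path, newline="").
example = "gene\tcell_1\tcell_2\nSox2\t5\t0\nGata1\t0\t12\n"
genes, cells = check_expression_table(io.StringIO(example))
print(genes, cells)
```

A check like this fails fast on duplicate names or stray quotation marks, which are cheaper to catch here than partway through a run.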

Cell annotation file:

  • The table should contain two columns, where column 1 contains the single cell IDs corresponding to the columns of the scRNA-seq matrix and column 2 contains the corresponding cell type annotations.
  • The columns must have a header.
  • Additional columns beyond the first two will be ignored.
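
As an illustration of the annotation format, the sketch below reads a headered, tab-delimited annotation table and reports which cells from the expression matrix lack an annotation. It uses only the standard library. The helper names read_annotations and unannotated_cells are hypothetical, and the example column names are placeholders.

```python
import csv
import io


def read_annotations(handle):
    """Return {cell_id: cell_type} from a headered, tab-delimited table.

    Hypothetical helper, not part of the cytotrace2-py package.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the required header row
    annotations = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        if len(row) < 2:
            raise ValueError(f"row {row!r} needs a cell ID and an annotation")
        annotations[row[0]] = row[1]  # columns beyond the first two are ignored
    return annotations


def unannotated_cells(expression_cell_ids, annotations):
    """List the expression-matrix cell IDs that have no annotation."""
    return [cell for cell in expression_cell_ids if cell not in annotations]


example = "cell_id\tphenotype\tbatch\ncell_1\tHSC\tA\ncell_2\tErythrocyte\tB\n"
annotations = read_annotations(io.StringIO(example))
missing = unannotated_cells(["cell_1", "cell_2", "cell_3"], annotations)
print(annotations, missing)
```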

Running CytoTRACE 2

After installing the package and its dependencies, the tool can be run either from the command line or from within a Python script.

NOTE: On the first run only, the script downloads the remotely hosted model files. They are saved and reused in all future runs.

For mouse data with default settings, CytoTRACE 2 is run from the command line as:

cytotrace2 \
   --input-path /path/to/cell_expression \
   --annotation-path /path/to/cell_annotations \
   --species mouse

Or with more condensed parameter names:

cytotrace2 \
   -f /path/to/cell_expression \
   -a /path/to/cell_annotations \
   -sp mouse
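
When the command-line tool is driven from another program, the same invocation can be assembled as an argument list. This is a minimal sketch that uses only the long flags shown above. The helper names build_command and run_cytotrace2 are hypothetical, and run_cytotrace2 assumes the cytotrace2 executable is on the PATH.

```python
import shlex
import subprocess


def build_command(input_path, annotation_path=None, species="mouse"):
    """Assemble the cytotrace2 command line from the flags documented above."""
    if species not in ("human", "mouse"):
        raise ValueError('species must be "human" or "mouse"')
    cmd = ["cytotrace2", "--input-path", str(input_path)]
    if annotation_path is not None:
        cmd += ["--annotation-path", str(annotation_path)]
    cmd += ["--species", species]
    return cmd


def run_cytotrace2(input_path, annotation_path=None, species="mouse"):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_command(input_path, annotation_path, species), check=True)


cmd = build_command("/path/to/cell_expression", "/path/to/cell_annotations", "mouse")
print(shlex.join(cmd))  # shell-quoted form, for logging
```

Passing a list to subprocess.run avoids shell quoting problems when paths contain spaces.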

CytoTRACE 2 can also be called from within a Python script after importing the package.

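# Import the main entry point explicitly rather than with a wildcard import
from cytotrace2_py.cytotrace2_py import cytotrace2

# Tab-delimited .txt inputs, formatted as described under "Input files"
input_path = "/path/to/cell_expression.txt"
example_annotation_path = "/path/to/cell_annotations.txt"  # optional, used for plotting
example_species = "human"  # "human" or "mouse" (default is "mouse")

# Run the pipeline with default settings for all other arguments
results = cytotrace2(input_path,
                     annotation_path=example_annotation_path,
                     species=example_species)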

NOTE: When running on computers with less than 16 GB of memory, we recommend reducing the max_cores parameter to 1 or 2 to avoid memory issues.
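
One way to apply this recommendation automatically is sketched below. The helpers total_memory_gb and suggest_max_cores are hypothetical and not part of the package. Memory detection relies on os.sysconf, which is unavailable on some platforms (for example Windows); in that case the sketch returns None and leaves max_cores at its default.

```python
import os


def total_memory_gb():
    """Best-effort physical memory in GB, or None if it cannot be determined."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        return None


def suggest_max_cores(memory_gb, low_memory_cores=2):
    """Return 1 or 2 below 16 GB of memory, otherwise None (the default)."""
    if low_memory_cores not in (1, 2):
        raise ValueError("low_memory_cores should be 1 or 2")
    if memory_gb is not None and memory_gb < 16:
        return low_memory_cores
    return None


max_cores = suggest_max_cores(total_memory_gb())
print(max_cores)
# With the package installed, the value can be passed straight through:
# results = cytotrace2(input_path, species="human", max_cores=max_cores)
```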

Extended usage details

Key notes:

  • By default, CytoTRACE 2 expects mouse data. To analyze human data, specify species="human".

Required input:

  • input_path: A filepath to a .txt file containing a single-cell RNA-seq gene expression matrix (rows as genes, columns as cells)

Optional arguments:

  • annotation_path: A filepath to a .txt file containing phenotype annotations corresponding to the cells of the scRNA-seq expression matrix
  • species: String indicating the species name for the gene names in the input data (options: "human" or "mouse", default is "mouse").
  • batch_size: Integer indicating the number of cells to process at once for the pipeline steps, including subsampling for KNN smoothing (default is 10000).
  • smooth_batch_size: Integer indicating the number of cells to subsample further within the batch_size for the smoothing by diffusion step of the pipeline (default is 1000).
  • disable_parallelization: Flag indicating whether to disable parallelization (default is False, or absent for the command line).
  • disable_plotting: Flag indicating whether to disable plotting (default is False, or absent for the command line). To plot results, data are reprocessed following the core CytoTRACE 2 pipeline to produce UMAP embeddings via a standard scanpy pipeline. As this step can be time-consuming, we provide the option to disable it if desired.
  • max_cores: Integer restricting the maximum number of CPU cores used for parallelization (default is None, in which case the number of cores is determined from system capacity). When running on computers with less than 16 GB of memory, we recommend reducing it to 1 or 2 to avoid memory issues.
  • seed: Integer specifying the seed for reproducibility in random processes (default is 14).
  • output_dir: Path to the directory to which to save CytoTRACE 2 outputs (default is cytotrace2_results in the current working directory).
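
To keep the optional arguments and their documented defaults in one place, they can be collected in a small container and expanded into keyword arguments. This is a sketch only; the class CytoTrace2Options is hypothetical and not part of the package. The check that smooth_batch_size does not exceed batch_size is an assumption drawn from the description above, not a documented rule.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class CytoTrace2Options:
    """Optional arguments with the defaults listed above (hypothetical container)."""
    annotation_path: Optional[str] = None
    species: str = "mouse"
    batch_size: int = 10000
    smooth_batch_size: int = 1000
    disable_parallelization: bool = False
    disable_plotting: bool = False
    max_cores: Optional[int] = None
    seed: int = 14
    output_dir: str = "cytotrace2_results"

    def to_kwargs(self):
        """Validate the options and return them as keyword arguments."""
        if self.species not in ("human", "mouse"):
            raise ValueError('species must be "human" or "mouse"')
        # Assumption: smoothing subsamples within a batch, so it cannot be larger.
        if self.smooth_batch_size > self.batch_size:
            raise ValueError("smooth_batch_size should not exceed batch_size")
        kwargs = asdict(self)
        if kwargs["annotation_path"] is None:
            del kwargs["annotation_path"]  # omit rather than pass None
        return kwargs


options = CytoTrace2Options(species="human", disable_plotting=True, max_cores=2)
kwargs = options.to_kwargs()
print(kwargs)
# With the package installed:
# results = cytotrace2("/path/to/cell_expression.txt", **kwargs)
```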

Information about these arguments is also available in the function's manual, which can be accessed by running cytotrace2 -h from the command line.

For more information about the tool, R/Python package documentation, vignettes, example input files, and more, visit our GitHub page: https://github.com/digitalcytometry/cytotrace2.

CytoTRACE 2 web application

An interactive RShiny web application can be accessed at cytotrace2.stanford.edu, allowing users to run analyses on their own data, browse results for 33 ground-truth–annotated datasets, explore potency-associated genes and gene-set enrichment across the single-cell potency atlas, download the atlas, and access Python vignettes for model training and custom GSBN architectures.

Authors

CytoTRACE 2 was developed in the Newman Lab by Minji Kang, Gunsagar Gulati, Erin Brown, Susanna Avagyan, Jose Juan Almagro Armenteros and Rachel Gleyzer.

Contact

If you have any questions, please contact the CytoTRACE 2 team at cytotrace2team@gmail.com.

License

Please see the LICENSE file.

Citation

If you use CytoTRACE 2, please cite:

Improved reconstruction of single-cell developmental potential with CytoTRACE 2. Nature Methods, 2025.
Minji Kang*, Gunsagar S. Gulati*, Erin L. Brown*, Zhen Qi*, Susanna Avagyan, Jose Juan Almagro Armenteros, Rachel Gleyzer, Wubing Zhang, Chloé B. Steen, Jeremy Philip D’Silva, Janella Schwab, Michael F. Clarke, Aadel A. Chaudhuri, and Aaron M. Newman. doi.org/10.1038/s41592-025-02857-2
