CStreet is a Python package (Python 3.6 or higher) that constructs cell state trajectories from time-series single-cell RNA-seq data using a k-nearest neighbors graph algorithm.

Project description

CStreet: a computed Cell States trajectory inference method for time-series single-cell RNA-seq data

| Overview | Installation | Quick Start | Parameter Details | Run CStreet in python interface | Citation |

Figure1

Overview

CStreet is a cell state trajectory inference method for time-series single-cell RNA-seq data. It is written in Python (3.6 or higher) and is available both as a command line tool and as a Python library, to meet the needs of different users.

CStreet takes advantage of time-series information to construct k-nearest neighbors connections within and between time points. It then calculates the connection probabilities of cell states and visualizes the trajectory, which may include multiple starting points and paths, using a force-directed layout method.
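To make the idea of "within" and "between" time point connections concrete, here is a minimal sketch using only the standard library. It is not CStreet's implementation: the toy coordinates, the function names, the use of Euclidean distance, and the choice to search from the later time point back to the earlier one are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(queries, references, k, skip_self=False):
    """For each query point, return the indices of its k nearest reference points.

    Set skip_self=True when queries and references are the same list, so that
    a cell is not reported as its own neighbor.
    """
    neighbors = []
    for qi, q in enumerate(queries):
        dists = [
            (euclidean(q, r), ri)
            for ri, r in enumerate(references)
            if not (skip_self and ri == qi)
        ]
        dists.sort()
        neighbors.append([ri for _, ri in dists[:k]])
    return neighbors


# Hypothetical 2-D embeddings of cells at two time points.
t1 = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
t2 = [(0.2, 0.1), (5.2, 5.1)]

# Connections inside time point 1.
within_t1 = knn(t1, t1, k=1, skip_self=True)

# Connections from each cell at time point 2 to cells at time point 1
# (the search direction is an assumption of this sketch).
between_t2_t1 = knn(t2, t1, k=2)
```

In this toy data, each cell at the second time point is linked to the two cells of the nearby group at the first time point, which is the kind of cross-time link a trajectory is built from.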

Installation

CStreet has been packaged and uploaded to PyPI. Before installing, make sure that pip3, the package installer for Python, is available; if it is not on your machine, install it first. CStreet and its dependencies can then be installed with a single command:

$ pip3 install cstreet 

Type the following command to check whether CStreet has been installed successfully.

$ CStreet -h

Quick Start

Input:

Commandline:

$ CStreet -i ExpressionMatrix_t1.txt ExpressionMatrix_t2.txt ExpressionMatrix_t3.txt -s CellStates_t1.txt CellStates_t2.txt CellStates_t3.txt -n ProjectName
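The input format is only described under Parameter Details: tab-delimited expression matrices (cells on rows by default) and tab-delimited cell state files that contain a column named "state" and the same cell IDs. The following sketch writes one tiny pair of such files. The gene names, cell IDs, state labels, and the empty header cell above the cell ID column are assumptions for illustration; check them against real input files before relying on this layout.

```python
import csv
import os
import tempfile


def write_tsv(path, header, rows):
    """Write a header and rows to a tab-delimited text file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)


out_dir = tempfile.mkdtemp()

# Expression matrix: one row per cell, one column per gene
# (the documented default, i.e. '-T False').
expression_path = os.path.join(out_dir, "ExpressionMatrix_t1.txt")
write_tsv(
    expression_path,
    ["", "GeneA", "GeneB", "GeneC"],  # empty first header cell is an assumption
    [["cell_1", 5, 0, 2], ["cell_2", 0, 7, 1]],
)

# Cell states: the same cell IDs plus the required "state" column.
states_path = os.path.join(out_dir, "CellStates_t1.txt")
write_tsv(
    states_path,
    ["", "state"],
    [["cell_1", "stateX"], ["cell_2", "stateY"]],
)

# Read the state file back to confirm the layout.
with open(states_path, newline="") as handle:
    state_rows = list(csv.reader(handle, delimiter="\t"))
```

One such pair of files would be prepared per time point and passed to '-i' and '-s' in the same order.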

Output:

The output directory has the following contents, shown in tree format:

PATH/ProjectName
├── cstreet_result.pdf
├── figures
│   ├── timepoint1_fa.pdf
│   ├── timepoint1_louvain_umap_cord.txt
│   ├── timepoint1_louvain_umap.pdf
│   ├── timepoint2_fa.pdf
│   ├── timepoint2_louvain_umap_cord.txt
│   ├── timepoint2_louvain_umap.pdf
│   ├── timepoint3_fa.pdf
│   ├── timepoint3_louvain_umap_cord.txt
│   └── timepoint3_louvain_umap.pdf
└── results
    ├── alltimepoint_link_cluster_graph.txt
    ├── timepoint1_cell_info.txt
    ├── timepoint1_gene_info.txt
    ├── timepoint1_inner_cluster_graph.txt
    ├── timepoint2_cell_info.txt
    ├── timepoint2_gene_info.txt
    ├── timepoint2_inner_cluster_graph.txt
    ├── timepoint3_cell_info.txt
    ├── timepoint3_gene_info.txt
    └── timepoint3_inner_cluster_graph.txt
  • The clustered cell state information, if not provided by the user. (timepoint*_cell_info.txt)
  • The connection probabilities of cell states. (alltimepoint_link_cluster_graph.txt, timepoint*_inner_cluster_graph.txt)
  • A visualization of the inferred cell state trajectory. (cstreet_result.pdf)
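After a run, it can be handy to check that the files from the tree above are present. The sketch below rebuilds the expected paths from the project directory and the number of time points. The helper names are hypothetical, and the file list simply mirrors the tree shown here; whether every file is written in every configuration (for example, when cell states are supplied by the user) is not stated in this document.

```python
import os

FIGURE_SUFFIXES = ("_fa.pdf", "_louvain_umap_cord.txt", "_louvain_umap.pdf")
RESULT_SUFFIXES = ("_cell_info.txt", "_gene_info.txt", "_inner_cluster_graph.txt")


def expected_outputs(project_dir, n_timepoints):
    """List the output paths shown in the directory tree for n_timepoints."""
    paths = [
        os.path.join(project_dir, "cstreet_result.pdf"),
        os.path.join(project_dir, "results", "alltimepoint_link_cluster_graph.txt"),
    ]
    for t in range(1, n_timepoints + 1):
        prefix = "timepoint{}".format(t)
        for suffix in FIGURE_SUFFIXES:
            paths.append(os.path.join(project_dir, "figures", prefix + suffix))
        for suffix in RESULT_SUFFIXES:
            paths.append(os.path.join(project_dir, "results", prefix + suffix))
    return paths


def missing_outputs(project_dir, n_timepoints):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_outputs(project_dir, n_timepoints)
            if not os.path.exists(p)]


paths = expected_outputs("ProjectName", 3)
```

Calling missing_outputs("PATH/ProjectName", 3) on a finished run should return an empty list if all files from the tree were written.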

tiny_data_result.pdf

Parameter Details

usage: CStreet [-h] <-i ExpMatrix1 ExpMatrix2 ExpMatrix3 ...> [-s CellStates1 CellStates2 CellStates3 ...] [-n ProjectName] [-o OutputDir] [options]

CStreet is a cell states trajectory inference method for time-series single-cell RNA-seq data.

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_EXPMATRIX [INPUT_EXPMATRIX ...], --Input_ExpMatrix INPUT_EXPMATRIX [INPUT_EXPMATRIX ...]
                        Expression matrixes, which will contain the time-
                        series expression level as reads counts or normalized
                        values in tab delimited format. For example: '-i
                        ExpressionMatrix_t1.txt ExpressionMatrix_t2.txt
                        ExpressionMatrix_t3.txt' means the input of 3
                        timepoints expression matrixes.
  --Input_CellonCol {True,False}, -T {True,False}
                        Whether the cells are arranged on rows or columns in
                        the expression matrixes. For example: '-T True' means
                        cells on columns and genes on rows. DEFAULT: False.
  --Output_Dir OUTPUT_DIR, -o OUTPUT_DIR
                        Project directory, which will be used to save all
                        output files. DEFAULT: "./"
  --Output_Name OUTPUT_NAME, -n OUTPUT_NAME
                        Project name, which will be used to generate output
                        file names. DEFAULT: "cstreet_project"
  --Input_CellStates [INPUT_CELLSTATES [INPUT_CELLSTATES ...]], -s [INPUT_CELLSTATES [INPUT_CELLSTATES ...]]
                        Cell states information, which must contain a columns
                        named "state" and the same cell ID with expression
                        matrixes in tab delimited format. Cell states
                        information can be inputted by the user or generated
                        by the internal clustering function of CStreet using
                        the Louvain algorithm. For example: 'CellStates_t1.txt
                        CellStates_t2.txt CellStates_t3.txt' means the cell
                        states information of 3 timepoints expression
                        matrixes.
  --CellClusterParam_PCAn CELLCLUSTERPARAM_PCAN
                        Number of principal components to use, which will be
                        enabled ONLY if cell states information is not
                        provided. It can be set to 1 - minimum dimension size
                        of expression matrixes. DEFAULT: 10
  --CellClusterParam_kNNn CELLCLUSTERPARAM_KNNN
                        Number of nearest neighbors to be searched, which will
                        be enabled ONLY if cell states information is not
                        provided. It should be in the range 2 to 100 in
                        general. DEFAULT: 15
  --CellClusterParam_Resolution CELLCLUSTERPARAM_RESOLUTION
                        Resolution of the Louvain algorithm, which will be
                        enabled ONLY if cell states information is not
                        provided. Higher resolution means finding more and
                        smaller clusters. DEFAULT: 1.0
  --Switch_DeadCellFilter {ON,OFF}
                        A switch of dead cell filter, which filter cell
                        outliers based on counts percent of Mitochondrial
                        gene. DEFAULT: "ON"
  --Threshold_MitoPercent THRESHOLD_MITOPERCENT
                        Maximum counts percent of Mitochondrial gene for a
                        cell to pass filtering, which will be enabled ONLY if
                        '--Switch_DeadCellFilter' is "ON". DEFAULT: 0.2
  --Switch_LowCellNumGeneFilter {ON,OFF}
                        A switch of low cell number gene filter, which keep
                        genes that are expressed in at least a number of
                        cells. DEFAULT: "ON"
  --Threshold_LowCellNum THRESHOLD_LOWCELLNUM
                        Minimum number of cells expressed required for a gene
                        to pass filtering, which will be enabled ONLY if '--
                        Switch_LowCellNumGeneFilter' is "ON". DEFAULT: 3
  --Switch_LowGeneCellsFilter {ON,OFF}
                        A switch of low gene number cell filter, which keep
                        cells with at least a number of genes expressed.
                        DEFAULT: "ON"
  --Threshold_LowGeneNum THRESHOLD_LOWGENENUM
                        Minimum number of genes expressed required for a cell
                        to pass filtering, which will be enabled ONLY if '--
                        Switch_LowGeneCellsFilter' is "ON". DEFAULT: 200
  --Switch_Normalize {ON,OFF}
                        A switch of total read count normalization for per
                        cell. DEFAULT: "NO"
  --Threshold_NormalizeBase THRESHOLD_NORMALIZEBASE
                        Normalize Base of normalization, which will be enabled
                        ONLY if '--Switch_Normalize' is "ON". If choosing
                        DEFAULT, it is CPM normalization. DEFAULT: 1e6
  --Switch_LogTransform {ON,OFF}
                        A switch of logarithmizing the expression matrix.
                        DEFAULT: "NO"
  --WithinTimePointParam_PCAn WITHINTIMEPOINTPARAM_PCAN
                        Number of principal components to use, which will be
                        used within a timepoint. It can be set to 1 - minimum
                        dimension size of expression matrixes. DEFAULT: 10
  --WithinTimePointParam_kNNn WITHINTIMEPOINTPARAM_KNNN
                        Number of nearest neighbors to be searched, which will
                        be used within a timepoint. It should be in the range
                        2 to 100 in general. DEFAULT: 15
  --BetweenTimePointParam_PCAn BETWEENTIMEPOINTPARAM_PCAN
                        Number of principal components to use, which will be
                        used between timepoints. It can be set to 1 - minimum
                        dimension size of expression matrixes. DEFAULT: 10
  --BetweenTimePointParam_kNNn BETWEENTIMEPOINTPARAM_KNNN
                        Number of nearest neighbors to be searched, which will
                        be used between timepoints. It should be in the range
                        2 to 100 in general. DEFAULT: 15
  --ProbParam_SamplingSize PROBPARAM_SAMPLINGSIZE
                        Number of repeated sampling trials to estimate the
                        connection probability. DEFAULT: 5
  --ProbParam_RandomSeed PROBPARAM_RANDOMSEED
                        Random seed of repeated sampling, which will make the
                        connection probability is reproducible. DEFAULT: 0
  --FigureParam_FigureSize FIGUREPARAM_FIGURESIZE
                        Figure size of the result figure. Format is (width,
                        height). DEFAULT:(6,7)
  --FigureParam_LabelBoxWidth FIGUREPARAM_LABELBOXWIDTH
                        Width of the label box in the result figure. For
                        example: '--FigureParam_LabelBoxWidth 10' means 10
                        characters will be showed in label box of result
                        figure. DEFAULT: 10
  --Threshold_MaxOutDegree THRESHOLD_MAXOUTDEGREE
                        Maximum number of outdegree for each cell state will
                        be displayed, which will ONLY be used for
                        visualization. DEFAULT: 10
  --Threshold_MinCellNumofStates THRESHOLD_MINCELLNUMOFSTATES
                        Minimum cell number of each cell state will be
                        displayed, which will ONLY be used for visualization.
                        DEFAULT: 50
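The filtering and normalization switches are easier to reason about with a small worked example. The sketch below illustrates what the thresholds mean on toy data; it is not CStreet's code. The "MT-" prefix used to recognize mitochondrial genes, the use of log(1 + x) for the log transform, the inclusive comparisons, and the order of the steps are all assumptions.

```python
import math


def mito_fraction(counts):
    """Fraction of a cell's counts that come from genes named 'MT-...' (assumed)."""
    total = sum(counts.values())
    mito = sum(c for g, c in counts.items() if g.upper().startswith("MT-"))
    return mito / total if total else 0.0


def filter_cells(cells, max_mito=0.2, min_genes=200):
    """Keep cells with a low mitochondrial fraction and enough expressed genes."""
    kept = {}
    for cell_id, counts in cells.items():
        n_genes = sum(1 for c in counts.values() if c > 0)
        if mito_fraction(counts) <= max_mito and n_genes >= min_genes:
            kept[cell_id] = counts
    return kept


def filter_genes(cells, min_cells=3):
    """Keep genes that are expressed in at least min_cells cells."""
    n_cells = {}
    for counts in cells.values():
        for gene, count in counts.items():
            if count > 0:
                n_cells[gene] = n_cells.get(gene, 0) + 1
    keep = {g for g, n in n_cells.items() if n >= min_cells}
    return {cid: {g: c for g, c in counts.items() if g in keep}
            for cid, counts in cells.items()}


def normalize_and_log(counts, base=1e6):
    """Scale a cell's counts to sum to `base` (CPM for 1e6), then apply log(1 + x)."""
    total = sum(counts.values())
    if not total:
        return dict(counts)
    return {g: math.log1p(c / total * base) for g, c in counts.items()}


# Toy data with thresholds lowered so the effect is visible.
cells = {
    "c1": {"GeneA": 8, "GeneB": 2, "MT-CO1": 0},
    "c2": {"GeneA": 1, "GeneB": 0, "MT-CO1": 9},   # 90% mitochondrial counts
    "c3": {"GeneA": 9, "GeneB": 0, "MT-CO1": 1},
}
kept = filter_cells(cells, max_mito=0.2, min_genes=1)
filtered = filter_genes(kept, min_cells=2)
normalized = normalize_and_log({"GeneA": 1, "GeneB": 3})
```

Here the cell dominated by mitochondrial counts is dropped, and only the gene seen in both remaining cells survives the gene filter.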

Run CStreet in python interface

CStreet can also be run directly or step by step in a Jupyter Notebook, as described in the tutorial.
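The library interface itself is covered by the tutorial and is not reproduced here. As a separate option, a Python script can drive the command line tool through the standard library's subprocess module. The sketch below only assembles the argument list from flags documented under Parameter Details; the helper name build_cstreet_command and its parameters are hypothetical, and the file names are the placeholders from Quick Start.

```python
import subprocess


def build_cstreet_command(expression_files, state_files=None, project_name=None,
                          output_dir=None, extra_options=None):
    """Assemble a CStreet command line as a list of arguments."""
    command = ["CStreet", "-i"] + list(expression_files)
    if state_files:
        command += ["-s"] + list(state_files)
    if project_name:
        command += ["-n", project_name]
    if output_dir:
        command += ["-o", output_dir]
    # Any further documented option, e.g. {"--WithinTimePointParam_kNNn": 15}.
    for flag, value in (extra_options or {}).items():
        command += [flag, str(value)]
    return command


command = build_cstreet_command(
    ["ExpressionMatrix_t1.txt", "ExpressionMatrix_t2.txt", "ExpressionMatrix_t3.txt"],
    state_files=["CellStates_t1.txt", "CellStates_t2.txt", "CellStates_t3.txt"],
    project_name="ProjectName",
    extra_options={"--WithinTimePointParam_kNNn": 15},
)

# Uncomment to run; this requires CStreet on PATH and the input files to exist.
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.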

Citation

Download files

Source Distribution

cstreet-0.0.14.tar.gz (23.3 kB)
