A comprehensive analysis tool for transferring phenotypes from bulk transcriptomic data to single-cell or spatial transcriptomic data.
TiRank
TiRank is a comprehensive tool for integrating and analyzing RNA-seq and scRNA-seq data. This document provides detailed instructions on how to install TiRank in your environment.
Installation Instructions for TiRank
TiRank can be installed through multiple methods. We recommend creating a new conda environment specifically for TiRank for optimal compatibility and isolation from other Python packages.
Method 1: Online pip Installation
- Set up a new conda environment:
conda create -n TiRank python=3.9.7 -y
conda activate TiRank
- Navigate to the TiRank directory:
cd ./TiRank
- Install TiRank via pip:
pip install TiRank
- Additionally, install the timm==0.5.4 package from TransPath. This is a required dependency for TiRank. Follow these steps:
  - Download the modified timm==0.5.4 package from this link.
  - Install it using pip with the path to the downloaded package:
    pip install /YOUR/LOCATION/TO/PACKAGE/timm-0.5.4.tar # Replace with your actual path
- Download the pre-trained CTransPath model weights.
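After the steps above, a quick check that both packages landed in the active environment can save debugging time later. The following is a minimal sketch using only the standard library. It assumes the distributions are registered under the names `TiRank` and `timm`, and that the modified timm package still reports version 0.5.4; the helper names are hypothetical.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_tirank_environment():
    """Report the versions of the two packages named in the steps above."""
    report = {name: installed_version(name) for name in ("TiRank", "timm")}
    # The steps above pin timm to 0.5.4; flag anything else.
    # Assumption: the modified package keeps that version string.
    report["timm_ok"] = report["timm"] == "0.5.4"
    return report


if __name__ == "__main__":
    for key, value in check_tirank_environment().items():
        print(f"{key}: {value}")
```

A value of `None` means the package was not found in the environment that is currently active, which usually points to a skipped `conda activate TiRank`.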
Method 2: Local conda Installation
- Clone the TiRank repository:
git clone git@github.com:LenisLin/TiRank.git
- Modify the TiRank.yml environment file. Replace the "prefix" at the bottom of this file with your path to the conda environment files.
- Create the environment from the TiRank.yml file:
conda env create -f TiRank.yml
Method 3: Docker Installation (Highly Recommended)
(Instructions to be provided)
Please choose the installation method that best suits your setup. If you encounter any issues, feel free to open an issue on the TiRank GitHub page.
Usage
Gene Pair extractor
from TiRank.main import *
GenePairSelection(scst_exp_path, bulk_exp_path, bulk_cli_path, datatype, mode, savePath, lognormalize = True, top_var_genes=2000, top_gene_pairs=1000, p_value_threshold=0.05)
Input:
- scst_exp_path: If your datatype is SC, provide the path to a CSV file in which rows represent genes and columns represent cells. The first row should contain cell IDs and the first column should contain gene symbols. If your datatype is ST, provide the path to the 10x spaceranger output folder.
- bulk_exp_path: Path to a CSV file in which rows represent genes and columns represent samples. The first row should contain sample IDs and the first column should contain gene symbols.
- bulk_cli_path: Path to a CSV file with 2 columns if your phenotype label is continuous or binary, or with 3 columns if your phenotype label is survival (second column: survival time; third column: survival status). Column names are not required. The first column should contain the same sample IDs as the bulk_exp file. If your phenotype label is binary, encode it as 0 and 1.
- datatype: SC represents scRNA-seq data. ST represents spatial transcriptomics data.
- mode: Cox means your phenotype label is survival, Classification means it is binary, and Regression means it is continuous.
- savePath: The path where the model is saved.
- lognormalize: Whether to perform log-normalization.
- top_var_genes, top_gene_pairs, p_value_threshold: See details in the "Hyperparameter in TiRank" section.
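The rules for the clinical file are easy to get wrong, so checking it before running GenePairSelection may help. The sketch below is not part of TiRank: `check_bulk_clinical` and `EXPECTED_COLUMNS` are hypothetical names, and it assumes the file has no header row (the description above says column names are not needed).

```python
import csv
import io

# Columns expected per mode, taken from the input description above.
EXPECTED_COLUMNS = {"Cox": 3, "Classification": 2, "Regression": 2}


def check_bulk_clinical(csv_text, mode):
    """Return a list of problems found in a header-less clinical CSV."""
    if mode not in EXPECTED_COLUMNS:
        return [f"unknown mode: {mode!r}"]
    expected = EXPECTED_COLUMNS[mode]
    problems = []
    # Skip completely empty lines (for example a trailing newline).
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    for line_no, row in enumerate(rows, start=1):
        if len(row) != expected:
            problems.append(
                f"line {line_no}: expected {expected} columns, got {len(row)}"
            )
            continue
        # Binary labels must already be encoded as 0 or 1.
        if mode == "Classification" and row[1].strip() not in {"0", "1"}:
            problems.append(f"line {line_no}: binary label must be 0 or 1")
    return problems


if __name__ == "__main__":
    print(check_bulk_clinical("S1,0\nS2,1\n", "Classification"))
    print(check_bulk_clinical("S1,yes\nS2,1,extra\n", "Classification"))
```

For a real file, the text can be read with `open(bulk_cli_path).read()` and passed to the same function.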
Model training and prediction
TiRank(savePath, datatype, mode, device="cuda")
Input:
- savePath, datatype, mode: Same as the inputs of the GenePairSelection function.
- device: Whether to use cuda or cpu to train the model.
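Because savePath, datatype and mode must be identical in both calls, one option is to validate them once and reuse them. This is only a sketch: `shared_arguments` is a hypothetical helper, `./results` is a placeholder path, and passing the values by keyword assumes the parameter names match the signatures shown above.

```python
# Allowed values, taken from the input descriptions above.
VALID_DATATYPES = {"SC", "ST"}
VALID_MODES = {"Cox", "Classification", "Regression"}


def shared_arguments(savePath, datatype, mode):
    """Validate and bundle the arguments common to both TiRank steps."""
    if datatype not in VALID_DATATYPES:
        raise ValueError(f"datatype must be one of {sorted(VALID_DATATYPES)}")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    return {"savePath": savePath, "datatype": datatype, "mode": mode}


# Intended use (requires TiRank to be installed, so left as comments):
# from TiRank.main import *
# shared = shared_arguments("./results", "SC", "Cox")
# GenePairSelection(scst_exp_path, bulk_exp_path, bulk_cli_path, **shared)
# TiRank(**shared, device="cuda")
```

A typo such as `mode="cox"` then fails immediately with a clear message instead of surfacing later during training.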
Result interpretation
After successfully running the above two steps, you can find the file named spot_predict_score.csv in the path savePath/3_Analysis/, where the Rank_Label column represents the TiRank prediction result.
For Cox mode, Rank+ cells are associated with worse survival, and Rank- cells are associated with good survival.
For Classification mode, Rank+ cells are associated with the phenotype of the group encoded as 1, and Rank- cells are associated with the phenotype of the group encoded as 0.
For Regression mode, Rank+ cells are associated with high phenotype label scores, and Rank- cells are associated with low phenotype label scores. For example, if the input is the IC50 of different cell lines, Rank+ cells are associated with drug resistance and Rank- cells are associated with drug sensitivity.
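The interpretation rules above can be applied to the result file with a few lines of standard-library Python. This is a sketch under assumptions: it supposes the Rank_Label column literally contains the strings `Rank+` and `Rank-`, and the `cell_id` column and the `Background` label in the toy data are invented for illustration. `summarize_rank_labels` is a hypothetical helper, not a TiRank function.

```python
import csv
import io
from collections import Counter

# Meaning of each label per mode, as stated in the text above.
INTERPRETATION = {
    "Cox": {"Rank+": "worse survival", "Rank-": "good survival"},
    "Classification": {"Rank+": "group encoded as 1", "Rank-": "group encoded as 0"},
    "Regression": {"Rank+": "high phenotype score", "Rank-": "low phenotype score"},
}


def summarize_rank_labels(csv_text, mode):
    """Count each Rank_Label value and attach its meaning for the given mode."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row["Rank_Label"] for row in reader)
    meanings = INTERPRETATION[mode]
    return {
        label: (n, meanings.get(label, "not described above"))
        for label, n in counts.items()
    }


if __name__ == "__main__":
    # Toy stand-in for savePath/3_Analysis/spot_predict_score.csv.
    toy = "cell_id,Rank_Label\nc1,Rank+\nc2,Rank-\nc3,Rank+\nc4,Background\n"
    for label, (n, meaning) in summarize_rank_labels(toy, "Cox").items():
        print(f"{label}: {n} cells, {meaning}")
```

With real output, the text would come from reading `spot_predict_score.csv` inside the `3_Analysis` folder under savePath.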
Hyperparameter in TiRank
In TiRank, six key hyperparameters influence the results. The first three are crucial for feature selection in bulk transcriptomics, while the latter three are used for training the multilayer perceptron network. TiRank automatically chooses suitable combinations of the latter three parameters within a predefined range (detailed in our article, Methods-Tuning of Hyperparameters). However, because bulk transcriptomics datasets vary, the first three hyperparameters cannot be preset. We give the default settings and explain the function of each parameter to help users obtain optimal results.
- top_var_genes: Considering the high dropout rates in single-cell and spatial transcriptome datasets, the initial feature selection step selects highly variable features; top_var_genes sets how many are kept. The default setting is 2000. If the number of filtered genes is low, increase top_var_genes.
- p_value_threshold: The significance threshold for the association between each gene and the phenotype (detailed in our article, Methods-scRank workflow design-Step1). A lower p_value_threshold keeps only genes with a stronger ability to distinguish different phenotypes in bulk transcriptomics. The default setting is 0.05. Depending on the number of filtered genes, this threshold may need adjusting. If the number of filtered genes is low, increase p_value_threshold.
- top_gene_pairs: Used to select the highly variable gene pairs in bulk transcriptomics that more effectively differentiate phenotypes. Default setting for top_gene_pairs is 2000.
- alphas: Determine the weight of the different components in the total loss computation (detailed in our article, Methods-scRank workflow design-Step2).
- n_epochs: The number of training epochs in TiRank.
- lr: The learning rate (lr) controls the step size of each parameter update during model training. A lower learning rate gives more gradual updates and therefore slower convergence. Conversely, a higher learning rate might cause the model to oscillate around the optimal solution, potentially preventing it from reaching the best result.
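The learning-rate behaviour described in the last item can be seen on a toy problem. The sketch below runs plain gradient descent on f(x) = x**2. It is an illustration only and has nothing to do with TiRank's actual network or optimizer; the function names and the specific lr values are chosen for the example.

```python
def gradient_descent(lr, start=1.0, steps=20):
    """Minimize f(x) = x**2 and return every visited x value."""
    xs = [start]
    for _ in range(steps):
        grad = 2.0 * xs[-1]            # derivative of x**2
        xs.append(xs[-1] - lr * grad)  # one parameter update
    return xs


def sign_changes(xs):
    """Count how often consecutive iterates jump across the optimum at 0."""
    return sum(1 for a, b in zip(xs, xs[1:]) if a * b < 0)


if __name__ == "__main__":
    for lr in (0.01, 0.4, 0.95, 1.1):
        path = gradient_descent(lr)
        print(f"lr={lr}: final |x|={abs(path[-1]):.3g}, "
              f"sign changes={sign_changes(path)}")
```

On this toy function, a small rate such as 0.01 moves steadily but is still far from the optimum after 20 steps, 0.95 jumps across the optimum on every step, and 1.1 moves further away each time.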
TiRank Web
In order to use TiRank's web pages, you need to go to the Web folder first.
cd ./Web
Everything you do next should be done in this directory. Next you need to do the following steps:
- Create the data folder
mkdir data
- Create an ExampleData folder inside the data folder and download the sample data from https://drive.google.com/drive/folders/1CsvNsDOm3GY8slit9Hl29DdpwnOc29bE
cd data
mkdir ExampleData
cd ../
You need to make sure your file directory structure is as follows:
Web/
├── assets/
├── components/
├── img/
├── layout/
├── data/
│ ├── ExampleData
│ │ ├── CRC_ST_Prog/
│ │ └── SKCM_SC_Res/
├── tiRankWeb/
└── app.py
- You can now run your web application.
python app.py
More tutorials on the Web can be found in the "Tutorials" section of the web page.
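If the app fails to start, a common cause is a folder missing from the layout shown above. The following sketch checks for each entry of that tree. `missing_entries` is a hypothetical helper, and the script assumes it is run with the Web folder as the argument or as the current directory.

```python
import sys
from pathlib import Path

# Relative paths taken from the directory tree above.
REQUIRED_DIRS = [
    "assets",
    "components",
    "img",
    "layout",
    "data/ExampleData/CRC_ST_Prog",
    "data/ExampleData/SKCM_SC_Res",
    "tiRankWeb",
]
REQUIRED_FILES = ["app.py"]


def missing_entries(web_root):
    """Return the required folders and files that are absent under web_root."""
    root = Path(web_root)
    missing = [d + "/" for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    problems = missing_entries(target)
    print("Layout looks complete." if not problems else f"Missing: {problems}")
```

An empty result means every folder and file from the tree is present; it does not verify that the example data inside them downloaded completely.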