A package for inferring CNA fitness evolutionary trees and CNAs' evolutionary efficiency.
Description
A method for inferring CNA fitness evolutionary trees is based on multiple metrics, including genome similarity, aneuploid segregation distance, and the absolute distance between two genomes. Additionally, CNAs' evolutionary efficiency (CEE) is estimated to enable a quantitative assessment of de novo CNAs' efficiency.
System requirements and dependencies
Software package development environment:
macOS
Python 3.11.3
This package requires Python version 3.9 or greater.
Installation
Optionally, first create a virtual environment for fitPhylo:
conda create --name fitPhylo_env python=3.9
conda activate fitPhylo_env
1.From PyPI
You can install the latest release from PyPI with:
pip install fitPhylo
2.Source code
You can install this package by opening a command terminal and running the following:
git clone https://github.com/FangWang-SYSU/fitPhylo.git
cd fitPhylo
pip install .
After installation, run fitPhylo --version on the command line. If the installation succeeded, the following message appears:
fitPhylo 0.1
Usage
usage: fitPhylo [-h] [--version] -I INPUT -O OUTPUT [-p PREFIX] [-r RESOLUTION] [-t HUFFMAN_SPLIT_THRESHOLD] [-n N_NEIGHBORS] [-m MIN_CLONE_SIZE] [-s SCORING] [-R RANDOM_NUM] [-C CANCER_TYPE] [-c CORES] [-d DRAW]
A package for inferring CNA fitness evolutionary trees and CNAs' evolutionary efficiency.
options:
-h, --help show this help message and exit
--version show program's version number and exit
-I INPUT, --input INPUT
single-cell copy number profile.
-O OUTPUT, --output OUTPUT
The output path.
-p PREFIX, --prefix PREFIX
Prefix for output file names.
-r RESOLUTION, --resolution RESOLUTION
Lineage partitioning resolution(default=1).
-t HUFFMAN_SPLIT_THRESHOLD, --huffman_split_threshold HUFFMAN_SPLIT_THRESHOLD
huffman split threshold(default=0.9)
-n N_NEIGHBORS, --n_neighbors N_NEIGHBORS
Number of neighbors for creating affinity matrix in SNF(default=5).
-m MIN_CLONE_SIZE, --min_clone_size MIN_CLONE_SIZE
When min_clone_size is reached, division will no longer continue(default=0.1*cell_number).
-s SCORING, --scoring SCORING
Whether to run Scoring the chromosomal rearrangements.
-R RANDOM_NUM, --random_num RANDOM_NUM
Random number for creating a null distribution.
-C CANCER_TYPE, --cancer_type CANCER_TYPE
Select a cancer type for estimating WGD. The default is all.
-c CORES, --cores CORES
Number of cores required to run copy number variation events.
-d DRAW, --draw DRAW Draw tree and CNA heatmap.
Author: wangxin, Email: wangx768@mail2.sysu.edu.cn
You can also load the fitPhylo module in Python:
import fitPhylo as fp
Input files
The input file of fitPhylo must be an integer copy number profile. Each row is a genome segment:
the first column is the chromosome,
the second column is the start coordinate of the segment,
the third column is the end coordinate of the segment,
and the remaining columns are the integer copy numbers of the individual cells.
chr start end cell_1 cell_2 cell_3 ...
1 100167143 100220943 2 2 2 ...
1 100504443 100559237 2 1 2 ...
1 101395562 101451560 2 3 4 ...
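A minimal sketch of checking a file against this layout before running fitPhylo. It assumes whitespace-separated columns and a header line, as in the excerpt above; check_cna_table is a hypothetical helper and not part of the fitPhylo API.

```python
import io

def check_cna_table(handle):
    """Validate an integer copy number table; return (cell_names, n_segments).

    ASSUMPTION: whitespace-separated columns in the order
    chr, start, end, then one column per cell.
    """
    header = handle.readline().split()
    if len(header) < 4:
        raise ValueError("expected chr, start, end and at least one cell column")
    cells = header[3:]
    n_segments = 0
    for line_no, line in enumerate(handle, start=2):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(header):
            raise ValueError(f"line {line_no}: expected {len(header)} fields, got {len(fields)}")
        start, end = int(fields[1]), int(fields[2])
        if start > end:
            raise ValueError(f"line {line_no}: start {start} is after end {end}")
        for value in fields[3:]:
            # int() also rejects non-integer text such as "2.5"
            if int(value) < 0:
                raise ValueError(f"line {line_no}: negative copy number {value}")
        n_segments += 1
    return cells, n_segments

example = io.StringIO(
    "chr start end cell_1 cell_2 cell_3\n"
    "1 100167143 100220943 2 2 2\n"
    "1 100504443 100559237 2 1 2\n"
    "1 101395562 101451560 2 3 4\n"
)
cells, n_segments = check_cna_table(example)
print(cells, n_segments)
```

For a real file, pass an open file object instead of the in-memory example.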
Connection with inferCNV:
To obtain integer copy numbers, we propose identifying peaks and inferring their intervals, with each interval representing an integer copy number (details in the methods).
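The idea of mapping peaks to intervals can be illustrated with a small sketch. This is not the authors' procedure (that is described in the methods); it is one simple way to do it under stated assumptions: peaks are local maxima of a fixed-width histogram, interval boundaries are midpoints between adjacent peaks, and the most populated peak is taken to be the diploid state. The function name and its parameters are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

def peaks_to_integer_cn(values, bin_width=0.1, min_count=2):
    """Assign an integer copy number to each continuous value."""
    # Histogram: bin index -> number of values falling in that bin
    counts = Counter(round(v / bin_width) for v in values)
    # A bin is a peak if it is populated enough and is a local maximum
    peaks = sorted(
        b for b, c in counts.items()
        if c >= min_count and c >= counts.get(b - 1, 0) and c > counts.get(b + 1, 0)
    )
    if not peaks:
        raise ValueError("no peaks found; try a smaller min_count")
    centers = [b * bin_width for b in peaks]
    # Interval boundaries are the midpoints between adjacent peak centers
    bounds = [(lo + hi) / 2 for lo, hi in zip(centers, centers[1:])]
    # ASSUMPTION: the most populated peak is the diploid state (copy number 2)
    diploid = max(range(len(peaks)), key=lambda i: counts[peaks[i]])
    # Interval index relative to the diploid interval, clamped at zero
    return [max(0, bisect_right(bounds, v) - diploid + 2) for v in values]

signal = [0.5, 0.52, 1.0, 1.01, 0.98, 1.02, 1.5, 1.49]
integer_cn = peaks_to_integer_cn(signal)
print(integer_cn)
```

With real inferCNV output the bin width, the minimum peak size, and the choice of the diploid peak would all need tuning.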
Examples
Run from the command line
The example data exampleCNA.txt is included in the fitPhylo package and is also available for download.
#fitPhylo [-h] [--version] -I INPUT -O OUTPUT [-p PREFIX] [-r RESOLUTION] [-t HUFFMAN_SPLIT_THRESHOLD] [-n N_NEIGHBORS] [-m MIN_CLONE_SIZE] [-s SCORING] [-R RANDOM_NUM] [-C CANCER_TYPE] [-c CORES]
mkdir fitPhylo_out
fitPhylo \
-I exampleCNA.txt \
-O ./fitPhylo_out \
-p example_ \
-r 1 \
-t 1 \
-n 5 \
-C ALL \
-c 8
The -r parameter sets the lineage partitioning resolution; a higher value indicates greater precision. The -t parameter represents the proportion of subtree splitting during the Huffman process and takes values between 0 and 1; a higher value implies a lower probability of splitting two already merged cells. The estimation of the chromosomal rearrangement score is influenced by the -R parameter: a larger value leads to a longer runtime.
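When the command is launched from a script, it can help to build the argument list programmatically and check the -t range first. The sketch below uses only the flags listed in the usage message; build_fitphylo_command is a hypothetical helper, not something shipped with fitPhylo.

```python
import shlex

def build_fitphylo_command(input_path, output_dir, prefix=None, resolution=None,
                           huffman_split_threshold=None, n_neighbors=None,
                           random_num=None, cancer_type=None, cores=None):
    """Return the fitPhylo command line as a list of arguments."""
    # -t is documented as taking values between 0 and 1
    if huffman_split_threshold is not None and not 0 <= huffman_split_threshold <= 1:
        raise ValueError("huffman_split_threshold must be between 0 and 1")
    cmd = ["fitPhylo", "-I", str(input_path), "-O", str(output_dir)]
    optional = [
        ("-p", prefix), ("-r", resolution), ("-t", huffman_split_threshold),
        ("-n", n_neighbors), ("-R", random_num), ("-C", cancer_type), ("-c", cores),
    ]
    for flag, value in optional:
        if value is not None:  # omitted options fall back to fitPhylo's defaults
            cmd += [flag, str(value)]
    return cmd

cmd = build_fitphylo_command(
    "exampleCNA.txt", "./fitPhylo_out", prefix="example_", resolution=1,
    huffman_split_threshold=1, n_neighbors=5, cancer_type="ALL", cores=8,
)
print(shlex.join(cmd))
# To run it (requires fitPhylo on PATH): subprocess.run(cmd, check=True)
```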
Run in Python
# Load the package
import fitPhylo as fp

# 1. Infer the tree
fp.fitPhylo.run(
    cna_dir=fp.__path__[0] + '/data/exampleCNA.txt',
    output='fitPhylo_out',
    prefix='example_',
    resolution=1,
    clone_thr=1,
    n_neighbors=5,
    plot_png=False,
    verbose=True,
)

# 2. Score the chromosomal rearrangements
fp.fitPhylo.chromosome_event(
    'fitPhylo_out',
    prefix='example_',
    cancer_type='ALL',
    cores=8,
    randome_num=1000,
    verbose=True,
)
Output files
1.cell_info.txt: Per-cell variation information along the lineage trace.
name Root_gain_loc Root_loss_loc Root_gain_cn Root_loss_cn Parent_gain_loc Parent_loss_loc Parent_gain_cn Parent_loss_cn Mitosis_copy Mitosis_dd_loc Mitosis_ad_loc Mitosis_time Pseudotime_tree Mitosis_time_next aneu_rate copy_rate status
root 1042.0 163.0 20.0 0.0 40.0 0.013 Aneuploidy
cell_2 160.0 428.0 428.0 160.0 160.0 428.0 160.0 428.0 11617.0 825.0 478.0 31.0 21.0 21.0 0.039 0.951 Aneuploidy
cell_3 151.0 629.0 629.0 151.0 151.0 629.0 151.0 629.0 11425.0 1801.0 182.0 32.5 19.0 19.0 0.014 0.936 Aneuploidy
cell_4 51.0 360.0 360.0 51.0 291.0 410.0 291.0 410.0 11504.0 779.0 189.0 25.0 46.0 25.0 0.015 0.942 Aneuploidy
[Root|Parent]_[gain|loss]_[loc|cn]: The number of sites (loc) or copies (cn) gained or lost relative to the Root or Parent node.
Mitosis_copy: The number of genomic segments that share the same copy number state between the current node and its parent node (D_ss).
Mitosis_dd_loc: The number of genomic segments with different copy number states between the current node and its parent node (D_ds).
Mitosis_ad_loc: The number of aneuploidy segregation states between the current node and its parent node (D_as).
Mitosis_time: Branch length of the current node.
Pseudotime_tree: Pseudotime of the current node in the tree.
Mitosis_time_next: Branch length of the next mitosis of the current node.
aneu_rate: Rate of aneuploidy segregation states.
copy_rate: Rate of identical copy number states.
status: The mitotic state of the current cell, inferred from aneu_rate and copy_rate.
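To make the per-node counts more concrete, here is a sketch that compares a node's copy number vector with its parent's. It reflects one plain reading of the column descriptions above, not fitPhylo's actual computation. In particular, it does not separate aneuploidy segregation states (D_as), which are defined in the methods, so its "different_state" count would include them. compare_with_parent is a hypothetical name.

```python
def compare_with_parent(child, parent):
    """Count same/different segments and gains/losses relative to the parent."""
    if len(child) != len(parent):
        raise ValueError("child and parent must cover the same segments")
    pairs = list(zip(child, parent))
    same = sum(c == p for c, p in pairs)
    return {
        "same_state": same,                                # analogous to D_ss
        "different_state": len(pairs) - same,              # D_ds, with D_as not separated
        "gain_loc": sum(c > p for c, p in pairs),          # segments with more copies
        "loss_loc": sum(c < p for c, p in pairs),          # segments with fewer copies
        "gain_cn": sum(c - p for c, p in pairs if c > p),  # total copies gained
        "loss_cn": sum(p - c for c, p in pairs if c < p),  # total copies lost
    }

summary = compare_with_parent(child=[2, 3, 1, 2, 4], parent=[2, 2, 2, 2, 2])
print(summary)
```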
2.all_node_data.txt: CNA profiles of all cells, including internal nodes, whose names start with "virtual_".
1_977836_977836 1_1200863_1200863 ...
cell1 1.0 1.0 ...
cell2 2.0 2.0 ...
Integer copy number profile of all nodes in the tree. Rows are cells (including internal nodes) and columns are genome segments.
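When post-processing this table, two small helpers are often useful: one to split a column label into its parts and one to separate observed cells from internal nodes. The sketch assumes that labels such as 1_977836_977836 follow a chromosome_start_end pattern, which the excerpt suggests but does not state; both function names are hypothetical.

```python
def parse_segment_label(label):
    """Split a column label into (chromosome, start, end).

    ASSUMPTION: labels follow the pattern chromosome_start_end.
    """
    # rsplit keeps chromosome names that themselves contain underscores intact
    chrom, start, end = label.rsplit("_", 2)
    return chrom, int(start), int(end)

def split_nodes(node_names, internal_prefix="virtual_"):
    """Separate observed cells from internal nodes named with the given prefix."""
    observed = [n for n in node_names if not n.startswith(internal_prefix)]
    internal = [n for n in node_names if n.startswith(internal_prefix)]
    return observed, internal

segment = parse_segment_label("1_977836_977836")
observed, internal = split_nodes(["cell1", "virtual_1", "cell2"])
print(segment, observed, internal)
```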
3.cell_tree.newick: Single-cell trace file in Newick format.
Stores the structure of the evolutionary tree, including branch lengths.
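For a quick look at the branch lengths without extra dependencies, a regular expression is enough for simple Newick strings. The sketch below is deliberately minimal: it ignores quoted labels and skips unnamed nodes, and the example tree is made up for illustration. For real analyses a full parser, for example Bio.Phylo from Biopython, is a safer choice.

```python
import re

# A node name followed by ":" and a branch length, e.g. "cell_1:1.0"
NODE_WITH_LENGTH = re.compile(r"([^\s(),:;]+):([0-9.eE+-]+)")

def branch_lengths(newick):
    """Return {node_name: branch_length} for all named nodes with a length."""
    return {name: float(length) for name, length in NODE_WITH_LENGTH.findall(newick)}

# Hypothetical tree; the node names are illustrative only
tree = "((cell_1:1.0,cell_2:2.5)virtual_1:0.5,cell_3:3.0)root;"
lengths = branch_lengths(tree)
print(lengths)
```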
4.*re_score.txt
,1,2,3,...
cell_1,0.761,0.800,0.793,...
cell_2,0.88,0.0,0.636,0.659,...
cell_3,0.88,0.957,0.783,...
The level of chromosomal rearrangements. Rows are cells and columns are chromosomes. A value of -1 means that the chromosome has not changed significantly.
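Because -1 is a sentinel rather than a score, it is worth masking it before computing averages. A minimal sketch, assuming the file is comma-separated with an empty first header field as in the excerpt above; load_scores is a hypothetical helper and the example values are illustrative.

```python
import csv
import io

def load_scores(handle, not_significant=-1.0):
    """Read a cell-by-chromosome score table; the sentinel value becomes None."""
    reader = csv.reader(handle)
    chromosomes = next(reader)[1:]  # the first header field is empty
    scores = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row[1:]]
        scores[row[0]] = {
            chrom: (None if value == not_significant else value)
            for chrom, value in zip(chromosomes, values)
        }
    return scores

example = io.StringIO(
    ",1,2,3\n"
    "cell_1,0.761,0.800,0.793\n"
    "cell_2,0.88,-1,0.636\n"
)
scores = load_scores(example)
print(scores["cell_2"])
```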
5.*gradual_score.txt
,1,2,3,...
cell_1,0.001,0.008,0.705,...
cell_2,0.780,0.005,0.837,...
cell_3,0.890,0.907,0.463,...
The level of gradual chromosomal change. Rows are cells and columns are chromosomes.
6.*re_socre_pvalue.txt
P-values of the chromosomal rearrangement scores (details in the methods).
7.*mode.txt
,chr_num,wgd,gradual_num,seismic_num,gradual_score,seismic_score,BFB
cell_1,19,WGD1,14,5,0.009,0.0149,1368
cell_2,11,WGD1,7,4,0.001,0.006,234
cell_3,12,WGD0,6,6,0.002,0.001,209
chr_num: The total number of chromosomes classified as gradual or seismic.
wgd: WGD type of the cell (WGD0 is diploid, WGD1 involves a single whole-genome duplication, and WGD2 involves multiple whole-genome duplications).
gradual_num: The number of chromosomes classified as gradual.
seismic_num: The number of chromosomes classified as seismic.
gradual_score: Average gradual score across gradual chromosomes.
seismic_score: Average seismic score across seismic chromosomes.
BFB: The number of aneuploidy segregation states.
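A short sketch of summarising this table, for instance counting cells per WGD type and computing the share of seismic chromosomes per cell. It assumes the comma-separated layout shown above, where the unnamed first column holds the cell ID; summarize_modes is a hypothetical helper.

```python
import csv
import io
from collections import Counter

def summarize_modes(handle):
    """Return (cells per WGD type, seismic fraction per cell)."""
    rows = list(csv.DictReader(handle))
    wgd_counts = Counter(row["wgd"] for row in rows)
    seismic_fraction = {
        row[""]: int(row["seismic_num"]) / int(row["chr_num"])  # "" is the unnamed ID column
        for row in rows
        if int(row["chr_num"]) > 0
    }
    return wgd_counts, seismic_fraction

example = io.StringIO(
    ",chr_num,wgd,gradual_num,seismic_num,gradual_score,seismic_score,BFB\n"
    "cell_1,19,WGD1,14,5,0.009,0.0149,1368\n"
    "cell_2,11,WGD1,7,4,0.001,0.006,234\n"
    "cell_3,12,WGD0,6,6,0.002,0.001,209\n"
)
wgd_counts, seismic_fraction = summarize_modes(example)
print(dict(wgd_counts), seismic_fraction)
```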
8.*cee_score.txt
cell_1 0.24
cell_2 0.19
cell_3 0.87
First column: cell ID.
Second column: CEE score.
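The two-column layout is easy to load and sort. The sketch below assumes whitespace-separated fields and no header line, as in the excerpt; rank_by_cee is a hypothetical helper.

```python
import io

def rank_by_cee(handle):
    """Return (cell_id, cee_score) pairs sorted from highest to lowest score."""
    scores = {}
    for line in handle:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        scores[fields[0]] = float(fields[1])
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

example = io.StringIO("cell_1 0.24\ncell_2 0.19\ncell_3 0.87\n")
ranking = rank_by_cee(example)
print(ranking)
```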
9.*tree.png
Optional output, controlled by the '-d' command-line parameter or the 'plot_png' argument.
If set to 1, fitPhylo draws the phylogenetic tree and a heatmap of the CNA profile.
If set to 0, nothing is drawn.
Note that drawing requires the matplotlib and seaborn packages.
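Before enabling drawing, a script can check whether the plotting packages are importable. This sketch uses only the standard library; missing_plot_dependencies is a hypothetical helper, and the fallback to 0 is just one possible policy.

```python
from importlib.util import find_spec

def missing_plot_dependencies(required=("matplotlib", "seaborn")):
    """Return the names of required top-level packages that cannot be imported."""
    return [name for name in required if find_spec(name) is None]

missing = missing_plot_dependencies()
# Fall back to no drawing when a plotting package is absent
draw = 0 if missing else 1
print("draw =", draw, "missing:", missing)
```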
Developer
Fang Wang (fwang9@mdanderson.org), Xin Wang (wangx768@mail2.sysu.edu.cn)
Draft date
Oct.12, 2023