Chen and Yang Lab Multi fork Development cell lineage tree alignment
MODELTA
MODELTA is the Chen and Yang Lab multi-fork development cell lineage tree alignment algorithm, provided as a Python package and as an executable program.
With either one you can obtain the score matrix to analyze the relationships between the nodes of two lineage trees, or test how significant the resulting alignment scores are (see P-value calculation).
If this project is helpful to you, you can star the repository to keep track of it. Thank you for your support.
Install
Required package
- pandas: The score matrix is built as a DataFrame.
- numpy: Core numerical computing.
- munkres: Hungarian (Kuhn-Munkres) assignment algorithm, used during dynamic programming to find the maximum-score matching in the score matrix.
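To make the role of the assignment step concrete, here is a minimal, dependency-free sketch. It finds the maximum-score one-to-one pairing between the rows and columns of a small score matrix by brute force, which is the problem that the Hungarian algorithm solves efficiently. `best_assignment` is a hypothetical helper written for this illustration; it is not part of modelta or munkres. The 3x3 scores are the leaf-versus-leaf entries for the children a, b, c of node 0,0 in the Quick Start example matrix.

```python
from itertools import permutations

def best_assignment(scores):
    """Return (total, pairs) for the maximum-score one-to-one row/column pairing.

    Brute force over all column orders; suitable for tiny square matrices only.
    """
    n = len(scores)
    best_total, best_pairs = float("-inf"), []
    for cols in permutations(range(n)):
        total = sum(scores[r][c] for r, c in enumerate(cols))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(cols))
    return best_total, best_pairs

# Leaf-vs-leaf scores for the children a, b, c of node "0,0" (from the example matrix)
child_scores = [
    [2.0, -1.0, -1.0],
    [-1.0, 2.0, -1.0],
    [-1.0, -1.0, 2.0],
]
total, pairs = best_assignment(child_scores)
print(total, pairs)
```

The best total here is 6.0, which coincides with the entry for node 0,0 against itself in the example matrix. Whether modelta combines child scores in exactly this way is an assumption based on that output.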
Optional package
- tqdm: Displays a progress bar during the calculation phase.
- multiprocess: Computing the p-value requires shuffling the original tree many times and recomputing the scores each time, so using multiple processes can considerably shorten the waiting time.
Pip install
run "pip install modelta"
Source code install
(1) Offline
Step1: git clone https://github.com/Chenjy0212/modelta.git
Step2: cd modelta -> run "python setup.py install"
(2) Online
run "pip install git+https://github.com/Chenjy0212/modelta.git@main"
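After installing, it can be useful to confirm that the dependencies listed above are importable. The following is a small sketch using only the standard library; `missing_packages` is a hypothetical helper, and it assumes each package's import name matches the name given in the lists above.

```python
import importlib.util

def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

required = ["pandas", "numpy", "munkres"]
optional = ["tqdm", "multiprocess"]
print("missing required:", missing_packages(required))
print("missing optional:", missing_packages(optional))
```

An empty list means every package in that group was found.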
!!! If you are a Python user:
Quick Start
You can use this package in your Python code. For example, in a Jupyter notebook:
import modelta
from pprint import pprint

# Align the example tree against itself
example = modelta.scoremat(TreeSeqFile = 'ExampleFile/tree.nwk',
                           TreeSeqFile2 = 'ExampleFile/tree.nwk',
                           Name2TypeFile = 'ExampleFile/Name2Type.csv',
                           Name2TypeFile2 = 'ExampleFile/Name2Type.csv',
                           ScoreDictFile = '',   # empty: scores come from mv and pv
                           mv = 2,               # match score
                           top = 0,
                           notebook = True,      # running inside Jupyter
                           pv = -1,              # prune score
                           Tqdm = True,)         # show the progress bar
pprint(example)
Result
Matrix Node: 100%
121/121 [00:00<00:00, 2573.11it/s]
{'TopScoreList': [{'Root1_label': 'root',
'Root1_node': '(((a,b,c),d,(e,f)),a)',
'Root2_label': 'root',
'Root2_node': '(((a,b,c),d,(e,f)),a)',
'Score': 14.0,
'col': 10,
'row': 10},
{'Root1_label': '0',
'Root1_node': '((a,b,c),d,(e,f))',
'Root2_label': 'root',
'Root2_node': '(((a,b,c),d,(e,f)),a)',
'Score': 11.0,
'col': 10,
'row': 9},
{'Root1_label': 'root',
'Root1_node': '(((a,b,c),d,(e,f)),a)',
'Root2_label': '0',
'Root2_node': '((a,b,c),d,(e,f))',
'Score': 11.0,
'col': 9,
'row': 10}],
'matrix': Root2 0,0,0 0,0,1 0,0,2 0,1 0,2,0 0,2,1 1 0,0 0,2 0 root
Root1
0,0,0 2.0 -1.0 -1.0 -1.0 -1.0 -1.0 2.0 0.0 -1.0 -1.0 -1.0
0,0,1 -1.0 2.0 -1.0 -1.0 -1.0 -1.0 -1.0 0.0 -1.0 -1.0 -1.0
0,0,2 -1.0 -1.0 2.0 -1.0 -1.0 -1.0 -1.0 0.0 -1.0 -1.0 -1.0
0,1 -1.0 -1.0 -1.0 2.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0
0,2,0 -1.0 -1.0 -1.0 -1.0 2.0 -1.0 -1.0 -1.0 1.0 -1.0 -1.0
0,2,1 -1.0 -1.0 -1.0 -1.0 -1.0 2.0 -1.0 -1.0 1.0 -1.0 -1.0
1 2.0 -1.0 -1.0 -1.0 -1.0 -1.0 2.0 0.0 -1.0 -1.0 -1.0
0,0 0.0 0.0 0.0 -1.0 -1.0 -1.0 0.0 6.0 -2.0 3.0 2.0
0,2 -1.0 -1.0 -1.0 -1.0 1.0 1.0 -1.0 -2.0 4.0 0.0 -1.0
0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 3.0 0.0 12.0 11.0
root -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 2.0 -1.0 11.0 14.0}
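As a rough illustration of how the returned dictionary can be consumed, the sketch below ranks the entries of 'TopScoreList' and prints one line per aligned pair. The data is a trimmed copy of the result shown above (the 'matrix' entry and the Root*_node strings are omitted), and `summarize` is a hypothetical helper, not a modelta function.

```python
# A trimmed copy of the Quick Start result shown above
example = {
    "TopScoreList": [
        {"Root1_label": "root", "Root2_label": "root", "Score": 14.0, "row": 10, "col": 10},
        {"Root1_label": "0", "Root2_label": "root", "Score": 11.0, "row": 9, "col": 10},
        {"Root1_label": "root", "Root2_label": "0", "Score": 11.0, "row": 10, "col": 9},
    ]
}

def summarize(top_score_list):
    """Return one (Root1_label, Root2_label, Score) tuple per entry, best score first."""
    ranked = sorted(top_score_list, key=lambda entry: entry["Score"], reverse=True)
    return [(e["Root1_label"], e["Root2_label"], e["Score"]) for e in ranked]

for label1, label2, score in summarize(example["TopScoreList"]):
    print(f"{label1} vs {label2}: {score}")
```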
Parameter analysis
Parameters marked with * are required; all others are optional.
- TreeSeqFile & TreeSeqFile2: [path/filename*] Cell lineage tree files with branch length information removed. For the expected format, see ExampleFile/tree.nwk
- mv: [float, default = 2.] The match score between identical nodes. It is typically used when ScoreDictFile is left at its default.
- pv: [float, default = -1.] The prune score between different nodes.
- top: [int >= 0, default = 0] How many of the top meaningful scores in the score matrix to select. With the default value:
{'T1root_T2root': [{'Root1_label': 'root',
'Root1_node': '(((a,b,c),d,(e,f)),a)',
'Root2_label': 'root',
'Root2_node': '(((a,b,c),d,(e,f)),a)',
'Score': 14.0,
'col': 10,
'row': 10}],
- notebook: [bool, default = False] Whether the code is being run in a Jupyter notebook environment.
- Tqdm: [bool, default = True] Whether to display the progress bar.
For qualitative calculation:
- Name2TypeFile & Name2TypeFile2: [path/filename*] Maps tree node names to types. For the expected format, see ExampleFile/Name2Type.csv
- ScoreDictFile: [path/filename, default = ''] Defines the match scores between nodes. For the expected format, see ExampleFile/socrefile.csv
For quantitative calculation:
- ScoreDictFile: [path/filename*] Defines the match scores between nodes. For the expected format, see ExampleFile/Qscorefile.csv
- Name2TypeFile & Name2TypeFile2: [path/filename or no input] Maps tree node names to types. For the expected format, see ExampleFile/Name2Type.csv
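The interplay of mv, pv, and a score dictionary can be sketched for a single pair of leaves. In the Quick Start matrix, two leaves of the same type score 2.0 (mv) and two leaves of different types score -1.0 (pv); the function below reproduces only that observation. Everything else is an assumption made for illustration: `leaf_score` is a hypothetical helper, the name-to-type mapping and the 0.5 entry are made up, and the real layouts of Name2Type.csv and the score files are not shown in this document.

```python
def leaf_score(type1, type2, mv=2.0, pv=-1.0, score_dict=None):
    """Score two leaf types.

    An explicit entry in score_dict wins; otherwise equal types get mv
    and different types get pv.
    """
    if score_dict is not None and (type1, type2) in score_dict:
        return score_dict[(type1, type2)]
    return mv if type1 == type2 else pv

# Hypothetical name-to-type mapping (stands in for a Name2Type file)
name2type = {"cell_1": "a", "cell_2": "b", "cell_3": "a"}

print(leaf_score(name2type["cell_1"], name2type["cell_3"]))  # same type
print(leaf_score(name2type["cell_1"], name2type["cell_2"]))  # different types
print(leaf_score("a", "b", score_dict={("a", "b"): 0.5}))    # explicit score wins
```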
P-value calculation
modelta.pvalue(times = 3,
topscorelist = example['TopScoreList'],
ScoreDictFile='',
CPUs = 50,
mv = 2,
pv = -1)
Result
Pvalue : 100%|██████████| 3/3 [00:00<00:00, 4.05it/s]
Pvalue : 100%|██████████| 3/3 [00:00<00:00, 4.38it/s]
Pvalue : 100%|██████████| 3/3 [00:00<00:00, 4.45it/s]
[[3.0, 4.0, 0.0, 14.0], [4.0, 5.0, 3.0, 11.0], [5.0, 0.0, 1.0, 11.0]]
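The result contains one inner list for each of the top entries in topscorelist, holding the matching scores obtained from the times shuffled trees. In the example above, the last value of each inner list (14.0, 11.0, 11.0) matches the original Score of that entry.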
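One common way to turn such lists into a p-value is the empirical estimate (k + 1) / (n + 1), where n is the number of shuffled scores and k counts those at least as large as the observed score. The sketch below applies it to the result shown above. It assumes that the last value of each inner list is the observed score and the preceding values come from shuffled trees; `empirical_pvalue` is a hypothetical helper, and this is not necessarily how modelta itself reports significance.

```python
def empirical_pvalue(scores):
    """Empirical p-value, assuming scores = [shuffled..., observed].

    Uses the (k + 1) / (n + 1) estimate, where k counts shuffled scores
    that are at least as large as the observed one.
    """
    *shuffled, observed = scores
    k = sum(1 for s in shuffled if s >= observed)
    return (k + 1) / (len(shuffled) + 1)

# The result printed by modelta.pvalue in the example above
result = [[3.0, 4.0, 0.0, 14.0], [4.0, 5.0, 3.0, 11.0], [5.0, 0.0, 1.0, 11.0]]
print([empirical_pvalue(row) for row in result])
```

With only three shuffles the smallest attainable value is 0.25, so a much larger times is needed for a meaningful estimate.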
Parameter analysis
Parameters marked with * are required; all others are optional.
- times: [int > 0*] The number of times the original tree is shuffled, for example:
times = 3 # Randomly shuffle the node labels; the tree structure remains unchanged
(((a,b,c),d,(e,f)),a) -> (((a,b,c),d,(e,f)),a)
-> (((a,c,d),b,(a,f)),e)
-> (((e,f,a),d,(b,c)),a)
- topscorelist: [example['TopScoreList']*] The list of top scores obtained earlier from modelta.scoremat.
- CPUs: [int > 0, default = 50] The number of worker processes. Multiprocessing can greatly reduce the waiting time. The default pool size is 50, but it is limited by local resources to at most the number of local CPU cores minus 1.
- mv, pv & notebook: These parameters are described above.
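The shuffling described for the times parameter can be sketched as follows: the leaf labels are permuted while the parentheses and commas stay in place. This is a minimal illustration using only the standard library, not modelta's own code. `shuffle_leaves` is a hypothetical helper, and it assumes a tree string without branch lengths or internal node labels, like the example tree.

```python
import random
import re

LABEL = re.compile(r"[^(),;\s]+")  # any run of characters that is not tree punctuation

def shuffle_leaves(newick, rng=None):
    """Permute the leaf labels of a Newick-style string, keeping the topology."""
    rng = rng or random.Random()
    labels = LABEL.findall(newick)
    rng.shuffle(labels)
    remaining = iter(labels)
    # Replace each label, left to right, with the next shuffled label
    return LABEL.sub(lambda _match: next(remaining), newick)

tree = "(((a,b,c),d,(e,f)),a)"
rng = random.Random(0)  # fixed seed only to make the run repeatable
for _ in range(3):
    print(shuffle_leaves(tree, rng))
```

Every output keeps the shape (((x,x,x),x,(x,x)),x) and the same multiset of labels; only their positions change.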
!!! If you are not a regular programmer:
Quick Start
We provide executable files, which you can run by passing the corresponding parameters at the terminal.
Citation
If you use this project in your research, please cite this project.
@misc{modelta2022,
author = {Jingyu Chen},
title = {MODELTA: Multi fork Development cell lineage tree alignment},
year = {2022},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/Chenjy0212/modelta}},
}
Introduction
Student of @SYSU. :school:
Undergraduate majoring in computer science, master majoring in bioinformatics. :man_technologist:
I hope my program can be helpful to your research. :heart:
How to contact the author has been written at the top. :eyes: