
Chen and Yang Lab Multi fork Development cell lineage tree alignment


MODELTA

  • An algorithm package and executable program from the Chen and Yang Lab for multi-fork developmental cell lineage tree alignment.

  • Use them to compute a score matrix and analyze how the nodes of two lineage trees correspond, or to test how significant that correlation is (p value).

  • If you find this project helpful, you can star the repository to keep track of it. Thank you for your support.

Install

Required package

  • pandas: the score matrix is stored as a DataFrame.
  • numpy: core numerical computation.
  • munkres: the Munkres (Hungarian) assignment algorithm, used to find the maximum score during the score-matrix dynamic programming.
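To illustrate the kind of problem munkres solves, the sketch below finds a maximum-score assignment by exhaustive search: each row (for example, a child of a node in tree 1) is paired with a distinct column (a child of a node in tree 2) so that the total score is as large as possible. The helper best_assignment is hypothetical and not part of modelta. Brute force is exponential and only suitable for tiny inputs, which is why a polynomial-time Munkres (Hungarian) implementation is used instead.

```python
from itertools import permutations


def best_assignment(scores):
    """Return (best_total, pairs) for a maximum-score one-to-one row/column pairing.

    Hypothetical helper: exhaustive search over all column choices, assuming
    len(rows) <= len(columns). The Munkres (Hungarian) algorithm solves the same
    problem in polynomial time; this version only shows what is being optimized.
    """
    n_rows = len(scores)
    n_cols = len(scores[0]) if scores else 0
    best_total, best_pairs = float("-inf"), []
    # Each permutation assigns a distinct column to every row.
    for cols in permutations(range(n_cols), n_rows):
        total = sum(scores[r][c] for r, c in enumerate(cols))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(cols))
    return best_total, best_pairs


# Leaf scores of (a,b,c) vs (a,b,c): matching types score 2, different types -1.
leaf_scores = [[2.0, -1.0, -1.0],
               [-1.0, 2.0, -1.0],
               [-1.0, -1.0, 2.0]]
total, pairs = best_assignment(leaf_scores)
print(total, pairs)  # 6.0 [(0, 0), (1, 1), (2, 2)]
```

The leaf scores above are copied from the example matrix in the Quick Start result, and the best total of 6 is consistent with its 0,0 vs 0,0 cell. Whether modelta combines child scores exactly this way is an assumption.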

Optional package

  • tqdm: displays a progress bar during the calculation phase.
  • multiprocess: calculating the p value requires shuffling the original tree and rescoring it many times, so using multiple processes can greatly reduce the waiting time.
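Because both packages are optional, code that uses them can fall back gracefully when they are missing. The sketch below shows one common pattern; it is a generic illustration rather than modelta's own code, and the helpers default_workers and run_all are hypothetical. multiprocess is a dill-based fork of the standard multiprocessing module with the same Pool API, so the standard library can serve as the fallback.

```python
# Optional dependencies: fall back gracefully when they are missing.
try:
    from tqdm import tqdm as progress  # progress bar, if installed
except ImportError:
    def progress(iterable, **kwargs):
        """Stand-in for tqdm: ignore display options and return the iterable."""
        return iterable

try:
    import multiprocess as mp  # dill-based fork of multiprocessing
except ImportError:
    import multiprocessing as mp  # standard library, same Pool API


def default_workers(requested=50):
    """Cap a requested pool size at (CPU cores - 1), but never go below 1."""
    return max(1, min(requested, mp.cpu_count() - 1))


def run_all(tasks, show_progress=False):
    """Evaluate callables sequentially, optionally with a progress bar."""
    return [task() for task in progress(tasks, disable=not show_progress)]


print(default_workers())
```

A real pool would then be created with something like mp.Pool(default_workers()) inside an if __name__ == "__main__": guard.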

Pip install

run "pip install modelta"

Source code install

(1) From a local clone
Step 1: git clone https://github.com/Chenjy0212/modelta.git
Step 2: cd modelta, then run "python setup.py install"
(2) Directly from GitHub
run "pip install git+https://github.com/Chenjy0212/modelta.git@main"

!!! If you are a Python user:

Quick Start

You can use this package in your Python code. For example, run the following in a Jupyter notebook:

import modelta
from pprint import pprint

# Align the example tree against itself and print the result
example = modelta.scoremat(TreeSeqFile = 'ExampleFile/tree.nwk',
                           TreeSeqFile2 = 'ExampleFile/tree.nwk',
                           Name2TypeFile = 'ExampleFile/Name2Type.csv',
                           Name2TypeFile2 = 'ExampleFile/Name2Type.csv',
                           ScoreDictFile = '',
                           mv = 2,
                           top = 0,
                           notebook = True,
                           pv = -1,
                           Tqdm = True)
pprint(example)

Result

Matrix Node: 100% 121/121 [00:00<00:00, 2573.11it/s]

{'TopScoreList': [{'Root1_label': 'root',
                   'Root1_node': '(((a,b,c),d,(e,f)),a)',
                   'Root2_label': 'root',
                   'Root2_node': '(((a,b,c),d,(e,f)),a)',
                   'Score': 14.0,
                   'col': 10,
                   'row': 10},
                  {'Root1_label': '0',
                   'Root1_node': '((a,b,c),d,(e,f))',
                   'Root2_label': 'root',
                   'Root2_node': '(((a,b,c),d,(e,f)),a)',
                   'Score': 11.0,
                   'col': 10,
                   'row': 9},
                  {'Root1_label': 'root',
                   'Root1_node': '(((a,b,c),d,(e,f)),a)',
                   'Root2_label': '0',
                   'Root2_node': '((a,b,c),d,(e,f))',
                   'Score': 11.0,
                   'col': 9,
                   'row': 10}],
 'matrix': Root2  0,0,0  0,0,1  0,0,2  0,1  0,2,0  0,2,1    1  0,0  0,2     0  root
Root1                                                                   
0,0,0    2.0   -1.0   -1.0 -1.0   -1.0   -1.0  2.0  0.0 -1.0  -1.0  -1.0
0,0,1   -1.0    2.0   -1.0 -1.0   -1.0   -1.0 -1.0  0.0 -1.0  -1.0  -1.0
0,0,2   -1.0   -1.0    2.0 -1.0   -1.0   -1.0 -1.0  0.0 -1.0  -1.0  -1.0
0,1     -1.0   -1.0   -1.0  2.0   -1.0   -1.0 -1.0 -1.0 -1.0  -1.0  -1.0
0,2,0   -1.0   -1.0   -1.0 -1.0    2.0   -1.0 -1.0 -1.0  1.0  -1.0  -1.0
0,2,1   -1.0   -1.0   -1.0 -1.0   -1.0    2.0 -1.0 -1.0  1.0  -1.0  -1.0
1        2.0   -1.0   -1.0 -1.0   -1.0   -1.0  2.0  0.0 -1.0  -1.0  -1.0
0,0      0.0    0.0    0.0 -1.0   -1.0   -1.0  0.0  6.0 -2.0   3.0   2.0
0,2     -1.0   -1.0   -1.0 -1.0    1.0    1.0 -1.0 -2.0  4.0   0.0  -1.0
0       -1.0   -1.0   -1.0 -1.0   -1.0   -1.0 -1.0  3.0  0.0  12.0  11.0
root    -1.0   -1.0   -1.0 -1.0   -1.0   -1.0 -1.0  2.0 -1.0  11.0  14.0}
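The row and col fields in TopScoreList appear to be 0-based positions in the row and column order of the printed matrix: position 10 is root and position 9 is 0. The check below uses values copied from the output above to confirm this reading; describe_hit is a hypothetical helper.

```python
# Row/column labels of the example matrix, in printed order (Root1 rows, Root2 columns).
labels = ["0,0,0", "0,0,1", "0,0,2", "0,1", "0,2,0", "0,2,1",
          "1", "0,0", "0,2", "0", "root"]

# Entries copied from the TopScoreList output above (node strings omitted).
top_score_list = [
    {"Root1_label": "root", "Root2_label": "root", "Score": 14.0, "row": 10, "col": 10},
    {"Root1_label": "0", "Root2_label": "root", "Score": 11.0, "row": 9, "col": 10},
    {"Root1_label": "root", "Root2_label": "0", "Score": 11.0, "row": 10, "col": 9},
]


def describe_hit(hit, row_labels, col_labels):
    """Translate 0-based row/col positions back into node labels."""
    return f"{row_labels[hit['row']]} vs {col_labels[hit['col']]}: {hit['Score']}"


for hit in top_score_list:
    print(describe_hit(hit, labels, labels))
```

If the matrix is a pandas DataFrame, as its printout suggests, example['matrix'].iloc[hit['row'], hit['col']] should return the same score.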

Parameter description

Parameters marked with * are required; all others are optional.

  • TreeSeqFile & TreeSeqFile2: [path/filename *] Cell lineage tree files in Newick format, with branch lengths removed. See ExampleFile/tree.nwk for the expected format.
  • mv: [float, default = 2.] Score for a match between identical nodes; mainly used when ScoreDictFile is left at its default.
  • pv: [float, default = -1.] Pruning score applied between different nodes.
  • top: [int >= 0, default = 0] Number of top meaningful scores to select from the score matrix. With the default value, the output looks like:
{'T1root_T2root': [{'Root1_label': 'root',
                    'Root1_node': '(((a,b,c),d,(e,f)),a)',
                    'Root2_label': 'root',
                    'Root2_node': '(((a,b,c),d,(e,f)),a)',
                    'Score': 14.0,
                    'col': 10,
                    'row': 10}],                                                    
  • notebook: [bool, default = False] Whether the code is written and run in a Jupyter notebook environment.
  • Tqdm: [bool, default = True] Whether to display the progress bar.
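The node labels used in the matrix (0,0,0, 0,2, root and so on) are consistent with comma-joined child indices: 0,2,1 is the second child (f) of the third child of the first child of the root. The sketch below parses a Newick string without branch lengths and enumerates its nodes under that labeling. parse_newick, path_labels and to_newick are hypothetical helpers, and the labeling rule is inferred from the example output rather than taken from modelta's source.

```python
def parse_newick(text):
    """Parse Newick without branch lengths into nested lists (leaves are str).

    Internal-node names after ')' are skipped in this sketch.
    """
    text = text.strip().rstrip(";")
    pos = 0

    def read_name():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]

    def parse_node():
        nonlocal pos
        if text[pos] != "(":
            return read_name()  # leaf
        pos += 1  # consume '('
        children = [parse_node()]
        while text[pos] == ",":
            pos += 1  # consume ','
            children.append(parse_node())
        if text[pos] != ")":
            raise ValueError(f"expected ')' at position {pos}")
        pos += 1  # consume ')'
        read_name()  # discard an optional internal-node name
        return children

    return parse_node()


def path_labels(tree, path=()):
    """Yield (label, subtree) pairs; label = comma-joined child indices, 'root' at the top."""
    yield ("root" if not path else ",".join(map(str, path))), tree
    if isinstance(tree, list):
        for i, child in enumerate(tree):
            yield from path_labels(child, path + (i,))


def to_newick(tree):
    """Serialize nested lists back to a Newick string (without ';')."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


tree = parse_newick("(((a,b,c),d,(e,f)),a);")
nodes = dict(path_labels(tree))
print(len(nodes))             # 11
print(to_newick(nodes["0"]))  # ((a,b,c),d,(e,f))
print(nodes["0,2,1"])         # f
```

The example tree has 11 nodes, which matches the 11 x 11 = 121 cells reported by the Matrix Node progress bar.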

For qualitative calculation:

  • Name2TypeFile & Name2TypeFile2: [path/filename *] Files that map tree node names to cell types. See ExampleFile/Name2Type.csv for the expected format.
  • ScoreDictFile: [path/filename, default = ''] Defines the match scores between nodes. See ExampleFile/socrefile.csv for the expected format.

For quantitative calculation:

  • ScoreDictFile: [path/filename *] Defines the match scores between nodes. See ExampleFile/Qscorefile.csv for the expected format.
  • Name2TypeFile & Name2TypeFile2: [path/filename, optional] Files that map tree node names to cell types. See ExampleFile/Name2Type.csv for the expected format.
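For leaf pairs, the example matrix is consistent with a simple rule: two leaves of the same type score mv (a vs a gives 2.0) and leaves of different types score pv (a vs b gives -1.0). The sketch below encodes that rule plus an optional score table for the quantitative case. The two-column CSV layout without a header, using the node name as its type when no mapping is given, the symmetric lookup, and the fallback to pv for missing pairs are all assumptions; load_name2type and leaf_score are hypothetical names.

```python
import csv
import io


def load_name2type(handle):
    """Read an assumed two-column CSV (node name, cell type) with no header row."""
    return {row[0].strip(): row[1].strip() for row in csv.reader(handle) if len(row) >= 2}


def leaf_score(name1, name2, name2type1, name2type2, score_dict=None, mv=2.0, pv=-1.0):
    """Score one leaf of tree 1 against one leaf of tree 2.

    Without a Name2Type entry, the node name itself is used as its type (assumption).
    With score_dict (quantitative mode), look the type pair up in either order and
    fall back to pv when it is missing (assumption); otherwise use mv or pv.
    """
    t1 = name2type1.get(name1, name1)
    t2 = name2type2.get(name2, name2)
    if score_dict:
        return score_dict.get((t1, t2), score_dict.get((t2, t1), pv))
    return mv if t1 == t2 else pv


name2type = load_name2type(io.StringIO("a,TypeA\nb,TypeB\n"))
print(leaf_score("a", "a", name2type, name2type))  # same type -> mv = 2.0
print(leaf_score("a", "b", name2type, name2type))  # different types -> pv = -1.0
```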

P-value calculation

modelta.pvalue(times = 3, 
               topscorelist = example['TopScoreList'], 
               ScoreDictFile='',
               CPUs = 50, 
               mv = 2, 
               pv = -1)

Result

 Pvalue : 100%|██████████| 3/3 [00:00<00:00,  4.05it/s]
 Pvalue : 100%|██████████| 3/3 [00:00<00:00,  4.38it/s]
 Pvalue : 100%|██████████| 3/3 [00:00<00:00,  4.45it/s]
[[3.0, 4.0, 0.0, 14.0], [4.0, 5.0, 3.0, 11.0], [5.0, 0.0, 1.0, 11.0]]

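Each inner list corresponds to one entry of TopScoreList, in the same order: the first times values are the scores obtained on the shuffled trees, and the last value is the original score (14.0, 11.0 and 11.0 above).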
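The returned rows contain raw scores, so a p value still has to be derived from them. The sketch below applies the standard permutation-test estimate (share of shuffled scores at least as large as the original), assuming the row layout described above; empirical_pvalue is a hypothetical helper and modelta may compute its p value differently.

```python
def empirical_pvalue(row, plus_one=True):
    """Permutation p value from one row of modelta.pvalue output (layout assumed).

    row = [shuffled_1, ..., shuffled_n, original]. The p value is the share of
    shuffled scores at least as large as the original. With plus_one=True the
    common (hits + 1) / (n + 1) correction is used, so the result is never 0.
    """
    *shuffled, original = row
    hits = sum(1 for s in shuffled if s >= original)
    if plus_one:
        return (hits + 1) / (len(shuffled) + 1)
    return hits / len(shuffled)


# Output copied from the example above (times = 3).
results = [[3.0, 4.0, 0.0, 14.0], [4.0, 5.0, 3.0, 11.0], [5.0, 0.0, 1.0, 11.0]]
print([empirical_pvalue(r) for r in results])  # [0.25, 0.25, 0.25]
```

With times = 3 the smallest attainable corrected p value is 1/4, so a much larger times is needed for a meaningful test.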

Parameter description

Parameters marked with * are required; all others are optional.

  • times: [int > 0 *] Number of times the leaf labels of the original tree are randomly shuffled; the tree structure stays unchanged. For example, with times = 3:
(((a,b,c),d,(e,f)),a) -> (((a,b,c),d,(e,f)),a)
                      -> (((a,c,d),b,(a,f)),e)
                      -> (((e,f,a),d,(b,c)),a)
  • topscorelist: [example['TopScoreList'] *] The list of top scores obtained earlier from modelta.scoremat.
  • CPUs: [int > 0, default = 50] Number of worker processes; running in parallel greatly reduces the waiting time. The default pool size is 50, but it is limited by local resources to at most the number of local CPU cores minus 1.
  • ScoreDictFile, mv, pv, and notebook: as described above.
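The shuffling described for times can be sketched as a permutation of the leaf labels poured back into the unchanged tree shape. The tree is written as nested Python lists, the helper names are hypothetical, and modelta's own shuffling may differ in details such as which nodes are permuted.

```python
import random


def leaves(tree):
    """List leaf labels left to right; leaves are str, internal nodes are lists."""
    return [tree] if isinstance(tree, str) else [x for child in tree for x in leaves(child)]


def tree_shape(tree):
    """Topology only: None for a leaf, a list of child shapes for an internal node."""
    return None if isinstance(tree, str) else [tree_shape(child) for child in tree]


def relabel(tree, labels):
    """Rebuild a tree with the same shape, filling leaves from labels in order."""
    it = iter(labels)

    def build(node):
        return next(it) if isinstance(node, str) else [build(child) for child in node]

    return build(tree)


def shuffle_leaves(tree, times, seed=None):
    """Return `times` copies of tree with its leaf labels randomly permuted."""
    rng = random.Random(seed)
    names = leaves(tree)
    copies = []
    for _ in range(times):
        perm = names[:]
        rng.shuffle(perm)
        copies.append(relabel(tree, perm))
    return copies


# (((a,b,c),d,(e,f)),a) as nested lists
tree = [[["a", "b", "c"], "d", ["e", "f"]], "a"]
for shuffled in shuffle_leaves(tree, times=3, seed=0):
    print(shuffled)
```

As in the example above, a shuffle can occasionally reproduce the original labeling; every copy keeps the same topology and the same multiset of leaf labels.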

!!! Elif you're not a regular programmer:

Quick Start

We provide executable files that can be run from the terminal with the corresponding parameters.


Citation

If you use this project in your research, please cite this project.

@misc{modelta2022,
    author = {Jingyu Chen},
    title = {MODELTA: Multi fork Development cell lineage tree alignment},
    year = {2022},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/Chenjy0212/modelta}},
}

Introduction

Student of @SYSU. :school:

Undergraduate degree in computer science; master's degree in bioinformatics. :man_technologist:

I hope my program can be helpful to your research. :heart:

The author's contact links are listed at the top of the project README on GitHub. :eyes:




Download files


Source Distribution

modelta-1.0.0.tar.gz (31.9 kB)

Built Distribution

modelta-1.0.0-py3-none-any.whl (24.0 kB, Python 3)
