Chen and Yang Lab: multifurcating developmental cell lineage tree alignment

wechat

mDELTA

Yang Lab Multifurcating Developmental cEll Lineage Tree Alignment (mDELTA): algorithm package and executable program.
It computes a score matrix that you can use to analyze the relationships between nodes of two lineage trees, or to test the correlation between them.
If you find the project helpful, you can star the repository to keep track of it. Thank you for your support.

Install

Required packages

pandas: Provides the DataFrame on which the score matrix is built.
numpy: Core numerical computing.
munkres: Hungarian (Kuhn-Munkres) algorithm, used to find the maximum-score matching in the score matrix during dynamic programming.

Optional packages

tqdm: Displays a progress bar during the calculation phase.
multiprocess: Speeds up the p-value calculation, which shuffles the original sequence many times and recomputes the scores each time, by spreading the work over multiple processes.

Pip install

$ pip install modelta

Source code install

(1) Offline
Step 1: $ git clone https://github.com/Chenjy0212/modelta.git
Step 2: $ cd modelta
        $ python setup.py install
(2) Online
$ pip install git+https://github.com/Chenjy0212/modelta.git@main

For Python users ↓

Quick Start

You can use this package in your Python code. For example, in a Jupyter notebook:

import modelta
from pprint import pprint

# Align the example tree against itself and keep the three best local alignments.
example = modelta.scoremat(
    TreeSeqFile='ExampleFile/tree.nwk',
    TreeSeqFile2='ExampleFile/tree.nwk',
    Name2TypeFile='ExampleFile/Name2Type.csv',
    Name2TypeFile2='ExampleFile/Name2Type.csv',
    top=3,
    notebook=1,  # running inside a Jupyter notebook
    overlap=5,
)
pprint(example)

Result

Matrix Node: |██████████| 121/121 100%
121/121 [00:00<00:00, 2573.11it/s]
{'TopScoreList': [{'Root1_label': 'root',
                   'Root1_match': ['0',
                                   '1',
                                   '0,0',
                                   '0,1',
                                   '0,2',
                                   '0,0,0',
                                   '0,0,1',
                                   '0,0,2',
                                   '0,2,0',
                                   '0,2,1'],
                   'Root1_node': '(((a,b,c),d,(e,f)),a)',
                   'Root1_prune': [],
                   'Root1_seq': '(((a1,a2,a3),a4,(a5,a6)),a1)',
                   'Root2_label': 'root',
                   'Root2_match': ['0',
                                   '1',
                                   '0,0',
                                   '0,1',
                                   '0,2',
                                   '0,0,0',
                                   '0,0,1',
                                   '0,0,2',
                                   '0,2,0',
                                   '0,2,1'],
                   'Root2_node': '(((a,b,c),d,(e,f)),a)',
                   'Root2_prune': [],
                   'Root2_seq': '(((a1,a2,a3),a4,(a5,a6)),a1)',
                   'Score': 14.0,
                   'col': 10,
                   'row': 10},
                  {'Root1_label': '0',
                   'Root1_match': ['0',
                                   '0,0',
                                   '0,1',
                                   '0,2',
                                   '0,0,0',
                                   '0,0,1',
                                   '0,0,2',
                                   '0,2,0',
                                   '0,2,1'],
                   'Root1_node': '((a,b,c),d,(e,f))',
                   'Root1_prune': ['1'],
                   'Root1_seq': '((a1,a2,a3),a4,(a5,a6))',
                   'Root2_label': 'root',
                   'Root2_match': ['0',
                                   '0,0',
                                   '0,1',
                                   '0,2',
                                   '0,0,0',
                                   '0,0,1',
                                   '0,0,2',
                                   '0,2,0',
                                   '0,2,1'],
                   'Root2_node': '(((a,b,c),d,(e,f)),a)',
                   'Root2_prune': ['1'],
                   'Root2_seq': '(((a1,a2,a3),a4,(a5,a6)),a1)',
                   'Score': 11.0,
                   'col': 10,
                   'row': 9},
                  {'Root1_label': '0,0',
                   'Root1_match': ['0,0', '0,0,0', '0,0,1', '0,0,2'],
                   'Root1_node': '(a,b,c)',
                   'Root1_prune': ['0,1', '0,2,0', '0,2,1'],
                   'Root1_seq': '(a1,a2,a3)',
                   'Root2_label': '0',
                   'Root2_match': ['0,0', '0,0,0', '0,0,1', '0,0,2'],
                   'Root2_node': '((a,b,c),d,(e,f))',
                   'Root2_prune': ['0,1', '0,2,0', '0,2,1', '1'],
                   'Root2_seq': '((a1,a2,a3),a4,(a5,a6))',
                   'Score': 3.0,
                   'col': 9,
                   'row': 7}],
 'matrix': Root2  0,0,0  0,0,1  0,0,2  0,1  0,2,0  0,2,1    1  0,0  0,2     0  root
Root1                                                                   
0,0,0    2.0   -1.0   -1.0 -1.0   -1.0   -1.0  2.0  0.0 -1.0  -1.0  -1.0
0,0,1   -1.0    2.0   -1.0 -1.0   -1.0   -1.0 -1.0  0.0 -1.0  -1.0  -1.0
0,0,2   -1.0   -1.0    2.0 -1.0   -1.0   -1.0 -1.0  0.0 -1.0  -1.0  -1.0
0,1     -1.0   -1.0   -1.0  2.0   -1.0   -1.0 -1.0 -1.0 -1.0  -1.0  -1.0
0,2,0   -1.0   -1.0   -1.0 -1.0    2.0   -1.0 -1.0 -1.0  1.0  -1.0  -1.0
0,2,1   -1.0   -1.0   -1.0 -1.0   -1.0    2.0 -1.0 -1.0  1.0  -1.0  -1.0
1        2.0   -1.0   -1.0 -1.0   -1.0   -1.0  2.0  0.0 -1.0  -1.0  -1.0
0,0      0.0    0.0    0.0 -1.0   -1.0   -1.0  0.0  6.0 -2.0   3.0   2.0
0,2     -1.0   -1.0   -1.0 -1.0    1.0    1.0 -1.0 -2.0  4.0   0.0  -1.0
0       -1.0   -1.0   -1.0 -1.0   -1.0   -1.0 -1.0  3.0  0.0  12.0  11.0
root    -1.0   -1.0   -1.0 -1.0   -1.0   -1.0 -1.0  2.0 -1.0  11.0  14.0}
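
The node labels in the result ('root', '0', '0,0', ...) appear to encode the path of child indices from the root: in the example above, '0' maps to ((a,b,c),d,(e,f)) and '0,0' maps to (a,b,c). The sketch below reproduces that labelling for a Newick-style string without branch lengths. It is inferred from the example output only and is not the package's own code; parse_tree, to_text and label_subtrees are hypothetical helper names.

```python
def parse_tree(text):
    """Parse a Newick-style string without branch lengths into nested lists."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            pos += 1  # skip ")"
            return children
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos]  # a leaf name

    return parse_node()


def to_text(node):
    """Turn a nested-list tree back into its Newick-style string."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_text(child) for child in node) + ")"


def label_subtrees(node, label="root"):
    """Map each node label (path of child indices) to its subtree string."""
    labels = {label: to_text(node)}
    if not isinstance(node, str):
        for index, child in enumerate(node):
            child_label = str(index) if label == "root" else f"{label},{index}"
            labels.update(label_subtrees(child, child_label))
    return labels


labels = label_subtrees(parse_tree("(((a,b,c),d,(e,f)),a)"))
for name, subtree in labels.items():
    print(name, subtree)
```

The example tree yields 11 labelled nodes, which matches the 11 x 11 score matrix (121 cells) shown in the result.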

Parameter analysis

If the parameter has an *, it is required; otherwise, it is optional

TreeSeqFile & TreeSeqFile2: [path/filename *] Cell lineage tree files with branch length information removed. For the expected format, see ExampleFile/tree.nwk
mv: [float, default = 2.] The match score between identical nodes; mainly used when ScoreDictFile is left at its default.
pv: [float, default = -1.] The prune score, applied between nodes that do not match.
top: [int > 0, default = 0] The number of top-ranked meaningful scores to select from the score matrix. With the default, the output looks like this:

{'T1root_T2root': [{'Root1_label': 'root',
                    'Root1_match': ['0',
                                    '1',
                                    '0,0',
                                    '0,1',
                                    '0,2',
                                    '0,0,0',
                                    '0,0,1',
                                    '0,0,2',
                                    '0,2,0',
                                    '0,2,1'],
                    'Root1_node': '(((a,b,c),d,(e,f)),a)',
                    'Root1_prune': [],
                    'Root1_seq': '(((a1,a2,a3),a4,(a5,a6)),a1)',
                    'Root2_label': 'root',
                    'Root2_match': ['0',
                                    '1',
                                    '0,0',
                                    '0,1',
                                    '0,2',
                                    '0,0,0',
                                    '0,0,1',
                                    '0,0,2',
                                    '0,2,0',
                                    '0,2,1'],
                    'Root2_node': '(((a,b,c),d,(e,f)),a)',
                    'Root2_prune': [],
                    'Root2_seq': '(((a1,a2,a3),a4,(a5,a6)),a1)',
                    'Score': 14.0,
                    'col': 10,
                    'row': 10}],
                    
                    ......

}

notebook: [bool, default = False] Whether the code is running in a Jupyter notebook environment.
Tqdm: [bool, default = True] Whether to display the progress bar.
overlap: [int > 0, default = 0] For local results, a later alignment is excluded if X% or more of its node pairs duplicate those of earlier results.
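
To make mv and pv concrete: the three scores in the Quick Start result (14, 11 and 3) are consistent with adding mv for every matched leaf and pv for every pruned leaf. The sketch below only checks that arithmetic with the default values. This reading is inferred from the example output; alignment_score is a hypothetical helper, and the package's actual dynamic programming is not reproduced here.

```python
def alignment_score(matched_leaves, pruned_leaves, mv=2.0, pv=-1.0):
    """Score assumed to be mv per matched leaf plus pv per pruned leaf."""
    return matched_leaves * mv + pruned_leaves * pv


# root vs root: all 7 leaves match, nothing is pruned
print(alignment_score(7, 0))  # 14.0
# node 0 vs root: 6 leaves match, the extra leaf under the root is pruned
print(alignment_score(6, 1))  # 11.0
# node 0,0 vs node 0: a, b, c match; d, e, f are pruned
print(alignment_score(3, 3))  # 3.0
```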

For qualitative calculation:

Name2TypeFile & Name2TypeFile2: [path/filename *] Map tree node names to types. For the expected format, see ExampleFile/Name2Type.csv
ScoreDictFile: [path/filename, default = ''] Defines the match scores between nodes. For the expected format, see ExampleFile/socrefile.csv

The match score between nodes is taken from the "ScoreDictFile" file.
If no file is given, only identical nodes are paired, and the match score defaults to 2 (float).

node: a <-> a = 2.(custom)
      b <-> b = 3.(custom)
      a <-> b = ?(custom)
The higher the score, the stronger the similarity
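
A minimal sketch of such a lookup, assuming the custom scores have already been loaded into a dictionary keyed by pairs of types. The layout of the score file is not described here, so the dictionary, the value 1.0 for a <-> b, the fallback to mv for identical types missing from the dictionary, and the helper name match_score are all assumptions made for illustration.

```python
def match_score(type_a, type_b, score_dict=None, mv=2.0):
    """Return the match score for two node types, or None if they are not paired."""
    score_dict = score_dict or {}
    # Treat scores as symmetric: (a, b) and (b, a) share one entry.
    for key in ((type_a, type_b), (type_b, type_a)):
        if key in score_dict:
            return score_dict[key]
    # Without a custom entry, only identical types are paired (assumed fallback).
    if type_a == type_b:
        return mv
    return None


# Hypothetical custom scores mirroring the a/b example above.
custom = {("a", "a"): 2.0, ("b", "b"): 3.0, ("a", "b"): 1.0}

print(match_score("b", "a", custom))  # custom score, order does not matter
print(match_score("a", "a"))          # default mv when no file is given
print(match_score("a", "b"))          # None: different types are not paired
```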

For quantitative calculation:

ScoreDictFile: [path/filename *] Defines the values used to score matches between nodes. For the expected format, see ExampleFile/Qscorefile.csv
Name2TypeFile & Name2TypeFile2: [path/filename, or omitted] Map tree node names to types. For the expected format, see ExampleFile/Name2Type.csv

The match score between nodes is computed from the "ScoreDictFile" file.
The file is required. You can change the score for identical nodes through the parameter "mv".

   Gene0  Gene1  Gene2  
a    1      2      3  
b    2      3      4

node: a <-> b = (1-2)**2 + (2-3)**2 + (3-4)**2  # sum of squared differences (squared Euclidean distance)
The final score is then obtained from this distance with a smoothing function.
The lower the score, the stronger the similarity
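
The distance step for the table above can be checked with a few lines of Python. This sketch covers only the sum of squared differences; the smoothing function is not specified here, so it is left out. The names expression and squared_distance are illustrative.

```python
# Rows of the example table: one list of gene values per node.
expression = {"a": [1, 2, 3], "b": [2, 3, 4]}


def squared_distance(x, y):
    """Sum of squared differences between two equally long value lists."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


distance = squared_distance(expression["a"], expression["b"])
print(distance)         # 3
print(distance ** 0.5)  # the Euclidean distance itself is the square root
```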

P-value calculation

modelta.pvalue(times = 3, 
               topscorelist = example['TopScoreList'], 
               ScoreDictFile='',
               CPUs = 50, 
               mv = 2, 
               pv = -1)

Result

 Pvalue : 100%|██████████| 3/3 [00:00<00:00,  4.05it/s]
 Pvalue : 100%|██████████| 3/3 [00:00<00:00,  4.38it/s]
 Pvalue : 100%|██████████| 3/3 [00:00<00:00,  4.45it/s]
[[3.0, 4.0, 0.0, 14.0], [4.0, 5.0, 3.0, 11.0], [5.0, 0.0, 1.0, 11.0]]

For each of the top-scoring alignments, the returned result holds the matching scores obtained from the "times" shuffled trees.
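
One common way to turn shuffled scores into an empirical p-value is to count how many of them reach the observed score. The sketch below shows that calculation under stated assumptions: the numbers are illustrative and are not an interpretation of the list returned by modelta.pvalue, and empirical_p_value is a hypothetical helper, not a function of the package.

```python
def empirical_p_value(observed, permuted_scores):
    """Share of permuted scores at least as large as the observed score.

    Adding 1 to numerator and denominator avoids reporting a p-value of 0
    when only a few permutations were run.
    """
    at_least = sum(1 for score in permuted_scores if score >= observed)
    return (at_least + 1) / (len(permuted_scores) + 1)


# Illustrative values: one observed score and three scores from shuffled trees.
print(empirical_p_value(14.0, [3.0, 4.0, 0.0]))  # 0.25
```

With only three shuffles the smallest possible value is 0.25, so a much larger "times" is needed for a meaningful test.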

Parameter analysis

If the parameter has an *, it is required; otherwise, it is optional

times: [int > 0 *] The number of times the original sequence is shuffled, for example:

times = 3  # Randomly shuffle the node labels; the tree structure stays unchanged
(((a,b,c),d,(e,f)),a) -> (((a,b,c),d,(e,f)),a)
                      -> (((a,c,d),b,(a,f)),e)
                      -> (((e,f,a),d,(b,c)),a)

topscorelist: [example['TopScoreList'] *] The list of top scores obtained earlier from modelta.scoremat.
CPUs: [int > 0, default = 50] The number of worker processes. Multiprocessing can greatly reduce the waiting time. The default pool size is 50, but it is limited by local resources to at most the number of local CPU cores minus 1.
The mv, pv, notebook, Tqdm and overlap parameters are described above.
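
The shuffling described for "times" can be sketched as a permutation of the leaf labels inside the tree string, leaving every bracket and comma in place. This is an illustration of the idea, not the package's implementation; shuffle_leaves is a hypothetical helper, and it assumes trees without internal node names or branch lengths.

```python
import random
import re

# A leaf name is any run of characters that is not a bracket, comma,
# semicolon or whitespace.
LEAF = re.compile(r"[^(),;\s]+")


def shuffle_leaves(tree_text, rng):
    """Return the tree string with its leaf labels randomly permuted."""
    leaves = LEAF.findall(tree_text)
    rng.shuffle(leaves)
    replacements = iter(leaves)
    # Put the shuffled labels back, one per original leaf position.
    return LEAF.sub(lambda _match: next(replacements), tree_text)


rng = random.Random(0)  # fixed seed so the run is repeatable
original = "(((a,b,c),d,(e,f)),a)"
for _ in range(3):
    print(shuffle_leaves(original, rng))
```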

For ordinary users ↓

Quick Start

We provide executable files that are run from the terminal with the corresponding parameters. Download the executable for your operating system: [Windows] / [Linux]

Windows

mDELTA.exe ./ExampleFile/tree.nwk ./ExampleFile/tree.nwk -t 3

Linux

./mDELTA ../ExampleFile/tree.nwk ../ExampleFile/tree.nwk -t 3
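
If you want to script such runs, the command line can be assembled in Python and passed to subprocess. The sketch below only builds the argument list; the flags -t, -P and -c are taken from the help text of the executable, while build_command and its keyword names are hypothetical.

```python
def build_command(executable, tree1, tree2, top=None, pvalue=None, cpus=None):
    """Assemble the argument list for one mDELTA run."""
    command = [executable, tree1, tree2]
    for flag, value in (("-t", top), ("-P", pvalue), ("-c", cpus)):
        if value is not None:
            command += [flag, str(value)]
    return command


command = build_command(
    "./mDELTA", "../ExampleFile/tree.nwk", "../ExampleFile/tree.nwk", top=3
)
print(command)

# To actually run it where the executable is available:
# import subprocess
# subprocess.run(command, check=True)
```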

Help

Windows: $ mDELTA.exe -h
Linux:   $ ./mDELTA -h

usage: MODELTA [-h] [-nt NAME2TYPEFILE] [-nt2 NAME2TYPEFILE2] [-sd SCOREDICTFILE] [-t TOP] [-m MV] [-p PV] [-T TQDM] [-n NOTEBOOK]
               [-P PVALUE] [-a ALG] [-c CPUS]
               TreeSeqFile TreeSeqFile2

Multi fork Development cell lineage tree alignment

positional arguments:
  TreeSeqFile           [path/filename] Cell lineage tree file with branch length information removed.
  TreeSeqFile2          [path/filename] Cell lineage tree file with branch length information removed.

optional arguments:
  -h, --help            show this help message and exit
  -nt NAME2TYPEFILE, --Name2TypeFile NAME2TYPEFILE
                        [path/filename] Convert tree node name to type.
  -nt2 NAME2TYPEFILE2, --Name2TypeFile2 NAME2TYPEFILE2
                        [path/filename] Convert tree node name to type.
  -sd SCOREDICTFILE, --ScoreDictFile SCOREDICTFILE
                        [path/filename] Defines the score of matches between types.
  -t TOP, --top TOP     [int > 0] Select the top few meaningful scores in the score matrix.
  -m MV, --mv MV        [float] The matching score between the same nodes.
  -p PV, --pv PV        [float] The prune score between the different nodes.
  -T TQDM, --Tqdm TQDM  [0(off) or 1(on)] Whether to display the operation progress bar.
  -n NOTEBOOK, --notebook NOTEBOOK
                        [0(off) or 1(on)] Is it written and run in the jupyter notebook environment.
  -P PVALUE, --Pvalue PVALUE
                        [int > 0] The number of times the original sequence needs to be disrupted.
  -a ALG, --Alg ALG     [KM / GA] Represent KM algorithm and GA algorithm respectively to find the maximum value of each node of
                        the score matrix
  -c CPUS, --CPUs CPUS  [int > 0] Multi process computing can greatly reduce the waiting time. The default process pool is 50, but
                        limited by local computer resources, it can reach the maximum number of local CPU cores - 1.
  -x overlap, --overlap overlap
                        [int > 0] 


Developer: Yang Lab(https://www.labxing.com/profile/10413), Details: https://github.com/Chenjy0212/modelta

Citation

If you use this project in your research, please cite this project.

@misc{modelta2022,
    author = {Jingyu Chen},
    title = {mDELTA: Multifuricating Developmental cEll Lineage Tree Alignment},
    year = {2022},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/Chenjy0212/modelta}},
}

Introduction

Student at @SYSU.

Undergraduate major in computer science, master's major in bioinformatics.

I hope my program can be helpful to your research.

Contact details for the author are given at the top.

Update

2022-05-25

Added internal node correspondence. The output now includes Root_match and Root_prune.

Added a new parameter, -x / --overlap. If the value is x, then in the local results a later alignment cannot have x% or more of its node pairs duplicating those of earlier results.
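
One possible reading of the overlap rule, sketched in Python: walk through the candidate alignments from best to worst and drop any whose share of already reported node pairs reaches the threshold. This is an interpretation of the description above, not the package's code; filter_by_overlap is a hypothetical helper, the node pairs are made up, and the sketch assumes a threshold greater than 0.

```python
def filter_by_overlap(candidates, overlap_percent):
    """Keep candidates whose share of already-seen node pairs is below the threshold.

    candidates: iterable of collections of (node1, node2) pairs, best first.
    overlap_percent: threshold in percent, assumed to be greater than 0.
    """
    kept = []
    seen_pairs = set()
    for pairs in candidates:
        pairs = set(pairs)
        if not pairs:
            continue
        duplicated = 100.0 * len(pairs & seen_pairs) / len(pairs)
        if duplicated >= overlap_percent:
            continue  # too similar to an earlier result
        kept.append(pairs)
        seen_pairs |= pairs
    return kept


best = {("0", "0"), ("0,0", "0,0"), ("0,1", "0,1"), ("1", "1")}
nested = {("0", "0"), ("0,0", "0,0"), ("0,1", "0,1")}  # fully contained in best
separate = {("0,2", "0,2"), ("0,2,0", "0,2,0")}        # shares nothing with best

kept = filter_by_overlap([best, nested, separate], overlap_percent=5)
print(len(kept))  # 2
```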

Release history

1.0.30  Aug 1, 2022
1.0.28  Jul 28, 2022
1.0.27  Jul 27, 2022
1.0.25  Jul 25, 2022
1.0.24  Jul 25, 2022
1.0.23  Jul 25, 2022
1.0.22  Jul 25, 2022
1.0.21  Jul 25, 2022
1.0.20  Jul 25, 2022
1.0.19  Jul 25, 2022
1.0.18  Jul 3, 2022
1.0.17  Jul 3, 2022
1.0.16  Jul 3, 2022
1.0.15  Jul 3, 2022
1.0.14  Jul 3, 2022
1.0.13  Jul 3, 2022
1.0.12  Jul 3, 2022
1.0.11  Jul 3, 2022
1.0.10  Jun 18, 2022
1.0.9   Jun 18, 2022
1.0.8   Jun 18, 2022
1.0.7   Jun 18, 2022
1.0.6   Jun 18, 2022
1.0.5   Jun 18, 2022
1.0.4   Jun 17, 2022
1.0.3   Jun 17, 2022 (this version)
1.0.2   May 24, 2022
1.0.1   May 18, 2022
1.0.0   May 6, 2022

Download files

Source Distribution

modelta-1.0.3.tar.gz (39.7 kB), uploaded Jun 17, 2022, Source

Built Distribution

modelta-1.0.3-py3-none-any.whl (31.9 kB), uploaded Jun 17, 2022, Python 3

Hashes for modelta-1.0.3.tar.gz

Algorithm	Hash digest
SHA256	`1e67cebaeddd71a2aaca7cc8837a875dd48954cc67beeee6ba37006ae2ab88b5`
MD5	`d5b2377d8926546859e66c87b68c5ea5`
BLAKE2b-256	`8a6b3e43af50942f0e70bddcf57c20d26e584bf84fe31203bc211d492c9a8db1`

Hashes for modelta-1.0.3-py3-none-any.whl

Algorithm	Hash digest
SHA256	`551e4fa13cd6e982f9862ae8e1bd12a0c701c42cdabbe04da37600733c7b49af`
MD5	`7cb7895200dbc21f73ac960039f0cbe5`
BLAKE2b-256	`fb5a96c3e7c76b3a04fc61dfe7d3ea40bbc86c55957455ad23b18727c63d3b00`
