This package provides functions for converting amino acid sequences into physicochemical vectors and for analyzing the resulting vector sequences.
Project description
PCDTW is a package that converts amino acid sequences into physicochemical vectors and then uses those vectors to align sequences, build consensus vectors that can be used to search databases for similar physicochemical profiles, calculate the DTW (dynamic time warping) distance between two physicochemical vectors, and perform a few other tasks. The methods behind this package are described in three publications, which should be consulted for further background [1–3].
To install PCDTW (two options):
- Use ‘pip install PCDTW’ in a terminal such as a PowerShell prompt
- Use ‘!pip install PCDTW’ in a Jupyter notebook cell
To use PCDTW, import it with ‘import PCDTW’.
Citations
1) Dixson, J.D.; Vumma, L.; Azad, R.K. An Analysis of Combined Molecular Weight and Hydrophobicity Similarity between the Amino Acid Sequences of Spike Protein Receptor Binding Domains of Betacoronaviruses and Functionally Similar Sequences from Other Virus Families. Microorganisms 2024, 12.
2) Dixson, J.D.; Azad, R.K. Physicochemical Evaluation of Remote Homology in the Twilight Zone. Proteins Struct. Funct. Bioinforma. 2024, doi:10.1002/prot.26742.
3) Dixson, J.D.; Azad, R.K. A Novel Predictor of ACE2-Binding Ability among Betacoronaviruses. Evol. Med. Public Heal. 2021, 9, 360–373, doi:10.1093/EMPH/EOAB032.
Usage:
To convert an amino acid sequence x to vector form using two physicochemical properties:
```python
PCDTW.PCDTWConvert(x, PCProp1='Mass', PCProp2='HydroPho', normalize=False)
```
PCProp1/PCProp2 options:
- 'HydroPho'
- 'HydroPhIl'
- 'Hbond'
- 'SideVol'
- 'Polarity'
- 'Polarizability'
- 'SASA'
- 'NCI'
- 'Mass'
Normalization: if normalize=True, the per-amino-acid scalar values of each physicochemical property are max-abs normalized (divided by the largest absolute value of that property) before the amino acid sequence is converted to vector form.
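The property tables and vector layout that PCDTWConvert uses internally are not listed here, so the following is only a minimal sketch of the idea, assuming each residue becomes one (property 1, property 2) pair. The scales below are stand-ins (free amino acid molecular weights in Da and Kyte–Doolittle hydropathy, for four residues only), not the package's values, and `max_abs_normalize` and `to_vectors` are hypothetical helpers. With a full 20-residue table, the normalizing maximum would generally differ from the one in this subset.

```python
# Illustrative stand-in scales for four residues only (NOT PCDTW's internal tables):
# free amino acid molecular weights (Da) and Kyte-Doolittle hydropathy values.
MASS = {"G": 75.07, "A": 89.09, "L": 131.17, "M": 149.21}
HYDRO = {"G": -0.4, "A": 1.8, "L": 3.8, "M": 1.9}


def max_abs_normalize(scale):
    """Divide every value in a property scale by that scale's largest absolute value."""
    peak = max(abs(v) for v in scale.values())
    return {aa: v / peak for aa, v in scale.items()}


def to_vectors(seq, scale1, scale2, normalize=False):
    """Map each residue to a (property 1, property 2) pair, one pair per position."""
    if normalize:
        scale1, scale2 = max_abs_normalize(scale1), max_abs_normalize(scale2)
    return [(scale1[aa], scale2[aa]) for aa in seq.upper()]


print(to_vectors("GALM", MASS, HYDRO))
# [(75.07, -0.4), (89.09, 1.8), (131.17, 3.8), (149.21, 1.9)]
```

With normalize=True, the heaviest residue in the table (M here) maps to 1.0 on the mass axis and the most hydrophobic (L here) to 1.0 on the hydropathy axis.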
To align two amino acid sequences using DTW and two physicochemical properties:
```python
PCDTW.PCDTWAlign(inputseq1str, inputseq2str, PCProp1='Mass', PCProp2='HydroPho', Penalty=0, Window=3)
```
- Window = size of the Sakoe-Chiba band
- Penalty = somewhat equivalent to the mismatch penalty in standard dynamic-programming-based alignment
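To make the Window and Penalty parameters more concrete, here is a plain-Python sketch of banded DTW on sequences of property vectors. It assumes that cells more than Window positions off the diagonal are skipped (a Sakoe-Chiba band) and that the penalty is added to every non-diagonal warping step, which is how some DTW libraries expose a penalty (dtaidistance, for instance, documents a penalty for compression or expansion). Whether PCDTWAlign behaves exactly this way is an assumption, and `banded_dtw` is a hypothetical name.

```python
import math


def banded_dtw(a, b, window=3, penalty=0.0):
    """DTW distance between two sequences of equal-length tuples, within a Sakoe-Chiba band."""
    n, m = len(a), len(b)
    # Widen the band if needed so the final cell (n, m) stays reachable.
    w = max(window, abs(n - m))
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only visit cells within w positions of the diagonal.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            # Squared Euclidean distance between the two residue vectors.
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            cost[i][j] = d + min(
                cost[i - 1][j - 1],        # diagonal: advance both sequences
                cost[i - 1][j] + penalty,  # warp: b[j - 1] is reused
                cost[i][j - 1] + penalty,  # warp: a[i - 1] is reused
            )
    return math.sqrt(cost[n][m])


print(banded_dtw([(0,), (1,)], [(0,), (0,), (1,)], window=3, penalty=1.0))
```

Raising `penalty` makes warping (reusing a residue to stretch one sequence against the other) more expensive, which pushes the optimal path toward the diagonal; shrinking `window` forbids large shifts outright.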
Returns a dictionary with the following keys:
- 'Seq1AlignedString'
- 'Seq2AlignedString'
- 'FullAlignment'
- 'Identity'
- 'ConsensusVector'
Example to get the full alignment and identity:
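```python
import PCDTW

seq1 = "MSDSNQGNNQQNYQQYSQNGNQQQGNNRYQG"
seq2 = "MMNNNGNQVSNLSNALRQVNIGNRNSNTTT"

# Run the alignment once and read both fields from the returned dictionary
result = PCDTW.PCDTWAlign(seq1, seq2)
print(result['FullAlignment'])
print(result['Identity'])
```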
To get the PCDTW distance between two sequences, normalized by the number of amino acids in the alignment:
```python
Dist = PCDTW.PCDTWDist(Seq1, Seq2)
print(Dist)
```
Example to get the distance:
```python
import PCDTW

seq1 = "MSDSNQGNNQQNYQQYSQNGNQQQGNNRYQG"
seq2 = "MMNNNGNQVSNLSNALRQVNIGNRNSNTTT"
print(PCDTW.PCDTWDist(seq1, seq2))
```
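The description above does not spell out how the distance is normalized. A minimal sketch, assuming the raw DTW distance is divided by the number of aligned residue pairs on the optimal warping path, could look like this; `dtw_with_path_length` and `normalized_dtw` are hypothetical names, and the DTW here is unbanded and unpenalized for brevity.

```python
import math


def dtw_with_path_length(a, b):
    """Unbanded DTW between two sequences of property vectors.

    Returns (distance, number of aligned pairs on the optimal warping path).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    steps = [[0] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Squared Euclidean distance between the two residue vectors
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            # Cheapest predecessor cell (ties favor the diagonal move)
            pi, pj = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)],
                         key=lambda c: cost[c[0]][c[1]])
            cost[i][j] = d + cost[pi][pj]
            steps[i][j] = steps[pi][pj] + 1  # one more aligned pair on the path
    return math.sqrt(cost[n][m]), steps[n][m]


def normalized_dtw(a, b):
    """DTW distance divided by the warping-path length (assumed normalization)."""
    dist, length = dtw_with_path_length(a, b)
    return dist / length


print(normalized_dtw([(0,), (0,)], [(3,), (3,)]))
```

Dividing by path length keeps long alignments from scoring as "more distant" simply because they contain more residues, which is what makes distances between sequence pairs of different lengths comparable.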
To get synthetically evolved homologs for an input sequence:
```python
SynHomologs = PCDTW.PCEvolve(Seq='GALM', PCProp1='Mass', PCProp2='HydroPho', BaseName='ProtX')
print(SynHomologs)
```
PCProp1/PCProp2 options:
- 'HydroPho'
- 'HydroPhIl'
- 'Hbond'
- 'SideVol'
- 'Polarity'
- 'Polarizability'
- 'SASA'
- 'NCI'
- 'Mass'
To get a Newick-format tree, built with PCDTW, that represents the physicochemical similarity of protein sequences:
```python
Newick = PCDTW.PCDTWTree(FastaFile='Your_File_Location.fasta', PCProp1='Mass', PCProp2='HydroPho')
print(Newick)
```
PCProp1/PCProp2 options:
- 'HydroPho'
- 'HydroPhIl'
- 'Hbond'
- 'SideVol'
- 'Polarity'
- 'Polarizability'
- 'SASA'
- 'NCI'
- 'Mass'
This function is derived from the original algorithm used in Dixson and Azad, 2021 [3]. Unlike the original algorithm, it allows the two physicochemical properties to be any two of the nine listed above. If PCProp1 and PCProp2 are not specified, they default to mass and hydrophobicity. The hydrophobicity values used in this package differ slightly from those used in the original algorithm.
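How PCDTWTree turns sequences into a tree is not described beyond its origin in Dixson and Azad, 2021. As an illustration only, the sketch below assumes a symmetric matrix of pairwise distances (for example, values from PCDTW.PCDTWDist) is clustered with UPGMA and written out as a Newick string; UPGMA is a swapped-in, standard clustering method here, and `upgma_newick` is a hypothetical helper, not part of the package.

```python
def upgma_newick(names, matrix):
    """Cluster a symmetric distance matrix with UPGMA and return a Newick string."""
    # Active clusters: id -> [Newick text, number of leaves, height of the node]
    clusters = {i: [name, 1, 0.0] for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id)
    dist = {(i, j): matrix[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters
        (i, j), dij = min(dist.items(), key=lambda kv: kv[1])
        del dist[(i, j)]
        (ti, ni, hi), (tj, nj, hj) = clusters.pop(i), clusters.pop(j)
        h = dij / 2  # UPGMA places the new node at half the merge distance
        text = f"({ti}:{h - hi:.4g},{tj}:{h - hj:.4g})"
        # Distance to each remaining cluster: leaf-count-weighted average
        for k in clusters:
            dik = dist.pop((min(i, k), max(i, k)))
            djk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = [text, ni + nj, h]
        next_id += 1
    [(text, _, _)] = clusters.values()
    return text + ";"


# Hypothetical labels and made-up distances, for illustration only
print(upgma_newick(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]]))
# (C:3,(A:1,B:1):2);
```

Bio.Phylo, which PCDTW lists as a dependency, also provides `DistanceTreeConstructor` in `Bio.Phylo.TreeConstruction`, with `upgma` and `nj` methods, for the same job.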
Dependency Citations:
Bio.Phylo:
Talevich, E., Invergo, B.M., Cock, P.J.A., & Chapman, B.A. (2012). Bio.Phylo: A unified toolkit for processing, analyzing and visualizing phylogenetic trees in Biopython. BMC Bioinformatics, 13, 209.
Biopython:
Cock, P.J.A., et al. (2009). Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25(11), 1422–1423. https://doi.org/10.1093/bioinformatics/btp163
dtaidistance:
Meert, W., Hendrickx, K., Van Craenendonck, T., Robberechts, P., Blockeel, H., & Davis, J. (2022). DTAIDistance (Version v2). Zenodo. https://doi.org/10.5281/zenodo.5901139
Matplotlib:
Hunter, J.D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3), 90–95.
numpy:
Harris, C.R., Millman, K.J., van der Walt, S.J., et al. (2020). Array programming with NumPy. Nature, 585, 357–362. https://doi.org/10.1038/s41586-020-2649-2
pandas:
McKinney, W. (2010). Data Structures for Statistical Computing in Python. Proceedings of the 9th Python in Science Conference (SciPy 2010).
SciPy:
Virtanen, P., Gommers, R., Oliphant, T.E., et al., and SciPy 1.0 Contributors. (2020). SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17(3), 261–272.
File details
Details for the file pcdtw-0.3.0.tar.gz
File metadata
- Download URL: pcdtw-0.3.0.tar.gz
- Upload date:
- Size: 9.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | d903f537adb7f600c4dde2f68e11b99a98385f75c0ea0b27d2205ccb571c496b
MD5 | 40a8d77ae676d68e37858e0b2a81eda7
BLAKE2b-256 | c5dec6c1bab8cb5a2de8db4851a127cecb9deb99972a3057adaa0c7c73a2db05
File details
Details for the file pcdtw-0.3.0-py3-none-any.whl
File metadata
- Download URL: pcdtw-0.3.0-py3-none-any.whl
- Upload date:
- Size: 10.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5939b4477dc026148f6f70e5521f07c62b95c605f0fd8ccca63867bda8a324d4
MD5 | 4be4d8a69d642b023699123a9a2485fd
BLAKE2b-256 | 8cd9163e6e8139cd3bfa63a8aedb4f6fde0ef03fcd82bc24d5bd7ec868d66265