
This package has functions for the conversion of amino acid sequences to physicochemical vectors and the subsequent analysis of those vector sequences.

Project description

PCDTW is a package that converts amino acid sequences to physicochemical vectors and then works with those vectors. It can align sequences based on their vectors, build consensus vectors that can be used to search databases for similar physicochemical profiles, and compute the DTW distance between two physicochemical vectors, along with a few other functions. The basis for this package is described in three publications, which should be consulted for further background [1–3].

To install PCDTW (two options):

    • Use 'pip install PCDTW' in a PowerShell prompt
    • Use '! pip install PCDTW' in a Jupyter notebook

To use PCDTW: Use 'import PCDTW'

Citations

1) Dixson, J.D.; Vumma, L.; Azad, R.K. An Analysis of Combined Molecular Weight and Hydrophobicity Similarity between the Amino Acid Sequences of Spike Protein Receptor Binding Domains of Betacoronaviruses and Functionally Similar Sequences from Other Virus Families. Microorganisms 2024, 12.

2) Dixson, J.D.; Azad, R.K. Physicochemical Evaluation of Remote Homology in the Twilight Zone. Proteins Struct. Funct. Bioinforma. 2024, n/a, doi:10.1002/prot.26742.

3) Dixson, J.D.; Azad, R.K. A Novel Predictor of ACE2-Binding Ability among Betacoronaviruses. Evol. Med. Public Heal. 2021, 9, 360–373, doi:10.1093/EMPH/EOAB032.

Usage:

  1. To convert an amino acid sequence to vector form using two physicochemical properties:

    PCDTWConvert(x, PCProp1='Mass', PCProp2='HydroPho', normalize=False)
    

    PCProp1/PCProp2 options:

    • 'HydroPho'
    • 'HydroPhIl'
    • 'Hbond'
    • 'SideVol'
    • 'Polarity'
    • 'Polarizability'
    • 'SASA'
    • 'NCI'
    • 'Mass'

    Normalization: If normalize is set to True, the physicochemical scalar values of the amino acids are absolute-maximum normalized (each value is divided by the largest absolute value of that property) before the amino acid sequence is converted to vector form.
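
The conversion and normalization described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the package's implementation: the helper names `abs_max_normalize` and `to_vectors` are hypothetical, the lookup tables cover only five amino acids, and their values (rounded residue masses and Kyte-Doolittle hydropathy) are stand-ins that may differ from the tables PCDTW uses. The actual return type of PCDTWConvert is not documented here, so a list of pairs is assumed.

```python
# Illustrative lookup tables (assumption: NOT the package's own tables).
MASS = {"A": 71.08, "G": 57.05, "I": 113.16, "R": 156.19, "W": 186.21}
HYDRO = {"A": 1.8, "G": -0.4, "I": 4.5, "R": -4.5, "W": -0.9}


def abs_max_normalize(table):
    """Divide every value by the largest absolute value in the table."""
    peak = max(abs(v) for v in table.values())
    return {aa: v / peak for aa, v in table.items()}


def to_vectors(seq, prop1, prop2, normalize=False):
    """Map each residue to a (prop1, prop2) pair (hypothetical helper)."""
    if normalize:
        prop1 = abs_max_normalize(prop1)
        prop2 = abs_max_normalize(prop2)
    return [(prop1[aa], prop2[aa]) for aa in seq]


raw = to_vectors("AIRW", MASS, HYDRO)
scaled = to_vectors("AIRW", MASS, HYDRO, normalize=True)
print(raw)
print(scaled)
```

After normalization every value lies in the range -1 to 1, so neither property dominates the other simply because of its units.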

  2. To align two amino acid sequences using DTW and two physicochemical properties:

    PCDTWAlign(inputseq1str, inputseq2str, PCProp1='Mass', PCProp2='HydroPho', Penalty=0, Window=3)
    
    • Window = size of the Sakoe-Chiba band
    • Penalty = roughly equivalent to the mismatch penalty in standard dynamic-programming-based alignment

    Returns a dictionary containing the following values:

    • 'Seq1AlignedString'
    • 'Seq2AlignedString'
    • 'FullAlignment'
    • 'Identity'
    • 'ConsensusVector'

    Example to get the full alignment and identity:

    seq1 = "MSDSNQGNNQQNYQQYSQNGNQQQGNNRYQG"
    seq2 = "MMNNNGNQVSNLSNALRQVNIGNRNSNTTT"
    result = PCDTWAlign(seq1, seq2)  # align once and reuse the returned dictionary
    print(result['FullAlignment'])
    print(result['Identity'])
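
To make the roles of the band size and the penalty more concrete, here is a minimal pure-Python sketch of DTW over two-dimensional vectors with a Sakoe-Chiba band. It is an illustration under assumptions, not the code PCDTWAlign runs: the function name `banded_dtw` is hypothetical, the local cost is taken to be the Euclidean distance between vectors, and the penalty is assumed to be added whenever the warping path makes a non-diagonal step.

```python
import math


def banded_dtw(a, b, window=3, penalty=0.0):
    """DTW cost between two lists of equal-dimension vectors.

    Only cells with |i - j| <= window are evaluated (Sakoe-Chiba band).
    The penalty is added to every non-diagonal (stretching) step.
    """
    n, m = len(a), len(b)
    # Widen the band if the lengths differ, so the final cell stays reachable.
    w = max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],        # diagonal step, no penalty
                cost[i - 1][j] + penalty,  # stretch b
                cost[i][j - 1] + penalty,  # stretch a
            )
    return cost[n][m]


v1 = [(0.0, 0.0), (1.0, 0.0)]
v2 = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0)]
print(banded_dtw(v1, v2, penalty=0.0))
print(banded_dtw(v1, v2, penalty=0.5))
```

A narrower band keeps the path close to the diagonal, and a larger penalty discourages matching one position against several positions in the other sequence.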
    
  3. To get the PCDTW distance between two sequences, normalized to the number of amino acids in the alignment:

    Dist=PCDTWDist(Seq1, Seq2)
    print(Dist)
    

    Example to get the distance:

    seq1 = "MSDSNQGNNQQNYQQYSQNGNQQQGNNRYQG"
    seq2 = "MMNNNGNQVSNLSNALRQVNIGNRNSNTTT"
    print(PCDTWDist(seq1, seq2))
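
One way to read "normalized to the number of amino acids in the alignment" is to divide the accumulated DTW cost by the length of the warping path, that is, the number of aligned positions. The sketch below shows that idea in plain Python. It is an assumption about the normalization, not a description of how PCDTWDist is implemented, and the names `dtw_with_path` and `normalized_dtw` are hypothetical.

```python
import math


def dtw_with_path(a, b):
    """Return the unconstrained DTW cost and warping path for two vector lists."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )
    # Backtrack from the final cell, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


def normalized_dtw(a, b):
    """Accumulated cost divided by the number of aligned positions."""
    total, path = dtw_with_path(a, b)
    return total / len(path)


print(normalized_dtw([(0.0, 0.0), (0.0, 0.0)], [(3.0, 4.0)]))
```

Dividing by the path length makes distances from alignments of different lengths more comparable than the raw accumulated cost.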
    

Dependency Citations:

dtaidistance: Wannes Meert, Kilian Hendrickx, Toon Van Craenendonck, Pieter Robberechts, Hendrik Blockeel, & Jesse Davis. (2022). DTAIDistance (Version v2). Zenodo. http://doi.org/10.5281/zenodo.5901139

numpy: Harris, C.R., Millman, K.J., van der Walt, S.J. et al. (2020). Array programming with NumPy. Nature 585, 357–362. DOI: 10.1038/s41586-020-2649-2.

pandas: McKinney, W. (2010). Data Structures for Statistical Computing in Python. Proceedings of the 9th Python in Science Conference (SciPy 2010).
