pairk

motif conservation in IDRs through pairwise k-mer alignment

This work was supported by the National Institutes of Health under Award Number R35GM149227. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Repository: https://github.com/jacksonh1/pairk

Features

Quantify the relative conservation of a small sequence motif in intrinsically disordered regions (IDRs) of proteins, without the need for a multiple sequence alignment (MSA).
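To make "pairwise k-mer alignment" concrete, here is a minimal sketch of the general idea: for one query k-mer (the motif), find the best-scoring gapless k-mer in each homologous sequence, one pair at a time, with no MSA involved. This is not pairk's implementation or API. The function names, the identity-count scoring, the tie-breaking rule, and the toy sequences are all assumptions made for illustration.

```python
# Illustrative sketch only: NOT pairk's implementation or API.
# All names and the scoring scheme below are hypothetical.

def score_kmers(a: str, b: str) -> int:
    """Toy similarity: number of identical positions (assumed scoring scheme)."""
    return sum(x == y for x, y in zip(a, b))


def best_kmer_match(query_kmer: str, sequence: str) -> tuple[int, str, int]:
    """Return (start, kmer, score) of the best gapless match in `sequence`."""
    k = len(query_kmer)
    if len(sequence) < k:
        raise ValueError("sequence is shorter than the query k-mer")
    best = (0, sequence[:k], score_kmers(query_kmer, sequence[:k]))
    for start in range(1, len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        score = score_kmers(query_kmer, kmer)
        if score > best[2]:  # ties keep the earliest match
            best = (start, kmer, score)
    return best


def pairwise_kmer_alignment(query_kmer: str, homologs: dict[str, str]) -> dict:
    """Match the query k-mer against each homolog independently."""
    return {name: best_kmer_match(query_kmer, seq) for name, seq in homologs.items()}


# Made-up toy sequences, not real IDRs.
homologs = {
    "homolog_1": "MSTPLPSLPAKR",
    "homolog_2": "GGSPAPALPQQ",
}
matches = pairwise_kmer_alignment("PLPSLP", homologs)
for name, (start, kmer, score) in matches.items():
    print(name, start, kmer, score)
```

In practice a substitution matrix (such as those listed in the references) would typically replace the identity count used here.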

[Figure: the pairk method]

[Figure: example, PairK vs MSA conservation]

See the demo/tutorial Jupyter notebook: demo/pairk_tutorial.ipynb
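As a rough illustration of what "relative conservation" can mean, the sketch below scores each column of a stack of equal-length k-mers with a Shannon-entropy-based score and then expresses the scores as z-scores against a set of background scores. This is one possible reading, not pairk's code: the function names, the normalization by log(20), and the z-score step are assumptions, and the example values are made up.

```python
# Illustrative sketch only: NOT pairk's scoring functions.
# Names, normalization, and the z-score step are assumptions.
import math
from collections import Counter
from statistics import mean, pstdev


def column_conservation(column: str) -> float:
    """1 - (Shannon entropy / log 20); 1.0 means the column is invariant."""
    n = len(column)
    counts = Counter(column)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - entropy / math.log(20)


def conservation_per_position(kmers: list[str]) -> list[float]:
    """Score each position across a stack of equal-length k-mers."""
    return [column_conservation("".join(col)) for col in zip(*kmers)]


def z_scores(values: list[float], background: list[float]) -> list[float]:
    """Express `values` relative to the mean and spread of `background`."""
    mu = mean(background)
    sigma = pstdev(background)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Made-up k-mers standing in for one matched k-mer per homolog.
scores = conservation_per_position(["PLP", "PAP", "PQP"])
print(scores)
```

A higher z-score would indicate a position that is more conserved than the chosen background, whatever that background is taken to be.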

Installation

pip install pairk

or for an editable install that you can modify:

git clone https://github.com/jacksonh1/pairk.git
cd pairk
pip install -e .

virtual environment installation:

We suggest installing pairk in a virtual environment, such as one managed by conda or venv. You can create a new environment and install pairk as above, or use the provided environment.yaml file to create a conda environment with the necessary dependencies:

git clone https://github.com/jacksonh1/pairk.git
cd pairk
conda env create -f=environment.yaml

Then activate the environment with:

conda activate pairk

and install pairk with either:

pip install .

or for an editable install that you can modify:

pip install -e .
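After any of the install routes above, a quick check from Python can confirm that the package is visible in the active environment. This sketch uses only the standard library's importlib.metadata; the helper name installed_version is hypothetical and is not part of pairk.

```python
# Check whether a distribution is installed in the current environment.
# `installed_version` is a hypothetical helper, not part of pairk.
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pairk")
if version:
    print(f"pairk {version} is installed")
else:
    print("pairk is not installed in this environment")
```

If the message reports that pairk is missing, the most likely cause is that the environment was not activated before running pip.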

Documentation

See the pairk documentation.

Also see our Jupyter notebook tutorial in the demo folder.

Copyright

Copyright (c) 2024, Jackson Halpin

Acknowledgements

Project based on the Computational Molecular Science Python Cookiecutter version 1.1.

References

  • ESM2 (the model used to generate the embeddings):
    • Z. Lin, H. Akin, R. Rao, B. Hie, Z. Zhu, W. Lu, N. Smetanin, R. Verkuil, O. Kabeli, Y. Shmueli, A. Dos Santos Costa, M. Fazel-Zarandi, T. Sercu, S. Candido, A. Rives, Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
  • Some of the ESM model sequence encoding functions are adapted from the kibby tool:
    • W. Yeung, Z. Zhou, S. Li, N. Kannan, Alignment-free estimation of sequence conservation for identifying functional sites using protein sequence embeddings. Brief Bioinform 24 (2023).
  • Pairk's built-in conservation scoring functions are adapted from code released with this study:
    • J. A. Capra, M. Singh, Predicting functionally important residues from sequence conservation. Bioinformatics 23, 1875–1882 (2007).
  • Pairk's built-in scoring matrix "EDSSMat50" is from this study:
    • R. Trivedi, H. A. Nagarajaram, Amino acid substitution scoring matrices specific to intrinsically disordered regions in proteins. Sci Rep 9, 16380 (2019).
  • Pairk's built-in "grantham" matrices (including "grantham", "grantham_similarity_norm", and "grantham_similarity_normx100_aligner_compatible") are from or derived from the distance matrix in this study:
    • R. Grantham, Amino acid difference formula to help explain protein evolution. Science 185, 862–864 (1974).
  • The blosum62 matrix is from Biopython:
    • P. J. A. Cock, T. Antao, J. T. Chang, B. A. Chapman, C. J. Cox, A. Dalke, I. Friedberg, T. Hamelryck, F. Kauff, B. Wilczynski, M. J. L. de Hoon, Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
    • S. Henikoff, J. G. Henikoff, Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A 89, 10915–10919 (1992).
