
motif conservation in IDRs through pairwise k-mer alignment


pairk



This work was supported by the National Institutes of Health under Award Number R35GM149227. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

repository link
pairk documentation

Features

Quantify the relative conservation of a small sequence motif in intrinsically disordered regions (IDRs) of proteins, without the need for a multiple sequence alignment (MSA).
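The core idea named in the tagline, pairwise k-mer alignment, can be illustrated with a small sketch: each k-mer of a query IDR is compared without gaps against every k-mer of a homologous IDR, and the best-scoring homolog k-mer is kept. This is a minimal sketch under those assumptions, not pairk's implementation or API. The function names are hypothetical, and a simple identity score stands in for a real substitution matrix such as EDSSMat50 or blosum62.

```python
from typing import Callable, List, Tuple


# Hypothetical stand-in for a substitution matrix: +1 for identical residues, 0 otherwise.
def identity_score(a: str, b: str) -> int:
    return 1 if a == b else 0


def kmer_score(k1: str, k2: str, score: Callable[[str, str], int]) -> int:
    """Gapless alignment score: sum of position-by-position substitution scores."""
    return sum(score(a, b) for a, b in zip(k1, k2))


def best_kmer_matches(
    query: str,
    homolog: str,
    k: int,
    score: Callable[[str, str], int] = identity_score,
) -> List[Tuple[int, str, str, int]]:
    """For each k-mer in `query`, find the best-scoring k-mer anywhere in `homolog`.

    Returns (query position, query k-mer, best homolog k-mer, score) tuples.
    """
    homolog_kmers = [homolog[j:j + k] for j in range(len(homolog) - k + 1)]
    if not homolog_kmers:  # homolog shorter than k: nothing to align against
        return []
    matches = []
    for i in range(len(query) - k + 1):
        q = query[i:i + k]
        # max() returns the first maximum, so ties go to the left-most homolog k-mer
        best = max(homolog_kmers, key=lambda h: kmer_score(q, h, score))
        matches.append((i, q, best, kmer_score(q, best, score)))
    return matches


if __name__ == "__main__":
    for hit in best_kmer_matches("PPLPPR", "AAPPLPKR", k=4):
        print(hit)
```

Running it prints (0, 'PPLP', 'PPLP', 4), then (1, 'PLPP', 'PLPK', 3), then (2, 'LPPR', 'LPKR', 3). Because matching is gapless, every query k-mer is scored against windows of exactly the same length, which keeps the search simple and avoids building a multiple sequence alignment, at the cost of not modeling indels inside a k-mer.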

Figure: the pairk method

Figure: example of pairk vs MSA conservation
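Once a query k-mer has a best-matching k-mer from each homolog, the query k-mer and its matches form a small gapless stack whose columns can be scored for conservation. The sketch below assumes a normalized Shannon entropy per column over the 20 standard amino acids. The references credit Capra & Singh (2007) as the source of pairk's built-in scoring functions, which may differ from this simplified version; the function names here are hypothetical.

```python
import math
from collections import Counter
from typing import List


def column_conservation(column: str, alphabet_size: int = 20) -> float:
    """Normalized Shannon-entropy score: 1.0 when every residue in the column is identical.

    Assumes the 20 standard amino acids; other symbols are not treated specially.
    """
    counts = Counter(column)
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 1.0 - entropy / math.log2(alphabet_size)


def kmer_conservation(aligned_kmers: List[str]) -> List[float]:
    """Score each position of a stack of equal-length, gapless k-mers.

    aligned_kmers[0] is taken to be the query k-mer, the rest its best matches in homologs.
    """
    if not aligned_kmers:
        return []
    k = len(aligned_kmers[0])
    if any(len(s) != k for s in aligned_kmers):
        raise ValueError("all k-mers must have the same length")
    columns = ["".join(s[i] for s in aligned_kmers) for i in range(k)]
    return [column_conservation(col) for col in columns]


if __name__ == "__main__":
    # Query k-mer first, then hypothetical best matches from three homologs.
    print(kmer_conservation(["PPLP", "PPLP", "PLPK", "PPLA"]))
```

The first position is identical in all four k-mers and scores 1.0; the more variable positions score lower. Scoring the stacked k-mers directly is what lets conservation be estimated without a multiple sequence alignment of the full IDRs.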

See the demo/tutorial Jupyter notebook: demo/pairk_tutorial.ipynb

Installation

pip install pairk

or for an editable install that you can modify:

git clone https://github.com/jacksonh1/pairk.git
cd pairk
pip install -e .

Virtual environment installation:

We suggest installing pairk in a virtual environment, such as one created with conda or venv. You can create a new environment and install pairk as above, or you can use the provided environment.yaml file to create a new environment with the necessary dependencies:

git clone https://github.com/jacksonh1/pairk.git
cd pairk
conda env create -f=environment.yaml

Then activate the environment with:

conda activate pairk

and install pairk with either:

pip install .

or for an editable install that you can modify:

pip install -e .
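To confirm that pairk is installed in the currently active environment (for example after conda activate pairk), a quick check with the standard library's importlib.metadata is enough. The installed_version helper below is a hypothetical convenience, not part of pairk.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version("pairk")
    print(f"pairk {v}" if v else "pairk is not installed in this environment")
```

Because it queries installed package metadata rather than importing the package, this also works for an editable install and reports the version pip recorded.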

Documentation

See the pairk documentation.

Also see our Jupyter notebook tutorial in the demo folder.

Copyright

Copyright (c) 2024, Jackson Halpin

Acknowledgements

Project based on the Computational Molecular Science Python Cookiecutter version 1.1.

References

  • ESM2 (the model used to generate the embeddings):
    • Z. Lin, H. Akin, R. Rao, B. Hie, Z. Zhu, W. Lu, N. Smetanin, R. Verkuil, O. Kabeli, Y. Shmueli, A. Dos Santos Costa, M. Fazel-Zarandi, T. Sercu, S. Candido, A. Rives, Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
  • Some of the ESM model sequence encoding functions are adapted from the kibby tool:
    • W. Yeung, Z. Zhou, S. Li, N. Kannan, Alignment-free estimation of sequence conservation for identifying functional sites using protein sequence embeddings. Brief Bioinform 24 (2023).
  • Pairk's built-in conservation scoring functions are adapted from code released with this study:
    • J. A. Capra, M. Singh, Predicting functionally important residues from sequence conservation. Bioinformatics 23, 1875–1882 (2007).
  • Pairk's built-in scoring matrix "EDSSMat50" is from this study:
    • R. Trivedi, H. A. Nagarajaram, Amino acid substitution scoring matrices specific to intrinsically disordered regions in proteins. Sci Rep 9, 16380 (2019).
  • Pairk's built-in "grantham" matrices ("grantham", "grantham_similarity_norm", and "grantham_similarity_normx100_aligner_compatible") are taken from, or derived from, the distance matrix in this study:
    • R. Grantham, Amino acid difference formula to help explain protein evolution. Science 185, 862–864 (1974).
  • The blosum62 matrix is from Biopython:
    • P. J. A. Cock, T. Antao, J. T. Chang, B. A. Chapman, C. J. Cox, A. Dalke, I. Friedberg, T. Hamelryck, F. Kauff, B. Wilczynski, M. J. L. de Hoon, Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
    • S. Henikoff, J. G. Henikoff, Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A 89, 10915–10919 (1992).

Download files


Source Distribution

pairk-1.0.7.tar.gz (48.7 kB)

Uploaded Source

Built Distribution


pairk-1.0.7-py3-none-any.whl (51.5 kB)

Uploaded Python 3
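The wheel name pairk-1.0.7-py3-none-any.whl follows the standard wheel naming convention (PEP 427): {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl. A minimal parser under that assumption, with no validation of the tag values, shows what each field means. The WheelName and parse_wheel_filename names are hypothetical.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into its PEP 427 fields (tag values are not validated)."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel file name: {filename!r}")
    # Distribution names use "_" instead of "-" inside wheel names, so "-" separates fields.
    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 5:
        dist, ver, py, abi, plat = parts
        build = None
    elif len(parts) == 6:  # optional build tag sits between the version and the python tag
        dist, ver, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(dist, ver, build, py, abi, plat)


if __name__ == "__main__":
    print(parse_wheel_filename("pairk-1.0.7-py3-none-any.whl"))
```

The tags py3-none-any mark a pure-Python wheel for any Python 3 interpreter with no ABI or platform restriction, which is why a single built distribution serves every platform. For real tooling, packaging.utils.parse_wheel_filename from the packaging library performs the same split with full validation.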

File details

Details for the file pairk-1.0.7.tar.gz.

File metadata

  • Download URL: pairk-1.0.7.tar.gz
  • Upload date:
  • Size: 48.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.3

File hashes

Hashes for pairk-1.0.7.tar.gz
Algorithm Hash digest
SHA256 1ec0c516e60a01769ebf468dd8f93392b97c1913e6301328f05f7ddd20357a73
MD5 f8bb3a7d13a77d2718dafb60ee42d565
BLAKE2b-256 f7f0805c6269266604776bb89ca93d93274f444b7e4353a01e2b4221df2f80ad
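A downloaded file can be checked against these published digests before it is installed. The sketch below computes a file's digest with the standard library's hashlib; it assumes the sdist was downloaded to the current directory and treats PyPI's "BLAKE2b-256" label as BLAKE2b with a 32-byte digest. The file_digest and verify helpers are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Hex digest of a file, read in 64 KiB chunks so large files need not fit in memory."""
    algo = algorithm.lower()
    # "BLAKE2b-256" is BLAKE2b with a 32-byte digest; SHA256 and MD5 go through hashlib.new().
    h = hashlib.blake2b(digest_size=32) if algo == "blake2b-256" else hashlib.new(algo)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected_hex: str, algorithm: str = "sha256") -> bool:
    """True if the file's digest matches the published hex digest (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


if __name__ == "__main__":
    sdist = Path("pairk-1.0.7.tar.gz")  # assumes the sdist was downloaded to the working directory
    published_sha256 = "1ec0c516e60a01769ebf468dd8f93392b97c1913e6301328f05f7ddd20357a73"
    if sdist.exists():
        print("OK" if verify(sdist, published_sha256) else "hash mismatch")
    else:
        print(f"{sdist} not found")
```

SHA256 is the digest to prefer; MD5 is listed for compatibility but is not collision-resistant. pip can enforce the same check at install time through its hash-checking mode (--require-hashes, with --hash entries in a requirements file).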


File details

Details for the file pairk-1.0.7-py3-none-any.whl.

File metadata

  • Download URL: pairk-1.0.7-py3-none-any.whl
  • Upload date:
  • Size: 51.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.3

File hashes

Hashes for pairk-1.0.7-py3-none-any.whl
Algorithm Hash digest
SHA256 3b17c3d33f31b458b7f50b0eca8b06c38516162d852afe043a2528fcc0f94a48
MD5 2ae9097c3d30d45ca31ab7f5a9cbdb95
BLAKE2b-256 64674a0bfd324db6bdd2d715d561317fc2667f0466fc49e77dfd7ac5a0f9423a

