
motif conservation in IDRs through pairwise k-mer alignment


pairk



This work was supported by the National Institutes of Health under Award Number R35GM149227. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Repository: https://github.com/jacksonh1/pairk

Features

Quantify the relative conservation of a small sequence motif in intrinsically disordered regions (IDRs) of proteins, without the need for a multiple sequence alignment (MSA).
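To illustrate the general idea of pairwise k-mer alignment, here is a toy sketch in plain Python. It is not pairk's implementation or API, and it assumes the scheme works roughly like this: each k-mer of the query IDR is slid along each homolog IDR without gaps, and the best-scoring window from each homolog is kept, so no multiple sequence alignment is ever built. The function names (best_kmer_match, kmer_alignments, positional_identity) are hypothetical, and a simple identity score (1 for a match, 0 otherwise) stands in for a real substitution matrix such as EDSSMat50 or BLOSUM62, or for embedding-based scoring.

```python
from typing import Callable, Dict, List, Optional, Tuple


# Toy scoring function (an assumption): 1 for identical residues, 0 otherwise.
# A real analysis would use a substitution matrix or embedding similarity instead.
def identity_score(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


def best_kmer_match(
    query_kmer: str, homolog: str, score: Callable[[str, str], float]
) -> Optional[Tuple[str, int, float]]:
    """Slide query_kmer along homolog without gaps and return (window, start, score)
    for the highest-scoring window, or None if the homolog is shorter than the k-mer.
    Ties keep the leftmost window."""
    k = len(query_kmer)
    best: Optional[Tuple[str, int, float]] = None
    for start in range(len(homolog) - k + 1):
        window = homolog[start:start + k]
        s = sum(score(a, b) for a, b in zip(query_kmer, window))
        if best is None or s > best[2]:
            best = (window, start, s)
    return best


def kmer_alignments(
    query: str,
    homologs: List[str],
    k: int,
    score: Callable[[str, str], float] = identity_score,
) -> Dict[int, Tuple[str, List[str]]]:
    """For every k-mer of the query (keyed by its start position), collect the
    best-matching k-mer from each homolog that is at least k residues long."""
    results: Dict[int, Tuple[str, List[str]]] = {}
    for i in range(len(query) - k + 1):
        query_kmer = query[i:i + k]
        matches = []
        for homolog in homologs:
            hit = best_kmer_match(query_kmer, homolog, score)
            if hit is not None:
                matches.append(hit[0])
        results[i] = (query_kmer, matches)
    return results


def positional_identity(query_kmer: str, matches: List[str]) -> List[float]:
    """Fraction of matched k-mers that share the query residue at each position."""
    if not matches:
        return [0.0] * len(query_kmer)
    return [
        sum(m[j] == query_kmer[j] for m in matches) / len(matches)
        for j in range(len(query_kmer))
    ]


if __name__ == "__main__":
    # Made-up sequences, for illustration only.
    query = "SAPPLPPRS"
    homologs = ["TTAPPLPPKA", "QSPPIPPRQ", "GGSAPPLPEE"]
    for start, (kmer, matches) in kmer_alignments(query, homologs, k=5).items():
        print(start, kmer, matches, positional_identity(kmer, matches))
```

The per-position identity fractions here are only a crude stand-in for the conservation scores pairk computes; the references at the end of this page list the scoring methods and matrices pairk actually draws on.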

Figure: the pairk method.

Figure: example of pairk vs. MSA-based conservation.

See the demo/tutorial Jupyter notebook: demo/pairk_tutorial.ipynb

Installation

pip install pairk

or for an editable install that you can modify:

git clone https://github.com/jacksonh1/pairk.git
cd pairk
pip install -e .

Virtual environment installation:

We suggest installing pairk in a virtual environment, such as one managed by conda or venv. You can create a new environment and install pairk as described above, or you can use the provided environment.yaml file to create a new environment with the necessary dependencies:

git clone https://github.com/jacksonh1/pairk.git
cd pairk
conda env create -f=environment.yaml

Then activate the environment with:

conda activate pairk

and install pairk with either:

pip install .

or for an editable install that you can modify:

pip install -e .
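After either install route, one way to confirm which pairk version the active environment actually sees is the standard library's importlib.metadata module (Python 3.8+). The helper below, installed_version, is a hypothetical illustration and not part of pairk.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version("pairk")
    print(f"pairk {v}" if v else "pairk is not installed in this environment")
```

Running this inside the activated conda or venv environment helps catch the common mistake of installing into one environment and running Python from another.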

Documentation

See the pairk documentation.

Also see our Jupyter notebook tutorial in the demo folder.

Copyright

Copyright (c) 2024, Jackson Halpin

Acknowledgements

Project based on the Computational Molecular Science Python Cookiecutter version 1.1.

References

  • ESM2 (the model used to generate the embeddings):
    • Z. Lin, H. Akin, R. Rao, B. Hie, Z. Zhu, W. Lu, N. Smetanin, R. Verkuil, O. Kabeli, Y. Shmueli, A. Dos Santos Costa, M. Fazel-Zarandi, T. Sercu, S. Candido, A. Rives, Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
  • Some of the ESM model sequence-encoding functions are adapted from the kibby tool:
    • W. Yeung, Z. Zhou, S. Li, N. Kannan, Alignment-free estimation of sequence conservation for identifying functional sites using protein sequence embeddings. Brief Bioinform 24 (2023).
  • Pairk's built-in conservation scoring functions are adapted from code released with this study:
    • J. A. Capra, M. Singh, Predicting functionally important residues from sequence conservation. Bioinformatics 23, 1875–1882 (2007).
  • Pairk's built-in scoring matrix "EDSSMat50" is from this study:
    • R. Trivedi, H. A. Nagarajaram, Amino acid substitution scoring matrices specific to intrinsically disordered regions in proteins. Sci Rep 9, 16380 (2019).
  • Pairk's built-in "grantham" matrices (including "grantham", "grantham_similarity_norm", and "grantham_similarity_normx100_aligner_compatible") are from or derived from the distance matrix in this study:
    • R. Grantham, Amino acid difference formula to help explain protein evolution. Science 185, 862–864 (1974).
  • The BLOSUM62 matrix is from Biopython:
    • P. J. A. Cock, T. Antao, J. T. Chang, B. A. Chapman, C. J. Cox, A. Dalke, I. Friedberg, T. Hamelryck, F. Kauff, B. Wilczynski, M. J. L. de Hoon, Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
    • S. Henikoff, J. G. Henikoff, Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A 89, 10915–10919 (1992).
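Capra and Singh's scoring methods include the Shannon entropy of an alignment column. As a rough sketch of what such a score looks like, the hypothetical function shannon_conservation below converts the entropy of the 20 standard amino acids in a column into a conservation value between 0 and 1. It is not pairk's code: scaling by the fraction of non-gap characters is an assumption made here, and pairk's adapted functions may handle gaps, sequence weighting, and normalization differently.

```python
import math
from collections import Counter

# The 20 standard amino acids; any other character (e.g. "-" or "X") counts as a gap.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def shannon_conservation(column: str) -> float:
    """Return 1 - H / log2(20) for the amino acids in an alignment column,
    multiplied by the fraction of non-gap characters (an assumed gap penalty).
    1.0 means one residue fills the whole column; lower values mean a more
    variable column or more gaps."""
    if not column:
        return 0.0
    residues = [c for c in column.upper() if c in AMINO_ACIDS]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    # Shannon entropy (in bits) of the observed residue frequencies.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    conservation = 1.0 - entropy / math.log2(len(AMINO_ACIDS))
    return conservation * (n / len(column))


if __name__ == "__main__":
    print(shannon_conservation("PPPPP"))  # fully conserved column
    print(shannon_conservation("PPLIV"))  # more variable column
    print(shannon_conservation("PP---"))  # conserved but gappy column
```

In a k-mer based workflow, a "column" could be the residues found at one position across the matched k-mers for a query k-mer, rather than a column of a full MSA.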

Download files


Source Distribution

pairk-1.0.4.tar.gz (48.2 kB)

Uploaded Source

Built Distribution


pairk-1.0.4-py3-none-any.whl (51.0 kB)

Uploaded Python 3

File details

Details for the file pairk-1.0.4.tar.gz.

File metadata

  • Download URL: pairk-1.0.4.tar.gz
  • Size: 48.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.3

File hashes

Hashes for pairk-1.0.4.tar.gz
  • SHA256: 60b795d0ae313660deb7ae09348320270e633b9306784baddc36e0c4200c5c0a
  • MD5: 59cf6b1bdbaab8a9f36969f0174d4657
  • BLAKE2b-256: 65305b37ac3ca106b69ba26028a57ea1dfe97faa898b250f28595217249abac5
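The published digests can be checked locally before installing. Below is a minimal sketch using Python's standard hashlib module; the helper names are hypothetical, and the example assumes the source distribution was downloaded into the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_sha256(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    sdist = Path("pairk-1.0.4.tar.gz")  # assumed download location
    expected = "60b795d0ae313660deb7ae09348320270e633b9306784baddc36e0c4200c5c0a"
    if sdist.exists():
        print("hash OK" if matches_published_sha256(sdist, expected) else "hash MISMATCH")
    else:
        print(f"{sdist} not found")
```

pip can also enforce this itself through its hash-checking mode (a requirements file entry such as pairk==1.0.4 --hash=sha256:<digest>, installed with pip install --require-hashes -r requirements.txt); in that mode every dependency must also be pinned with a hash.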


File details

Details for the file pairk-1.0.4-py3-none-any.whl.

File metadata

  • Download URL: pairk-1.0.4-py3-none-any.whl
  • Size: 51.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.3

File hashes

Hashes for pairk-1.0.4-py3-none-any.whl
  • SHA256: 4651569bd2ef1836b8af53bbccdaf3f5e46d2c8217b2bd5d4fdf384c2e8bf031
  • MD5: 8fae322848b7149ea1ce95feabd36afd
  • BLAKE2b-256: 1627c0b1437d187232bcfc184bc12cfa6095cc908790ac1c3f7bac39fea26c84

