motif conservation in IDRs through pairwise k-mer alignment
pairk
This work was supported by the National Institutes of Health under Award Number R35GM149227. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Features
Quantify the relative conservation of a small sequence motif in intrinsically disordered regions (IDRs) of proteins, without the need for a multiple sequence alignment (MSA).
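The idea of pairwise k-mer alignment can be illustrated with a toy sketch. This is not pairk's API or its scoring scheme: the function names (`kmers`, `identity_score`, `best_kmer_matches`), the example sequences, and the identity-count score are all assumptions made for illustration. For one query k-mer, the sketch picks the best-scoring k-mer of the same length from each homolog, with no gaps and no MSA.

```python
from typing import Dict, List, Tuple


def kmers(seq: str, k: int) -> List[str]:
    """Return every overlapping k-mer in seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def identity_score(a: str, b: str) -> int:
    """Toy gapless score (an assumption): count of identical positions."""
    return sum(x == y for x, y in zip(a, b))


def best_kmer_matches(
    query: str, homologs: Dict[str, str], start: int, k: int
) -> Dict[str, Tuple[str, int, int]]:
    """For the query k-mer at `start`, find the best k-mer in each homolog.

    Returns {homolog name: (best k-mer, its position, its score)}.
    Ties go to the earliest position. Homologs shorter than k are skipped.
    """
    query_kmer = query[start:start + k]
    if len(query_kmer) != k:
        raise ValueError("query k-mer runs past the end of the query sequence")
    best = {}
    for name, seq in homologs.items():
        candidates = kmers(seq, k)
        if not candidates:
            continue
        scores = [identity_score(query_kmer, c) for c in candidates]
        pos = scores.index(max(scores))
        best[name] = (candidates[pos], pos, scores[pos])
    return best


# Made-up sequences; the query motif "PPLPPR" starts at index 3.
query_idr = "MSTPPLPPRAAG"
homolog_idrs = {"hom_a": "GGSPPLPPKTT", "hom_b": "AAPPLPPRQQ"}
for name, hit in best_kmer_matches(query_idr, homolog_idrs, start=3, k=6).items():
    print(name, hit)
```

A real analysis would score matches with a substitution matrix rather than an identity count, and would then score conservation across the matched k-mers. See the pairk documentation for the functions the package actually provides.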
The pairk method:
Example - PairK vs MSA conservation:
See the demo/tutorial Jupyter notebook: demo/pairk_tutorial.ipynb
Installation
pip install pairk
or for an editable install that you can modify:
git clone https://github.com/jacksonh1/pairk.git
cd pairk
pip install -e .
Virtual environment installation:
We suggest installing pairk in a virtual environment, such as one created with conda or venv. You can create a new environment and install pairk as above, or you can use the provided environment.yaml file to create a new environment with the necessary dependencies:
git clone https://github.com/jacksonh1/pairk.git
cd pairk
conda env create -f=environment.yaml
Then activate the environment with:
conda activate pairk
and install pairk with either:
pip install .
or for an editable install that you can modify:
pip install -e .
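To confirm that the install landed in the active environment, a small check like the following can help. It uses only the standard library's `importlib.metadata`. The helper name `installed_version` is an assumption made for this sketch and is not part of pairk.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pairk")
if version is None:
    print("pairk is not installed in this environment")
else:
    print(f"pairk {version} is installed")
```

If the message says pairk is missing, check that the intended environment is activated before running pip.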
Documentation
See the pairk documentation.
Also see our Jupyter notebook tutorial in the demo folder.
Copyright
Copyright (c) 2024, Jackson Halpin
Acknowledgements
Project based on the Computational Molecular Science Python Cookiecutter version 1.1.
References
- ESM2 (the model used to generate the embeddings):
- Z. Lin, H. Akin, R. Rao, B. Hie, Z. Zhu, W. Lu, N. Smetanin, R. Verkuil, O. Kabeli, Y. Shmueli, A. Dos Santos Costa, M. Fazel-Zarandi, T. Sercu, S. Candido, A. Rives, Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
- Some of the ESM model sequence encoding functions are adapted from the kibby tool:
- W. Yeung, Z. Zhou, S. Li, N. Kannan, Alignment-free estimation of sequence conservation for identifying functional sites using protein sequence embeddings. Brief Bioinform 24 (2023).
- Pairk's built-in conservation scoring functions are adapted from code released with this study:
- J. A. Capra, M. Singh, Predicting functionally important residues from sequence conservation. Bioinformatics 23, 1875–1882 (2007).
- Pairk's built-in scoring matrix "EDSSMat50" is from this study:
- R. Trivedi, H. A. Nagarajaram, Amino acid substitution scoring matrices specific to intrinsically disordered regions in proteins. Sci Rep 9, 16380 (2019).
- Pairk's built-in "grantham" matrices (including "grantham", "grantham_similarity_norm", and "grantham_similarity_normx100_aligner_compatible") are from or derived from the distance matrix in this study:
- R. Grantham, Amino acid difference formula to help explain protein evolution. Science 185, 862–864 (1974).
- The "blosum62" matrix is from Biopython:
- P. J. A. Cock, T. Antao, J. T. Chang, B. A. Chapman, C. J. Cox, A. Dalke, I. Friedberg, T. Hamelryck, F. Kauff, B. Wilczynski, M. J. L. de Hoon, Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
- S. Henikoff, J. G. Henikoff, Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A 89, 10915–10919 (1992).
File details
Details for the file pairk-1.0.4.tar.gz.
File metadata
- Download URL: pairk-1.0.4.tar.gz
- Upload date:
- Size: 48.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 60b795d0ae313660deb7ae09348320270e633b9306784baddc36e0c4200c5c0a |
| MD5 | 59cf6b1bdbaab8a9f36969f0174d4657 |
| BLAKE2b-256 | 65305b37ac3ca106b69ba26028a57ea1dfe97faa898b250f28595217249abac5 |
File details
Details for the file pairk-1.0.4-py3-none-any.whl.
File metadata
- Download URL: pairk-1.0.4-py3-none-any.whl
- Upload date:
- Size: 51.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4651569bd2ef1836b8af53bbccdaf3f5e46d2c8217b2bd5d4fdf384c2e8bf031 |
| MD5 | 8fae322848b7149ea1ce95feabd36afd |
| BLAKE2b-256 | 1627c0b1437d187232bcfc184bc12cfa6095cc908790ac1c3f7bac39fea26c84 |
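A downloaded file can be checked against the digests listed above with Python's `hashlib`. The sketch below is one way to do it. The helper names `file_digest` and `matches_published` and the local file path are assumptions for illustration, and BLAKE2b-256 is computed as BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm: str = "sha256") -> str:
    """Hash a file in chunks and return the hex digest.

    `algorithm` is "sha256" or "blake2b-256" (BLAKE2b with a 32-byte digest).
    """
    if algorithm == "sha256":
        hasher = hashlib.sha256()
    elif algorithm == "blake2b-256":
        hasher = hashlib.blake2b(digest_size=32)
    else:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published(path, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with a published hex digest, ignoring case."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


# Hypothetical local path; the expected value is the SHA256 listed for the sdist.
sdist = Path("pairk-1.0.4.tar.gz")
expected = "60b795d0ae313660deb7ae09348320270e633b9306784baddc36e0c4200c5c0a"
if sdist.exists():
    print("SHA256 matches:", matches_published(sdist, expected))
```

The check runs only if the file is present in the working directory, so the snippet is safe to paste into a scratch session.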