Predict kinase-substrate interactions.
p_KolossuS
Currently under development. For now, all results are meaningless because the input data is randomly generated,
but the basic code functionality works. See example data and usage in the example/toy
folder.
The main function you would use is kolossus. It works as follows:
Input:
- fasta file of all sequences (or .h5 file of embeddings)
- pair file of format '<kinase_id>\t<substrate_id>\t<substrate_phosphorylation_site>'
Output:
- pairs (kinase_id, substrate_id, substrate_phosphorylation_site, predicted_probability)
Here, <substrate_phosphorylation_site> is the 1-based offset of the phosphorylated residue.
For example, if the substrate has sequence 'GGRGSDD' and the serine (5th amino acid)
is the phosphorylated residue, then substrate_phosphorylation_site=5.
Note that the fasta file should contain all of the sequences (including the full substrate sequences). The appropriate windows are taken using the sites given in the pairs file.
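Since the site is a 1-based offset, it can be worth checking each entry against the substrate sequence before building the pairs file. The following is a minimal sketch and not part of kolossus: the helper names residue_at_site and format_pair_line and the IDs kinase_1 and substrate_1 are made up for illustration.

```python
def residue_at_site(sequence, site):
    """Return the residue at a 1-based offset, as in the 'GGRGSDD' example."""
    if not 1 <= site <= len(sequence):
        raise ValueError(f"site {site} is outside a sequence of length {len(sequence)}")
    # Convert the 1-based offset to Python's 0-based index.
    return sequence[site - 1]


def format_pair_line(kinase_id, substrate_id, site):
    """Build one tab-separated line of the pairs file."""
    return f"{kinase_id}\t{substrate_id}\t{site}"


substrate_seq = "GGRGSDD"
site = 5

residue = residue_at_site(substrate_seq, site)            # 'S' (serine)
line = format_pair_line("kinase_1", "substrate_1", site)  # hypothetical IDs

print(residue)
print(line)
```

A check like this catches off-by-one errors (for example, sites exported as 0-based indices) before any predictions are made.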
Usage:
## on the command line
kinase_file="kinases.fasta"
substrate_file="substrates.fasta"
cat $kinase_file $substrate_file > seqs.fasta
## in python
from kolossus import kolossus
# define inputs to function
seqs_file = 'seqs.fasta'
pairs_file = 'pairs_with_phosphorylation_sites.txt'
# returns a dictionary (kinase, substrate, site): probability
pairs_and_probs = kolossus(pairs_file, fseqs=seqs_file, device='cpu')
# to get kolossus embeddings: use the return_projections parameter
pairs_and_probs, projections = kolossus(pairs_file, fseqs=seqs_file, device='cpu', return_projections=True)
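The returned dictionary can be written out in the same column order as the output described above. This is only a sketch: write_predictions is a hypothetical helper, the dictionary below holds placeholder values rather than real predictions, and it assumes each probability can be converted with float().

```python
import csv
import io


def write_predictions(pairs_and_probs, handle):
    """Write {(kinase, substrate, site): probability} as tab-separated rows."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["kinase_id", "substrate_id", "site", "predicted_probability"])
    # Sort so that the highest-scoring pairs come first.
    ranked = sorted(pairs_and_probs.items(), key=lambda item: float(item[1]), reverse=True)
    for (kinase_id, substrate_id, site), prob in ranked:
        writer.writerow([kinase_id, substrate_id, site, f"{float(prob):.4f}"])


# Placeholder values standing in for the output of kolossus(...).
example_output = {
    ("kinase_1", "substrate_2", 12): 0.25,
    ("kinase_1", "substrate_1", 5): 0.75,
}

buffer = io.StringIO()
write_predictions(example_output, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with open(path, "w", newline="") in place of the StringIO buffer.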
There is also a command-line interface which can be called on the terminal: kolossus-cli.
usage: kolossus-cli [-h] --pairs PAIRS [--seqs SEQS] [--embeddings EMBEDDINGS] [--dtype DTYPE] [--projections PROJECTIONS]
                    [--device DEVICE] [--batch_size BATCH_SIZE] -o OUTPUT

optional arguments:
  -h, --help            show this help message and exit
  --pairs PAIRS         format: <kinase_id> <substrate_id> <substrate_phosphorylation_site>
  --seqs SEQS           fasta formatted file of sequences, either seqs or embeddings must be provided
  --embeddings EMBEDDINGS
                        h5 file of sequence embeddings, either seqs or embeddings must be provided
  --dtype DTYPE         data type of sequence embeddings (usually float32)
  --projections PROJECTIONS
                        name of .h5 files for kolossus projections
  --device DEVICE       default device on which to run model
  --batch_size BATCH_SIZE
                        Number of pairs at a time on which to run model
  -o OUTPUT, --output OUTPUT
                        desired file path for output
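When calling kolossus-cli from a script, the argument list can be assembled from the flags in the help text above. This is a rough sketch: build_cli_command is a hypothetical helper, predictions.txt is a made-up output name, and the check only enforces that at least one of --seqs or --embeddings is given.

```python
import shlex


def build_cli_command(pairs, output, seqs=None, embeddings=None,
                      device=None, batch_size=None):
    """Assemble a kolossus-cli argument list from the documented flags."""
    if seqs is None and embeddings is None:
        raise ValueError("either seqs or embeddings must be provided")
    cmd = ["kolossus-cli", "--pairs", pairs]
    if seqs is not None:
        cmd += ["--seqs", seqs]
    if embeddings is not None:
        cmd += ["--embeddings", embeddings]
    if device is not None:
        cmd += ["--device", device]
    if batch_size is not None:
        cmd += ["--batch_size", str(batch_size)]
    cmd += ["-o", output]
    return cmd


cmd = build_cli_command(
    "pairs_with_phosphorylation_sites.txt",
    "predictions.txt",  # hypothetical output path
    seqs="seqs.fasta",
    device="cpu",
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once kolossus is installed.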
To get ESM embeddings, you can use the kolossus-extract command.
usage: kolossus-extract [-h] -i I [--model MODEL] [--device DEVICE] -o O

optional arguments:
  -h, --help       show this help message and exit
  -i I             name of input fasta file
  --model MODEL    name of model to extract embeddings
  --device DEVICE  cpu or gpu device to use
  -o O             name of output .h5 file