
Predict kinase-substrate interactions.


KolossuS: Kinase Signaling Prediction Tool

Deconvolving the substrates of hundreds of kinases linked to phosphorylation networks driving cellular behavior is a fundamental, unresolved biological challenge, largely due to the poorly understood interplay of kinase selectivity and substrate proximity. We introduce KolossuS, a deep learning framework leveraging protein language models to decode kinase-substrate specificity. KolossuS achieves superior prediction accuracy and sensitivity across mammalian kinomes, enabling proteome-wide predictions and evolutionary insights. By integrating KolossuS with CRISPR-based proximity proteomics in vivo, we capture kinase-substrate recognition and spatial context, obviating prior limitations. We show this combined framework identifies kinase substrates associated with physiological states such as sleep, revealing both known and novel Sik3 substrates during sleep deprivation. This novel integrated computational-experimental approach promises to transform systematic investigations of kinase signaling in health and disease.

Preprint

Jha K., Shonai D., Parekh A., Uezu A., Fujiyama T., Yamamoto H., Parameswaran P., Yanagisawa M., Singh R., Soderling S. (2025). Deep Learning-coupled Proximity Proteomics to Deconvolve Kinase Signaling In Vivo. bioRxiv, 2025-04.

KolossuS provides a command-line interface, kolossus-cli, which can be called from the terminal.

usage: kolossus-cli [-h] --pairs PAIRS [--seqs SEQS] [--embeddings EMBEDDINGS] [--dtype DTYPE] [--projections PROJECTIONS]
                    [--device DEVICE] [--batch_size BATCH_SIZE] -o OUTPUT

optional arguments:
  -h, --help            show this help message and exit
  --pairs PAIRS         format: <kinase_id> <substrate_id> <substrate_phosphorylation_site>
  --seqs SEQS           fasta formatted file of sequences, either seqs or embeddings must be provided
  --embeddings EMBEDDINGS
                        h5 file of sequence embeddings, either seqs or embeddings must be provided
  --dtype DTYPE         data type of sequence embeddings (usually float32)
  --projections PROJECTIONS
                        name of .h5 files for kolossus projections
  --device DEVICE       default device on which to run model
  --batch_size BATCH_SIZE
                        Number of pairs at a time on which to run model
  -o OUTPUT, --output OUTPUT
                        desired file path for output

Note that the fasta file must contain every sequence referenced in the pairs file, including the full-length substrate sequences. The appropriate windows are extracted using the sites listed in the pairs file.
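Because every ID in the pairs file has to resolve to a sequence, it can help to check the two files against each other before running a prediction. The following is a minimal sketch, not part of KolossuS: the helper names fasta_ids and missing_ids are hypothetical, and it assumes a sequence ID is the first whitespace-delimited token of a fasta header, which may differ from how KolossuS itself reads headers.

```python
from typing import Iterable, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect sequence IDs, assumed to be the first token after '>'."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                ids.add(header.split()[0])
    return ids


def missing_ids(pair_lines: Iterable[str], known: Set[str]) -> Set[str]:
    """Return kinase/substrate IDs used in the pairs but absent from the fasta."""
    missing = set()
    for line in pair_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        kinase_id, substrate_id = fields[0], fields[1]
        missing.update({kinase_id, substrate_id} - known)
    return missing


# Small in-memory example; with real files, pass open(path) instead of a list.
fasta = [">kinase_1 example kinase", "MGSSK", ">substrate_1", "GGRGSDD"]
pairs = ["kinase_1\tsubstrate_1\t5", "kinase_2\tsubstrate_1\t5"]
print(missing_ids(pairs, fasta_ids(fasta)))  # {'kinase_2'}
```

An empty result means every kinase and substrate named in the pairs has a sequence available.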

To get the ESM-2 embeddings for your protein sequences, you can use the kolossus-extract command.

usage: kolossus-extract [-h] -i I [--model MODEL] [--device DEVICE] -o O

optional arguments:
  -h, --help       show this help message and exit
  -i I             name of input fasta file
  --model MODEL    name of the ESM-2 model for which you want embeddings: esm2_t48_15B_UR50D or esm2_t33_650M_UR50D
  --device DEVICE  cpu or gpu device to use
  -o O             name of output .h5 file
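The two commands can be chained: kolossus-extract writes the embeddings once, and kolossus-cli reuses them through --embeddings. The sketch below only assembles and prints the command lines from the flags documented above. The file names are hypothetical, 'cpu' is assumed to be an accepted device string for both commands, and --model is left unset because this page does not say which ESM-2 model the predictor expects.

```python
import shlex

# Hypothetical file names, chosen for illustration.
fasta_path = "seqs.fasta"
embeddings_path = "seqs.h5"
pairs_path = "pairs.txt"
output_path = "predictions.txt"

# Step 1: compute ESM-2 embeddings for every sequence in the fasta file.
extract_cmd = [
    "kolossus-extract",
    "-i", fasta_path,
    "--device", "cpu",
    "-o", embeddings_path,
]

# Step 2: score the kinase-substrate pairs using the precomputed embeddings.
predict_cmd = [
    "kolossus-cli",
    "--pairs", pairs_path,
    "--embeddings", embeddings_path,
    "--device", "cpu",
    "-o", output_path,
]

print(shlex.join(extract_cmd))
print(shlex.join(predict_cmd))
```

To actually run the steps from Python, each list can be passed to subprocess.run(cmd, check=True), which stops on the first failing command.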

The pairs file should be formatted like so:

kinase_1  substrate_1  substrate_1_phosphorylation_site_1
kinase_1  substrate_1  substrate_1_phosphorylation_site_2
kinase_2  substrate_2  substrate_2_phosphorylation_site_1
...

Here, <substrate_x_phosphorylation_site_y> is the 1-based position of the phosphorylated residue within the full substrate sequence. For example, if the substrate has sequence 'GGRGSDD' and the serine (the 5th amino acid) is the phosphorylated residue, then substrate_phosphorylation_site=5.
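Off-by-one errors are easy to make here, since Python strings are indexed from 0 while the site is counted from 1. The sketch below shows the conversion and builds one tab-separated pairs line. The helpers residue_at and pair_line are hypothetical, and the check that the residue is S, T, or Y is an added sanity check assumed for illustration, not something this page says KolossuS enforces.

```python
def residue_at(sequence: str, site: int) -> str:
    """Return the residue at a 1-based site, as used in the pairs file."""
    if not 1 <= site <= len(sequence):
        raise ValueError(
            f"site {site} is outside a sequence of length {len(sequence)}"
        )
    return sequence[site - 1]  # convert the 1-based site to a 0-based index


def pair_line(kinase_id: str, substrate_id: str, sequence: str, site: int) -> str:
    """Format one pairs-file line after checking the residue at the site."""
    residue = residue_at(sequence, site)
    if residue not in "STY":  # assumed check: serine, threonine, or tyrosine
        raise ValueError(f"residue {residue!r} at site {site} is not S, T, or Y")
    return f"{kinase_id}\t{substrate_id}\t{site}"


print(residue_at("GGRGSDD", 5))  # S
print(pair_line("kinase_1", "substrate_1", "GGRGSDD", 5))
```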

There is also a Python interface for using KolossuS within Python scripts. The main function is kolossus, which works as follows:

Input:
  - fasta file of all sequences (or .h5 file of embeddings)
  - pair file of format '<kinase_id>\t<substrate_id>\t<substrate_phosphorylation_site>'

Output:
  - pairs (kinase_id, substrate_id, substrate_phosphorylation_site, predicted_probability)

Usage:

## on the command line
kinase_file="kinases.fasta"
substrate_file="substrates.fasta"

cat $kinase_file $substrate_file > seqs.fasta

## in python
from kolossus import kolossus

# define inputs to function
seqs_file = 'seqs.fasta' 
pairs_file = 'pairs_with_phosphorylation_sites.txt'

# returns a dictionary (kinase, substrate, site): probability
pairs_and_probs = kolossus(pairs_file, fseqs=seqs_file, device='cpu')

# to get kolossus embeddings: use the return_projections parameter
pairs_and_probs, projections = kolossus(pairs_file, fseqs=seqs_file, device='cpu', return_projections=True)
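Once the dictionary is returned, a common next step is to rank the pairs and save them. The sketch below works on a stand-in dictionary with the (kinase, substrate, site): probability shape described above, so it runs without KolossuS installed. The probabilities are placeholders rather than real predictions, the helper top_hits is hypothetical, the 0.5 threshold is arbitrary, and the exact types of the site and probability values returned by kolossus are assumptions.

```python
import csv
import io

# Placeholder values for illustration only; these are not real predictions.
pairs_and_probs = {
    ("kinase_1", "substrate_1", 5): 0.91,
    ("kinase_1", "substrate_1", 12): 0.08,
    ("kinase_2", "substrate_2", 3): 0.64,
}


def top_hits(results, threshold=0.5):
    """Keep pairs scoring at or above the threshold, highest first."""
    hits = [
        (kinase, substrate, site, float(prob))
        for (kinase, substrate, site), prob in results.items()
        if float(prob) >= threshold
    ]
    return sorted(hits, key=lambda row: row[3], reverse=True)


# Write the ranked hits as tab-separated text; swap the buffer for
# open("hits.tsv", "w", newline="") to write to disk instead.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["kinase_id", "substrate_id", "site", "probability"])
writer.writerows(top_hits(pairs_and_probs))
print(buffer.getvalue())
```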
