Predict kinase-substrate interactions.

p_KolossuS

Currently under development. For now, all results are meaningless because the input data is randomly generated, but the basic functionality works. See the example data and usage in the example/toy folder.

The main function is kolossus. It works as follows:

Input:
  - fasta file of all sequences (or .h5 file of embeddings)
  - pair file of format '<kinase_id>\t<substrate_id>\t<substrate_phosphorylation_site>'

Output:
  - pairs (kinase_id, substrate_id, substrate_phosphorylation_site, predicted_probability)

Here, <substrate_phosphorylation_site> is the 1-based offset of the phosphorylated residue. For example, if the substrate has sequence 'GGRGSDD' and the serine (the 5th amino acid) is the phosphorylated residue, then substrate_phosphorylation_site=5.

Note that the fasta file should contain all of the sequences, including the full substrate sequences. kolossus extracts the appropriate windows using the sites given in the pairs file.
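
As a minimal sketch of the conventions above, the following checks that each site in a pairs file is a valid 1-based offset into its substrate sequence before the file is passed to kolossus. The helper names (read_fasta, check_pairs) and the check that the residue is S, T, or Y are illustrative assumptions and not part of the package.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into {sequence_id: sequence}.

    The id is taken to be the first word of each header line.
    """
    seqs, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def check_pairs(pair_lines, seqs, allowed="STY"):
    """Return (kinase, substrate, site, residue) per pair; raise on a bad pair."""
    checked = []
    for line in pair_lines:
        if not line.strip():
            continue
        kinase, substrate, site = line.rstrip("\n").split("\t")
        site = int(site)
        for seq_id in (kinase, substrate):
            if seq_id not in seqs:
                raise KeyError(f"{seq_id} is missing from the fasta file")
        seq = seqs[substrate]
        if not 1 <= site <= len(seq):
            raise ValueError(f"site {site} is outside {substrate} (length {len(seq)})")
        residue = seq[site - 1]  # sites are 1-based, Python indices are 0-based
        if residue not in allowed:
            raise ValueError(f"{substrate} has {residue} at site {site}, expected one of {allowed}")
        checked.append((kinase, substrate, site, residue))
    return checked


# Toy data: the substrate is the 'GGRGSDD' example, with the serine at site 5.
fasta_text = ">kinA\nMKKLLG\n>subB\nGGRGSDD\n"
seqs = read_fasta(io.StringIO(fasta_text))
checked = check_pairs(["kinA\tsubB\t5\n"], seqs)
print(checked)  # [('kinA', 'subB', 5, 'S')]
```

Running a check like this first catches off-by-one sites (for example, 0-based offsets) before any model is loaded.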

Usage:

## on the command line
kinase_file="kinases.fasta"
substrate_file="substrates.fasta"

cat "$kinase_file" "$substrate_file" > seqs.fasta

## in python
from kolossus import kolossus

# define inputs to function
seqs_file = 'seqs.fasta'
pairs_file = 'pairs_with_phosphorylation_sites.txt'

# returns a dictionary (kinase, substrate, site): probability
pairs_and_probs = kolossus(pairs_file, fseqs=seqs_file, device='cpu')

# to get kolossus embeddings: use the return_projections parameter
pairs_and_probs, projections = kolossus(pairs_file, fseqs=seqs_file, device='cpu', return_projections=True)
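
Once the dictionary is returned, a common next step is to rank the pairs and save them. The sketch below uses a hand-written dictionary with made-up probabilities in place of a real kolossus call, and assumes only the (kinase, substrate, site): probability layout described in the comment above. write_predictions is a hypothetical helper, not part of the package.

```python
import csv
import io


def write_predictions(pairs_and_probs, handle):
    """Write predictions as TSV, highest probability first; return the ranked list."""
    ranked = sorted(pairs_and_probs.items(), key=lambda item: float(item[1]), reverse=True)
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["kinase_id", "substrate_id", "site", "probability"])
    for (kinase, substrate, site), prob in ranked:
        writer.writerow([kinase, substrate, site, f"{float(prob):.4f}"])
    return ranked


# Stand-in for the output of kolossus(...); the probabilities are invented.
pairs_and_probs = {
    ("kinA", "subB", 5): 0.12,
    ("kinA", "subC", 17): 0.87,
}

buffer = io.StringIO()  # swap for open("predictions.tsv", "w", newline="") to write a file
ranked = write_predictions(pairs_and_probs, buffer)
print(buffer.getvalue())
```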

There is also a command-line interface, kolossus-cli, which can be run from the terminal.

usage: kolossus-cli [-h] --pairs PAIRS [--seqs SEQS] [--embeddings EMBEDDINGS] [--dtype DTYPE] [--projections PROJECTIONS]
                    [--device DEVICE] [--batch_size BATCH_SIZE] -o OUTPUT

optional arguments:
  -h, --help            show this help message and exit
  --pairs PAIRS         format: <kinase_id> <substrate_id> <substrate_phosphorylation_site>
  --seqs SEQS           fasta formatted file of sequences, either seqs or embeddings must be provided
  --embeddings EMBEDDINGS
                        h5 file of sequence embeddings, either seqs or embeddings must be provided
  --dtype DTYPE         data type of sequence embeddings (usually float32)
  --projections PROJECTIONS
                        name of .h5 files for kolossus projections
  --device DEVICE       default device on which to run model
  --batch_size BATCH_SIZE
                        Number of pairs at a time on which to run model
  -o OUTPUT, --output OUTPUT
                        desired file path for output
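
For scripted runs, the command line can be assembled from Python. This sketch uses only the flags listed in the help text above; the file names are placeholders, and build_cli_command is a hypothetical helper rather than something the package provides.

```python
import shlex


def build_cli_command(pairs, output, seqs=None, embeddings=None,
                      device="cpu", batch_size=None):
    """Build the argument list for kolossus-cli from the documented flags."""
    if seqs is None and embeddings is None:
        raise ValueError("either seqs or embeddings must be provided")
    cmd = ["kolossus-cli", "--pairs", pairs]
    if seqs is not None:
        cmd += ["--seqs", seqs]
    if embeddings is not None:
        cmd += ["--embeddings", embeddings]
    cmd += ["--device", device]
    if batch_size is not None:
        cmd += ["--batch_size", str(batch_size)]
    cmd += ["-o", output]
    return cmd


cmd = build_cli_command(
    "pairs_with_phosphorylation_sites.txt",
    "predictions.txt",
    seqs="seqs.fasta",
)
print(shlex.join(cmd))

# To actually run it (requires kolossus to be installed and the files to exist):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.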

To get ESM embeddings, use the kolossus-extract command.

usage: kolossus-extract [-h] -i I [--model MODEL] [--device DEVICE] -o O

optional arguments:
  -h, --help       show this help message and exit
  -i I             name of input fasta file
  --model MODEL    name of model to extract embeddings
  --device DEVICE  cpu or gpu device to use
  -o O             name of output .h5 file
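
The two commands can be chained so that embeddings are computed once and reused. The sketch below only builds the two argument lists from the flags documented above. It assumes that the .h5 file written by kolossus-extract can be passed directly to --embeddings, which the help text suggests but does not state outright. The file names are placeholders and embedding_pipeline is a hypothetical helper.

```python
import shlex


def embedding_pipeline(fasta, pairs, output,
                       embeddings="seqs_embeddings.h5", device="cpu"):
    """Return the two commands: extract embeddings, then predict from them."""
    extract = ["kolossus-extract", "-i", fasta, "--device", device, "-o", embeddings]
    predict = ["kolossus-cli", "--pairs", pairs, "--embeddings", embeddings,
               "--device", device, "-o", output]
    return [extract, predict]


commands = embedding_pipeline(
    "seqs.fasta",
    "pairs_with_phosphorylation_sites.txt",
    "predictions.txt",
)
for cmd in commands:
    print(shlex.join(cmd))
```

The --model flag is left out here so that kolossus-extract uses its default model.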
