Predict kinase-substrate interactions.

p_KolossuS

This package is currently under development. Right now all results are meaningless because the input data is randomly generated, but the basic functionality works. See the example data and usage in the example/toy folder.

The main function you would use is kolossus. It works as follows:

Input:
  - fasta file of all sequences (or .h5 file of embeddings)
  - pair file of format '<kinase_id>\t<substrate_id>\t<substrate_phosphorylation_site>'

Output:
  - pairs (kinase_id, substrate_id, substrate_phosphorylation_site, predicted_probability)

Here, <substrate_phosphorylation_site> is the 1-based position of the phosphorylated residue within the substrate sequence. For example, if the substrate has sequence 'GGRGSDD' and the serine (the 5th amino acid) is the phosphorylated residue, then substrate_phosphorylation_site=5.
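
To make the indexing convention concrete, here is a minimal sketch that converts a 0-based Python string index into the 1-based site value and formats one line of the pair file. The helper names (site_from_index, format_pair_line) and the IDs kinase_1 and substrate_1 are hypothetical and are not part of the kolossus package.

```python
def site_from_index(zero_based_index: int) -> int:
    """Convert a 0-based Python string index to the 1-based site used in the pair file."""
    return zero_based_index + 1


def format_pair_line(kinase_id: str, substrate_id: str, site: int) -> str:
    """Build one tab-separated pair-file line: kinase, substrate, site."""
    return f"{kinase_id}\t{substrate_id}\t{site}"


substrate_seq = "GGRGSDD"

# str.index returns the 0-based position of the first serine (4 here),
# which corresponds to the 5th amino acid.
site = site_from_index(substrate_seq.index("S"))

# Hypothetical IDs, used only for illustration.
line = format_pair_line("kinase_1", "substrate_1", site)
print(line)
```

Going the other way, substrate_seq[site - 1] recovers the residue at a given site, which is a quick way to check that a site really points at the residue you expect.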

Note that the fasta file should contain all of the sequences, including the full substrate sequences. The appropriate windows around each phosphorylation site are taken from the full sequences using the sites listed in the pairs file.
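
Because every ID in the pairs file has to be present in the fasta file, it can help to check the two inputs against each other before running the model. The sketch below is one way to do that and is not part of kolossus. It assumes that the sequence ID is the first whitespace-delimited token of each fasta header and that sites are 1-based, as in the example above. The function names and sample IDs are hypothetical.

```python
def read_fasta(text: str) -> dict:
    """Parse FASTA text into {id: sequence}.

    Assumption: the ID is the first whitespace-delimited token of the header.
    """
    seqs, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            seqs[current] = ""
        elif current is not None:
            seqs[current] += line
    return seqs


def check_pairs(pairs_text: str, seqs: dict) -> list:
    """Return a list of problems found; an empty list means every pair looks usable."""
    problems = []
    for n, raw in enumerate(pairs_text.splitlines(), start=1):
        if not raw.strip():
            continue
        fields = raw.split("\t")
        if len(fields) != 3:
            problems.append(f"line {n}: expected 3 tab-separated fields")
            continue
        kinase_id, substrate_id, site_text = fields
        for seq_id in (kinase_id, substrate_id):
            if seq_id not in seqs:
                problems.append(f"line {n}: {seq_id} not in fasta")
        if not site_text.strip().isdigit():
            problems.append(f"line {n}: site {site_text!r} is not a positive integer")
            continue
        site = int(site_text)
        # Sites are 1-based, so valid values run from 1 to the sequence length.
        if substrate_id in seqs and not 1 <= site <= len(seqs[substrate_id]):
            problems.append(f"line {n}: site {site} outside {substrate_id}")
    return problems


# Tiny made-up inputs, for illustration only.
fasta_text = ">kinase_1\nMKKL\n>substrate_1\nGGRGSDD\n"
pairs_text = (
    "kinase_1\tsubstrate_1\t5\n"
    "kinase_1\tsubstrate_2\t3\n"
    "kinase_1\tsubstrate_1\t9\n"
)

problems = check_pairs(pairs_text, read_fasta(fasta_text))
for problem in problems:
    print(problem)
```

With real files, the same functions can be fed the contents of seqs.fasta and the pairs file via open(...).read().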

Usage:

## on the command line
kinase_file="kinases.fasta"
substrate_file="substrates.fasta"

cat $kinase_file $substrate_file > seqs.fasta

## in python
from kolossus import kolossus

# define inputs to function
seqs_file = 'seqs.fasta' 
pairs_file = 'pairs_with_phosphorylation_sites.txt'

# returns a dictionary (kinase, substrate, site): probability
pairs_and_probs = kolossus(pairs_file, fseqs=seqs_file, device='cpu')

# to get kolossus embeddings: use the return_projections parameter
pairs_and_probs, projections = kolossus(pairs_file, fseqs=seqs_file, device='cpu', return_projections=True)
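
Since the result is a dictionary keyed by (kinase, substrate, site), ranking and exporting it takes only a few lines. The sketch below uses a stand-in dictionary with placeholder numbers rather than real model output, and assumes the probabilities can be converted with float(). The to_tsv helper is hypothetical, and its column names follow the output description above.

```python
import csv
import io

# Stand-in for the value returned by kolossus(...).
# These numbers are placeholders, not model predictions.
pairs_and_probs = {
    ("kinase_1", "substrate_1", 5): 0.91,
    ("kinase_1", "substrate_2", 12): 0.08,
    ("kinase_2", "substrate_1", 5): 0.47,
}


def to_tsv(results, out):
    """Write results to `out` as TSV, highest probability first; return the ranked items."""
    ranked = sorted(results.items(), key=lambda item: float(item[1]), reverse=True)
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(
        ["kinase_id", "substrate_id", "substrate_phosphorylation_site", "predicted_probability"]
    )
    for (kinase_id, substrate_id, site), prob in ranked:
        writer.writerow([kinase_id, substrate_id, site, f"{float(prob):.4f}"])
    return ranked


buffer = io.StringIO()
ranked = to_tsv(pairs_and_probs, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass an open file handle, for example open('predictions.tsv', 'w', newline=''), in place of the StringIO buffer.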

There is also a command-line interface, kolossus-cli, which can be run from the terminal.

usage: kolossus-cli [-h] --pairs PAIRS [--seqs SEQS] [--embeddings EMBEDDINGS] [--dtype DTYPE] [--projections PROJECTIONS]
                    [--device DEVICE] [--batch_size BATCH_SIZE] -o OUTPUT

optional arguments:
  -h, --help            show this help message and exit
  --pairs PAIRS         format: <kinase_id> <substrate_id> <substrate_phosphorylation_site>
  --seqs SEQS           fasta formatted file of sequences, either seqs or embeddings must be provided
  --embeddings EMBEDDINGS
                        h5 file of sequence embeddings, either seqs or embeddings must be provided
  --dtype DTYPE         data type of sequence embeddings (usually float32)
  --projections PROJECTIONS
                        name of .h5 files for kolossus projections
  --device DEVICE       default device on which to run model
  --batch_size BATCH_SIZE
                        Number of pairs at a time on which to run model
  -o OUTPUT, --output OUTPUT
                        desired file path for output
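
As an illustration of how these flags fit together, the sketch below assembles a kolossus-cli argument list from Python. It uses only flags shown in the help text above. The build_cli_command helper and the output name predictions.txt are hypothetical, and the check that at least one of --seqs or --embeddings is given mirrors the help text rather than the tool's actual validation.

```python
import shlex


def build_cli_command(pairs, output, seqs=None, embeddings=None, device=None, batch_size=None):
    """Assemble an argument list for kolossus-cli from the documented flags."""
    if seqs is None and embeddings is None:
        # The help text says either seqs or embeddings must be provided.
        raise ValueError("provide seqs or embeddings")
    cmd = ["kolossus-cli", "--pairs", pairs]
    if seqs is not None:
        cmd += ["--seqs", seqs]
    if embeddings is not None:
        cmd += ["--embeddings", embeddings]
    if device is not None:
        cmd += ["--device", device]
    if batch_size is not None:
        cmd += ["--batch_size", str(batch_size)]
    cmd += ["-o", output]
    return cmd


cmd = build_cli_command(
    "pairs_with_phosphorylation_sites.txt",
    "predictions.txt",  # hypothetical output path
    seqs="seqs.fasta",
    device="cpu",
)

# Print the command in a form that can be pasted into a shell.
print(shlex.join(cmd))
```

If kolossus-cli is installed and the input files exist, the list can be passed to subprocess.run(cmd, check=True) to actually run the prediction.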

To get ESM embeddings, use the kolossus-extract command.

usage: kolossus-extract [-h] -i I [--model MODEL] [--device DEVICE] -o O

optional arguments:
  -h, --help       show this help message and exit
  -i I             name of input fasta file
  --model MODEL    name of model to extract embeddings
  --device DEVICE  cpu or gpu device to use
  -o O             name of output .h5 file
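
The two commands can be chained: kolossus-extract writes an .h5 file of embeddings, which kolossus-cli then reads through --embeddings. The sketch below only builds and prints the two command lines, using flags from the help text above. The embedding_pipeline helper and the file names embeddings.h5 and predictions.txt are hypothetical. It leaves --model and --dtype at their defaults, which the help text does not document.

```python
import shlex


def embedding_pipeline(fasta, pairs, embeddings, output, device="cpu"):
    """Return the two commands: extract embeddings first, then predict from them."""
    extract = ["kolossus-extract", "-i", fasta, "--device", device, "-o", embeddings]
    predict = [
        "kolossus-cli",
        "--pairs", pairs,
        "--embeddings", embeddings,  # same file that the extract step writes
        "--device", device,
        "-o", output,
    ]
    return [extract, predict]


steps = embedding_pipeline(
    fasta="seqs.fasta",
    pairs="pairs_with_phosphorylation_sites.txt",
    embeddings="embeddings.h5",  # hypothetical file name
    output="predictions.txt",    # hypothetical file name
)

for step in steps:
    print(shlex.join(step))
```

Precomputing embeddings this way is mainly useful when the same sequences are scored against several pair files, since the extraction step then runs only once.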
