
Predict kinase-substrate interactions.

Project description

p_KolossuS

Currently under development. At the moment all results are meaningless because the input data are randomly generated, but the basic functionality works. See the example/toy folder for example data and usage.

The main function is kolossus, which works as follows:

Input:
  - fasta file of all sequences (or .h5 file of embeddings)
  - pair file of format '<kinase_id>\t<substrate_id>\t<substrate_phosphorylation_site>'

Output:
  - pairs (kinase_id, substrate_id, substrate_phosphorylation_site, predicted_probability)
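As a sketch of the pairs-file format above, the helpers below write and read back a tab-separated file with one (kinase_id, substrate_id, substrate_phosphorylation_site) triple per line. The names write_pairs_file and read_pairs_file and the example IDs are hypothetical; only the three-column, tab-separated layout comes from the description. Whether kolossus tolerates a header line is not documented, so none is written.

```python
import csv
from typing import Iterable, List, Tuple

# (kinase_id, substrate_id, 1-based phosphorylation site)
Pair = Tuple[str, str, int]


def write_pairs_file(path: str, pairs: Iterable[Pair]) -> None:
    """Write one tab-separated kinase_id, substrate_id, site line per pair, without a header."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for kinase_id, substrate_id, site in pairs:
            writer.writerow([kinase_id, substrate_id, site])


def read_pairs_file(path: str) -> List[Pair]:
    """Read the same three-column format back, skipping blank lines."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        return [(row[0], row[1], int(row[2])) for row in reader if row]
```

For example, write_pairs_file("pairs_with_phosphorylation_sites.txt", [("KIN1", "SUB1", 5)]) produces a single line KIN1, SUB1 and 5 separated by tabs; the IDs must match the sequence IDs in the fasta file.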

Here, <substrate_phosphorylation_site> is the 1-based position of the phosphorylated residue in the substrate sequence. For example, if the substrate has sequence 'GGRGSDD' and the serine (the 5th amino acid) is phosphorylated, then substrate_phosphorylation_site=5.
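Because Python indexes from 0, the 1-based convention is easy to get off by one. The minimal sketch below looks up the phosphorylated residue and cuts a window around it. The function names, the default half-width of 7 residues and the '-' padding at sequence ends are hypothetical illustrations; the window size and padding kolossus actually uses are not documented here.

```python
def residue_at(sequence: str, site: int) -> str:
    """Return the residue at a 1-based position (site=1 is the first residue)."""
    if not 1 <= site <= len(sequence):
        raise ValueError(f"site {site} is outside a sequence of length {len(sequence)}")
    return sequence[site - 1]  # convert 1-based site to 0-based index


def site_window(sequence: str, site: int, half_width: int = 7, pad: str = "-") -> str:
    """Return 2*half_width+1 residues centred on the 1-based site, padded past either end.

    half_width=7 and '-' padding are illustrative assumptions, not kolossus settings.
    """
    residue_at(sequence, site)  # validates the site
    centre = site - 1
    return "".join(
        sequence[i] if 0 <= i < len(sequence) else pad
        for i in range(centre - half_width, centre + half_width + 1)
    )


print(residue_at("GGRGSDD", 5))      # S
print(site_window("GGRGSDD", 5, 2))  # RGSDD
```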

Note that the fasta file should contain all of the sequences, including the full-length substrate sequences; the windows around each phosphorylation site are extracted using the positions given in the pairs file.
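Since every kinase and substrate ID in the pairs file has to be present in the fasta file, a quick pre-flight check can save a failed run. This sketch assumes the sequence ID is the first whitespace-separated token of each '>' header, a common convention that the kolossus documentation does not confirm; read_fasta and check_pairs are hypothetical helpers.

```python
from typing import Dict, List, Optional


def read_fasta(path: str) -> Dict[str, str]:
    """Parse a fasta file into {id: sequence}, using the first token of each header as the id."""
    sequences: Dict[str, str] = {}
    current: Optional[str] = None
    chunks: List[str] = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if current is not None:
                    sequences[current] = "".join(chunks)
                current = line[1:].split()[0]
                chunks = []
            else:
                chunks.append(line)
    if current is not None:
        sequences[current] = "".join(chunks)
    return sequences


def check_pairs(pairs_path: str, fasta_path: str) -> List[str]:
    """Return human-readable problems; an empty list means every pair looks usable."""
    sequences = read_fasta(fasta_path)
    problems: List[str] = []
    with open(pairs_path) as handle:
        for lineno, line in enumerate(handle, start=1):
            fields = line.rstrip("\r\n").split("\t")
            if fields == [""]:
                continue  # skip blank lines
            if len(fields) != 3:
                problems.append(f"line {lineno}: expected 3 tab-separated fields")
                continue
            kinase_id, substrate_id, site = fields
            for seq_id in (kinase_id, substrate_id):
                if seq_id not in sequences:
                    problems.append(f"line {lineno}: {seq_id} not in fasta")
            if substrate_id in sequences:
                length = len(sequences[substrate_id])
                if not site.isdigit() or not 1 <= int(site) <= length:
                    problems.append(f"line {lineno}: site {site} out of range for {substrate_id}")
    return problems
```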

Usage:

```bash
## on the command line
kinase_file="kinases.fasta"
substrate_file="substrates.fasta"

# combine kinase and substrate sequences into one fasta file
cat "$kinase_file" "$substrate_file" > seqs.fasta
```

```python
## in python
from kolossus import kolossus

# define inputs to function
seqs_file = 'seqs.fasta'
pairs_file = 'pairs_with_phosphorylation_sites.txt'

# returns a dictionary (kinase, substrate, site): probability
pairs_and_probs = kolossus(pairs_file, fseqs=seqs_file, device='cpu')

# to also get kolossus embeddings (projections), set return_projections=True
pairs_and_probs, projections = kolossus(pairs_file, fseqs=seqs_file, device='cpu', return_projections=True)
```
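The returned dictionary can be ranked or saved with plain Python. This sketch relies only on the shape stated in the comment in the usage example, keys (kinase, substrate, site) mapped to a probability, and assumes each probability converts cleanly with float(). The helper names, the output column names and the 0.5 cut-off are arbitrary choices for illustration, not kolossus defaults.

```python
import csv
from typing import Dict, List, Tuple

Key = Tuple[str, str, int]


def top_pairs(pairs_and_probs: Dict[Key, float], threshold: float = 0.5) -> List[Tuple[Key, float]]:
    """Return (key, probability) items at or above threshold, highest probability first."""
    hits = [(key, float(prob)) for key, prob in pairs_and_probs.items() if prob >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


def write_predictions(pairs_and_probs: Dict[Key, float], path: str) -> None:
    """Write one tab-separated row per pair: kinase, substrate, site, probability."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["kinase_id", "substrate_id",
                         "substrate_phosphorylation_site", "predicted_probability"])
        for (kinase_id, substrate_id, site), prob in pairs_and_probs.items():
            writer.writerow([kinase_id, substrate_id, site, float(prob)])
```

With the usage example above, write_predictions(pairs_and_probs, "predictions.tsv") would save every pair, and top_pairs(pairs_and_probs) lists the confident ones first.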

There is also a command-line interface, kolossus-cli, which can be run from the terminal:

```text
usage: kolossus-cli [-h] --pairs PAIRS [--seqs SEQS] [--embeddings EMBEDDINGS] [--dtype DTYPE] [--projections PROJECTIONS]
                    [--device DEVICE] [--batch_size BATCH_SIZE] -o OUTPUT

optional arguments:
  -h, --help            show this help message and exit
  --pairs PAIRS         format: <kinase_id> <substrate_id> <substrate_phosphorylation_site>
  --seqs SEQS           fasta formatted file of sequences, either seqs or embeddings must be provided
  --embeddings EMBEDDINGS
                        h5 file of sequence embeddings, either seqs or embeddings must be provided
  --dtype DTYPE         data type of sequence embeddings (usually float32)
  --projections PROJECTIONS
                        name of .h5 files for kolossus projections
  --device DEVICE       default device on which to run model
  --batch_size BATCH_SIZE
                        Number of pairs at a time on which to run model
  -o OUTPUT, --output OUTPUT
                        desired file path for output
```
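When kolossus-cli is driven from a Python script, the requirement in the help text that either --seqs or --embeddings be provided is worth checking before launching. The sketch below uses only flags listed in that help output; the function names are hypothetical, and the format of the file written to -o is not documented here.

```python
import subprocess
from typing import List, Optional


def kolossus_cli_args(pairs: str, output: str, seqs: Optional[str] = None,
                      embeddings: Optional[str] = None, device: Optional[str] = None,
                      batch_size: Optional[int] = None) -> List[str]:
    """Build an argv list for kolossus-cli from the flags shown in its --help output."""
    if seqs is None and embeddings is None:
        raise ValueError("either seqs or embeddings must be provided")
    args = ["kolossus-cli", "--pairs", pairs, "-o", output]
    if seqs is not None:
        args += ["--seqs", seqs]
    if embeddings is not None:
        args += ["--embeddings", embeddings]
    if device is not None:
        args += ["--device", device]
    if batch_size is not None:
        args += ["--batch_size", str(batch_size)]
    return args


def run_kolossus_cli(**kwargs) -> None:
    """Run kolossus-cli; check=True raises CalledProcessError on a non-zero exit code."""
    subprocess.run(kolossus_cli_args(**kwargs), check=True)
```

For example: run_kolossus_cli(pairs="pairs_with_phosphorylation_sites.txt", seqs="seqs.fasta", output="predictions.txt", device="cpu"). Passing the argument list (rather than a shell string) avoids quoting problems with unusual file names.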

To compute ESM embeddings, use the kolossus-extract command:

```text
usage: kolossus-extract [-h] -i I [--model MODEL] [--device DEVICE] -o O

optional arguments:
  -h, --help       show this help message and exit
  -i I             name of input fasta file
  --model MODEL    name of model to extract embeddings
  --device DEVICE  cpu or gpu device to use
  -o O             name of output .h5 file
```
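The two commands can be chained: compute embeddings once with kolossus-extract, then reuse the .h5 file through kolossus-cli --embeddings. This is a sketch under assumptions: the function names and file names are hypothetical, the model name is left to the caller because accepted --model values are not listed, and it assumes the same --device string is valid for both commands.

```python
import subprocess
from typing import List, Optional


def extract_then_predict_commands(fasta: str, pairs: str, embeddings_h5: str, output: str,
                                  model: Optional[str] = None,
                                  device: str = "cpu") -> List[List[str]]:
    """Return two argv lists: embed the sequences, then predict from the saved embeddings."""
    extract = ["kolossus-extract", "-i", fasta, "-o", embeddings_h5, "--device", device]
    if model is not None:
        extract += ["--model", model]  # accepted model names are not documented here
    predict = ["kolossus-cli", "--pairs", pairs, "--embeddings", embeddings_h5,
               "-o", output, "--device", device]
    return [extract, predict]


def run_pipeline(*args, **kwargs) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in extract_then_predict_commands(*args, **kwargs):
        subprocess.run(command, check=True)
```

Saving embeddings separately means the (typically slow) embedding step does not have to be repeated when only the pairs file changes.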



Download files


Source Distribution

kolossus-0.0.2.tar.gz (26.6 kB)

Built Distribution

kolossus-0.0.2-py3-none-any.whl (34.1 kB)

File details

Details for the file kolossus-0.0.2.tar.gz.

File metadata

  • Download URL: kolossus-0.0.2.tar.gz
  • Size: 26.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.19

File hashes

Hashes for kolossus-0.0.2.tar.gz
Algorithm Hash digest
SHA256 73d11ad03a06db43d417e0e55e4a67e97b1ff37e051fe2c17833ac8632f8a7d6
MD5 062f1a54414cb593e3bec03c4e117bf7
BLAKE2b-256 58037298f1ce7a83a0808beae89f3e671a35717b1dc51fdc9f230768098d1598
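To check a downloaded file against the SHA256 digest listed above, hashlib from the Python standard library is enough. A minimal sketch; sha256_of is a hypothetical helper name, and the expected value is copied from the table for kolossus-0.0.2.tar.gz.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED_SDIST_SHA256 = "73d11ad03a06db43d417e0e55e4a67e97b1ff37e051fe2c17833ac8632f8a7d6"
```

After downloading, sha256_of("kolossus-0.0.2.tar.gz") == EXPECTED_SDIST_SHA256 should hold. pip can also enforce this at install time through its hash-checking mode (a requirements line with --hash=sha256:...).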


File details

Details for the file kolossus-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: kolossus-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 34.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.19

File hashes

Hashes for kolossus-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 a61c670a16e000d5acaeb6f4b606207dae278ee02c9efee92b59c3ebf7c897a0
MD5 eacf0d86794a6032dbc311c0753d5d55
BLAKE2b-256 989694722f5417b39b8046210ba2b38bb68f3a0434ebbf16125b6381e45f6519
