Predict kinase-substrate interactions. TEST VERSION!!! JUNK OUTPUT!!!

p_KolossuS_test

JUNK JUNK JUNK JUNK JUNK JUNK JUNK JUNK JUNK JUNK JUNK JUNK JUNK JUNK JUNK JUNK JUNK JUNK JUNK JUNK JUNK

Currently under development. All results are junk for now because the data input is randomly generated, but the basic functionality works. See the example data and usage in the example/toy folder.

The main function is kolossus. It works as follows:

Input:
  - fasta file of all sequences (or .h5 file of embeddings)
  - pair file of format '<kinase_id>\t<substrate_id>\t<substrate_phosphorylation_site>'

Output:
  - pairs (kinase_id, substrate_id, substrate_phosphorylation_site, predicted_probability)
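
As a rough illustration of the pair file format, the sketch below writes and re-reads a tab-separated file with one pair per line. The IDs are made up, and it assumes the file has no header line, which the description above does not state either way.

```python
import csv
import io

# Hypothetical example pairs: (kinase_id, substrate_id, phosphorylation_site).
# The IDs are invented for illustration; in practice they should match the
# sequence IDs in the fasta file.
pairs = [
    ("kinase_A", "substrate_X", 5),
    ("kinase_A", "substrate_Y", 12),
    ("kinase_B", "substrate_X", 5),
]

def write_pairs(handle, pairs):
    """Write one tab-separated line per (kinase, substrate, site) pair."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for kinase_id, substrate_id, site in pairs:
        writer.writerow([kinase_id, substrate_id, site])

def read_pairs(handle):
    """Read the pairs back, converting the site column to an integer."""
    reader = csv.reader(handle, delimiter="\t")
    return [(kinase_id, substrate_id, int(site)) for kinase_id, substrate_id, site in reader]

# Use an in-memory buffer here; swap in open("pairs.txt", "w", newline="") for a real file.
buffer = io.StringIO()
write_pairs(buffer, pairs)
pairs_text = buffer.getvalue()
round_trip = read_pairs(io.StringIO(pairs_text))
print(pairs_text, end="")
```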

Here, <substrate_phosphorylation_site> is the 1-based position of the phosphorylated residue in the substrate sequence. For example, if the substrate has sequence 'GGRGSDD' and the serine (the 5th amino acid) is the phosphorylated residue, then substrate_phosphorylation_site=5.

Note that the fasta file should contain all of the sequences, including the full substrate sequences. The appropriate windows around each site are taken using the positions in the pairs file.
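
To make the 1-based site convention concrete, here is a minimal sketch that looks up the residue at a site and cuts a window around it. The helper names, the flank width of 2, and the '-' padding are all illustrative assumptions; the window actually used by kolossus is not described here.

```python
def residue_at(sequence, site):
    """Return the residue at a 1-based position, as used in the pairs file."""
    if not 1 <= site <= len(sequence):
        raise ValueError(f"site {site} is outside a sequence of length {len(sequence)}")
    return sequence[site - 1]

def window_around(sequence, site, flank=2, pad="-"):
    """Return the residue at `site` plus `flank` residues on each side.

    Positions that fall off either end of the sequence are filled with `pad`.
    """
    residue_at(sequence, site)  # raises ValueError if the site is out of range
    padded = pad * flank + sequence + pad * flank
    center = site - 1 + flank  # index of the site within the padded string
    return padded[center - flank : center + flank + 1]

substrate = "GGRGSDD"
print(residue_at(substrate, 5))     # S
print(window_around(substrate, 5))  # RGSDD
print(window_around(substrate, 1))  # --GGR
```

A check like residue_at can also catch off-by-one mistakes before running predictions, for example by confirming that every site in the pairs file points at the residue you expect.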

Usage:

## on the command line
kinase_file="kinases.fasta"
substrate_file="substrates.fasta"

cat "$kinase_file" "$substrate_file" > seqs.fasta

## in python
from kolossus import kolossus

# define inputs to function
seqs_file = 'seqs.fasta' 
pairs_file = 'pairs_with_phosphorylation_sites.txt'

# returns a dictionary mapping (kinase, substrate, site) to probability
pairs_and_probs = kolossus(pairs_file, fseqs=seqs_file, device='cpu')

# to get kolossus embeddings: use the return_projections parameter
pairs_and_probs, projections = kolossus(pairs_file, fseqs=seqs_file, device='cpu', return_projections=True)
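
Once you have the returned dictionary, you will probably want to rank or filter the pairs. The sketch below uses a hand-written stand-in for the dictionary (the IDs and probabilities are made up) and assumes the values behave like plain Python floats; the helper names are hypothetical and not part of the package.

```python
# Stand-in for the dictionary returned by kolossus(); values are invented.
pairs_and_probs = {
    ("kinase_A", "substrate_X", 5): 0.91,
    ("kinase_A", "substrate_Y", 12): 0.08,
    ("kinase_B", "substrate_X", 5): 0.47,
}

def rank_pairs(pairs_and_probs, threshold=0.0):
    """Return (kinase, substrate, site, probability) rows, highest probability first."""
    rows = [
        (kinase, substrate, site, float(prob))
        for (kinase, substrate, site), prob in pairs_and_probs.items()
        if float(prob) >= threshold
    ]
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows

def format_rows(rows):
    """Format rows as tab-separated text, one pair per line."""
    return "\n".join(
        f"{kinase}\t{substrate}\t{site}\t{prob:.4f}"
        for kinase, substrate, site, prob in rows
    )

ranked = rank_pairs(pairs_and_probs, threshold=0.4)
print(format_rows(ranked))
```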

There is also a command-line interface which can be called on the terminal: kolossus-cli.

usage: kolossus-cli [-h] --pairs PAIRS [--seqs SEQS] [--embeddings EMBEDDINGS] [--dtype DTYPE] [--projections PROJECTIONS]
                    [--device DEVICE] [--batch_size BATCH_SIZE] -o OUTPUT

optional arguments:
  -h, --help            show this help message and exit
  --pairs PAIRS         format: <kinase_id> <substrate_id> <substrate_phosphorylation_site>
  --seqs SEQS           fasta formatted file of sequences, either seqs or embeddings must be provided
  --embeddings EMBEDDINGS
                        h5 file of sequence embeddings, either seqs or embeddings must be provided
  --dtype DTYPE         data type of sequence embeddings (usually float32)
  --projections PROJECTIONS
                        name of .h5 files for kolossus projections
  --device DEVICE       default device on which to run model
  --batch_size BATCH_SIZE
                        Number of pairs at a time on which to run model
  -o OUTPUT, --output OUTPUT
                        desired file path for output

To get ESM embeddings, use the kolossus-extract command.

usage: kolossus-extract [-h] -i I [--model MODEL] [--device DEVICE] -o O

optional arguments:
  -h, --help       show this help message and exit
  -i I             name of input fasta file
  --model MODEL    name of model to extract embeddings
  --device DEVICE  cpu or gpu device to use
  -o O             name of output .h5 file
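
The two commands can be chained: extract embeddings once, then run predictions from the .h5 file. The sketch below only assembles and prints the command lines, using flags that appear in the help text above. The file names 'seqs.h5' and 'predictions.txt' are hypothetical, and 'cpu' as a --device value is assumed from the Python example.

```python
import shlex

def extract_command(fasta_path, embeddings_path, device="cpu"):
    """Build a kolossus-extract call: fasta file in, .h5 embeddings out."""
    return ["kolossus-extract", "-i", fasta_path, "--device", device, "-o", embeddings_path]

def predict_command(pairs_path, embeddings_path, output_path, device="cpu"):
    """Build a kolossus-cli call that reads precomputed embeddings."""
    return [
        "kolossus-cli",
        "--pairs", pairs_path,
        "--embeddings", embeddings_path,
        "--device", device,
        "-o", output_path,
    ]

commands = [
    extract_command("seqs.fasta", "seqs.h5"),
    predict_command("pairs_with_phosphorylation_sites.txt", "seqs.h5", "predictions.txt"),
]

for command in commands:
    # Print the shell-quoted command; to run it for real, use
    # subprocess.run(command, check=True) with the package installed.
    print(shlex.join(command))
```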
