PRotein Ortholog Search Tool

Project description

PROST python package v0.2.16

PRotein Ortholog Search Tool (PROST) is a new homolog detection tool that uses the ESM-1b language model and the iDCT quantization method. PROST is fast and accurate compared to traditional tools.

Installation

The package can be installed with:

pip install pyprost

On the initial run, PROST will download the required files to ~/.config/prost, or to a user-defined directory set via the PROSTDIR environment variable.
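The directory lookup described above can be sketched in Python. This is a minimal illustration, not PROST's own code: resolve_prost_dir is a hypothetical helper, and it assumes that PROSTDIR, when set, simply replaces the default ~/.config/prost.

```python
import os
from pathlib import Path


def resolve_prost_dir(environ=None):
    """Hypothetical helper: return the directory PROST is expected to use.

    Assumption: a non-empty PROSTDIR overrides the default location.
    """
    env = os.environ if environ is None else environ
    custom = env.get("PROSTDIR")
    if custom:
        return Path(custom).expanduser()
    # Default location named in the installation notes.
    return Path.home() / ".config" / "prost"


print(resolve_prost_dir())
```

If the variable is set from within Python rather than the shell, it is probably safest to set os.environ['PROSTDIR'] before the first import of pyprost, since the point at which the package reads it is not documented here.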

How to use

The following commands can be used to create databases and perform a homology search.

prost.py makedb db/sp.fa db/sp.prdb
prost.py makedb db/covid.fa db/covid.prdb
prost.py search --thr 0.05 --jobs 8 db/covid.prdb db/sp.prdb results
prost.py searchsp --thr 0.05 --jobs 8 db/covid.prdb results
prost.py tojsonwp -a -i 'info' results.tsv website
  • makedb: creates a PROST database from a given FASTA file, which usually contains more than one entry. If you are processing a large number of proteins, use the splitting option (--split 1000) to save the quantizations into smaller databases. These chunks can later be merged into one database with the mergedbs command (pyprost.py mergedbs input* out.prdb).
  • search: searches a query database against a target database. The query database can contain one or more sequences embedded with the makedb command. Use --thr to specify an e-value threshold (default: 0.05) and --jobs to parallelize the search.
  • searchsp: searches a query database against the SwissProt February 2023 database and performs GO enrichment analysis on the homologs found. The query database can contain one or more sequences embedded with the makedb command. As with search, --thr specifies the e-value threshold; --gothr sets a separate e-value threshold for the GO enrichment analysis. searchsp produces a tab-separated .tsv file.

The .tsv file can be converted into a .json file for use with the JSONWP tool:

prost.py tojsonwp -i 'Here is an info string to show on the website' results.tsv website

[Figures jsonwp1, jsonwp2: example result]

Sequence alignment is done automatically using the PROTSUB matrix, which is better than the BLOSUM62 matrix for aligning remote proteins. Protein structures are fetched from the AlphaFold 2 database.

[Figure jsonwp3: a result for which no alignment was produced]
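The makedb and search invocations listed above can also be assembled from a Python script, for example to loop over several query files. The sketch below only builds and prints the command lines; makedb_cmd and search_cmd are hypothetical helpers, and they use only the subcommands and options shown in the commands above.

```python
import shlex


def makedb_cmd(fasta, db):
    """Build the argument list for 'prost.py makedb <fasta> <db>'."""
    return ["prost.py", "makedb", str(fasta), str(db)]


def search_cmd(query_db, target_db, out, jobs, thr=0.05):
    """Build the argument list for 'prost.py search' with --thr and --jobs."""
    return [
        "prost.py", "search",
        "--thr", str(thr),
        "--jobs", str(jobs),
        str(query_db), str(target_db), str(out),
    ]


pipeline = [
    makedb_cmd("db/sp.fa", "db/sp.prdb"),
    makedb_cmd("db/covid.fa", "db/covid.prdb"),
    search_cmd("db/covid.prdb", "db/sp.prdb", "results", jobs=8),
]

for cmd in pipeline:
    # Only prints the command; to execute it, pass the list to
    # subprocess.run(cmd, check=True) on a machine with PROST installed.
    print(shlex.join(cmd))
```

Keeping each command as a list rather than one string avoids shell quoting problems when paths contain spaces.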

Scripting

import pyprost

hpo30 = '''MPLIMYKFLLVTSIFLIVSGLILTAFSLFSPLWEVVDFPRSHLSHHHGLWWDCIVHHET
LIPLHEDQAELRGDRCDSKMDSSVQASLRVALEKGDEEARELLLHRFLPHHKGVIFFAVF
TFVFGLISILIGSCSPCFPPNALLYVVGVFMTGACSLLADIIYIFAFNQKPIFTKEQSEP
HQEVLSRRERGSIGPIYKRLGIATYMHMFGSMLLIAAFIFSIFCAYFLITSKHAHDVCCT
SRKEYREQTKWKNNGLILKTGRVNHQSHRPFVVIDDDSSM'''

clc2 = '''MSQAVSYAILVLTIIAFLLTAAALCTPAWQVVYAREIRQWVQSGLWLSCQTRPNGMYSCT
YTFSHDDFNTYFSDEVSGFRTPSFYPWQRTLFHIYLISQAFAMLSLISFCVSVSHKESKM
PNILRSVFLVLAAVIAFGCLIAFAVYSYMVEYRFFHVSVSGIYEKHRGYSWYIALTGAFV
YLVAIILSVVHVLLQARNSNTTMSRQNINSSLQSDFFEYQYHPNRSMESFEDRFAMRTLP
PVPRQEKKTTVF'''

hpo30embedding = pyprost.quantSeq(hpo30)
clc2embedding = pyprost.quantSeq(clc2)

dist = pyprost.prostDistance(hpo30embedding, clc2embedding)
print('HPO30-CLC2 prost distance:', dist)
# Should print: HPO30-CLC2 prost distance: 3479.0
# A distance smaller than 6875 may indicate homology
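The cutoff mentioned in the last comment can be wrapped in a small check. This is a sketch only: may_be_homologous is a hypothetical helper, not part of pyprost, and it takes the value 6875 from the comment above as a rule of thumb rather than a strict decision boundary.

```python
# Cutoff quoted in the example above; treated here as a rule of thumb.
HOMOLOGY_DISTANCE_CUTOFF = 6875


def may_be_homologous(distance, cutoff=HOMOLOGY_DISTANCE_CUTOFF):
    """Return True when a PROST distance falls below the cutoff."""
    return distance < cutoff


# 3479.0 is the HPO30-CLC2 distance reported in the example above.
print(may_be_homologous(3479.0))  # True
```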


Download files

Source Distribution

pyprost-0.2.16.tar.gz (16.4 kB)

Uploaded Source

Built Distribution

pyprost-0.2.16-py3-none-any.whl (15.4 kB)

Uploaded Python 3

File details

Details for the file pyprost-0.2.16.tar.gz.

File metadata

  • Download URL: pyprost-0.2.16.tar.gz
  • Upload date:
  • Size: 16.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.9.21

File hashes

Hashes for pyprost-0.2.16.tar.gz
Algorithm Hash digest
SHA256 2d81a8232b237d8345b53e6e2e420653c1c8780efcad60a40f72b6c85f3ba07c
MD5 5b3aac13df033d47de337830aa3403ef
BLAKE2b-256 2b80bd18f0222329f32a78d832c3da7f15ad35b3a81e335ee274b8e5b9fece4a

File details

Details for the file pyprost-0.2.16-py3-none-any.whl.

File metadata

  • Download URL: pyprost-0.2.16-py3-none-any.whl
  • Upload date:
  • Size: 15.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.9.21

File hashes

Hashes for pyprost-0.2.16-py3-none-any.whl
Algorithm Hash digest
SHA256 f7b0cab52577c029e2298cf8b0f473b3ae71400c1598c30738544cd3810cf7fa
MD5 b12f28d537c3d0ffed8d3a7349e0ca0f
BLAKE2b-256 5df7dd9d8d4eda20c6ce1da2d6bc9219a60b2bf8ad865b48a892a8190cd0954a
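A downloaded file can be checked against the SHA256 digests listed above with Python's standard hashlib module. This is a minimal sketch: sha256_of and matches_sha256 are hypothetical helper names, and the file path passed to them is whatever location the archive was saved to.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare a file's digest with an expected hex string (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_sha256('pyprost-0.2.16.tar.gz', '2d81a8232b237d8345b53e6e2e420653c1c8780efcad60a40f72b6c85f3ba07c') should return True for an intact copy of the source distribution.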
