PRotein Ortholog Search Tool
PROST Python package v0.2.16
PRotein Ortholog Search Tool (PROST) is a new homolog detection tool that utilizes the ESM-1b language model and the iDCT quantization method. PROST is fast and accurate compared to traditional tools.
Installation
The package can be installed with:
pip install pyprost
On the first run, PROST downloads the required files to ~/.config/prost, or to a user-defined directory set via the PROSTDIR environment variable.
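To check where these files are expected to end up before the first run, the rule described above can be sketched in a few lines. This is only an illustration of the stated behaviour, not PROST's own code: the helper name resolve_prost_dir is hypothetical, and the assumption is that PROSTDIR, when set, simply replaces the default path.

```python
import os
from pathlib import Path


def resolve_prost_dir(env=None):
    """Return the directory PROST files are expected in (assumed rule).

    Hypothetical helper: uses PROSTDIR if it is set and non-empty,
    otherwise falls back to ~/.config/prost as described above.
    """
    env = os.environ if env is None else env
    custom = env.get("PROSTDIR")
    if custom:
        return Path(custom).expanduser()
    return Path.home() / ".config" / "prost"


if __name__ == "__main__":
    print("PROST files expected in:", resolve_prost_dir())
```

Setting PROSTDIR in the shell before the first run (for example, export PROSTDIR=/data/prost) should then redirect the download to that directory.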
How to use
The following commands can be used to create databases and perform a homology search.
prost.py makedb db/sp.fa db/sp.prdb
prost.py makedb db/covid.fa db/covid.prdb
prost.py search --thr 0.05 --jobs 8 db/covid.prdb db/sp.prdb results
prost.py searchsp --thr 0.05 --jobs 8 db/covid.prdb results
prost.py tojsonwp -a -i 'info' results.tsv website
- makedb: creates a PROST database from a given FASTA file. The FASTA file usually contains more than one entry. If you are processing a large number of proteins, use the splitting option (--split 1000) to save the quantizations into smaller databases. These chunks can later be merged into one database with the mergedbs command (pyprost.py mergedbs input* out.prdb).
- search: searches a query database against a target database. The query database can contain one or more sequences embedded using the makedb command. --thr can be used to specify an e-value threshold. The default threshold is 0.05. You can parallelize the search by using the --jobs option.
- searchsp: searches a query database against the SwissProt February 2023 database and performs GO enrichment analysis on the homologs found. The query database can contain one or more sequences embedded using the makedb command. Again, --thr can be used to specify an e-value threshold. --gothr can be used to specify a different e-value threshold for the GO enrichment analysis. searchsp produces a tab-separated .tsv file.
The .tsv file can be converted into a .json file for use with the JSONWP tool, using the command prost.py tojsonwp -i 'Here is an info string to shown on website' results.tsv website
In the example result, sequence alignment is done automatically using the PROTSUB matrix, which is better for aligning remote proteins than the BLOSUM62 matrix. Protein structures are fetched from the AlphaFold 2 database. No alignment was produced this time.
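When many databases have to be built and searched, it can be convenient to assemble these command lines from Python. The sketch below only builds argument lists from the subcommands and flags mentioned above. The helper names makedb_cmd and search_cmd are hypothetical, and placing --split before the positional arguments is an assumption.

```python
import shlex


def makedb_cmd(fasta, db, split=None):
    """Build a 'prost.py makedb' command line as a list of arguments."""
    cmd = ["prost.py", "makedb"]
    if split is not None:
        # Assumed placement: option before the positional arguments.
        cmd += ["--split", str(split)]
    return cmd + [fasta, db]


def search_cmd(query_db, target_db, out, thr=0.05, jobs=8):
    """Build a 'prost.py search' command line as a list of arguments."""
    return [
        "prost.py", "search",
        "--thr", str(thr),
        "--jobs", str(jobs),
        query_db, target_db, out,
    ]


if __name__ == "__main__":
    # Print the commands instead of running them.
    print(shlex.join(makedb_cmd("db/covid.fa", "db/covid.prdb")))
    print(shlex.join(search_cmd("db/covid.prdb", "db/sp.prdb", "results")))
```

Each list can be passed to subprocess.run(cmd, check=True) to execute the command, provided prost.py is on the PATH.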
Scripting
import pyprost
hpo30 = '''MPLIMYKFLLVTSIFLIVSGLILTAFSLFSPLWEVVDFPRSHLSHHHGLWWDCIVHHET
LIPLHEDQAELRGDRCDSKMDSSVQASLRVALEKGDEEARELLLHRFLPHHKGVIFFAVF
TFVFGLISILIGSCSPCFPPNALLYVVGVFMTGACSLLADIIYIFAFNQKPIFTKEQSEP
HQEVLSRRERGSIGPIYKRLGIATYMHMFGSMLLIAAFIFSIFCAYFLITSKHAHDVCCT
SRKEYREQTKWKNNGLILKTGRVNHQSHRPFVVIDDDSSM'''
clc2 = '''MSQAVSYAILVLTIIAFLLTAAALCTPAWQVVYAREIRQWVQSGLWLSCQTRPNGMYSCT
YTFSHDDFNTYFSDEVSGFRTPSFYPWQRTLFHIYLISQAFAMLSLISFCVSVSHKESKM
PNILRSVFLVLAAVIAFGCLIAFAVYSYMVEYRFFHVSVSGIYEKHRGYSWYIALTGAFV
YLVAIILSVVHVLLQARNSNTTMSRQNINSSLQSDFFEYQYHPNRSMESFEDRFAMRTLP
PVPRQEKKTTVF'''
hpo30embedding = pyprost.quantSeq(hpo30)
clc2embedding = pyprost.quantSeq(clc2)
dist = pyprost.prostDistance(hpo30embedding, clc2embedding)
print('HPO30-CLC2 prost distance:', dist)
# Should print: HPO30-CLC2 prost distance: 3479.0
# A distance smaller than 6875 may indicate homology
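When comparing one query against several candidates, the distance cutoff mentioned in the comment above can be applied in a small helper. The following is a minimal sketch that works on already computed distances, so it does not call pyprost itself. The names may_indicate_homology and rank_hits are hypothetical, and treating the cutoff as a strict "smaller than" comparison is an assumption based on the comment's wording.

```python
# Cutoff taken from the comment in the scripting example above.
HOMOLOGY_DISTANCE_CUTOFF = 6875


def may_indicate_homology(distance, cutoff=HOMOLOGY_DISTANCE_CUTOFF):
    """Return True if a PROST distance falls below the cutoff."""
    return distance < cutoff


def rank_hits(distances, cutoff=HOMOLOGY_DISTANCE_CUTOFF):
    """Keep (name, distance) pairs below the cutoff, closest first.

    'distances' maps a candidate name to its precomputed PROST distance.
    """
    hits = [(name, d) for name, d in distances.items() if d < cutoff]
    return sorted(hits, key=lambda item: item[1])


if __name__ == "__main__":
    # 3479.0 is the HPO30-CLC2 distance from the example above;
    # the other entry is a made-up placeholder value.
    example = {"CLC2": 3479.0, "placeholder_far": 9000.0}
    print(rank_hits(example))
```

In practice, the dictionary would be filled with values returned by pyprost.prostDistance for each candidate embedding.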
Resources
- Yeast analysis: https://mesihk.github.io/prostyeast
- Unannotated Human Proteins Analysis: https://mesihk.github.io/prosthuman
- Webserver: https://mesihk.github.io/prost
- PROST Python package: https://github.com/MesihK/prost
- PROST Research Data: https://github.com/MesihK/prost-data
- JSONWP: https://jsonwp.onrender.com/