Phynteny: Synteny-based prediction of bacteriophage genes
Approximately 65% of all bacteriophage (phage) genes cannot be assigned a known biological function. Phynteny uses a long short-term memory (LSTM) model trained on phage synteny (the conserved gene order across phages) to assign hypothetical phage proteins to a PHROG category.
Phynteny is still a work in progress and the LSTM model has not yet been optimised. Use with caution!
NOTE: This version of Phynteny will only annotate phages with 120 genes or fewer due to the architecture of the LSTM. We aim to adjust this in future versions.
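As a rough pre-check against this limit, the number of genes in each GenBank record can be counted before running Phynteny. The sketch below uses only the standard library and assumes that the limit applies to CDS features, which the text above does not state explicitly. The names `count_cds_per_record`, `within_limit` and `MAX_GENES` are hypothetical helpers and not part of Phynteny. For real data, Biopython's `SeqIO.parse(path, "genbank")` is likely a more robust way to read the records.

```python
import re
from typing import List

# Assumption: the 120-gene limit is counted over CDS features. The README
# does not say exactly how Phynteny counts genes.
MAX_GENES = 120

# In a GenBank feature table, a CDS line starts with five spaces, the key
# "CDS", then whitespace before the location.
_CDS_LINE = re.compile(r"^ {5}CDS\s+\S")


def count_cds_per_record(genbank_text: str) -> List[int]:
    """Return the number of CDS features in each record of a GenBank file."""
    counts = []
    current = 0
    in_record = False
    for line in genbank_text.splitlines():
        if line.startswith("LOCUS"):
            # A new record begins.
            in_record = True
            current = 0
        elif line.startswith("//") and in_record:
            # "//" terminates a record.
            counts.append(current)
            in_record = False
        elif in_record and _CDS_LINE.match(line):
            current += 1
    if in_record:
        # Tolerate a final record without a terminating "//".
        counts.append(current)
    return counts


def within_limit(genbank_text: str, max_genes: int = MAX_GENES) -> List[bool]:
    """Return, per record, whether the CDS count is at or below max_genes."""
    return [n <= max_genes for n in count_cds_per_record(genbank_text)]
```

To use it, read the file contents (for example with `open("test_data/test_phage.gbk").read()`) and pass the text to `within_limit`.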
Dependencies
Phynteny installation requires Python 3.8 or above. You will need the following Python dependencies to run Phynteny and its related support scripts. The latest tested versions of the dependencies are:
- python - version 3.10.0
- scikit-learn (imported as sklearn) - version 1.2.2
- biopython - version 1.81
- numpy - version 1.21.0 (Windows, Linux, Apple Intel), version 1.24.0 (Apple M1/M2)
- tensorflow - version 2.9.0 (Windows, Linux, Apple Intel), tensorflow-macos version 2.11 (Apple M1/M2)
- pandas - version 2.0.2
- loguru - version 0.7.0
- click - version 8.1.3
We recommend GPU support if you are training Phynteny. This requires CUDA and cuDNN:
- CUDA toolkit - version 11.2
- cuDNN - version 8.1.1
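To compare an environment against the Python dependencies listed above, the installed versions can be read with the standard library's `importlib.metadata`. This is a minimal sketch: the distribution names in `PACKAGES` are assumed to be the PyPI names of the listed packages, and `installed_versions` is a hypothetical helper, not something shipped with Phynteny.

```python
from importlib import metadata
from typing import Dict, List, Optional

# Assumed PyPI distribution names for the dependencies listed above.
# On Apple M1/M2 the README lists "tensorflow-macos" instead of "tensorflow".
PACKAGES = [
    "scikit-learn",
    "biopython",
    "numpy",
    "tensorflow",
    "pandas",
    "loguru",
    "click",
]


def installed_versions(packages: List[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Printing the result of `installed_versions()` shows at a glance which packages are missing (`None`) and which differ from the tested versions.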
Installation
Currently, Phynteny can be installed from its GitHub repository:
git clone https://github.com/susiegriggo/Phynteny.git --branch main --depth 1
cd Phynteny
pip install .
Install Models
Once you've installed Phynteny, you'll need to download the pre-trained models:
install_models.py
If you would like to specify a particular location to download the models to, run:
install_models.py -o <path/to/database_dir>
If for some reason this does not work, you can download the pre-trained models from Zenodo and untar them in a location of your choice.
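If you prefer to unpack a manually downloaded archive from Python rather than with `tar`, something like the following could work. It is a sketch under assumptions: the archive name and destination are whatever you chose when downloading, and `extract_models` is a hypothetical helper, not part of Phynteny.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_models(archive: str, dest: str) -> List[str]:
    """Extract a tar archive into dest and return the top-level entry names."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python releases support extraction filters; "data"
            # rejects unsafe members such as absolute paths.
            tar.extractall(dest_path, filter="data")
        else:
            tar.extractall(dest_path)
    return sorted(p.name for p in dest_path.iterdir())
```

The directory passed as `dest` is then the location to point Phynteny at when specifying your own models.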
Usage
Phynteny takes a GenBank file containing PHROG annotations as input. If your phage is not yet in this format, pharokka can convert your phage genome (in FASTA format) to a GenBank file with PHROG annotations. Phynteny then returns a GenBank file and a table detailing the predictions it made. Each prediction is accompanied by a 'phynteny score', which ranges from 1 to 10, and a recalibrated confidence score.
Recommended
phynteny test_data/test_phage.gbk -o test_phynteny
Custom
If you wish to specify your own LSTM model, run:
phynteny test_phage.gbk -o test_phage_phynteny -m your_models -t confidence_dict.pkl
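When annotating many phages, the same commands can be assembled and launched from a Python script. The sketch below only uses the flags shown above (`-o`, `-m`, `-t`); the function names and parameter names are hypothetical, and it assumes the `phynteny` executable is on your `PATH`.

```python
import subprocess
from typing import List, Optional


def build_phynteny_command(
    genbank: str,
    out_dir: str,
    models: Optional[str] = None,
    confidence_dict: Optional[str] = None,
) -> List[str]:
    """Assemble the phynteny command line as an argument list."""
    cmd = ["phynteny", str(genbank), "-o", str(out_dir)]
    if models is not None:
        # Custom LSTM models, as in the "Custom" example above.
        cmd += ["-m", str(models)]
    if confidence_dict is not None:
        # Pickled confidence dictionary matching the custom models.
        cmd += ["-t", str(confidence_dict)]
    return cmd


def run_phynteny(genbank: str, out_dir: str, **kwargs) -> subprocess.CompletedProcess:
    """Run phynteny and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_phynteny_command(genbank, out_dir, **kwargs), check=True)
```

For example, `run_phynteny("test_data/test_phage.gbk", "test_phynteny")` corresponds to the recommended command, and passing `models="your_models"` and `confidence_dict="confidence_dict.pkl"` corresponds to the custom one.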
How to train the Phynteny models and generate confidence estimates is described below.
Train Phynteny
Phynteny has already been trained for you on a dataset containing over 1 million prophages! If you feel inclined to generate your own Phynteny model using your own dataset, instructions and training scripts are provided here.
Performance
Coming soon: Notebooks demonstrating the performance of the model
Bugs and Suggestions
If you break Phynteny or would like to make any suggestions, please open an issue or email me at susie.grigson@flinders.edu.au.
Wow! How can I cite this incredible piece of work?
The Phynteny manuscript is currently in preparation. In the meantime, please cite Phynteny as:
Grigson, S. R., Mallawaarachchi, V., Roach, M. R., Papudeshi, B., Bouras, G., Decewicz, P., Dinsdale, E. A. & Edwards, R. A. (2023). Phynteny: Synteny-based annotation of phage genomes. DOI: 10.5281/zenodo.8128917
If you use pharokka to annotate your phage before using Phynteny please cite it as well:
Bouras, G., Nepal, R., Houtak, G., Psaltis, A. J., Wormald, P. J., & Vreugde, S. (2023). Pharokka: a fast scalable bacteriophage annotation tool. Bioinformatics, 39(1), btac776.