PhageAI is an AI-driven software platform that uses advanced Machine Learning and Natural Language Processing techniques to gain a deeper understanding of bacteriophage genomics.
Project description
PhageAI is an application that serves both as a repository of knowledge about bacteriophages and as a tool for analysing genomes with Artificial Intelligence support. This package exposes the most important programmable features of our platform.
Machine Learning algorithms can process enormous amounts of data in a relatively short time and find connections and dependencies that are not obvious to humans. Correctly designed AI-based applications can vastly improve and speed up the work of domain experts.
Models based on DNA contextual vectorization and Deep Neural Networks are particularly effective for the analysis of genomic data. The system we propose uses the phage sequences uploaded to the database to build a model that predicts, with high probability, whether a bacteriophage is chronic, temperate or virulent.
One of the key system modules is the bacteriophage repository, with a clean web interface that allows users to browse, upload and share data with others. The gathered knowledge about bacteriophages is valuable not only on its own but also because it can be used to train ever-improving Machine Learning models.
Detection of virulent or temperate features is only one of the first tasks that can be solved with Artificial Intelligence. The combination of Biology, Natural Language Processing and Machine Learning allows us to create algorithms for genomic data processing that could eventually prove effective across a wide range of problems, with a focus on classification and information extracted from DNA.
Table of Contents
Framework modules | Documentation | Installation | Benchmark | Community and Contributions | Have a question? | Found a bug? | Team | Change log | License | Cite
Framework modules
Set of methods related to:
- lifecycle - bacteriophage lifecycle prediction:
  - .predict(fasta_path) - returns the bacteriophage lifecycle prediction class (Virulent, Temperate or Chronic) with probability (%);
- taxonomy - bacteriophage taxonomy order, family and genus prediction (TBA);
- topology - bacteriophage genome topology prediction (TBA);
- repository - set of methods related to the PhageAI bacteriophage repository:
  - .get_record(value) - returns a dict with bacteriophage meta-data;
  - .get_top10_similar_phages(value) - returns a list of dicts containing the top-10 most similar bacteriophages.
Documentation
The official technical documentation is hosted on ReadTheDocs: https://phageai.readthedocs.io
Installation and usage
PhageAI user account (1/3)
Create a free user account on the PhageAI web platform or use an existing one. If you created a new account, activate it using the activation link sent to your e-mail inbox. Then log into the platform and click "My profile" in the menu (left sidebar). In the "API access" section, copy the access token (string) and keep it for the steps below.
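To avoid pasting the token directly into scripts, one option is to keep it in an environment variable and read it at run time. The sketch below is a minimal illustration; the variable name PHAGEAI_ACCESS_TOKEN and the helper load_access_token are assumptions made for this example and are not part of the PhageAI package.

```python
import os


def load_access_token(env_var: str = "PHAGEAI_ACCESS_TOKEN") -> str:
    """Return the access token stored in an environment variable.

    The variable name is a hypothetical convention; the PhageAI package
    does not read it on its own.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your PhageAI access token."
        )
    return token
```

The returned string can then be passed as the access_token argument used in the examples below, e.g. LifeCycleClassifier(access_token=load_access_token()).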
PhageAI package (2/3)
PhageAI requires Python 3.8.0+ to run and can be installed by running:
pip install phageai
If you can't wait for the latest hotness from the develop branch, then install it directly from the repository:
pip install git+https://github.com/phageaisa/phageai.git@develop
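A quick way to confirm the installation from Python is to query the package metadata. The sketch below uses only the standard library (importlib.metadata, available from Python 3.8, the same minimum version PhageAI requires). The helper name installed_version is illustrative, and the sketch assumes the distribution is named phageai, as in the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "phageai") -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string, or a hint when the package is not installed
print("phageai version:", installed_version() or "not installed")
```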
PhageAI execution (3/3)
- PASTE_YOUR_ACCESS_TOKEN_HERE - PhageAI web user's access token;
- PASTE_YOUR_FASTA_PATH_HERE - FASTA filename with *.fasta or *.fa extension.
Example I - single phage prediction
from phageai.lifecycle.classifier import LifeCycleClassifier
lcc = LifeCycleClassifier(access_token='PASTE_YOUR_ACCESS_TOKEN_HERE')
lcc.predict(fasta_path='PASTE_YOUR_FASTA_PATH_HERE')
Expected output for the MG945357.fasta bacteriophage sample:
{
"model_class_label": "Virulent",
"prediction_accuracy": "98.94",
"gc": "39.47",
"sequence_length": 4915
}
or, if you reach the daily API request limit, you can expect:
{
"author": ["Your daily API limit (100 requests) has been exceeded"]
}
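Because a successful prediction and the limit message arrive as differently shaped dicts, calling code may want to tell them apart before using the result. The helper below is a minimal sketch based only on the two example outputs above; parse_prediction is a hypothetical name, and the assumption that any response without a "model_class_label" key is an error payload is ours, not documented PhageAI behaviour.

```python
from typing import Any, Dict, Tuple


def parse_prediction(result: Dict[str, Any]) -> Tuple[str, float]:
    """Return (class label, probability in %) from a prediction dict.

    Raises RuntimeError when the dict carries no "model_class_label" key,
    as in the daily-limit response shown above.
    """
    if "model_class_label" not in result:
        # Assumption: error payloads map keys to a message or list of messages
        messages = [
            str(message)
            for value in result.values()
            for message in (value if isinstance(value, list) else [value])
        ]
        raise RuntimeError("; ".join(messages) or "Unexpected PhageAI response")
    # The example output gives the probability as a string, so convert it
    return result["model_class_label"], float(result["prediction_accuracy"])
```

For the MG945357 example output this returns ("Virulent", 98.94); for the limit response it raises a RuntimeError carrying the message text.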
Example II - prediction for directory with phages
import os
import csv
from pathlib import Path
from phageai.lifecycle.classifier import LifeCycleClassifier
lcc = LifeCycleClassifier(access_token='PASTE_YOUR_ACCESS_TOKEN_HERE')
# Note: the directory must contain *.fasta files only
phage_dir_path = Path('PASTE_YOUR_DIRECTORY_NAME_WITH_FASTA_FILES')
phage_directory = os.listdir(phage_dir_path)

prediction_results = {}

for single_fasta_file in phage_directory:
    try:
        prediction_results[single_fasta_file] = lcc.predict(fasta_path=phage_dir_path / single_fasta_file)
    except Exception as e:
        print(f'[PhageAI] Phage {single_fasta_file} raised an exception "{e}"')

# Python dict with prediction results
for fasta, phageai in prediction_results.items():
    print(fasta, phageai)

# Prepare CSV report as a final result
csv_columns = [
    'fasta_name', 'model_class_label', 'prediction_accuracy',
    'gc', 'sequence_length'
]

# CSV file name
csv_file_name = "phageai_report.csv"

# newline='' is what the csv module expects; extrasaction='ignore' skips
# keys outside csv_columns (e.g. an API-limit response)
with open(csv_file_name, 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=csv_columns, extrasaction='ignore')
    writer.writeheader()

    for fasta_name, phage_data in prediction_results.items():
        writer.writerow({'fasta_name': fasta_name, **phage_data})
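Example II expects the directory to hold FASTA files only. If the directory may contain other files, one option is to filter by extension before calling the classifier. The helper below is a sketch using only pathlib; list_fasta_files and FASTA_SUFFIXES are illustrative names, and the accepted extensions follow the *.fasta and *.fa convention mentioned earlier.

```python
from pathlib import Path
from typing import List

# Extensions accepted as FASTA, compared case-insensitively
FASTA_SUFFIXES = {".fasta", ".fa"}


def list_fasta_files(directory: Path) -> List[Path]:
    """Return the FASTA files found directly in a directory, sorted by path."""
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() in FASTA_SUFFIXES
    )
```

The returned paths can replace the os.listdir() result in the loop above, for example: for fasta in list_fasta_files(phage_dir_path): lcc.predict(fasta_path=fasta).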
Example III - get bacteriophage meta-data and top-10 similar samples from PhageAI
from phageai.repository.phages import BacteriophageRepository
phageai_repo = BacteriophageRepository(access_token='PASTE_YOUR_ACCESS_TOKEN_HERE')
# Get bacteriophage meta-data based on accession number (or hash value)
# It can return one or more results
phageai_repo.get_record(value='MZ375324')
# Get top 10 similar bacteriophage samples
phageai_repo.get_top10_similar_phages(value='MZ375324')
We will share numerous examples of using the package in Jupyter Notebook format (*.ipynb) soon.
Benchmark
The PhageAI lifecycle classifier was benchmarked against the DeePhage, bacphlip, VIBRANT and PHACTS tools using 91 Virulent and Temperate bacteriophages from our paper (testing set). Correct prediction results:
Tool | Version | Chronic support | No. viruses' genomes | Test set accuracy (%) | DOI |
---|---|---|---|---|---|
PhageAI | 1.5 | Yes | 17 559 | 98.90 | This research |
DeePhage | 1.0 | No | 1 640 | N/A | 10.1093/gigascience/giab056 |
bacphlip | 0.9.6 | No | 1 057 | 100 | 10.7717/peerj.11396 |
VIBRANT | 1.2.1 | No | 350 626 | 92.31 | 10.1186/s40168-020-00867-0 |
PHACTS | 0.3 | No | 227 | 89.13 | 10.1093/bioinformatics/bts014 |
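As a reading aid, the sketch below shows how an accuracy percentage follows from a count of correct predictions, assuming accuracy is simply the number of correct predictions divided by the test-set size. The counts passed in are illustrative inputs for a 91-genome test set; the source does not report per-tool counts, and accuracy_percent is a hypothetical helper.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Return accuracy as a percentage rounded to two decimal places."""
    if total <= 0:
        raise ValueError("total must be a positive number of test samples")
    return round(100.0 * correct / total, 2)


# Illustrative counts only, not figures reported by the benchmark
print(accuracy_percent(90, 91))  # 98.9
print(accuracy_percent(84, 91))  # 92.31
```

Under this assumption, 90 correct predictions out of 91 would be consistent with the value listed for PhageAI, and 84 out of 91 with the value listed for VIBRANT.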
Community and Contributions
We are happy to see you willing to make PhageAI better. Development on the latest stable version of Python 3 is preferred. As of this writing, that is 3.8. You can use any operating system.
If you're fixing a bug or adding a new feature, add a test with pytest and check the code with Black and mypy. Before adding any large feature, please open an issue first so that the idea can be discussed with the core devs and the community.
Have a question?
If you have a private question or want to cooperate with us, you can always reach out to us directly by e-mail.
Found a bug?
Feel free to open a new issue with a descriptive title and description in the PhageAI repository. If you have already found a solution to your problem, we would be happy to review your pull request.
Team
Core Developers and Domain Experts contributing to PhageAI:
- Piotr Tynecki
- Łukasz Wałejko
- Joanna Kazimierczak
- Arkadiusz Guziński
- Bogumił Zimoń
Change log
The change log would become rather long, so it has been moved to its own file.
See CHANGELOG.md.
License
The PhageAI package is released under the terms of the MIT License.
Cite
PhageAI - Bacteriophage Life Cycle Recognition with Machine Learning and Natural Language Processing
Tynecki, P.; Guziński, A.; Kazimierczak, J.; Jadczuk, M.; Dastych, J.; Onisko, A.
Bioinformatics 2020, DOI: 10.1101/2020.07.11.198606