Python module for running DefensePredictor, a machine learning model that predicts anti-phage defense systems
DefensePredictor: A Machine Learning Model to Discover Novel Prokaryotic Immune Systems
Python package to run DefensePredictor, a machine-learning model that leverages embeddings from a protein language model, ESM2, to classify proteins as anti-phage defensive.
Installation
In a fresh conda or other virtual environment, run:
pip install defense_predictor
defense_predictor_download
The first command installs the Python package and the second downloads the model weights. Once the weights are downloaded, you do not need to run the second command again.
Requirements
Requires Python >= 3.10
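A quick way to confirm that an interpreter meets this requirement before installing is sketched below. The helper name `meets_python_requirement` is illustrative and is not part of defense_predictor.

```python
import sys

# Minimum version stated in the Requirements section.
MIN_PYTHON = (3, 10)

def meets_python_requirement(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON

# Report on the interpreter that is running this snippet.
print("Python version OK:", meets_python_requirement())
```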
Usage
defense_predictor can be run as Python code:
import defense_predictor as dfp
ncbi_feature_table = 'GCF_003333385.1_ASM333338v1_feature_table.txt'
ncbi_cds_from_genomic = 'GCF_003333385.1_ASM333338v1_cds_from_genomic.fna'
ncbi_protein_fasta = 'GCF_003333385.1_ASM333338v1_protein.faa'
output_df = dfp.run_defense_predictor(
    ncbi_feature_table=ncbi_feature_table,
    ncbi_cds_from_genomic=ncbi_cds_from_genomic,
    ncbi_protein_fasta=ncbi_protein_fasta,
)
output_df.head()
Or from the command line:
defense_predictor \
--ncbi_feature_table GCF_003333385.1_ASM333338v1_feature_table.txt \
--ncbi_cds_from_genomic GCF_003333385.1_ASM333338v1_cds_from_genomic.fna \
--ncbi_protein_fasta GCF_003333385.1_ASM333338v1_protein.faa \
--output GCF_003333385_defense_predictor_output.csv
defense_predictor outputs the predicted probability and log-odds of defense for each input protein. We recommend using a stringent log-odds cutoff of 7.2 to call a protein predicted defensive.
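To make the cutoff concrete, here is a minimal sketch of how a log-odds score relates to a probability and how the cutoff could be applied. It assumes the log-odds are natural-log odds, ln(p / (1 - p)), which is not confirmed here. The keys `protein_id` and `log_odds` are hypothetical and may not match the real output columns, and the scores are made up for illustration.

```python
import math

LOG_ODDS_CUTOFF = 7.2  # stringent cutoff recommended above

def log_odds_to_probability(log_odds: float) -> float:
    """Logistic function, the inverse of ln(p / (1 - p)). Assumes natural-log odds."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def call_defensive(rows, score_key="log_odds", cutoff=LOG_ODDS_CUTOFF):
    """Keep rows whose score is at or above the cutoff.

    `score_key` is a hypothetical column name; check the actual output header.
    """
    return [row for row in rows if float(row[score_key]) >= cutoff]

# Made-up scores for illustration only; not real model output.
example_rows = [
    {"protein_id": "protein_A", "log_odds": 9.1},
    {"protein_id": "protein_B", "log_odds": 2.3},
    {"protein_id": "protein_C", "log_odds": -4.0},
]
predicted_defensive = call_defensive(example_rows)
print([row["protein_id"] for row in predicted_defensive])
```

Under the natural-log assumption, a log-odds of 7.2 corresponds to a probability of roughly 0.9993, which is why the cutoff is described as stringent. Whether a score exactly equal to the cutoff should count is a choice made in this sketch, not something stated above.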
To see an example, you can run defense_predictor_example.ipynb in Colab.
We recommend running defense_predictor on a computer with a CUDA-enabled GPU to maximize computational efficiency.
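One way to check whether a CUDA-enabled GPU is visible before starting a long run is sketched below. It assumes PyTorch is the backend installed in the environment, which is not stated here, and the helper name `cuda_gpu_available` is illustrative rather than part of defense_predictor.

```python
def cuda_gpu_available() -> bool:
    """Return True if PyTorch is importable and reports a usable CUDA device."""
    try:
        import torch  # assumption: PyTorch is installed in this environment
    except ImportError:
        return False
    return bool(torch.cuda.is_available())

print("CUDA GPU available:", cuda_gpu_available())
```

If this prints `False`, the check alone does not say whether defense_predictor can still run on CPU; it only reports what PyTorch can see.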
Inputs
Input files can be downloaded from the FTP page for any genome of interest, which is linked from its NCBI assembly page. For an unannotated nucleotide assembly, the input files can be generated using NCBI's Prokaryotic Genome Annotation Pipeline.
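As a convenience, the three NCBI file paths can be derived from an assembly prefix and checked before running. The sketch below takes its file-name suffixes from the usage example; the helper names `ncbi_input_paths` and `missing_inputs` are illustrative and not part of defense_predictor.

```python
from pathlib import Path

# File-name suffixes taken from the usage example, keyed by argument name.
NCBI_SUFFIXES = {
    "ncbi_feature_table": "_feature_table.txt",
    "ncbi_cds_from_genomic": "_cds_from_genomic.fna",
    "ncbi_protein_fasta": "_protein.faa",
}

def ncbi_input_paths(directory, assembly_prefix):
    """Map each argument name to the file path expected for this assembly."""
    base = Path(directory)
    return {
        arg: base / f"{assembly_prefix}{suffix}"
        for arg, suffix in NCBI_SUFFIXES.items()
    }

def missing_inputs(paths):
    """Return the argument names whose files do not exist on disk."""
    return [arg for arg, path in paths.items() if not path.is_file()]

paths = ncbi_input_paths(".", "GCF_003333385.1_ASM333338v1")
print("Missing inputs:", missing_inputs(paths))
```

An empty list means all three files are present in the given directory.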
Alternatively, defense_predictor accepts inputs generated by Prokka through the arguments prokka_gff, prokka_ffn, and prokka_faa.
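A minimal sketch of assembling the Prokka arguments follows. Prokka writes `<prefix>.gff`, `<prefix>.ffn`, and `<prefix>.faa` into its output directory; the directory `prokka_out`, the prefix `my_genome`, and the helper `prokka_input_kwargs` are hypothetical. Passing these names as keyword arguments to `run_defense_predictor` is an assumption based on the `ncbi_*` arguments in the usage example, so that call is left commented out.

```python
from pathlib import Path

# Prokka output extensions, keyed by the argument names given above.
PROKKA_EXTENSIONS = {
    "prokka_gff": ".gff",
    "prokka_ffn": ".ffn",
    "prokka_faa": ".faa",
}

def prokka_input_kwargs(outdir, prefix):
    """Build a dict of argument name -> file path (as str) for one Prokka run."""
    base = Path(outdir)
    return {
        arg: str(base / f"{prefix}{ext}")
        for arg, ext in PROKKA_EXTENSIONS.items()
    }

# Hypothetical output directory and prefix.
kwargs = prokka_input_kwargs("prokka_out", "my_genome")
print(kwargs)

# Assumption: the Python API accepts these names as keyword arguments.
# import defense_predictor as dfp
# output_df = dfp.run_defense_predictor(**kwargs)
```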
File details
Details for the file defense_predictor-0.1.1.tar.gz.
File metadata
- Download URL: defense_predictor-0.1.1.tar.gz
- Upload date:
- Size: 18.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.4 CPython/3.13.0 Darwin/21.6.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 740e44b51a8b8124ee5fa076a4690bb7bca528deffa736bd75d0c7568661b1eb |
| MD5 | 6115a0e927333bdf58fd5fcc5b9229f5 |
| BLAKE2b-256 | 33aada5e9e8b2f48069ae6b362abca1c21cf44f3a51366cc196e06bb8414591e |
File details
Details for the file defense_predictor-0.1.1-py3-none-any.whl.
File metadata
- Download URL: defense_predictor-0.1.1-py3-none-any.whl
- Upload date:
- Size: 17.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.4 CPython/3.13.0 Darwin/21.6.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9affed108c80bf89bf6ba10664459214e144a66f41c62047644dd7f2d5685dcd |
| MD5 | 4ddbf56342638e3bf7f223936806b7be |
| BLAKE2b-256 | 2297e84fe1c13e84bd7d96f15482e7455aee8378c023393155d3efe5187a0a08 |