
Python module for running Defense Predictor, a machine learning model to predict antiphage defense systems


DefensePredictor: A Machine Learning Model to Discover Novel Prokaryotic Immune Systems

Python package to run DefensePredictor, a machine-learning model that leverages embeddings from a protein language model, ESM2, to classify proteins as anti-phage defensive.

For additional details, read the paper here.

Installation

In a fresh conda or other virtual environment, run:

pip install defense_predictor
defense_predictor_download

The first command installs the Python package from PyPI, and the second downloads the model weights. Once the model weights are downloaded, you do not need to run the second command again.

Requirements

Requires python >= 3.10

Usage

defense_predictor can be run from Python

import defense_predictor as dfp

ncbi_feature_table = 'GCF_003333385.1_ASM333338v1_feature_table.txt'
ncbi_cds_from_genomic = 'GCF_003333385.1_ASM333338v1_cds_from_genomic.fna'
ncbi_protein_fasta = 'GCF_003333385.1_ASM333338v1_protein.faa'
output_df, feature_matrix = dfp.defense_predictor(ft_file=ncbi_feature_table, fna_file=ncbi_cds_from_genomic, faa_file=ncbi_protein_fasta)
output_df.head()

Or from the command line

defense_predictor \
     --ncbi_feature_table GCF_003333385.1_ASM333338v1_feature_table.txt \
     --ncbi_cds_from_genomic GCF_003333385.1_ASM333338v1_cds_from_genomic.fna \
     --ncbi_protein_fasta GCF_003333385.1_ASM333338v1_protein.faa \
     --output GCF_003333385_defense_predictor_output.csv

Alternatively, defense_predictor can take a single PGAP GFF3 file with embedded genomic FASTA (PGAP's annot_with_genomic_fasta.gff output):

output_df, feature_matrix = dfp.defense_predictor(pgap_gff='annot_with_genomic_fasta.gff')

Or from the command line

defense_predictor \
     --pgap_gff annot_with_genomic_fasta.gff \
     --output defense_predictor_output.csv

When given a PGAP GFF, defense_predictor translates proteins from the embedded genomic sequence using the bacterial codon table (transl_table=11) and uses each CDS's locus_tag as its identifier.
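To make the role of transl_table=11 concrete, here is a minimal pure-Python sketch of translating a forward-strand CDS with the bacterial codon table. It is an illustration only, not the package's internal code: the function name translate_cds is hypothetical, and strand handling, partial CDSs, and ambiguity codes are deliberately left out. Table 11 assigns the same amino acids as the standard table but accepts several alternative start codons, which are read as methionine when they begin a CDS.

```python
# Illustrative sketch only; defense_predictor's own translation code may differ.
BASES = "TCAG"
# Amino acids for NCBI translation table 11, in TCAG codon order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE_11 = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
# Start codons permitted by table 11; each is read as Met at position 0.
START_CODONS_11 = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def translate_cds(cds: str) -> str:
    """Translate a forward-strand CDS, stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[pos:pos + 3]
        if pos == 0 and codon in START_CODONS_11:
            protein.append("M")
            continue
        residue = CODON_TABLE_11.get(codon, "X")  # X for ambiguous codons
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


print(translate_cds("GTGAAAGGTTAA"))  # MKG
```

Note that the GTG start codon is translated as M rather than V, which is the main practical difference from naively applying the codon table to every position.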


defense_predictor outputs the predicted log-odds of defense for each input protein in the mean_log_odds column. We recommend using a stringent log-odds cutoff of 4 to call a protein predicted defensive.
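As a rough illustration of applying the cutoff, the sketch below filters a toy set of scores and converts a log-odds value to a probability. The protein identifiers and scores are made up, the helper names are hypothetical, and two assumptions are baked in: the log-odds use the natural logarithm, and the cutoff is inclusive (a score of exactly 4 counts as defensive).

```python
import math

DEFAULT_CUTOFF = 4.0  # the stringent cutoff recommended above


def log_odds_to_probability(log_odds: float) -> float:
    """Convert natural-log odds to a probability (assumes natural log)."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def call_defensive(mean_log_odds: dict[str, float],
                   cutoff: float = DEFAULT_CUTOFF) -> list[str]:
    """Return protein IDs at or above the cutoff, highest score first."""
    hits = [pid for pid, score in mean_log_odds.items() if score >= cutoff]
    return sorted(hits, key=lambda pid: mean_log_odds[pid], reverse=True)


# Toy scores for illustration only; not real model output.
toy_scores = {"protein_A": 6.2, "protein_B": -3.1,
              "protein_C": 4.0, "protein_D": 1.5}

print(call_defensive(toy_scores))              # ['protein_A', 'protein_C']
print(round(log_odds_to_probability(4.0), 3))  # 0.982
```

If output_df is a pandas DataFrame, as the output_df.head() call in the usage example suggests, the equivalent filter would be output_df[output_df["mean_log_odds"] >= 4].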

To see an example, you can run the defense_predictor_example.ipynb notebook in Colab.

We recommend running defense_predictor on a computer with a CUDA-enabled GPU to maximize computational efficiency.
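Before starting a long run, it can help to confirm that a CUDA device is actually visible. The sketch below assumes the ESM2 embeddings are computed with PyTorch, which this README does not state explicitly. The helper name cuda_gpu_available is hypothetical, and the check only reports what PyTorch sees; it does not change how defense_predictor selects a device.

```python
import importlib.util


def cuda_gpu_available() -> bool:
    """Return True if PyTorch is installed and reports a usable CUDA device."""
    # Assumption: the model runs on PyTorch; if torch is absent, report False.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()


print("CUDA GPU available:", cuda_gpu_available())
```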

Inputs

The NCBI input files can be downloaded from the FTP page for any genome of interest, which is linked from its assembly page.
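The three NCBI files share an assembly prefix and differ only in their suffix, as in the usage example above. The following sketch builds the expected paths for a given prefix and reports any that are missing. The helper names ncbi_input_paths and missing_inputs are hypothetical; the dictionary keys mirror the ft_file, fna_file, and faa_file keyword arguments shown earlier. Files on the NCBI FTP site are gzip-compressed, and this sketch assumes they have already been decompressed to match the file names in the usage example.

```python
from pathlib import Path

# Suffixes taken from the file names in the usage example.
SUFFIXES = {
    "ft_file": "_feature_table.txt",
    "fna_file": "_cds_from_genomic.fna",
    "faa_file": "_protein.faa",
}


def ncbi_input_paths(directory, assembly_prefix):
    """Map each keyword argument name to its expected input file path."""
    directory = Path(directory)
    return {arg: directory / f"{assembly_prefix}{suffix}"
            for arg, suffix in SUFFIXES.items()}


def missing_inputs(paths):
    """Return the argument names whose files do not exist on disk."""
    return [arg for arg, path in paths.items() if not path.is_file()]


paths = ncbi_input_paths(".", "GCF_003333385.1_ASM333338v1")
for arg, path in paths.items():
    print(arg, path)
print("missing:", missing_inputs(paths))
```

When nothing is missing, the dictionary can be converted to strings and passed as keyword arguments, for example dfp.defense_predictor(**{k: str(v) for k, v in paths.items()}).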

For an unannotated nucleotide assembly, run NCBI's Prokaryotic Genome Annotation Pipeline (PGAP) and pass its annot_with_genomic_fasta.gff output directly via --pgap_gff.
