
Discovery and Extraction of Phages Tool

Project description

Detection and Extraction of Phages Tool (DEPhT)

DEPhT is a new tool for identifying prophages in bacterial genomes, with a particular focus on Mycobacteria. It uses two computationally cheap features to identify regions likely to contain prophages:

  1. Local average length of genes + intergenic regions, in 55-gene windows
  2. Local number of strand changes, in 55-gene windows

Each gene is assigned a probability of belonging to a prophage. In general, if a gene occurs in a 55-gene window where the average gene length (including intergenic regions) is below 800 bp and there are fewer than 10 strand changes, it is very likely to belong to a prophage.
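The two window features and the rule of thumb above can be sketched in Python. This is a minimal illustration under stated assumptions, not DEPhT's implementation: the Gene record and function names are hypothetical, each window is roughly centered on its gene (and shifted inward near contig ends), a gene's "gene + intergenic" length is taken as the distance from its start to the next gene's start, and the sketch applies the 800 bp / 10-change cutoffs directly instead of computing a per-gene probability.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    """Hypothetical minimal gene record (coordinates in bp)."""
    start: int   # start coordinate (genes must be sorted by this)
    end: int     # inclusive end coordinate
    strand: int  # +1 or -1


def window_features(genes: List[Gene], window: int = 55) -> List[Tuple[float, int]]:
    """Return (mean gene + intergenic length, strand changes) for each gene.

    Each gene is scored with a window of `window` consecutive genes roughly
    centered on it; near contig ends the window is shifted inward so it keeps
    its full size whenever the contig has at least `window` genes.
    """
    n = len(genes)
    # Gene + intergenic length: distance from this gene's start to the next
    # gene's start; the last gene on the contig just uses its own length.
    spans = [
        genes[i + 1].start - genes[i].start if i + 1 < n
        else genes[i].end - genes[i].start + 1
        for i in range(n)
    ]
    half = window // 2
    features = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, lo + window)
        lo = max(0, hi - window)
        mean_len = sum(spans[lo:hi]) / (hi - lo)
        # Count adjacent gene pairs inside the window that switch strand.
        changes = sum(
            1 for j in range(lo + 1, hi) if genes[j].strand != genes[j - 1].strand
        )
        features.append((mean_len, changes))
    return features


def flag_prophage_like(genes: List[Gene], window: int = 55,
                       max_len: float = 800.0, max_changes: int = 10) -> List[bool]:
    """Apply the rule of thumb: short genes AND few strand changes."""
    return [
        mean_len < max_len and changes < max_changes
        for mean_len, changes in window_features(genes, window)
    ]
```

The sketch assumes all genes come from a single contig and are sorted by start coordinate; overlapping or unsorted genes would need extra handling.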

This window-based approach is not perfect, so a second inexpensive step is used to improve specificity. MMseqs2 clusters each gene against a database of clade-specific Mycobacterial core genes, using thresholds of 50% identity, 80% coverage, and an e-value of 0.001. Regions of high-likelihood prophage genes that are designated non-core are treated as high-probability prophages.
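Once each gene has a prophage probability and a core/non-core label, collecting candidate regions amounts to finding runs of consecutive genes that are both likely prophage and non-core. The sketch below assumes both labels are already available as parallel lists (the MMseqs2 clustering itself is not reproduced); the function name, the 0.5 probability cutoff, and the 10-gene minimum run length are hypothetical values chosen for illustration.

```python
from typing import List, Optional, Tuple


def candidate_regions(probs: List[float], is_core: List[bool],
                      prob_cutoff: float = 0.5,
                      min_genes: int = 10) -> List[Tuple[int, int]]:
    """Return inclusive (first, last) gene-index ranges of runs of genes
    that are both high-probability prophage and non-core."""
    if len(probs) != len(is_core):
        raise ValueError("probs and is_core must have the same length")
    regions: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    # Iterate one position past the end; that extra position counts as
    # "not kept", so a run touching the end of the contig is still closed.
    for i in range(len(probs) + 1):
        keep = i < len(probs) and probs[i] >= prob_cutoff and not is_core[i]
        if keep and run_start is None:
            run_start = i
        elif not keep and run_start is not None:
            if i - run_start >= min_genes:
                regions.append((run_start, i - 1))
            run_start = None
    return regions
```

In this sketch a single core gene splits a region in two, which is one plausible reading of "designated as non-core"; a real pipeline might tolerate occasional core genes inside a prophage.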

Depending on the selected runmode, these prophage regions are further scrutinized by functionally annotating them against HMMs of manually annotated mycobacteriophage phamilies from the Actino_Draft Phamerator database. Predicted prophages with too few high-probability hits to these HMMs may be discarded as unlikely prophages.
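The culling step can be pictured as counting strong HMM hits per candidate. This sketch assumes the hit probabilities come from an HHsuite search (hhsuite is among the installed dependencies) and are therefore on a 0-100 scale, grouped by candidate name; the function name and both thresholds (min_prob=90.0, min_hits=5) are hypothetical and are not DEPhT's actual settings.

```python
from typing import Dict, List


def cull_candidates(hits: Dict[str, List[float]], min_prob: float = 90.0,
                    min_hits: int = 5) -> List[str]:
    """Keep candidates with at least `min_hits` HMM hits at or above `min_prob`.

    `hits` maps a candidate prophage name to the probabilities (0-100 scale)
    of its genes' best HMM hits.
    """
    kept = []
    for name, probabilities in hits.items():
        # Count only the hits that clear the probability threshold.
        strong = sum(1 for p in probabilities if p >= min_prob)
        if strong >= min_hits:
            kept.append(name)
    return kept
```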

Finally, the remaining prophages are passed through a blastn-based attL/attR detection scheme, which gives DEPhT better prophage edge detection than any other tool we are aware of.
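DEPhT's att detection uses blastn. As a simplified, dependency-free stand-in, the sketch below looks for the longest exact sequence shared by the two flanks of a candidate prophage, which is the kind of direct repeat that attL and attR form. Unlike blastn it tolerates no mismatches or gaps and runs in O(len(left) x len(right)) time, so treat it as an illustration of the idea only; the function name and the 10 bp minimum are hypothetical.

```python
from typing import List, Optional, Tuple


def longest_shared_repeat(left: str, right: str, min_len: int = 10
                          ) -> Optional[Tuple[int, int, str]]:
    """Return (left_offset, right_offset, repeat) for the longest exact
    substring shared by the two flanks, or None if it is shorter than min_len.

    Classic longest-common-substring dynamic programming: cur[j] holds the
    length of the common suffix of left[:i] and right[:j].
    """
    left, right = left.upper(), right.upper()
    best_len, best_i, best_j = 0, 0, 0
    prev: List[int] = [0] * (len(right) + 1)
    for i in range(1, len(left) + 1):
        cur = [0] * (len(right) + 1)
        for j in range(1, len(right) + 1):
            if left[i - 1] == right[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_i, best_j = cur[j], i, j
        prev = cur
    if best_len < min_len:
        return None
    return best_i - best_len, best_j - best_len, left[best_i - best_len:best_i]
```

The returned offsets locate the candidate att core in each flank, which is where a prophage's left and right edges would be placed.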

Installation

While DEPhT can be installed and run by manually compiling each of its dependencies, by far the easiest approach is to use Anaconda:

conda create -n depht python=3.9 -y && conda activate depht
conda install -c bioconda -c conda-forge prodigal aragorn mmseqs2=13.45111 hhsuite=3 blast=2.9 -y
git clone https://github.com/chg60/DEPhT && cd DEPhT
pip install -r requirements.txt
cd src
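After installation, a quick way to confirm the external tools are visible is to look them up on PATH. The executable names below (prodigal, aragorn, mmseqs, hhsearch, blastn) are an assumption about which programs DEPhT calls from the hhsuite and blast packages; adjust the list if your setup differs.

```python
import shutil
from typing import List, Sequence

# Assumed executable names for the conda packages installed above; hhsuite
# and blast each provide several programs, and only one of each is checked.
REQUIRED_TOOLS = ["prodigal", "aragorn", "mmseqs", "hhsearch", "blastn"]


def missing_tools(tools: Sequence[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All checked tools were found on PATH.")
```

Run it inside the activated depht environment, since conda only puts these tools on PATH while that environment is active.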

From the src directory, print DEPhT's help menu by running it as a Python module without arguments:

python3 -m depht

This prints something like the following (the default number of CPUs varies from computer to computer):

usage: __main__.py [-h] -i INFILE [INFILE ...] [-f {fasta,genbank}] -o OUTDIR [-c CPUS] [-n] [-m {fast,normal,strict}]
                   [-s ATT_SENSITIVITY] [-d] [-v] [-t TMP_DIR] [-p PRODUCT_THRESHOLD] [-l LENGTH_THRESHOLD]

DEPhT scans bacterial genomes looking for prophages. Regions identified as prophage candidates are further scrutinized, and
attachment sites identified as accurately as possible before prophage extraction and generating the final report.

optional arguments:
  -h, --help            show this help message and exit
  -i INFILE [INFILE ...], --infile INFILE [INFILE ...]
                        path to genome file(s) to scan for prophages
  -f {fasta,genbank}, --input-format {fasta,genbank}
                        input which format your input file(s) are in
  -o OUTDIR, --outdir OUTDIR
                        path where outputs can be written
  -c CPUS, --cpus CPUS  number of CPU cores to use [default: 4]
  -n, --no-draw         don't draw genome diagram for identified prophage(s)
  -m {fast,normal,strict}, --mode {fast,normal,strict}
                        select a runmode that favors speed or accuracy
  -s ATT_SENSITIVITY, --att_sensitivity ATT_SENSITIVITY
                        sensitivity parameter for att site detection.
  -d, --dump-data       dump all support data to outdir
  -v, --verbose         print progress messages as the program runs
  -t TMP_DIR, --tmp-dir TMP_DIR
                        temporary directory to use for file I/O [default: /tmp/prophicient]
  -p PRODUCT_THRESHOLD, --product-threshold PRODUCT_THRESHOLD
                        select a phage homolog product lower threshold
  -l LENGTH_THRESHOLD, --length-threshold LENGTH_THRESHOLD
                        select a minimum length for prophages [default: 20000]
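To call DEPhT from a Python script or pipeline, the documented flags can be assembled into an argument list and passed to subprocess. This is a hedged sketch: the helper names are hypothetical, it uses only flags listed in the help text above, the fasta format and normal mode defaults are this sketch's choices (the help text does not state DEPhT's own defaults for them), and it assumes the working directory is the cloned repository's src directory, as in the installation steps.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_depht_command(infiles: Sequence[str], outdir: str, *,
                        input_format: str = "fasta", cpus: int = 4,
                        mode: str = "normal", draw: bool = True) -> List[str]:
    """Assemble a DEPhT command line from the flags shown in its help text."""
    if not infiles:
        raise ValueError("at least one input file is required")
    if input_format not in {"fasta", "genbank"}:
        raise ValueError(f"unsupported input format: {input_format}")
    if mode not in {"fast", "normal", "strict"}:
        raise ValueError(f"unsupported runmode: {mode}")
    cmd = ["python3", "-m", "depht",
           "-i", *infiles,           # -i accepts one or more genome files
           "-f", input_format,
           "-o", outdir,
           "-c", str(cpus),
           "-m", mode]
    if not draw:
        cmd.append("-n")             # skip genome diagrams
    return cmd


def run_depht(cmd: List[str], src_dir: str = "DEPhT/src") -> None:
    """Run DEPhT; src_dir is assumed to be the cloned repo's src directory."""
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True, cwd=src_dir)
```

Keeping command construction separate from execution makes the argument list easy to inspect or log before launching a long run.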


Download files

Source Distribution

depht-1.0.0.tar.gz (57.5 kB)

Uploaded Source

Built Distribution

depht-1.0.0-py3-none-any.whl (62.7 kB)

Uploaded Python 3

File details

Details for the file depht-1.0.0.tar.gz.

File metadata

  • Download URL: depht-1.0.0.tar.gz
  • Upload date:
  • Size: 57.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.9.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for depht-1.0.0.tar.gz
Algorithm    Hash digest
SHA256       73eb7bd7d32d52ae5e0bdb62d8ce17cf59d68d633805075a876d6f3cf85da123
MD5          62b65b7ad4cf6deae4d74df80038f31b
BLAKE2b-256  9161e0c3c1e725b4a281c087467594e908e0a46e1a91ec288ac09b22c4a4ac17

File details

Details for the file depht-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: depht-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 62.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.9.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for depht-1.0.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       a375093b069cfcfc1a0d8cf6e37c3aecbc4b8cd717397b251190bf605d06a0a4
MD5          9c8520af6384db6abd23fb87b02c434b
BLAKE2b-256  1972fe63d35459233b43e179b3579a137c84b3586a1ec3f6674ce1340fadc2fc
