Assembling pure culture phages from both Illumina and Nanopore sequencing technology

Sphae

Phage toolkit to detect phage candidates for phage therapy

Overview

This Snakemake workflow was built with Snaketool [https://doi.org/10.1371/journal.pcbi.1010705] to assemble and annotate phage sequences. The tool is currently being developed for phage genomes. The steps are:

  • Quality control that removes adaptor sequences, low-quality reads and host contamination (optional).
  • Assembly
  • Contig quality checks: read coverage, viral or not, completeness, and assembly graph components.
  • Phage genome annotation

The complete list of programs used for each step is given in the sphae.CITATION file.

Install

Pre-requisites

  • gcc
  • conda
  • libgl1-mesa-dev (Ubuntu, for Bandage)
  • libxcb-xinerama0 (Ubuntu, for Bandage)
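
Before installing, it can help to confirm that the command-line pre-requisites are on the PATH. The following is a minimal sketch, not part of sphae: missing_tools is a hypothetical helper name, and it only checks gcc and conda, since the two Ubuntu packages are libraries rather than commands.

```shell
# Print the name of every listed tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# gcc and conda come from the pre-requisites list above.
missing=$(missing_tools gcc conda)
if [ -n "$missing" ]; then
    echo "Missing pre-requisites:" $missing
else
    echo "gcc and conda found"
fi
```

On Ubuntu, the two libraries can be checked separately with the system package manager.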

Setting up a new conda environment

conda create -n sphae python=3.11
conda activate sphae
conda install -n base -c conda-forge mamba # if you don't already have mamba installed

Steps for installing sphae workflow

git clone https://github.com/linsalrob/sphae.git
cd sphae
pip install -e .
# Confirm the workflow is installed by running the command below
sphae --help

Installing databases

Run the command:

sphae install

This installs the databases to the directory sphae/workflow/databases.

The workflow requires:

  • Pfam 35.0 database, to run viral_verify for contig classification
  • CheckV database, to test for phage completeness
  • Pharokka databases
  • Phynteny models

This step takes approximately 1 hr 30 min and requires 9 GB of storage.
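
Because the databases need several gigabytes, a quick free-space check before running the install can save a failed download. This is a sketch under stated assumptions: has_space is a hypothetical helper, the threshold of 9 is the storage figure quoted above, and "." stands in for whichever filesystem will hold the databases.

```shell
# Succeed if the filesystem holding directory $1 has at least $2 GB free.
has_space() {
    # df -Pk prints one header line, then a line whose 4th column is available KB.
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

if has_space . 9; then
    echo "Enough free space for the databases"
else
    echo "Less than 9 GB free; free up space before running sphae install"
fi
```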

Running the workflow

The command sphae run will run QC, assembly and annotation.

Commands to run

Only one command needs to be submitted to run all of the above steps: QC, assembly and assembly stats.

# For Illumina reads, place both the forward and reverse reads in one directory
sphae run --input tests/data/illumina-subset --output example

# For Nanopore reads, place the reads, one file per sample, in a directory
sphae run --input tests/data/nanopore-subset --sequencing longread --output example

# To run either command on a cluster, add --profile slurm. For instance, here is the command for long reads (Nanopore)
# Before running the command below, make sure the Slurm config files are set up. Here is a tutorial: https://fame.flinders.edu.au/blog/2021/08/02/snakemake-profiles-updated
sphae run --input tests/data/nanopore-subset --sequencing longread --output example --profile slurm
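
Since paired-end input expects forward and reverse reads in the same directory, a quick count of the read files can catch a missing mate before the workflow starts. This is only a sketch: count_reads is a hypothetical helper, and the .fastq and .fastq.gz extensions are an assumption about how the read files are named.

```shell
# Count files directly inside directory $1 ending in .fastq or .fastq.gz
# (the extensions are an assumption, adjust them to match your data).
count_reads() {
    find "$1" -maxdepth 1 -type f \( -name '*.fastq' -o -name '*.fastq.gz' \) | wc -l | tr -d ' '
}

input_dir=tests/data/illumina-subset
if [ -d "$input_dir" ]; then
    n=$(count_reads "$input_dir")
    echo "$n read files in $input_dir"
    # Forward and reverse files come in pairs, so an odd count is suspicious.
    [ $(( n % 2 )) -eq 0 ] || echo "Warning: odd number of files for paired-end input"
else
    echo "$input_dir not found; run this from the root of the sphae checkout"
fi
```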

Output

  • The assembled phage genome is saved to "{output-directory}/genome/{sample}/{sample}.fasta"
  • Annotations of the phage genome are saved to "{output-directory}/pharokka/phynteny/phynteny.gbk"
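
After a run finishes, the two paths listed above can be checked with a few lines of shell. This is a sketch rather than part of sphae: expected_outputs is a hypothetical helper, "example" matches the --output value in the run commands, and "sampleA" is a made-up sample name.

```shell
# Print the two output paths named above for output directory $1 and sample $2.
expected_outputs() {
    printf '%s\n' "$1/genome/$2/$2.fasta" "$1/pharokka/phynteny/phynteny.gbk"
}

# Report whether each expected file exists and is non-empty.
expected_outputs example sampleA | while IFS= read -r path; do
    if [ -s "$path" ]; then
        echo "found   $path"
    else
        echo "missing $path"
    fi
done
```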

Issues and Questions

This is still a work in progress, so if you come across any issues or errors, report them under Issues.
