
An easy-to-use pipeline for the assembly and analysis of bacterial genomes


About

This repository contains an easy-to-use pipeline for the assembly and analysis of bacterial genomes using ONT long-read or Illumina short-read technology.

Introduction

Advances in sequencing technology during the COVID-19 pandemic have led to massive increases in the generation of sequencing data. Many bioinformatics tools have been developed to analyze this data, but very few can be used by individuals without prior bioinformatics training.

This pipeline was designed to wrap pre-existing tools and automate the analysis of bacterial whole-genome sequencing data. Installation is fast and straightforward. The pipeline is easy to set up and ships with sensible defaults, but it is highly modular and configurable by more advanced users. A successful run generates consensus sequences, typing information, a phylogenetic tree, and a quality control report.

Features

We anticipate the pipeline will be able to perform the following functions:

  • Reference-based assembly of Illumina paired-end reads
  • De novo assembly of Illumina paired-end reads
  • De novo assembly of ONT long reads
  • Run quality control checks
  • Variant calling using bcftools
  • Maximum-likelihood phylogenetic inference of processed samples and background dataset using iqtree
  • MLST profiling and virulence factor detection
  • Antimicrobial resistance gene and plasmid detection

Installation

  1. Install Mambaforge, which provides the conda and mamba commands, by running the following two commands:
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh
  2. Clone the repository and move into it:
git clone https://github.com/CholGen/bacpage.git
cd bacpage
  3. Install and activate the pipeline's conda environment:
mamba env create -f environment.yaml
mamba activate bacpage
  4. Test the installation:
snakemake --configfile test/test.yaml --all-temp --cores 8

This command should run to completion without a problem. Please create an issue if this is not the case.
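
If the test run fails, one thing worth ruling out is that a required command is not on your PATH. The following is a minimal sketch of such a check and is not part of the pipeline; the function name missing_tools is hypothetical, and the commands in the usage comment are simply the ones this README invokes.

```shell
# Hypothetical helper, not shipped with the pipeline.
# Prints each command name from its arguments that cannot be found on PATH.
# Returns 0 if every command was found, 1 otherwise.
missing_tools() {
    local status=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Usage (run after activating the bacpage environment):
#   missing_tools git mamba snakemake || echo "Install the commands listed above first."
```

An empty result only shows that the commands can be found; it does not verify their versions or the contents of the conda environment.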

Usage

  1. Navigate to the pipeline's directory.
  2. Copy the example/ directory to create a directory specifically for each batch of samples:
cp -r example/ <your-project-directory-name>
  3. Place raw sequencing reads in the input/ directory of your project directory.
  4. Record the name and absolute path of each raw sequencing read file in the sample_data.csv found within your project directory.
  5. In the config.yaml found within your project directory, replace the values <your-project-directory-name> and <sequencing-directory> with the absolute paths of your project directory and the pipeline directory, respectively.
  6. Determine how many cores are available on your computer (Linux):
grep -c ^processor /proc/cpuinfo
  7. From the pipeline's directory, run the entire pipeline on your samples using the following command:
snakemake --configfile <your-project-directory-name>/config.yaml --cores <cores>
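
Two of the steps above, recording read names with absolute paths and counting cores, are easy to get wrong by hand. The sketch below shows one way to script them. Both function names (list_reads, count_cores) are hypothetical and not part of the pipeline, and the comma-separated output of list_reads is only a convenience listing: the actual column layout should follow the sample_data.csv that ships in example/.

```shell
# Hypothetical helpers, not shipped with the pipeline.

# Print "file name,absolute path" for every regular file in a directory.
# $1: directory holding the raw reads (for example, <project>/input).
list_reads() {
    local dir
    dir="$(cd "$1" && pwd)" || return 1
    local f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        printf '%s,%s\n' "$(basename "$f")" "$f"
    done
}

# Print the number of available cores.
# Tries nproc (GNU coreutils), then getconf, then /proc/cpuinfo (Linux only).
count_cores() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    elif getconf _NPROCESSORS_ONLN >/dev/null 2>&1; then
        getconf _NPROCESSORS_ONLN
    else
        grep -c '^processor' /proc/cpuinfo
    fi
}

# Usage:
#   list_reads <your-project-directory-name>/input
#   snakemake --configfile <your-project-directory-name>/config.yaml --cores "$(count_cores)"
```

Passing every available core to snakemake is a choice, not a requirement; on a shared machine a smaller number may be more appropriate.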

This will generate a consensus sequence in FASTA format for each of your samples and place it at <your-project-directory-name>/results/consensus_sequences/<sample>.masked.fasta. An HTML report containing alignment and quality metrics for your samples can be found at <your-project-directory-name>/results/reports/qc_report.html. A phylogeny comparing your sequences to the background dataset can be found at <your-project-directory-name>/results/phylogeny/phylogeny.tree.
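
After a run, it can be useful to confirm that the files described above exist and are non-empty. The following is a minimal sketch under the assumption that the output paths are exactly those listed in this section; the function name missing_outputs and the sample names in the usage comment are hypothetical.

```shell
# Hypothetical helper, not shipped with the pipeline.
# Prints every expected output file that is missing or empty.
# $1: project directory; remaining arguments: sample names.
# Returns 0 if all expected files are present, 1 otherwise.
missing_outputs() {
    local project="$1"
    shift
    local status=0
    local sample
    local path
    # One masked consensus sequence per sample.
    for sample in "$@"; do
        path="$project/results/consensus_sequences/$sample.masked.fasta"
        [ -s "$path" ] || { echo "$path"; status=1; }
    done
    # One QC report and one phylogeny per project.
    for path in "$project/results/reports/qc_report.html" \
                "$project/results/phylogeny/phylogeny.tree"; do
        [ -s "$path" ] || { echo "$path"; status=1; }
    done
    return "$status"
}

# Usage (sampleA and sampleB are placeholder names):
#   missing_outputs <your-project-directory-name> sampleA sampleB
```

The check only tests for presence and non-zero size; it says nothing about the quality of the consensus sequences, which is what the QC report is for.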
