
An easy-to-use pipeline for the assembly and analysis of bacterial genomes


bacpage

This repository contains an easy-to-use pipeline for the assembly and analysis of bacterial genomes using ONT long-read or Illumina short-read technology. The complete documentation and instructions for bacpage and each of its functions are available in the project documentation.

Introduction

Advances in sequencing technology during the COVID-19 pandemic have led to massive increases in the generation of sequencing data. Many bioinformatics tools have been developed to analyze these data, but very few can be used by individuals without prior bioinformatics training.

This pipeline was designed to encapsulate pre-existing tools to automate the analysis of bacterial whole-genome sequencing data. Installation is fast and straightforward. The pipeline is easy to set up and ships with rational defaults, but is highly modular and configurable by more advanced users. Bacpage has individual commands to generate consensus sequences, perform de novo assembly, construct phylogenetic trees, and generate quality control reports.

Features

We anticipate the pipeline will be able to perform the following functions:

  • Reference-based assembly of Illumina paired-end reads
  • De novo assembly of Illumina paired-end reads
  • De novo assembly of ONT long reads
  • Quality control checks
  • Variant calling using bcftools
  • Maximum-likelihood phylogenetic inference of processed samples and background dataset using iqtree
  • MLST profiling and virulence factor detection
  • Antimicrobial resistance gene detection
  • Plasmid detection

Installation

  1. Install mamba by running the following two commands:
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash "Mambaforge-$(uname)-$(uname -m).sh"
  2. Clone the bacpage repository:
git clone https://github.com/CholGen/bacpage.git
  3. Switch to the development branch of the pipeline:
cd bacpage/
git checkout split_into_command
  4. Install and activate the pipeline's conda environment:
mamba env create -f environment.yaml
mamba activate bacpage
  5. Install the bacpage command:
pip install .
  6. Test the installation:
bacpage -h
bacpage version

These commands should print the help text and the version of the program. Please create an issue if this is not the case.
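
If one of the steps above fails, it helps to know which tool is missing from your PATH. The following is a minimal sketch and not part of bacpage: `check_tool` is a hypothetical helper name, and the script relies only on the POSIX `command -v` builtin.

```shell
#!/bin/sh
# Hypothetical helper (not part of bacpage): report whether a command is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the tools used in the installation steps.
missing=0
for tool in git mamba bacpage; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "$missing tool(s) missing"
```

Note that `bacpage` will only be found after the bacpage conda environment has been activated in the current shell.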

Updating

  1. On the command line, navigate to the directory where you cloned the bacpage repository:
cd bacpage/
  2. Activate the bacpage conda environment:
mamba activate bacpage
  3. Pull the latest changes from GitHub:
git pull
  4. Update the bacpage conda environment:
mamba env update -f environment.yaml
  5. Reinstall the bacpage command:
pip install .
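
The update steps can be collected into a single shell function. This is a sketch under stated assumptions rather than a command provided by bacpage: `run`, `update_bacpage`, and the `DRY_RUN` variable are hypothetical names. Activating the conda environment is left to the caller, because `mamba activate` depends on how your shell was initialized. With `DRY_RUN=1` the function prints each command instead of executing it.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of bacpage) around the update steps.
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

update_bacpage() {
    repo_dir=${1:-bacpage}
    # Use a subshell so the caller's working directory is unchanged.
    # The && chain stops at the first failing step.
    (
        run cd "$repo_dir" &&
        run git pull &&
        run mamba env update -f environment.yaml &&
        run pip install .
    )
}

# Preview the commands without changing anything.
DRY_RUN=1
update_bacpage bacpage
```

To perform the update for real, activate the bacpage environment first, set `DRY_RUN=0`, and call `update_bacpage` with the path to your clone.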

Usage

  1. Activate the bacpage conda environment:
mamba activate bacpage
  2. Create a directory specifically for the batch of samples you would like to analyze (called a project directory):
bacpage setup [your-project-directory-name]
  3. Place paired sequencing reads in the input/ directory of your project directory.
  4. From the pipeline's directory, run the reference-based assembly pipeline on your samples using the following command:
bacpage assemble [your-project-directory-name]

This will generate a consensus sequence in FASTA format for each of your samples and place it at <your-project-directory-name>/results/consensus_sequences/<sample>.masked.fasta. An HTML report containing alignment and quality metrics for your samples can be found at <your-project-directory-name>/results/reports/qc_report.html.
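
After a run, it can be useful to confirm that every sample produced a consensus sequence and that the report was written. The sketch below is not part of bacpage: `summarize_results` is a hypothetical helper that only inspects the two output paths named above and counts FASTA header lines (lines starting with `>`) in each file.

```shell
#!/bin/sh
# Hypothetical helper (not part of bacpage): summarize the outputs of
# `bacpage assemble` for one project directory.
summarize_results() {
    project=$1
    seq_dir="$project/results/consensus_sequences"
    report="$project/results/reports/qc_report.html"

    count=0
    for fasta in "$seq_dir"/*.masked.fasta; do
        # If the glob matched nothing, the pattern itself is returned; skip it.
        [ -f "$fasta" ] || continue
        sample=$(basename "$fasta" .masked.fasta)
        # grep -c exits non-zero when the count is 0, so ignore its status.
        records=$(grep -c '^>' "$fasta" || true)
        echo "$sample: $records sequence record(s)"
        count=$((count + 1))
    done
    echo "consensus sequences found: $count"

    if [ -f "$report" ]; then
        echo "QC report: $report"
    else
        echo "QC report not found: $report"
    fi
}
```

Call it with the project directory as its only argument, for example `summarize_results [your-project-directory-name]`, and compare the number of consensus sequences with the number of samples you placed in input/.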
