Bacterial assembly and annotation

Edwards Lab. License: MIT

Baczy

Overview

Baczy is a Snakemake-based workflow for assembling and annotating bacterial host genomes. It extends Sphae, which assembles and annotates phage genomes, by enabling host genome assembly and functional annotation.

🔹 Features:
Quality control (Fastp)
Genome assembly (MEGAHIT, Hybracter)
Functional annotation (Bakta)
Taxonomic classification (GTDB-Tk)
Taxonomic tree (GTDB-Tk)
Defense & resistance profiling (Defense-Finder, AMRFinderPlus, CapsuleDB)
Prophage detection (PhiSpy)
Pan-genome analysis (Panaroo)

Installation

Prerequisites installation

  • Install Singularity, or load the module if it is already available. On the deepthought cluster:
    module load apptainer

  • Install Miniconda by following the Miniconda Installation Guide.

Steps for installing workflow

Setting up a conda environment

conda create -n baczy python=3.12
conda activate baczy

Baczy can then be installed in this environment using one of the methods below.

Option 1: Source Installation

#clone repository
git clone https://github.com/npbhavya/baczy.git

#move to baczy folder
cd baczy

#install
pip install -e .

#confirm the workflow is installed by running the below command 
baczy --help

Option 2: Pip installation

Note: This installation doesn't include Singularity/Docker, so that has to be installed separately.

pip install baczy
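
Because the pip package does not bundle a container runtime, it can help to confirm one is available before running the workflow. The following is a minimal sketch; `find_container_runtime` is a hypothetical helper, not part of Baczy, and it only checks for the `apptainer` and `singularity` executables on `PATH`.

```shell
# find_container_runtime: hypothetical helper (not part of baczy).
# Prints the first container runtime found on PATH, or fails with a message.
find_container_runtime() {
    for exe in apptainer singularity; do
        if command -v "$exe" >/dev/null 2>&1; then
            echo "$exe"
            return 0
        fi
    done
    echo "No apptainer or singularity executable found on PATH" >&2
    return 1
}
```

Calling `find_container_runtime` after `module load apptainer` should print `apptainer` if the module loaded correctly.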

Database setup

Download the required databases and place them in a single directory.

Set the environment variable BACZY_DATABASE_PATH to that directory:

export BACZY_DATABASE_PATH=/home/user/database

Replace /home/user/database with the actual path to your database directory.
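
A quick sanity check on the variable can catch a mistyped path before a long run starts. This is a sketch only; `check_db_path` is a hypothetical helper, not something Baczy provides, and it verifies just that the variable is set and names an existing directory, not that the databases inside are complete.

```shell
# check_db_path: hypothetical helper (not part of baczy).
# Succeeds only if BACZY_DATABASE_PATH is set and is an existing directory.
check_db_path() {
    if [ -z "${BACZY_DATABASE_PATH:-}" ]; then
        echo "BACZY_DATABASE_PATH is not set" >&2
        return 1
    fi
    if [ ! -d "$BACZY_DATABASE_PATH" ]; then
        echo "Not a directory: $BACZY_DATABASE_PATH" >&2
        return 1
    fi
    echo "Using databases in $BACZY_DATABASE_PATH"
}
```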

Running the workflow

Run Baczy using a single command!

Before starting the run, note that the taxonomic tree is generated using GTDB-Tk, so update the following lines as needed:

gtdbtk:
  outgroup: "d__Archaea"
  taxa_filter: "d__Bacteria"

These values can also be set to more specific taxa, such as genera:

gtdbtk:
  outgroup: "g__Escherichia"
  taxa_filter: "g__Achromobacter"
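
GTDB taxon names carry a rank prefix (`d__`, `p__`, `c__`, `o__`, `f__`, `g__`, `s__`), as in the values above. A small format check can catch a missing prefix before the run. This is a sketch; `is_gtdb_taxon` is a hypothetical helper, and it checks only the shape of the string, not whether the taxon exists in GTDB.

```shell
# is_gtdb_taxon: hypothetical helper (not part of baczy or GTDB-Tk).
# Returns 0 if the argument is a rank prefix (d, p, c, o, f, g, or s
# followed by two underscores) and then at least one more character.
is_gtdb_taxon() {
    case "$1" in
        [dpcofgs]__?*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_gtdb_taxon "g__Escherichia"` succeeds, while `is_gtdb_taxon "Escherichia"` fails because the rank prefix is missing.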

For paired-end reads

baczy run --input sample-data/illumina --cores 32 --use-singularity --sdm apptainer --output test -k --use-conda

For long reads

baczy run --input sample-data/nanopore --sequencing longread --cores 32 --use-singularity --sdm apptainer --output test -k --use-conda
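
The two commands differ only in the `--sequencing longread` flag, so a wrapper can build either one from the read type. The following is a sketch under that assumption; `baczy_cmd` is a hypothetical helper that only prints the command, using the flags shown above, and does not run it.

```shell
# baczy_cmd: hypothetical wrapper (not part of baczy).
# Prints the run command for an input directory, a read type
# ("paired" or "longread"), and an output directory.
baczy_cmd() {
    input=$1
    reads=$2
    output=$3
    cmd="baczy run --input $input"
    # Long-read runs add the --sequencing flag; paired-end runs omit it.
    if [ "$reads" = "longread" ]; then
        cmd="$cmd --sequencing longread"
    fi
    echo "$cmd --cores 32 --use-singularity --sdm apptainer --output $output -k --use-conda"
}
```

Printing the command first makes it easy to review before running it, for example with `baczy_cmd sample-data/nanopore longread test`.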

Intermediate files

Stored in:

baczy.out/PROCESSING

Final Results and Output

Stored in RESULT-short for short reads or RESULTS-long for long reads

Each folder contains:

  • {sample} folder
    • {sample}_amrfinderplus table: identified AMR genes in the genome
    • {sample}_contigs.fa or {sample}_final.fasta : assembled genome for each sample
    • {sample}.faa : identified proteins
    • {sample}.fna : identified genes
    • {sample}.gbff
    • {sample}.gff3
    • {sample}.png and {sample}.svg : genome visualization
    • {sample}.txt: summary
    • {sample}_prophage_coordinates.tsv: location of the prophages identified using PhiSpy
  • amrfinder_summary.tsv : a table with all the AMRFinder genes in all the samples
  • bakta_summary.tsv : Bakta summary for all the samples saved to one table
  • checkm2_quality_report.tsv : CheckM2 completeness results
  • defensefinder_summary.tsv : All the defense systems found in all the samples
  • gtdbtk.bac120_summary.tsv : GTDB-Tk summary with the predicted taxa for each of the samples
  • gtdbtk.bac120.decorated.tree , gtdbtk.bac120.tree.table : GTDB-Tk tree and the tree table
    • visualize the tree on iTOL
  • prophage_regions.tsv : Location of the prophage regions in all the samples
  • Panaroo folder
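
After a run, the summary tables listed above can be checked for presence in one pass. This is a minimal sketch; `missing_summaries` is a hypothetical helper, not part of Baczy, and the file names it checks are taken from the list above (the GTDB-Tk files are left out here).

```shell
# missing_summaries: hypothetical helper (not part of baczy).
# Prints each expected summary file that is absent from the given
# results directory; prints nothing if all are present.
missing_summaries() {
    results_dir=$1
    for f in amrfinder_summary.tsv bakta_summary.tsv \
             checkm2_quality_report.tsv defensefinder_summary.tsv \
             prophage_regions.tsv; do
        [ -e "$results_dir/$f" ] || echo "$f"
    done
}
```

Pass the path of the results folder as the only argument; an empty output means every listed summary table was found.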
