Bacterial assembly and annotation
Baczy
Overview
Baczy is a Snakemake-based workflow for assembling and annotating bacterial host genomes. It extends Sphae, which assembles and annotates phage genomes, by enabling host genome assembly and functional annotation.
🔹 Features:
✔ Quality control (Fastp)
✔ Genome assembly (MEGAHIT, Hybracter)
✔ Functional annotation (Bakta)
✔ Taxonomic classification (GTDB-Tk)
✔ Taxonomic tree (GTDB-Tk)
✔ Defense & resistance profiling (Defense-Finder, AMRFinderPlus, CapsuleDB)
✔ Prophage detection (PhiSpy)
✔ Pan-genome analysis (Panaroo)
Installation
Prerequisites
- Install Singularity/Apptainer, or load the module. On the deepthought cluster:
module load apptainer
- Install Miniconda: download and install it by following the Miniconda Installation Guide.
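Before moving on, it can help to confirm that both prerequisites are actually visible on your PATH. The sketch below is only an illustration: the helper names `have_tool` and `check_prereqs` are hypothetical, and accepting either `apptainer` or `singularity` as the container runtime is an assumption.

```shell
# Return 0 if the named command is available on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Print one status line per prerequisite; returns 1 if anything is missing.
check_prereqs() {
    local missing=0
    # Either runtime name is accepted here (assumption).
    if have_tool apptainer || have_tool singularity; then
        echo "container runtime: found"
    else
        echo "container runtime: missing (try: module load apptainer)"
        missing=1
    fi
    if have_tool conda; then
        echo "conda: found"
    else
        echo "conda: missing (install Miniconda)"
        missing=1
    fi
    return "$missing"
}

check_prereqs || echo "Install the missing prerequisites before continuing."
```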
Steps for installing workflow
Setting up a conda environment
conda create -n baczy python=3.12
conda activate baczy
Install Baczy inside this environment using one of the methods below.
Option 1: Source Installation
#clone repository
git clone https://github.com/npbhavya/baczy.git
#move to the baczy folder
cd baczy
#install
pip install -e .
#confirm the workflow is installed by running the below command
baczy --help
Option 2: Pip installation
Note: This installation doesn't include Singularity/Docker, so it has to be installed separately.
pip install baczy
Database setup
Download the required databases and place them in a directory.
Set the database path in the variable BACZY_DATABASE_PATH:
export BACZY_DATABASE_PATH=/home/user/database
Replace /home/user/database with the path to your database directory.
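A quick check that the variable is set and points at a real directory can catch path mistakes before a long run starts. This is a minimal sketch; `check_db_path` is a hypothetical helper, not part of Baczy, and it does not inspect the database contents.

```shell
# Return 0 if BACZY_DATABASE_PATH is set and names an existing directory.
check_db_path() {
    if [ -z "${BACZY_DATABASE_PATH:-}" ]; then
        echo "BACZY_DATABASE_PATH is not set" >&2
        return 1
    fi
    if [ ! -d "$BACZY_DATABASE_PATH" ]; then
        echo "Not a directory: $BACZY_DATABASE_PATH" >&2
        return 1
    fi
    echo "Databases expected under: $BACZY_DATABASE_PATH"
}

# Example (adjust the path to your own database directory):
#   export BACZY_DATABASE_PATH=/home/user/database
#   check_db_path
```

To keep the setting across sessions, the `export` line can be added to your shell startup file.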
Running the workflow
Run Baczy using a single command!
Before starting the run: the taxonomic tree is generated using GTDB-Tk, so update the following configuration lines to set the outgroup and the taxa filter
gtdbtk:
  outgroup: "d__Archaea"
  taxa_filter: "d__Bacteria"
This can be set to more specific genera:
gtdbtk:
  outgroup: "g__Escherichia"
  taxa_filter: "g__Achromobacter"
For paired-end reads
baczy run --input sample-data/illumina --cores 32 --use-singularity --sdm apptainer --output test -k --use-conda
For long reads
baczy run --input sample-data/nanopore --sequencing longread --cores 32 --use-singularity --sdm apptainer --output test -k --use-conda
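The two commands differ only in the `--sequencing longread` switch, so a small helper can build either one and give each run its own output directory. This is a sketch under assumptions: `baczy_cmd` is a hypothetical function, and the output names `test-short` and `test-long` are placeholders. It only prints the commands, using the flags shown above.

```shell
# Print the baczy command for one input directory.
# $1 = input directory, $2 = output directory, $3 = "short" or "long"
baczy_cmd() {
    local input="$1" output="$2" mode="$3" extra=""
    # Long reads add the --sequencing longread switch.
    if [ "$mode" = "long" ]; then
        extra="--sequencing longread "
    fi
    echo "baczy run --input $input ${extra}--cores 32 --use-singularity --sdm apptainer --output $output -k --use-conda"
}

# Preview the commands for both sample datasets.
baczy_cmd sample-data/illumina test-short short
baczy_cmd sample-data/nanopore test-long long
```

Printing first makes it easy to review the exact command line; piping a printed line to `sh` would launch that run.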
Intermediate files
Stored in:
baczy.out/PROCESSING
Final Results and Output
Stored in RESULT-short for short reads or RESULTS-long for long reads
Each folder contains:
- {sample} folder
  - {sample}_amrfinderplus table: AMR genes identified in the genome
  - {sample}_contigs.fa or {sample}_final.fasta: assembled genome for the sample
  - {sample}.faa: identified proteins
  - {sample}.fna: identified genes
  - {sample}.gbff
  - {sample}.gff3
  - {sample}.png and {sample}.svg: genome visualization
  - {sample}.txt: summary
  - {sample}_prophage_coordinates.tsv: locations of the prophages identified by PhiSpy
- amrfinder_summary.tsv: a table with the AMRFinderPlus genes from all samples
- bakta_summary.tsv: Bakta summary for all samples, saved to one table
- checkm2_quality_report.tsv: CheckM2 completeness results
- defensefinder_summary.tsv: all defense systems found across the samples
- gtdbtk.bac120_summary.tsv: GTDB-Tk summary with the predicted taxa for each sample
- gtdbtk.bac120.decorated.tree, gtdbtk.bac120.tree.table: GTDB-Tk tree and the tree table
  - visualize the tree on iTOL
- prophage_regions.tsv: locations of the prophage regions in all the samples
- Panaroo folder
  - output from running Panaroo
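After a run finishes, a quick way to confirm it completed is to check that the cross-sample summary tables listed above exist and are non-empty. The helper below is a hypothetical sketch: `check_summaries` is not part of Baczy, the results directory is passed in by the caller, and the file names are taken from the list above.

```shell
# Report which cross-sample summary files are present in a results directory.
# $1 = results directory; returns 1 if any file is missing or empty.
check_summaries() {
    local dir="$1" missing=0 f
    for f in amrfinder_summary.tsv bakta_summary.tsv \
             checkm2_quality_report.tsv defensefinder_summary.tsv \
             prophage_regions.tsv; do
        if [ -s "$dir/$f" ]; then
            # wc -l counts every line, including any header row.
            echo "ok      $f ($(wc -l < "$dir/$f" | tr -d ' ') lines)"
        else
            echo "missing $f"
            missing=1
        fi
    done
    return "$missing"
}

# Example (the path is an assumption; use your own results folder):
#   check_summaries path/to/results
```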
File details
Details for the file baczy-1.0.3.tar.gz.
File metadata
- Download URL: baczy-1.0.3.tar.gz
- Upload date:
- Size: 6.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a95025aed39f7e01ff5fb8503b8f7dbf5753d630043f10b4621b2e42bf189558 |
| MD5 | 008f20d600ea1954184b777b5537499a |
| BLAKE2b-256 | 2e76abb69f4fd3fca0e78b7e4309f3540198a54d2470eab760d22051ae914d1b |
File details
Details for the file baczy-1.0.3-py3-none-any.whl.
File metadata
- Download URL: baczy-1.0.3-py3-none-any.whl
- Upload date:
- Size: 24.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 23567a46b6dfd61d451f34ca7f5b5b3985052de214a6bf4b2621c3e950a50e4e |
| MD5 | 7443f2f57609632cfdede9a9a2c6e2a5 |
| BLAKE2b-256 | 94eb01fd3c8a6f83daaee54e63c1e052c2cade15a0049ad52db59445c70d5a52 |
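If you download one of these files manually, its SHA256 digest can be compared with the value listed for it. The sketch below assumes the GNU coreutils `sha256sum` tool is available (on macOS, `shasum -a 256` is the usual equivalent); `verify_sha256` is a hypothetical helper name.

```shell
# Compare a file's SHA256 digest with an expected value.
# $1 = file, $2 = expected hex digest
verify_sha256() {
    local actual
    actual=$(sha256sum "$1" | cut -d ' ' -f 1)
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1" >&2
        return 1
    fi
}

# Example, using the wheel digest listed above:
#   verify_sha256 baczy-1.0.3-py3-none-any.whl \
#       23567a46b6dfd61d451f34ca7f5b5b3985052de214a6bf4b2621c3e950a50e4e
```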