Viral metagenomics framework for short and long reads
Project description
A hecatomb is a great sacrifice or an extensive loss. Hecatomb the software empowers an analyst to make data-driven decisions to 'sacrifice' false-positive viral reads from metagenomes in order to enrich for true-positive viral reads. This process frequently results in a great loss of suspected viral sequences or contigs.
Documentation
Complete documentation is hosted at Read the Docs
Citation
Hecatomb is currently on BioRxiv!
Quick start guide
Running on HPC
Hecatomb is powered by Snakemake and greatly benefits from the use of Snakemake profiles for HPC clusters. More information and examples for setting up Snakemake profiles for Hecatomb are available in the documentation.
Install
# create conda env and install
conda create -n hecatomb -c conda-forge -c bioconda hecatomb
# activate conda env
conda activate hecatomb
# check the installation
hecatomb --help
# download the databases - you only have to do this once
# locally: using 8 threads (default is 32 threads)
hecatomb install --threads 8
# HPC: using a snakemake profile named 'slurm'
hecatomb install --profile slurm
Run the test dataset
# locally: uses 32 threads and 64 GB RAM by default
hecatomb test
# HPC: using a profile named 'slurm'
hecatomb test --profile slurm
Inputs
Hecatomb can process paired-end or single-end short-read sequencing, long-read sequencing, and paired-end sequencing from the round A/B library protocol. Select the input type with --library:
hecatomb run --library paired
hecatomb run --library single
hecatomb run --library longread
hecatomb run --library roundAB
When you specify a directory of reads with --reads for paired-end sequencing, Hecatomb expects the read files to be named in the format sampleName_R1/R2.fastq(.gz), e.g.
sample1_R1.fastq.gz
sample1_R2.fastq.gz
sample2_R1.fastq.gz
sample2_R2.fastq.gz
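The paired-end naming convention above can be checked before launching a run. The following is a minimal bash sketch, not part of Hecatomb: check_pairs is a hypothetical helper name, and it assumes every forward file ends in _R1.fastq or _R1.fastq.gz and that its mate differs only by R2.

```shell
# Hypothetical helper (not a Hecatomb command): report forward read files
# that have no matching reverse read file in the same directory.
# Assumes the sampleName_R1/R2.fastq(.gz) naming convention.
check_pairs() {
    local dir="$1" missing=0 r1 base ext
    for r1 in "$dir"/*_R1.fastq "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue            # skip glob patterns that matched nothing
        base="${r1%_R1.fastq*}"             # path without the _R1.fastq(.gz) suffix
        ext="${r1#"$base"_R1}"              # .fastq or .fastq.gz
        if [ ! -e "${base}_R2${ext}" ]; then
            echo "missing mate for: $r1" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]                    # succeed only if every R1 has an R2
}
```

For example, check_pairs /path/to/reads exits with a non-zero status and lists the unpaired files on standard error if any mate is missing.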
When you specify a TSV file with --reads, Hecatomb expects a 2- or 3-column tab-separated file (depending on the preprocessing method), with the first column specifying a sample name and the other columns giving the relative or full paths to the forward (and reverse) read files, e.g.
sample1 /path/to/reads/sample1.1.fastq.gz /path/to/reads/sample1.2.fastq.gz
sample2 /path/to/reads/sample2.1.fastq.gz /path/to/reads/sample2.2.fastq.gz
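A sample sheet in the 3-column layout above can be generated from a directory of paired-end reads instead of being written by hand. This is a minimal bash sketch under stated assumptions: make_sample_tsv is a hypothetical helper, not a Hecatomb command, and it assumes files follow the _R1/_R2 naming convention rather than the .1/.2 names shown in the example.

```shell
# Hypothetical helper (not a Hecatomb command): print a tab-separated
# sample sheet with columns: sample name, forward reads, reverse reads.
# Assumes files are named sampleName_R1.fastq(.gz) / sampleName_R2.fastq(.gz).
make_sample_tsv() {
    local dir="$1" r1 base ext sample
    for r1 in "$dir"/*_R1.fastq "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue            # skip glob patterns that matched nothing
        base="${r1%_R1.fastq*}"             # path without the _R1.fastq(.gz) suffix
        ext="${r1#"$base"_R1}"              # .fastq or .fastq.gz
        sample="${base##*/}"                # file name part only, used as sample name
        printf '%s\t%s\t%s\n' "$sample" "$r1" "${base}_R2${ext}"
    done
}
```

For example, make_sample_tsv /path/to/reads > samples.tsv writes the sheet to samples.tsv, which can then be passed to --reads. The helper does not check that the reverse read files exist.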
Dependencies
The only dependency you need to get up and running with Hecatomb is conda or the Python package manager pip. Hecatomb relies on conda (and mamba) to ensure portability and ease of installation of its dependencies. All of Hecatomb's dependencies are installed either during installation or at runtime, so you don't have to worry about a thing!