
Detect Biosynthetic Gene Clusters (BGCs) in HiFi metagenomic data


HiFiBGC

HiFiBGC is a tool to detect Biosynthetic Gene Clusters (BGCs) in PacBio HiFi metagenomic data.

Installation

Option 1: mamba

mamba create -n hifibgc -c conda-forge -c bioconda -c amityadav -y hifibgc

mamba activate hifibgc

mamba is preferred over conda (Option 2) because it takes much less time and uses less memory (RAM).
mamba can be installed by following its official installation instructions.

Option 2: conda

conda create -n hifibgc -c conda-forge -c bioconda -c amityadav -y hifibgc

conda activate hifibgc

HiFiBGC uses the following third-party tools: hifiasm-meta, metaFlye, HiCanu, Minimap2, SAMtools, antiSMASH, BiG-SCAPE, complex-upsetplot, Snaketool, and Snaketool-utils.
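
After activating the environment, a quick check can confirm that the hifibgc command is reachable. The sketch below is only an illustration: check_cmd is a hypothetical helper, not part of HiFiBGC.

```shell
# Hypothetical helper (not part of HiFiBGC): report whether a command is on PATH.
check_cmd() {
    local cmd="$1"
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "found: $cmd"
    else
        echo "missing: $cmd"
        return 1
    fi
}

# Example (run inside the activated environment):
#   check_cmd hifibgc && hifibgc --version
```

If the command is reported as missing, the environment is most likely not activated in the current shell.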

Usage

Install prerequisites

The command below needs to be run only once. It installs a required database and a tool.

hifibgc install

Run on test data

Test the installation of HiFiBGC on a small dataset using the command below.

hifibgc test

On successful completion of the above command, you should see a message like "Snakemake finished successfully" in the terminal, along with an output directory named hifibgc1.out.
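
For unattended runs, the same check can be scripted. The sketch below assumes that hifibgc.log in the output directory records the same "finished successfully" text that is printed to the terminal; that is an assumption, and run_succeeded is a hypothetical helper rather than a HiFiBGC command.

```shell
# Hypothetical helper: succeed if the run log contains the success message.
# Assumption: hifibgc.log contains the "finished successfully" text
# that Snakemake prints to the terminal.
run_succeeded() {
    local outdir="${1:-hifibgc1.out}"
    local log="$outdir/hifibgc.log"
    [ -f "$log" ] && grep -q "finished successfully" "$log"
}

# Example:
#   hifibgc test && run_succeeded hifibgc1.out && echo "test run OK"
```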

Run on real data

Run HiFiBGC with default options on the required input (.fastq) file:

hifibgc run --input input.fastq  

Specify the output directory and the number of threads:

hifibgc run --input input.fastq --output outdir --threads 50

Specify the bigscape_cutoff option:

hifibgc run --input input.fastq --bigscape_cutoff 0.3
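
When several samples need processing, the options above can be combined in a loop. This is a minimal sketch, not a feature of HiFiBGC: run_all_samples, the HIFIBGC_CMD variable, and the per-sample output directory naming are all hypothetical choices made for illustration.

```shell
# Hypothetical wrapper: run HiFiBGC once per FASTQ file, writing each
# sample to its own output directory (<sample>.hifibgc.out).
# HIFIBGC_CMD can be set to "echo" to print the commands instead of running them.
run_all_samples() {
    local threads="$1"
    shift
    local fq sample
    for fq in "$@"; do
        sample="$(basename "$fq" .fastq)"
        ${HIFIBGC_CMD:-hifibgc} run --input "$fq" \
            --output "${sample}.hifibgc.out" --threads "$threads" || return 1
    done
}

# Example:
#   run_all_samples 16 sampleA.fastq sampleB.fastq
```

Giving each sample its own --output directory avoids every run writing to the default hifibgc1.out.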

Output

The output directory from HiFiBGC contains the following folders and files.

.
└── hifibgc1.out
    ├── 01_assembly --> Output from three assemblers
    ├── 02_mapping_reads_to_merged_assembly --> Read mapping to concatenated assembly and extraction of unmapped reads 
    ├── 03_antismash --> BGC prediction
    ├── 04_bgc_clustering --> BGC clustering
    ├── 05_final_output --> Primary output of HiFiBGC
    ├── benchmarks --> Resource usage and time consumption by different components of HiFiBGC
    ├── config.yaml --> Configuration file for HiFiBGC run
    ├── hifibgc.log --> Snakemake log file
    └── logs --> Logs associated with different tools used in HiFiBGC
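
A finished run can be checked against this layout with a few lines of shell. The sketch below takes the entry names directly from the tree above; missing_outputs itself is a hypothetical helper, not something HiFiBGC provides.

```shell
# Hypothetical helper: print each expected entry that is absent from a
# HiFiBGC output directory; returns non-zero if anything is missing.
missing_outputs() {
    local outdir="${1:-hifibgc1.out}"
    local entry
    local status=0
    for entry in 01_assembly 02_mapping_reads_to_merged_assembly \
                 03_antismash 04_bgc_clustering 05_final_output \
                 benchmarks config.yaml hifibgc.log logs; do
        if [ ! -e "$outdir/$entry" ]; then
            echo "$entry"
            status=1
        fi
    done
    return "$status"
}

# Example:
#   missing_outputs hifibgc1.out || echo "run looks incomplete"
```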

Of these, the folder 05_final_output contains the primary output of HiFiBGC, specifically the following folders and files.

├── 05_final_output
│   ├── BGC_all --> Folder containing all BGC .gbk files
│   ├── BGC_all_metadata.tsv --> File containing metadata associated with all BGCs
│   ├── BGC_representative --> Folder containing representative BGC .gbk files
│   └── upsetplot --> UpSet plot comparing results from the three assemblers and unmapped reads
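
A quick way to get a feel for the results is to count the .gbk files in the two BGC folders. The sketch below assumes only the folder names shown above; summarize_bgcs is a hypothetical helper, and it searches recursively in case the .gbk files sit in subfolders.

```shell
# Hypothetical helper: count .gbk files under BGC_all and BGC_representative.
summarize_bgcs() {
    local final_dir="${1:-hifibgc1.out/05_final_output}"
    local sub n
    for sub in BGC_all BGC_representative; do
        # find recurses, so nested .gbk files are counted as well
        n="$(find "$final_dir/$sub" -type f -name '*.gbk' 2>/dev/null | wc -l | tr -d '[:space:]')"
        echo "$sub: $n"
    done
}

# Example:
#   summarize_bgcs hifibgc1.out/05_final_output
# To list the column names of the metadata table (they are not documented here):
#   head -n 1 hifibgc1.out/05_final_output/BGC_all_metadata.tsv | tr '\t' '\n'
```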

Commands

$ hifibgc --help

Usage: hifibgc [OPTIONS] COMMAND [ARGS]...

  Detect Biosynthetic Gene Clusters (BGCs) in HiFi metagenomic data. For
  more options, run: hifibgc command --help

Options:
  -v, --version  Show the version and exit.
  -h, --help     Show this message and exit.

Commands:
  run       Run HiFiBGC
  install   Install required database and tool
  test      Test HiFiBGC
  config    Copy the system default config file
  citation  Print the citation(s) for this tool

$ hifibgc run --help

Usage: hifibgc run [OPTIONS] [SNAKE_ARGS]...

  Run HiFiBGC

Options:
  --input TEXT                  Input file  [required]
  --output PATH                 Output directory  [default: hifibgc1.out]
  --bigscape_cutoff FLOAT       BiG-SCAPE cutoff parameter  [default: 0.3]
  --configfile TEXT             Custom config file [default:
                                (outputDir)/config.yaml]
  --threads INTEGER             Number of threads to use  [default: 80]
  --use-conda / --no-use-conda  Use conda for Snakemake rules  [default: use-
                                conda]
  --conda-prefix PATH           Custom conda env directory
  --snake-default TEXT          Customise Snakemake runtime args  [default:
                                --rerun-incomplete, --printshellcmds,
                                --nolock, --show-failed-logs]
  -h, --help                    Show this message and exit.
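
The help text above shows that --threads defaults to 80, which can exceed the number of cores on smaller machines. One possible way to choose a value is sketched below; pick_threads is a hypothetical helper, and it assumes getconf is available to report the online CPU count (it falls back to 1 otherwise).

```shell
# Hypothetical helper: print the number of online CPUs, capped at a maximum
# (default cap: 80, the documented default of --threads).
pick_threads() {
    local cap="${1:-80}"
    local n
    n="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
    if [ "$n" -gt "$cap" ]; then
        n="$cap"
    fi
    echo "$n"
}

# Example:
#   hifibgc run --input input.fastq --threads "$(pick_threads)"
```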
