Skip to main content

Detect Biosynthetic Gene Clusters (BGCs) in HiFi metagenomic data

Project description

HiFiBGC

HiFiBGC is a tool to detect Biosynthetic Gene Clusters (BGCs) in PacBio HiFi metagenomic data.

Installation

Option 1: mamba

mamba create -n hifibgc -c conda-forge -c bioconda -c amityadav -y hifibgc

mamba activate hifibgc

mamba is preferred over below conda as it takes much lesser time and consumes lesser memory (RAM).
mamba can be installed from here.

Option 2: conda

conda create -n hifibgc -c conda-forge -c bioconda -c amityadav -y hifibgc

conda activate hifibgc

HiFiBGC uses following third-party tools: hifiasm-meta, metaFlye, HiCanu, Minimap2, SAMtools, antiSMASH, BiG-SCAPE, complex-upsetplot, Snaketool, Snaketool-utils

Usage

Install prerequisites

Below command need to be run only once. It installs a required database and a tool.

hifibgc install

Run on test data

Test installation of HiFiBGC on a small dataset using below command.

hifibgc test

On successful completion of above command, you should see something like Snakemake finished successfully on terminal and an output directory hifibgc1.out.

Run on real data

Run HiFiBGC with default options with a required input (.fastq) file:

hifibgc run --input input.fastq  

Specify output directory and no of threads:

hifibgc run --input input.fastq --output outdir --threads 50

Specify bigscape_cutoff option:

hifibgc run --input input.fastq --bigscape_cutoff 0.3

Output

The output directory from HiFiBGC contains following folders and files.

.
└── hifibgc1.out
    ├── 01_assembly --> Output from three assemblers
    ├── 02_mapping_reads_to_merged_assembly --> Read mapping to concatenated assembly and extraction of unmapped reads 
    ├── 03_antismash --> BGC prediction
    ├── 04_bgc_clustering --> BGC clustering
    ├── 05_final_output --> Primary output of HiFiBGC
    ├── benchmarks --> Resource usage and time consumption by different components of HiFiBGC
    ├── config.yaml --> Configuration file for HiFiBGC run
    ├── hifibgc.log --> Snakemake log file
    └── logs --> Logs associated with different tools used in HiFiBGC

Among above, the folder 05_final_output contains primary output of HiFiBGC, specifically following folders and files.

├── 05_final_output
│   ├── BGC_all --> Folder containing all BGC .gbk files
│   ├── BGC_all_metadata.tsv --> File containing metadata associated with all BGCs
│   ├── BGC_representative --> Folder containing representative BGC .gbk files
│   ├── upsetplot --> Upsetplot comparison of results from three assemblers and unmapped reads

Commands

$hifibgc --help

Usage: hifibgc [OPTIONS] COMMAND [ARGS]...

  Detect Biosynthetic Gene Clusters (BGCs) in HiFi metagenomic data. For
  more options, run: hifibgc command --help

Options:
  -v, --version  Show the version and exit.
  -h, --help     Show this message and exit.

Commands:
  run       Run HiFiBGC
  install   Install required database and tool
  test      Test HiFiBGC
  config    Copy the system default config file
  citation  Print the citation(s) for this tool

$hifibgc run --help

Usage: hifibgc run [OPTIONS] [SNAKE_ARGS]...

  Run HiFiBGC

Options:
  --input TEXT                  Input file  [required]
  --output PATH                 Output directory  [default: hifibgc1.out]
  --bigscape_cutoff FLOAT       BiG-SCAPE cutoff parameter  [default: 0.3]
  --configfile TEXT             Custom config file [default:
                                (outputDir)/config.yaml]
  --threads INTEGER             Number of threads to use  [default: 80]
  --use-conda / --no-use-conda  Use conda for Snakemake rules  [default: use-
                                conda]
  --conda-prefix PATH           Custom conda env directory
  --snake-default TEXT          Customise Snakemake runtime args  [default:
                                --rerun-incomplete, --printshellcmds,
                                --nolock, --show-failed-logs]
  -h, --help                    Show this message and exit.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hifibgc-0.1.14a4.tar.gz (28.4 MB view details)

Uploaded Source

File details

Details for the file hifibgc-0.1.14a4.tar.gz.

File metadata

  • Download URL: hifibgc-0.1.14a4.tar.gz
  • Upload date:
  • Size: 28.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for hifibgc-0.1.14a4.tar.gz
Algorithm Hash digest
SHA256 25edd632c794a71350b8e0282156c9947abf258430bb4a920feed65a98b13851
MD5 c967b442f82f7fa54b0d3376db4bae1a
BLAKE2b-256 b543a1ec4bbb28c201a899f2ad9aea36212cba14f2b04111debd6867e153dfe9

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page