GraphBin: Refined binning of metagenomic contigs using assembly graphs.

GraphBin is a bin refinement tool for metagenomic contigs assembled from NGS data. It takes the binning result of an existing binning tool and uses the contig connectivity information in the assembly graph, together with a label propagation algorithm, to correct mis-binned contigs and to predict labels for contigs that were discarded because of their short length.
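To make the idea of label propagation on an assembly graph concrete, here is a minimal sketch in Python. It is not GraphBin's actual algorithm: it only fills in unlabelled contigs by majority vote among their labelled neighbours and does not attempt to correct mis-binned contigs. The function name `propagate_labels`, the toy contig names and the bin names are all hypothetical and chosen for illustration.

```python
from collections import Counter


def propagate_labels(edges, labels):
    """Give each unlabelled contig the most common label among its labelled
    neighbours, repeating until no further contig can be labelled.

    edges  -- iterable of (contig_a, contig_b) links from the assembly graph
    labels -- dict mapping some contigs to a bin name (initial binning result)
    """
    # Build an undirected adjacency map from the edge list.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    result = dict(labels)  # never mutate the caller's binning result
    changed = True
    while changed:
        changed = False
        updates = {}
        for contig in sorted(neighbours):
            if contig in result:
                continue  # already labelled; this sketch never relabels
            votes = Counter(
                result[n] for n in neighbours[contig] if n in result
            )
            if not votes:
                continue  # no labelled neighbour yet
            ranked = votes.most_common()
            if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
                continue  # tie between bins: leave the contig unlabelled
            updates[contig] = ranked[0][0]
        if updates:
            # Apply all labels found in this round at once.
            result.update(updates)
            changed = True
    return result


# Toy graph: two chains of contigs, only partly labelled by the binning tool.
edges = [("c1", "c2"), ("c2", "c3"), ("c3", "c4"), ("c5", "c6"), ("c6", "c7")]
labels = {"c1": "bin_1", "c2": "bin_1", "c5": "bin_2"}
refined = propagate_labels(edges, labels)
print(refined)
```

In this toy example the short contigs `c3` and `c4` inherit `bin_1` through their connections, while `c6` and `c7` inherit `bin_2`. A contig whose labelled neighbours are evenly split between bins is left unlabelled rather than guessed.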

For detailed instructions on installation, usage and visualisation, please refer to the documentation hosted at Read the Docs.

Dependencies

GraphBin requires Python 3. The following dependencies are required to run GraphBin and its related support scripts.

Installing GraphBin

Using Conda

You can install GraphBin from the bioconda channel. The conda package manager is included with both Anaconda and Miniconda.

# add channels
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

# create conda environment
conda create -n graphbin

# activate conda environment
conda activate graphbin

# install graphbin
conda install -c bioconda graphbin

# check graphbin installation
graphbin -h

Using pip

You can install GraphBin using pip from the PyPI distribution.

pip install graphbin

For development purposes, please clone the repository and install via flit.

# clone repository to your local machine
git clone https://github.com/metagentools/GraphBin.git

# go to repo directory
cd GraphBin

# install flit
pip install flit

# install graphbin via flit
flit install -s --python `which python`

Example Usage

# SPAdes version
graphbin --assembler spades --graph /path/to/graph_file.gfa --contigs /path/to/contigs.fasta --paths /path/to/paths_file.paths --binned /path/to/binning_result.csv --output /path/to/output_folder

# SGA version
graphbin --assembler sga --graph /path/to/graph_file.asqg --contigs /path/to/contigs.fa --binned /path/to/binning_result.csv --output /path/to/output_folder

# MEGAHIT version
graphbin --assembler megahit --graph /path/to/graph_file.gfa --contigs /path/to/contigs.fa --binned /path/to/binning_result.csv --output /path/to/output_folder
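When GraphBin is run from a pipeline script, it can help to assemble the command line programmatically. The sketch below uses only the flags that appear in the three examples above. The helper name `build_graphbin_command` is hypothetical, and the rule that `--paths` is mandatory for SPAdes is an assumption inferred from those examples, not from the GraphBin documentation.

```python
import shlex

# Assemblers that appear in the usage examples above.
SUPPORTED_ASSEMBLERS = {"spades", "sga", "megahit"}


def build_graphbin_command(assembler, graph, contigs, binned, output, paths=None):
    """Return the graphbin invocation as an argument list.

    Hypothetical helper: it mirrors the example commands and nothing more.
    """
    if assembler not in SUPPORTED_ASSEMBLERS:
        raise ValueError(f"unsupported assembler: {assembler!r}")
    # Assumption: the SPAdes example is the only one that passes --paths,
    # so this sketch treats it as required for SPAdes.
    if assembler == "spades" and paths is None:
        raise ValueError("the SPAdes example also passes a --paths file")

    cmd = ["graphbin", "--assembler", assembler,
           "--graph", graph, "--contigs", contigs]
    if paths is not None:
        cmd += ["--paths", paths]
    cmd += ["--binned", binned, "--output", output]
    return cmd


cmd = build_graphbin_command(
    "spades",
    graph="/path/to/graph_file.gfa",
    contigs="/path/to/contigs.fasta",
    paths="/path/to/paths_file.paths",
    binned="/path/to/binning_result.csv",
    output="/path/to/output_folder",
)
# shlex.join (Python 3.8+) renders the list as a shell-safe string.
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems when paths contain spaces.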

Visualization of the Assembly Graph of ESC+metaSPAdes Test Dataset

Initial Assembly Graph

[Figure: initial assembly graph]

TAXAassign Labelling

[Figure: TAXAassign labelling]

Original MaxBin Labelling with 2 Mis-binned Contigs

[Figure: MaxBin labelling]

Refined Labels

[Figure: refined labels]

Final Labelling of GraphBin

[Figure: final labelling]

Citation

If you use GraphBin in your work, please cite it as follows.

Vijini Mallawaarachchi, Anuradha Wickramarachchi, Yu Lin. GraphBin: Refined binning of metagenomic contigs using assembly graphs. Bioinformatics, Volume 36, Issue 11, June 2020, Pages 3307–3313, DOI: https://doi.org/10.1093/bioinformatics/btaa180

@article{10.1093/bioinformatics/btaa180,
    author = {Mallawaarachchi, Vijini and Wickramarachchi, Anuradha and Lin, Yu},
    title = "{GraphBin: refined binning of metagenomic contigs using assembly graphs}",
    journal = {Bioinformatics},
    volume = {36},
    number = {11},
    pages = {3307-3313},
    year = {2020},
    month = {03},
    abstract = "{The field of metagenomics has provided valuable insights into the structure, diversity and ecology within microbial communities. One key step in metagenomics analysis is to assemble reads into longer contigs which are then binned into groups of contigs that belong to different species present in the metagenomic sample. Binning of contigs plays an important role in metagenomics and most available binning algorithms bin contigs using genomic features such as oligonucleotide/k-mer composition and contig coverage. As metagenomic contigs are derived from the assembly process, they are output from the underlying assembly graph which contains valuable connectivity information between contigs that can be used for binning. We propose GraphBin, a new binning method that makes use of the assembly graph and applies a label propagation algorithm to refine the binning result of existing tools. We show that GraphBin can make use of the assembly graphs constructed from both the de Bruijn graph and the overlap-layout-consensus approach. Moreover, we demonstrate improved experimental results from GraphBin in terms of identifying mis-binned contigs and binning of contigs discarded by existing binning tools. To the best of our knowledge, this is the first time that the information from the assembly graph has been used in a tool for the binning of metagenomic contigs. The source code of GraphBin is available at https://github.com/Vini2/GraphBin. Contact: vijini.mallawaarachchi@anu.edu.au or yu.lin@anu.edu.au. Supplementary data are available at Bioinformatics online.}",
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btaa180},
    url = {https://doi.org/10.1093/bioinformatics/btaa180},
    eprint = {https://academic.oup.com/bioinformatics/article-pdf/36/11/3307/33329097/btaa180.pdf},
}

Funding

GraphBin is funded by an Essential Open Source Software for Science Grant from the Chan Zuckerberg Initiative.
