GraphBin: Refined binning of metagenomic contigs using assembly graphs.

GraphBin is a metagenomic contig bin refinement tool for NGS data that uses the contig connectivity information in the assembly graph. Starting from the binning result of an existing binning tool, it applies a label propagation algorithm to correct mis-binned contigs and to predict labels for contigs that the binning tool discarded because of their short length.
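
To make the idea of label propagation concrete, here is a minimal sketch in Python. It is not GraphBin's implementation: the function name, the majority-vote rule and the toy contig names are assumptions chosen for illustration, and it only covers spreading labels to unbinned contigs, not correcting mis-binned ones.

```python
from collections import Counter


def propagate_labels(edges, initial_labels, max_iters=100):
    """Spread bin labels from binned contigs to unbinned neighbours.

    edges: iterable of (contig_a, contig_b) pairs taken from an assembly graph.
    initial_labels: dict mapping contig -> bin label (from an existing binner).
    Returns a new dict that also holds labels inferred for unbinned contigs.
    Illustrative only; GraphBin's actual algorithm may differ.
    """
    # Build an undirected adjacency map.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    labels = dict(initial_labels)
    for _ in range(max_iters):
        updates = {}
        for contig in sorted(neighbours):
            if contig in labels:
                continue  # labels already assigned stay fixed in this sketch
            votes = Counter(labels[n] for n in neighbours[contig] if n in labels)
            if votes:
                # Majority vote among labelled neighbours; ties are broken
                # by label order so the result is deterministic.
                updates[contig] = min(votes, key=lambda lab: (-votes[lab], lab))
        if not updates:
            break  # nothing left to label
        labels.update(updates)
    return labels


# Toy graph: c2, c5 and c6 were too short for the initial binner.
toy_edges = [("c1", "c2"), ("c2", "c3"), ("c4", "c5"), ("c5", "c6")]
toy_bins = {"c1": "A", "c3": "A", "c4": "B"}
refined = propagate_labels(toy_edges, toy_bins)
```

In the toy graph, c2 sits between two contigs from bin A and inherits that label, while c5 and then c6 pick up bin B over two rounds of propagation.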

For detailed instructions on installation, usage and visualisation, please refer to the documentation hosted at Read the Docs.

Dependencies

GraphBin requires Python 3. Additional dependencies are required to run GraphBin and its related support scripts.

Installing GraphBin

Using Conda

You can install GraphBin from the bioconda channel. To get conda, download Anaconda or Miniconda, both of which include it.

# add channels
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

# create conda environment
conda create -n graphbin

# activate conda environment
conda activate graphbin

# install graphbin
conda install -c bioconda graphbin

# check graphbin installation
graphbin -h

Using pip

You can install GraphBin using pip from the PyPI distribution.

pip install graphbin
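
Whichever installation route you use, the result can also be checked from Python. The sketch below assumes only that the distribution is named graphbin (as in the pip command above) and that the command-line entry point is called graphbin; the helper names are hypothetical.

```python
import shutil
from importlib import metadata


def installed_version(dist_name="graphbin"):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def cli_on_path(command="graphbin"):
    """Return True if the command can be found on the current PATH."""
    return shutil.which(command) is not None
```

Calling installed_version() in the environment where you ran the install should return a version string, and cli_on_path() should return True if the graphbin command is reachable from your shell.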

For development purposes, please clone the repository and install via flit.

# clone repository to your local machine
git clone https://github.com/metagentools/GraphBin.git

# go to repo directory
cd GraphBin

# install flit
pip install flit

# install graphbin via flit
flit install -s --python `which python`

Example Usage

# SPAdes version
graphbin --assembler spades --graph /path/to/graph_file.gfa --contigs /path/to/contigs.fasta --paths /path/to/paths_file.paths --binned /path/to/binning_result.csv --output /path/to/output_folder

# SGA version
graphbin --assembler sga --graph /path/to/graph_file.asqg --contigs /path/to/contigs.fa --binned /path/to/binning_result.csv --output /path/to/output_folder

# MEGAHIT version
graphbin --assembler megahit --graph /path/to/graph_file.gfa --contigs /path/to/contigs.fa --binned /path/to/binning_result.csv --output /path/to/output_folder
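
When GraphBin is run from a pipeline script, the commands above can be assembled programmatically. The following sketch uses only the flags shown in the examples; the helper name is hypothetical, and treating --paths as mandatory for SPAdes is an assumption based on the fact that only the SPAdes example passes it.

```python
# Assemblers that appear in the example commands of this README.
DOCUMENTED_ASSEMBLERS = {"spades", "sga", "megahit"}


def build_graphbin_command(assembler, graph, contigs, binned, output, paths=None):
    """Return the argument list for one graphbin invocation."""
    if assembler not in DOCUMENTED_ASSEMBLERS:
        raise ValueError(f"assembler not covered by the examples: {assembler}")
    if assembler == "spades" and paths is None:
        # Assumption: the SPAdes example always supplies a paths file.
        raise ValueError("the SPAdes example also passes --paths")

    cmd = ["graphbin", "--assembler", assembler,
           "--graph", graph, "--contigs", contigs]
    if paths is not None:
        cmd += ["--paths", paths]
    cmd += ["--binned", binned, "--output", output]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True), which avoids shell quoting problems with paths that contain spaces.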

Visualization of the Assembly Graph of ESC+metaSPAdes Test Dataset

Figure: Initial assembly graph

Figure: TAXAassign labelling

Figure: Original MaxBin labelling with 2 mis-binned contigs

Figure: Refined labels

Figure: Final labelling of GraphBin

Citation

If you use GraphBin in your work, please cite it as follows:

Vijini Mallawaarachchi, Anuradha Wickramarachchi, Yu Lin. GraphBin: Refined binning of metagenomic contigs using assembly graphs. Bioinformatics, Volume 36, Issue 11, June 2020, Pages 3307–3313, DOI: https://doi.org/10.1093/bioinformatics/btaa180

@article{10.1093/bioinformatics/btaa180,
    author = {Mallawaarachchi, Vijini and Wickramarachchi, Anuradha and Lin, Yu},
    title = "{GraphBin: refined binning of metagenomic contigs using assembly graphs}",
    journal = {Bioinformatics},
    volume = {36},
    number = {11},
    pages = {3307-3313},
    year = {2020},
    month = {03},
    abstract = "{The field of metagenomics has provided valuable insights into the structure, diversity and ecology within microbial communities. One key step in metagenomics analysis is to assemble reads into longer contigs which are then binned into groups of contigs that belong to different species present in the metagenomic sample. Binning of contigs plays an important role in metagenomics and most available binning algorithms bin contigs using genomic features such as oligonucleotide/k-mer composition and contig coverage. As metagenomic contigs are derived from the assembly process, they are output from the underlying assembly graph which contains valuable connectivity information between contigs that can be used for binning. We propose GraphBin, a new binning method that makes use of the assembly graph and applies a label propagation algorithm to refine the binning result of existing tools. We show that GraphBin can make use of the assembly graphs constructed from both the de Bruijn graph and the overlap-layout-consensus approach. Moreover, we demonstrate improved experimental results from GraphBin in terms of identifying mis-binned contigs and binning of contigs discarded by existing binning tools. To the best of our knowledge, this is the first time that the information from the assembly graph has been used in a tool for the binning of metagenomic contigs. The source code of GraphBin is available at https://github.com/Vini2/GraphBin. Contact: vijini.mallawaarachchi@anu.edu.au or yu.lin@anu.edu.au. Supplementary data are available at Bioinformatics online.}",
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btaa180},
    url = {https://doi.org/10.1093/bioinformatics/btaa180},
    eprint = {https://academic.oup.com/bioinformatics/article-pdf/36/11/3307/33329097/btaa180.pdf},
}

Funding

GraphBin is funded by an Essential Open Source Software for Science Grant from the Chan Zuckerberg Initiative.
