GraphBin: Refined binning of metagenomic contigs using assembly graphs.


GraphBin is a bin refinement tool for metagenomic contigs assembled from NGS data. It uses the contig connectivity information in the assembly graph, takes the binning result of an existing binning tool, and applies a label propagation algorithm to correct mis-binned contigs and to predict labels for contigs that were discarded for being too short.
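To illustrate the general idea of label propagation on a graph, here is a minimal toy sketch. It is not GraphBin's actual algorithm: the function name `propagate_labels`, the majority-vote rule, and the contig and bin names are all assumptions made for this example. Unlabelled nodes repeatedly adopt the most common label among their already-labelled neighbours until nothing changes.

```python
from collections import Counter


def propagate_labels(edges, initial_labels, max_iters=100):
    """Toy label propagation (illustrative only, not GraphBin's method).

    edges: iterable of (node, node) pairs describing an undirected graph.
    initial_labels: dict mapping some nodes to a bin label (strings).
    Returns a dict with labels for every node reachable from a labelled node.
    """
    # Build an undirected adjacency map.
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    labels = dict(initial_labels)
    for _ in range(max_iters):
        updates = {}
        for node in sorted(neighbours):
            if node in labels:
                continue  # already labelled nodes are left unchanged
            votes = Counter(
                labels[n] for n in neighbours[node] if n in labels
            )
            if votes:
                best = max(votes.values())
                # Break ties deterministically by picking the smallest label.
                updates[node] = min(
                    label for label, count in votes.items() if count == best
                )
        if not updates:
            break  # converged: no unlabelled node gained a label
        labels.update(updates)
    return labels


# Hypothetical contigs: c2 and c3 were left unbinned by the initial tool.
edges = [("c1", "c2"), ("c2", "c3"), ("c4", "c5")]
initial = {"c1": "bin_1", "c4": "bin_2"}
refined = propagate_labels(edges, initial)
```

In this toy graph, `c2` and `c3` inherit `bin_1` through their connection to `c1`, and `c5` inherits `bin_2` from `c4`.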

For detailed instructions on installation, usage and visualisation, please refer to the documentation hosted at Read the Docs.

Dependencies

GraphBin requires Python 3 to run. Additional dependencies are needed to run GraphBin and its related support scripts; refer to the documentation for details.

Installing GraphBin

Using Conda

You can install GraphBin from the bioconda distribution. If you do not already have conda, download Anaconda or Miniconda, both of which include it.

# add channels
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

# create conda environment
conda create -n graphbin

# activate conda environment
conda activate graphbin

# install graphbin
conda install -c bioconda graphbin

# check graphbin installation
graphbin -h
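
If you would rather check from a Python script than run `graphbin -h` by hand, the following sketch looks for the executable on the current `PATH`. The helper name `find_graphbin` is an assumption for this example; it only confirms that a `graphbin` command is discoverable, not that it runs correctly.

```python
import shutil


def find_graphbin(executable="graphbin"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


location = find_graphbin()
if location is None:
    print("graphbin was not found on PATH; is the environment activated?")
else:
    print(f"graphbin found at {location}")
```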

Using pip

You can install GraphBin using pip from the PyPI distribution.

pip install graphbin

For development purposes, please clone the repository and install via flit.

# clone repository to your local machine
git clone https://github.com/metagentools/GraphBin.git

# go to repo directory
cd GraphBin

# install flit
pip install flit

# install graphbin via flit
flit install -s --python `which python`

Example Usage

# SPAdes version
graphbin --assembler spades --graph /path/to/graph_file.gfa --contigs /path/to/contigs.fasta --paths /path/to/paths_file.paths --binned /path/to/binning_result.csv --output /path/to/output_folder

# SGA version
graphbin --assembler sga --graph /path/to/graph_file.asqg --contigs /path/to/contigs.fa --binned /path/to/binning_result.csv --output /path/to/output_folder

# MEGAHIT version
graphbin --assembler megahit --graph /path/to/graph_file.gfa --contigs /path/to/contigs.fa --binned /path/to/binning_result.csv --output /path/to/output_folder
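
When GraphBin is called from a pipeline, it can help to assemble the command programmatically. The sketch below builds the argument list using only the flags shown in the examples above. The helper name `build_graphbin_command` is hypothetical, and the rule that `--paths` is required only for SPAdes is an assumption inferred from those examples rather than a documented constraint.

```python
import shlex

# Assemblers that appear in the usage examples above.
SUPPORTED_ASSEMBLERS = {"spades", "sga", "megahit"}


def build_graphbin_command(assembler, graph, contigs, binned, output, paths=None):
    """Return the graphbin command as a list suitable for subprocess.run."""
    if assembler not in SUPPORTED_ASSEMBLERS:
        raise ValueError(f"unsupported assembler: {assembler!r}")
    # Assumption: only the SPAdes example passes --paths, so require it there.
    if assembler == "spades" and paths is None:
        raise ValueError("the SPAdes example also passes a --paths file")

    cmd = [
        "graphbin",
        "--assembler", assembler,
        "--graph", graph,
        "--contigs", contigs,
    ]
    if paths is not None:
        cmd += ["--paths", paths]
    cmd += ["--binned", binned, "--output", output]
    return cmd


cmd = build_graphbin_command(
    "megahit",
    graph="/path/to/graph_file.gfa",
    contigs="/path/to/contigs.fa",
    binned="/path/to/binning_result.csv",
    output="/path/to/output_folder",
)
# Print a shell-safe version of the command for logging.
print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems when paths contain spaces.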

Visualization of the Assembly Graph of ESC+metaSPAdes Test Dataset

Initial Assembly Graph

[Figure: initial assembly graph]

TAXAassign Labelling

[Figure: TAXAassign labelling]

Original MaxBin Labelling with 2 Mis-binned Contigs

[Figure: MaxBin labelling]

Refined Labels

[Figure: refined labels]

Final Labelling of GraphBin

[Figure: final labelling]

Citation

If you use GraphBin in your work, please cite it as follows:

Vijini Mallawaarachchi, Anuradha Wickramarachchi, Yu Lin. GraphBin: Refined binning of metagenomic contigs using assembly graphs. Bioinformatics, Volume 36, Issue 11, June 2020, Pages 3307–3313, DOI: https://doi.org/10.1093/bioinformatics/btaa180

@article{10.1093/bioinformatics/btaa180,
    author = {Mallawaarachchi, Vijini and Wickramarachchi, Anuradha and Lin, Yu},
    title = "{GraphBin: refined binning of metagenomic contigs using assembly graphs}",
    journal = {Bioinformatics},
    volume = {36},
    number = {11},
    pages = {3307-3313},
    year = {2020},
    month = {03},
    abstract = "{The field of metagenomics has provided valuable insights into the structure, diversity and ecology within microbial communities. One key step in metagenomics analysis is to assemble reads into longer contigs which are then binned into groups of contigs that belong to different species present in the metagenomic sample. Binning of contigs plays an important role in metagenomics and most available binning algorithms bin contigs using genomic features such as oligonucleotide/k-mer composition and contig coverage. As metagenomic contigs are derived from the assembly process, they are output from the underlying assembly graph which contains valuable connectivity information between contigs that can be used for binning. We propose GraphBin, a new binning method that makes use of the assembly graph and applies a label propagation algorithm to refine the binning result of existing tools. We show that GraphBin can make use of the assembly graphs constructed from both the de Bruijn graph and the overlap-layout-consensus approach. Moreover, we demonstrate improved experimental results from GraphBin in terms of identifying mis-binned contigs and binning of contigs discarded by existing binning tools. To the best of our knowledge, this is the first time that the information from the assembly graph has been used in a tool for the binning of metagenomic contigs. The source code of GraphBin is available at https://github.com/Vini2/GraphBin. Contact: vijini.mallawaarachchi@anu.edu.au or yu.lin@anu.edu.au. Supplementary data are available at Bioinformatics online.}",
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btaa180},
    url = {https://doi.org/10.1093/bioinformatics/btaa180},
    eprint = {https://academic.oup.com/bioinformatics/article-pdf/36/11/3307/33329097/btaa180.pdf},
}

Funding

GraphBin is funded by an Essential Open Source Software for Science Grant from the Chan Zuckerberg Initiative.
