
Binette: accurate binning refinement tool to construct high-quality MAGs.


Binette

Binette is a fast and accurate binning refinement tool designed to construct high-quality MAGs (metagenome-assembled genomes) from the output of multiple binning tools.

How It Works

From the input bin sets, Binette constructs new hybrid bins. A bin is a set of contigs. When at least two bins overlap (i.e., share at least one contig), Binette applies basic set operations to generate new bins:

  • Intersection Bin: Contains contigs that are shared by overlapping bins.
  • Difference Bin: Includes contigs that are unique to one bin and not found in others.
  • Union Bin: Encompasses all contigs from the overlapping bins.
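The three operations can be illustrated with Python sets on a pair of overlapping bins. This is a minimal sketch under simple assumptions, not Binette's implementation: it only handles two bins at a time, the result labels (`intersection`, `difference_1_minus_2`, ...) are hypothetical names, and empty results are dropped. The example bins reuse `binA` and `bin.0` from the Usage examples.

```python
from typing import Dict, Set


def hybrid_bins(bin1: Set[str], bin2: Set[str]) -> Dict[str, Set[str]]:
    """Derive intersection, difference and union bins from two bins (illustrative sketch)."""
    if not bin1 & bin2:
        return {}  # no shared contig: the bins do not overlap, so nothing is derived
    candidates = {
        "intersection": bin1 & bin2,
        "difference_1_minus_2": bin1 - bin2,
        "difference_2_minus_1": bin2 - bin1,
        "union": bin1 | bin2,
    }
    # Drop empty results, e.g. when one bin is fully contained in the other.
    return {label: contigs for label, contigs in candidates.items() if contigs}


binA = {"contig_1", "contig_8"}                # binA from bin_set1
bin_0 = {"contig_1", "contig_8", "contig_10"}  # bin.0 from bin_set2
for label, contigs in hybrid_bins(binA, bin_0).items():
    print(label, sorted(contigs))
```

Because `binA` is entirely contained in `bin.0`, the difference `binA - bin.0` is empty and is not reported; only the intersection, the difference `bin.0 - binA` (`contig_10`) and the union remain.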

Binette then evaluates the quality of the bins with CheckM2 and uses these estimates to select the best bins.

Why Binette?

Binette is inspired by the metaWRAP bin-refinement tool but addresses several of its limitations. Key improvements include:

  • Enhanced Speed
    Binette significantly accelerates the refinement process by running CheckM2's initial steps (e.g., Prodigal and Diamond) only once for all contigs. These intermediate results are then reused for bin quality assessment, eliminating redundant computations.

  • No Limit on Input Bin Sets
    Unlike metaWRAP, Binette supports any number of input bin sets, allowing seamless processing of multiple binning outputs.
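The speed-up described under Enhanced Speed is essentially caching: expensive per-contig work is done once, and every bin, whether original or hybrid, is assessed from the cached per-contig results. The sketch below illustrates only that idea. `annotate_contig` is a hypothetical stand-in for steps such as Prodigal and Diamond (it is not a Binette or CheckM2 function), and the hybrid bin names are made up for the example.

```python
from typing import Dict, Iterable, List, Set

annotation_calls = 0  # counts how often the expensive per-contig step runs


def annotate_contig(contig_id: str) -> List[str]:
    """Hypothetical stand-in for a costly per-contig step such as gene calling."""
    global annotation_calls
    annotation_calls += 1
    return [f"{contig_id}_gene1", f"{contig_id}_gene2"]


def annotate_once(bins: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Annotate every distinct contig exactly once, across all bins."""
    all_contigs = set().union(*bins.values())
    return {contig: annotate_contig(contig) for contig in sorted(all_contigs)}


def genes_for_bin(contigs: Iterable[str], annotations: Dict[str, List[str]]) -> List[str]:
    """Assemble a bin's genes from the cached per-contig results (no recomputation)."""
    return [gene for contig in sorted(contigs) for gene in annotations[contig]]


# Original and hybrid bins overlap heavily, yet each contig is annotated only once.
bins = {
    "binA": {"contig_1", "contig_8"},
    "bin.0": {"contig_1", "contig_8", "contig_10"},
    "binA|bin.0": {"contig_1", "contig_8", "contig_10"},  # hypothetical union bin name
    "bin.0-binA": {"contig_10"},                          # hypothetical difference bin name
}
annotations = annotate_once(bins)
gene_sets = {name: genes_for_bin(contigs, annotations) for name, contigs in bins.items()}
print(annotation_calls, "annotation calls for", len(bins), "bins")
```

Annotating per bin instead would repeat the work for `contig_1` and `contig_8` in three bins each; with many hybrid bins, that redundancy is what the per-contig cache avoids.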

Note: For a detailed guide and tutorial, see the Binette documentation.

Installation

With Bioconda

Binette can be installed with conda:

conda create -c bioconda -c defaults -c conda-forge -n binette binette
conda activate binette

Check that Binette runs:

binette -h

From source, in a conda environment

Clone this repository:

git clone https://github.com/genotoul-bioinfo/Binette
cd Binette

Then create a Conda environment using the binette.yaml file:

conda env create -n binette -f binette.yaml
conda activate binette 

Finally, install Binette with pip:

pip install .

Check that Binette runs:

binette -h

Downloading the CheckM2 database

Before using Binette, download the CheckM2 database:

checkm2 database --download --path <checkm2/database/>

Replace <checkm2/database/> with the directory where you want to store the CheckM2 database.

Usage

Input Formats

Binette supports two input formats for bin sets:

  1. Contig2bin Tables: You can provide bin sets as contig2bin tables, which map each contig to its bin. For this format, use the --contig2bin_tables argument.

For example, consider the following two contig2bin_tables:

  • bin_set1.tsv:

    contig_1   binA
    contig_8   binA
    contig_15  binB
    contig_9   binC
    
  • bin_set2.tsv:

    contig_1   bin.0
    contig_8   bin.0
    contig_15  bin.1
    contig_9   bin.2
    contig_10  bin.0
    

    The binette command to process this input would be:

    binette --contig2bin_tables bin_set1.tsv bin_set2.tsv --contigs assembly.fasta
    
  2. Bin Directories: Alternatively, you can use bin directories, where each bin is represented by a separate FASTA file. For this format, you need to provide the --bin_dirs argument. Here's an example of two bin directories:

    bin_set1/
    ├── binA.fa: contains sequences of contig_1, contig_8
    ├── binB.fa: contains sequences of contig_15
    └── binC.fa: contains sequences of contig_9
    
    bin_set2/
    ├── binA.fa: contains sequences of contig_1, contig_8, contig_10
    ├── binB.fa: contains sequences of contig_15
    └── binC.fa: contains sequences of contig_9
    

    The binette command to process this input would be:

    binette --bin_dirs bin_set1 bin_set2 --contigs assembly.fasta
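Both input formats describe the same thing: for each bin set, which contigs belong to which bin. The sketch below reads either format into a `{bin_name: set_of_contigs}` mapping. It is not Binette's parser and rests on assumptions: contig2bin tables are tab-separated with the contig in the first column and the bin in the second (as in the examples above), bin FASTA files end in `.fa`, `.fasta` or `.fna`, the bin name is the file name without its extension, and a contig ID is the first word of a FASTA header.

```python
import csv
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, Set


def parse_contig2bin(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Group contigs by bin from tab-separated 'contig<TAB>bin' lines."""
    bins: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        contig, bin_name = row[0].strip(), row[1].strip()
        bins[bin_name].add(contig)
    return dict(bins)


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Return sequence IDs, taken as the first word of each '>' header line."""
    return {line[1:].split()[0] for line in lines if line.startswith(">") and line[1:].strip()}


def read_bin_dir(directory: str, suffixes=(".fa", ".fasta", ".fna")) -> Dict[str, Set[str]]:
    """One bin per FASTA file; the bin name is the file name without its extension."""
    bins: Dict[str, Set[str]] = {}
    for path in sorted(Path(directory).iterdir()):
        if path.suffix in suffixes:
            with open(path) as handle:
                bins[path.stem] = fasta_ids(handle)
    return bins


# Usage (paths from the examples above):
# with open("bin_set1.tsv", newline="") as handle:
#     print(parse_contig2bin(handle))
# print(read_bin_dir("bin_set1"))
```

With these assumptions, `bin_set1.tsv` and the `bin_set1/` directory yield the same mapping, which is why either format can be used.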
    

In both formats, the --contigs argument should point to a FASTA file containing all the contigs found in the bins. Typically, this is the assembly FASTA file used to generate the bins. In these examples, assembly.fasta should contain at least the five contigs mentioned in the contig2bin tables or in the bin FASTA files: contig_1, contig_8, contig_15, contig_9, and contig_10.
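Before a long run, it can help to confirm that every binned contig is actually present in the assembly. The sketch below is a hypothetical pre-flight check, not a Binette feature: it takes bin sets already loaded as `{bin_name: set_of_contigs}` mappings and the set of contig IDs from the assembly FASTA, and reports binned contigs missing from the assembly. The example reuses the two bin sets above with a deliberately incomplete assembly.

```python
from typing import Dict, Iterable, Set


def missing_contigs(bin_sets: Iterable[Dict[str, Set[str]]], assembly_ids: Set[str]) -> Set[str]:
    """Return contigs that appear in some bin but not in the assembly FASTA."""
    binned: Set[str] = set()
    for bin_set in bin_sets:
        for contigs in bin_set.values():
            binned |= contigs
    return binned - assembly_ids


bin_set1 = {"binA": {"contig_1", "contig_8"}, "binB": {"contig_15"}, "binC": {"contig_9"}}
bin_set2 = {"bin.0": {"contig_1", "contig_8", "contig_10"}, "bin.1": {"contig_15"}, "bin.2": {"contig_9"}}
assembly = {"contig_1", "contig_8", "contig_9", "contig_15"}  # contig_10 deliberately absent

print(missing_contigs([bin_set1, bin_set2], assembly))  # {'contig_10'}
```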

Outputs

By default, Binette writes its results to the results directory. Use the --outdir option to specify a different directory.

In this directory you will find:

  • final_bins_quality_reports.tsv: This is a TSV (tab-separated values) file containing quality information about the final selected bins.
  • final_bins/: This directory stores all the selected bins in FASTA format. Writing these files can be skipped with --no-write-fasta-bins.
  • final_contig_to_bin.tsv: A headerless TSV file mapping each contig to its assigned bin. It describes the final Binette bins far more compactly than the FASTA output.
  • input_bins_quality_reports/: A directory storing quality reports for the input bin sets, with files following the same structure as final_bins_quality_reports.tsv.
  • temporary_files/: This directory contains intermediate files. With the --resume option, Binette reuses the files in this directory to avoid recomputing time-consuming steps.
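A common next step is to keep only the bins that pass quality cut-offs. The sketch below filters final_bins_quality_reports.tsv with Python's `csv.DictReader`. It assumes the report has a header row using the column names listed in the table that follows (`name`, `completeness`, `contamination`); the default cut-offs of 90% completeness and 5% contamination are illustrative choices echoing the MIMAG high-quality thresholds (which also require rRNA and tRNA genes), not Binette defaults.

```python
import csv
from typing import Iterable, List


def select_bins(report_lines: Iterable[str],
                min_completeness: float = 90.0,
                max_contamination: float = 5.0) -> List[str]:
    """Return names of bins passing completeness/contamination cut-offs."""
    reader = csv.DictReader(report_lines, delimiter="\t")
    return [
        row["name"]
        for row in reader
        if float(row["completeness"]) >= min_completeness
        and float(row["contamination"]) <= max_contamination
    ]


# Usage against a real run (path assumes the default results directory):
# with open("results/final_bins_quality_reports.tsv", newline="") as handle:
#     print(select_bins(handle))
```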

The final_bins_quality_reports.tsv file contains the following columns:

| Column Name | Description |
|---|---|
| name | The unique name of the bin. |
| origin | Indicates the source of the bin: either an original bin set (e.g., B) or binette for intermediate bins. |
| is_original | Boolean flag indicating whether the bin is an original bin (True) or an intermediate bin (False). |
| original_name | The name of the original bin from which this bin was derived. |
| completeness | The completeness of the bin, estimated by CheckM2. |
| contamination | The contamination of the bin, estimated by CheckM2. |
| checkm2_model | The CheckM2 model used for quality prediction: Gradient Boost (General Model) or Neural Network (Specific Model). |
| score | Computed score: completeness - contamination * weight. The contamination weight can be customized with the --contamination_weight option. |
| size | Total size of the bin in nucleotides. |
| N50 | The N50 of the bin: the contig length L such that contigs of length L or longer contain at least 50% of the bin's nucleotides. |
| coding_density | The percentage of the bin that codes for proteins (total gene length / total bin length × 100). Only computed when genes are freshly identified; empty when using the --proteins or --resume options. |
| contig_count | Number of contigs in the bin. |
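The score, N50 and coding_density definitions above translate into a few lines of Python. This is a sketch that follows those textual definitions, not Binette's own code: the contamination weight is passed explicitly because its default value is not stated here, and coding density is computed by simply summing gene lengths, as the table describes.

```python
from typing import Sequence


def bin_score(completeness: float, contamination: float, contamination_weight: float) -> float:
    """score = completeness - contamination * weight."""
    return completeness - contamination * contamination_weight


def n50(contig_lengths: Sequence[int]) -> int:
    """Largest length L such that contigs >= L hold at least half of all nucleotides."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty bin


def coding_density(gene_lengths: Sequence[int], bin_size: int) -> float:
    """Percentage of the bin covered by genes: total gene length / bin size * 100."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    return 100.0 * sum(gene_lengths) / bin_size


print(bin_score(95.0, 2.0, contamination_weight=2.0))  # 91.0
print(n50([100, 200, 300, 400]))  # 300
print(coding_density([900, 600], 2000))  # 75.0
```

In the N50 example the total is 1,000 nt; the two longest contigs (400 + 300 = 700 nt) are the first to reach half of it, so N50 is 300.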

Help, feature requests and bug reporting

To report bugs, request new features, or seek help and support, please open an issue.

Licence

This tool is released as open source software under the terms of the GNU General Public Licence.

Citation

Binette is a scientific software tool with a published paper in the Journal of Open Source Software. If you use Binette in academic research, please cite:

Binette: a fast and accurate bin refinement tool to construct high-quality Metagenome Assembled Genomes.
Mainguy et al. (2024).
Journal of Open Source Software, 9(102), 6782.
doi: 10.21105/joss.06782

Binette extensively uses CheckM2. If your work relies on Binette, consider citing:

CheckM2: a rapid, scalable, and accurate tool for assessing microbial genome quality using machine learning.
Chklovski et al. (2023).
Nature Methods, 20(8), 1203-1212.
doi: 10.1038/s41592-023-01940-w



Download files


Source Distribution

binette-1.2.1.tar.gz (123.1 kB)

Uploaded Source

Built Distribution


binette-1.2.1-py3-none-any.whl (46.3 kB)

Uploaded Python 3

File details

Details for the file binette-1.2.1.tar.gz.

File metadata

  • Download URL: binette-1.2.1.tar.gz
  • Upload date:
  • Size: 123.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for binette-1.2.1.tar.gz
Algorithm Hash digest
SHA256 26acda9889a95497581503eee357c976477a0dcdc2174679fdd94bf68c8917b4
MD5 86a16e9565a6618c56e1bf1c91a016fc
BLAKE2b-256 f2031261165ae5abc5bcb8779b0059d6dac4de4868043d71a9dd47bc15f919bd


Provenance

The following attestation bundles were made for binette-1.2.1.tar.gz:

Publisher: release.yml on genotoul-bioinfo/Binette

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file binette-1.2.1-py3-none-any.whl.

File metadata

  • Download URL: binette-1.2.1-py3-none-any.whl
  • Upload date:
  • Size: 46.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for binette-1.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 a0446fc4fe34fcbb4dabeed1ba744e766381d0eef5a2fe95b44a54285775dbfd
MD5 b1dc05d098125a7b727266f1c0643985
BLAKE2b-256 a950661c6dbba50a218eed9b30dbda8d194885bd10b31d8628a066138fc4c723


Provenance

The following attestation bundles were made for binette-1.2.1-py3-none-any.whl:

Publisher: release.yml on genotoul-bioinfo/Binette

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
