Metagenomic binning with semi-supervised siamese neural network

S³N²Bin (Semi-supervised Siamese Neural Network for metagenomic binning)

License: MIT

NOTE: This tool is still in development. You are welcome to try it out and feedback is appreciated, but expect some bugs and rapid changes until it stabilizes. Please use GitHub issues for bug reports and the Discussions for more open-ended discussions and questions.

Command-line tool for metagenomic binning with semi-supervised deep learning using information from reference genomes.

Install

S3N2Bin runs on Python 3.6-3.8.

Install from source

You can download the source code from GitHub and install it.

First install the dependencies using conda: Bedtools, Hmmer, Fraggenescan and cmake. Then run the setup script from the source directory.

conda install -c bioconda bedtools hmmer fraggenescan
conda install -c anaconda cmake=3.19.6
python setup.py install

Examples

Easy single/co-assembly binning mode

You will need the following inputs:

  1. A contig file (contig.fna in the example below)
  2. BAM files from mapping

You can get the results with one line of code. The single_easy_bin command can be used in single-sample and co-assembly binning modes (contigs are annotated using mmseqs with the GTDB reference genomes). single_easy_bin includes the following steps: predict_taxonomy, generate_data_single and bin.

S3N2Bin single_easy_bin -i contig.fna -b *.bam -o output

In this example, S³N²Bin will download GTDB to $HOME/.cache/S3N2Bin/mmseqs2-GTDB/GTDB. You can change this default using the -r argument.
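If you launch the tool from a Python pipeline rather than a shell, the shell will not expand *.bam for you, so the BAM list has to be passed explicitly. The following is a minimal sketch, not part of S3N2Bin: the function name single_easy_bin_command is hypothetical, and it uses only the -i, -b, -o and -r arguments mentioned above.

```python
import shlex


def single_easy_bin_command(contigs, bam_files, output, reference_db=None):
    """Build the argument list for `S3N2Bin single_easy_bin`.

    Hypothetical helper: it only assembles the command line shown in this
    README and does not call any S3N2Bin Python API.
    """
    bams = sorted(bam_files)  # stable order, like a shell glob
    if not bams:
        raise ValueError("at least one BAM file is required")
    cmd = ["S3N2Bin", "single_easy_bin", "-i", contigs, "-b"]
    cmd += bams
    cmd += ["-o", output]
    if reference_db is not None:
        # -r overrides the default GTDB location
        cmd += ["-r", reference_db]
    return cmd


def to_shell(cmd):
    """Render the argument list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(arg) for arg in cmd)
```

To run it, pass the list to subprocess.run(cmd, check=True), with the BAM list coming from glob.glob("*.bam"), for example.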

Easy multi-sample binning mode

The multi_easy_bin command can be used in multi-sample binning mode (contigs are annotated using mmseqs with the GTDB reference genomes). multi_easy_bin includes the following steps: predict_taxonomy, generate_data_multi and bin.

You will need the following inputs:

  1. A combined contig file

  2. BAM files from mapping

Every contig name must have the format <sample_name>:<contig_name>, where : is the default separator (it can be changed with the --separator argument). Note: Make sure that the sample names are unique and that splitting a name on the separator is unambiguous (for instance, avoid sample names that contain the separator). For example:

>S1:Contig_1
AGATAATAAAGATAATAATA
>S1:Contig_2
CGAATTTATCTCAAGAACAAGAAAA
>S1:Contig_3
AAAAAGAGAAAATTCAGAATTAGCCAATAAAATA
>S2:Contig_1
AATGATATAATACTTAATA
>S2:Contig_2
AAAATATTAAAGAAATAATGAAAGAAA
>S3:Contig_1
ATAAAGACGATAAAATAATAAAAGCCAAATCCGACAAAGAAAGAACGG
>S3:Contig_2
AATATTTTAGAGAAAGACATAAACAATAAGAAAAGTATT
>S3:Contig_3
CAAATACGAATGATTCTTTATTAGATTATCTTAATAAGAATATC
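If your contigs are stored in one FASTA file per sample, a combined file with this naming scheme can be produced with a short script. The following is a minimal Python sketch, not part of S3N2Bin: the helper names read_fasta and combine_contigs are hypothetical, and it assumes that the first word of each FASTA header is the contig name.

```python
def read_fasta(path):
    """Yield (name, sequence) pairs from a FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                # Assumption: the first word of the header is the contig name
                words = line[1:].split()
                name, chunks = (words[0] if words else ""), []
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def combine_contigs(sample_files, out_path, separator=":"):
    """Write one FASTA with headers <sample_name><separator><contig_name>.

    sample_files maps each sample name to its FASTA path; using a dict
    guarantees that sample names are unique. Returns the number of
    contigs written.
    """
    for sample in sample_files:
        if separator in sample:
            raise ValueError(
                "sample name %r contains the separator %r" % (sample, separator)
            )
    written = 0
    with open(out_path, "w") as out:
        for sample, path in sample_files.items():
            for contig, sequence in read_fasta(path):
                out.write(">%s%s%s\n%s\n" % (sample, separator, contig, sequence))
                written += 1
    return written
```

For example, combine_contigs({"S1": "S1.fna", "S2": "S2.fna"}, "contig_whole.fna") would write the combined file used in the command below; the per-sample file names here are placeholders.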

You can get the results with one line of code.

S3N2Bin multi_easy_bin -i contig_whole.fna -b *.bam -o output

Advanced-bin mode

You can also run the individual steps yourself, which makes it possible to distribute the work across a compute cluster and speed up binning (especially in multi-sample binning mode).

For more details on usage, including information on how to run individual steps separately, read the docs.

Output

The output folder will contain:

  1. Datasets used for training and clustering.

  2. Saved semi-supervised deep learning model.

  3. Output bins.

  4. Some intermediate files.

For every sample, the reconstructed bins are in the output_recluster_bins directory.

For more details about the output, read the docs.


Download files

Files for S3N2Bin, version 0.1.1: S3N2Bin-0.1.1.tar.gz (2.9 MB), source distribution.
