Metagenomic binning with semi-supervised siamese neural network
S³N²Bin (Semi-supervised Siamese Neural Network for metagenomic binning)
NOTE: This tool is still in development. You are welcome to try it out, and feedback is appreciated, but expect some bugs and rapid changes until it stabilizes. Please use GitHub issues for bug reports and the Discussions for more open-ended discussions and questions.
Command-line tool for metagenomic binning with semi-supervised deep learning using information from reference genomes.
Install
S3N2Bin runs on Python 3.6-3.8.
Install from source
You can download the source code from GitHub and install it.
Install the dependencies using conda: Bedtools, Hmmer, Fraggenescan and cmake.
conda install -c bioconda bedtools hmmer fraggenescan
conda install -c anaconda cmake=3.19.6
python setup.py install
Examples
Easy single/co-assembly binning mode
You will need the following inputs:
- A contig file (contig.fna in the example below)
- BAM files from mapping
You can get the results with one line of code. The single_easy_bin command can be used in single-sample and co-assembly binning modes (contigs are annotated using mmseqs with the GTDB reference genomes). single_easy_bin includes the following steps: predict_taxonomy, generate_data_single and bin.
S3N2Bin single_easy_bin -i contig.fna -b *.bam -o output
In this example, S³N²Bin will download GTDB to $HOME/.cache/S3N2Bin/mmseqs2-GTDB/GTDB. You can change this default using the -r argument.
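When the command is launched from a script rather than an interactive shell, the *.bam wildcard is not expanded automatically. The following is a minimal Python sketch of assembling the same command line. The helper name build_single_easy_bin_command is hypothetical; only the -i, -b, -o and -r options come from the text above, and it is assumed here that -r takes the path of a GTDB directory.

```python
import glob
import shlex


def build_single_easy_bin_command(contigs, bam_pattern, output, reference_dir=None):
    """Return the S3N2Bin single_easy_bin call as an argument list.

    Hypothetical helper: it only mirrors the command line shown above.
    """
    # The shell normally expands *.bam; outside a shell we do it ourselves.
    bam_files = sorted(glob.glob(bam_pattern))
    if not bam_files:
        raise FileNotFoundError(f"no BAM files match {bam_pattern!r}")
    command = ["S3N2Bin", "single_easy_bin", "-i", contigs, "-b"]
    command += bam_files
    command += ["-o", output]
    if reference_dir is not None:
        # Assumption: -r takes the path of an existing GTDB directory.
        command += ["-r", reference_dir]
    return command


if __name__ == "__main__":
    try:
        cmd = build_single_easy_bin_command("contig.fna", "*.bam", "output")
    except FileNotFoundError as error:
        print(error)
    else:
        # Print the command for inspection; pass `cmd` to subprocess.run to execute it.
        print(" ".join(shlex.quote(part) for part in cmd))
```

Returning a list instead of a single string avoids quoting problems when file names contain spaces.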
Easy multi-sample binning mode
The multi_easy_bin command can be used in multi-sample binning mode (contigs are annotated using mmseqs with the GTDB reference genomes). multi_easy_bin includes the following steps: predict_taxonomy, generate_data_multi and bin.
You will need the following inputs:
- A combined contig file
- BAM files from mapping
Every contig name must have the format <sample_name>:<contig_name>, where : is the default separator (it can be changed with the --separator argument). Note: Make sure the sample names are unique and that the separator does not make the split between sample name and contig name ambiguous. For example:
>S1:Contig_1
AGATAATAAAGATAATAATA
>S1:Contig_2
CGAATTTATCTCAAGAACAAGAAAA
>S1:Contig_3
AAAAAGAGAAAATTCAGAATTAGCCAATAAAATA
>S2:Contig_1
AATGATATAATACTTAATA
>S2:Contig_2
AAAATATTAAAGAAATAATGAAAGAAA
>S3:Contig_1
ATAAAGACGATAAAATAATAAAAGCCAAATCCGACAAAGAAAGAACGG
>S3:Contig_2
AATATTTTAGAGAAAGACATAAACAATAAGAAAAGTATT
>S3:Contig_3
CAAATACGAATGATTCTTTATTAGATTATCTTAATAAGAATATC
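One way to produce and sanity-check such a combined file is sketched below in Python. This is not part of S3N2Bin: the function names prefix_contigs and count_contigs_per_sample are hypothetical, and the sketch assumes that each sample's contigs start out in a separate FASTA file with unprefixed names.

```python
from collections import Counter


def prefix_contigs(sample_name, fasta_lines, separator=":"):
    """Yield FASTA lines with each header rewritten to <sample_name><separator><contig_name>."""
    if separator in sample_name:
        # A separator inside the sample name would make the later split ambiguous.
        raise ValueError(f"sample name {sample_name!r} contains the separator")
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield f">{sample_name}{separator}{line[1:]}"
        else:
            yield line


def count_contigs_per_sample(fasta_lines, separator=":"):
    """Return {sample: number of contigs}; raise ValueError on a malformed or repeated name."""
    counts = Counter()
    seen = set()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue
        fields = line[1:].split()
        name = fields[0] if fields else ""
        # Split on the first separator only: everything before it is the sample name.
        sample, found, contig = name.partition(separator)
        if not (found and sample and contig):
            raise ValueError(f"header {line!r} is not <sample_name>{separator}<contig_name>")
        if name in seen:
            raise ValueError(f"duplicate contig name {name!r}")
        seen.add(name)
        counts[sample] += 1
    return dict(counts)


if __name__ == "__main__":
    s1 = [">Contig_1", "AGATAATAAAGATAATAATA", ">Contig_2", "CGAATTTATCTCAAGAACAAGAAAA"]
    s2 = [">Contig_1", "AATGATATAATACTTAATA"]
    combined = list(prefix_contigs("S1", s1)) + list(prefix_contigs("S2", s2))
    print("\n".join(combined))
    print(count_contigs_per_sample(combined))
```

The check cannot tell whether two different samples were given the same name by mistake, so sample names still need to be chosen with care.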
You can get the results with one line of code.
S3N2Bin multi_easy_bin -i contig_whole.fna -b *.bam -o output
Advanced-bin mode
You can also run the individual steps yourself. This makes it possible to distribute the work across a compute cluster, which speeds up binning (especially in multi-sample binning mode).
For more details on usage, including information on how to run individual steps separately, read the docs.
Output
The output folder will contain:
- Datasets used for training and clustering.
- Saved semi-supervised deep learning model.
- Output bins.
- Some intermediate files.
For every sample, reconstructed bins are in the output_recluster_bins directory.
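To get a quick overview of the reconstructed bins, a short Python sketch such as the following may help. It is not part of S3N2Bin: the function name summarize_bins is hypothetical, and the sketch assumes that every regular file in the bin directory is one bin in FASTA format and that the directory sits directly under the output folder.

```python
import os


def summarize_bins(bin_dir):
    """Return {file name: (number of contigs, total bases)} for each file in bin_dir.

    Assumption: each regular file in the directory is one bin in FASTA format.
    """
    summary = {}
    for entry in sorted(os.listdir(bin_dir)):
        path = os.path.join(bin_dir, entry)
        if not os.path.isfile(path):
            continue
        n_contigs = 0
        total_bases = 0
        with open(path) as handle:
            for line in handle:
                if line.startswith(">"):
                    n_contigs += 1
                else:
                    total_bases += len(line.strip())
        summary[entry] = (n_contigs, total_bases)
    return summary


if __name__ == "__main__":
    # Hypothetical location: the layout under the output folder is an assumption.
    bin_dir = os.path.join("output", "output_recluster_bins")
    if os.path.isdir(bin_dir):
        for name, (n_contigs, total_bases) in summarize_bins(bin_dir).items():
            print(f"{name}\t{n_contigs} contigs\t{total_bases} bp")
    else:
        print(f"{bin_dir} not found")
```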
For more details about the output, read the docs.