Metagenomic binning with semi-supervised siamese neural network
S³N²Bin (Semi-supervised Siamese Neural Network for metagenomic binning)
NOTE: This tool is still in development. You are welcome to try it out and feedback is appreciated, but expect some bugs and rapid changes until it stabilizes. Please use GitHub issues for bug reports and the Discussions for more open-ended discussions and questions.
Command-line tool for metagenomic binning with semi-supervised deep learning using information from reference genomes.
Install
S3N2Bin runs on Python 3.6-3.8.
Install from source
You can download the source code from GitHub and install it.
Install the dependencies using conda: BEDTools, HMMER, FragGeneScan and CMake.
conda install -c bioconda bedtools hmmer fraggenescan
conda install -c anaconda cmake=3.19.6
python setup.py install
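Before installing, it can help to confirm that the external tools are visible on your PATH. The following is a minimal sketch and not part of S3N2Bin; the executable names (bedtools, hmmsearch, FragGeneScan, cmake) are assumptions about what the conda packages above provide and may differ on your system.

```python
import shutil

# Assumed executable names for the conda packages listed above;
# adjust them if your installation uses different names.
ASSUMED_TOOLS = ["bedtools", "hmmsearch", "FragGeneScan", "cmake"]


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(ASSUMED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All assumed tools were found on PATH.")
```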
Examples
Easy single/co-assembly binning mode
You will need the following inputs:
- A contig file (contig.fna in the example below)
- BAM files from mapping
You can get the results with one line of code. The single_easy_bin command can be used in single-sample and co-assembly binning modes (contigs are annotated using mmseqs with the GTDB reference genomes). single_easy_bin includes the following steps: predict_taxonomy, generate_data_single and bin.
S3N2Bin single_easy_bin -i contig.fna -b *.bam -o output
In this example, S³N²Bin will download GTDB to $HOME/.cache/S3N2Bin/mmseqs2-GTDB/GTDB. You can change this default using the -r argument.
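If you launch the command from a script rather than a shell, nothing expands *.bam for you, so the BAM list has to be built explicitly. The following is a minimal sketch that uses only the -i, -b, -o and -r options shown above; the helper name and its parameters are hypothetical and not part of S3N2Bin.

```python
import glob
import os


def build_single_easy_bin_cmd(contigs, bam_dir, output, reference_db=None):
    """Build the argument list for `S3N2Bin single_easy_bin`.

    Hypothetical helper: only the -i, -b, -o and -r options from the
    text above are used.
    """
    # Expand *.bam ourselves, since no shell does it for an argument list.
    bam_files = sorted(glob.glob(os.path.join(bam_dir, "*.bam")))
    if not bam_files:
        raise FileNotFoundError(f"no BAM files found in {bam_dir!r}")

    cmd = ["S3N2Bin", "single_easy_bin", "-i", contigs, "-b", *bam_files, "-o", output]
    if reference_db is not None:
        # -r overrides the default GTDB location.
        cmd += ["-r", reference_db]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) once S3N2Bin is installed.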
Easy multi-sample binning mode
The multi_easy_bin command can be used in multi-sample binning mode (contigs are annotated using mmseqs with the GTDB reference genomes). multi_easy_bin includes the following steps: predict_taxonomy, generate_data_multi and bin.
You will need the following inputs:
- A combined contig file
- BAM files from mapping
For every contig, the name has the format <sample_name>:<contig_name>, where : is the default separator (it can be changed with the --separator argument). Note: Make sure the sample names are unique and that the separator does not make splitting the names ambiguous. For example:
>S1:Contig_1
AGATAATAAAGATAATAATA
>S1:Contig_2
CGAATTTATCTCAAGAACAAGAAAA
>S1:Contig_3
AAAAAGAGAAAATTCAGAATTAGCCAATAAAATA
>S2:Contig_1
AATGATATAATACTTAATA
>S2:Contig_2
AAAATATTAAAGAAATAATGAAAGAAA
>S3:Contig_1
ATAAAGACGATAAAATAATAAAAGCCAAATCCGACAAAGAAAGAACGG
>S3:Contig_2
AATATTTTAGAGAAAGACATAAACAATAAGAAAAGTATT
>S3:Contig_3
CAAATACGAATGATTCTTTATTAGATTATCTTAATAAGAATATC
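The naming convention above can be produced and checked with a short script. This is a sketch and not part of S3N2Bin: the function names are hypothetical, and it assumes one set of plain-text FASTA lines per sample and that names are split at the first occurrence of the separator.

```python
def prefix_contigs(sample_name, fasta_lines, separator=":"):
    """Yield FASTA lines with headers renamed to <sample_name><separator><contig_name>."""
    if separator in sample_name:
        raise ValueError(
            f"sample name {sample_name!r} contains the separator {separator!r}"
        )
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Keep only the contig ID (text up to the first whitespace).
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without a contig name")
            yield f">{sample_name}{separator}{fields[0]}"
        elif line:
            yield line


def combine_samples(samples, separator=":"):
    """Combine {sample_name: iterable of FASTA lines} into one list of lines.

    Using a dict keyed by sample name keeps the sample names unique.
    """
    combined = []
    for sample_name, fasta_lines in samples.items():
        combined.extend(prefix_contigs(sample_name, fasta_lines, separator))
    return combined


def split_contig_name(name, separator=":"):
    """Split a combined name into (sample_name, contig_name) at the first separator."""
    sample_name, found, contig_name = name.partition(separator)
    if not found or not sample_name or not contig_name:
        raise ValueError(
            f"{name!r} is not of the form <sample_name>{separator}<contig_name>"
        )
    return sample_name, contig_name
```

For instance, combine_samples({"S1": [">Contig_1", "AGAT"], "S2": [">Contig_1", "AATG"]}) gives the lines >S1:Contig_1, AGAT, >S2:Contig_1, AATG, and split_contig_name("S1:Contig_1") gives ("S1", "Contig_1").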
You can get the results with one line of code.
S3N2Bin multi_easy_bin -i contig_whole.fna -b *.bam -o output
Advanced-bin mode
You can run the individual steps yourself, which makes it possible to use compute clusters to speed up the binning process (especially in multi-sample binning mode).
For more details on usage, including information on how to run individual steps separately, read the docs.
Output
The output folder will contain:
- Datasets used for training and clustering.
- Saved semi-supervised deep learning model.
- Output bins.
- Some intermediate files.
For every sample, the reconstructed bins are in the output_recluster_bins directory.
For more details about the output, read the docs.