Taxonomic profiling of metagenomes from diverse environments with mOTUs3
mOTU profiler
The mOTU profiler is a computational tool that estimates relative taxonomic abundance of known and currently unknown microbial community members using metagenomic shotgun sequencing data.
Check the wiki for more information.
If you are using mOTUs, please cite:
Reference genome-independent taxonomic profiling of microbiomes with mOTUs3
Hans-Joachim Ruscheweyh*, Alessio Milanese*, Lucas Paoli, Nicolai Karcher, Quentin Clayssen, Marisa Isabell Metzger, Jakob Wirbel, Peer Bork, Daniel R. Mende, Georg Zeller# & Shinichi Sunagawa#
Microbiome (2022)
Pre-requisites
The mOTU profiler requires:
- Python 3 (or higher)
- the Burrows-Wheeler Aligner v0.7.15 or higher (bwa)
- SAMtools v1.5 or higher
In order to use the command snv_call you need:
- metaSNV v1.0.3, available also on bioconda (we assume metaSNV.py to be in the system path)
Check the installation wiki to see how to install the dependencies with conda.
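Before installing, it can help to confirm which of the tools listed above are already on the PATH. The following is a minimal sketch, not part of mOTUs: the helper names have_tool and check_tools are hypothetical, and the script only checks that each executable exists, not that it meets the minimum version.

```shell
#!/bin/sh
# have_tool NAME -> exit status 0 if NAME is found on the PATH, 1 otherwise.
# (Hypothetical helper, not part of mOTUs.)
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# check_tools NAME... -> prints one status line per tool and returns
# the number of missing tools as its exit status.
check_tools() {
    missing=0
    for tool in "$@"; do
        if have_tool "$tool"; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Required by the profiler itself.
check_tools python3 bwa samtools || echo "install the missing tools before running motus"

# metaSNV.py is only needed for the snv_call command.
have_tool metaSNV.py || echo "note: metaSNV.py not found; snv_call will not work"
```

Versions still need to be checked by hand, for example by running each tool and reading the version it reports.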
Installation
mOTUs can be installed either with pip or via conda.
Installation with conda has the advantage that it will also download and install dependencies:
# Install in the base environment
conda install motus
# OR, create a new environment
conda create -n motu-env motus
conda activate motu-env
Installation with pip:
# Download and install mOTUs
pip install motu-profiler
# Download the mOTUs database
motus downloadDB
You can test that motus is installed correctly with:
motus profile --test
Basic examples
Here is a simple example of how to obtain a taxonomic profile from a raw read file:
motus profile -s metagenomic_sample.fastq > taxonomy_profile.txt
The previous call can be split into its individual steps:
motus map_tax -s metagenomic_sample.fastq -o mapped_reads.sam
motus calc_mgc -i mapped_reads.sam -o mgc_ab_table.count
motus calc_motu -i mgc_ab_table.count > taxonomy_profile.txt
rm mapped_reads.sam mgc_ab_table.count
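When the steps are run separately, a failure in the middle can leave intermediate files behind. The sketch below wraps the same three sub-commands in a shell function that keeps intermediates in a temporary directory and removes it in every case. The function name profile_in_steps is hypothetical; the motus sub-commands and options are the ones used above.

```shell
#!/bin/sh
# profile_in_steps INPUT.fastq OUTPUT.txt
# Runs map_tax, calc_mgc and calc_motu in sequence. Intermediate files live
# in a temporary directory that is removed whether or not a step fails.
# (Hypothetical wrapper, not part of mOTUs.)
profile_in_steps() {
    reads=$1
    profile=$2
    workdir=$(mktemp -d) || return 1

    # Each step runs only if the previous one succeeded.
    motus map_tax -s "$reads" -o "$workdir/mapped_reads.sam" &&
    motus calc_mgc -i "$workdir/mapped_reads.sam" -o "$workdir/mgc_ab_table.count" &&
    motus calc_motu -i "$workdir/mgc_ab_table.count" > "$profile"
    status=$?

    rm -rf "$workdir"
    return "$status"
}

# Example call (uncomment once motus is installed):
# profile_in_steps metagenomic_sample.fastq taxonomy_profile.txt
```

The function returns the exit status of the first step that failed, so a calling script can stop or retry as needed.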
The use of multiple threads (-t) is recommended, since bwa will finish faster. Here is an example with paired-end reads:
motus profile -f for_sample.fastq -r rev_sample.fastq -s no_pair.fastq -t 6 > taxonomy_profile.txt
You can merge taxonomy files from different samples with motus merge:
motus profile -s metagenomic_sample_1.fastq -o taxonomy_profile_1.txt
motus profile -s metagenomic_sample_2.fastq -o taxonomy_profile_2.txt
motus merge -i taxonomy_profile_1.txt,taxonomy_profile_2.txt > all_sample_profiles.txt
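For more than a handful of samples, the profile and merge calls above can be driven by a loop. This is a minimal sketch under stated assumptions: the function name profile_and_merge and the .motus.txt output suffix are hypothetical, and file names must not contain commas because motus merge takes a comma-separated list.

```shell
#!/bin/sh
# profile_and_merge MERGED_OUTPUT SAMPLE.fastq...
# Profiles each sample to <sample>.motus.txt, then merges all profiles
# into MERGED_OUTPUT. (Hypothetical wrapper, not part of mOTUs.)
profile_and_merge() {
    merged=$1
    shift
    profiles=""
    for fastq in "$@"; do
        out="${fastq%.fastq}.motus.txt"
        motus profile -s "$fastq" -o "$out" || return 1
        # Append to the comma-separated list expected by `motus merge -i`.
        profiles="${profiles:+$profiles,}$out"
    done
    motus merge -i "$profiles" > "$merged"
}

# Example call (uncomment once motus is installed):
# profile_and_merge all_sample_profiles.txt metagenomic_sample_1.fastq metagenomic_sample_2.fastq
```

The loop stops at the first sample that fails to profile, so a merged table is only written when every input profile exists.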
You can profile a sample that was sequenced over several runs by passing comma-separated lists of files:
motus profile -f sample1_run1_for.fastq,sample1_run2_for.fastq -r sample1_run1_rev.fastq,sample1_run2_rev.fastq -s sample1_run1_single.fastq > taxonomy_profile.txt
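Typing long comma-separated lists by hand is error-prone, so the lists can be built from a file name pattern instead. The sketch below assumes a hypothetical naming scheme, <sample>_run<N>_for.fastq and <sample>_run<N>_rev.fastq, and prints the resulting command rather than running it. The helper join_by_comma is not part of mOTUs.

```shell
#!/bin/sh
# join_by_comma ITEM... -> prints the items joined by commas.
# (Hypothetical helper, not part of mOTUs.)
join_by_comma() {
    joined=""
    for item in "$@"; do
        joined="${joined:+$joined,}$item"
    done
    printf '%s\n' "$joined"
}

# Assumed layout: <sample>_run<N>_for.fastq and <sample>_run<N>_rev.fastq.
# The shell expands both patterns in the same sorted order, so forward and
# reverse files stay paired as long as every run has both files.
sample=sample1
for_list=$(join_by_comma "$sample"_run*_for.fastq)
rev_list=$(join_by_comma "$sample"_run*_rev.fastq)

# Print the command instead of running it, so the lists can be inspected first.
echo motus profile -f "$for_list" -r "$rev_list" -t 6
```

If no file matches, the pattern is printed unexpanded, which makes a wrong sample name easy to spot before anything is run.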
How mOTUs works
The mOTUs tool performs taxonomic profiling of metagenomic and metatranscriptomic samples, i.e. it identifies the species present in a sample and their relative abundance. It is based on a set of mOTUs (~species) contained in the mOTUs database. The mOTUs database is created from reference genomes, metagenomic samples and metagenome-assembled genomes (MAGs).
A mOTUs database is composed of three types of mOTUs:
- ref-mOTUs, which represent known species,
- meta-mOTUs, which represent unknown species obtained from metagenomic samples,
- ext-mOTUs, which represent unknown species obtained from MAGs.
Note that meta- and ext-mOTUs will not have a species level annotation.
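To see how much of a profile comes from each of the three mOTU types, the rows of a profile can be tallied by identifier. This sketch rests on assumptions about the output that are not stated in this document: a tab-separated single-sample profile, comment lines starting with '#', identifiers containing ref_mOTU, meta_mOTU or ext_mOTU in the first column, and the abundance in the last column. Check these against a real profile before relying on it. The function name count_motu_types is hypothetical.

```shell
#!/bin/sh
# count_motu_types PROFILE
# Counts rows with non-zero abundance per mOTU type.
# Assumed format (not confirmed by this document): tab-separated columns,
# '#' comment lines, the mOTU identifier somewhere in column 1 and the
# abundance in the last column.
count_motu_types() {
    awk -F '\t' '
        /^#/ || NF < 2 { next }
        $NF + 0 > 0 {
            if      ($1 ~ /ref_mOTU/)  ref++
            else if ($1 ~ /meta_mOTU/) meta++
            else if ($1 ~ /ext_mOTU/)  ext++
            else                       other++
        }
        END { printf "ref=%d meta=%d ext=%d other=%d\n", ref, meta, ext, other }
    ' "$1"
}

# Example call (uncomment once a profile is available):
# count_motu_types taxonomy_profile.txt
```

Rows that match none of the three patterns, such as an unassigned fraction, are reported under other.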
The mOTUs database is updated periodically, e.g. version 3.0.3 doubles the number of profilable species by including ~600,000 draft genomes. Major releases are represented in the following graph (where the numbers represent the number of mOTUs for each of the three groups, with the same color code as the previous graph):
When profiling (motus profile) a metagenomic sample, the mOTUs tool maps the reads from the sample to the genes in the different mOTUs.
ChangeLog
Version 3.1.0 2023-03-28 by AlessioMilanese
- Improve database clustering algorithm and update the database (change the number of ext-mOTUs from 19,358 to 20,128)
Version 3.0.3 2022-07-13 by AlessioMilanese
- Add command prep_long to allow the profiling of long reads
Version 3.0.2 2022-01-31 by AlessioMilanese
- Convert the repository to a python package and submit to PyPI
Version 3.0.1 2021-07-27 by AlessioMilanese
- Improve ref-mOTUs taxonomy according to #76
- Solve bug with -A option
Version 3.0.0 2021-06-22 by AlessioMilanese
- Improve code base
- Minor bug fixes
Version 2.6.1 2021-04-27 by AlessioMilanese
- Minor bug fixes
- Improved the taxonomy of 32 ref-mOTUs (#45)
Version 2.6.0 2021-03-08 by AlessioMilanese
- Add 19,358 new mOTUs
- Add taxonomic profiles of > 11k metagenomic and metatranscriptomic samples. The updated merge function can integrate those into the user's results.
- Minor bug fixes
- Change -1 to unassigned
Version 2.5.1 2019-08-17 by AlessioMilanese
- Update the taxonomy to participate in the CAMI 2 challenge
Version 2.5.0 2019-08-09 by AlessioMilanese
- Add -db option to use a database from another directory
- Add -A to print all taxonomy levels together
- Update the database with more than 60k new reference genomes. There are 11,915 ref-mOTUs and 2,297 meta-mOTUs.
Version 2.1.1 2019-03-04 by AlessioMilanese
- Correct problem with samtools when installing with conda
Version 2.1.0 2019-03-03 by AlessioMilanese
- Correct error '\t\t' when printing -C recall
- Update database (gene coordinates)
Version 2.0.1 2018-08-23 by AlessioMilanese
- Add -C to print the result in CAMI format (BioBoxes format 0.9.1)
- Add -K to snv_call command to keep all the directories produced by metaSNV
Version 2.0.0 2018-06-12 by AlessioMilanese
- Set relative abundances as default (instead of counts)
- Add -B to print the result in BIOM format
- Add test directory
- Python2 is not supported anymore
- Minor bug fixes
Version 2.0.0-rc1 2018-05-10 by AlessioMilanese
- First release supporting all basic functionality.
Source Distribution
File details
Details for the file motu-profiler-3.1.0.tar.gz.
File metadata
- Download URL: motu-profiler-3.1.0.tar.gz
- Upload date:
- Size: 81.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 38959ae1b1b9892b2b47bda49a6abc2389f49802bb36d2055dcaf15080cef3f3
MD5 | c27d5ca91b623d7bf8bffc7c44628ea0
BLAKE2b-256 | 822245a94f7adb2c226f013c4c00e881c3d6a9b3f7badf3f3c1d36fc8570fa1d