Taxonomic profiling of metagenomes from diverse environments with mOTUs4

mOTUs profiler

The mOTUs profiler is a computational tool that estimates the taxonomic abundance of known and currently unknown microbial community members from metagenomic shotgun sequencing data.

The current version of the mOTUs profiler is built on the genomic mOTUs database (motus-db), which is constructed from 919K isolate genomes and single-cell amplified genomes (SAGs) and 2.83M metagenome-assembled genomes (MAGs). The MAGs were generated from over 117K metagenomic samples spanning diverse microbiomes. In addition to the human and ocean microbiomes, these include soil, freshwater, and the gastrointestinal tracts of ruminants and other animals, environments we found to be greatly underrepresented by reference genomes.

In the current version, 124,295 species-level taxonomic units (mOTUs) were constructed using sequences of 10 single-copy marker genes recovered from these genomes. 30,256 mOTUs are represented by an isolate genome, whereas 94,039 mOTUs are represented by MAGs only.
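
As a quick sanity check of the numbers above, the two groups of mOTUs should add up to the total. The short sketch below only restates the counts from the text (the variable names are illustrative) and derives the share of mOTUs that have no isolate genome.

```python
# Counts quoted in the paragraph above.
total_motus = 124_295
isolate_motus = 30_256   # mOTUs represented by an isolate genome
mag_only_motus = 94_039  # mOTUs represented by MAGs only

# The two groups partition the full set of mOTUs.
assert isolate_motus + mag_only_motus == total_motus

# Share of mOTUs that would be missing without MAGs.
mag_only_share = 100 * mag_only_motus / total_motus
print(f"MAG-only mOTUs: {mag_only_share:.1f}% of all mOTUs")
```

In other words, roughly three out of four mOTUs are known only from MAGs.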

If you use the mOTUs profiler, please cite:

Cultivation-independent genomes greatly expand taxonomic-profiling capabilities of mOTUs across various environments

Hans-Joachim Ruscheweyh*, Alessio Milanese*, Lucas Paoli, Nicolai Karcher, Quentin Clayssen, Marisa Isabell Metzger, Jakob Wirbel, Peer Bork, Daniel R. Mende, Georg Zeller# & Shinichi Sunagawa#

Microbiome (2022)

doi: https://doi.org/10.1186/s40168-022-01410-z

If you use the mOTUs database, please cite:

The mOTUs online database provides web-accessible genomic context to taxonomic profiling of microbial communities

Marija Dmitrijeva*, Hans-Joachim Ruscheweyh*, Lilith Feer, Kang Li, Samuel Miravet-Verde, Anna Sintsova, Daniel R. Mende, Georg Zeller, Shinichi Sunagawa#

Nucleic Acids Research (2025)

doi: https://doi.org/10.1093/nar/gkae1004


📦 Installation

The mOTUs profiler is written in Python 3 (>=3.12) and runs on 64-bit Linux or macOS systems. It relies on external dependencies that must be installed beforehand, either manually or, more conveniently, with the conda package manager.

Installation with Conda

Miniconda

Installation with the conda package manager is generally preferable, as it reduces the installation of all dependencies to a single command once conda is available. Execute the following commands to install Miniconda and configure its channels:

$ curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ sh Miniconda3-latest-Linux-x86_64.sh
$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge

On a macOS system, replace the download link with https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh and adjust the installer file name in the second command accordingly.

Create a conda environment with the dependencies, then install mOTUs into it with pip:

$ conda create -n mOTUs4 python==3.12 bwa=0.7.19 vsearch pip
$ conda activate mOTUs4
$ python -m pip install motus-tool
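
To confirm that the environment is complete, it can help to check that the Python version and the external tools are visible. The sketch below is not part of mOTUs. The function names are hypothetical, and it only assumes that bwa, vsearch, and motus should be on PATH after the steps above.

```python
import shutil
import sys

# Executables expected after the conda and pip commands above (assumption:
# all three are exposed on PATH inside the activated environment).
REQUIRED_TOOLS = ("bwa", "vsearch", "motus")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_is_supported(version_info=sys.version_info):
    """Check the Python >= 3.12 requirement stated in the installation notes."""
    return tuple(version_info[:2]) >= (3, 12)


if __name__ == "__main__":
    print("Python >= 3.12:", python_is_supported())
    print("Missing tools:", ", ".join(missing_tools()) or "none")
```

Run it inside the activated mOTUs4 environment; an empty list of missing tools suggests the dependencies were installed as intended.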

🚀 Usage

After installation, you can test whether the tool was installed correctly by executing:

$ motus --help

Note: Until the tool has been installed via pip, the command to execute mOTUs is python motus/motus.py. After installation via pip, use motus instead.

Program: motus - a tool for marker gene-based OTU (mOTU) profiling
    Version: 4.0.4


    References:
        Profiler: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand
        taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022).
        doi: https://doi.org/10.1186/s40168-022-01410-z

        Database: Dmitrijeva, Ruscheweyh et al. The mOTUs online database provides web-accessible
        genomic context to taxonomic profiling of microbial communities. Nucleic Acids Research (2025).
        doi: https://doi.org/10.1093/nar/gkae1004


    Usage:
        motus <command> [options]


    Commands:

        -- Taxonomic profiling

            profile       Perform taxonomic profiling (map_tax + calc_mgc + calc_motu) in a single step

            map_tax       Map reads to the marker gene database
            calc_mgc      Calculate marker gene cluster (MGC) abundance
            calc_motu     Summarize MGC abundances into a mOTU profile


        -- Tool utilities

            downloadMGDB  Download the mOTUs marker gene database
            merge         Merge multiple taxonomic profiling results into one table
            classify      Classify user genomes into mOTUs
            prep_long     Prepare long reads to be profiled by mOTUs


        -- Genome accession

            genomes       Search the mOTUs-db by keyword (taxonomic, functional)
            download      Download sequence files from mOTUs-db


    Type motus <command> to print the help menu for a specific command

Commands

The profile function in mOTUs is the main function that executes map_tax, calc_mgc, and calc_motu in sequence. It takes short read metagenomic sequencing data as input and generates a taxonomic profile.

Helper functions include genomes, which searches the motus-db by taxonomic or functional keyword; download, which provides users with programmatic access to the ~4 million genomes in the motus-db; downloadMGDB, which downloads the marker gene database of mOTUs; merge, which merges multiple taxonomic profiles; classify, which assigns user-submitted genomes to existing mOTUs; and prep_long, which prepares long reads for profiling.


Profile

$ motus profile
Program: motus - a tool for marker gene-based OTU (mOTU) profiling
    Version: 4.0.4


    References:
        Profiler: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand
        taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022).
        doi: https://doi.org/10.1186/s40168-022-01410-z

        Database: Dmitrijeva, Ruscheweyh et al. The mOTUs online database provides web-accessible
        genomic context to taxonomic profiling of microbial communities. Nucleic Acids Research (2025).
        doi: https://doi.org/10.1093/nar/gkae1004


    Summary:
        The profile command in mOTUs is the main function that executes map_tax, calc_mgc,
        and calc_motu in sequence. It takes short read metagenomic sequencing data as input
        and generates a taxonomic profile.


    Usage:
       motus profile -f FILE [FILE ...] -r FILE [FILE ...] -s FILE [FILE ...] -o FILE [options]
       motus profile -f FILE [FILE ...] -r FILE [FILE ...] -o FILE [options]
       motus profile -s FILE [FILE ...] -o FILE [options]


    Input options:
        -f, --forward  FILE [FILE ...]
            Input file(s) for reads in forward orientation, fastQ/A(.gz)-formatted

        -r, --reverse  FILE [FILE ...]
            Input file(s) for reads in reverse orientation, fastQ/A(.gz)-formatted

        -s, --single  FILE [FILE ...]
            Input file(s) for unpaired reads, fastQ/A(.gz)-formatted

        -n, --sample-name  STR
            Sample name (default: 'unnamed sample')

    Output options:
        -o, --output-file  FILE
            Output file name [required]

    Algorithm options:
        -g, --marker-genes  INT
            Required number of marker genes for a mOTU to be called present: 
            1=higher recall, 6=higher precision, 10=maximum (default: 3)

        -l, --alignment-length  INT
            Minimum length of the alignment (bp) (default: 75)

        -t, --threads  INT
            Number of threads (default: 1)

        -y, --counting-mode  STR
            Which scale the abundances are reported in (default: INSERT_SCALED)
            Choices: [INSERT_RAW, INSERT_NORM, INSERT_SCALED, BASE_RAW, BASE_NORM]
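
When profiling many samples from a script, it can be convenient to assemble the motus profile call programmatically. The following is a minimal sketch that uses only the options listed in the help text above. The function name, the file names, and the check that forward and reverse files come in equal numbers are assumptions, not part of mOTUs.

```python
import shlex

# Counting modes listed under -y in the help text.
COUNTING_MODES = ("INSERT_RAW", "INSERT_NORM", "INSERT_SCALED", "BASE_RAW", "BASE_NORM")


def build_profile_cmd(output, forward=(), reverse=(), single=(), sample_name=None,
                      marker_genes=3, min_alignment_length=75, threads=1,
                      counting_mode="INSERT_SCALED"):
    """Assemble an argument list for `motus profile` from documented options."""
    # Assumption: paired files must be supplied as matching -f/-r lists.
    if len(forward) != len(reverse):
        raise ValueError("forward (-f) and reverse (-r) files must be paired")
    if not forward and not single:
        raise ValueError("provide paired (-f/-r) and/or unpaired (-s) reads")
    if not 1 <= marker_genes <= 10:
        raise ValueError("marker_genes (-g) must be between 1 and 10")
    if counting_mode not in COUNTING_MODES:
        raise ValueError(f"counting_mode (-y) must be one of {COUNTING_MODES}")

    cmd = ["motus", "profile"]
    if forward:
        cmd += ["-f", *forward, "-r", *reverse]
    if single:
        cmd += ["-s", *single]
    if sample_name is not None:
        cmd += ["-n", sample_name]
    cmd += ["-o", output,
            "-g", str(marker_genes),
            "-l", str(min_alignment_length),
            "-t", str(threads),
            "-y", counting_mode]
    return cmd


# Hypothetical file names; pass the list to subprocess.run(cmd, check=True) to execute.
cmd = build_profile_cmd("sample1.motus",
                        forward=["sample1_R1.fastq.gz"],
                        reverse=["sample1_R2.fastq.gz"],
                        sample_name="sample1", threads=4)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems with unusual file names.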

Map Tax

$ motus map_tax
Program: motus - a tool for marker gene-based OTU (mOTU) profiling
    Version: 4.0.4


    References:
        Profiler: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand
        taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022).
        doi: https://doi.org/10.1186/s40168-022-01410-z

        Database: Dmitrijeva, Ruscheweyh et al. The mOTUs online database provides web-accessible
        genomic context to taxonomic profiling of microbial communities. Nucleic Acids Research (2025).
        doi: https://doi.org/10.1093/nar/gkae1004


    Summary:
        The map_tax command takes short read metagenomic sequencing data as input and
        maps reads to the mOTUs marker gene database.


    Usage:
        motus map_tax -f FILE [FILE ...] -r FILE [FILE ...] -s FILE [FILE ...] -o FILE [options]
        motus map_tax -f FILE [FILE ...] -r FILE [FILE ...] -o FILE [options]
        motus map_tax -s FILE [FILE ...] -o FILE [options]


    Input options:
        -f, --forward  FILE [FILE ...]
            Input file(s) for reads in forward orientation, fastQ/A(.gz)-formatted

        -r, --reverse  FILE [FILE ...]
            Input file(s) for reads in reverse orientation, fastQ/A(.gz)-formatted

        -s, --single  FILE [FILE ...]
            Input file(s) for unpaired reads, fastQ/A(.gz)-formatted

    Output options:
        -o, --output-file  FILE
            Output file name [required]

    Algorithm options:
        -l, --alignment-length  INT
            Minimum length of the alignment (bp) (default: 75)

        -t, --threads  INT
            Number of threads (default: 1)

Calc MGC

$ motus calc_mgc
Program: motus - a tool for marker gene-based OTU (mOTU) profiling
    Version: 4.0.4


    References:
        Profiler: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand
        taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022).
        doi: https://doi.org/10.1186/s40168-022-01410-z

        Database: Dmitrijeva, Ruscheweyh et al. The mOTUs online database provides web-accessible
        genomic context to taxonomic profiling of microbial communities. Nucleic Acids Research (2025).
        doi: https://doi.org/10.1093/nar/gkae1004


    Summary:
        The calc_mgc command takes a file storing the alignments of sequencing reads
        to the mOTUs marker gene database and calculates marker gene cluster abundances.


    Usage:
        motus calc_mgc -i FILE -o FILE [options]


    Input options:
        -i, --input-file  FILE
            Path to BAM file generated after running the motus map_tax command [required]

    Output options:
        -o, --output-file  FILE
            Output file name [required]

    Algorithm options:
        -l, --alignment-length  INT
            Minimum length of the alignment (bp) (default: 75)

Calc mOTU

$ motus calc_motu
Program: motus - a tool for marker gene-based OTU (mOTU) profiling
    Version: 4.0.4


    References:
        Profiler: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand
        taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022).
        doi: https://doi.org/10.1186/s40168-022-01410-z

        Database: Dmitrijeva, Ruscheweyh et al. The mOTUs online database provides web-accessible
        genomic context to taxonomic profiling of microbial communities. Nucleic Acids Research (2025).
        doi: https://doi.org/10.1093/nar/gkae1004


    Summary:
        The calc_motu command takes a file containing marker gene cluster
        abundances and generates a taxonomic profile.


    Usage:
        motus calc_motu -i FILE -o FILE [options]


    Input options:
        -i, --input-file  FILE
            MGC abundance table generated by the calc_mgc command [required]

        -n, --sample-name  STR
            Sample name (default: 'unnamed sample')

    Output options:
        -o, --output-file  FILE
            Output file name [required]

    Algorithm options:
        -g, --marker-genes  INT
            Required number of marker genes for a mOTU to be called present: 
            1=higher recall, 6=higher precision, 10=maximum (default: 3)

        -y, --counting-mode  STR
            Which scale the abundances are reported in (default: INSERT_SCALED)
            Choices: [INSERT_RAW, INSERT_NORM, INSERT_SCALED, BASE_RAW, BASE_NORM]
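
The three commands map_tax, calc_mgc, and calc_motu can also be chained by hand, with the output of each step serving as the input of the next. The sketch below assembles such a chain using only the documented options. The function name and the intermediate file names are hypothetical, and the file extensions are chosen purely for illustration.

```python
import shlex


def build_stepwise_cmds(forward, reverse, prefix, sample_name, threads=1,
                        min_alignment_length=75, marker_genes=3,
                        counting_mode="INSERT_SCALED"):
    """Return the map_tax, calc_mgc and calc_motu argument lists for one sample."""
    bam = f"{prefix}.bam"        # alignments written by map_tax (hypothetical name)
    mgc = f"{prefix}.mgc"        # MGC abundance table written by calc_mgc (hypothetical name)
    profile = f"{prefix}.motus"  # final taxonomic profile (hypothetical name)
    length = str(min_alignment_length)
    return [
        ["motus", "map_tax", "-f", forward, "-r", reverse,
         "-o", bam, "-l", length, "-t", str(threads)],
        ["motus", "calc_mgc", "-i", bam, "-o", mgc, "-l", length],
        ["motus", "calc_motu", "-i", mgc, "-n", sample_name,
         "-o", profile, "-g", str(marker_genes), "-y", counting_mode],
    ]


cmds = build_stepwise_cmds("sample1_R1.fastq.gz", "sample1_R2.fastq.gz",
                           prefix="sample1", sample_name="sample1", threads=4)
for step in cmds:
    print(shlex.join(step))
```

One possible reason to run the steps separately is to keep the intermediate files, so that calc_motu can be repeated with a different -g or -y setting without mapping the reads again.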

merge

$ motus merge
Program: motus - a tool for marker gene-based OTU (mOTU) profiling
    Version: 4.0.4


    References:
        Profiler: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand
        taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022).
        doi: https://doi.org/10.1186/s40168-022-01410-z

        Database: Dmitrijeva, Ruscheweyh et al. The mOTUs online database provides web-accessible
        genomic context to taxonomic profiling of microbial communities. Nucleic Acids Research (2025).
        doi: https://doi.org/10.1093/nar/gkae1004


    Summary:
        The merge command takes multiple profiles produced after running the
        profile command and combines them into a single table.


    Usage:
        motus merge -i FILE [FILE ...] -o FILE


    Input options:
        -i, --input-files  FILE [FILE ...]
            A list of mOTUs profile files or a text file containing the list of profile
            files to be merged, with one line per file [required]

    Output options:
        -o, --output-file  FILE
            Output file name [required]
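
For a large number of profiles, the text-file form of -i (one profile path per line) is easier to handle than a long argument list. The following sketch writes such a list and assembles the merge call. The helper names and file names are hypothetical.

```python
import tempfile
from pathlib import Path


def write_profile_list(profile_paths, list_file):
    """Write one profile path per line, the list format described for -i."""
    list_path = Path(list_file)
    list_path.write_text("".join(f"{path}\n" for path in profile_paths))
    return list_path


def build_merge_cmd(list_file, output):
    """Assemble an argument list for `motus merge`."""
    return ["motus", "merge", "-i", str(list_file), "-o", str(output)]


# Demonstration in a temporary folder with hypothetical profile names.
with tempfile.TemporaryDirectory() as folder:
    listing = write_profile_list(["sample1.motus", "sample2.motus"],
                                 Path(folder) / "profiles.txt")
    print(listing.read_text(), end="")
    print(build_merge_cmd(listing, "merged_profiles.tsv"))
```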

downloadMGDB

$ motus downloadMGDB
Program: motus - a tool for marker gene-based OTU (mOTU) profiling
    Version: 4.0.4


    References:
        Profiler: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand
        taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022).
        doi: https://doi.org/10.1186/s40168-022-01410-z

        Database: Dmitrijeva, Ruscheweyh et al. The mOTUs online database provides web-accessible
        genomic context to taxonomic profiling of microbial communities. Nucleic Acids Research (2025).
        doi: https://doi.org/10.1093/nar/gkae1004


    Summary:
        The downloadMGDB command downloads the marker gene reference database used
        by the profile and map_tax commands.


    Usage:
        motus downloadMGDB [options]


    Options:
        -f, --force
            Force download even when database is already present

classify

$ motus classify
Program: motus - a tool for marker gene-based OTU (mOTU) profiling
    Version: 4.0.4


    References:
        Profiler: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand
        taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022).
        doi: https://doi.org/10.1186/s40168-022-01410-z

        Database: Dmitrijeva, Ruscheweyh et al. The mOTUs online database provides web-accessible
        genomic context to taxonomic profiling of microbial communities. Nucleic Acids Research (2025).
        doi: https://doi.org/10.1093/nar/gkae1004


    Summary:
        The classify command takes a list of genome sequence files as input and
        assigns these genomes to existing mOTUs in the database.


    Usage:
        motus classify -i FILE -o FILE [options]


    Input options:
        -i, --input-file  FILE
            Text file listing genome sequence files in fastA(.gz) format to classify.
            One line per genome file [required]

    Output options:
        -o, --output-file  FILE
            Output file name. Each line contains a genome and its associated mOTU [required]

    Algorithm options:
        -t, --threads  INT
            Number of threads (default: 1)
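
The classify command expects a text file with one genome path per line. A minimal sketch for generating that file from a folder of genomes is shown below. The helper names are hypothetical, and the list of file name endings is an assumption, since the help text only states that the files are in fastA(.gz) format.

```python
import tempfile
from pathlib import Path

# Assumed FASTA file name endings; adjust to match your own files.
FASTA_SUFFIXES = (".fa", ".fasta", ".fna", ".fa.gz", ".fasta.gz", ".fna.gz")


def find_genomes(folder):
    """Return the sorted FASTA files found directly inside a folder."""
    return sorted(path for path in Path(folder).iterdir()
                  if path.is_file() and path.name.endswith(FASTA_SUFFIXES))


def build_classify_cmd(genome_paths, list_file, output, threads=1):
    """Write the genome list (one path per line) and assemble `motus classify`."""
    Path(list_file).write_text("".join(f"{path}\n" for path in genome_paths))
    return ["motus", "classify", "-i", str(list_file),
            "-o", str(output), "-t", str(threads)]


# Demonstration with empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for name in ("genome_a.fa", "genome_b.fna.gz", "notes.txt"):
        (Path(folder) / name).touch()
    genomes = find_genomes(folder)
    print([path.name for path in genomes])
    print(build_classify_cmd(genomes, Path(folder) / "genomes.txt",
                             "assignments.tsv", threads=4))
```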

prep_long

$ motus prep_long
Program: motus - a tool for marker gene-based OTU (mOTU) profiling
    Version: 4.0.4


    References:
        Profiler: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand
        taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022).
        doi: https://doi.org/10.1186/s40168-022-01410-z

        Database: Dmitrijeva, Ruscheweyh et al. The mOTUs online database provides web-accessible
        genomic context to taxonomic profiling of microbial communities. Nucleic Acids Research (2025).
        doi: https://doi.org/10.1093/nar/gkae1004


    Summary:
        The prep_long command takes long-read sequencing data and converts it
        into the appropriate input format to be used by the profile and map_tax commands.


    Usage:
        motus prep_long -i FILE -o FILE [options]


    Input options:
        -i, --input-file  FILE
            Long-read sequencing file to convert, can be in fastQ/A(.gz) format [required]

    Output options:
        -o, --output-file  FILE
            Output file name. This converted file is ready to be used by motus profile [required]

    Algorithm options:
        -sl, --splitting-length  INT
            Target fragment length (in bp) for splitting long reads (default: 300)

        -ml, --minimum-length  INT
            Minimum read length after splitting. Shorter reads are discarded (default: 50)
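
Long-read data therefore takes two steps: convert the reads with prep_long, then profile the converted file. The sketch below assembles both calls. The function name and file names are hypothetical, and passing the converted file through -s is an assumption based on the split reads being unpaired. The help text only states that the file is ready to be used by motus profile.

```python
import shlex


def build_long_read_cmds(long_reads, converted, profile,
                         splitting_length=300, minimum_length=50, threads=1):
    """Return the prep_long and profile argument lists for one long-read sample."""
    # Sanity check (assumption): a minimum above the target fragment length
    # would presumably discard every fragment.
    if minimum_length > splitting_length:
        raise ValueError("minimum_length (-ml) should not exceed splitting_length (-sl)")
    prep_cmd = ["motus", "prep_long", "-i", long_reads, "-o", converted,
                "-sl", str(splitting_length), "-ml", str(minimum_length)]
    # Assumption: the converted reads are supplied as unpaired input (-s).
    profile_cmd = ["motus", "profile", "-s", converted, "-o", profile,
                   "-t", str(threads)]
    return prep_cmd, profile_cmd


prep_cmd, profile_cmd = build_long_read_cmds("sample1_long.fastq.gz",
                                             "sample1_long.converted",
                                             "sample1_long.motus", threads=4)
print(shlex.join(prep_cmd))
print(shlex.join(profile_cmd))
```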

           

download

$ motus download
Program: motus - a tool for marker gene-based OTU (mOTU) profiling
    Version: 4.0.4
    
    
    References:
        Profiler: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand 
        taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022). 
        doi: https://doi.org/10.1186/s40168-022-01410-z

        Database: Dmitrijeva, Ruscheweyh et al. The mOTUs online database provides web-accessible 
        genomic context to taxonomic profiling of microbial communities. Nucleic Acids Research (2025). 
        doi: https://doi.org/10.1093/nar/gkae1004
    

    Summary:
        The download command downloads listed genome files from mOTUs-db.


    Usage:
        motus download -i FILE -o PATH [options]
        motus download -i STR [STR ...] -o PATH [options]


    Input options:
        -i, --input-genomes  FILE/STR
            Can be either a list of genome identifiers separated by spaces or a text file
            listing the identifiers of genomes for download. One line per genome. The output of
            the motus genomes command can be used as input for this command [required]

    Output options:
        -o, --output-folder  PATH
            Path to output folder where the downloaded sequences will be saved [required]

        -r, --representatives
            Download only sequences from representative genomes.

genomes

$ motus genomes
Program: motus - a tool for marker gene-based OTU (mOTU) profiling
    Version: 4.0.4

    
    References:
        Profiler: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand 
        taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022). 
        doi: https://doi.org/10.1186/s40168-022-01410-z

        Database: Dmitrijeva, Ruscheweyh et al. The mOTUs online database provides web-accessible 
        genomic context to taxonomic profiling of microbial communities. Nucleic Acids Research (2025). 
        doi: https://doi.org/10.1093/nar/gkae1004


    Summary:
        The genomes command queries the mOTUs-db based on identifiers, functional,
        or taxonomic annotations and returns a list of genomes matching indicated query.


    Usage:    
        motus genomes -i FILE -o FILE [options]
        motus genomes -i STR [STR ...] -o FILE [options]


    Input options:
        -i, --input-queries  FILE/STR
            Can be either a list of search queries or a text file listing search queries
            with one line per query. Queries can be genome or mOTUs identifiers, PFAM, KEGG, EGGNOG, 
            or GTDB taxonomy names. If the query does not exactly match any database entry,
            alternative queries will be suggested [required]

    Output options:
        -o, --output-file  FILE
            Output file containing a list of genome identifiers matching search queries and their 
            annotations as indicated by the -d parameter. This output file can be used as input
            for the motus download command [required]

        -d, --details  STR [STR ...]
            List of annotations to report. Choose any combination of [KEGG, PFAM, EGGNOG, TAXONOMY],
            for example, -d KEGG PFAM.
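
Because the output of motus genomes can be used directly as input for motus download, the two commands combine naturally into a search-then-fetch workflow. The sketch below assembles both calls from the documented options. The function name, the query string, and the file and folder names are hypothetical, and the exact spelling of queries accepted by the database has not been verified here.

```python
import shlex

# Annotation types listed under -d in the help text.
DETAIL_CHOICES = ("KEGG", "PFAM", "EGGNOG", "TAXONOMY")


def build_genome_fetch_cmds(queries, hits_file, out_folder,
                            details=(), representatives_only=False):
    """Return the `motus genomes` and `motus download` argument lists."""
    unknown = [item for item in details if item not in DETAIL_CHOICES]
    if unknown:
        raise ValueError(f"unsupported -d values: {unknown}")
    genomes_cmd = ["motus", "genomes", "-i", *queries, "-o", hits_file]
    if details:
        genomes_cmd += ["-d", *details]
    # The hits file written by `motus genomes` is passed on to `motus download`.
    download_cmd = ["motus", "download", "-i", hits_file, "-o", out_folder]
    if representatives_only:
        download_cmd.append("-r")
    return genomes_cmd, download_cmd


genomes_cmd, download_cmd = build_genome_fetch_cmds(
    ["Prochlorococcus"],          # illustrative taxonomy query
    "prochlorococcus_hits.tsv",   # hypothetical output file
    "prochlorococcus_genomes",    # hypothetical output folder
    details=["KEGG", "PFAM"],
    representatives_only=True,
)
print(shlex.join(genomes_cmd))
print(shlex.join(download_cmd))
```

Restricting the download to representative genomes with -r may be a sensible default for broad queries, given the size of the database.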
                            

❓ Need Help?

Open an issue on GitHub.
