
a python package for super-fast and accurate annotation of molecular functionality using read data without prior assembly or gene finding

microbiome - functional annotation of sequencing reads

A super-fast ( < 20min/10GB of reads ) and accurate ( > 90% precision ) method for annotation of molecular functionality encoded in sequencing read data without the need for assembly or gene finding.

Web Service:

Docker: A pre-built Docker image is available at


mi-faser runs on Linux, macOS and Windows systems.


Note: mi-faser was developed and optimized using DIAMOND v0.8.8, which is included in all releases up to v1.11.4. This is also the version used in the accompanying publication [1]. All newer releases of mi-faser use the latest stable release of DIAMOND. Results from the first mi-faser release (v1.2) shipped with an updated version of DIAMOND (v0.9.13) were essentially unaffected by the change (<0.1% difference, based on results for the artificial metagenome supplied as the example dataset). According to the DIAMOND authors, more recent versions offer substantial improvements in speed and memory usage as well as bug fixes. We therefore strongly recommend always using the latest version of DIAMOND (see Section: DIAMOND upgrade). This might alter mi-faser results slightly; however, the changes are expected to add new correct annotations rather than introduce mis-annotations.

Note that we recommend downloading and compiling DIAMOND locally, as this can have a significant impact on performance (due to special CPU instructions). However, this repository also includes a pre-compiled version of DIAMOND.

Note that different split sizes can, on very rare occasions, result in minor deviations in mi-faser annotations. This is due to certain heuristics applied by DIAMOND when generating sequence alignments. We suggest keeping the same split size across analyses that are meant to be compared.

Optional extensions

  • SRA Toolkit >= 2.9.1 (NCBI)

    If installed, this enables mi-faser to automatically retrieve and process read files deposited in the NCBI Sequence Read Archive (SRA). Currently SRR, ERR and DRR identifiers are supported.
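As an illustration, a small helper can check an identifier against the three supported prefixes before it is handed to mi-faser in the sra:<accession_id> form. This is a minimal sketch and not part of mi-faser: the function name is hypothetical, and the pattern (prefix followed by digits only) is a simplifying assumption about accession syntax.

```python
import re

# Assumption: a run accession is one of the three supported prefixes
# followed by digits only. Real accessions may have further constraints.
_RUN_ACCESSION = re.compile(r"^(SRR|ERR|DRR)\d+$")


def to_sra_argument(accession: str) -> str:
    """Return the 'sra:<accession_id>' form, or raise ValueError."""
    accession = accession.strip()
    if not _RUN_ACCESSION.match(accession):
        raise ValueError(f"unsupported accession: {accession!r}")
    return f"sra:{accession}"


if __name__ == "__main__":
    # "SRR0000001" is a placeholder, not a real dataset reference.
    print(to_sra_argument("SRR0000001"))
```

Rejecting unknown prefixes early avoids starting a download that mi-faser would not be able to resolve.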

Reference Database

mi-faser was developed using a manually curated reference database of protein functions (GS database; DOI 10.5281/zenodo.1048269).

Since version 1.5, mi-faser also includes a new GS+ database, which extends the default GS database. The GS+ database adds 55 manually curated protein sequences, introducing 28 new E.C. numbers that represent important microbial functions in the environment.

To create a new reference database, refer to the paragraph Creating a reference database.


Standalone vs. Web Service

The Standalone version of mi-faser partitions the user input into subsets, analogous to the Web Service. However, these partitions are processed sequentially rather than in parallel as in the Web Service. Thus, the Standalone version is recommended only for smaller jobs and is mainly intended to provide the mi-faser code base.
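The idea of partitioning input into subsets that are then handled one after another can be sketched as follows. This is an illustrative example only, not mi-faser's actual splitting code: the function name and the in-memory list of FASTA lines are assumptions made for the sketch.

```python
from typing import Iterable, Iterator, List


def split_fasta(lines: Iterable[str], records_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of FASTA lines holding at most records_per_chunk records each."""
    if records_per_chunk <= 0:
        raise ValueError("records_per_chunk must be positive")
    chunk: List[str] = []
    records_in_chunk = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header starts a new record; close the chunk if it is full.
            if records_in_chunk == records_per_chunk:
                yield chunk
                chunk, records_in_chunk = [], 0
            records_in_chunk += 1
        chunk.append(line)
    if chunk:
        yield chunk


if __name__ == "__main__":
    demo = [">r1", "ACGT", ">r2", "GGCC", ">r3", "TTAA"]
    for part in split_fasta(demo, 2):  # partitions are handled sequentially
        print(part)
```

Because the function is a generator, only one partition needs to be held in memory at a time, which mirrors sequential processing.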

Python package

mi-faser is available as a Python package. To install mi-faser using pip, run:

pip install mifaser

mi-faser can then be used directly from the command line:


The mi-faser module can be imported in a Python project by import mifaser.


The pre-built mi-faser Docker image is probably the most convenient way to run mi-faser locally or on any cloud infrastructure. The Docker image can be used in the same way as the standalone version; however, a common working directory must be mounted into the container.

To create and execute a single instance of mi-faser using a locally mounted working directory run:

docker run --rm \
    -v <LOCAL_INPUT_DIRECTORY>:/input \
    -v <LOCAL_OUTPUT_DIRECTORY>:/output \
    bromberglab/mifaser -f <INPUT_FILE>

<INPUT_FILE> is a valid mi-faser input file located in <LOCAL_INPUT_DIRECTORY> on your host environment. By default, mi-faser reads input files relative to /input and writes all output to /output. Thus, by bind mounting your local <LOCAL_INPUT_DIRECTORY> to /input inside the Docker container, input files can be passed simply as paths relative to <LOCAL_INPUT_DIRECTORY>. Similarly, by mounting a <LOCAL_OUTPUT_DIRECTORY> to /output inside the Docker container, all mi-faser outputs can be accessed in <LOCAL_OUTPUT_DIRECTORY>.
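When the container is launched from a script, the same invocation can be assembled as an argument list. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the host paths are converted to absolute paths because Docker bind mounts given with -v expect an absolute host path.

```python
import os
import shlex
from typing import List


def docker_command(input_dir: str, output_dir: str, input_file: str) -> List[str]:
    """Build the 'docker run' argument list for a single mi-faser run."""
    in_dir = os.path.abspath(input_dir)
    out_dir = os.path.abspath(output_dir)
    return [
        "docker", "run", "--rm",
        "-v", f"{in_dir}:/input",     # host input directory -> /input
        "-v", f"{out_dir}:/output",   # host output directory -> /output
        "bromberglab/mifaser", "-f", input_file,
    ]


if __name__ == "__main__":
    # Directory and file names below are placeholders.
    cmd = docker_command("data/in", "data/out", "reads.fasta")
    print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True). Note that input_file stays relative, since it is resolved against /input inside the container.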

Python source (git repository)

Open a terminal and check out the mi-faser repository:

git clone

or download the zipped version:

curl --remote-name


In case mi-faser was downloaded using the git repository:

  • navigate to the mi-faser repository base directory
  • all examples in the following documentation have to be run using python -m mifaser instead of mifaser.

Run mi-faser (Single or 2-Lane mode)

Single: input-file containing DNA reads, single http[s]/ftp[s] url or SRA accession ID (sra:<accession_id>):

mifaser -f/--inputfile <INPUT_FILE>

2-Lane: two files (R1/R2), http[s]/ftp[s] urls or SRA accession IDs (sra:<accession_id1> sra:<accession_id2>):

mifaser -l/--lanes <R1_FILE> <R2_FILE>
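For scripted runs, the choice between the two modes can be derived from the number of inputs. This is a minimal sketch, assuming the mifaser command is on the PATH; the helper name is hypothetical, and only the -f and -l flags shown above are used.

```python
from typing import List, Sequence


def mifaser_input_args(inputs: Sequence[str]) -> List[str]:
    """Return the mifaser argument list for Single or 2-Lane mode."""
    if len(inputs) == 1:
        # Single mode: one file, URL or sra:<accession_id>
        return ["mifaser", "-f", inputs[0]]
    if len(inputs) == 2:
        # 2-Lane mode: R1 and R2, in that order
        return ["mifaser", "-l", inputs[0], inputs[1]]
    raise ValueError("expected one input (Single) or two inputs (2-Lane)")


if __name__ == "__main__":
    # File names below are placeholders.
    print(mifaser_input_args(["reads.fasta"]))
    print(mifaser_input_args(["reads_R1.fastq", "reads_R2.fastq"]))
```

If mi-faser was obtained from the git repository, the leading "mifaser" element would need to be replaced by the python -m mifaser form described above.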


mi-faser help:

usage: mifaser [-h] [-f INPUTFILE] [-l R1 R2] [-o OUTPUTFOLDER]
               [-d DATABASEFOLDER] [-i DIAMONDFOLDER] [-m] [-s SPLIT]
               [-S [SPLITMB]] [-t THREADS] [-c CPU] [-p] [-n] [-u UPDATE]
               [-D [arg [arg ...]]] [-v] [-q] [--version]

mi-faser, microbiome - functional annotation of sequencing reads

a super-fast ( < 10min/10GB of reads ) and accurate ( > 90% precision ) method
for annotation of molecular functionality encoded in sequencing read data
without the need for assembly or gene finding.

Public web service:

Version: 1.60 [03/23/20]

optional arguments:
  -h, --help            show this help message and exit
  -f INPUTFILE, --inputfile INPUTFILE
                        input DNA reads file, http[s]/ftp[s] url or SRA
                        accession id (sra:<id>)
  -l R1 R2, --lanes R1 R2
                        2-Lane format (R1/R2) files, http[s]/ftp[s] url or SRA
                        accession ids (sra:<id_1> sra:<id_2>)
  -o OUTPUTFOLDER       path to base output folder; default: INPUTFILE_out
  -d DATABASEFOLDER     name of database located in database/ directory OR
                        absolute path to folder containing database files
  -i DIAMONDFOLDER      path to folder containing diamond binary
  -m, --mapping         if flag is set all reads mappings will be generated
                        (reads{n=*} -> EC{n=1}, fasta)
  -s SPLIT, --split SPLIT
                        split by X sequences; default: 100k; 0 forces no split
  -S [SPLITMB], --splitmb [SPLITMB]
                        split by X MB; default: 25; (requires split from GNU
  -t THREADS, --threads THREADS
                        number of threads; default: 1
  -c CPU, --cpu CPU     max cpus per thread; default: all available
  -p, --preserve        if flag is set intermediate results are kept
  -n, --no-check        if flag is set check for compatibility between diamond
                        database and binary is omitted
  -u UPDATE, --update UPDATE
                        valid update commands: { diamond[:version] }
  -D [arg [arg ...]], --createdb [arg [arg ...]]
                        create new reference database: <db_name>
                        <db_sequences.fasta> [merge_db=<name of db to merge
                        with>] [update_ec_annotations=<1|0>; default: 0]
  -v, --verbose         set verbosity level; default: log level INFO
  -q, --quiet           if flag is set console output is logged to file
  --version             show program's version number and exit

If you use *mi-faser* in published research, please cite:

Zhu, C., Miller, M., ... Bromberg, Y. (2017).
Functional sequencing read annotation for high precision microbiome analysis.
Nucleic Acids Res. [doi:10.1093/nar/gkx1209]

mi-faser is developed by Chengsheng Zhu and Maximilian Miller.
Feel free to contact us for support at

This project is licensed under NPOSL-3.0.

Test: mifaser -f mifaser/files/test/artificial_mg.fasta -o mifaser/files/test/out


A demo dataset containing 10k reads is provided to verify a local mi-faser installation. Navigate to the mifaser repository base directory and run mi-faser with the following arguments:

mifaser -f mifaser/files/test/artificial_mg.fasta -o mifaser/files/test/out

The resulting analysis will be located relative to the mifaser base directory at: mifaser/files/test/out/.

DIAMOND upgrade

As DIAMOND is actively developed, we provide an easy way to upgrade (or downgrade) to another version. If a specific version of DIAMOND is given as a parameter, that version is downloaded and installed automatically (default: latest release).

mifaser --update diamond[:<DIAMOND_VERSION>]
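The optional version suffix can be handled with a small helper when the update is triggered from a script. This is a sketch only: the helper name is hypothetical, and the exact version string format expected by mi-faser (for example with or without a leading "v") is an assumption that should be checked against the help output.

```python
from typing import List, Optional


def diamond_update_args(version: Optional[str] = None) -> List[str]:
    """Build 'mifaser --update diamond[:<DIAMOND_VERSION>]' as an argument list."""
    # Without a version, mi-faser defaults to the latest release.
    target = "diamond" if version is None else f"diamond:{version}"
    return ["mifaser", "--update", target]


if __name__ == "__main__":
    print(diamond_update_args())
    # "0.9.13" is used purely as an example value.
    print(diamond_update_args("0.9.13"))
```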

Creating a reference database

mi-faser uses a manually curated reference database of protein functions. To create an alternative reference database, first store the desired set of protein sequences in a multi-FASTA file using the following format for the sequence headers:




Then run mi-faser using the -D/--createdb argument to create a new reference database my_database:

mifaser -D my_database path/to/sequences.fasta

To use the new database run:

mifaser -d my_database -f mifaser/files/test/artificial_mg.fasta -o mifaser/files/test/out
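The help text also lists two optional key=value arguments for -D/--createdb (merge_db and update_ec_annotations). A minimal sketch of assembling such a call is shown below; the helper name is hypothetical, and the way the optional arguments are appended follows the usage string in the help output rather than tested behavior.

```python
from typing import List, Optional


def createdb_args(
    db_name: str,
    fasta_path: str,
    merge_db: Optional[str] = None,
    update_ec_annotations: bool = False,
) -> List[str]:
    """Build the argument list for 'mifaser -D <db_name> <db_sequences.fasta> ...'."""
    args = ["mifaser", "-D", db_name, fasta_path]
    if merge_db is not None:
        # Name of an existing database to merge with
        args.append(f"merge_db={merge_db}")
    if update_ec_annotations:
        # The documented default is 0, so the flag is only added when enabled.
        args.append("update_ec_annotations=1")
    return args


if __name__ == "__main__":
    # "existing_db" is a placeholder database name.
    print(createdb_args("my_database", "path/to/sequences.fasta"))
    print(createdb_args("my_database", "path/to/sequences.fasta", merge_db="existing_db"))
```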

See the help menu (--help) for more details.


This project is licensed under NPOSL-3.0.


If you use mi-faser in published research, please cite:

Zhu, C., Miller, M., Marpaka, S., Vaysberg, P., Rühlemann, M. C., Wu, G. H. F.-A., . . . Bromberg, Y. (2017). Functional sequencing read annotation for high precision microbiome analysis. Nucleic Acids Res. doi:10.1093/nar/gkx1209


mi-faser is developed by Chengsheng Zhu and Maximilian Miller. Feel free to contact us for support:
