
MUSiCC: A marker genes based framework for metagenomic normalization and accurate profiling of gene abundances in the microbiome.


MUSiCC Documentation

MUSiCC is a marker genes based framework for metagenomic normalization and accurate profiling of gene abundances in the microbiome, developed and maintained by the Borenstein group at the University of Washington.

Availability

MUSiCC is available from GitHub (https://github.com/omanor/MUSiCC) and from PyPI.

License

MUSiCC is distributed under a BSD license and can be readily incorporated into custom analysis tools.

Installation Instructions

Prerequisites for installing:

For MUSiCC to run successfully, the following Python modules should be pre-installed on your system: NumPy, SciPy, scikit-learn, and pandas.

If you have pip installed, you can install these packages by running the following command:

pip install -U numpy scipy scikit-learn pandas
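Before installing MUSiCC, it can help to confirm that the prerequisites are importable. The following is a minimal sketch, not part of MUSiCC itself; the helper name missing_packages is hypothetical, and the only assumption is that scikit-learn is imported under the module name sklearn.

```python
import importlib.util

# Map pip package names to import names. They differ in one case:
# scikit-learn is imported as "sklearn".
REQUIRED_MODULES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "pandas": "pandas",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All MUSiCC prerequisites are importable.")
```

Any package reported as missing can then be installed with the pip command above.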

Installing MUSiCC:

To install MUSiCC, download the package from https://github.com/omanor/MUSiCC/archive/1.0.3.tar.gz

After downloading MUSiCC, extract the archive. If you've downloaded the release version, do this with the following command:

tar -xzf MUSiCC-1.0.3.tar.gz

You’ll then change into the new MUSiCC directory as follows:

cd MUSiCC-1.0.3

and install using the following command:

python setup.py install

Alternatively, you can install MUSiCC directly from PyPI by running:

pip install -U MUSiCC

Note for Windows users: under some Windows installations, SciPy may fail when importing the stats module. Workarounds can be found online.

Testing the software package

After downloading and installing the software, we recommend testing it by running the following command:

test_musicc.py

This will invoke a series of tests. A correct output should end with:

Ran 3 tests in X.XXXXs

OK

MUSiCC API via the command line

The MUSiCC module handles all calculations internally. Its functionality is exposed on the command line through the run_musicc.py script.

Usage:

run_musicc.py input_file [options]

Required arguments:

input_file

Input abundance file to correct

Optional arguments:

-h, --help

show help message and exit

-o OUTPUT_FILE, --out OUTPUT_FILE

Output destination for corrected abundance (default: MUSiCC.tab)

-if {tab,csv}, --input_format {tab,csv}

Option indicating the format of the input file (default: tab)

-of {tab,csv}, --output_format {tab,csv}

Option indicating the format of the output file (default: tab)

-n, --normalize

Apply MUSiCC normalization (default: false)

-c {use_generic,learn_model}, --correct {use_generic,learn_model}

Correct abundance per-sample using MUSiCC (default: false)

-perf, --performance

Calculate model performance on various gene sets (may add to running time) (default: false)

-v, --verbose

Increase verbosity of module (default: false)
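When MUSiCC is driven from another program, the command line can be assembled from the options listed above. The following is a rough sketch using only the short flags documented here; the function names build_musicc_command and run_musicc are hypothetical helpers, and the sketch assumes run_musicc.py is on your PATH after installation.

```python
import subprocess


def build_musicc_command(input_file, output_file=None, normalize=False,
                         correct=None, performance=False, verbose=False):
    """Assemble an argument list for run_musicc.py from the documented flags."""
    if correct not in (None, "use_generic", "learn_model"):
        raise ValueError("correct must be 'use_generic' or 'learn_model'")
    cmd = ["run_musicc.py", input_file]
    if output_file is not None:
        cmd += ["-o", output_file]
    if normalize:
        cmd.append("-n")
    if correct is not None:
        cmd += ["-c", correct]
    if performance:
        cmd.append("-perf")
    if verbose:
        cmd.append("-v")
    return cmd


def run_musicc(**kwargs):
    """Run the assembled command (assumes run_musicc.py is on PATH)."""
    return subprocess.run(build_musicc_command(**kwargs), check=True)
```

For example, run_musicc(input_file="abundance.tab", normalize=True, correct="use_generic") would invoke run_musicc.py abundance.tab -n -c use_generic, where abundance.tab stands in for your own input file.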

MUSiCC API via Python script

MUSiCC can also be used directly inside a Python script. Variables and flags are passed to MUSiCC by creating a dictionary and handing it to the function correct_and_normalize, as shown below.

Usage:

>>> from musicc.core import correct_and_normalize
>>> musicc_args = {'input_file': 'test_musicc/lib/python3.3/site-packages/musicc/examples/simulated_ko_relative_abundance.tab', 'output_file': 'MUSiCC.tab', 'input_format': 'tab', 'output_format': 'tab', 'musicc_inter': True, 'musicc_intra': 'learn_model', 'compute_scores': True, 'verbose': True}
>>> correct_and_normalize(musicc_args)

Required arguments:

input_file

Input abundance file to correct

Optional arguments:

output_file

Output destination for corrected abundance (default: MUSiCC.tab)

input_format {'tab', 'csv'}

Option indicating the format of the input file (default: 'tab')

output_format {'tab', 'csv'}

Option indicating the format of the output file (default: 'tab')

musicc_inter {True, False}

Apply MUSiCC normalization (default: False)

musicc_intra {'use_generic', 'learn_model', 'None'}

Correct abundance per-sample using MUSiCC (default: 'None')

compute_scores {True, False}

Calculate model performance on various gene sets (may add to running time) (default: False)

verbose {True, False}

Increase verbosity of module (default: False)
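Because the arguments are passed as a plain dictionary, a misspelled key is easy to miss. The following is a minimal sketch of a helper that fills in the defaults listed above and rejects unknown keys. The names MUSICC_DEFAULTS and build_musicc_args are hypothetical and not part of the MUSiCC package; the sketch also assumes that supplying every documented key explicitly is acceptable to correct_and_normalize, as in the usage example above.

```python
# Defaults copied from the argument list in this documentation.
MUSICC_DEFAULTS = {
    "output_file": "MUSiCC.tab",
    "input_format": "tab",
    "output_format": "tab",
    "musicc_inter": False,
    "musicc_intra": "None",   # documented as the string 'None'
    "compute_scores": False,
    "verbose": False,
}


def build_musicc_args(input_file, **overrides):
    """Return a full argument dictionary, rejecting undocumented keys."""
    unknown = set(overrides) - set(MUSICC_DEFAULTS)
    if unknown:
        raise KeyError("Unknown MUSiCC option(s): " + ", ".join(sorted(unknown)))
    args = dict(MUSICC_DEFAULTS)   # copy so the defaults stay untouched
    args.update(overrides)
    args["input_file"] = input_file
    return args
```

The resulting dictionary can then be passed on, for example correct_and_normalize(build_musicc_args('my_abundance.tab', musicc_inter=True)), where my_abundance.tab is a placeholder for your own file.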

Examples

In the musicc/examples directory, the file simulated_ko_relative_abundance.tab contains simulated KO abundance measurements of 20 samples described in the MUSiCC manuscript. Using this file as input for MUSiCC results in the following files:

  • simulated_ko_MUSiCC_Normalized.tab (only normalization)

  • simulated_ko_MUSiCC_Normalized_Corrected_use_generic.tab (normalize and correct using the generic model learned from HMP)

  • simulated_ko_MUSiCC_Normalized_Corrected_learn_model.tab (normalize and correct learning a new model for each sample)

The commands used were the following (via command line):

run_musicc.py musicc/examples/simulated_ko_relative_abundance.tab -n -perf -v -o musicc/examples/simulated_ko_MUSiCC_Normalized.tab

run_musicc.py musicc/examples/simulated_ko_relative_abundance.tab -n -c use_generic -perf -v -o musicc/examples/simulated_ko_MUSiCC_Normalized_Corrected_use_generic.tab

run_musicc.py musicc/examples/simulated_ko_relative_abundance.tab -n -c learn_model -perf -v -o musicc/examples/simulated_ko_MUSiCC_Normalized_Corrected_learn_model.tab
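The same three runs can be sketched through the Python API. This assumes that the command-line flags map onto the dictionary keys with matching descriptions (-n to musicc_inter, -c to musicc_intra, -perf to compute_scores, -v to verbose), which the argument lists above suggest but do not state outright. The names example_args, EXAMPLE_RUNS, and run_examples are illustrative only.

```python
INPUT = "musicc/examples/simulated_ko_relative_abundance.tab"
OUT_PREFIX = "musicc/examples/simulated_ko_MUSiCC_Normalized"


def example_args(intra, suffix):
    """Build the argument dictionary for one of the three example runs."""
    return {
        "input_file": INPUT,
        "output_file": OUT_PREFIX + suffix + ".tab",
        "input_format": "tab",
        "output_format": "tab",
        "musicc_inter": True,      # assumed equivalent of -n
        "musicc_intra": intra,     # assumed equivalent of -c
        "compute_scores": True,    # assumed equivalent of -perf
        "verbose": True,           # assumed equivalent of -v
    }


EXAMPLE_RUNS = [
    example_args("None", ""),                                # normalization only
    example_args("use_generic", "_Corrected_use_generic"),   # generic model
    example_args("learn_model", "_Corrected_learn_model"),   # per-sample model
]


def run_examples():
    """Run all three examples; requires MUSiCC to be installed."""
    from musicc.core import correct_and_normalize
    for args in EXAMPLE_RUNS:
        correct_and_normalize(args)
```

Calling run_examples() from the directory that contains musicc/examples should then write the three output files named above.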

Citing Information

If you use the MUSiCC software, please cite the following paper:

MUSiCC: A marker genes based framework for metagenomic normalization and accurate profiling of gene abundances in the microbiome. Ohad Manor and Elhanan Borenstein. Genome Biology

Question forum

For MUSiCC announcements and questions, including notification of new releases, you can visit the MUSiCC users forum.

HISTORY

1.0.4 (7 November, 2019)

  • Fixed indexing bug when reporting learned model performance

1.0.3 (4 August, 2019)

  • Fixed deprecated imports from scikit-learn

  • Added more informative error message when input data contains fewer than 5 USCGs

1.0.2 (17 November, 2016)

  • Replaced scipy.stats.nanmedian with numpy.nanmedian since scipy.stats.nanmedian was deprecated in scipy 0.15

1.0.1 (3 June, 2015)

  • Fixed crashes when running on extremely large files

1.0 (5 January, 2015)

  • Initial release

Authors

MUSiCC is written and maintained by Ohad Manor and the Borenstein group at the University of Washington.
