
MUSiCC: A marker genes based framework for metagenomic normalization and accurate profiling of gene abundances in the microbiome.

====================
MUSiCC Documentation
====================

MUSiCC is a marker-gene-based framework for metagenomic normalization and accurate profiling of gene abundances in the microbiome,
developed and maintained by the Borenstein group at the University of Washington.
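
As a rough illustration of the normalization idea, the sketch below divides each gene's abundance in a sample by the median abundance of a set of marker genes in that same sample, so that values read approximately as copies per genome. This is a simplified sketch under stated assumptions, not MUSiCC's implementation: the marker IDs ``M1``, ``M2`` and ``M3`` are hypothetical placeholders (not MUSiCC's actual marker set), and ``normalize_sample`` is an illustrative helper only.

```python
from statistics import median

# Hypothetical marker gene IDs, for illustration only;
# they are NOT the marker set MUSiCC actually uses.
MARKER_GENES = {"M1", "M2", "M3"}


def normalize_sample(abundances: dict[str, float],
                     markers: set[str] = MARKER_GENES) -> dict[str, float]:
    """Divide every gene's abundance by the median abundance of the
    marker genes present in this sample (simplified sketch)."""
    marker_values = [abundances[g] for g in markers if g in abundances]
    if not marker_values:
        raise ValueError("no marker genes found in sample")
    scale = median(marker_values)
    if scale == 0:
        raise ValueError("median marker abundance is zero")
    return {gene: value / scale for gene, value in abundances.items()}


# Toy sample: marker median is 10, so geneA (20) becomes 2.0.
sample = {"M1": 10.0, "M2": 12.0, "M3": 8.0, "geneA": 20.0, "geneB": 5.0}
result = normalize_sample(sample)
print(result)
```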

============
Availability
============

MUSiCC is available through the following sources:

- As a Python module from GitHub or PyPI (see installation instructions below)
- As an online tool at: http://elbo.gs.washington.edu/software_musicc.html.

=======
License
=======

MUSiCC is distributed under a BSD license and can be readily incorporated into custom analysis tools.

=========================
Installation Instructions
=========================

**Prerequisites:**

MUSiCC requires the following Python packages to be installed on your system:

- NumPy >= 1.6.1 (http://www.numpy.org/)
- SciPy >= 0.9 (http://www.scipy.org/)
- scikit-learn >= 0.15.2 (http://scikit-learn.org/stable/)
- pandas >= 0.14 (http://pandas.pydata.org/)

If you have *pip* installed, you can install these packages by running the following command:

``pip install -U numpy scipy scikit-learn pandas``
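
To confirm that the installed versions meet these minimums, a small check along the following lines may help. It is a sketch: ``check_prerequisites``, ``meets_minimum`` and ``numeric_parts`` are hypothetical helpers, and the comparison only looks at leading numeric version components (suffixes such as ``rc1`` are ignored). It assumes Python 3.9 or later.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the prerequisites list (PyPI distribution names).
MINIMUM_VERSIONS = {
    "numpy": "1.6.1",
    "scipy": "0.9",
    "scikit-learn": "0.15.2",
    "pandas": "0.14",
}


def numeric_parts(ver: str) -> tuple[int, ...]:
    """Leading numeric components of a version, e.g. '1.26.4rc1' -> (1, 26, 4)."""
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):  # stop after a suffix such as 'rc1'
            break
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    # Tuple comparison: (1, 6) < (1, 6, 1) < (1, 26, 4)
    return numeric_parts(installed) >= numeric_parts(minimum)


def check_prerequisites() -> dict[str, str]:
    """Return a status string ('ok', 'missing' or 'too old ...') per package."""
    status = {}
    for name, minimum in MINIMUM_VERSIONS.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            status[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            status[name] = "ok"
        else:
            status[name] = f"too old ({installed} < {minimum})"
    return status


if __name__ == "__main__":
    for name, state in check_prerequisites().items():
        print(f"{name}: {state}")
```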

**Installing MUSiCC:**

To install MUSiCC from source, download the release archive from https://github.com/omanor/MUSiCC/archive/1.0.tar.gz.

After downloading MUSiCC, extract the archive. If you downloaded the release version, run:

``tar -xzf MUSiCC-1.0.tar.gz``

Then change into the new MUSiCC directory:

``cd MUSiCC-1.0``

and install using the following command:

``python setup.py install``

Alternatively, you can install MUSiCC directly from PyPI by running:

``pip install -U MUSiCC``

Note for Windows users: on some Windows installations, importing ``scipy.stats`` may fail. Workarounds can be found online, for example
`here <https://code.google.com/p/pythonxy/issues/detail?id=745>`_.

============================
Testing the software package
============================

After downloading and installing the software, we recommend testing it by running the following command:

``test_musicc.py``

This runs a series of tests. A successful run should end with:

``Ran 3 tests in X.XXXXs``

``OK``
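
To check the test outcome from a script (for example in CI), the summary can be parsed. This sketch assumes the standard ``unittest`` summary format shown above and that ``test_musicc.py`` is on your ``PATH``; ``musicc_tests_passed`` is a hypothetical helper, not part of MUSiCC.

```python
import re
import shutil
import subprocess
import sys

# Matches the "Ran N tests in X.XXXs" line printed by Python's unittest runner.
RAN_LINE = re.compile(r"^Ran (\d+) tests? in [\d.]+s$", re.MULTILINE)


def musicc_tests_passed(output: str, expected_tests: int = 3) -> bool:
    """True if the output reports the expected test count and ends with OK."""
    match = RAN_LINE.search(output)
    if match is None or int(match.group(1)) != expected_tests:
        return False
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Accept "OK" as well as e.g. "OK (skipped=1)".
    return lines[-1].startswith("OK")


if __name__ == "__main__":
    script = shutil.which("test_musicc.py")
    if script is None:
        print("test_musicc.py not found on PATH")
    else:
        proc = subprocess.run([sys.executable, script], capture_output=True, text=True)
        # unittest writes its summary to stderr, so check both streams.
        print("passed" if musicc_tests_passed(proc.stdout + proc.stderr) else "failed")
```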

===============================
MUSiCC API via the command line
===============================
The MUSiCC module performs all calculations internally; the ``run_musicc.py`` script exposes its functionality on the command line.

Usage:
------

``run_musicc.py input_file [options]``

Required arguments:
-------------------

**input_file**
Input abundance file to correct

Optional arguments:
-------------------

**-h, --help**
Show this help message and exit

**-o OUTPUT_FILE, --out OUTPUT_FILE**
Output destination for corrected abundance (default: MUSiCC.tab)

**-if {tab,csv}, --input_format {tab,csv}**
Option indicating the format of the input file (default: tab)

**-of {tab,csv}, --output_format {tab,csv}**
Option indicating the format of the output file (default: tab)

**-n, --normalize**
Apply MUSiCC normalization (default: false)

**-c {use_generic,learn_model}, --correct {use_generic,learn_model}**
Correct abundances per sample using MUSiCC (default: no correction)

**-perf, --performance**
Calculate model performance on various gene sets (may add to running time) (default: false)

**-v, --verbose**
Increase verbosity of module (default: false)
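
When driving MUSiCC from another program, it can help to assemble the command line from the options above. ``build_musicc_command`` is a hypothetical helper that emits only the flags documented here; the input file name in the usage example is a placeholder. The resulting list can be passed to ``subprocess.run``.

```python
def build_musicc_command(input_file, output_file="MUSiCC.tab", input_format="tab",
                         output_format="tab", normalize=False, correct=None,
                         performance=False, verbose=False):
    """Assemble a run_musicc.py argument list from the documented options."""
    if input_format not in ("tab", "csv") or output_format not in ("tab", "csv"):
        raise ValueError("formats must be 'tab' or 'csv'")
    if correct not in (None, "use_generic", "learn_model"):
        raise ValueError("correct must be None, 'use_generic' or 'learn_model'")
    cmd = ["run_musicc.py", input_file, "-o", output_file,
           "-if", input_format, "-of", output_format]
    if normalize:
        cmd.append("-n")            # apply MUSiCC normalization
    if correct is not None:
        cmd += ["-c", correct]      # per-sample correction model
    if performance:
        cmd.append("-perf")         # compute model performance
    if verbose:
        cmd.append("-v")
    return cmd


if __name__ == "__main__":
    # 'my_abundances.tab' is a placeholder input file name.
    cmd = build_musicc_command("my_abundances.tab", normalize=True,
                               correct="use_generic", verbose=True)
    print(" ".join(cmd))
    # To execute it: subprocess.run(cmd, check=True)
```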


============================
MUSiCC API via python script
============================
MUSiCC can also be used directly from a Python script. Options are passed by creating a dictionary and passing it to the
function *correct_and_normalize*, as shown below.

Usage:
------

>>> import os, musicc
>>> from musicc.core import correct_and_normalize
>>> example = os.path.join(os.path.dirname(musicc.__file__), 'examples', 'simulated_ko_relative_abundance.tab')
>>> musicc_args = {'input_file': example, 'output_file': 'MUSiCC.tab', 'input_format': 'tab', 'output_format': 'tab',
...                'musicc_inter': True, 'musicc_intra': 'learn_model', 'compute_scores': True, 'verbose': True}
>>> correct_and_normalize(musicc_args)

Required arguments:
-------------------

**input_file**
Input abundance file to correct

Optional arguments:
-------------------

**output_file**
Output destination for corrected abundance (default: MUSiCC.tab)

**input_format {'tab','csv'}**
Option indicating the format of the input file (default: 'tab')

**output_format {'tab','csv'}**
Option indicating the format of the output file (default: 'tab')

**musicc_inter {True, False}**
Apply MUSiCC normalization (default: False)

**musicc_intra {'use_generic', 'learn_model', 'None'}**
Correct abundance per-sample using MUSiCC (default: 'None')

**compute_scores {True, False}**
Calculate model performance on various gene sets (may add to running time) (default: False)

**verbose {True, False}**
Increase verbosity of module (default: False)
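
The dictionary can also be assembled with the documented defaults filled in and the choice-valued options checked before calling *correct_and_normalize*. ``make_musicc_args`` is a hypothetical helper, not part of the MUSiCC package; since this documentation does not say whether *correct_and_normalize* fills in missing keys itself, the sketch simply supplies all of them.

```python
# Defaults as documented for the correct_and_normalize() dictionary.
MUSICC_DEFAULTS = {
    "output_file": "MUSiCC.tab",
    "input_format": "tab",
    "output_format": "tab",
    "musicc_inter": False,
    "musicc_intra": "None",  # documented as the string 'None'
    "compute_scores": False,
    "verbose": False,
}

# Allowed values for the options that take a fixed set of choices.
ALLOWED = {
    "input_format": {"tab", "csv"},
    "output_format": {"tab", "csv"},
    "musicc_intra": {"use_generic", "learn_model", "None"},
}


def make_musicc_args(input_file, **overrides):
    """Return a complete option dictionary, rejecting unknown keys and bad choices."""
    unknown = set(overrides) - set(MUSICC_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown MUSiCC option(s): {sorted(unknown)}")
    args = {"input_file": input_file, **MUSICC_DEFAULTS, **overrides}
    for key, allowed in ALLOWED.items():
        if args[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}, got {args[key]!r}")
    return args


if __name__ == "__main__":
    args = make_musicc_args("simulated_ko_relative_abundance.tab",
                            musicc_inter=True, musicc_intra="learn_model")
    print(args)
    # With MUSiCC installed:
    # from musicc.core import correct_and_normalize
    # correct_and_normalize(args)
```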

========
Examples
========
In the *musicc/examples* directory, the file *simulated_ko_relative_abundance.tab* contains simulated KO abundance measurements for the 20 samples described in the
MUSiCC manuscript. Running MUSiCC on this file produces the following files:

- simulated_ko_MUSiCC_Normalized.tab (only normalization)
- simulated_ko_MUSiCC_Normalized_Corrected_use_generic.tab (normalize and correct using the generic model learned from Human Microbiome Project (HMP) data)
- simulated_ko_MUSiCC_Normalized_Corrected_learn_model.tab (normalize and correct by learning a new model for each sample)

These files were generated with the following command-line calls:

``run_musicc.py musicc/examples/simulated_ko_relative_abundance.tab -n -perf -v -o musicc/examples/simulated_ko_MUSiCC_Normalized.tab``

``run_musicc.py musicc/examples/simulated_ko_relative_abundance.tab -n -c use_generic -perf -v -o musicc/examples/simulated_ko_MUSiCC_Normalized_Corrected_use_generic.tab``

``run_musicc.py musicc/examples/simulated_ko_relative_abundance.tab -n -c learn_model -perf -v -o musicc/examples/simulated_ko_MUSiCC_Normalized_Corrected_learn_model.tab``
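
The same three runs can be reproduced through the Python API. This sketch assumes the flags map onto the dictionary keys with matching descriptions (``-n`` to ``musicc_inter``, ``-c`` to ``musicc_intra``, ``-perf`` to ``compute_scores``, ``-v`` to ``verbose``) and, like the commands above, that it is run from the directory containing *musicc/examples*. ``example_configs`` is a hypothetical helper.

```python
import os

EXAMPLES_DIR = os.path.join("musicc", "examples")
INPUT = os.path.join(EXAMPLES_DIR, "simulated_ko_relative_abundance.tab")

# Output-file suffix -> value given to -c (None means no -c flag).
RUNS = {
    "Normalized": None,
    "Normalized_Corrected_use_generic": "use_generic",
    "Normalized_Corrected_learn_model": "learn_model",
}


def example_configs():
    """Option dictionaries mirroring the three run_musicc.py commands."""
    configs = []
    for suffix, correction in RUNS.items():
        configs.append({
            "input_file": INPUT,
            "output_file": os.path.join(EXAMPLES_DIR, f"simulated_ko_MUSiCC_{suffix}.tab"),
            "input_format": "tab",
            "output_format": "tab",
            "musicc_inter": True,                  # -n
            "musicc_intra": correction or "None",  # -c, or no correction
            "compute_scores": True,                # -perf
            "verbose": True,                       # -v
        })
    return configs


if __name__ == "__main__":
    if not os.path.exists(INPUT):
        print(f"{INPUT} not found; run from the directory containing musicc/examples")
    else:
        from musicc.core import correct_and_normalize
        for config in example_configs():
            correct_and_normalize(config)
```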

==================
Citing Information
==================

If you use the MUSiCC software, please cite the following paper:

MUSiCC: A marker genes based framework for metagenomic normalization and accurate profiling of gene abundances in the microbiome.
**Ohad Manor and Elhanan Borenstein.** *Genome Biology*

==================
Question forum
==================
For MUSiCC announcements and questions, including notification of new releases, you can visit the `MUSiCC users forum <https://groups.google.com/forum/#!forum/musicc-users>`_.


=======
HISTORY
=======

=========================
1.0.2 (17 November, 2016)
=========================
* Replaced ``scipy.stats.nanmedian`` with ``numpy.nanmedian``, since
  ``scipy.stats.nanmedian`` was deprecated in SciPy 0.15

====================
1.0.1 (3 June, 2015)
====================
* Fixed crashes when running on extremely large files

=====================
1.0 (5 January, 2015)
=====================
* Initial release




=======
Authors
=======

MUSiCC is written and maintained by Ohad Manor and the Borenstein group at the University of Washington.
