
EMPANADA: a tool for evidence-based assignment of genes to pathways in metagenomic data


EMPANADA Documentation

EMPANADA is a tool for evidence-based assignment of genes to pathways in metagenomic data, developed and maintained by the Borenstein group at the University of Washington.

Availability

EMPANADA is available as a Python module from GitHub or PyPI (see the installation instructions below).

License

EMPANADA is distributed under a non-commercial license (see LICENSE).

Installation Instructions

Prerequisites for installing:

In order for EMPANADA to run successfully, the following Python modules should be pre-installed on your system:

  • numpy (version 1.16.0 or later)

  • pandas

If you have pip installed, you can install these packages by running the following command:

pip install -U numpy pandas
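The History section raises the minimum numpy version to 1.16.0. Before installing, a short Python sketch like the one below can report which prerequisite versions are installed. It assumes Python 3.8+ (for importlib.metadata), and the helper names parse_version and check_prerequisites are hypothetical. They are not part of EMPANADA.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum numpy version stated in the EMPANADA 0.0.3 release notes.
MIN_NUMPY = (1, 16, 0)


def parse_version(text):
    """Turn '1.16.0' or '2.0.0rc1' into a 3-tuple of ints.

    Only the leading digits of each component are kept, so pre-release
    suffixes such as 'rc1' are ignored (a simplification).
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_prerequisites():
    """Return {package: installed version string, or None if missing}."""
    report = {}
    for package in ("numpy", "pandas"):
        try:
            report[package] = version(package)
        except PackageNotFoundError:
            report[package] = None
    return report


if __name__ == "__main__":
    for package, installed in check_prerequisites().items():
        if installed is None:
            print(f"{package}: not installed")
        elif package == "numpy" and parse_version(installed) < MIN_NUMPY:
            print(f"numpy {installed}: older than 1.16.0, please upgrade")
        else:
            print(f"{package} {installed}: OK")
```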

Installing EMPANADA:

To install EMPANADA, download the package from https://github.com/borenstein-lab/empanada/archive/0.0.3.tar.gz

After downloading EMPANADA, you’ll need to extract the archive. If you’ve downloaded the release version, do this with the following command:

tar -xzf empanada-0.0.3.tar.gz

You’ll then change into the new EMPANADA directory as follows:

cd empanada-0.0.3

and install using the following command:

python setup.py install

ALTERNATIVELY, you can install EMPANADA directly from PyPI by running:

pip install -U empanada

Testing the software package

After downloading and installing the software, we recommend testing it by running the following command:

test_empanada.py

This will invoke a series of tests. A correct output should end with:

Ran 1 tests in X.XXXXs

OK

EMPANADA API via the command line

The EMPANADA module handles all calculations internally and exposes its functionality on the command line through the run_empanada.py script.

Usage:

run_empanada.py -ko KO_ABUNDANCE_FILE -ko2path KO_TO_PATHWAY_FILE [options]
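When EMPANADA runs as one step of a larger Python workflow, the usage line above can be assembled into an argument list and passed to subprocess.run. The sketch below uses only flags documented in this README. It assumes run_empanada.py is on your PATH after installation, and the helper names build_empanada_command and run_empanada are hypothetical.

```python
import subprocess

# Mapping methods accepted by -map, as listed in this README.
MAPPING_METHODS = {"naive", "by_support", "by_sum_abundance", "by_avg_abundance"}


def build_empanada_command(ko_file, ko2path_file, output="out.tab",
                           mapping_method="naive", threshold=None,
                           fraction=False, extra_flags=()):
    """Assemble a run_empanada.py argument list (hypothetical helper).

    Only flags documented in the EMPANADA README are used; the input
    files themselves are not validated.
    """
    if mapping_method not in MAPPING_METHODS:
        raise ValueError(f"unknown mapping method: {mapping_method}")
    cmd = ["run_empanada.py", "-ko", ko_file, "-ko2path", ko2path_file,
           "-o", output, "-map", mapping_method]
    if threshold is not None:
        cmd += ["-threshold", str(threshold)]
    if fraction:
        cmd.append("-fraction")
    cmd.extend(extra_flags)
    return cmd


def run_empanada(cmd):
    """Run EMPANADA; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Rebuilds the command from the Examples section (printed, not executed).
    example = build_empanada_command(
        "examples/simulated_ko_relative_abundance.tab",
        "data/KOvsPATHWAY_BACTERIAL_KEGG_2013_07_15.tab",
        output="examples/pathway_abundance_empanada.tab",
        mapping_method="by_avg_abundance", threshold=0, fraction=True,
        extra_flags=["-leave_one_ko_out_pathway_support",
                     "-use_only_non_overlapping_genes"])
    print(" ".join(example))
```

Passing a list rather than a single shell string avoids quoting problems with file paths. The check=True argument makes a failed EMPANADA run raise an exception instead of passing silently.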

Required arguments:

-ko KO_ABUNDANCE_FILE

Input KO abundance file to aggregate to pathway abundance

-ko2path KO_TO_PATHWAY_FILE

Input file of KO-to-pathway mapping

Optional arguments:

-h, --help

show help message and exit

-o, --output

Output file for resulting pathway abundance (default: out.tab)

-oc, --output_counts

Output file for number of KOs mapped to each pathway (default: counts.tab)

-om, --output_mapping

Output the mapping table (either given or generated) to file; this works only with pooled mappings (default: mapping.tab)

-map {naive, by_support, by_sum_abundance, by_avg_abundance}, --mapping_method {naive, by_support, by_sum_abundance, by_avg_abundance}

Method used to map KOs to pathways (default: naive)

-compute {sum}, --compute_method {sum}

Method to compute pathway abundance from mapped KOs (default: sum)

-threshold, --abundance_threshold

Abundance threshold to include KOs (default: 0.0)

-fraction, --fractional_ko_contribution

Divide KO contributions such that they sum to 1 for each KO (default: False)

-remove_ko_with_no_pathway

Remove KOs with no pathway from analysis (default: False)

-remove_ko_with_no_abundance_measurement

Remove KOs with no measurements in the abundance table from analysis (default: False)

-transpose_ko, --transpose_ko_abundance

Transpose the given KO abundance matrix (default: False)

-transpose_output, --transpose_output

Transpose the output pathway abundance matrix (default: False)

-permute_ko_mapping

Permute the given KO mapping (i.e., which KOs map to which pathways) for hypothesis testing (default: False)

-use_only_non_overlapping_genes

If an abundance-based mapping is used (e.g., by_sum_abundance or by_avg_abundance), compute pathway support using only non-overlapping genes (default: False)

-pool_samples_use_median

If an abundance-based mapping is used (e.g., by_sum_abundance or by_avg_abundance), pool the samples using the median KO abundance and learn the mapping only once (default: False)

-pool_samples_use_average

If an abundance-based mapping is used (e.g., by_sum_abundance or by_avg_abundance), pool the samples using the average KO abundance and learn the mapping only once (default: False)

-leave_one_ko_out_pathway_support

If an abundance-based mapping is used (e.g., by_sum_abundance or by_avg_abundance), compute pathway support separately for each KO, excluding that KO from the computation (default: False)

-compute_support_with_weighted_double_counting

If an abundance-based mapping is used (e.g., by_sum_abundance or by_avg_abundance), double-count KO abundance (weighted by the mapping) when computing pathway support (default: False)

-v, --verbose

Increase verbosity of module (default: False)
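A small pure-Python sketch can make the option descriptions above concrete. It covers the simplest configuration (-map naive with -compute sum, optionally with -threshold and -fraction) and adds a shuffled mapping in the spirit of -permute_ko_mapping. It handles a single sample held as a dict. Its assumptions are illustrative and not taken from EMPANADA's source: a KO must lie strictly above the threshold, -fraction splits a KO equally among its pathways, and the permutation reassigns whole pathway lists between KOs. All KO and pathway names are made up.

```python
import random


def naive_pathway_abundance(ko_abundance, ko_to_pathways,
                            threshold=0.0, fraction=False):
    """Aggregate one sample's KO abundances into pathway abundances.

    Assumptions (illustrative, not EMPANADA's code):
      * naive mapping: a KO contributes to every pathway it is listed under;
      * a KO is kept only if its abundance is strictly above `threshold`;
      * with fraction=True, a KO in n pathways contributes abundance / n
        to each one, so its contributions sum to 1 per KO;
      * pathway abundance is the sum of the mapped contributions.
    """
    pathway_abundance = {}
    for ko, abundance in ko_abundance.items():
        pathways = ko_to_pathways.get(ko, [])
        if abundance <= threshold or not pathways:
            continue
        weight = 1.0 / len(pathways) if fraction else 1.0
        for pathway in pathways:
            pathway_abundance[pathway] = (
                pathway_abundance.get(pathway, 0.0) + weight * abundance)
    return pathway_abundance


def permute_mapping(ko_to_pathways, seed=None):
    """Shuffle which KO receives which pathway list (a simple null model).

    Each pathway list stays intact; only its assignment to KOs changes.
    """
    rng = random.Random(seed)
    kos = list(ko_to_pathways)
    pathway_lists = [ko_to_pathways[ko] for ko in kos]
    rng.shuffle(pathway_lists)
    return dict(zip(kos, pathway_lists))


if __name__ == "__main__":
    # Made-up KO and pathway identifiers, for illustration only.
    abundance = {"KO_A": 0.5, "KO_B": 0.25, "KO_C": 0.25}
    mapping = {"KO_A": ["PW_X", "PW_Y"], "KO_B": ["PW_X"], "KO_C": []}
    print(naive_pathway_abundance(abundance, mapping))
    # {'PW_X': 0.75, 'PW_Y': 0.5}
    print(naive_pathway_abundance(abundance, mapping, fraction=True))
    # {'PW_X': 0.5, 'PW_Y': 0.25}
```

The fraction=True output shows why fractional contributions matter: without them, KO_A's abundance is counted twice across PW_X and PW_Y.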

Examples

In the empanada/examples directory, the file simulated_ko_relative_abundance.tab contains simulated KO abundance measurements for 20 samples. Using this file as input for EMPANADA produces the following output file:

  • pathway_abundance_empanada.tab

The following command was used to generate it (via the command line):

run_empanada.py -ko examples/simulated_ko_relative_abundance.tab -ko2path data/KOvsPATHWAY_BACTERIAL_KEGG_2013_07_15.tab -o examples/pathway_abundance_empanada.tab -threshold 0 -map by_avg_abundance -fraction -leave_one_ko_out_pathway_support -use_only_non_overlapping_genes
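The command above combines by_avg_abundance with -fraction, -leave_one_ko_out_pathway_support and -use_only_non_overlapping_genes. This README does not define how pathway support is computed, so the following is only a sketch under an explicit assumption. A pathway's support is taken to be the sum (by_sum_abundance) or mean (by_avg_abundance) abundance of its KOs, and each KO is split across its pathways in proportion to that support. The function names and the equal-split fallback are hypothetical. This is not EMPANADA's implementation.

```python
from statistics import mean, median


def pathway_support(ko_abundance, ko_to_pathways, statistic="sum",
                    exclude_ko=None, non_overlapping_only=False):
    """Support of each pathway, computed from the abundances of its KOs.

    Assumed definition (not taken from EMPANADA's source): the sum
    ("sum", cf. by_sum_abundance) or mean ("avg", cf. by_avg_abundance)
    abundance of the KOs mapped to the pathway. `exclude_ko` leaves one KO
    out; `non_overlapping_only` keeps only KOs that map to a single pathway.
    """
    if statistic not in ("sum", "avg"):
        raise ValueError(f"unknown statistic: {statistic}")
    reduce = sum if statistic == "sum" else mean
    members = {}
    for ko, pathways in ko_to_pathways.items():
        if ko == exclude_ko or ko not in ko_abundance:
            continue
        if non_overlapping_only and len(pathways) != 1:
            continue
        for pathway in pathways:
            members.setdefault(pathway, []).append(ko_abundance[ko])
    return {pathway: reduce(values) for pathway, values in members.items()}


def abundance_based_weights(ko_abundance, ko_to_pathways, statistic="sum",
                            leave_one_out=False, non_overlapping_only=False):
    """Split each KO across its pathways in proportion to pathway support.

    Weights sum to 1 for every KO (a fractional reading, as with -fraction).
    If all candidate pathways have zero support, the KO is split equally;
    this fallback is an illustrative choice, not documented behavior.
    """
    shared = None
    if not leave_one_out:
        shared = pathway_support(ko_abundance, ko_to_pathways, statistic,
                                 non_overlapping_only=non_overlapping_only)
    weights = {}
    for ko, pathways in ko_to_pathways.items():
        if not pathways:
            continue
        if leave_one_out:
            # Recompute support without this KO's own abundance.
            support = pathway_support(ko_abundance, ko_to_pathways, statistic,
                                      exclude_ko=ko,
                                      non_overlapping_only=non_overlapping_only)
        else:
            support = shared
        raw = [support.get(pathway, 0.0) for pathway in pathways]
        total = sum(raw)
        if total > 0:
            weights[ko] = {p: r / total for p, r in zip(pathways, raw)}
        else:
            weights[ko] = {p: 1.0 / len(pathways) for p in pathways}
    return weights


def pool_samples(samples, use_median=True):
    """Collapse {sample: {ko: abundance}} into a single profile.

    Missing KOs count as 0. Taking the median or mean per KO mirrors the
    -pool_samples_use_median / -pool_samples_use_average descriptions.
    """
    kos = sorted({ko for profile in samples.values() for ko in profile})
    pick = median if use_median else mean
    return {ko: pick([profile.get(ko, 0.0) for profile in samples.values()])
            for ko in kos}
```

Under these assumptions, the combination in the example command corresponds roughly to calling abundance_based_weights with statistic="avg", leave_one_out=True and non_overlapping_only=True. Adding pool_samples first would learn the weights once from a pooled profile instead of once per sample.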

Citing Information

If you use the EMPANADA software, please cite the following paper:

Ohad Manor and Elhanan Borenstein. Functional variability in the human microbiome: More than meets the eye. In preparation.

History

0.0.3 (6 January, 2020)

  • Fixed deprecated numpy usage

  • Updated numpy version requirement to >=1.16.0

0.0.1 (9 February, 2016)

  • Initial release of beta version

Authors

EMPANADA is written and maintained by Ohad Manor and the Borenstein group at the University of Washington.

EMPANADA Software License Agreement

EMPANADA (C) 2014-2016, University of Washington. All rights reserved.

Subject to the terms below, the University of Washington (“UW”), Professor Elhanan Borenstein, and Ohad Manor (“Developer(s)”) give permission for you and other members of your laboratory for as long as they remain members (“Academic User(s)”), such permission granted solely to Academic Users in a nonprofit institution of higher education or a nonprofit research institution (“University”), to use EMPANADA solely as further detailed below. EMPANADA is a tool for evidence-based assignment of genes to pathways in metagenomic data. EMPANADA is protected by a copyright. The National Institutes of Health supported work on EMPANADA. The UW and the Developers allow Academic Users to perform, copy, and modify EMPANADA, solely for internal, non-profit academic research purposes, and as long as Academic Users comply with the terms of this EMPANADA Software License Agreement:

  1. EMPANADA is not used for any commercial purposes, or as part of a system which has commercial purposes. The EMPANADA software remains at your University and is not published, distributed, or otherwise transferred or made available to other than Academic Users.

  2. You may not distribute EMPANADA or any modification to EMPANADA to any third party.

If you wish to obtain EMPANADA software for any commercial purposes, you will need to contact the University of Washington to see if rights are available and to negotiate a commercial license and pay a fee among other requirements. This includes, but is not limited to, using EMPANADA to provide services to outside parties for a fee. In that case please contact:

UW CoMotion University of Washington 4311 11th Ave. NE, Suite 500 Seattle, WA 98105-4608 Phone: (206) 543-3970 Email: license@uw.edu

  3. You retain in EMPANADA and any modifications to EMPANADA, the copyright, trademark, patent or other notices pertaining to EMPANADA as provided by UW.

  4. You acknowledge that the Developers, the UW and its licensees may develop modifications to EMPANADA that may be substantially similar to your modifications of EMPANADA, and that the Developers, UW and its licensees shall not be constrained in any way by you in UW’s or its licensees’ use or management of such modifications. You acknowledge the right of the Developers and UW to prepare and publish modifications to EMPANADA that may be substantially similar or functionally equivalent to your modifications and improvements, and if you obtain patent protection for any modification or improvement to EMPANADA you agree not to allege or enjoin infringement of your patent by the Developers, the UW or by any of UW’s licensees obtaining modifications or improvements to EMPANADA from the University of Washington or the Developers.

  5. If utilization of the EMPANADA software results in outcomes which will be published, you will specify the version of EMPANADA you used and cite the UW Developers.

  6. Any risk associated with using the EMPANADA software at your organization is with you and your organization. EMPANADA is experimental in nature and is made available as a research courtesy “AS IS,” expressly without any obligation by UW to provide accompanying services or support.

  7. UW AND THE DEVELOPERS EXPRESSLY DISCLAIM ANY AND ALL WARRANTIES REGARDING THE SOFTWARE, WHETHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO WARRANTIES PERTAINING TO MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

  8. This Software License Agreement and all rights granted under it terminate on December 31st, 2020. Upon termination, you agree to remove so as to make unrecoverable the original EMPANADA software, all copies and all modifications thereof.
