EMPANADA: a tool for evidence-based assignment of genes to pathways in metagenomic data
Project description
EMPANADA Documentation
EMPANADA is a tool for evidence-based assignment of genes to pathways in metagenomic data, developed and maintained by the Borenstein group at the University of Washington.
Availability
EMPANADA is available as a Python module from GitHub or PyPI (see installation instructions below)
License
EMPANADA is distributed under a non-commercial license (see LICENSE).
Installation Instructions
Prerequisites for installing:
In order for EMPANADA to run successfully, the following Python modules should be pre-installed on your system:
Numpy >= 1.16.0 (http://www.numpy.org/)
Pandas >= 0.14 (http://pandas.pydata.org/)
If you have pip installed, you can install these packages by running the following command:
pip install -U numpy pandas
Installing EMPANADA:
To install EMPANADA, download the package from https://github.com/borenstein-lab/empanada/archive/0.0.3.tar.gz
After downloading EMPANADA, you’ll need to unzip the file. If you’ve downloaded the release version, do this with the following command:
tar -xzf empanada-0.0.3.tar.gz
You’ll then change into the new EMPANADA directory as follows:
cd empanada-0.0.3
and install using the following command:
python setup.py install
ALTERNATIVELY, you can install EMPANADA directly from PyPI by running:
pip install -U empanada
Testing the software package
After downloading and installing the software, we recommend testing it by running the following command:
test_empanada.py
This will invoke a series of tests. A correct output should end with:
Ran 1 tests in X.XXXXs
OK
EMPANADA API via the command line
The EMPANADA module handles all calculations internally. EMPANADA offers an interface to the EMPANADA functionality via the command line and the run_empanada script.
Usage:
run_empanada.py -ko KO_ABUNDANCE_FILE -ko2path KO_TO_PATHWAY_FILE [options]
Required arguments:
- -ko KO_ABUNDANCE_FILE
Input KO abundance file to aggregate to pathway abundance
- -ko2path KO_TO_PATHWAY_FILE
Input file of KO-to-pathway mapping
Optional arguments:
- -h, –help
show help message and exit
- -o, –output,
Output file for resulting pathway abundance (default: out.tab)
- -oc, –output_counts,
Output file for number of KOs mapped to each pathway (default: counts.tab)
- -om, –output_mapping,
Output the mapping table (either given or generated) to file, works only with pooled mappings (default: mapping.tab)
- -map {naive, by_support, by_sum_abundance, by_avg_abundance}, –mapping_method {naive, by_support, by_sum_abundance, by_avg_abundance}
Method to map KOs to Pathway (default: naive)
- -compute {sum}, –compute_method {sum}
Method to compute pathway abundance from mapped KOs (default: sum)
- -threshold, –abundance_threshold
Abundance threshold to include KOs (default: 0.0)
- -fraction, –fractional_ko_contribution
Divide KO contributions such that they sum to 1 for each KO (default: False)
- -remove_ko_with_no_pathway
Remove KOs with no pathway from analysis (default: False)
- -remove_ko_with_no_abundance_measurement
Remove KOs with no measurements in the abundance table from analysis (default: False)
- -transpose_ko, –transpose_ko_abundance
Transpose the ko abundance matrix given (default: False)
- -transpose_output, –transpose_output
Transpose the output pathway abundance matrix (default: False)
- -permute_ko_mapping
Permute the given KO mapping, i.e., which KO map to which pathways for hypothesis testing (default: False)
- -use_only_non_overlapping_genes
If the mapping is by_abundance, compute pathway support by only using non-overlapping genes (default: False)
- -pool_samples_use_median
If the mapping is by_abundance, pool samples together using the median KO abundance, and learn the mapping only once (default: False)
- -pool_samples_use_average
If the mapping is by_abundance, pool samples together using the average KO abundance, and learn the mapping only once (default: False)
- -leave_one_ko_out_pathway_support
If the mapping is by_abundance, compute pathway support for each KO separately by removing it from the computation (default: False)
- -compute_support_with_weighted_double_counting
If the mapping is by_abundance, double count KO abundance (weighted by mapping) when computing pathway support (default: False)
- -v, –verbose
Increase verbosity of module (default: False)
Examples
In the empanada/examples directory, the file simulated_ko_relative_abundance.tab contains simulated KO abundance measurements of 20 samples. Using this file as input for EMPANADA results in the following files:
pathway_abundance_empanada.tab
The command used are the following (via command line):
run_empanada.py -ko examples/simulated_ko_relative_abundance.tab -ko2path data/KOvsPATHWAY_BACTERIAL_KEGG_2013_07_15.tab -o examples/pathway_abundance_empanada.tab -threshold 0 -map by_avg_abundance -fraction -leave_one_ko_out_pathway_support -use_only_non_overlapping_genes
Citing Information
If you use the EMPANADA software, please cite the following paper:
Functional variability in the human microbiome: More than meets the eye Ohad Manor and Elhanan Borenstein. In preparation
HISTORY
0.0.3 (6 January, 2020)
Fixed deprecated numpy usage
Updating numpy version requirement to >=1.16.0
0.0.1 (9 February, 2016)
Initial release of beta version
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file empanada-0.0.3.tar.gz
.
File metadata
- Download URL: empanada-0.0.3.tar.gz
- Upload date:
- Size: 475.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/44.0.0.post20200106 requests-toolbelt/0.9.1 tqdm/4.40.2 CPython/3.7.4
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 9ad6c243a1407f1a1cb730564de5852fd64cbee60322da8cb69006ba1d5d0596 |
|
MD5 | b4397457424e498b3c4d50dfa1b842de |
|
BLAKE2b-256 | 17ec21393d5b3c71ce6ba68b882e392121d77ca84c3935e5836b7ba2f02e523b |
File details
Details for the file empanada-0.0.3-py3-none-any.whl
.
File metadata
- Download URL: empanada-0.0.3-py3-none-any.whl
- Upload date:
- Size: 493.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/44.0.0.post20200106 requests-toolbelt/0.9.1 tqdm/4.40.2 CPython/3.7.4
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4595c99e5346f3b40fa535071d82aa09aee62b51563428b83d995bb72dc4ea52 |
|
MD5 | bd452d524c9870d79f4c953ffb73fbe1 |
|
BLAKE2b-256 | 20729134c0b1cc1394d31c5e6464de420d45a84c3fb1863ba674c86c19f263e9 |