EnrichM is a toolbox for comparing the functional composition of population genomes
EnrichM is a set of comparative genomics tools for large sets of metagenome assembled genomes (MAGs). The current functionality includes:
- A basic annotation pipeline for MAGs.
- A pipeline to determine the metabolic pathways that are encoded by MAGs, using KEGG modules as a reference (although custom pathways can be specified).
- A pipeline to identify genes or metabolic pathways that are enriched within and between user-defined groups of genomes (groups can be genomes that are related functionally, phylogenetically, recovered from different environments, etc).
- A pipeline to construct random forest machine learning models from the functional composition of MAGs, metagenomes or transcriptomes.
- A pipeline to apply random forest models to classify new MAGs or metagenomes.
EnrichM is under active development, so there is no guarantee that master is stable. It is recommended to install from a tagged release (see below).
Installation
Dependencies
EnrichM is written in Python 3 and requires Python >= 3.8. It also requires the following non-Python dependencies: hmmer, diamond, prodigal, parallel, and mmseqs2.
conda (recommended)
Clone the repository and create the conda environment:
git clone https://github.com/geronimp/enrichM.git
cd enrichM
conda env create -f environment.yml
conda activate enrichm
pip install .
PyPI
pip install enrichm
Note: non-Python dependencies (hmmer, diamond, prodigal, parallel, mmseqs2) must be installed separately when using PyPI.
After installation, you'll need to download the back-end databases.
Setup
Loading EnrichM's database
The database contains Pfam-A HMMs, TIGRfam HMMs, dbCAN HMMs, and KoFamKOALA HMMs. By default it is installed in ~/enrichm_data. Build it using:
enrichm data --create
To store the database in a custom location:
enrichm data --create --db_path /path/to/database/
To uninstall:
enrichm data --uninstall
Using an existing database
If the database was built in a custom location, set the ENRICHM_DB environment variable so EnrichM can find it:
export ENRICHM_DB=/path/to/database/
Add this to your .bashrc or conda activate.d script to avoid setting it each session.
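The lookup order described above (a custom location from ENRICHM_DB, otherwise the default ~/enrichm_data) can be sketched in Python. This is only an illustration of the convention, not EnrichM's own code; the function name resolve_db_path is hypothetical.

```python
import os
from pathlib import Path

# Default location stated in the text above; ENRICHM_DB overrides it.
DEFAULT_DB = Path.home() / "enrichm_data"


def resolve_db_path(environ=None):
    """Return the database directory implied by the environment (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    custom = environ.get("ENRICHM_DB")
    # An unset or empty variable falls back to the default location.
    return Path(custom).expanduser() if custom else DEFAULT_DB


path = resolve_db_path()
status = "found" if path.is_dir() else "missing"
print(f"{path} ({status})")
```

Running this before a long job is a quick way to confirm that the shell session points at the database you expect.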
Subcommands
annotate
Annotate population genomes with KO HMMs, Pfam, TIGRfam, and CAZymes using dbCAN. The result is a GFF file for each genome and a frequency matrix for each annotation type (annotation IDs as rows, genomes as columns).
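To make the matrix layout concrete, the sketch below builds a small frequency matrix with annotation IDs as rows and genomes as columns. The genome names, the KO assignments, and the "ID" header label are made up for illustration; the exact header and file format that annotate writes may differ.

```python
from collections import Counter


def frequency_matrix(annotations):
    """Build a count matrix from a dict of genome -> list of annotation IDs.

    Returns (header, rows), with one row per annotation ID and one
    column per genome, matching the orientation described above.
    """
    genomes = sorted(annotations)
    counts = {g: Counter(ids) for g, ids in annotations.items()}
    all_ids = sorted(set().union(*(c.keys() for c in counts.values())))
    header = ["ID"] + genomes
    rows = [[a] + [counts[g][a] for g in genomes] for a in all_ids]
    return header, rows


# Hypothetical per-gene annotations for two genomes.
example = {
    "genome_1": ["K00001", "K00001", "K00002"],
    "genome_2": ["K00002"],
}
header, rows = frequency_matrix(example)
for line in [header] + rows:
    print("\t".join(map(str, line)))
```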
classify
Reads KO annotations in the form of a matrix and determines which KEGG modules are complete. Annotation matrices can be generated using annotate.
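As a rough illustration of what "complete" means, the sketch below treats a module as an ordered list of steps, each satisfied by any one of several alternative KOs. This is a deliberate simplification: real KEGG module definitions also include complexes and nested alternatives, and the rules classify applies may differ. The module and the KO assignments here are toy data, not real KEGG definitions.

```python
def module_completeness(steps, genome_kos):
    """Fraction of module steps with at least one alternative KO present."""
    satisfied = sum(1 for alternatives in steps if alternatives & genome_kos)
    return satisfied / len(steps)


# Toy module: step 2 can be carried out by either of two KOs.
toy_module = [{"K00001"}, {"K00002", "K00003"}, {"K00004"}]

# One column of a KO matrix per genome: KO ID -> count.
matrix = {
    "genome_1": {"K00001": 1, "K00003": 2, "K00004": 1},
    "genome_2": {"K00001": 1, "K00004": 0},
}

for genome, counts in matrix.items():
    present = {ko for ko, n in counts.items() if n > 0}
    fraction = module_completeness(toy_module, present)
    label = "complete" if fraction == 1.0 else "incomplete"
    print(genome, f"{fraction:.2f}", label)
```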
enrichment
Enrichment reads an annotation matrix (IDs as rows, genomes as columns) and a metadata file separating genomes into groups, and runs statistical tests (Mann-Whitney U, Fisher's exact, Kruskal-Wallis) to identify enriched annotations between groups. Outputs include effect sizes, fold changes, and FDR-corrected p-values. Additional features include:
- Synteny analysis: identifies conserved gene blocks (operons) among enriched genes using intergenic distance thresholds
- Mobile element proximity: flags enriched genes located near transposases or insertion sequences
- Accepts output from annotate, or from external tools including DRAM and eggNOG-mapper
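The core of one such test can be sketched with the standard library alone: a two-sided Fisher's exact test on presence/absence counts per group, followed by a p-value correction. Two assumptions are made here that the text does not confirm: that Fisher's test is run on presence/absence counts, and that the FDR correction is Benjamini-Hochberg. The counts are invented.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at least as extreme as the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Rows: (present in group A, absent in A, present in group B, absent in B).
tables = {"annotation_X": (5, 0, 0, 5), "annotation_Y": (3, 2, 2, 3)}
raw = [fisher_exact_two_sided(*t) for t in tables.values()]
for name, p, q in zip(tables, raw, benjamini_hochberg(raw)):
    print(f"{name}\tp={p:.4f}\tadjusted={q:.4f}")
```

The correction step matters because an annotation matrix typically has thousands of rows, so uncorrected p-values would yield many false positives.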
generate
Trains a random forest classifier or regressor from an annotation matrix and a metadata file of labels. Performs automated hyperparameter tuning via RandomizedSearchCV (with optional GridSearchCV refinement). Outputs the trained model, feature importances, and accuracy summary.
predict
Applies a trained model (from generate) to a new annotation matrix and outputs per-sample predictions and class probabilities.
Contact
If you have any feedback about EnrichM, drop an email to the SupportM public help forum. Software by Joel A. Boyd (@geronimp) at the Australian Centre for Ecogenomics (ACE).
License
EnrichM is licensed under the GNU GPL v3+. See LICENSE.txt for further details.
Contributing
I want EnrichM to be as useful as possible, so please feel free to leave feature requests and bug reports.
Citation
If you find EnrichM useful and use it in your work, please cite it as follows:
Comparative genomics using EnrichM. Joel A Boyd, Ben J Woodcroft, Gene W Tyson. In preparation.