enrichm is a toolbox for comparing the functional composition of population genomes

EnrichM is a set of comparative genomics tools for large sets of metagenome assembled genomes (MAGs). The current functionality includes:

  1. A basic annotation pipeline for MAGs.
  2. A pipeline to determine the metabolic pathways that are encoded by MAGs, using KEGG modules as a reference (although custom pathways can be specified).
  3. A pipeline to identify genes or metabolic pathways that are enriched within and between user-defined groups of genomes (groups can be defined by function, phylogeny, source environment, etc.).
  4. A tool to construct random forest machine learning models from the functional composition of MAGs, metagenomes or transcriptomes.
  5. A tool to apply random forest models to classify new MAGs or metagenomes.

EnrichM is under active development, so the master branch is not guaranteed to be stable. Installing from a tagged release is recommended (see below).

Installation

Dependencies

EnrichM is written in Python 3 and requires Python >= 3.8. It also requires the following non-Python dependencies: hmmer, diamond, prodigal, parallel, and mmseqs2.

conda (recommended)

Clone the repository and create the conda environment:

git clone https://github.com/geronimp/enrichM.git
cd enrichM
conda env create -f environment.yml
conda activate enrichm
pip install .

PyPI

pip install enrichm

Note: non-Python dependencies (hmmer, diamond, prodigal, parallel, mmseqs2) must be installed separately when using PyPI.
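
Before building the database, it can help to confirm that the external tools are visible on PATH. The following is a minimal sketch, not part of EnrichM. The executable names are the ones these packages normally install (for example hmmsearch for hmmer and mmseqs for mmseqs2); which binaries EnrichM actually calls is an assumption here.

```python
import shutil

# Assumed mapping from package name to an executable that package installs.
# Which binaries EnrichM actually invokes is not stated in this README.
EXECUTABLES = {
    "hmmer": "hmmsearch",
    "diamond": "diamond",
    "prodigal": "prodigal",
    "parallel": "parallel",
    "mmseqs2": "mmseqs",
}


def missing_dependencies(executables=EXECUTABLES, which=shutil.which):
    """Return the package names whose executable is not found on PATH."""
    return sorted(pkg for pkg, exe in executables.items() if which(exe) is None)


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All non-Python dependencies were found on PATH.")
```

The `which` argument exists only so the lookup can be swapped out when testing the function.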

After installation, you'll need to download the back-end databases.

Setup

Loading EnrichM's database

The database contains Pfam-A HMMs, TIGRfam HMMs, dbCAN HMMs, and KofamKOALA HMMs. By default it is installed in ~/enrichm_data. Build it using:

enrichm data --create

To store the database in a custom location:

enrichm data --create --db_path /path/to/database/

To uninstall:

enrichm data --uninstall

Using an existing database

If the database was built in a custom location, set the ENRICHM_DB environment variable so EnrichM can find it:

export ENRICHM_DB=/path/to/database/

Add this to your .bashrc or conda activate.d script to avoid setting it each session.
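
To check which location will be picked up in the current shell, the precedence described above (ENRICHM_DB if set, otherwise ~/enrichm_data) can be sketched as follows. This only illustrates the lookup order; the function name is hypothetical, and how EnrichM resolves the path internally is not shown in this README.

```python
import os
from pathlib import Path

DEFAULT_DB = Path.home() / "enrichm_data"  # default location named in the text


def resolve_db_path(environ=os.environ):
    """Prefer ENRICHM_DB when it is set and non-empty, else the default."""
    custom = environ.get("ENRICHM_DB", "").strip()
    return Path(custom).expanduser() if custom else DEFAULT_DB


if __name__ == "__main__":
    path = resolve_db_path()
    state = "exists" if path.is_dir() else "not found"
    print(f"{path} ({state})")
```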

Subcommands

annotate

Annotates population genomes with KO, Pfam, and TIGRfam HMMs, and with CAZymes using dbCAN. The output is a GFF file for each genome and a frequency matrix for each annotation type (annotation IDs as rows, genomes as columns).
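
To make the matrix layout concrete, here is a small sketch that builds a frequency matrix with annotation IDs as rows and genomes as columns. The genome names and IDs are placeholders, and the tab-separated printout is an assumption about presentation, not EnrichM's exact file format.

```python
from collections import Counter


def frequency_matrix(annotations):
    """annotations: {genome: [annotation IDs, one per annotated gene]}.

    Returns (genomes, rows), where rows maps each annotation ID to a list of
    counts ordered like `genomes`.
    """
    genomes = sorted(annotations)
    counts = {g: Counter(ids) for g, ids in annotations.items()}
    all_ids = sorted({i for ids in annotations.values() for i in ids})
    rows = {i: [counts[g][i] for g in genomes] for i in all_ids}
    return genomes, rows


# Hypothetical genomes and placeholder IDs, for illustration only.
example = {
    "genome_A": ["ID_1", "ID_1", "ID_2"],
    "genome_B": ["ID_2", "ID_3"],
}
genomes, rows = frequency_matrix(example)
print("\t".join(["ID"] + genomes))
for ann_id, row in rows.items():
    print("\t".join([ann_id] + [str(n) for n in row]))
```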

classify

Reads KO annotations in the form of a matrix and determines which KEGG modules are complete. Annotation matrices can be generated using annotate.
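
The idea of module completeness can be sketched with a simplified model in which a module is a list of steps and each step is satisfied by any one of several alternative KOs. Real KEGG module definitions are richer (complexes, optional components, nested alternatives), and the rule EnrichM uses to call a module complete is not given here, so treat this as an illustration only. The module and KO IDs are placeholders.

```python
def module_completeness(module_steps, genome_kos):
    """Fraction of steps for which the genome has at least one alternative KO."""
    if not module_steps:
        raise ValueError("module has no steps")
    satisfied = sum(1 for alternatives in module_steps if alternatives & genome_kos)
    return satisfied / len(module_steps)


# Hypothetical three-step module; the IDs are placeholders, not real KEGG entries.
toy_module = [{"KO_a"}, {"KO_b", "KO_c"}, {"KO_d"}]

genome_1 = {"KO_a", "KO_c", "KO_d", "KO_x"}
genome_2 = {"KO_a", "KO_x"}

for name, kos in [("genome_1", genome_1), ("genome_2", genome_2)]:
    fraction = module_completeness(toy_module, kos)
    print(name, f"{fraction:.2f}", "complete" if fraction == 1.0 else "incomplete")
```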

enrichment

Reads an annotation matrix (IDs as rows, genomes as columns) and a metadata file that assigns genomes to groups, then runs statistical tests (Mann-Whitney U, Fisher's exact, Kruskal-Wallis) to identify annotations that are enriched in one group relative to another. Outputs include effect sizes, fold changes, and FDR-corrected p-values. Additional features include:

  • Synteny analysis: identifies conserved gene blocks (operons) among enriched genes using intergenic distance thresholds
  • Mobile element proximity: flags enriched genes located near transposases or insertion sequences
  • Accepts output from annotate, or from external tools including DRAM and eggNOG-mapper
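
As an illustration of one of the tests named above, the sketch below runs a two-sided Fisher's exact test on presence/absence counts for two groups and then applies a Benjamini-Hochberg correction, a common choice for FDR control. It uses only the standard library. The counts are made up, and which test and which correction EnrichM applies to which data is an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: group 1 / group 2. Columns: genomes with / without the annotation.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):  # probability of x annotated genomes falling in group 1
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))
    return min(p, 1.0)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted, running_min = [0.0] * m, 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank among ascending p-values
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts per annotation: (present in group 1, absent in group 1,
# present in group 2, absent in group 2).
tables = {"ID_1": (9, 1, 1, 9), "ID_2": (5, 5, 5, 5), "ID_3": (8, 2, 3, 7)}
raw = {k: fisher_exact_two_sided(*t) for k, t in tables.items()}
adj = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
for k in tables:
    print(k, f"p={raw[k]:.4f}", f"q={adj[k]:.4f}")
```

Fisher's exact test suits presence/absence data; for count data, a rank-based test such as Mann-Whitney U would be the analogous choice.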

generate

Trains a random forest classifier or regressor from an annotation matrix and a metadata file of labels. Performs automated hyperparameter tuning via RandomizedSearchCV (with optional GridSearchCV refinement). Outputs the trained model, feature importances, and an accuracy summary.
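
The tuning step can be sketched with scikit-learn directly. The example below trains a random forest classifier on synthetic data using RandomizedSearchCV and reports the best parameters, cross-validated accuracy, and feature importances. The data, the search space, and the number of iterations are assumptions made for illustration; they are not EnrichM's defaults.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.default_rng(0)

# Synthetic stand-in for an annotation matrix, transposed so that rows are
# samples and columns are annotation IDs (the layout scikit-learn expects).
n_per_class, n_features = 20, 8
X = rng.poisson(1.0, size=(2 * n_per_class, n_features)).astype(float)
y = np.array(["group_1"] * n_per_class + ["group_2"] * n_per_class)
X[n_per_class:, 0] += 5  # make feature 0 informative for group_2

# Hypothetical search space; EnrichM's actual grid is not given in this README.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 3, 5],
    "min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    random_state=0,
)
search.fit(X, y)

model = search.best_estimator_
print("best parameters:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
print("feature importances:", np.round(model.feature_importances_, 3))
```

Note the transpose: the matrices described in this README have annotation IDs as rows, whereas scikit-learn expects one row per sample.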

predict

Applies a trained model (from generate) to a new annotation matrix and outputs per-sample predictions and class probabilities.
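
In scikit-learn terms, per-sample predictions and class probabilities correspond to predict and predict_proba. The sketch below fits a small model on synthetic data and applies it to new samples. The data and sample names are made up, and the way EnrichM stores and loads its trained model is not covered here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Synthetic training data standing in for the matrix used with generate.
X_train = rng.poisson(1.0, size=(40, 6)).astype(float)
y_train = np.array(["group_1"] * 20 + ["group_2"] * 20)
X_train[20:, 0] += 5  # feature 0 separates the two groups

model = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_train, y_train)

# New samples must have the same features, in the same column order.
sample_names = ["new_1", "new_2", "new_3"]  # hypothetical sample names
X_new = rng.poisson(1.0, size=(3, 6)).astype(float)
X_new[0, 0] += 5

predictions = model.predict(X_new)
probabilities = model.predict_proba(X_new)  # columns follow model.classes_

print("\t".join(["sample", "prediction"] + list(model.classes_)))
for name, label, row in zip(sample_names, predictions, probabilities):
    print("\t".join([name, label] + [f"{p:.2f}" for p in row]))
```

The columns of predict_proba are ordered like model.classes_, so that attribute should be used to label the probability columns.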

Contact

If you have any feedback about EnrichM, drop an email to the SupportM public help forum. Software by Joel A. Boyd (@geronimp) at the Australian Centre for Ecogenomics (ACE).

License

EnrichM is licensed under the GNU GPL v3+. See LICENSE.txt for further details.

Contributing

I want EnrichM to be as useful as possible, so please feel free to leave feature requests and bug reports.

Citation

If you find EnrichM useful and use it in your work, please cite it as follows:

Comparative genomics using EnrichM. Joel A Boyd, Ben J Woodcroft, Gene W Tyson. In preparation.
