Detect catalytic enzyme residues in protein structures by matching a library of known templates.

Maintainers

RayHackett


EnzyMM - The Enzyme Motif Miner

📚 Full documentation is available here: https://enzymm.readthedocs.io/en/latest/

Overview

Enzyme Motif Miner uses geometric template matching to identify known arrangements of catalytic residues, called templates, in protein structures. It searches user-provided protein structures against a library of templates. EnzyMM ships with a library of catalytic templates derived from the Mechanism and Catalytic Site Atlas (M-CSA), but you can also generate your own. These templates represent consensus arrangements of catalytic residues found in the active sites of experimentally determined protein structures.
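EnzyMM delegates the actual search to Jess (via PyJess), whose algorithm is more involved than this. As a simplified illustration of the core idea behind geometric matching, the sketch below tests whether a set of candidate query atoms reproduces a template's pairwise distances within a tolerance. The function names, the plain-tuple coordinates and the single tolerance value are assumptions made for illustration, not EnzyMM's or PyJess's API.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

# Hypothetical representation: one (x, y, z) tuple per atom, in angstroms.
Coord = Tuple[float, float, float]


def _distance(p: Coord, q: Coord) -> float:
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distances_consistent(
    template: Sequence[Coord],
    candidate: Sequence[Coord],
    tolerance: float,
) -> bool:
    """Return True if every pairwise distance among the candidate atoms lies
    within `tolerance` of the corresponding template distance.

    candidate[i] is treated as the putative counterpart of template[i].
    """
    if len(template) != len(candidate):
        raise ValueError("template and candidate need the same number of atoms")
    for i, j in combinations(range(len(template)), 2):
        d_template = _distance(template[i], template[j])
        d_candidate = _distance(candidate[i], candidate[j])
        if abs(d_template - d_candidate) > tolerance:
            return False
    return True
```

Because only internal distances are compared, the check does not depend on where the query sits in space: a translated copy of a template passes, while a copy with one atom displaced by 2 Å fails at a 0.5 Å tolerance.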

As catalytic sites are both highly conserved and absolutely critical for the function of a protein, identifying them offers many biological insights. This method has two key advantages. First, because it doesn't rely on sequence or (global) fold similarity, it can find similar catalytic arrangements across great evolutionary distances, offering insights into the divergence or even convergence of enzymes. Second, because geometric matching is very fast, EnzyMM scales alongside databases of predicted protein structures. Expect a protein structure to be scanned in a matter of seconds on a consumer laptop.

As a database-driven method, EnzyMM is inherently limited by the coverage of residue arrangements in its template library. The provided template library covers nearly the entire M-CSA and thus around 3/4 of the enzyme mechanisms classified by the Enzyme Commission to the third level. Catalytic arrangements not found in the PDBe are not included in the M-CSA. While EnzyMM is primarily intended for catalytic sites, you are invited to search with your own library of templates.

[!NOTE] For the geometric matching itself, EnzyMM relies on PyJess, a Cython wrapper of Jess.

If you just want to try EnzyMM, we provide a web server at https://www.ebi.ac.uk/thornton-srv/m-csa/enzymm

🔧 Installing EnzyMM

EnzyMM is implemented in Python and supports Python 3.7 and later on Linux and macOS. It requires additional libraries, which can be installed directly from PyPI, the Python Package Index.

Use pip to install EnzyMM on your machine:

$ pip install enzymm

Alternatively, install EnzyMM from the bioconda channel with conda (for example via Anaconda). Optionally, also install gemmi to enable searching CIF files:

$ conda install -c bioconda enzymm gemmi

This installs EnzyMM and also downloads a library of catalytic templates together with important metadata, which requires downloading around 16 MB of data. EnzyMM should also run on Windows, though this is not tested on release.

🖼️ Images

Lightweight images built from python:3.13-alpine are available:

Pull the latest Docker image from GHCR:

docker pull ghcr.io/rayhackett/enzymm:latest

Pull the latest Apptainer image via ORAS from GHCR:

apptainer pull oras://ghcr.io/rayhackett/enzymm:latest

🔎 Running EnzyMM

Once EnzyMM is installed, you can run it from the terminal. Either provide the path to a single protein structure with -i or, to run multiple queries at once, the path to a text file with -l that itself contains a list of paths to protein structures. Structures are accepted in both CIF/mmCIF and PDB format.
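For many structures, the text file for -l can be generated with a short script. This sketch assumes the list format is one path per line and that structure files are recognized by a .pdb, .cif or .mmcif suffix (the .mmcif spelling is an assumption), optionally followed by one compression suffix (.gz, .bz2, .xz, .lz4). Check the documentation for the exact format EnzyMM expects; `write_query_list` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Assumed suffixes: structure formats from the text, plus the supported compressions.
STRUCTURE_SUFFIXES = {".pdb", ".cif", ".mmcif"}
COMPRESSION_SUFFIXES = {".gz", ".bz2", ".xz", ".lz4"}


def is_structure_file(path: Path) -> bool:
    """True for names like 1abc.pdb, model.cif.gz or model.PDB.xz."""
    suffixes = [s.lower() for s in path.suffixes]
    if suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes = suffixes[:-1]  # strip one compression suffix
    return bool(suffixes) and suffixes[-1] in STRUCTURE_SUFFIXES


def write_query_list(structure_dir: Path, list_file: Path) -> List[Path]:
    """Write one absolute structure path per line (assumed -l format)."""
    paths = sorted(
        p.resolve()
        for p in structure_dir.iterdir()
        if p.is_file() and is_structure_file(p)
    )
    list_file.write_text("".join(f"{p}\n" for p in paths))
    return paths
```

The resulting file could then be passed as `$ enzymm -l queries.txt -o results.tsv`.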

[!NOTE] The following compressed file formats are supported too:

.gz (gzip, accelerated via isal if available)

.bz2 (bzip2)

.xz (lzma)

.lz4 (lz4 frame format)
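EnzyMM handles these formats itself. If you want to read the same files in your own scripts, one way to dispatch on the file suffix with Python's standard library is sketched below. This is not EnzyMM's internal code: it uses the plain gzip module rather than isal, and .lz4 support assumes the third-party lz4 package is installed.

```python
import bz2
import gzip
import lzma
from pathlib import Path
from typing import IO, Union


def open_structure_text(path: Union[str, Path]) -> IO[str]:
    """Open a structure file as text, decompressing based on its final suffix."""
    suffix = Path(path).suffix.lower()
    if suffix == ".gz":
        return gzip.open(path, "rt")
    if suffix == ".bz2":
        return bz2.open(path, "rt")
    if suffix == ".xz":
        return lzma.open(path, "rt")
    if suffix == ".lz4":
        import lz4.frame  # third-party "lz4" package, only needed for .lz4 files

        return lz4.frame.open(path, "rt")
    return open(path, "rt")
```

Importing lz4 lazily keeps the helper usable on systems without that package, as long as no .lz4 file is opened.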

Optionally, supply an output directory with the --pdbs flag to save PDB structures of the matches identified for each query protein. The -o option sets the path of the results table:

$ enzymm -i some_structure.pdb -o results.tsv --pdbs dir_to_save_matches
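When EnzyMM is called from a Python workflow, the documented flags can be assembled into an argument list for the subprocess module. `build_enzymm_command` is a hypothetical helper that only uses -i, -o, --pdbs and --jobs as described in this README.

```python
from typing import List, Optional


def build_enzymm_command(
    structure: str,
    output: str,
    pdbs_dir: Optional[str] = None,
    jobs: Optional[int] = None,
) -> List[str]:
    """Assemble an enzymm command line from flags documented in this README."""
    cmd = ["enzymm", "-i", structure, "-o", output]
    if pdbs_dir is not None:
        cmd += ["--pdbs", pdbs_dir]  # directory for the matched PDB structures
    if jobs is not None:
        cmd += ["--jobs", str(jobs)]  # number of search threads
    return cmd
```

`subprocess.run(build_enzymm_command("some_structure.pdb", "results.tsv"), check=True)` would then run the search and raise `CalledProcessError` if enzymm exits with an error. Passing a list rather than a shell string avoids quoting problems with unusual file names.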

Additional parameters of interest are:

--per-residue-results, which additionally writes a table mapping each matched residue to its original annotation and catalytic role.
--jobs or -j, which controls the number of threads used to parallelize the search. By default, EnzyMM uses one thread fewer than the number of CPUs reported by os.cpu_count.
--unfiltered or -u, which disables filtering of matches by RMSD and residue orientation. By default, filtering is enabled.
--skip-smaller-hits, which skips searches with smaller templates on a query once a match to a larger template has been found.
--parameters or -p, which controls the RMSD threshold and the pairwise distance threshold. By default, sensible thresholds are selected; refer to the documentation for details.
--template-dir or -t, through which you can supply your own template library. By default, a library of catalytic templates derived from the M-CSA is loaded.
--conservation-cutoff or -c, which excludes atoms with B-factors or pLDDT scores below the given threshold from matching. No cutoff is set by default.
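EnzyMM applies --conservation-cutoff internally. To preview which atoms a given threshold would remove from a PDB file, you can read the B-factor field, which occupies columns 61-66 of ATOM/HETATM records in the PDB format and is where AlphaFold models store per-residue pLDDT. `filter_atoms_by_bfactor` is a hypothetical helper; following the text, it drops values strictly below the cutoff, which is an assumption about EnzyMM's exact comparison.

```python
from typing import Iterable, Iterator


def filter_atoms_by_bfactor(lines: Iterable[str], cutoff: float) -> Iterator[str]:
    """Yield PDB lines, skipping ATOM/HETATM records whose B-factor is below cutoff.

    In the PDB format the B-factor occupies columns 61-66 (1-based), i.e. the
    Python slice [60:66]. AlphaFold models store pLDDT in this field.
    """
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            if float(line[60:66]) < cutoff:
                continue  # strictly below the cutoff: drop this atom
        yield line  # non-atom records (TER, END, ...) pass through unchanged
```

For example, `with open("some_structure.pdb") as fh: kept = list(filter_atoms_by_bfactor(fh, 70.0))` keeps only atoms with a B-factor or pLDDT of at least 70.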

Further, EnzyMM is designed with modularity in mind and comes with a fully usable internal API. Please refer to the documentation for details.

🖹 Results

Tabular results

EnzyMM creates a single output file by default. You can choose between the default full results and a simpler style with fewer columns using the --simple-results flag:

{output}.tsv: A .tsv file containing a summary of all results. One row is printed per match.

If you pass the --per-residue-results flag, EnzyMM will additionally create a table with one line per matched residue, mapping each residue to its original annotations.

{output}.residues.tsv: A .tsv file containing one row per residue per match.
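The .tsv output can be summarized with the standard library's csv module. This sketch assumes a single header row and counts rows per value of a column whose name you pass in; the column name "query" used below is hypothetical, so check the header of your own results file or the documentation for the actual names.

```python
import csv
from collections import Counter
from typing import TextIO


def count_rows_by(handle: TextIO, column: str) -> Counter:
    """Count rows of a tab-separated table (single header row) per value of `column`."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row[column] for row in reader)
```

With the default output, `with open("results.tsv", newline="") as fh: print(count_rows_by(fh, "query"))` would give the number of matches per value of that (hypothetical) column, since the summary table has one row per match.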

[!TIP] If you pass the --parquet flag, EnzyMM writes .parquet files instead of .tsv files. Note that this additionally requires the polars library.

Aligned Structures

For visual exploration of matches, you can optionally save an alignment of the template and the matched query residues to a PDB file which can be viewed with any molecular viewer. To do so, supply an output directory after the --pdbs flag for the .pdb files.

What gets written depends on whether the --transform flag is set:

{pdbs_dir}/{query_identifier}_matches.pdb: Default: One .pdb file per query with matched residues in the query written in the query reference frame.
{pdbs_dir}/{template_pdb_identifier}_matches.pdb: With --transform: One .pdb file per template structure that matches any query, written in the template reference frame. Because --transform forces the output into the template reference frame, only matches to the same template structure can be aligned together, which is why one file is written per matched template structure.

Add additional information to each .pdb file with the following flags:

--include-template, which also writes the template structure to the .pdb file
--include-query, which also writes the entire query structure to the .pdb file

[!TIP] If you don't want to save the aligned structures themselves, consider saving the 4x4 transformation matrix for each match instead. Simply set the --save-transformations flag, which saves the transformation matrix in homogeneous coordinates for each match in a NumPy .npz file. This requires numpy.
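A 4x4 matrix in homogeneous coordinates maps a point (x, y, z) by multiplying the matrix with the column vector (x, y, z, 1); the upper-left 3x3 block holds the rotation and the last column the translation. The sketch below applies such a matrix in plain Python. Loading the matrices would use `numpy.load` on the .npz file, but the array names inside the archive and the direction of each transform (query to template frame or the reverse) are not specified here and are assumptions to verify against the documentation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def apply_homogeneous(matrix: Sequence[Sequence[float]], points: Sequence[Point]) -> List[Point]:
    """Map each (x, y, z) through a 4x4 matrix acting on the column vector (x, y, z, 1)."""
    transformed = []
    for x, y, z in points:
        v = (x, y, z, 1.0)
        tx, ty, tz, w = (sum(matrix[i][j] * v[j] for j in range(4)) for i in range(4))
        # For rigid transformations the bottom row is (0, 0, 0, 1), so w == 1.
        transformed.append((tx / w, ty / w, tz / w))
    return transformed
```

With NumPy, `np.load(path)` returns an archive whose `.files` attribute lists the stored arrays, and `matrix @ np.append(xyz, 1.0)` is the vectorized equivalent for a single point.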

💭 Feedback

⚠️ Issue Tracker

Please report any bugs or feature requests through the GitHub issue tracker. Please also feel free to ask any questions and I will do my best to answer them.
When reporting a bug, please include as much information as you can about the issue and try to reproduce it. Ideally, include a small test example so I can troubleshoot quickly.

🏗️ Contributing

Contributions are more than welcome! Raise an issue, open a pull request or send me an email at r.e.hackett AT lumc.nl.
I'm happy to help.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

⚖️ License

This software is provided under the open source MIT licence.
Though conceived at the EMBL-EBI in Hinxton, UK, in the Thornton Group, EnzyMM is now developed by Raymund Hackett and the Zeller Group at the Leiden University Medical Center in Leiden, the Netherlands, with continuing support from the Thornton Group.

🔖 Citations

EnzyMM is academic software but relies on many previous approaches.

We kindly ask you to cite both:

EnzyMM, for instance as:

Hackett RE et al. (2026), Investigating Enzyme Function by Geometric Matching of Catalytic Motifs [Preprint], bioRxiv, DOI:10.64898/2026.02.10.705182

Mechanism and Catalytic Site Atlas as:

Ribeiro AJM et al. (2017), Nucleic Acids Res, 46, D618-D623. Mechanism and Catalytic Site Atlas (M-CSA): a database of enzyme reaction mechanisms and active sites. DOI:10.1093/nar/gkx1012. PMID:29106569.


Release history

0.4.0 (this version): Apr 15, 2026
0.3.3: Apr 2, 2026
0.3.2: Mar 18, 2026
0.3.1: Dec 15, 2025
0.3.0: Nov 26, 2025
0.2.0: Nov 1, 2025
0.2.0a1 (pre-release): Sep 2, 2025
0.1.7: Aug 26, 2025
0.1.6: Aug 21, 2025
0.1.5: Aug 21, 2025
0.1.4: Aug 21, 2025
0.1.3: Aug 21, 2025
0.1.2: Aug 21, 2025
0.1.1: Aug 20, 2025
0.1.0: Aug 20, 2025
0.0.3: Aug 7, 2025
0.0.2: Jul 20, 2025
0.0.1: Jul 20, 2025

Download files

Source Distribution

enzymm-0.4.0.tar.gz (12.8 MB), uploaded Apr 15, 2026 (Source)

Built Distribution

enzymm-0.4.0-py3-none-any.whl (18.0 MB), uploaded Apr 15, 2026 (Python 3)

File details

Details for the file enzymm-0.4.0.tar.gz.

File metadata

Download URL: enzymm-0.4.0.tar.gz
Upload date: Apr 15, 2026
Size: 12.8 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for enzymm-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`c6a34b4f629f6655f948d9f45e02100f0ce55f719c48f0d09aff92557f8cb541`
MD5	`9af368bd7dd5b27c3c58d981b4404a5f`
BLAKE2b-256	`6744167fff049a10140b094838bb21e70fc4554315fa1d90e1af6a652a250b66`
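To check a downloaded file against the SHA256 digest above, hash it in chunks with hashlib and compare. `sha256_of` is an illustrative helper; pip can also enforce hashes itself in its hash-checking mode (`--require-hashes`).

```python
import hashlib
from pathlib import Path
from typing import Union

# Published SHA256 digest of enzymm-0.4.0.tar.gz, copied from the table above.
EXPECTED_SDIST_SHA256 = "c6a34b4f629f6655f948d9f45e02100f0ce55f719c48f0d09aff92557f8cb541"


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

After downloading, `sha256_of("enzymm-0.4.0.tar.gz") == EXPECTED_SDIST_SHA256` should be True; any other result means the file differs from the published release.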


Provenance

The following attestation bundles were made for enzymm-0.4.0.tar.gz:

Publisher: test.yml on RayHackett/enzymm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: enzymm-0.4.0.tar.gz
- Subject digest: c6a34b4f629f6655f948d9f45e02100f0ce55f719c48f0d09aff92557f8cb541
- Sigstore transparency entry: 1307364984
- Sigstore integration time: Apr 15, 2026
Source repository:
- Permalink: RayHackett/enzymm@b01cfde0bf5dc58780028d6436debfa91373a669
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/RayHackett
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: test.yml@b01cfde0bf5dc58780028d6436debfa91373a669
- Trigger Event: push

File details

Details for the file enzymm-0.4.0-py3-none-any.whl.

File metadata

Download URL: enzymm-0.4.0-py3-none-any.whl
Upload date: Apr 15, 2026
Size: 18.0 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for enzymm-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4d2dac2e7f2bb224af2011769afb4e963dd36cd287254be060fca671791f8d1a`
MD5	`872413b3ad428d0b920ccdc6315bb070`
BLAKE2b-256	`1dd71b5b32a927081935b17d1dbd8c70b53ef8c7baa731c3729019bc9642f175`


Provenance

The following attestation bundles were made for enzymm-0.4.0-py3-none-any.whl:

Publisher: test.yml on RayHackett/enzymm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: enzymm-0.4.0-py3-none-any.whl
- Subject digest: 4d2dac2e7f2bb224af2011769afb4e963dd36cd287254be060fca671791f8d1a
- Sigstore transparency entry: 1307365107
- Sigstore integration time: Apr 15, 2026
Source repository:
- Permalink: RayHackett/enzymm@b01cfde0bf5dc58780028d6436debfa91373a669
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/RayHackett
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: test.yml@b01cfde0bf5dc58780028d6436debfa91373a669
- Trigger Event: push
