A pipeline for enzyme engineering

Enzyme-tk is a collection of tools for enzyme engineering, set up as interoperable modules that act on dataframes. These modules are designed to be imported into pipelines for specific functions. For this reason, each step (a single module call, e.g. finding similar proteins with BLAST) is designed to be as lightweight as possible. An example of a pipeline is the annotate-e pipeline, which annotates a FASTA file with an ensemble of methods (each designated as an Enzyme-tk step).

Quick Start Colab notebook

If you want to try a colab notebook here is an example: (colab)

Data link: git clone https://huggingface.co/datasets/arianemora/enzyme-tk

Moving to a new home:

Since I started at AITHYRA, this project is migrating to a new home at moragroup/enzyme-tk, where it will be maintained.

This has only been tested on Linux/Ubuntu. If you have any issues installing, please post an issue and let me know!

Installation

Install base package to import modules

conda create --name enzymetk python==3.10 -y
conda activate enzymetk
# Install torch for your specific cuda version
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu130
pip install enzymetk==0.0.7

Install only the specific requirements you need (recommended)

To do this, clone the repo and then install the requirements for the specific modules you use:

git clone git@github.com:ArianeMora/enzyme-tk.git
cd enzymetk/conda_envs/ # I would recommend looking through these first
# e.g. to install everything, run this from within that folder:
source install_all.sh

For more extensive installation instructions check out the wiki.

Usage

If you have any issues at all, just email me: amora at aithyra . ac . at

This is a work in progress! Some tools (e.g. ProteInfer and CLEAN) require extra data, such as model weights, to be downloaded before they can run. I'm working on integrating these at the moment, so buzz me if you need this!

Here are some of the tools that have been implemented to be chained together as a pipeline:

boltz2
mmseqs2
foldseek
diamond
proteinfer
CLEAN
chai
chemBERTa2
SELFormer
rxnfp
clustalomega
CREEP
esm
LigandMPNN
vina
Uni-Mol
fasttree
Porechop
prokka

Things to note

By default, all tools use the enzymetk conda env.

If you want to use a different conda env, you can do so by passing the env_name argument to the constructor of the step.

For example:

proteinfer = ProteInfer(env_name='proteinfer')

Arguments

All arguments are passed to the step's constructor. Required arguments are passed directly as constructor arguments, while optional ones are passed via the args argument. This must be a list, written the way you would normally pass arguments to a command-line tool.

For example:

proteinfer = ProteInfer(env_name='proteinfer', args=['--num_threads', '10'])

If you want to use specific arguments, check the documentation of the individual tools.
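As a rough illustration of how env_name and args could fit together, here is a minimal sketch that builds the command line for a step. It assumes each step shells out through `conda run -n <env_name>`, which is a real conda command but not necessarily what enzymetk does internally; build_command is a hypothetical helper, and the tool name and input.fasta are placeholders rather than ProteInfer's real CLI.

```python
from typing import List, Optional, Sequence


def build_command(
    tool: str,
    required: Sequence[str],
    env_name: str = "enzymetk",
    args: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: assemble the argv list for running `tool` in a conda env."""
    if args is None:
        args = []
    if isinstance(args, str):
        # Unpacking a string would splice in single characters, so reject it:
        # optional arguments are expected as a list, e.g. ['--num_threads', '10']
        raise TypeError("args must be a list, e.g. ['--num_threads', '10']")
    # Assumption: the step runs its tool via `conda run -n <env_name>`
    return ["conda", "run", "-n", env_name, tool, *required, *args]


# Placeholder tool name and input file, mirroring the ProteInfer example above
cmd = build_command(
    "proteinfer",
    ["input.fasta"],
    env_name="proteinfer",
    args=["--num_threads", "10"],
)
print(cmd)
```

Keeping the command as a list (rather than one string) means it can be passed straight to `subprocess.run` without shell quoting issues, which is one reason a list is a natural format for args.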

Steps

The steps are the main building blocks of the pipeline. They are responsible for executing the individual tools.

Syntax

We use the >> operator to pass the output of one step to the next, and << to run the chained steps on a dataframe. All steps expect a dataframe as input and produce a dataframe as output. You can capture the final result by assigning it with =, or save it with a Save step.

For example:

df = df << (ActiveSitePred(id_col, seq_col, num_threads, tmp_dir='tmp/')
            >> EmbedESM(id_col, seq_col, extraction_method='mean', tmp_dir='tmp/', rep_num=36)
            >> Save('tmp/esm2_test_active_site.pkl'))

This runs Squidly (ActiveSitePred) to predict the active sites first, then passes the sequences to ESM2 (EmbedESM) to compute embeddings, and finally saves the resulting dataframe.
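The >> / << chaining can be sketched with Python operator overloading. This is a minimal sketch, not enzymetk's implementation: Step, Pipeline, AddSeqLength and FilterMinLength are hypothetical names, and it assumes pandas is installed.

```python
import pandas as pd


class Step:
    """Hypothetical base class: a step maps one dataframe to a new dataframe."""

    def execute(self, df: pd.DataFrame) -> pd.DataFrame:
        raise NotImplementedError

    def __rshift__(self, other: "Step") -> "Pipeline":
        # step_a >> step_b builds a pipeline that runs step_a, then step_b
        return Pipeline([self, other])

    def __rlshift__(self, df: pd.DataFrame) -> pd.DataFrame:
        # df << step: the DataFrame does not handle <<, so Python falls back
        # to this reflected method on the right-hand operand
        return self.execute(df)


class Pipeline(Step):
    """An ordered list of steps, executed left to right."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __rshift__(self, other: Step) -> "Pipeline":
        # Extend the chain instead of nesting pipelines
        return Pipeline(self.steps + [other])

    def execute(self, df: pd.DataFrame) -> pd.DataFrame:
        for step in self.steps:
            df = step.execute(df)
        return df


class AddSeqLength(Step):
    """Toy step: adds a column with the length of each sequence."""

    def __init__(self, seq_col: str):
        self.seq_col = seq_col

    def execute(self, df: pd.DataFrame) -> pd.DataFrame:
        out = df.copy()
        out["seq_len"] = out[self.seq_col].str.len()
        return out


class FilterMinLength(Step):
    """Toy step: keeps rows whose sequence length is at least min_len."""

    def __init__(self, min_len: int):
        self.min_len = min_len

    def execute(self, df: pd.DataFrame) -> pd.DataFrame:
        return df[df["seq_len"] >= self.min_len].reset_index(drop=True)


df = pd.DataFrame({"id": ["a", "b"], "seq": ["MKT", "MKTAYIAK"]})
result = df << (AddSeqLength("seq") >> FilterMinLength(5))
print(result)
```

Note the parentheses: << and >> have the same precedence and group left to right, so without them `df << a >> b` would apply `a` first and then try to shift the resulting dataframe by `b`, which fails.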

You can chain most steps together. Some steps drop columns that are no longer needed, such as the sequence, so if you find one that can't be chained but you would like to use it as part of a pipeline, either let me know or just make a pull request!
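If a step drops a column that a later step needs, one workaround is to check the columns up front and merge the dropped column back in on the ID column. This is a hypothetical sketch (check_columns and reattach are illustrative names, not enzymetk functions) built on pandas' `DataFrame.merge`.

```python
import pandas as pd


def check_columns(df: pd.DataFrame, required_cols, step_name: str) -> None:
    """Hypothetical guard: fail early if a previous step dropped a needed column."""
    missing = [c for c in required_cols if c not in df.columns]
    if missing:
        raise KeyError(
            f"{step_name} needs columns {missing}; incoming columns are {list(df.columns)}"
        )


def reattach(df_out: pd.DataFrame, df_in: pd.DataFrame, id_col: str, cols) -> pd.DataFrame:
    """Merge columns that a step dropped back in, matching rows on id_col."""
    keep = [c for c in cols if c not in df_out.columns]
    return df_out.merge(df_in[[id_col, *keep]], on=id_col, how="left")


df_in = pd.DataFrame({"id": ["a", "b"], "seq": ["MKT", "MKTAYIAK"]})
# Pretend an embedding step returned only the IDs and the embeddings
df_out = pd.DataFrame({"id": ["a", "b"], "emb": [[0.1, 0.2], [0.3, 0.4]]})

try:
    check_columns(df_out, ["id", "seq"], "NextStep")
except KeyError as err:
    print(err)

fixed = reattach(df_out, df_in, "id", ["seq"])
check_columns(fixed, ["id", "seq"], "NextStep")  # passes now
```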

Tools and references

As a toolkit, this is a collection of other tools, so if you use any of them, please cite the ones relevant to your work:

mmseqs2
foldseek
diamond
proteinfer
CLEAN
chai
chemBERTa2
SELFormer
rxnfp
clustalomega
CREEP
esm
LigandMPNN
vina
Uni-Mol
fasttree
Porechop
prokka

Release history

0.0.9 (this version): Mar 3, 2026
0.0.8: Feb 20, 2026
0.0.7: Feb 2, 2026
0.0.6: Dec 22, 2025
0.0.4: Aug 10, 2025
0.0.3: Aug 9, 2025
0.0.2: Apr 23, 2025
0.0.1: Apr 9, 2025

Download files

Source Distribution

enzymetk-0.0.9.tar.gz (35.4 kB)

Uploaded Mar 3, 2026 Source

Built Distribution

enzymetk-0.0.9-py3-none-any.whl (52.4 kB)

Uploaded Mar 3, 2026 Python 3

File details

Details for the file enzymetk-0.0.9.tar.gz.

File metadata

Download URL: enzymetk-0.0.9.tar.gz
Upload date: Mar 3, 2026
Size: 35.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for enzymetk-0.0.9.tar.gz
Algorithm	Hash digest
SHA256	`f4d8cc9041236919f03ffb0fd5f4f2a46aa6b3cf783ba6a28f40add25b482508`
MD5	`d6893edd7b06cd54f1da86d2a122c864`
BLAKE2b-256	`d14b1ef53871f8bb8e2ba78d29eaa891e8ebdb259efac2b87b635142930c85c7`
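To check a downloaded file against the SHA256 digest listed above, you can use Python's standard hashlib module (`pip hash` or `sha256sum` report the same digest). The helper names below are illustrative.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Return True if the file matches the published SHA256 digest."""
    return sha256_of(path) == expected_sha256.strip().lower()


# SHA256 digest published above for enzymetk-0.0.9.tar.gz
EXPECTED = "f4d8cc9041236919f03ffb0fd5f4f2a46aa6b3cf783ba6a28f40add25b482508"
# verify_download("enzymetk-0.0.9.tar.gz", EXPECTED)
```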

File details

Details for the file enzymetk-0.0.9-py3-none-any.whl.

File metadata

Download URL: enzymetk-0.0.9-py3-none-any.whl
Upload date: Mar 3, 2026
Size: 52.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for enzymetk-0.0.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f60df97d598caf88cc540f5b8721243a2f0c6e225d6e1635f5cc90c833e56fea`
MD5	`e15eb5ce50e97e5a5274f845f2fbd9cc`
BLAKE2b-256	`c32ceeb1cda23304c120f69c8ba53015841d618347dba8e7cb4dd9b01372cbe9`
