A pipeline for enzyme engineering

Enzyme-tk is a collection of tools for enzyme engineering, set up as interoperable modules that act on dataframes. These modules are designed to be imported into pipelines for specific functions. For this reason, each step (one call to a module, e.g. finding similar proteins with BLAST) is designed to be as lightweight as possible. An example is the annotate-e pipeline, which annotates a FASTA file with an ensemble of methods, each of which is an Enzyme-tk step.

Quick Start Colab notebook

If you want to try a colab notebook here is an example: (colab)

Data link: git clone https://huggingface.co/datasets/arianemora/enzyme-tk

Moving to a new home:

Since I started at AITHYRA, this project is migrating to a new home at moragroup/enzyme-tk and will be maintained there.

If you have any issues installing, let me know - this has been tested only on Linux/Ubuntu. Please post an issue!

Installation

Install base package to import modules

conda create --name enzymetk python==3.10 -y
# Activate the new env so the pip installs below go into it
conda activate enzymetk
# Install torch for your specific cuda version
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu130
pip install enzymetk==0.0.7
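
After installing, it can be useful to confirm which versions ended up in the active environment. The following is a minimal sketch using only the Python standard library; the installed_version helper is a hypothetical name introduced here and is not part of Enzyme-tk.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the packages installed by the commands above.
for name in ("enzymetk", "torch"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If either package is reported as not installed, check that the enzymetk conda env is the active one.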

Install only the specific requirements you need (recommended)

To do this, clone the repo and then install the requirements for the specific modules you use:

git clone git@github.com:ArianeMora/enzyme-tk.git
cd enzymetk/conda_envs/ # would recommend looking at these
# e.g. to install all from within that folder you would do
source install_all.sh

For more extensive installation instructions check out the wiki.

Usage

If you have any issues at all, just email me: amora at aithyra . ac . at

This is a work in progress! Some tools (e.g. proteInfer and CLEAN) require extra data, such as model weights, to be downloaded before they can run. I'm working on integrating these; contact me if you need this!

Here are some of the tools that have been implemented to be chained together as a pipeline:

boltz2
mmseqs2
foldseek
diamond
proteinfer
CLEAN
chai
chemBERTa2
SELFormer
rxnfp
clustalomega
CREEP
esm
LigandMPNN
vina
Uni-Mol
fasttree
Porechop
prokka

Things to note

All the tools use the conda env of enzymetk by default.

If you want to use a different conda env, you can do so by passing the env_name argument to the constructor of the step.

For example:

proteinfer = ProteInfer(env_name='proteinfer')

Arguments

All arguments are passed to the constructor of the step. Required arguments are passed directly as constructor arguments. Optional arguments are passed as a list to the args argument, in the same form you would pass them to a command-line tool.

For example:

proteinfer = ProteInfer(env_name='proteinfer', args=['--num_threads', '10'])

For those wanting to use specific arguments, check the individual tools for specifics.
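
To illustrate why args is a flat list of strings, here is a minimal sketch of how a step could turn env_name and args into a command line. The build_command helper, the proteinfer executable name, and input.fasta are hypothetical, and the use of conda run is an assumption; the way Enzyme-tk actually invokes each tool may differ.

```python
import shlex


def build_command(env_name, tool, required, args=None):
    """Assemble a command that runs `tool` inside the conda env `env_name`.

    Hypothetical helper for illustration only, not Enzyme-tk's API.
    """
    cmd = ["conda", "run", "-n", env_name, tool]
    cmd += [str(value) for value in required]  # required, positional values
    cmd += list(args or [])                    # optional flags, passed through as-is
    return cmd


cmd = build_command(
    "proteinfer",                    # env_name, as in the example above
    "proteinfer",                    # hypothetical executable name
    ["input.fasta"],                 # hypothetical required input
    args=["--num_threads", "10"],    # same list form as the args argument
)
print(shlex.join(cmd))
```

Because each flag and each value is a separate list element, nothing needs to be quoted or escaped by hand.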

Steps

The steps are the main building blocks of the pipeline. They are responsible for executing the individual tools.

Syntax

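We use the operator >> to pass the output of one step to the next, and the operator << to pass a dataframe into the chained steps. All steps expect a dataframe as input and produce a dataframe as output. You can capture the final dataframe by assigning it with =, or save it with the Save step.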

For example:

df = df << (ActiveSitePred(id_col, seq_col, num_threads, tmp_dir='tmp/') >> EmbedESM(id_col, seq_col, extraction_method='mean', 
                     tmp_dir='tmp/', rep_num=36) >> Save('tmp/esm2_test_active_site.pkl'))

This runs Squidly to predict the active sites first, then passes the sequences to ESM2, then saves the new dataframe.

You can chain most steps together. Some steps return dataframes without columns such as the sequence when they are no longer needed, so if you find a step that can't be chained but would like to use it as part of a pipeline, either let me know or make a pull request!
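
The operator syntax can be reproduced with Python's operator overloading. The sketch below is not Enzyme-tk's implementation: the Step, Pipeline, AddLength and KeepLongerThan classes are hypothetical, and a plain list of dicts stands in for a dataframe so the example runs with the standard library alone.

```python
class Step:
    """Hypothetical base class: a step maps a table to a new table."""

    def execute(self, rows):
        raise NotImplementedError

    def __rshift__(self, other):
        # step_a >> step_b builds a pipeline that runs step_a, then step_b.
        return Pipeline([self, other])

    def __rlshift__(self, rows):
        # rows << step: list does not define <<, so Python falls back to
        # this reflected method on the step and runs it on the rows.
        return self.execute(rows)


class Pipeline(Step):
    """An ordered chain of steps that is itself a step."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __rshift__(self, other):
        return Pipeline(self.steps + [other])

    def execute(self, rows):
        for step in self.steps:
            rows = step.execute(rows)
        return rows


class AddLength(Step):
    """Toy step: add a 'length' column computed from the sequence column."""

    def __init__(self, seq_col):
        self.seq_col = seq_col

    def execute(self, rows):
        return [{**row, "length": len(row[self.seq_col])} for row in rows]


class KeepLongerThan(Step):
    """Toy step: keep rows whose 'length' exceeds min_length."""

    def __init__(self, min_length):
        self.min_length = min_length

    def execute(self, rows):
        return [row for row in rows if row["length"] > self.min_length]


rows = [{"id": "a", "seq": "MKV"}, {"id": "b", "seq": "MKVLAAG"}]
result = rows << (AddLength("seq") >> KeepLongerThan(3))
print(result)
```

The parentheses matter: << and >> have the same precedence and associate left to right, so without them the table would be passed to the first step alone before the chain is built.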

Tools and references

As a toolkit, this is a collection of other tools, so if you use any of them please cite the ones relevant to your work:

mmseqs2
foldseek
diamond
proteinfer
CLEAN
chai
chemBERTa2
SELFormer
rxnfp
clustalomega
CREEP
esm
LigandMPNN
vina
Uni-Mol
fasttree
Porechop
prokka
