A Computational Workflow for Structure-Guided Design of Potent and Selective Kinase Peptide Substrates

Subtimizer


A. Contents

A. Contents
B. Overview
C. Configuration
D. Prerequisites
E. Installation
F. Usage
G. Citation


B. Overview

Subtimizer provides an automated, structure-guided workflow for designing peptide substrates for kinases. It integrates AlphaFold-Multimer for structural modeling, ProteinMPNN for sequence design, and AlphaFold2-based interface evaluation of designed substrates.

C. Configuration (Customizing SLURM Templates)

The workflow uses SLURM job scripts generated from templates. To customize these for your HPC environment (partition names, memory limits, modules):

  1. Initialize local templates:

    subtimizer init-templates
    

    This creates a subtimizer_templates/ directory in your current folder with copies of all default scripts.

  2. Edit the templates: Open the files in subtimizer_templates/ (e.g., fold_template.sh) and modify the #SBATCH directives or module load commands.

  3. Run Subtimizer: The tool will automatically detect and use your local templates instead of the package defaults.
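
The override behaviour described in step 3 can be pictured as a simple lookup order. The following is a minimal sketch, not Subtimizer's actual implementation: the function name resolve_template and the package_dir argument are hypothetical, and only the subtimizer_templates/ directory name and fold_template.sh come from the text above.

```python
from pathlib import Path


def resolve_template(name, workdir, package_dir):
    """Return the local copy of a template if present, else the packaged default.

    Hypothetical helper for illustration only; not part of the Subtimizer API.
    """
    local = Path(workdir) / "subtimizer_templates" / name
    if local.is_file():
        return local
    # Fall back to the default shipped with the package (location assumed).
    return Path(package_dir) / name
```

Under this assumption, editing subtimizer_templates/fold_template.sh in the working directory is enough for the edited copy to take precedence; deleting it restores the default.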

D. Prerequisites

  1. Install Anaconda, Miniconda, or Mamba.
  2. Install ColabFold and add it to your PATH:

    export PATH="/PathTo/colabfold/localcolabfold/colabfold-conda/bin:$PATH"

  3. Install ProteinMPNN and set the MPNN_PATH environment variable:

    export MPNN_PATH="/PathTo/ProteinMPNN/"

  4. Get the code for af2_initial_guess and set the DL_BINDER_DESIGN_PATH environment variable to its predict.py:

    export DL_BINDER_DESIGN_PATH="/PathTo/dl_binder_design/af2_initial_guess/predict.py"

  5. SLURM: This workflow is optimized for HPC environments that use SLURM for job scheduling.
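
Before launching jobs, it can help to confirm that the prerequisites are visible from the current shell. The sketch below is an illustrative check, not something shipped with Subtimizer: the function name is hypothetical, and the executable names colabfold_batch (ColabFold's command-line entry point) and sbatch (SLURM's submission command) are assumptions about what the workflow ultimately calls.

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of problems found; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []
    # Environment variables named in the prerequisites above.
    for var in ("MPNN_PATH", "DL_BINDER_DESIGN_PATH"):
        value = env.get(var)
        if not value:
            problems.append(f"{var} is not set")
        elif not os.path.exists(value):
            problems.append(f"{var} points to a missing path: {value}")
    # Executables expected on PATH (names are assumptions, see the text above).
    for exe in ("colabfold_batch", "sbatch"):
        if which(exe) is None:
            problems.append(f"{exe} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("WARNING:", problem)
```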

E. Installation

1. Set Up a Conda/Mamba Environment

Create an environment named subtimizer_env with Python 3.9 or later:

# Create the environment
mamba create -n subtimizer_env python=3.9 -y

# Activate the environment
mamba activate subtimizer_env

Set Up Worker Environments (Critical)

AlphaFold and ProteinMPNN run in separate environments to avoid dependency conflicts. Create these environments using the provided YAML files in the repository root.

mamba env create -f af2_des_env.yaml

mamba env create -f mpnn_des_env.yaml

2. Install Subtimizer

While in the subtimizer_env environment, you can install the package via PyPI (recommended) or from source.

Option A: Install from PyPI (Recommended)

pip install subtimizer

Option B: Install from Source (For Development) Use this if you want to modify the code or templates.

git clone https://github.com/abeebyekeen/subtimizer.git
cd subtimizer
pip install -e .

3. Verify Installation

    subtimizer --help
    

F. Usage

The workflow is managed through the subtimizer command. Use subtimizer --help to see all available commands.

Common Command Line Options

Most subtimizer commands (fold, design, validate, fix-pdb) accept the following options to control execution:

  • -n, --max-jobs <int>: Controls concurrency.
    • Default is 4. Increase this if you have more resources/GPUs available (e.g., -n 8).
    • Note: In parallel mode, this should match the number of GPUs requested in your parallel SLURM template.
  • --start <int> / --end <int>: Process a subset of the list.
    • Example: --start 1 --end 10 (Processes items 1 through 10 in your input list).
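
The --start/--end example above suggests 1-based, inclusive indexing into the input list. A minimal sketch of that interpretation follows; the function name select_range is hypothetical and the indexing convention is inferred from the example, not taken from Subtimizer's source.

```python
def select_range(items, start=None, end=None):
    """Return items start..end, counting from 1 and including both ends.

    Omitting start or end leaves that side of the list unbounded.
    """
    first = 0 if start is None else start - 1
    last = len(items) if end is None else end
    return items[first:last]


complexes = ["AKT1_2akt1tide", "ALK_axltide", "SGK1_1akt1tide", "TEC_srctide"]
print(select_range(complexes, start=2, end=3))
```

With the four example complexes used later in this document, start=2 and end=3 would select ALK_axltide and SGK1_1akt1tide.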

1. Setup Project Structure

Initialize the directory structure for your kinase complexes.

Input: A file (e.g., example_list_of_complexes.dat) containing the list of folder names/complexes.

AKT1_2akt1tide
ALK_axltide
SGK1_1akt1tide
TEC_srctide

Change into your working directory:

cd examples

Create the list file with:

echo -e "AKT1_2akt1tide\nALK_axltide\nSGK1_1akt1tide\nTEC_srctide" > example_list_of_complexes.dat

Command:

subtimizer setup --file example_list_of_complexes.dat --type initial

This creates the project directories and necessary subfolders for AlphaFold.

2. Run AlphaFold-Multimer

Launch AlphaFold-Multimer for the listed complexes.

Important: This step expects a FASTA file (e.g., AKT1_2akt1tide.fasta) to exist inside each complex folder. See the examples folder for an example of how to prepare the FASTA files.
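
Because a missing FASTA file only surfaces once a job has been submitted, a quick pre-flight check can save queue time. The sketch below assumes the layout implied above, namely one folder per list entry containing a file named after the entry with a .fasta extension; the function names are hypothetical and not part of Subtimizer.

```python
from pathlib import Path


def read_complexes(list_file):
    """Read complex names, one per line, skipping blank lines."""
    lines = Path(list_file).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def missing_fastas(list_file, root="."):
    """Return the complexes whose <name>/<name>.fasta file is absent under root."""
    return [
        name
        for name in read_complexes(list_file)
        if not (Path(root) / name / f"{name}.fasta").is_file()
    ]
```

Calling missing_fastas("example_list_of_complexes.dat") from the working directory returns an empty list when every complex is ready to fold.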

Option A: Batch Mode (Default)

Submits individual jobs for each complex using fold_template.sh.

subtimizer fold --file example_list_of_complexes.dat --max-jobs 4

Option B: Parallel Mode (Multi-GPU)

Submits a single job (run_fold_parallel.sh) that manages a pool of parallel tasks on a multi-GPU node.

subtimizer fold --file example_list_of_complexes.dat --mode parallel --max-jobs 4

  • --max-jobs: Number of parallel tasks (should match the number of GPUs requested in fold_parallel_template.sh).
  • --start / --end: Optionally restrict processing to a range of complexes in the list (e.g., --start 1 --end 10).

3. Run ProteinMPNN Design

Perform sequence design on the generated structures.

3.1 Setup MPNN Design Folders and Configurations

subtimizer setup --file example_list_of_complexes.dat --type mpnn

Edit the design_config.json file created in your working directory to customize chains_to_design (default: "B") or fixed_positions (default: "4") for specific complexes.

3.2 Run ProteinMPNN Design

Option A: Batch Mode (Default)

Submits individual jobs using design_template.sh.

subtimizer design --file example_list_of_complexes.dat --max-jobs 4

Option B: Parallel Mode

Submits a single multi-sequence job using design_parallel_template.sh.

subtimizer design --file example_list_of_complexes.dat --mode parallel --max-jobs 4 --start 1 --end 10

4. Analyze Design Results

Analyze sequence recovery.

Command:

subtimizer analyze --file example_list_of_complexes.dat

Generates:

  • Combined FASTA files (all_design.fa)
  • Sequence Logos (*_seqlogo.png)
  • Sequence Recovery Plots (sequence_recovery_stripplot.png and .csv)
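
For orientation, sequence recovery is conventionally the fraction of positions at which a designed sequence matches the starting sequence. The sketch below shows that standard definition only; it is an assumption that Subtimizer computes the metric this way, and the function name and example peptides are made up for illustration.

```python
def sequence_recovery(native, designed):
    """Fraction of aligned positions where the designed residue equals the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


# Two made-up 8-residue peptides differing at two positions.
print(sequence_recovery("ARKRERTY", "ARKRQRSY"))  # 0.75
```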

5. Sequence Clustering

Cluster designed sequences to remove duplicates and generate a summary file cluster_summary.dat.

subtimizer cluster --file example_list_of_complexes.dat

6. Prepare Designed Kinase-Peptide Complexes for Folding

Prepare sequences for AlphaFold-Multimer folding.

subtimizer prep-fold --file example_list_of_complexes.dat

7. Fold Designed Sequences with AF-Multimer

Note: The version of ProteinMPNN used in this work does not generate PDB files, hence the need for post-design folding. With newer versions (and LigandMPNN), which generate structures of the designed sequences, this step may not be necessary.

Option A: Batch Mode (Default)

Run on a single node.

subtimizer fold --file example_list_of_complexes.dat --stage validation --max-jobs 4

Option B: Parallel Mode (Multi-GPU)

Distribute the folding of designed sequences across multiple GPUs on a single node.

subtimizer fold --file example_list_of_complexes.dat --stage validation --mode parallel --max-jobs 4

Tip for Multi-Node Parallelism: To scale up to multiple nodes (e.g., 4 nodes), launch the parallel command 4 times with different ranges:

  1. subtimizer fold ... --start 1 --end 2 (Node 1)
  2. subtimizer fold ... --start 3 --end 4 (Node 2)

... and so on for the remaining nodes.

Note: This requires manually creating different SLURM jobs or running from different interactive sessions.
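
The --start/--end values for each node can be worked out by hand, or computed as in the sketch below. The helper node_ranges is hypothetical; it simply splits items 1..total into contiguous, near-equal ranges and assumes the 1-based, inclusive convention used in the example above.

```python
def node_ranges(total, nodes):
    """Split items 1..total into up to `nodes` contiguous (start, end) ranges."""
    base, extra = divmod(total, nodes)
    ranges, start = [], 1
    for i in range(nodes):
        # The first `extra` nodes take one additional item each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # fewer items than nodes: leave the remaining nodes unused
        ranges.append((start, start + size - 1))
        start += size
    return ranges


for node, (start, end) in enumerate(node_ranges(8, 4), 1):
    print(f"Node {node}: --start {start} --end {end}")
```

For 8 complexes on 4 nodes this reproduces the ranges in the tip (1-2, 3-4, 5-6, 7-8); each printed pair of flags would then go into its own SLURM job or interactive session.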

8. Prepare PDBs for AF2 initial guess

Note: af2_initial_guess has two requirements for the input PDB:

  • The binder (substrate) must be the first chain.
  • Residue numbers must not overlap between chains.

subtimizer fix-pdb --file example_list_of_complexes.dat
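
To make the two input requirements concrete, the sketch below reorders chains so the binder comes first and renumbers residues consecutively across chains. It is an illustration of the idea, not the code behind fix-pdb: the function name is hypothetical, the binder is assumed to be chain B, only ATOM/HETATM records are handled, and insertion codes and atom serial numbers are left untouched.

```python
def binder_first_renumbered(pdb_lines, binder_chain="B"):
    """Put the binder chain first and number residues consecutively across chains."""
    atoms = [line for line in pdb_lines if line.startswith(("ATOM", "HETATM"))]
    chains = []
    for line in atoms:
        if line[21] not in chains:  # column 22 of a PDB record holds the chain ID
            chains.append(line[21])
    # Binder first, remaining chains in their original order.
    order = [c for c in chains if c == binder_chain]
    order += [c for c in chains if c != binder_chain]
    out, new_num, prev_key = [], 0, None
    for chain in order:
        for line in atoms:
            if line[21] != chain:
                continue
            key = (chain, line[22:26])  # columns 23-26 hold the residue number
            if key != prev_key:  # a new residue starts here
                new_num += 1
                prev_key = key
            out.append(f"{line[:22]}{new_num:>4}{line[26:]}")
    return out
```

Because numbering continues from the last binder residue into the next chain, no two chains can share a residue number in the output.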

9. Validation (AF2 Initial Guess)

Run AlphaFold-based validation with initial guess.

Configuration: This step requires the path to the af2_initial_guess code (specifically predict.py). You can provide this path via the --binder-path argument or the DL_BINDER_DESIGN_PATH environment variable.

Setting the Environment Variable:

export DL_BINDER_DESIGN_PATH="/path/to/dl_binder_design/af2_initial_guess/predict.py"

Command (passing the path explicitly with --binder-path):

subtimizer validate --file example_list_of_complexes.dat --binder-path /path/to/dl_binder_design/af2_initial_guess/predict.py

10. Reporting

Generates final reports, including:

  • Merged score CSVs with a weighted pTM_ipTM metric (0.2*pTM + 0.8*ipTM).
  • Swarm plots of validation metrics.

The data is also copied to af2_init_guess/data/ for easy access.

subtimizer report --file example_list_of_complexes.dat
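
The weighted pTM_ipTM metric is a simple linear combination that favours the interface score. A minimal sketch of the arithmetic follows; the function name is hypothetical and the input values are made up for illustration.

```python
def weighted_ptm_iptm(ptm, iptm):
    """Weighted confidence score: 0.2 * pTM + 0.8 * ipTM."""
    return 0.2 * ptm + 0.8 * iptm


# Example with made-up scores: pTM = 0.70, ipTM = 0.85.
print(round(weighted_ptm_iptm(0.70, 0.85), 3))  # 0.82
```

Since ipTM carries four times the weight of pTM, two designs with similar overall fold confidence are separated mainly by how confidently the kinase-peptide interface is predicted.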

11. Workflow for Original (Parental) Substrates

To process the parental substrates (Legacy Steps 16-17), use the setup --type original command with the standard workflow tools.

  1. Setup: Creates original_subs folder and prepares files.

    subtimizer setup --file example_list_of_complexes.dat --type original
    
  2. Process: Run commands pointing to the new files.

    cd original_subs
    # Fix PDBs
    subtimizer fix-pdb --file ../example_list_of_complexes.dat
    
    # Validation
    subtimizer validate --file ../example_list_of_complexes.dat --max-jobs 4
    
    # Reporting (Generates 'original' data)
    subtimizer report --file ../example_list_of_complexes.dat
    
  3. Final Merge: Return to the main directory and run/re-run report to combine results.

    cd ..
    # This automatically detects 'original_subs' and merges the data
    subtimizer report --file example_list_of_complexes.dat
    

12. ipSAE Evaluation

Compute the ipSAE (interaction prediction Score from Aligned Errors) metric for the folded structures.

  1. Prerequisite: Download ipSAE and add it to your PATH:

    export PATH=$PATH:/path/to/ipSAE_directory
    
  2. Run ipSAE Calculation: This command submits a SLURM job to calculate ipSAE metrics for all structures (designed and parental).

    subtimizer ipsae --file example_list_of_complexes.dat --max-jobs 16
    
    • Supports list slicing: subtimizer ipsae --file list.dat --start 1 --end 5
    • Arguments: --pae-cutoff (default: 15), --dist-cutoff (default: 15).
  3. Generate Final Reports: Run the report command again to generate ipSAE-specific plots (Regression and Colored Scatter plots).

    subtimizer report --file example_list_of_complexes.dat
    

G. Citation

If you use Subtimizer in your work, please cite:

Yekeen A.A., Meyer C.J., McCoy M., Posner B., Westover K.D. A Computational Workflow for Structure-Guided Design of Potent and Selective Kinase Peptide Substrates. bioRxiv (2025). https://doi.org/10.1101/2025.07.04.663216
