Deep mutational scanning tool for protein-protein binding affinity prediction

KdPred

Overview

KdPred is an automated pipeline for deep mutational scanning of protein-protein complexes. It predicts binding affinities, expressed as dissociation constants (K_d), by combining structure prediction (ColabFold) with binding affinity prediction (Prodigy).
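
For orientation, a dissociation constant relates to the binding free energy through ΔG = RT ln(K_d). The sketch below shows only that textbook conversion; it is not code from KdPred or Prodigy. The function names are hypothetical and the 25 °C default temperature is an assumption.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def delta_g_to_kd(delta_g_kcal, temperature_c=25.0):
    """Convert a binding free energy (kcal/mol) to K_d (mol/L)."""
    t_kelvin = temperature_c + 273.15  # assumed temperature, not a KdPred setting
    return math.exp(delta_g_kcal / (R_KCAL * t_kelvin))


def kd_to_delta_g(kd_molar, temperature_c=25.0):
    """Convert K_d (mol/L) back to a binding free energy (kcal/mol)."""
    t_kelvin = temperature_c + 273.15
    return R_KCAL * t_kelvin * math.log(kd_molar)


# A more negative free energy gives a smaller K_d, i.e. tighter binding.
print(delta_g_to_kd(-10.0))
print(delta_g_to_kd(-12.0))
```

Because the relation is exponential, small differences in predicted free energy become large fold changes in K_d, which is worth keeping in mind when comparing mutants.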

Features

Efficient mutation sequence generation: Create mutated protein sequences from mutation specifications
Structure prediction: Automated structure prediction using ColabFold
Kd prediction: Binding affinity prediction using Prodigy
Flexible mutation format: Supports single mutations, multiple mutations, and saturation mutagenesis
Modular design: Each step can be run independently or as part of a complete pipeline

Installation

Prerequisites

Python 3.12 or higher
ColabFold installed and available in PATH
Prodigy installed

Note about PATH and shells: If colabfold_batch is available in your interactive shell (for example after conda activate) but Python reports it as not found when running the pipeline, the issue is usually that PATH modifications live in shell init files and are not present in the Python process environment. Solutions:

Provide the full executable path to ColabFold, for example --colabfold-cmd /home/you/colabfold/bin/colabfold_batch or ColabFoldPredictor(colabfold_command='/full/path/colabfold_batch').
Launch the script/notebook from the same shell where you activated the environment (e.g., run Python after conda activate kd_py312).
Export the ColabFold bin directory into the environment that will run Python, for example:

export PATH="/Your_ColabFold_Location/colabfold-conda/bin:$PATH"
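
To check what the Python process itself can see, you can resolve the executable with the standard library before starting a run. This is a minimal sketch and not part of KdPred; resolve_executable and its extra_dirs parameter are hypothetical names.

```python
import os
import shutil


def resolve_executable(name_or_path, extra_dirs=()):
    """Return an absolute path for an executable, or None if it is not found.

    The lookup uses the PATH of *this* Python process, which may differ from
    the PATH of your interactive shell.
    """
    search_path = os.pathsep.join([*extra_dirs, os.environ.get("PATH", "")])
    found = shutil.which(name_or_path, path=search_path)
    return os.path.abspath(found) if found else None


# Prints an absolute path if colabfold_batch is visible to Python, else None.
print(resolve_executable("colabfold_batch"))
```

If this prints None while the command works in your shell, the PATH difference described above is the likely cause; the absolute path from a working environment can then be passed as --colabfold-cmd.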

Install KdPred

# activate your virtual environment
conda activate kd_py312
# Install from source after downloading/cloning the repository
pip install -e .

# Or install the released package from PyPI
pip install kdpred

Usage

Basic Usage

Run the complete deep mutational scanning pipeline:

# suppose your virtual environment is named kd_py312
conda activate kd_py312
# navigate to the directory containing protein.txt and mutations.txt

kdtool \
    --protein-seq-fpath protein.txt \
    --mutation-config-fpath mutations.txt \
    --output-dir /full_path/results/ \
    --protein-name DEMO_PROTEIN \
    --colabfold-cmd colabfold_batch \
    --job-type gp_multiple

The job type (--job-type) is one of prodigy, gp_single, or gp_multiple. The prodigy option runs only Prodigy Kd prediction on provided structures. The gp_single and gp_multiple options first predict structures with ColabFold and then predict Kd with Gaussian Process regression models that use Prodigy features.

To list all available options, run:

kdtool --help

Input Files

Protein Sequences (`--protein-seq-fpath`)

A text file with one protein sequence per line, one line per chain:

MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVVHSLAKWKRQTLGQHDFSAGEGLYTHMKALRPDEDRLSPLHSVYVDQWDWERVMGDGERQFSTLKSTVEAIWAGIKATEAAVSEEFGLAPFLPDQIHFVHSMLLSSQESVQGDWLDSLLAQ
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVVHSLAKWKRQTLGQHDFSAGEGLYTHMKALRPDEDRLSPLHSVYVDQWDWERVMGDGERQFSTLKSTVEAIWAGIKATEAAVSEEFGLAPFLPDQIHFVHSMLLSSQESVQGDWLDSLLAQ
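
To sanity-check a sequence file before a long run, something like the sketch below can help. It is not KdPred's own reader: parse_chain_sequences is a hypothetical helper, and it assumes that the first line is chain A, the second chain B, and so on, and that only the 20 standard amino acid letters are allowed.

```python
from string import ascii_uppercase

VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def parse_chain_sequences(text):
    """Map chain letters to sequences, one non-empty line per chain.

    Assumption: line order defines the chain letter (line 1 -> A, line 2 -> B).
    """
    lines = [ln.strip().upper() for ln in text.splitlines() if ln.strip()]
    if len(lines) > len(ascii_uppercase):
        raise ValueError("More sequences than single-letter chain IDs")
    chains = {}
    for chain_id, seq in zip(ascii_uppercase, lines):
        unknown = set(seq) - VALID_RESIDUES
        if unknown:
            raise ValueError(
                f"Chain {chain_id} contains unexpected characters: {sorted(unknown)}"
            )
        chains[chain_id] = seq
    return chains


# Toy two-chain input, not the demo sequences above.
chains = parse_chain_sequences("MKTAYIAK\nGSHMLEDP\n")
print({chain_id: len(seq) for chain_id, seq in chains.items()})
```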

Mutations (`--mutation-config-fpath`)

A text file with one mutation per line. Supports multiple formats:

Single mutation:

B.H.68.F

Multiple mutations (comma-separated):

B.H.68.F,A.K.42.R

Saturation mutagenesis (three parts, no mutant residue; expanded into individual substitutions at that position):

B.H.68

Comments (lines starting with # are ignored):

# Single point mutation
B.H.68.F
# Saturation mutagenesis at position 68
B.H.68
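
The rules above (comment lines, comma-separated combinations, three-part saturation entries) can be sketched as follows. This is an illustration rather than the kdpred.mutations implementation: both function names are hypothetical, and excluding the wildtype residue from the expansion is an assumption.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_mutation_lines(text):
    """Return one list of mutation tokens per non-empty, non-comment line."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        tokens = [tok.strip() for tok in line.split(",") if tok.strip()]
        for tok in tokens:
            if len(tok.split(".")) not in (3, 4):
                raise ValueError(f"Unrecognised mutation token: {tok!r}")
        entries.append(tokens)
    return entries


def expand_saturation(token, residues=AMINO_ACIDS):
    """Expand a three-part token such as 'B.H.68' into four-part mutations.

    Assumption: the wildtype residue itself is skipped.
    """
    chain, wildtype, position = token.split(".")
    return [f"{chain}.{wildtype}.{position}.{aa}" for aa in residues if aa != wildtype]


config = """# Single point mutation
B.H.68.F
# Saturation mutagenesis at position 68
B.H.68
"""
print(parse_mutation_lines(config))
print(expand_saturation("B.H.68", residues="ACDEF"))
```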

Mutation Format

Mutations follow the format: Chain.Wildtype.Position.Mutant

Chain: Single uppercase letter (A, B, C, etc.)
Wildtype: Single letter amino acid code
Position: 1-based position in the sequence
Mutant: Single letter amino acid code

Example: B.H.68.F means on chain B, replace Histidine (H) at position 68 with Phenylalanine (F).
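
As a worked illustration of this format, the sketch below parses a four-part mutation and applies it to a dictionary of chain sequences, checking the wildtype residue first. The Mutation type and both functions are hypothetical and are not the kdpred.mutations API; the toy sequences are made up for the example.

```python
from typing import NamedTuple


class Mutation(NamedTuple):
    chain: str
    wildtype: str
    position: int  # 1-based, as in the format above
    mutant: str


def parse_mutation(token):
    """Parse 'Chain.Wildtype.Position.Mutant' into a Mutation."""
    chain, wildtype, position, mutant = token.split(".")
    return Mutation(chain, wildtype, int(position), mutant)


def apply_mutation(chains, mutation):
    """Return a copy of chains with the mutation applied to one chain."""
    seq = chains[mutation.chain]
    index = mutation.position - 1  # convert to 0-based indexing
    if not 0 <= index < len(seq):
        raise ValueError(f"Position {mutation.position} is outside chain {mutation.chain}")
    if seq[index] != mutation.wildtype:
        raise ValueError(
            f"Expected {mutation.wildtype} at {mutation.chain}{mutation.position}, "
            f"found {seq[index]}"
        )
    mutated = dict(chains)
    mutated[mutation.chain] = seq[:index] + mutation.mutant + seq[index + 1:]
    return mutated


toy_chains = {"A": "MKTAYIAK", "B": "GSHMLEDP"}
print(apply_mutation(toy_chains, parse_mutation("B.H.3.F")))
# prints {'A': 'MKTAYIAK', 'B': 'GSFMLEDP'}
```

Checking the wildtype residue before substituting catches off-by-one position errors and mutations written against the wrong chain.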

Advanced Usage

Custom residue list for saturation

kdtool scan \
    --protein-seq-fpath /full/path/protein.txt \
    --mutation-config-fpath /full/path/mutations.txt \
    --output-dir /full/path/results/ \
    --residue-list "A,C,D,E,F"

where "A,C,D,E,F" is the comma-separated list of amino acids to substitute during saturation mutagenesis.

Custom ColabFold/Prodigy settings

kdtool scan \
    --protein-seq-fpath /full/path/protein.txt \
    --mutation-config-fpath /full/path/mutations.txt \
    --output-dir /full/path/results/ \
    --colabfold-cmd /full/path/colabfold_batch \
    --job-type gp_multiple \
    --num-recycles 3 \
    --num-models 5

Programmatic Usage

You can also use KdPred as a Python library. The recommended entry point for the full pipeline is deep_mutational_scanning_pipeline in kdpred.cli:

from pathlib import Path

from kdpred.cli import deep_mutational_scanning_pipeline as dms

df_results = dms(
    protein_seq_fpath=Path("protein.txt"),
    mutation_config_fpath=Path("mutations.txt"),
    output_dir=Path("/full/path/results"),
    protein_name="MyProtein",
    colabfold_cmd="colabfold_batch",  # or full path to colabfold_batch
    job_type="gp_multiple",           # "prodigy", "gp_single", or "gp_multiple"
)

print(df_results.head())

Module Structure

kdpred.mutations: Efficient mutation sequence generation
kdpred.structure: ColabFold structure prediction
kdpred.kd_prediction: Prodigy Kd prediction
kdpred.utils: Utility functions for validation and file I/O
kdpred.cli: Command-line interface

Citation

If you use KdPred in your research, please cite:

@article{your2025kdpred,
  title={Paper Title Here},
  author={Name and Collaborators},
  journal={Journal Name},
  year={2026},
  publisher={Publisher}
}

Release history

0.0.2 (Nov 26, 2025)
0.0.2a2, pre-release (Nov 26, 2025)
0.0.2a1, pre-release (Nov 26, 2025)
0.0.1, this version (Nov 26, 2025)
0.0.1a3, pre-release (Nov 26, 2025)
0.0.1a2, pre-release (Nov 26, 2025)
0.0.1a1, pre-release (Nov 26, 2025)

Download files

Source Distribution

kdpred-0.0.1.tar.gz (73.8 kB)

Uploaded Nov 26, 2025 Source

Built Distribution

kdpred-0.0.1-py3-none-any.whl (63.8 kB)

Uploaded Nov 26, 2025 Python 3

File details

Details for the file kdpred-0.0.1.tar.gz.

File metadata

Download URL: kdpred-0.0.1.tar.gz
Upload date: Nov 26, 2025
Size: 73.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for kdpred-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`eb0895c12f379c92436f8c5564b358bf01569071a587f83419b25a9768e4a8cc`
MD5	`51359858bc52e724353d1bb8d3f277da`
BLAKE2b-256	`27cc6c4030f6e3550d5e2875477254bc84800685c8c807e53e4a970615663cbe`

Provenance

The following attestation bundles were made for kdpred-0.0.1.tar.gz:

Publisher: pypi_release.yaml on FanwangM/KdPred

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: kdpred-0.0.1.tar.gz
- Subject digest: eb0895c12f379c92436f8c5564b358bf01569071a587f83419b25a9768e4a8cc
- Sigstore transparency entry: 726034153
- Sigstore integration time: Nov 26, 2025
Source repository:
- Permalink: FanwangM/KdPred@03ee38239f482067559ca7fb0b540d1baa4f664f
- Branch / Tag: refs/tags/v0.0.1
- Owner: https://github.com/FanwangM
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi_release.yaml@03ee38239f482067559ca7fb0b540d1baa4f664f
- Trigger Event: push

File details

Details for the file kdpred-0.0.1-py3-none-any.whl.

File metadata

Download URL: kdpred-0.0.1-py3-none-any.whl
Upload date: Nov 26, 2025
Size: 63.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for kdpred-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`12896e24be32426dee01458df500f361b018cf0b39af2fdf803f66e4bf8f819a`
MD5	`5f3bab1d542b4db6fca6dff8c8220a2c`
BLAKE2b-256	`e70f329193370aa7b442dd7eacdea1bf2cc1251fc0ef092ee3145fd3141b8a27`

Provenance

The following attestation bundles were made for kdpred-0.0.1-py3-none-any.whl:

Publisher: pypi_release.yaml on FanwangM/KdPred

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: kdpred-0.0.1-py3-none-any.whl
- Subject digest: 12896e24be32426dee01458df500f361b018cf0b39af2fdf803f66e4bf8f819a
- Sigstore transparency entry: 726034170
- Sigstore integration time: Nov 26, 2025
Source repository:
- Permalink: FanwangM/KdPred@03ee38239f482067559ca7fb0b540d1baa4f664f
- Branch / Tag: refs/tags/v0.0.1
- Owner: https://github.com/FanwangM
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi_release.yaml@03ee38239f482067559ca7fb0b540d1baa4f664f
- Trigger Event: push
