Deep mutational scanning tool for protein-protein binding affinity prediction
KdPred
Overview
KdPred is an automated pipeline for deep mutational scanning. It predicts protein-protein binding affinities (K_d) by combining structure prediction (ColabFold) with binding affinity prediction (Prodigy).
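For orientation, K_d and the binding free energy are linked by the standard thermodynamic relation ΔG = RT ln(K_d), with K_d in mol/L. The sketch below converts between the two. It is an illustration only and not part of KdPred: the function names dg_to_kd and kd_to_dg and the 25 °C default are assumptions made for this example.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def dg_to_kd(delta_g_kcal, temp_celsius=25.0):
    """Convert a binding free energy (kcal/mol) to K_d (mol/L)."""
    temp_kelvin = temp_celsius + 273.15
    return math.exp(delta_g_kcal / (R_KCAL * temp_kelvin))


def kd_to_dg(kd_molar, temp_celsius=25.0):
    """Convert K_d (mol/L) to a binding free energy (kcal/mol)."""
    if kd_molar <= 0:
        raise ValueError("K_d must be positive")
    temp_kelvin = temp_celsius + 273.15
    return R_KCAL * temp_kelvin * math.log(kd_molar)


# A more negative free energy corresponds to tighter binding (smaller K_d).
kd = dg_to_kd(-10.0)
print(f"dG = -10.0 kcal/mol -> K_d = {kd:.2e} M")
```

Because the relation is exponential, a change of roughly 1.4 kcal/mol at room temperature shifts K_d by about one order of magnitude.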
Features
- Efficient mutation sequence generation: Create mutated protein sequences from mutation specifications
- Structure prediction: Automated structure prediction using ColabFold
- Kd prediction: Binding affinity prediction using Prodigy
- Flexible mutation format: Supports single mutations, multiple mutations, and saturation mutagenesis
- Modular design: Each step can be run independently or as part of a complete pipeline
Installation
Prerequisites
Note about PATH and shells: If colabfold_batch is available in your interactive shell (for example, after conda activate) but Python reports it as not found when running the pipeline, the usual cause is that the PATH modifications live in shell init files and are not present in the environment of the Python process. Solutions:
- Provide the full executable path to ColabFold, for example --colabfold-cmd /home/you/colabfold/bin/colabfold_batch or ColabFoldPredictor(colabfold_command='/full/path/colabfold_batch').
- Launch the script/notebook from the same shell where you activated the environment (e.g., run Python after conda activate kd_py312).
- Export the ColabFold bin directory into the environment that will run Python, for example:
export PATH="/Your_ColabFold_Location/colabfold-conda/bin:$PATH"
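To check what the Python process itself can see, a quick diagnostic along these lines may help. It relies only on the standard library (shutil.which); the helper name resolve_executable is hypothetical and is not a KdPred function.

```python
import os
import shutil
import sys


def resolve_executable(command):
    """Return the absolute path of `command` as seen by this Python process."""
    found = shutil.which(command)
    if found is None:
        raise FileNotFoundError(
            f"{command!r} is not on the PATH of this Python process; "
            "pass the full path or export its bin directory first."
        )
    return os.path.abspath(found)


# The running interpreter is always resolvable by its full path.
print(resolve_executable(sys.executable))

# For ColabFold, call resolve_executable("colabfold_batch") from the same
# process (script or notebook kernel) that will run the pipeline.
```

If the call raises FileNotFoundError for colabfold_batch while the command works in your terminal, the Python process was started with a different PATH, and one of the solutions above applies.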
Install KdPred
# activate your virtual environment
conda activate kd_py312
# Install from source after downloading/cloning the repository
pip install -e .
# Or install the released package from PyPI
pip install kdpred
Usage
Basic Usage
Run the complete deep mutational scanning pipeline:
# suppose your virtual environment is named kd_py312
conda activate kd_py312
# navigate to the directory containing protein.txt and mutations.txt
kdtool \
--protein-seq-fpath protein.txt \
--mutation-config-fpath mutations.txt \
--output-dir /full_path/results/ \
--protein-name DEMO_PROTEIN \
--colabfold-cmd colabfold_batch \
--job-type gp_multiple
The --job-type option accepts prodigy, gp_single, or gp_multiple. The prodigy option runs only Prodigy for K_d prediction on provided structures, while gp_single and gp_multiple run ColabFold for structure prediction and then predict K_d with Gaussian Process regression models built on Prodigy features.
To list all available options, run:
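When launching many runs from a script, it can be convenient to assemble the command as an argument list and validate the job type up front. The sketch below uses only the flags shown in the example above; the helper build_kdtool_command is hypothetical and does not ship with KdPred.

```python
import shlex

VALID_JOB_TYPES = ("prodigy", "gp_single", "gp_multiple")


def build_kdtool_command(protein_seq, mutation_config, output_dir,
                         protein_name, job_type="gp_multiple",
                         colabfold_cmd="colabfold_batch"):
    """Build the kdtool argument list, rejecting unknown job types early."""
    if job_type not in VALID_JOB_TYPES:
        raise ValueError(
            f"job_type must be one of {VALID_JOB_TYPES}, got {job_type!r}"
        )
    return [
        "kdtool",
        "--protein-seq-fpath", str(protein_seq),
        "--mutation-config-fpath", str(mutation_config),
        "--output-dir", str(output_dir),
        "--protein-name", str(protein_name),
        "--colabfold-cmd", str(colabfold_cmd),
        "--job-type", job_type,
    ]


cmd = build_kdtool_command(
    "protein.txt", "mutations.txt", "/full_path/results/", "DEMO_PROTEIN"
)
# Print a shell-safe rendering; nothing is executed here.
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True), which avoids shell quoting issues with paths that contain spaces.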
kdtool --help
Input Files
Protein Sequences (--protein-seq-fpath)
A text file with one protein sequence per line, one line per chain:
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVVHSLAKWKRQTLGQHDFSAGEGLYTHMKALRPDEDRLSPLHSVYVDQWDWERVMGDGERQFSTLKSTVEAIWAGIKATEAAVSEEFGLAPFLPDQIHFVHSMLLSSQESVQGDWLDSLLAQ
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVVHSLAKWKRQTLGQHDFSAGEGLYTHMKALRPDEDRLSPLHSVYVDQWDWERVMGDGERQFSTLKSTVEAIWAGIKATEAAVSEEFGLAPFLPDQIHFVHSMLLSSQESVQGDWLDSLLAQ
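A file in this format can be sanity-checked before starting a long run. The following is a minimal sketch, not KdPred's own reader. It assumes that line order maps to chains A, B, C, and so on, and that only the 20 standard one-letter amino acid codes are allowed; both assumptions should be checked against the actual tool.

```python
import string

STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def read_chain_sequences(text):
    """Map chain letters to sequences, one non-empty line per chain.

    Assumption: the first line is chain A, the second chain B, and so on.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) > len(string.ascii_uppercase):
        raise ValueError("more chains than single-letter chain IDs")
    chains = {}
    for letter, seq in zip(string.ascii_uppercase, lines):
        seq = seq.upper()
        invalid = sorted(set(seq) - STANDARD_AMINO_ACIDS)
        if invalid:
            raise ValueError(f"chain {letter}: unexpected characters {invalid}")
        chains[letter] = seq
    return chains


chains = read_chain_sequences("MKTAYIAKQR\nMKTAYIAKQR\n")
for chain_id, seq in chains.items():
    print(chain_id, len(seq))
```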
Mutations (--mutation-config-fpath)
A text file with one mutation per line. Supports multiple formats:
Single mutation:
B.H.68.F
Multiple mutations (comma-separated):
B.H.68.F,A.K.42.R
Saturation mutagenesis (three parts, Chain.Wildtype.Position; expanded automatically):
B.H.68
Comments (lines starting with # are ignored):
# Single point mutation
B.H.68.F
# Saturation mutagenesis at position 68
B.H.68
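The line formats above can be read with a few lines of Python. This is an illustrative parser written for this description, not the one in kdpred.mutations; the Mutation tuple and both function names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Mutation(NamedTuple):
    chain: str
    wildtype: str
    position: int
    mutant: Optional[str]  # None marks a three-part saturation entry


def parse_mutation_token(token):
    """Parse 'B.H.68.F' (point mutation) or 'B.H.68' (saturation)."""
    parts = token.strip().split(".")
    if len(parts) not in (3, 4):
        raise ValueError(f"expected 3 or 4 dot-separated fields, got {token!r}")
    chain, wildtype, position = parts[0], parts[1], int(parts[2])
    mutant = parts[3] if len(parts) == 4 else None
    return Mutation(chain, wildtype, position, mutant)


def parse_mutation_config(text) -> List[List[Mutation]]:
    """Return one list of mutations per non-blank, non-comment line."""
    variants = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        variants.append([parse_mutation_token(t) for t in line.split(",")])
    return variants


example = "# Single point mutation\nB.H.68.F\nB.H.68.F,A.K.42.R\nB.H.68\n"
for variant in parse_mutation_config(example):
    print(variant)
```

Each line yields one variant: a list with a single entry for point and saturation lines, and several entries for comma-separated combinations.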
Mutation Format
Mutations follow the format: Chain.Wildtype.Position.Mutant
- Chain: Single uppercase letter (A, B, C, etc.)
- Wildtype: Single letter amino acid code
- Position: 1-based position in the sequence
- Mutant: Single letter amino acid code
Example: B.H.68.F means on chain B, replace Histidine (H) at position 68 with Phenylalanine (F).
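To make the 1-based convention concrete, the sketch below applies a mutation string to a set of chain sequences and checks that the stated wild-type residue really sits at that position. It is a simplified stand-in for illustration, not KdPred's implementation; the function names and the toy sequences are made up.

```python
def apply_mutation(sequence, wildtype, position, mutant):
    """Replace the residue at a 1-based position after checking the wild type."""
    index = position - 1  # positions in the spec are 1-based
    if not 0 <= index < len(sequence):
        raise IndexError(f"position {position} is outside the sequence")
    if sequence[index] != wildtype:
        raise ValueError(
            f"expected {wildtype} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + mutant + sequence[index + 1:]


def apply_variant(chains, variant):
    """Apply a comma-separated variant such as 'B.H.3.F,A.K.2.R'."""
    mutated = dict(chains)  # leave the input mapping untouched
    for token in variant.split(","):
        chain, wildtype, position, mutant = token.strip().split(".")
        mutated[chain] = apply_mutation(
            mutated[chain], wildtype, int(position), mutant
        )
    return mutated


chains = {"A": "MKTAY", "B": "ACHDE"}
print(apply_variant(chains, "B.H.3.F,A.K.2.R"))
# -> {'A': 'MRTAY', 'B': 'ACFDE'}
```

The wild-type check catches off-by-one errors early, for example when positions were taken from a structure file whose numbering does not start at 1.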
Advanced Usage
Custom residue list for saturation
kdtool scan \
--protein-seq-fpath /full/path/protein.txt \
--mutation-config-fpath /full/path/mutations.txt \
--output-dir /full/path/results/ \
--residue-list "A,C,D,E,F"
Here --residue-list "A,C,D,E,F" gives the amino acids to use for saturation mutagenesis.
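Conceptually, a three-part saturation entry expands into one four-part mutation per residue in the list. The sketch below shows that expansion under two assumptions that are not confirmed by the text above: the default set is the 20 standard amino acids, and the wild-type residue itself is skipped. The function expand_saturation is hypothetical.

```python
# Assumed default: the 20 standard amino acids.
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def expand_saturation(spec, residue_list=""):
    """Expand 'B.H.68' into 'B.H.68.X' strings for each target residue."""
    chain, wildtype, position = spec.strip().split(".")
    residues = [r.strip().upper() for r in residue_list.split(",") if r.strip()]
    if not residues:
        residues = list(STANDARD_AMINO_ACIDS)
    # Assumption: substituting the wild type with itself is skipped.
    return [
        f"{chain}.{wildtype}.{position}.{res}"
        for res in residues
        if res != wildtype
    ]


print(expand_saturation("B.H.68", "A,C,D,E,F"))
# -> ['B.H.68.A', 'B.H.68.C', 'B.H.68.D', 'B.H.68.E', 'B.H.68.F']
print(len(expand_saturation("B.H.68")))
# -> 19
```

Restricting the residue list reduces the number of structure predictions proportionally, which matters because each variant needs its own ColabFold run in the gp_single and gp_multiple job types.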
Custom ColabFold/Prodigy settings
kdtool scan \
--protein-seq-fpath /full/path/protein.txt \
--mutation-config-fpath /full/path/mutations.txt \
--output-dir /full/path/results/ \
--colabfold-cmd /full/path/colabfold_batch \
--job-type gp_multiple \
--num-recycles 3 \
--num-models 5
Programmatic Usage
You can also use KdPred as a Python library. The recommended entry point for the full pipeline is deep_mutational_scanning_pipeline in kdpred.cli:
from pathlib import Path
from kdpred.cli import deep_mutational_scanning_pipeline as dms
df_results = dms(
protein_seq_fpath=Path("protein.txt"),
mutation_config_fpath=Path("mutations.txt"),
output_dir=Path("/full/path/results"),
protein_name="MyProtein",
colabfold_cmd="colabfold_batch", # or full path to colabfold_batch
job_type="gp_multiple", # "prodigy", "gp_single", or "gp_multiple"
)
print(df_results.head())
Module Structure
- kdpred.mutations: Efficient mutation sequence generation
- kdpred.structure: ColabFold structure prediction
- kdpred.kd_prediction: Prodigy Kd prediction
- kdpred.utils: Utility functions for validation and file I/O
- kdpred.cli: Command-line interface
Citation
If you use KdPred in your research, please cite:
@article{your2025kdpred,
title={Paper Title Here},
author={Name and Collaborators},
journal={Journal Name},
year={2026},
publisher={Publisher}
}