dynopsi inference model for protein binder design
Dyno Psi-1
This repository enables inference and sampling for Dyno Psi-1, a de novo miniprotein binder design model. The binder design pipeline is configurable and modular, supporting binder generation against both single and multi-chain targets. Backbone atom coordinates are output for each designed binder and can be exported in multiple formats for downstream analysis. ProteinMPNN is recommended to generate accompanying sequences, and recommended settings are provided below in the sequence design section.
Dyno Psi model checkpoints and supplementary datasets are hosted on Hugging Face.
Installation
Requirements
- Python 3.10+
- CUDA-enabled GPU
Setup
pip install dynopsi
Overview
A binder design run is defined by a configuration, specified as a YAML file or directly in Python. The example below demonstrates both options, and a more detailed explanation of possible configurations can be found in example/README.md. The configuration contains sections for defining featurization parameters (e.g. binder lengths, target crops, hotspot residues), simulation parameters (e.g. ODE/SDE solvers, time schedules), and output formats and file locations. The input features to the model defined by the configuration can be viewed and validated before starting sampling. Generated structure samples are the inputs to sequence design and filtering steps.
Each step is described in more detail in the sections below:
Configuration
The design pipeline consists of three separately configurable components, which you can choose to configure as a YAML file or directly in Python (which provides more flexibility). These are (1) Featurization, (2) Simulation, and (3) Output Configuration.
Featurization
This is where you provide a target, optionally crop your target, specify hotspots, etc. For the target, you can provide either an RCSB record ID, an AFDB ID, or a local path to a structure file. For the first two options, the structure will be downloaded. For binder chains to be designed, you must specify the desired length (or multiple lengths) as well as an estimated center of mass of the final design. Our recommendation is to load your structure in a visualization tool like PyMOL and manually position the binder relative to the target.
Required inputs to featurization
- name: A user-readable name that defines this featurization setup.
- Define your target structure with exactly one of the following:
  - rcsb_record_id: The RCSB ID of your target; it will be downloaded from the RCSB PDB. This requires an internet connection.
  - afdb_uniprot_id: The UniProt ID of the target; it will be downloaded from AFDB (v6). All isoforms are allowed (e.g. P16871, P16871-2). This requires an internet connection.
  - structure_path: Local path to the target structure, in .pdb or .mmcif format.
- Define the chain ID / indexing strategy used for the featurization specification:
  - index_type: Which indexing strategy is used. Options are original_residue_index and auth_residue_index, described below:
    - auth_residue_index: Use the author-defined indexing: residue indices and chain IDs are defined by the author of the deposited structure file.
    - original_residue_index: Use standard PDB indexing: all residue indices start at 1 within each chain and the chain ID is the standard PDB chain ID.

  We recommend loading your structure in Mol* Viewer to select chains and indices. Mol* Viewer explicitly annotates with "AUTH" in scenarios where the author-defined chain ID and residue indexing diverge from standard PDB indexing.

  It is strongly recommended that you check the output featurization before generating designs to make sure that you have defined the correct chain/index parameters, especially if you use auth_residue_index.
Four additional featurization arguments are required: crops, hotspots, new_chains_lengths, and new_chains_centers_of_mass. Each accepts multiple configurations, which are combined in a matrix, producing one featurization for every combination across all four arguments. Separate simulations will be run sequentially for each combination specified.
- crops: A dictionary of {crop_name : crop_range}.
- hotspots: A dictionary of {hotspot_name : hotspot_positions}.
- new_chains_lengths: A list of different chain lengths to generate.
- new_chains_centers_of_mass: A dictionary of {center_name : center_of_mass_function}.
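To make the combination rule concrete, the sketch below enumerates a featurization matrix with `itertools.product`. It only illustrates the counting and is not dynopsi's internal code: `enumerate_featurizations` is a hypothetical helper, and the second crop is a made-up value added to show how the counts multiply.

```python
from itertools import product


def enumerate_featurizations(crops, hotspots, lengths, centers):
    """Return one (crop_name, hotspot_name, length, center_name) tuple per combination."""
    # Iterating over a dict yields its keys, so each tuple holds the user-chosen names.
    return list(product(crops, hotspots, lengths, centers))


crops = {"crop_17_209": "B17-209", "crop_17_120": "B17-120"}  # second crop is hypothetical
hotspots = {"hotspot_58V_80L_139Y": "B58,B80,B139"}
lengths = [60, 80, 100, 120]
centers = {"il7center": [[28.1, 41.4, 47.9]]}

combos = enumerate_featurizations(crops, hotspots, lengths, centers)
# 2 crops x 1 hotspot set x 4 lengths x 1 center of mass
print(len(combos))
```

Because every combination is simulated separately, adding one more option to any of the four arguments multiplies the total run time accordingly.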
An example
The same binder design task can be specified using either YAML or Python.
YAML version:
featurization_pipelines:
  il7ra_3di3_binder_design: # name this something unique
    type: dynopsi_binder_design_featurization_pipeline # this is the only option currently supported
    index_type: auth_residue_index # `auth_residue_index` or `original_residue_index`
    rcsb_record_id: 3di3
    # afdb_uniprot_id: P16871
    # structure_path: /path/to/pdb_or_cif/file
    crops:
      crop_17_209: B17-209 # name crop_17_209 can be whatever you want
    hotspots:
      hotspot_58V_80L_139Y: B58,B80,B139 # name hotspot_58V_80L_139Y can be whatever you want
    new_chains_lengths: [60, 80, 100, 120] # produce binders of length 60, 80, 100, 120
    new_chains_centers_of_mass:
      il7center: # name il7center can be whatever you want
        type: predefined_center_of_mass
        centers_of_mass: [[28.1, 41.4, 47.9]]
Python version:
from dynopsi.data.featurization import DynoPsiBinderDesignFeaturizationPipeline, primitives

featurization_pipeline = DynoPsiBinderDesignFeaturizationPipeline(
    name="il7ra_3di3_binder_design",
    index_type="auth_residue_index",
    rcsb_record_id="3di3",
    crops={"crop_17_209": "B17-209"},
    hotspots={"hotspot_58V_80L_139Y": "B58,B80,B139"},
    new_chains_lengths=[60, 80, 100, 120],
    new_chains_centers_of_mass={
        "il7center": primitives.PredefinedCenterOfMassEstimator(centers_of_mass=[[28.1, 41.4, 47.9]])
    },
)
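Before running anything, it can help to check that every hotspot falls inside the crop. The sketch below is a standalone sanity check, not part of dynopsi. It assumes the string formats seen in the example above (a one-letter chain ID followed by a residue number, and a single-chain crop written as `B17-209`); `parse_hotspots`, `parse_crop` and `hotspots_outside_crop` are hypothetical helper names, and other crop syntaxes the library may accept are not handled.

```python
import re


def parse_hotspots(spec):
    """Parse 'B58,B80,B139' into [('B', 58), ('B', 80), ('B', 139)]."""
    parsed = []
    for token in spec.split(","):
        match = re.fullmatch(r"\s*([A-Za-z])(\d+)\s*", token)
        if match is None:
            raise ValueError(f"Unrecognized hotspot token: {token!r}")
        parsed.append((match.group(1), int(match.group(2))))
    return parsed


def parse_crop(spec):
    """Parse 'B17-209' into ('B', 17, 209)."""
    match = re.fullmatch(r"\s*([A-Za-z])(\d+)-(\d+)\s*", spec)
    if match is None:
        raise ValueError(f"Unrecognized crop range: {spec!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def hotspots_outside_crop(hotspot_spec, crop_spec):
    """Return the hotspots that are on another chain or outside the crop range."""
    chain, start, end = parse_crop(crop_spec)
    return [
        (c, i)
        for c, i in parse_hotspots(hotspot_spec)
        if c != chain or not (start <= i <= end)
    ]


print(hotspots_outside_crop("B58,B80,B139", "B17-209"))
```

An empty list means all hotspots lie inside the crop. This check does not replace inspecting the featurization output in a structure viewer.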
Simulation
The simulation configuration controls the sampler, including which solver to use (ODE/SDE) and its parameters. Beyond the default solvers, sampling behavior can be customized by composing Ops — instructions that define the denoising trajectory — into a Simulation procedure. This makes it straightforward to introduce guidance (coming soon!), impose symmetry, or otherwise modify the sampling dynamics. For vanilla miniprotein binder design, we recommend the following defaults:
YAML version:
simulation_pipelines:
  sde_100_steps_default: # name this whatever you want
    type: sde_simulation # supported types include `ode_simulation` and `sde_simulation`
    time_sampler:
      type: linear_time_sampler # supported types include `linear_time_sampler` and `log_time_sampler`
      num_steps: 100 # sample quality generally increases with more steps, but we see saturation ~100
    diffusion_coefficient_fn:
      type: inverse_parameter
      eps: 0.02
      clamp_max: 10.0
    noise_scale_fn:
      type: constant
      constant: 0.1
    score_scale_fn:
      type: constant
      constant: 1.5
    t_thresh_score_weighting_only: 0.9
Python version:
from dynopsi.simulation import LinearTimeSampler, SDESimulation
from dynopsi.utils import ConstantFunction, InverseParameterFunction
# DynoPsiModelConfig must also be imported; its import path is not shown here.

simulation = SDESimulation(
    name="SDE_linear_100_steps",
    inference_model_config=DynoPsiModelConfig(
        repo_id="dynotx/dynopsi", filename="dynopsi-1.ckpt",
    ),
    # The YAML example above uses eps=0.02 with clamp_max=10.0.
    diffusion_coefficient_fn=InverseParameterFunction(eps=0.1),
    noise_scale_fn=ConstantFunction(constant=0.1),
    score_scale_fn=ConstantFunction(constant=1.5),
    t_thresh_score_weighting_only=0.9,
    time_sampler=LinearTimeSampler(num_steps=100),
)
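For intuition about what these parameters might control, here is a small standalone sketch. The definitions of the time sampler and the diffusion coefficient are not documented here, so both formulas are assumptions read off the parameter names: an evenly spaced time grid on [0, 1], and a coefficient of the form min(1 / (t + eps), clamp_max). The helpers `linear_time_grid` and `inverse_parameter` are hypothetical and may not match what dynopsi actually computes.

```python
def linear_time_grid(num_steps):
    """Evenly spaced times from 0 to 1 (assumed meaning of a linear time sampler)."""
    return [i / num_steps for i in range(num_steps + 1)]


def inverse_parameter(t, eps=0.02, clamp_max=10.0):
    """Assumed form of an 'inverse_parameter' coefficient: 1 / (t + eps), capped at clamp_max."""
    return min(1.0 / (t + eps), clamp_max)


grid = linear_time_grid(100)
coefficients = [inverse_parameter(t) for t in grid]

# Under these assumptions the coefficient is capped near t = 0 and decays towards t = 1.
print(coefficients[0], coefficients[-1])
```

Under this reading, `eps` keeps the coefficient finite at t = 0 and `clamp_max` limits how strongly the earliest steps are weighted.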
Output Configuration
Specify the output format (.pdb or .cif), output directory, and whether to save the full sampling trajectory.
Note: Saving trajectories requires significantly more time and disk space and is not recommended for large-scale runs.
YAML version:
output_specifications:
  default_output_specification:
    output_dir: /path/to/outputs
    formats: [pdb, npz]
    save_trajectory: False
Python version:
from dynopsi import OutputSpecification
output_specification = OutputSpecification(
    output_dir="./example/output",
    formats=["pdb", "npz"],
    save_trajectory=False,
)
Putting it all together
Combine the pieces you specified above into a design pipeline.
YAML version:
design_pipelines:
  il7ra_3di3_binder_design:
    type: dynopsi_binder_design_pipeline
    featurization_pipeline: il7ra_3di3_binder_design
    simulation_pipeline: sde_100_steps_default
    output_specification: default_output_specification
Python version:
from dynopsi import DynoPsiBinderDesignPipeline
design_pipeline = DynoPsiBinderDesignPipeline(
    name="dynopsi_binder_design_pipeline",
    inference_featurization_pipeline=featurization_pipeline,
    simulation_pipeline=simulation,
    output_specification=output_specification,
)
Validate & Run a Design Pipeline
The dynopsi CLI command only supports calls to .yaml configurations, with examples shown below. If you prefer to run design pipelines via Python scripts, refer to the .ipynb examples in example/notebooks/ for a starting point to define your own script.
Check design pipeline
Dry run the design specification and verify that the solver configuration is valid. It is strongly recommended to run this validation and inspect your featurization, especially before long sampling runs.
Optional Arguments:
- output_format: File format for the output structure. Accepted values: "pdb", "mmcif".
- overwrite: Whether to overwrite existing samples (boolean). If False, checks for the presence of /path/to/outputs/<design_pipeline_name>/verify_configuration. If True, clears the entire directory prior to verifying the configuration.
CLI + YAML version:
dynopsi check example/configurations/il7ra_3di3_config.yaml
Python version:
design_pipeline.verify_configuration()
View the featurization configuration (path below) in a structure viewer and color by B-factor. The B-factor values encode the following features:
| B Factor | Hotspot Feature |
|---|---|
| 100 | hotspot |
| 50 | binder (center of mass) |
| 0 | other target residues |
/path/to/outputs/
└── <design_pipeline_name>/
    └── verify_configuration/
        └── <reference_structure_info>___<crop_info>___<hotspot_info>___length<length>_<center_of_mass_name>.pdb
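If you prefer a scripted check over a viewer, the B-factor encoding from the table above can also be read directly from the verification PDB file. The sketch below is not part of dynopsi: `summarize_bfactors` is a hypothetical helper that relies only on the fixed-column PDB ATOM record layout (chain ID in column 22, residue number in columns 23-26, B-factor in columns 61-66).

```python
from collections import defaultdict

# B-factor values and their meaning, taken from the table above.
LABELS = {100.0: "hotspot", 50.0: "binder", 0.0: "other target residues"}


def summarize_bfactors(pdb_text):
    """Group (chain, residue number) pairs by the feature encoded in the B-factor column."""
    groups = defaultdict(set)
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        chain = line[21]                 # column 22
        residue = int(line[22:26])       # columns 23-26
        bfactor = float(line[60:66])     # columns 61-66
        label = LABELS.get(round(bfactor, 1), "unknown")
        groups[label].add((chain, residue))
    return dict(groups)
```

Pass in the text of the file written under verify_configuration/, then compare the residues listed under "hotspot" with the hotspots you intended, in the index_type you configured.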
Run design pipeline
Required arguments
- num_samples: Each combination in the featurization matrix that you defined will generate num_samples designs.
Optional arguments:
- overwrite: Whether to overwrite existing samples (boolean). If False, checks for the presence of /path/to/outputs/<design_pipeline_name>/<simulation_name>. If True, clears the entire directory prior to sampling.
- batch_size: Batch size for inference; if not provided, the batch size is inferred from the available CUDA-enabled GPUs.
CLI + YAML version:
dynopsi sample example/configurations/il7ra_3di3_config.yaml --num_samples 10
Python version:
design_pipeline.sample(num_samples=10)
Inspect outputs
Outputs will be saved with the following pattern:
/path/to/outputs/
└── <design_pipeline_name>/
    └── <simulation_name>/
        └── <featurization_pipeline_name>/
            └── <reference_structure_info>___<crop_info>___<hotspot_info>___length<length>_<center_of_mass_name>/
                ├── X_T_final_state_<sample_idx>.<output_format>
                └── X_T_trajectory_<sample_idx>.<output_format>
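For downstream steps it is often convenient to gather the final-state files per featurization. The sketch below walks the directory layout shown above using only the standard library; `collect_final_states` is a hypothetical helper, not a dynopsi function, and it relies on nothing beyond the `X_T_final_state_<sample_idx>.<output_format>` naming pattern.

```python
from pathlib import Path


def collect_final_states(output_dir, output_format="pdb"):
    """Map each featurization directory name to its sorted final-state files."""
    results = {}
    pattern = f"X_T_final_state_*.{output_format}"
    # rglob searches all nested pipeline / simulation / featurization directories.
    for path in sorted(Path(output_dir).rglob(pattern)):
        results.setdefault(path.parent.name, []).append(path)
    return results
```

Calling `collect_final_states("/path/to/outputs")` returns a dictionary keyed by the combination directory name, which skips trajectory files and can be fed into sequence design one structure at a time.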
Sequence design with ProteinMPNN
Dyno Psi-1 generates backbone (N, CA, C, O) coordinates for the binder, and preserves the original target sidechains. We recommend designing the binder sequence with target-aware ProteinMPNN.
# --sampling_temp: a low sampling temperature leads to higher designability, but lower diversity.
# --fixed_positions_jsonl: path to a JSON file representing a dict with {chain_id: list[fixed_residues]}. We recommend including the entire target.
python protein_mpnn_run.py \
    --model_name v_48_020 \
    --sampling_temp 0.0001 \
    --backbone_noise 0.0 \
    --omit_AAs C \
    --num_seqs_per_target 4 \
    --use_soluble_model \
    --pdb_path_chains <binder_chain_id> \
    --fixed_positions_jsonl <fixed_positions.jsonl> \
    --pdb_path <dyno_psi_design.pdb> \
    --out_folder <...>
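The fixed-positions file can be generated with a few lines of Python. The sketch below is an assumption-laden example rather than an official recipe: it assumes the nested layout written by ProteinMPNN's helper script `make_fixed_positions_dict.py`, namely `{structure_name: {chain_id: [positions]}}` with 1-based positions counted within each chain, stored as one JSON object per line. `make_fixed_positions` and `write_jsonl` are hypothetical helpers, and the chain lengths in the usage lines are made-up values. Verify the expected layout against your ProteinMPNN version before use.

```python
import json


def make_fixed_positions(structure_name, target_chain_lengths, binder_chain_id):
    """Fix every residue of each target chain and leave the binder chain free."""
    per_chain = {
        chain: list(range(1, length + 1))  # 1-based positions within the chain
        for chain, length in target_chain_lengths.items()
    }
    per_chain[binder_chain_id] = []  # nothing fixed on the chain being designed
    return {structure_name: per_chain}


def write_jsonl(record, path):
    """Write a single record as one JSON line."""
    with open(path, "w") as handle:
        handle.write(json.dumps(record) + "\n")


# Hypothetical example: a 193-residue target chain B and a binder on chain A.
record = make_fixed_positions("dyno_psi_design", {"B": 193}, "A")
print(len(record["dyno_psi_design"]["B"]))
```

In this layout the structure name is expected to match the PDB file name without its extension.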
Filter designs
We strongly recommend that you refold designs with AlphaFold2 (AF2) and filter on binder and interface quality metrics. The white paper shows filtering results from refolding with AF2 monomer in initial guess mode with a target template provided and 3 recycles. The designed structures are compared to the refolded structures, and designs that satisfy the following constraints are retained:
- Binder RMSD (designed complex vs. refolded complex) < 1 Angstrom
- Binder pLDDT > 0.8
- Inter-chain pAE (ipAE) < 10
The filter thresholds match the binder design in silico benchmarking thresholds from Watson, Juergens, Bennett (2023).
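The three thresholds above can be applied with a small helper once the metrics have been computed. This is a minimal sketch, not code from the dynopsi package: `RefoldMetrics` and `passes_filters` are hypothetical names, and the metrics are assumed to be on the same scales as the thresholds (RMSD in Angstrom, pLDDT on a 0-1 scale, ipAE unnormalized). Some tools report pLDDT on a 0-100 scale or normalize pAE, so convert before filtering.

```python
from dataclasses import dataclass


@dataclass
class RefoldMetrics:
    binder_rmsd: float   # designed vs. refolded binder, in Angstrom
    binder_plddt: float  # mean binder pLDDT on a 0-1 scale
    ipae: float          # inter-chain pAE, unnormalized


def passes_filters(metrics, max_rmsd=1.0, min_plddt=0.8, max_ipae=10.0):
    """Return True only if all three thresholds listed above are satisfied."""
    return (
        metrics.binder_rmsd < max_rmsd
        and metrics.binder_plddt > min_plddt
        and metrics.ipae < max_ipae
    )


# Made-up metric values for illustration only.
candidates = {
    "design_0": RefoldMetrics(binder_rmsd=0.7, binder_plddt=0.91, ipae=6.2),
    "design_1": RefoldMetrics(binder_rmsd=1.4, binder_plddt=0.88, ipae=7.5),
}
kept = [name for name, metrics in candidates.items() if passes_filters(metrics)]
print(kept)
```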
We show the specific code for refolding using ColabDesign below. We use the Kabsch alignment algorithm to align the designed and refolded binder to compute the Binder RMSD.
from colabdesign import mk_afdesign_model

# Replace the <...> placeholders below with your own values.

# AF2 monomer in binder protocol, initial guess mode, 3 recycles.
complex_prediction_model = mk_afdesign_model(
    protocol="binder",
    num_recycles=3,
    data_dir="/path/to/weights",
    use_multimer=False,
    use_initial_guess=True,
    use_initial_atom_pos=False,
)

# Provide the target as a template; the binder is not templated.
complex_prediction_model.prep_inputs(
    pdb_filename=<dyno_psi_design.pdb>,
    chain=<target_chain_id>,
    binder_chain=<binder_chain_id>,
    binder_len=<length_binder_sequence>,
    use_binder_template=False,
    rm_target_seq=False,
    rm_target_sc=False,
    rm_template_ic=True,
)

# Refold the designed binder sequence with a single AF2 model.
models = ["model_1_ptm"]
complex_prediction_model.predict(
    seq=<binder_sequence>,
    models=models,
    num_models=len(models),
    sample_models=False,
    num_recycles=3,
    verbose=False,
)
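The Binder RMSD mentioned above relies on Kabsch alignment. The sketch below is a generic NumPy implementation of that algorithm, not the code used for the white paper: `kabsch_rmsd` is a hypothetical helper that takes two (N, 3) arrays of matched atom coordinates (for example, binder CA atoms from the designed and refolded structures) and returns the RMSD after optimal superposition. Which atoms to include is your choice, and the coordinates in the usage lines are random values for illustration.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched point sets P and Q after optimal rigid superposition of P onto Q."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering both point sets.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so the result is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_aligned = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_aligned - Qc) ** 2, axis=1))))


# Self-check: a rotated and translated copy should give an RMSD close to zero.
rng = np.random.default_rng(0)
coords = rng.normal(size=(10, 3))
angle = np.deg2rad(30.0)
rotation = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
moved = coords @ rotation.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(coords, moved))
```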