Self-correcting protein folding with differentiable NMR constraints
ResonanceFlow: Differentiable Protein Structure Prediction with NMR Self-Correction
ResonanceFlow is a JAX-native protein structure prediction framework that integrates differentiable biophysics with experimental NMR constraints. It allows models to "self-correct" by propagating gradients from physical violations (atomic clashes, bad geometry) and NMR observables (RDCs, NOE distances) back into the neural network's weights, end-to-end, with no manual refinement step.
Key Features
- JAX-Native Gradient Flow: End-to-end differentiability from experimental constraints to model weights via jax.grad.
- Saupe Tensor RDC Loss: Differentiable least-squares fitting of the alignment tensor at every forward pass (Bax & Tjandra 1997; Cornilescu et al. 1998).
- NOE Distance Restraints: Flat-bottomed harmonic penalty on upper-bound violations; NOE distances are the primary source of 3D information in protein NMR (Wüthrich 1986; Güntert et al. 1997).
- Biophysically Correct Geometry: Bond length loss calibrated to the canonical Cα-Cα distance of 3.80 Å (Engh & Huber 1991).
- Differentiable Steric Clash: Harmonic atom-overlap penalty with optional AMBER/CHARMM-style 1-2/1-3 bonded exclusions, powered by jax-md.
- RDC Quality Metric: Built-in Q-factor (Cornilescu et al. 1998) for structural validation without additional tooling.
- PBC Support: Periodic boundary conditions for simulation-box contexts.
- Transformer-to-Coords: A pre-LN Transformer architecture that maps amino acid sequences directly to physical 3D Cα coordinates.
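To make the Saupe tensor fit and the Q-factor concrete, here is a minimal sketch in plain JAX. It is not the package's implementation: the function names (saupe_design_matrix, fit_rdc, rdc_mse, q_factor), the normal-equations solve, and the small ridge term are assumptions chosen for illustration, and the d_max scaling is folded into the fitted parameters.

```python
import jax
import jax.numpy as jnp


def saupe_design_matrix(unit_vecs):
    """Map the 5 independent Saupe elements to v^T S v for each unit vector."""
    x, y, z = unit_vecs[:, 0], unit_vecs[:, 1], unit_vecs[:, 2]
    # S is symmetric and traceless, so Szz = -(Sxx + Syy) and
    # v^T S v = Sxx(x^2 - z^2) + Syy(y^2 - z^2) + 2 Sxy xy + 2 Sxz xz + 2 Syz yz
    return jnp.stack(
        [x**2 - z**2, y**2 - z**2, 2 * x * y, 2 * x * z, 2 * y * z], axis=1
    )


def fit_rdc(vecs, measured, ridge=1e-6):
    """Least-squares fit of the scaled tensor; returns back-calculated RDCs."""
    unit = vecs / jnp.linalg.norm(vecs, axis=1, keepdims=True)
    A = saupe_design_matrix(unit)
    # Normal equations with a tiny ridge term (an assumption made here to keep
    # the solve well conditioned and its gradient well behaved).
    params = jnp.linalg.solve(A.T @ A + ridge * jnp.eye(5), A.T @ measured)
    return A @ params


def rdc_mse(vecs, measured):
    """Mean squared error between fitted and measured RDCs."""
    return jnp.mean((fit_rdc(vecs, measured) - measured) ** 2)


def q_factor(vecs, measured):
    """Q = rms(D_fit - D_measured) / rms(D_measured)."""
    resid = fit_rdc(vecs, measured) - measured
    return jnp.sqrt(jnp.mean(resid**2)) / jnp.sqrt(jnp.mean(measured**2))


# Synthetic check: RDCs generated from a known tensor should be fitted closely.
vecs = jax.random.normal(jax.random.PRNGKey(0), (12, 3))
unit = vecs / jnp.linalg.norm(vecs, axis=1, keepdims=True)
true_params = jnp.array([1.0, -2.0, 0.5, 0.3, -0.7])
measured = saupe_design_matrix(unit) @ true_params

fitted = fit_rdc(vecs, measured)
q = q_factor(vecs, measured)
grad = jax.grad(rdc_mse)(vecs, measured)  # gradient w.r.t. the bond vectors
```

Because the fit is expressed with differentiable linear algebra, jax.grad can push the RDC mismatch back through the fitted tensor to the bond vectors, which is the property the feature list relies on.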
The Concept: "Self-Correction"
Traditional folding models are trained on static PDB snapshots. ResonanceFlow instead teaches a model to listen to physical laws and NMR data during training itself:
Sequence → [Transformer] → Cα Coordinates
                                │
          ┌─────────────────────┼──────────────────────┐
          ▼                     ▼                      ▼
    Steric Clash           Bond Length             RDC / NOE
      Penalty                 Loss                 Mismatch
          └─────────────────────┼──────────────────────┘
                                │  ∇θ L_total
                                ▼
                        [Optimizer Step]
Gradients from every constraint flow back simultaneously into the model weights, so the model learns not just from data but from physics.
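The loop in the diagram can be illustrated with a small, self-contained sketch. To keep it short, it optimizes the Cα coordinates directly instead of Transformer weights, and it uses simplified stand-in losses (bond_loss, clash_loss, and the 3.0 Å clash threshold are hypothetical, not the package's kernels). The point is only that a single jax.value_and_grad call differentiates the summed loss.

```python
import jax
import jax.numpy as jnp

CA_CA = 3.8  # target Cα-Cα virtual bond length in Å (value quoted in this README)


def bond_loss(coords):
    """Squared deviation of consecutive Cα-Cα distances from the target."""
    d = jnp.linalg.norm(coords[1:] - coords[:-1], axis=-1)
    return jnp.sum((d - CA_CA) ** 2)


def clash_loss(coords, min_dist=3.0):
    """Harmonic overlap penalty on non-bonded pairs closer than min_dist."""
    n = coords.shape[0]
    diff = coords[:, None, :] - coords[None, :, :]
    # Small epsilon inside the sqrt keeps the gradient finite on the diagonal.
    dist = jnp.sqrt(jnp.sum(diff**2, axis=-1) + 1e-9)
    idx = jnp.arange(n)
    # Exclude self-pairs and directly bonded neighbours (|i - j| <= 1).
    mask = jnp.abs(idx[:, None] - idx[None, :]) > 1
    overlap = jnp.maximum(min_dist - dist, 0.0)
    return jnp.sum(jnp.where(mask, overlap**2, 0.0)) / 2.0  # each pair counted once


def total_loss(coords):
    return bond_loss(coords) + clash_loss(coords)


grad_fn = jax.jit(jax.value_and_grad(total_loss))

# Random starting coordinates stand in for an untrained model's output.
coords = 2.0 * jax.random.normal(jax.random.PRNGKey(0), (8, 3))
initial, _ = grad_fn(coords)
for _ in range(500):
    loss, g = grad_fn(coords)
    coords = coords - 0.02 * g  # plain gradient descent step
final, _ = grad_fn(coords)
```

In a full model the same gradient would be taken with respect to the network parameters and handed to an optimizer such as one from Optax, rather than applied to the coordinates.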
Installation
pip install resonance-flow
For development (includes linting, type-checking, testing, and docs):
git clone https://github.com/elkins/resonance-flow.git
cd resonance-flow
pip install -e ".[dev]"
Requirements: Python 3.10+, JAX ≥ 0.4, Flax, Optax, jax-md, NumPy.
Quick Start
Run the self-correction demo
from resonance_flow.train import main
state = main(num_steps=100)
# Step 0 | Total Loss: 12.3421 | Steric: 0.0012 | Bond: 1.2034 | RDC: 0.0087
# Step 10 | Total Loss: 4.1823 | ...
# Step 100 | Total Loss: 0.0031 | ...
Use individual loss functions
import jax
import jax.numpy as jnp
from resonance_flow import (
    get_steric_clash_loss,
    get_bond_length_loss,
    rdc_loss,
    rdc_q_factor,
    noe_upper_bound_loss,
    estimate_nh_proxy_vectors,
)

# ── Steric clash (AMBER-style 1-2 bonded exclusion) ──────────────────────────
clash_fn = get_steric_clash_loss(exclude_bonded_range=1)
positions = jnp.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]])
atom_radii = jnp.array([1.5, 1.5])
clash_fn(positions, atom_radii)  # → 0.0 (no overlap)

# ── Bond length (Cα-Cα virtual bond, Engh & Huber 1991) ──────────────────────
bond_fn = get_bond_length_loss()  # default target = 3.8 Å
ca_chain = jnp.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]])
bond_fn(ca_chain)  # → ~0.0

# ── RDC loss (Saupe tensor fitting) ──────────────────────────────────────────
nh_vecs = jnp.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.],
                     [0.7, 0.7, 0.], [0.7, 0., 0.7], [0., 0.7, 0.7]])
measured_rdc = jnp.array([10., -5., 2., 0., 4., 8.])
rdc_loss(nh_vecs, measured_rdc)  # → scalar MSE

# ── RDC Q-factor (structure quality; Q ≤ 0.20 = high quality) ────────────────
rdc_q_factor(nh_vecs, measured_rdc)  # → 0 to 1 (lower is better)

# ── N-H proxy vectors from Cα coordinates (Cα-only models) ───────────────────
ca_coords = jax.random.normal(jax.random.PRNGKey(0), (10, 3))
nh_proxy = estimate_nh_proxy_vectors(ca_coords)  # → (8, 3) unit vectors

# ── NOE upper-bound distance restraints (Wüthrich 1986) ──────────────────────
# Pair indices must refer to atoms that exist in the coordinate array.
noe_pairs = jnp.array([[0, 1], [1, 2]])
upper_bounds = jnp.array([5.0, 4.5])
noe_upper_bound_loss(ca_chain, noe_pairs, upper_bounds)  # → 0.0 (both 3.8 Å distances are within bounds)
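For readers curious what a flat-bottomed upper-bound penalty looks like, the following is a minimal sketch under stated assumptions. The function flat_bottom_noe is hypothetical and only mirrors the argument order used above; the package's noe_upper_bound_loss may differ in normalization or other details.

```python
import jax
import jax.numpy as jnp


def flat_bottom_noe(coords, pairs, upper_bounds):
    """Zero while a distance is within its bound, quadratic in the excess beyond it."""
    diff = coords[pairs[:, 0]] - coords[pairs[:, 1]]
    # Small epsilon keeps the sqrt gradient finite if two atoms coincide.
    dist = jnp.sqrt(jnp.sum(diff**2, axis=-1) + 1e-12)
    violation = jnp.maximum(dist - upper_bounds, 0.0)
    return jnp.mean(violation**2)


coords = jnp.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]])
pairs = jnp.array([[0, 1], [0, 2]])
bounds = jnp.array([5.0, 6.0])

# Only the 0-2 pair (7.6 Å against a 6.0 Å bound) is violated, so the
# loss is approximately (1.6 ** 2) / 2 = 1.28.
loss = flat_bottom_noe(coords, pairs, bounds)

# The gradient is zero for atoms that only take part in satisfied restraints.
grad = jax.grad(flat_bottom_noe)(coords, pairs, bounds)
```

The flat bottom matters because an NOE gives an upper bound rather than a target distance: pairs already closer than the bound should contribute neither loss nor gradient.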
Scientific Basis
All loss functions and validation metrics are grounded in published, peer-reviewed NMR methodology:
| Loss / Metric | Scientific Basis |
|---|---|
| RDC loss (Saupe tensor) | Bax & Tjandra, J. Biomol. NMR 1997; Cornilescu et al., JACS 1998 |
| RDC Q-factor | Cornilescu et al., JACS 1998; Clore & Garrett, JACS 1999 |
| NOE distance restraints | Wüthrich, NMR of Proteins and Nucleic Acids 1986; Güntert et al., J. Mol. Biol. 1997 |
| Cα-Cα bond distance (3.8 Å) | Engh & Huber, Acta Crystallogr. A 1991 |
| N-H proxy vectors | Zweckstetter & Bax, JACS 2000 |
| Bonded exclusion (1-2/1-3) | Cornell et al. (AMBER), JACS 1995; MacKerell et al. (CHARMM), J. Phys. Chem. B 1998 |
| d_max = 21 700 Hz | Ottiger & Bax, JACS 1998 |
Architecture
TransformerCoordinatePredictor
├── Embedding (vocab_size=21, d_model=128)
├── Positional Embed (learned, max_len=512)
├── N × Pre-LN Block
│   ├── LayerNorm → MultiHeadSelfAttention → Residual
│   └── LayerNorm → FFN (4× expand, GELU) → Residual
└── LayerNorm → Linear(3)   # → (batch, seq_len, 3) Cα coordinates
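The pre-LN layout applies LayerNorm before each attention and feed-forward sublayer rather than after the residual addition. This keeps gradients well-behaved at initialization, avoids gradient explosion, and follows the convention recommended by Xiong et al. 2020.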
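As an illustration of the block layout above, here is a minimal pre-LN block written in plain JAX. It is a sketch under simplifying assumptions, not the package's Flax module: there is no batch dimension, LayerNorm has no learned scale or bias, and the names (layer_norm, self_attention, pre_ln_block, init_params) and the choice of 4 heads are hypothetical.

```python
import jax
import jax.numpy as jnp


def layer_norm(x, eps=1e-5):
    """LayerNorm over the feature axis (learned scale and bias omitted)."""
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / jnp.sqrt(var + eps)


def self_attention(x, wq, wk, wv, wo, num_heads):
    """Multi-head scaled dot-product self-attention on a (seq, d_model) input."""
    seq_len, d_model = x.shape
    head_dim = d_model // num_heads

    def split(t):  # (seq, d_model) -> (heads, seq, head_dim)
        return t.reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / jnp.sqrt(head_dim)
    out = jax.nn.softmax(scores, axis=-1) @ v  # (heads, seq, head_dim)
    return out.transpose(1, 0, 2).reshape(seq_len, d_model) @ wo


def pre_ln_block(x, p, num_heads=4):
    """Normalize before each sublayer, then add the residual."""
    x = x + self_attention(layer_norm(x), p["wq"], p["wk"], p["wv"], p["wo"], num_heads)
    x = x + jax.nn.gelu(layer_norm(x) @ p["w1"]) @ p["w2"]
    return x


def init_params(key, d_model=128, expand=4):
    """Small random weights for one block (4x FFN expansion as in the diagram)."""
    shapes = {
        "wq": (d_model, d_model), "wk": (d_model, d_model),
        "wv": (d_model, d_model), "wo": (d_model, d_model),
        "w1": (d_model, expand * d_model), "w2": (expand * d_model, d_model),
    }
    keys = jax.random.split(key, len(shapes))
    return {name: 0.02 * jax.random.normal(k, shape)
            for (name, shape), k in zip(shapes.items(), keys)}


key_p, key_x = jax.random.split(jax.random.PRNGKey(0))
params = init_params(key_p)
x = jax.random.normal(key_x, (10, 128))  # 10 residues, d_model = 128
y = pre_ln_block(x, params)              # output keeps the input shape
```

Because the residual path is never normalized, the input passes through each block unchanged plus a correction, which is the structural difference from the post-LN layout.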
Contributing
Contributions are welcome! Please open an issue or pull request. The project follows:
- Formatting + Linting: ruff / ruff format
- Type checking: mypy
- Testing: pytest with coverage
# Run the full quality pipeline before submitting a PR
ruff check resonance_flow tests
ruff format resonance_flow tests
mypy resonance_flow tests
pytest --cov=resonance_flow tests
Documentation
Full theory, API reference, and examples at elkins.github.io/resonance-flow.
License
MIT © George Elkins
Related Projects
ResonanceFlow is the most complete end-to-end model in this ecosystem, depending on:
- diff-biophys: Differentiable RDC, NOE, bond-length, and clash kernels
- synth-nmr: NMR parameter libraries (chemical shifts, Karplus, RDC)
- synth-pdb: Protein structure data generation
- TorsionTuner: Single-structure refinement using similar torsion-space kinematics
- diff-ensemble: Ensemble counterpart for IDPs
Citation
@software{resonance_flow,
author = {Elkins, George},
title = {ResonanceFlow: Differentiable protein structure prediction with NMR self-correction},
year = {2024},
url = {https://github.com/elkins/resonance-flow},
version = {0.1.0}
}