# Golden Pendulum MTL: Anti-resonance equilibria for gradient balancing in multi-task learning
Replace Nash-MTL corner solutions with golden-ratio (φ) weights that prevent harmonic lock-in between competing task gradients.
In our benchmarks, standard Nash-MTL allocated 89% of the gradient weight to a single task while starving others down to 3.7%. Golden Pendulum achieves w_min/w_max = 0.24 while maintaining Pareto optimality.
## The Problem
Multi-task gradient methods (MGDA, Nash-MTL, PCGrad) suffer from corner solutions: when task losses have disparate magnitudes, the optimizer converges to simplex vertices where one task dominates.
| Method | w_min/w_max | Max Weight | Corner? |
|---|---|---|---|
| Equal Weights | 1.00 | 0.25 | No (but ignores conflicts) |
| Nash-MTL | 0.04 | 0.89 | Yes |
| GradNorm | ~0.10 | ~0.60 | Sometimes |
| Golden Pendulum | 0.24 | 0.45 | No |
## The Solution
Golden Pendulum MTL derives from the physics of coupled oscillators: the stable equilibrium lies at golden-ratio-spaced points (φ = (1+√5)/2 ≈ 1.618), not at simplex corners. These weights are maximally incommensurate — no pair has a rational ratio — preventing the harmonic resonances that cause lock-in.
Algorithm (3 lines to integrate):

```python
from golden_pendulum import GoldenPendulumMTL

balancer = GoldenPendulumMTL(n_tasks=4, lam=0.5)

# In your training loop (replaces loss.backward()):
weights = balancer.backward(losses, model)
optimizer.step()
```
## Installation

```shell
pip install golden-pendulum-mtl
```

Or from source:

```shell
git clone https://github.com/Zynerji/GoldenPendulumMTL.git
cd GoldenPendulumMTL
pip install -e ".[dev]"
```
## Quick Start

```python
import torch
import torch.nn as nn
from golden_pendulum import GoldenPendulumMTL

# Your multi-task model
model = YourMultiHeadModel()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
balancer = GoldenPendulumMTL(n_tasks=3, lam=0.5)

for batch in dataloader:
    optimizer.zero_grad()

    # Compute per-task losses (can have 300x magnitude disparity!)
    losses = {
        "ranking": ranking_loss,        # ~200
        "classification": cls_loss,     # ~0.69
        "regression": reg_loss,         # ~0.01
    }

    # Golden Pendulum backward (replaces loss.backward())
    weights = balancer.backward(losses, model)
    optimizer.step()

    # Monitor balance (should be >0.20, not 0.04 like Nash-MTL)
    print(f"Balance: {balancer.weight_balance_ratio:.3f}")
```
## How It Works
**Algorithm 1: Golden Pendulum MTL**

1. Compute per-task gradients g_k = ∇_θ L_k
2. Normalize each gradient: ĝ_k = g_k / ‖g_k‖₂ (removes magnitude disparity)
3. Compute the scale-free Gram matrix ĜᵀĜ
4. Solve the golden-ratio QP (25 iterations of projected gradient descent):

   min_α αᵀ ĜᵀĜ α + λ‖α − α_golden‖₁

   where α_golden = [φ^(k−1) / Σ_j φ^(j−1)] are the golden-ratio target weights
5. Apply PCGrad conflict resolution to the normalized gradients
6. Set the final gradient to the α-weighted sum of the conflict-resolved gradients
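For intuition, the QP step can be sketched in plain NumPy. This is an illustrative re-implementation under stated assumptions (golden targets proportional to φ^k, projected gradient descent on the probability simplex, a subgradient for the L1 term), not the package's exact solver:

```python
import numpy as np

PHI = (1 + 5 ** 0.5) / 2

def golden_targets(k):
    # Weights proportional to phi^1 .. phi^k, normalized to the simplex
    w = PHI ** np.arange(1, k + 1)
    return w / w.sum()

def project_simplex(v):
    # Euclidean projection onto {alpha >= 0, sum(alpha) = 1}
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def solve_golden_qp(G_hat, lam=0.5, n_iter=25, lr=0.1):
    # G_hat: (K, d) matrix of L2-normalized task gradients
    K = G_hat.shape[0]
    gram = G_hat @ G_hat.T
    target = golden_targets(K)
    alpha = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # (sub)gradient of alpha^T Gram alpha + lam * ||alpha - target||_1
        grad = 2 * gram @ alpha + lam * np.sign(alpha - target)
        alpha = project_simplex(alpha - lr * grad)
    return alpha
```

With large λ the solution snaps to the golden targets; with λ = 0 the objective reduces to a min-norm (Nash/MGDA-style) point, which is where corner solutions can reappear.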
## Why Golden Ratio?
The golden ratio φ is the "most irrational" number: its continued-fraction expansion [1; 1, 1, 1, ...] converges more slowly than that of any other number. This means:
- No pair of φ-spaced weights has a rational ratio
- Gradient updates are quasiperiodic (not periodic)
- No task can "pump" energy from others through resonance
This is the same reason φ appears in phyllotaxis (sunflower seeds), quasicrystals, and the KAM theorem for orbital stability.
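The slow-convergence claim is easy to check numerically: the continued-fraction convergents of φ are ratios of consecutive Fibonacci numbers, and each convergent improves the approximation error by only a factor of about φ² ≈ 2.618, the worst possible rate. A standalone check (plain Python, no project code):

```python
from math import sqrt

PHI = (1 + sqrt(5)) / 2

def fib_convergents(n):
    # Convergents of [1; 1, 1, ...] are ratios of consecutive Fibonacci numbers
    a, b = 1, 1
    out = []
    for _ in range(n):
        out.append(b / a)
        a, b = b, a + b
    return out

errors = [abs(c - PHI) for c in fib_convergents(12)]
gains = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
# gains settle near phi^2 ~ 2.618: the slowest improvement per convergent.
# A typical irrational (e.g. pi, with convergent 355/113) improves far faster.
```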
## Golden-Ratio Weights for K Tasks
| K | Weights | w_min/w_max |
|---|---|---|
| 2 | (0.382, 0.618) | 0.618 |
| 3 | (0.186, 0.302, 0.488) | 0.382 |
| 4 | (0.106, 0.171, 0.276, 0.447) | 0.237 |
| 8 | (0.019, 0.031, ..., 0.277) | 0.069 |
| 16 | (0.001, 0.001, ..., 0.172) | 0.004 |
## Framework Integration

### PyTorch Lightning

```python
import pytorch_lightning as pl
from golden_pendulum import GoldenPendulumCallback

class MyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False
        self.golden = GoldenPendulumCallback(lam=0.5)

    def training_step(self, batch, batch_idx):
        losses = {"task_a": loss_a, "task_b": loss_b}
        opt = self.optimizers()
        opt.zero_grad()
        weights = self.golden.on_train_batch(losses, self, batch_idx)
        opt.step()
```
### Hugging Face Transformers

```python
from golden_pendulum import GoldenPendulumMTL

balancer = GoldenPendulumMTL(n_tasks=3, lam=0.5)

# In your custom Trainer.training_step:
weights = balancer.backward(losses, model)
```
## Weight Logging

```python
from golden_pendulum import GoldenPendulumMTL, WeightLogger

logger = WeightLogger(log_file="weights.jsonl", log_every=100)
balancer = GoldenPendulumMTL(n_tasks=4)

for step, batch in enumerate(loader):
    weights = balancer.backward(losses, model)
    logger.log(step, weights)
```
## API Reference

### `GoldenPendulumMTL(n_tasks, lam, n_iter, min_weight_fraction, pcgrad)`

| Parameter | Default | Description |
|---|---|---|
| `n_tasks` | 0 | Expected number of tasks (0 = any) |
| `lam` | 0.5 | Golden-ratio regularization strength |
| `n_iter` | 25 | QP solver iterations |
| `min_weight_fraction` | 0.02 | Minimum weight = fraction / K |
| `pcgrad` | True | Enable PCGrad conflict resolution |
Methods:

- `backward(losses, model)` → `Dict[str, float]` of task weights
- `weight_balance_ratio` → w_min/w_max
- `mean_weights(last_n)` → mean weights over the last N steps
- `golden_targets` → target golden-ratio weights
### `golden_nash_backward(losses, model, lam, n_iter, min_weight_fraction, pcgrad)`

Functional API — same algorithm, no state tracking.

### `golden_ratio_weights(n_tasks)`

Returns the φ-spaced target weights for K tasks.
## Pro Features
Advanced capabilities for production multi-task training.
### AdaptiveLambda — Auto-tune regularization

No more manual lambda tuning: the regularization strength adapts to real-time gradient-conflict severity and loss-magnitude disparity.

```python
from golden_pendulum.pro import AdaptiveLambda

adaptive = AdaptiveLambda(lam_init=0.5, lam_min=0.05, lam_max=2.0)

for step, batch in enumerate(loader):
    optimizer.zero_grad()
    losses = compute_losses(model, batch)
    weights = adaptive.backward(losses, model)
    optimizer.step()
    # Lambda auto-adjusts: high conflict -> stronger regularization
```
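The adaptation logic can be pictured with a simple rule. The helpers below are a hypothetical sketch, not AdaptiveLambda's actual internals: measure the fraction of conflicting task-gradient pairs, then nudge λ toward a conflict-proportional target, clipped to [lam_min, lam_max]:

```python
import numpy as np

def conflict_ratio(G):
    # Fraction of task pairs whose normalized gradients point against each other
    K = len(G)
    pairs = [(i, j) for i in range(K) for j in range(i + 1, K)]
    return sum(1 for i, j in pairs if G[i] @ G[j] < 0) / len(pairs)

def adapt_lambda(lam, conflict, lam_min=0.05, lam_max=2.0, rate=0.1):
    # Move lambda toward a conflict-proportional target (EMA step), then clip
    target = lam_min + (lam_max - lam_min) * conflict
    lam = (1 - rate) * lam + rate * target
    return float(np.clip(lam, lam_min, lam_max))
```

High conflict pushes λ up, strengthening the pull toward the golden targets; low conflict lets the data-driven Gram term of the QP dominate.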
### CurriculumScheduler — Multi-phase training

Manages Phase A/B/C/D training with automatic backbone freeze/unfreeze.

```python
from golden_pendulum.pro import CurriculumScheduler, Phase

curriculum = CurriculumScheduler(
    phases=[
        Phase("A_ranking", tasks={"returns", "rank", "quality", "embed"},
              steps=15000, freeze_backbone=False, lam=0.5, lr=5e-5),
        Phase("B_risk", tasks={"vol", "mae", "kelly", "risk"},
              steps=10000, freeze_backbone=True, lam=0.3, lr=1e-4),
        Phase("C_meta", tasks={"regime", "calibration", "confidence"},
              steps=10000, freeze_backbone=True, lam=0.3, lr=1e-4),
    ],
    backbone_params=lambda model: model.backbone.parameters(),
)

while not curriculum.is_complete:
    weights = curriculum.backward(all_losses, model)
    optimizer.step()
    curriculum.step(model)  # Auto-freezes backbone at phase transitions
```
### DynamicK — Hierarchical task grouping

Group tasks by function and run Golden Pendulum within and across groups. Reduces the O(K²) pairwise cost for K = 16+ tasks.

```python
from golden_pendulum.pro import DynamicK, TaskGroup

dk = DynamicK(groups=[
    TaskGroup("ranking", tasks={"returns", "rank", "quality"}),
    TaskGroup("risk", tasks={"vol", "mae", "kelly", "risk"}),
    TaskGroup("meta", tasks={"regime", "calibration", "confidence"}),
])
weights = dk.backward(all_16_losses, model)

# Or auto-group by gradient similarity:
dk = DynamicK(auto_group=True, similarity_threshold=0.5)
```
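The hierarchical idea can be sketched generically: balance within each group, collapse each group to one aggregated gradient, balance across groups, and give each task the product of its within-group and group-level weights. The function below is an illustrative sketch (a uniform balancer stands in for the real per-group Golden Pendulum solve), not DynamicK's implementation:

```python
import numpy as np

def uniform_balance(G):
    # Stand-in for the per-group Golden Pendulum solve: equal simplex weights
    return np.full(G.shape[0], 1.0 / G.shape[0])

def hierarchical_weights(grads, groups, balance=uniform_balance):
    """grads: {task: 1-D array}; groups: {group_name: [task, ...]}.
    Returns per-task weights that sum to 1 across all tasks."""
    within, group_grad = {}, {}
    for name, tasks in groups.items():
        G = np.stack([grads[t] / np.linalg.norm(grads[t]) for t in tasks])
        w = balance(G)                    # solve inside the group: O(|group|^2)
        within[name] = dict(zip(tasks, w))
        group_grad[name] = w @ G          # one aggregated gradient per group
    names = list(groups)
    H = np.stack([group_grad[n] / np.linalg.norm(group_grad[n]) for n in names])
    across = dict(zip(names, balance(H)))  # solve across groups: O(#groups^2)
    return {t: across[n] * within[n][t] for n in names for t in groups[n]}
```

With G groups of roughly K/G tasks each, the pairwise cost drops from O(K²) to O(G·(K/G)² + G²).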
### DiagnosticsEngine — Real-time conflict analysis

Deep visibility into gradient conflicts, resonance detection, and convergence monitoring.

```python
from golden_pendulum.pro import DiagnosticsEngine

diag = DiagnosticsEngine()
report = diag.analyze(losses, model)
print(f"Conflict ratio: {report.conflict_ratio}")
print(f"Norm ratio: {report.norm_ratio}x")
print(f"Conflicting pairs: {report.conflict_pairs}")

if diag.alerts:
    for alert in diag.alerts:
        print(f"ALERT: {alert}")
```
### Presets — Battle-tested configurations

```python
from golden_pendulum.pro import get_preset, list_presets

list_presets()  # finance_4phase, finance_quick, nlp_multitask, vision_multitask, robotics_control

preset = get_preset("finance_4phase")  # Paper's exact configuration
scheduler = CurriculumScheduler(phases=preset.phases)
```
## Benchmarks

```shell
python benchmarks/compare_methods.py
```

Reproduces the paper's core finding on synthetic multi-head models.
## Paper

*Golden Pendulum Multi-Task Learning: Anti-Resonance Equilibria for Gradient Balancing in Multi-Head Transformer Training*
Christian Knopp, 2026

See `paper/golden_pendulum_mtl.pdf` for the full paper.
Key results (42.5M-param, 16-head financial transformer):
| Metric | AdamW+EMA | Nash-MTL | Golden Pendulum |
|---|---|---|---|
| GOLD metrics | 5 | 1 | 7 |
| FAIL metrics | 2 | 6 | 2 |
| Trading IC | 0.012 | -0.002 | 0.033 |
| Decile Monotonicity | 0.576 | 0.455 | 0.879 |
| Vol Forecast IC | 0.645 | -0.795 | 0.695 |
| Balance (w_min/w_max) | 1.00 | 0.04 | 0.24 |
## Citation

```bibtex
@article{knopp2026golden,
  title={Golden Pendulum Multi-Task Learning: Anti-Resonance Equilibria for
         Gradient Balancing in Multi-Head Transformer Training},
  author={Knopp, Christian},
  year={2026}
}
```
## Author

Christian Knopp — @Conceptual1 — cknopp@gmail.com

## License

Apache 2.0. See `LICENSE`.
## File details

Details for `golden_pendulum_mtl-0.2.0.tar.gz` (source distribution):

- Size: 37.7 kB
- Uploaded via: twine/6.2.0, CPython/3.14.2 (Trusted Publishing: no)

| Algorithm | Hash digest |
|---|---|
| SHA256 | `f92e761c3cdb48ab75aebc60d52e9dbafafc7bcf910bcacd0a2c1448d4a1fd5a` |
| MD5 | `3ca4b92ee59550993d4074bd474e7219` |
| BLAKE2b-256 | `46d6bffa6d86ce998bedf536247e766d86276f05db49f7462af59c3d898ab081` |
## File details

Details for `golden_pendulum_mtl-0.2.0-py3-none-any.whl` (built distribution, Python 3):

- Size: 32.8 kB
- Uploaded via: twine/6.2.0, CPython/3.14.2 (Trusted Publishing: no)

| Algorithm | Hash digest |
|---|---|
| SHA256 | `df4b6edda8308d6b405f802fc84a3395ccc1161857ecc927bf14e38825ddfce5` |
| MD5 | `b87f4d45d2061cda39896c51569d6073` |
| BLAKE2b-256 | `7aefa4ffff1bae00893eb5296acfe5365d9c2fad2de4e81be13359a0cafb6350` |