LAdam: Laplacian Adam — Adam with spatially-coupled variance estimates via discrete Laplacian
This project has been archived by its maintainers. No new releases are expected.
LAdam
Laplacian Adam — spatially-aware adaptive optimizer for PyTorch
LAdam is a drop-in Adam replacement that applies discrete Laplacian regularization to Adam's second-moment estimate (v_t). This couples neighboring weight learning rates, producing spatially-smoothed adaptive optimization.
Why LAdam?
Adam computes independent per-parameter learning rates. But adjacent weights in trained networks are often functionally correlated — the per-parameter variance estimates should reflect this structure.
LAdam adds one operation to Adam: a Laplacian diffusion step on v_t, controlled by a single scalar c2. The Laplacian lets each weight's learning rate be informed by its neighbors, smoothing the per-parameter learning rates across each weight matrix.
Results
| Task | Architecture | Metric | Adam | LAdam | Improvement |
|---|---|---|---|---|---|
| Wave Equation PINN | 5×128 MLP | L2 Error | 0.0310 | 0.0172 | -44.6% |
| FashionMNIST | Transformer | Accuracy | 89.46% | 89.66% | +0.20% (p=0.0005) |
| FashionMNIST | MLP | Accuracy | 89.10% | 89.12% | +0.02% (n.s.) |
| FashionMNIST | CNN | Accuracy | 91.15% | 91.14% | -0.01% (tie) |
LAdam excels on architectures with spatially-correlated weight structure — particularly PINNs and transformers. For CNNs (whose conv filters are already spatial detectors), the Laplacian is redundant.
Installation
pip install ladam
Optimizers
LAdam ships three Laplacian-enhanced optimizers:
| Optimizer | Base | Laplacian target | Best for |
|---|---|---|---|
| LAdam | Adam | Second moment v_t | PINNs, transformers |
| LAdaGrad | AdaGrad | Cumulative sum G_t | Sparse features, NLP |
| LRMSProp | RMSProp | Running average v_t | RNNs, non-stationary losses |
All three share the same Laplacian kernel infrastructure and c2 parameter.
Usage
Basic — Drop-in Adam replacement
from ladam import LAdam
optimizer = LAdam(model.parameters(), lr=1e-3, c2=1e-4)
# Training loop is identical to Adam
for batch in dataloader:
    loss = criterion(model(batch))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
LAdaGrad and LRMSProp
from ladam import LAdaGrad, LRMSProp
# AdaGrad with Laplacian smoothing on cumulative squared gradients
optimizer = LAdaGrad(model.parameters(), lr=1e-2, c2=1e-4)
# RMSProp with Laplacian smoothing on running variance
optimizer = LRMSProp(model.parameters(), lr=1e-2, alpha=0.99, c2=1e-4)
Per-layer c2 with parameter groups
optimizer = LAdam([
    {'params': model.attention.parameters(), 'c2': 1e-4},  # Transformer attention
    {'params': model.ffn.parameters(), 'c2': 1e-5},        # Feed-forward
    {'params': model.norm.parameters(), 'c2': 0.0},        # Skip for norms
], lr=3e-4)
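When a model does not expose tidy submodules such as `attention` or `norm`, the groups can be built from parameter shapes instead. The following is a minimal sketch, not part of the ladam package: `build_c2_groups` is a hypothetical helper, and the rule "1D parameters get c2 = 0" is an assumption modeled on the norm group above.

```python
from torch import nn


def build_c2_groups(model: nn.Module, c2: float = 1e-4):
    """Hypothetical helper: put 2D+ weights in a coupled group, 1D params in an uncoupled one."""
    coupled, uncoupled = [], []
    for param in model.parameters():
        if not param.requires_grad:
            continue
        # Biases and normalization scales are 1D, so they have no 2D neighborhood to smooth over.
        (coupled if param.dim() >= 2 else uncoupled).append(param)
    return [
        {"params": coupled, "c2": c2},
        {"params": uncoupled, "c2": 0.0},
    ]


model = nn.Sequential(nn.Linear(16, 32), nn.LayerNorm(32), nn.Linear(32, 4))
groups = build_c2_groups(model)
print([len(g["params"]) for g in groups])  # → [2, 4]
```

The returned list has the same shape as the parameter groups shown above, so it could be passed as the first argument to LAdam together with a shared lr.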
Architecture-aware defaults
from ladam import LAdam, suggest_c2
c2 = suggest_c2('pinn') # Returns 1e-5
c2 = suggest_c2('transformer') # Returns 1e-4
optimizer = LAdam(model.parameters(), lr=1e-3, c2=c2)
Parameters
| Parameter | Default | Description |
|---|---|---|
| lr | 1e-3 | Learning rate |
| betas | (0.9, 0.999) | EMA coefficients (same as Adam) |
| eps | 1e-8 | Numerical stability (same as Adam) |
| weight_decay | 0 | L2 regularization (same as AdamW behavior) |
| c2 | 1e-4 | Laplacian coupling strength. Controls how much neighboring variance estimates influence each other. |
| mode | 'variance_lap' | Which quantity to smooth. 'variance_lap' is best. |
| stencil | '9point' | Discrete Laplacian stencil. '9point' (isotropic, 0.46% anisotropy) or '5point' (legacy, 12.3% anisotropy). |
| min_spatial_size | 16 | Skip Laplacian for params with fewer elements (biases, LayerNorm). |
Stencil Selection
The stencil parameter controls the discrete Laplacian kernel used for spatial coupling:
- '9point' (default): Isotropic stencil using both face and diagonal neighbors. Diagonal neighbors get weight 1/6 and face neighbors get weight 4/6.
- '5point': Standard cross-pattern stencil (face neighbors only). Slightly faster but roughly 27× more anisotropic (12.3% vs 0.46%).
At typical c2 values (1e-5 to 1e-3), the effective learning rate difference between stencils is <0.3%. The 9-point default is recommended for correctness.
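The two stencils can be written out as 3×3 kernels. The sketch below is illustrative and uses only the standard library. The 1/6 and 4/6 neighbor weights come from the description above, while the -20/6 center weight is an assumption (it is the value that makes the kernel sum to zero, as in the standard 9-point isotropic Laplacian). Both kernels are applied to f(x, y) = x² + y², whose continuous Laplacian is exactly 4.

```python
from fractions import Fraction

# Cross-pattern stencil: face neighbors only.
FIVE_POINT = [[0, 1, 0],
              [1, -4, 1],
              [0, 1, 0]]

# Isotropic stencil: diagonals weigh 1/6, faces 4/6; center -20/6 is assumed.
NINE_POINT = [[Fraction(w, 6) for w in row]
              for row in ([1, 4, 1], [4, -20, 4], [1, 4, 1])]


def apply_stencil(field, kernel, i, j):
    """Apply a 3x3 stencil to `field` at the interior index (i, j)."""
    return sum(kernel[di + 1][dj + 1] * field[i + di][j + dj]
               for di in (-1, 0, 1) for dj in (-1, 0, 1))


# Sample f(x, y) = x^2 + y^2 on a 5x5 integer grid.
field = [[x * x + y * y for y in range(5)] for x in range(5)]

print(apply_stencil(field, FIVE_POINT, 2, 2))  # → 4
print(apply_stencil(field, NINE_POINT, 2, 2))  # → 4
```

Both kernels sum to zero, so a constant field of v_t values is left unchanged by the coupling step regardless of c2.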
Choosing c2
c2 is the only new hyperparameter. It's robust across 3 orders of magnitude:
| c2 | Best For | Notes |
|---|---|---|
| 1e-5 | PINNs, scientific ML | Gentle coupling, biggest error reduction |
| 1e-4 | Transformers, general | Safe default |
| 1e-3 | Aggressive smoothing | Works but slightly less stable |
| 0 | Disable | Reduces to standard Adam |
All 7 values tested in [1e-6, 1e-3] outperformed Adam on transformers (B12 sweep).
How It Works
Standard Adam computes per-parameter adaptive learning rates from the second moment:
v_t = β₂·v_{t-1} + (1-β₂)·g_t² # Variance estimate
lr_effective = lr / (√v_t + ε) # Per-parameter learning rate
LAdam adds a Laplacian coupling step:
v_smooth = v_t + c2 · ∇²v_t # Spatial smoothing
lr_effective = lr / (√v_smooth + ε) # Coupled learning rate
Here ∇² is the discrete Laplacian, computed with a single F.conv2d kernel (9-point isotropic by default), which is efficient and GPU-friendly. The Laplacian treats each weight matrix as a 2D field, coupling each weight's learning rate with those of its spatial neighbors.
Overhead: ~2-5% wall-clock time increase per step. The Laplacian is a single fused convolution kernel, not point-wise iteration.
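The update described above can be sketched in a few lines of PyTorch. This is an illustrative reconstruction under stated assumptions, not the package's actual implementation. The names `laplacian_smooth` and `ladam_like_step` are hypothetical. Replicate padding at the matrix border, clamping the smoothed variance at zero, applying the smoothing before bias correction, and leaving the stored v_t unsmoothed are all assumptions.

```python
import torch
import torch.nn.functional as F

# 9-point isotropic stencil; the -20/6 center weight is assumed (standard form).
KERNEL_9POINT = torch.tensor(
    [[1.0, 4.0, 1.0],
     [4.0, -20.0, 4.0],
     [1.0, 4.0, 1.0]]
) / 6.0


def laplacian_smooth(v: torch.Tensor, c2: float, min_spatial_size: int = 16) -> torch.Tensor:
    """Return v + c2 * laplacian(v) for 2D tensors; other tensors pass through unchanged."""
    if c2 == 0.0 or v.dim() != 2 or v.numel() < min_spatial_size:
        return v
    field = v[None, None]  # F.conv2d expects shape (N, C, H, W)
    # Assumption: replicate the border so edge entries see their own value outside the matrix.
    padded = F.pad(field, (1, 1, 1, 1), mode="replicate")
    kernel = KERNEL_9POINT.to(dtype=v.dtype, device=v.device)[None, None]
    lap = F.conv2d(padded, kernel)[0, 0]
    # Assumption: clamp so the value under the square root stays non-negative.
    return (v + c2 * lap).clamp_min(0.0)


def ladam_like_step(p, grad, m, v, step, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, c2=1e-4):
    """One Adam-style update that uses the Laplacian-smoothed second moment."""
    beta1, beta2 = betas
    m.mul_(beta1).add_(grad, alpha=1 - beta1)            # first moment
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)  # second moment v_t
    v_smooth = laplacian_smooth(v, c2)                   # v_t + c2 * laplacian(v_t)
    m_hat = m / (1 - beta1 ** step)                      # Adam bias correction
    v_hat = v_smooth / (1 - beta2 ** step)
    p.sub_(lr * m_hat / (v_hat.sqrt() + eps))


torch.manual_seed(0)
weight = torch.randn(8, 8)
grad = torch.randn(8, 8)
m, v = torch.zeros_like(weight), torch.zeros_like(weight)
ladam_like_step(weight, grad, m, v, step=1)
```

With c2 = 0 the helper returns v_t untouched, so the step reduces to a plain Adam update, matching the "Disable" row of the c2 table.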
Benchmarks
PINN: Wave Equation (u_tt = c^2 u_xx)
5-layer, 128-unit tanh MLP trained for 5000 steps on the 1D wave equation.
| Optimizer | L2 Error | vs Adam |
|---|---|---|
| Adam (lr=1e-3) | 0.0310 | — |
| LAdam c²=1e-4 | 0.0240 | -22.8% |
| LAdam c²=1e-5 | 0.0172 | -44.6% |
| LAdam c²=1e-3 | 0.0185 | -40.3% |
Transformer: FashionMNIST Classification
4-head, 128-dim, 2-layer transformer, 30 epochs, 5 independent seeds.
| Optimizer | Accuracy (mean ± std) | p-value (vs Adam) |
|---|---|---|
| Adam | 89.46 ± 0.10% | — |
| LAdam c²=1e-4 | 89.66 ± 0.06% | 0.0005 |
c² Robustness Sweep
7 c² values on the same transformer task. All 7 beat Adam:
| c² | Accuracy | Δ vs Adam |
|---|---|---|
| 1e-6 | 89.62% | +0.16% |
| 5e-6 | 89.73% | +0.27% |
| 1e-5 | 89.79% | +0.33% |
| 5e-5 | 89.75% | +0.29% |
| 1e-4 | 89.67% | +0.21% |
| 5e-4 | 89.64% | +0.18% |
| 1e-3 | 89.66% | +0.20% |
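The Δ column can be reproduced directly from the accuracies in the table and the Adam baseline of 89.46%. The short check below uses only the numbers reported above. The differences are in percentage points.

```python
adam_acc = 89.46  # Adam baseline accuracy (%) from the transformer benchmark

# c2 value -> reported accuracy (%) from the sweep table
sweep = {
    1e-6: 89.62,
    5e-6: 89.73,
    1e-5: 89.79,
    5e-5: 89.75,
    1e-4: 89.67,
    5e-4: 89.64,
    1e-3: 89.66,
}

# Difference vs Adam in percentage points, rounded to the table's precision.
deltas = {c2: round(acc - adam_acc, 2) for c2, acc in sweep.items()}
best_c2 = max(sweep, key=sweep.get)

print(all(delta > 0 for delta in deltas.values()))  # → True
print(best_c2, deltas[best_c2])                     # → 1e-05 0.33
```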
FAQ
Q: Does this work for LLMs / GPT-scale models?
A: No. LAdam hurts LLM training (tested on GPT-2/WikiText-2). Attention weight matrices encode semantic structure, not spatial structure; the Laplacian destroys per-feature specialization. Use standard Adam/AdamW for LLMs.
Q: Why not smooth the gradient instead of the variance?
A: Osher et al. (2018) explored Laplacian smoothing of gradients. We found that smoothing the variance estimate is more effective because it smooths the learning rate landscape rather than the descent direction. These are mathematically distinct: ∇²(EMA(g²)) ≠ (∇²g)².
Q: Why does this help PINNs so much?
A: PDE-based loss landscapes have inherent spatial structure from the differential operators in the loss function. The Laplacian on v_t aligns the optimizer's internal representation with this structure.
Q: Can I use this with learning rate schedulers?
A: Yes. LAdam is fully compatible with any torch.optim.lr_scheduler.
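A scheduler attaches to the optimizer in the usual PyTorch way. The sketch below uses `torch.optim.Adam` as a stand-in so that it runs without ladam installed. Swapping in `LAdam(model.parameters(), lr=1e-3, c2=1e-4)` is assumed to work the same way, given the compatibility stated above. The toy model and loss are placeholders.

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import CosineAnnealingLR

model = nn.Linear(4, 2)

# Stand-in optimizer; replace with LAdam(model.parameters(), lr=1e-3, c2=1e-4) if ladam is installed.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = CosineAnnealingLR(optimizer, T_max=10)

for _ in range(10):
    loss = model(torch.randn(8, 4)).pow(2).mean()  # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the schedule once per optimizer step

final_lr = optimizer.param_groups[0]["lr"]
```

The scheduler only rewrites the lr entry of each parameter group, so per-group settings such as c2 are left as configured.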
Citation
If you use LAdam in your research, please cite:
@software{partin2026ladam,
  author = {Partin, Greg},
  title = {LAdam: Spatially-Aware Adaptive Optimization via Laplacian-Regularized Variance Estimates},
  year = {2026},
  url = {https://github.com/gpartin/ladam}
}
License
MIT. See LICENSE for details.