
LAdam: Laplacian Adam — Adam with spatially-coupled variance estimates via discrete Laplacian

This project has been archived by its maintainers. No new releases are expected.

Project description

LAdam

Laplacian Adam — spatially-aware adaptive optimizer for PyTorch

Supports Python 3.8+.

LAdam is a drop-in Adam replacement that applies discrete Laplacian regularization to Adam's second-moment estimate (v_t). This couples the learning rates of neighboring weights, producing spatially smoothed adaptive optimization.

Why LAdam?

Adam computes independent per-parameter learning rates. But adjacent weights in trained networks are often functionally correlated — the per-parameter variance estimates should reflect this structure.

LAdam adds one operation to Adam: a Laplacian diffusion step on v_t, controlled by a single scalar c2. The Laplacian lets each weight's learning rate be informed by its neighbors, smoothing the per-parameter learning-rate landscape.

Results

Task | Architecture | Metric | Adam | LAdam | Improvement
--- | --- | --- | --- | --- | ---
Wave equation | PINN (5×128 MLP) | L2 error | 0.0310 | 0.0172 | -44.6%
FashionMNIST | Transformer | Accuracy | 89.46% | 89.66% | +0.20 pp (p=0.0005)
FashionMNIST | MLP | Accuracy | 89.10% | 89.12% | +0.02 pp (n.s.)
FashionMNIST | CNN | Accuracy | 91.15% | 91.14% | -0.01 pp (tie)

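LAdam helps most on architectures with spatially correlated weight structure: substantially on PINNs (-44.6% L2 error) and by a small but statistically significant margin on transformers. On MLPs the difference is not significant, and for CNNs (whose conv filters are already spatial detectors) the Laplacian appears to be redundant.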

Installation

pip install ladam

Optimizers

LAdam ships three Laplacian-enhanced optimizers:

Optimizer | Base | Laplacian target | Best for
--- | --- | --- | ---
LAdam | Adam | Second moment v_t | PINNs, transformers, CNNs
LAdaGrad | AdaGrad | Cumulative sum G_t | Sparse features, NLP
LRMSProp | RMSProp | Running average v_t | RNNs, non-stationary losses

All three share the same Laplacian kernel infrastructure and c2 parameter.
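The three optimizers differ only in how the squared-gradient statistic is accumulated before it is smoothed. The sketch below illustrates that split; it is not the ladam source. It assumes a 9-point kernel, replicate padding at matrix edges, and the hypothetical helper names `second_moment` and `laplacian_smooth`.

```python
import torch
import torch.nn.functional as F

# 9-point isotropic Laplacian (assumed default): edge neighbors 4/6, diagonals 1/6.
LAP9 = torch.tensor([[1., 4., 1.], [4., -20., 4.], [1., 4., 1.]]) / 6.0


def laplacian_smooth(stat: torch.Tensor, c2: float) -> torch.Tensor:
    """Return stat + c2 * Laplacian(stat) for a 2D tensor (replicate padding is an assumption)."""
    kernel = LAP9.to(stat).view(1, 1, 3, 3)
    padded = F.pad(stat[None, None], (1, 1, 1, 1), mode="replicate")
    return stat + c2 * F.conv2d(padded, kernel)[0, 0]


def second_moment(kind: str, prev: torch.Tensor, grad: torch.Tensor,
                  beta2: float = 0.999, alpha: float = 0.99) -> torch.Tensor:
    """Hypothetical helper: the squared-gradient statistic each optimizer smooths."""
    if kind == "adam":       # LAdam: EMA second moment v_t
        return beta2 * prev + (1 - beta2) * grad ** 2
    if kind == "rmsprop":    # LRMSProp: running average v_t
        return alpha * prev + (1 - alpha) * grad ** 2
    if kind == "adagrad":    # LAdaGrad: cumulative sum G_t
        return prev + grad ** 2
    raise ValueError(f"unknown kind: {kind}")


torch.manual_seed(0)
grads = [torch.randn(6, 6) for _ in range(3)]
smoothed = {}
for kind in ("adam", "rmsprop", "adagrad"):
    stat = torch.zeros(6, 6)
    for g in grads:
        stat = second_moment(kind, stat, g)
    # Same smoothing step and same c2 for all three statistics.
    smoothed[kind] = (stat, laplacian_smooth(stat, c2=1e-4))
```

Only the accumulation rule changes between optimizers; the smoothing call is shared, which matches the single c2 parameter described above.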

Usage

Basic — Drop-in Adam replacement

from ladam import LAdam

optimizer = LAdam(model.parameters(), lr=1e-3, c2=1e-4)

# Training loop is identical to Adam
for inputs, targets in dataloader:
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

LAdaGrad and LRMSProp

from ladam import LAdaGrad, LRMSProp

# AdaGrad with Laplacian smoothing on cumulative squared gradients
optimizer = LAdaGrad(model.parameters(), lr=1e-2, c2=1e-4)

# RMSProp with Laplacian smoothing on running variance
optimizer = LRMSProp(model.parameters(), lr=1e-2, alpha=0.99, c2=1e-4)

Per-layer c2 with parameter groups

optimizer = LAdam([
    {'params': model.attention.parameters(), 'c2': 1e-4},   # Transformer attention
    {'params': model.ffn.parameters(), 'c2': 1e-5},         # Feed-forward
    {'params': model.norm.parameters(), 'c2': 0.0},         # Skip for norms
], lr=3e-4)
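Instead of listing modules by hand, parameter groups can be built from tensor shapes. This is a sketch under the assumption that only 2D weight matrices with at least `min_spatial_size` elements benefit from coupling; `build_c2_groups` is a hypothetical helper, and how ladam itself treats 4D conv kernels is not stated here.

```python
import torch.nn as nn


def build_c2_groups(model: nn.Module, c2: float = 1e-4, min_spatial_size: int = 16):
    """Hypothetical helper: couple 2D weight matrices, leave everything else at c2=0."""
    coupled, uncoupled = [], []
    for param in model.parameters():
        if not param.requires_grad:
            continue
        # Assumption: only 2D matrices with enough elements form a meaningful 2D field.
        if param.dim() == 2 and param.numel() >= min_spatial_size:
            coupled.append(param)
        else:  # biases, norm weights, other 1D tensors, and 4D conv kernels
            uncoupled.append(param)
    return [
        {"params": coupled, "c2": c2},
        {"params": uncoupled, "c2": 0.0},
    ]


model = nn.Sequential(nn.Linear(4, 8), nn.LayerNorm(8), nn.Linear(8, 2))
groups = build_c2_groups(model)
```

With ladam installed, the result would be passed in the same form as the explicit groups, e.g. `LAdam(build_c2_groups(model), lr=3e-4)`.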

Architecture-aware defaults

from ladam import LAdam, suggest_c2

c2 = suggest_c2('pinn')         # Returns 1e-5
c2 = suggest_c2('transformer')  # Returns 1e-4

optimizer = LAdam(model.parameters(), lr=1e-3, c2=c2)

Parameters

Parameter | Default | Description
--- | --- | ---
lr | 1e-3 | Learning rate
betas | (0.9, 0.999) | EMA coefficients for the first and second moments (same as Adam)
eps | 1e-8 | Numerical stability term (same as Adam)
weight_decay | 0 | L2 regularization (same as AdamW behavior)
c2 | 1e-4 | Laplacian coupling strength: how strongly neighboring variance estimates influence each other
mode | 'variance_lap' | Which quantity to smooth; 'variance_lap' (smoothing v_t) is recommended
stencil | '9point' | Discrete Laplacian stencil: '9point' (isotropic, 0.46% anisotropy) or '5point' (legacy, 12.3% anisotropy)
min_spatial_size | 16 | Parameters with fewer elements than this (e.g. biases, LayerNorm) skip the Laplacian

Stencil Selection

The stencil parameter controls the discrete Laplacian kernel used for spatial coupling:

  • '9point' (default): Isotropic stencil using the four edge-adjacent neighbors and the four diagonal neighbors, weighted 4/6 and 1/6 respectively (center weight -20/6).
  • '5point': Standard cross-pattern stencil (edge-adjacent neighbors only). Slightly faster, but roughly 25× more anisotropic.

At typical c2 values (1e-5 to 1e-3), the effective learning rate differs by less than 0.3% between the two stencils. The 9-point default is recommended because it is nearly isotropic.
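Both stencils can be written as 3×3 convolution kernels. This sketch assumes unit grid spacing and evaluates interior points only; the 9-point weights follow the 4/6 and 1/6 description of the stencil rather than being copied from the ladam source.

```python
import torch
import torch.nn.functional as F

# 5-point cross stencil: the 4 edge-adjacent neighbors only.
LAP5 = torch.tensor([[0., 1., 0.],
                     [1., -4., 1.],
                     [0., 1., 0.]])

# 9-point isotropic stencil: edge neighbors 4/6, diagonal neighbors 1/6, center -20/6.
LAP9 = torch.tensor([[1., 4., 1.],
                     [4., -20., 4.],
                     [1., 4., 1.]]) / 6.0


def apply_stencil(field: torch.Tensor, kernel: torch.Tensor) -> torch.Tensor:
    """Laplacian at interior points of a 2D field (no padding: output is 2 smaller per axis)."""
    return F.conv2d(field[None, None], kernel[None, None])[0, 0]


# Both stencils are exact for quadratics: the Laplacian of x² + y² is 4 at every interior point.
n = 7
x, y = torch.meshgrid(torch.arange(n, dtype=torch.float32),
                      torch.arange(n, dtype=torch.float32), indexing="ij")
quad = x ** 2 + y ** 2
lap5 = apply_stencil(quad, LAP5)
lap9 = apply_stencil(quad, LAP9)
```

Both kernels sum to zero, so a constant field has zero Laplacian; they differ only in how higher-order terms (and hence direction-dependent error) are distributed.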

Choosing c2

c2 is the main new hyperparameter. In the benchmarks, LAdam beat Adam across three orders of magnitude of c2:

c2 | Best for | Notes
--- | --- | ---
1e-5 | PINNs, scientific ML | Gentle coupling, biggest error reduction
1e-4 | Transformers, general | Safe default
1e-3 | Aggressive smoothing | Works but slightly less stable
0 | Disables coupling | Reduces to standard Adam

All 7 values tested in [1e-6, 1e-3] outperformed Adam on the FashionMNIST transformer (B12 sweep).

How It Works

Standard Adam computes per-parameter adaptive learning rates from the second moment:

v_t = β₂·v_{t-1} + (1-β₂)·g_t²     # Variance estimate
lr_effective = lr / (√v_t + ε)        # Per-parameter learning rate

LAdam adds a Laplacian coupling step:

v_smooth = v_t + c2 · ∇²v_t           # Spatial smoothing
lr_effective = lr / (√v_smooth + ε)    # Coupled learning rate

Here ∇² is the discrete Laplacian, computed with a single F.conv2d call (9-point isotropic stencil by default), which keeps it efficient and GPU-friendly. The Laplacian treats each weight matrix as a 2D field, coupling each weight's learning rate with those of its spatial neighbors.

Overhead: roughly 2-5% more wall-clock time per step, because the Laplacian is a single convolution kernel rather than a point-wise iteration.
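Combining the two update rules, one LAdam step on a single weight matrix might look like the sketch below. It is not the ladam implementation. It assumes standard Adam bias correction, replicate padding at matrix edges, that the smoothed copy is used only for the step (v_t itself stays unsmoothed), and that tensors that are not 2D or have fewer than `min_spatial_size` elements fall back to plain Adam. `ladam_step` is a hypothetical name.

```python
import torch
import torch.nn.functional as F

# 9-point isotropic Laplacian, unit grid spacing.
LAP9 = torch.tensor([[1., 4., 1.],
                     [4., -20., 4.],
                     [1., 4., 1.]]) / 6.0


def laplacian_2d(x: torch.Tensor) -> torch.Tensor:
    """Discrete Laplacian of a 2D tensor with one F.conv2d call (replicate padding assumed)."""
    kernel = LAP9.to(x).view(1, 1, 3, 3)
    padded = F.pad(x[None, None], (1, 1, 1, 1), mode="replicate")
    return F.conv2d(padded, kernel)[0, 0]


@torch.no_grad()
def ladam_step(p, grad, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
               c2=1e-4, min_spatial_size=16):
    """One hypothetical LAdam update, applied in place to tensor p."""
    beta1, beta2 = betas
    state["t"] += 1
    t, m, v = state["t"], state["m"], state["v"]
    m.mul_(beta1).add_(grad, alpha=1 - beta1)            # first moment, as in Adam
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)  # v_t, as in Adam

    if c2 != 0 and p.dim() == 2 and p.numel() >= min_spatial_size:
        # v_smooth = v_t + c2 * ∇²v_t; the clamp guards against negative variance
        v_smooth = (v + c2 * laplacian_2d(v)).clamp(min=0.0)
    else:
        v_smooth = v                                     # c2 = 0 or small tensor: plain Adam

    m_hat = m / (1 - beta1 ** t)                         # bias correction (assumed)
    v_hat = v_smooth / (1 - beta2 ** t)
    p.addcdiv_(m_hat, v_hat.sqrt().add_(eps), value=-lr)
```

Setting c2=0 reduces the sketch to bias-corrected Adam, which gives a quick sanity check against torch.optim.Adam before turning coupling on.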

Benchmarks

PINN: Wave Equation (u_tt = c^2 u_xx)

5-layer, 128-unit tanh MLP trained for 5000 steps on the 1D wave equation.

Optimizer | L2 error | vs Adam
--- | --- | ---
Adam (lr=1e-3) | 0.0310 | baseline
LAdam c²=1e-4 | 0.0240 | -22.8%
LAdam c²=1e-5 | 0.0172 | -44.6%
LAdam c²=1e-3 | 0.0185 | -40.3%

Transformer: FashionMNIST Classification

4-head, 128-dim, 2-layer transformer, 30 epochs, 5 independent seeds.

Optimizer | Accuracy (mean ± std) | p-value (vs Adam)
--- | --- | ---
Adam | 89.46 ± 0.10% |
LAdam c²=1e-4 | 89.66 ± 0.06% | 0.0005

c² Robustness Sweep

7 c² values on the same transformer task. All 7 beat Adam:

c² | Accuracy | Δ vs Adam (pp)
--- | --- | ---
1e-6 | 89.62% | +0.16
5e-6 | 89.73% | +0.27
1e-5 | 89.79% | +0.33
5e-5 | 89.75% | +0.29
1e-4 | 89.67% | +0.21
5e-4 | 89.64% | +0.18
1e-3 | 89.66% | +0.20

FAQ

Q: Does this work for LLMs / GPT-scale models? A: No. LAdam hurts LLM training (tested on GPT-2/WikiText-2). Attention weight matrices encode semantic structure, not spatial structure — the Laplacian destroys per-feature specialization. Use standard Adam/AdamW for LLMs.

Q: Why not smooth the gradient instead of the variance? A: Osher et al. (2018) explored Laplacian smoothing of gradients. We found that smoothing the variance estimate is more effective because it smooths the learning rate landscape rather than the descent direction. These are mathematically distinct: ∇²(EMA(g²)) ≠ (∇²g)².
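A small example makes the distinction concrete. For a checkerboard gradient of ±1, g² is constant (and so is any EMA of it over repeated steps), so smoothing the variance changes nothing, while the Laplacian of g itself is large. This sketch applies a raw 5-point Laplacian at interior points only; it is not Osher et al.'s smoothing operator and only illustrates the inequality in the answer.

```python
import torch
import torch.nn.functional as F

# 5-point Laplacian kernel, applied to interior points only (no padding).
LAP5 = torch.tensor([[0., 1., 0.],
                     [1., -4., 1.],
                     [0., 1., 0.]]).view(1, 1, 3, 3)


def laplacian(x: torch.Tensor) -> torch.Tensor:
    return F.conv2d(x[None, None], LAP5)[0, 0]


n = 6
i, j = torch.meshgrid(torch.arange(n), torch.arange(n), indexing="ij")
g = ((i + j) % 2 * 2 - 1).float()    # checkerboard gradient of +1 / -1

lap_of_variance = laplacian(g ** 2)  # g² is all ones, so ∇²(g²) = 0: nothing to smooth
square_of_lap = laplacian(g) ** 2    # every neighbor has the opposite sign: (∇²g)² = 64
```

Sign-alternating gradients leave the variance field untouched but would be heavily altered by gradient-side smoothing, which is why the two approaches target different things.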

Q: Why does this help PINNs so much? A: PDE-based loss landscapes have inherent spatial structure from the differential operators in the loss function. The Laplacian on v_t aligns the optimizer's internal representation with this structure.

Q: Can I use this with learning rate schedulers? A: Yes. LAdam is fully compatible with any torch.optim.lr_scheduler.

Citation

If you use LAdam in your research, please cite:

@software{partin2026ladam,
  author = {Partin, Greg},
  title = {LAdam: Spatially-Aware Adaptive Optimization via Laplacian-Regularized Variance Estimates},
  year = {2026},
  url = {https://github.com/gpartin/ladam}
}

License

MIT. See LICENSE for details.

Download files


Source Distribution

ladam-0.2.0.tar.gz (10.9 kB)

Built Distribution


ladam-0.2.0-py3-none-any.whl (10.3 kB)

File details

Details for the file ladam-0.2.0.tar.gz.

File metadata

  • Download URL: ladam-0.2.0.tar.gz
  • Size: 10.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for ladam-0.2.0.tar.gz
Algorithm Hash digest
SHA256 e8359511e68ab9bdd54faeed5d97e90a49f6f4dc5f48f5c615977a562cdd9088
MD5 95c39cc489a1b498d43f3c5d3d54ba3b
BLAKE2b-256 073803d9eb4e25bf012e5ca7730bca9009623fdb79227ea5084fbfd7c78fb297


File details

Details for the file ladam-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: ladam-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 10.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for ladam-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b6dff37210c7237fc70a6eba27a6ff280dc22d4456a157f61cd2ec08f51ac381
MD5 1ec7323d1953c7c46c2314fbac4012b2
BLAKE2b-256 511f21638e456dcf97ae926acfe5e4e21b22dc5ffaad46a02024ef4ba20179c4

