LAdam: Laplacian Adam — Adam with spatially-coupled variance estimates via discrete Laplacian

LAdam

Laplacian Adam — spatially-aware adaptive optimizer for PyTorch

LAdam is a drop-in Adam replacement that applies discrete Laplacian regularization to Adam's second-moment estimate (v_t). This couples neighboring weight learning rates, producing spatially-smoothed adaptive optimization.

Why LAdam?

Adam computes independent per-parameter learning rates. But adjacent weights in trained networks are often functionally correlated — the per-parameter variance estimates should reflect this structure.

LAdam adds one operation to Adam: a Laplacian diffusion step on v_t, controlled by a single scalar c2. The Laplacian lets each weight's learning rate be informed by its neighbors, which smooths the effective learning rates across adjacent weights.

Results

Task	Architecture	Metric	Adam	LAdam	Change vs Adam	Seeds (WR = win rate)
CNN Denoising	1D CNN (32ch)	Test MSE	0.0138	0.0135	-2.2%	20 (80% WR)
Wave Equation PINN	5x128 MLP	MSE	0.00105	0.000387	-63.3%	20 (70% WR)
FashionMNIST	Transformer	Accuracy	89.46%	89.66%	+0.20% (p=0.0005)	5
Noisy Regression	Wide MLP	MSE	0.446	0.441	-1.1%	1
CIFAR-10	ResNet + Chi-Anneal	Accuracy	67.96%	73.39%	+5.43%	3
FashionMNIST	MLP	Accuracy	89.10%	89.12%	+0.02% (n.s.)	1
GPT-2 fine-tuning	LLM	Perplexity	baseline	worse	negative	1

LAdam works best on CNNs, PINNs, and structured regression. The key requirement is that adjacent weights in the parameter tensor are functionally related (e.g., conv filter pixels, PDE-correlated gradients). It does NOT help on LLMs or long MLP training runs.

Installation

pip install ladam

Optimizers

The ladam package ships three Laplacian-enhanced optimizers. The LAdam optimizer is the recommended default -- the others are included for completeness but show weaker results in benchmarks.

Optimizer	Base	Laplacian target	Best for
LAdam	Adam	Second moment v_t	CNNs, PINNs, transformers, regression
LAdaGrad	AdaGrad	Cumulative sum G_t	Included for research; weak in benchmarks
LRMSProp	RMSProp	Running average v_t	PINNs (niche)

All three share the same Laplacian kernel infrastructure and c2 parameter.

Usage

Basic — Drop-in Adam replacement

from ladam import LAdam

# model, criterion and dataloader are assumed to be defined elsewhere
optimizer = LAdam(model.parameters(), lr=1e-3, c2=1e-4)

# Training loop is identical to Adam
for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()

LAdaGrad and LRMSProp

from ladam import LAdaGrad, LRMSProp

# AdaGrad with Laplacian smoothing on cumulative squared gradients
optimizer = LAdaGrad(model.parameters(), lr=1e-2, c2=1e-4)

# RMSProp with Laplacian smoothing on running variance
optimizer = LRMSProp(model.parameters(), lr=1e-2, alpha=0.99, c2=1e-4)

Per-layer c2 with parameter groups

from ladam import LAdam

# model.attention, model.ffn and model.norm stand in for your own submodules
optimizer = LAdam([
    {'params': model.attention.parameters(), 'c2': 1e-4},   # Transformer attention
    {'params': model.ffn.parameters(), 'c2': 1e-5},         # Feed-forward
    {'params': model.norm.parameters(), 'c2': 0.0},         # Skip for norms
], lr=3e-4)

Architecture-aware defaults

from ladam import LAdam, suggest_c2

c2 = suggest_c2('pinn')         # Returns 1e-5
c2 = suggest_c2('transformer')  # Returns 1e-4

optimizer = LAdam(model.parameters(), lr=1e-3, c2=c2)

Parameters

Parameter	Default	Description
`lr`	1e-3	Learning rate
`betas`	(0.9, 0.999)	EMA coefficients (same as Adam)
`eps`	1e-8	Numerical stability (same as Adam)
`weight_decay`	0	Weight decay coefficient (same as AdamW behavior)
`c2`	1e-4	Laplacian coupling strength. Controls how much neighboring variance estimates influence each other.
`mode`	'variance_lap'	Which quantity to smooth. `'variance_lap'` is best.
`stencil`	'9point'	Discrete Laplacian stencil. `'9point'` (isotropic, 0.46% anisotropy) or `'5point'` (legacy, 12.3% anisotropy).
`min_spatial_size`	16	Skip Laplacian for params with fewer elements (biases, LayerNorm).

Stencil Selection

The stencil parameter controls the discrete Laplacian kernel used for spatial coupling:

'9point' (default): Isotropic stencil that uses the four face neighbors and the four diagonal neighbors. Diagonal neighbors get weight 1/6 and face neighbors 4/6.
'5point': Standard cross-pattern stencil (face neighbors only). Slightly faster, but roughly 27× more anisotropic (12.3% vs 0.46%).

At typical c2 values (1e-5 to 1e-3), the effective learning rate difference between stencils is <0.3%. The 9-point default is recommended for correctness.
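The two stencils can be written out as 3x3 kernels. The following is a minimal NumPy sketch, not the package's code: the text gives only the 1/6 and 4/6 neighbor weights, so the -20/6 center value (chosen so the kernel sums to zero), the unnormalized 5-point kernel, and the helper name apply_stencil are assumptions made for illustration.

```python
import numpy as np

# Assumed kernels: only the 1/6 (diagonal) and 4/6 (face) weights come from
# the text; the centre values are inferred so that each kernel sums to zero.
STENCIL_9POINT = np.array([[1.0, 4.0, 1.0],
                           [4.0, -20.0, 4.0],
                           [1.0, 4.0, 1.0]]) / 6.0
STENCIL_5POINT = np.array([[0.0, 1.0, 0.0],
                           [1.0, -4.0, 1.0],
                           [0.0, 1.0, 0.0]])


def apply_stencil(field, kernel):
    """Apply a 3x3 stencil to the interior points of a 2D array (hypothetical helper)."""
    rows, cols = field.shape
    out = np.zeros((rows - 2, cols - 2))
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * field[di:di + rows - 2, dj:dj + cols - 2]
    return out


# Both kernels sum to zero, so a constant field has zero Laplacian.
kernel_sums = (STENCIL_9POINT.sum(), STENCIL_5POINT.sum())

# Both recover the exact Laplacian (4) of f(x, y) = x^2 + y^2 on a unit grid.
y, x = np.mgrid[-3:4, -3:4].astype(float)
quadratic = x**2 + y**2
lap9_quadratic = apply_stencil(quadratic, STENCIL_9POINT)
lap5_quadratic = apply_stencil(quadratic, STENCIL_5POINT)

# They differ in how a single spike reaches its diagonal neighbours.
spike = np.zeros((5, 5))
spike[2, 2] = 1.0
lap9_spike = apply_stencil(spike, STENCIL_9POINT)  # diagonal entries are 1/6
lap5_spike = apply_stencil(spike, STENCIL_5POINT)  # diagonal entries are 0
```

On smooth fields the two kernels agree. They differ in whether a weight's diagonal neighbors take part in the coupling at all, and that is where the anisotropy difference comes from.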

Choosing c2

c2 is the only new hyperparameter. Optimal value depends on the architecture:

c2	Best For	Notes
`1e-3`	CNNs (denoising, reconstruction)	Conv filters have spatial structure -- strongest LAdam advantage
`5e-4`	Wide MLP regression	Moderate smoothing for dense layers
`1e-5`	PINNs, scientific ML	Gentle coupling, biggest error reduction
`1e-4`	Transformers, general	Safe default
`0`	Disable	Reduces to standard Adam

When to use LAdam

LAdam helps when adjacent weights in a parameter tensor are functionally related:

Scenario	Use LAdam?	c2	Why
CNN denoising/reconstruction	YES	1e-3	Conv filter weights have 2D spatial structure
PINNs	YES	1e-5	PDE residual creates correlated gradients
Transformers (small-medium)	YES	1e-4	Attention matrices have spatial correlations
Wide MLP regression (<500 steps)	Yes	5e-4	Dense layers learning smooth functions
CNN classification	Maybe	1e-4	Smaller benefit than denoising
Long MLP training (1000+ steps)	No	-	Laplacian perturbation accumulates destructively
LLMs (GPT-2+)	No	-	Destroys per-feature specialization in attention
Pure MLP on non-spatial tasks	No	-	Adjacent MLP weights are not related

How It Works

Standard Adam computes per-parameter adaptive learning rates from the second moment:

v_t = β₂·v_{t-1} + (1-β₂)·g_t²     # Variance estimate
lr_effective = lr / (√v_t + ε)        # Per-parameter learning rate

LAdam adds a Laplacian coupling step:

v_smooth = v_t + c2 · ∇²v_t           # Spatial smoothing
lr_effective = lr / (√v_smooth + ε)    # Coupled learning rate

Here ∇² is the discrete Laplacian, computed with a single F.conv2d kernel (9-point isotropic by default), which is efficient and GPU-friendly. The Laplacian treats each weight matrix as a 2D field, coupling each weight's learning rate with those of its spatial neighbors.
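The coupling step can be sketched in a few lines of NumPy. This is an illustration of the simplified formulas above, not the package's F.conv2d implementation: the kernel's center value, the edge-replicated padding, and the function names laplacian and coupled_step_size are assumptions, and the first moment and bias correction of Adam are left out, as they are in the formulas.

```python
import numpy as np

# Assumed 9-point kernel: face weight 4/6, diagonal weight 1/6, centre -20/6.
KERNEL = np.array([[1.0, 4.0, 1.0],
                   [4.0, -20.0, 4.0],
                   [1.0, 4.0, 1.0]]) / 6.0


def laplacian(v):
    """Discrete Laplacian of a 2D array with edge-replicated padding (an assumption)."""
    padded = np.pad(v, 1, mode="edge")
    rows, cols = v.shape
    out = np.zeros_like(v)
    for di in range(3):
        for dj in range(3):
            out += KERNEL[di, dj] * padded[di:di + rows, dj:dj + cols]
    return out


def coupled_step_size(v_prev, grad, lr=1e-3, beta2=0.999, c2=1e-4, eps=1e-8):
    """One second-moment update followed by the Laplacian coupling step."""
    v_t = beta2 * v_prev + (1.0 - beta2) * grad**2   # variance estimate
    v_smooth = v_t + c2 * laplacian(v_t)             # spatial smoothing
    lr_effective = lr / (np.sqrt(v_smooth) + eps)    # coupled learning rate
    return v_t, lr_effective


grad = np.full((4, 4), 0.1)
grad[1, 1] = 1.0                                     # one weight sees a much larger gradient
v0 = np.zeros_like(grad)

_, lr_adam = coupled_step_size(v0, grad, c2=0.0)     # c2 = 0 reduces to the Adam formula
_, lr_ladam = coupled_step_size(v0, grad, c2=1e-3)
```

With c2 > 0, the weight with the large gradient ends up with a slightly larger step size than under plain Adam, and its immediate neighbors with a slightly smaller one. Weights far from the spike are unaffected. With this kernel, v_smooth stays non-negative as long as c2 is below 0.3, which holds for every value recommended in this document.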

Overhead: ~2-5% wall-clock time increase per step. The Laplacian is a single fused convolution kernel, not point-wise iteration.

Benchmarks

PINN: Wave Equation (u_tt = c^2 u_xx)

5-layer, 128-unit tanh MLP trained for 5000 steps on the 1D wave equation. 3 seeds, best L2 per seed.

Optimizer	Mean L2 Error	Std	vs Adam
Adam (lr=1e-3)	0.0067	± 0.0015	—
LAdam c²=1e-5	0.0066	± 0.0010	+0.8%, lower variance

LAdam converges to a similar L2 error but with about one-third lower standard deviation across seeds (0.0010 vs 0.0015), indicating more stable optimization.

Note: An earlier single-seed benchmark with gradient clipping showed -44.6%. Multi-seed testing without gradient clipping shows the advantage is primarily in convergence stability, not final error magnitude.

Transformer: FashionMNIST Classification

4-head, 128-dim, 2-layer transformer, 30 epochs, 5 independent seeds.

Optimizer	Accuracy (mean ± std)	p-value (vs Adam)
Adam	89.46 ± 0.10%	—
LAdam c²=1e-4	89.66 ± 0.06%	0.0005

c² Robustness Sweep

7 c² values on the same transformer task. All 7 beat Adam:

c²	Accuracy	Δ vs Adam
1e-6	89.62%	+0.16%
5e-6	89.73%	+0.27%
1e-5	89.79%	+0.33%
5e-5	89.75%	+0.29%
1e-4	89.67%	+0.21%
5e-4	89.64%	+0.18%
1e-3	89.66%	+0.20%
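The Δ column can be recomputed from the accuracies in the sweep table and the Adam baseline of 89.46% reported for the same task. The snippet below is a small arithmetic check. The variable names are illustrative, and the numbers are copied from the tables rather than measured again.

```python
ADAM_ACCURACY = 89.46  # Adam baseline (%) from the transformer benchmark

# c2 value -> LAdam accuracy (%), copied from the sweep table
SWEEP = {
    1e-6: 89.62,
    5e-6: 89.73,
    1e-5: 89.79,
    5e-5: 89.75,
    1e-4: 89.67,
    5e-4: 89.64,
    1e-3: 89.66,
}

# Difference to Adam in percentage points, rounded like the table
deltas = {c2: round(acc - ADAM_ACCURACY, 2) for c2, acc in SWEEP.items()}

all_beat_adam = all(delta > 0 for delta in deltas.values())
best_c2 = max(SWEEP, key=SWEEP.get)
spread = round(max(SWEEP.values()) - min(SWEEP.values()), 2)

for c2, delta in deltas.items():
    print(f"c2={c2:g}: {delta:+.2f} points vs Adam")
```

In this sweep the best value is c2 = 1e-5, and the gap between the best and worst setting is 0.17 percentage points across three orders of magnitude of c2.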

FAQ

Q: Does this work for LLMs / GPT-scale models?
A: No. LAdam hurts LLM training (tested on GPT-2/WikiText-2). Attention weight matrices in large language models encode semantic structure, not spatial structure -- the Laplacian destroys per-feature specialization. Use standard Adam/AdamW for LLMs.

Q: Does it work for all CNNs?
A: It depends on the task. CNN denoising/reconstruction shows strong LAdam benefit (80% win rate across 20 seeds) because the conv filters learn smooth kernels. CNN classification shows minimal benefit because classification filters learn sharp edge detectors. The underlying principle: LAdam helps when the learned filters are spatially smooth.

Q: Why not smooth the gradient instead of the variance?
A: Osher et al. (2018) explored Laplacian smoothing of gradients. We found that smoothing the variance estimate is more effective because it smooths the learning rate landscape rather than the descent direction. These are mathematically distinct: ∇²(EMA(g²)) ≠ (∇²g)².

Q: Why does this help PINNs so much?
A: PDE-based loss landscapes have inherent spatial structure from the differential operators in the loss function. The Laplacian on v_t aligns the optimizer's internal representation with this structure.

Q: Can I use this with learning rate schedulers?
A: Yes. LAdam is fully compatible with any torch.optim.lr_scheduler.
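The inequality in the gradient-versus-variance answer can be checked with a small example. This is a sketch under simplifying assumptions: it uses a 5-point Laplacian with periodic boundaries for brevity, looks at a single step (so the EMA is dropped), and the function name laplacian_5point is illustrative rather than taken from the package.

```python
import numpy as np


def laplacian_5point(field):
    """5-point Laplacian with periodic boundaries (chosen here for brevity)."""
    return (np.roll(field, 1, axis=0) + np.roll(field, -1, axis=0)
            + np.roll(field, 1, axis=1) + np.roll(field, -1, axis=1)
            - 4.0 * field)


# Checkerboard gradient: neighbouring weights see equal-magnitude,
# opposite-sign gradients.
rows, cols = np.indices((4, 4))
g = np.where((rows + cols) % 2 == 0, 1.0, -1.0)

lap_of_square = laplacian_5point(g**2)   # Laplacian of the squared gradient
square_of_lap = laplacian_5point(g)**2   # square of the gradient's Laplacian

print(lap_of_square[0, 0], square_of_lap[0, 0])
```

For this gradient field the squared gradient is uniform, so its Laplacian is zero everywhere, while the Laplacian of the gradient itself is large. Operating on the variance leaves such a field untouched, whereas operating on the gradient would change it.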

Citation

If you use LAdam in your research, please cite:

@software{partin2026ladam,
  author = {Partin, Greg},
  title = {LAdam: Spatially-Aware Adaptive Optimization via Laplacian-Regularized Variance Estimates},
  year = {2026},
  url = {https://github.com/gpartin/ladam}
}

License

MIT. See LICENSE for details.

Release history

Version	Released
0.4.1 (this version)	Apr 11, 2026
0.4.0	Apr 7, 2026
0.3.0	Apr 7, 2026
0.2.0	Apr 6, 2026

Download files

File	Type	Size	Uploaded	Uploaded via	Trusted Publishing
ladam-0.4.1.tar.gz	Source distribution	627.8 kB	Apr 11, 2026	twine/6.2.0 CPython/3.14.0	No
ladam-0.4.1-py3-none-any.whl	Built distribution (Python 3)	21.5 kB	Apr 11, 2026	twine/6.2.0 CPython/3.14.0	No

File hashes

File	Algorithm	Hash digest
ladam-0.4.1.tar.gz	SHA256	`38759a44bac8b91bc64ef661e0d456b531e31f54a19ab1efb66b435997208d6c`
ladam-0.4.1.tar.gz	MD5	`e4b22d52ea2768fc4e81736a09f6d5b3`
ladam-0.4.1.tar.gz	BLAKE2b-256	`f2fb271f10df77816b6826b76b642ee57a454b6e435eb88b0a479d8f1be518e4`
ladam-0.4.1-py3-none-any.whl	SHA256	`4e490ce376bba18523270a58a31903eafb57491b0dd696c78cd82c59412d61a4`
ladam-0.4.1-py3-none-any.whl	MD5	`db24a2ba7e8da76d3065197e46c7f60b`
ladam-0.4.1-py3-none-any.whl	BLAKE2b-256	`c070387ffec3b9d5f72d25f78ef852c06b8a52cac4436a86dbf21f70bd05ce39`
