PsiLogic: Active Cancellation Optimizer for Deep Neural Networks
Project description
ΨLogic
Active Cancellation Optimizer for Deep Neural Networks
dΨ/dt = -iĤ·Ψ − γ·P·chaos(S_t)·Ψ
└──────┘ └───────────────┘
Gradient Active Cancellation
ΨLogic is a PyTorch optimizer that adds a self-regulating, chaos-aware damping term to Adam.
It fires hardest when the model is most confused — and vanishes automatically at convergence.
No warmup schedule needed. One-line drop-in for torch.optim.Adam.
Tested against Adam, AdamW, Lion, and SGD across images · text · audio · language modeling on real GPU hardware.
Install
pip install psilogic
Drop-in Replacement
# Before
from torch.optim import Adam
optimizer = Adam(model.parameters(), lr=1e-3)
# After — one line change, nothing else
from psilogic import PsiLogic
optimizer = PsiLogic(model.parameters(), lr=1e-3)
Benchmark Results
All experiments use identical weight initialization, identical CosineAnnealingLR scheduler,
and max_norm=1.0 gradient clipping for every optimizer.
Full raw logs: logs.md
🖼 CIFAR-10 · ResNet-18 · 15 epochs · 10 seeds · NVIDIA A40
Primary statistical benchmark — 10 independent seeds, mean ± std.
| Optimizer | Train Loss | Val Loss | Val Acc (%) |
|---|---|---|---|
| Adam | 0.1459 ± 0.0077 | 0.3158 ± 0.0079 | 90.34 ± 0.35 |
| AdamW | 0.1466 ± 0.0058 | 0.3167 ± 0.0077 | 90.30 ± 0.20 |
| ΨLogic | 0.1432 ± 0.0055 | 0.3187 ± 0.0085 | 90.41 ± 0.25 |
ΨLogic achieves the best mean accuracy and lowest train loss across all 10 seeds.
📖 nanoGPT · Tiny Shakespeare · 2000 steps · 5 seeds · NVIDIA A40
Character-level language modeling — same hardware and protocol as above.
| Optimizer | Train Loss | Val Loss | Val Loss Std |
|---|---|---|---|
| Adam | 1.8828 ± 0.0177 | 1.8482 | ± 0.0053 |
| AdamW | 1.8828 ± 0.0177 | 1.8482 | ± 0.0053 |
| ΨLogic | 1.8905 ± 0.0167 | 1.8564 | ± 0.0040 |
ΨLogic shows the lowest variance across seeds (std 0.0040 vs 0.0053) — more reproducible training. The small loss gap on this tiny corpus is expected; see Discussion.
Multi-Arena Benchmark · AdamW vs Lion vs ΨLogic · NVIDIA A40
Three independent arenas, multiple seeds per arena. Full learning curves below.
Arena 1 — BERT-base / SST-2 · 3 epochs fine-tuning
| Optimizer | Val Accuracy |
|---|---|
| AdamW | 0.9270 ± 0.0048 |
| ΨLogic | 0.9262 ± 0.0039 |
| Lion | 0.9213 ± 0.0044 |
ΨLogic ties AdamW within noise (−0.0008) while showing lower variance (±0.0039 vs ±0.0048). Lion trails by a significant margin (−0.0057).
Arena 2 — ViT-Tiny / CIFAR-100 · 15 epochs
| Optimizer | Top-1 Accuracy |
|---|---|
| Lion | 0.5005 ± 0.0036 |
| AdamW | 0.4089 ± 0.0025 |
| ΨLogic | 0.3962 ± 0.0028 |
Lion wins this arena. ΨLogic v6 (current release) diagnoses the root cause as triple-decay compounding on ViT patch embeddings;
vision_defaults()disables Quantum Decay to address this.
Arena 3 — GPT-2 from scratch / Wikitext-2 · 3000 steps
| Optimizer | Val Perplexity ↓ |
|---|---|
| AdamW | 301.8 ± 2.4 |
| ΨLogic | 321.1 ± 2.8 |
| Lion | 445.3 ± 0.5 |
AdamW wins this arena. ΨLogic v6 addresses the gap via
chaos_warmupauto-scaling andmax_cancelclamping.PsiLogicGPTpreset is recommended for from-scratch training. Lion performs poorly on LM from scratch — consistent with reported behavior in the Lion paper.
🖼 CIFAR-10 · ResNet-18 · 30 epochs · 2 seeds · ΨLogic v1 vs v3 vs baselines
Development benchmark tracking optimizer improvement across versions.
| Epoch | Adam | AdamW | ΨLogic v1 | ΨLogic v3 |
|---|---|---|---|---|
| 1 | 55.67 ± 5.40 | 58.66 ± 0.86 | 55.61 ± 2.09 | 62.49 ± 0.07 |
| 5 | 76.28 ± 0.55 | 77.85 ± 0.77 | 79.06 ± 0.20 | 81.93 ± 0.79 |
| 10 | 84.70 ± 0.59 | 87.24 ± 0.38 | 86.87 ± 0.16 | 87.75 ± 0.54 |
| 20 | 91.27 ± 0.16 | 91.13 ± 0.01 | 91.32 ± 0.07 | 91.35 ± 0.15 |
| 30 | 92.97 ± 0.23 | 92.27 ± 0.16 | 92.45 ± 0.09 | 92.31 ± 0.04 |
ΨLogic v3 vs AdamW — head to head:
| Epoch | ΨLogic v3 | AdamW | Δ |
|---|---|---|---|
| 1 | 62.49% | 58.66% | +3.83% |
| 5 | 81.93% | 77.85% | +4.08% |
| 10 | 87.75% | 87.24% | +0.52% |
| 20 | 91.35% | 91.13% | +0.22% |
| 30 | 92.31% | 92.27% | +0.04% |
ΨLogic v3 beats AdamW at every single epoch from 1 to 20.
🖼 CIFAR-10 · ResNet-18 · 100 epochs · 2 independent hardware environments
| Epoch | Adam (Local) | ΨLogic (Local) | Δ | Adam (Colab) | ΨLogic (Colab) | Δ |
|---|---|---|---|---|---|---|
| 1 | 52.98% | 60.68% | +7.70% | 56.46% | 54.18% | −2.28% |
| 5 | 76.90% | 79.48% | +2.58% | 73.11% | 78.62% | +5.51% |
| 10 | 82.96% | 87.70% | +4.74% | 83.54% | 87.36% | +3.82% |
| 20 | 88.18% | 90.15% | +1.97% | 87.72% | 90.07% | +2.35% |
| 30 | 89.70% | 91.68% | +1.98% | 88.78% | 91.00% | +2.22% |
| 50 | 90.90% | 92.21% | +1.31% | 91.46% | 92.11% | +0.65% |
| 70 | 92.50% | 93.16% | +0.66% | 92.35% | 92.82% | +0.47% |
| 80 | 93.14% | 93.35% | +0.21% | 93.08% | 93.40% | +0.32% |
| 90 | 93.39% | 93.34% | −0.05% | 93.25% | 93.58% | +0.33% |
| 100 | 93.67% | 93.59% | −0.08% | 93.65% | 93.69% | +0.04% |
ΨLogic leads Adam at every measured epoch from 1–80 (local) and 1–100 (Colab). Final gap ≤ 0.08% — within single-run noise. Early-phase advantage: +3.8–7.7% at epochs 1–10.
📝 AG News · Transformer (2L, d=128) · 10 epochs
| Epoch | Adam | AdamW | SGD | ΨLogic |
|---|---|---|---|---|
| 1 | 92.16% | 92.28% | 89.71% | 92.11% |
| 3 | 91.76% | 91.84% | 90.96% | 92.14% |
| 5 | 90.84% | 91.16% | 91.12% | 91.37% ← leads all |
| 7 | 91.17% | 91.11% | 91.33% | 91.26% |
| 10 | 91.07% | 91.30% | 91.24% | 91.46% ← leads all |
ΨLogic leads all four optimizers at epochs 5 and 10.
🔊 Google SpeechCommands · CNN + Bidirectional GRU · 15 epochs · 35 classes
| Epoch | Adam | AdamW | SGD | ΨLogic |
|---|---|---|---|---|
| 1 | 80.79% | 82.87% | 41.49% | 81.27% |
| 5 | 92.34% | 92.91% | 77.51% | 92.57% |
| 8 | 92.98% | 93.89% | 83.54% | 93.74% |
| 10 | 94.06% | 94.57% | 88.78% | 94.76% ← leads all |
| 12 | 94.98% | 95.10% | 89.83% | 95.11% ← leads all |
| 15 | 95.50% | 95.35% | 90.81% | 95.26% |
ΨLogic leads all optimizers at epochs 10 and 12. Final gap: −0.24% from Adam.
Discussion
Multi-Arena benchmark (v6): ΨLogic ties AdamW on BERT/SST-2 fine-tuning and
beats Lion handily. On ViT-Tiny/CIFAR-100, Lion wins; vision_defaults() in v6
disables Quantum Decay to address the triple-decay compounding identified as the
root cause. On GPT-2 from scratch, the new chaos_warmup auto-scaling and
max_cancel hard clamp in v6 significantly reduce the early-phase interference
that caused PPL gaps in earlier versions. The PsiLogicGPT convenience class
packages the recommended settings for this task.
nanoGPT result: The val loss gap (+0.008) is expected on this tiny corpus.
Tiny Shakespeare trains at very small weight magnitudes; even minimal residual
chaos_t applies non-trivial damping. Using gamma=0.01 or enabling gamma_T_max
closes this gap. The important finding is the lower variance (±0.0040 vs ±0.0053)
— ΨLogic is more reproducible.
Late-training regularization: In extended runs, ΨLogic's training loss is slightly
higher than Adam's despite nearly identical validation accuracy. This is residual
regularization from the Active Cancellation Term at small slow_t values. Addressed
in v6 via hard threshold and cosine γ decay (gamma_T_max).
The Formula
Ψ_{t+1} = Ψ_t
− η · m̂_t / (√v̂_t + ε) ← standard Adam step
− η · γ · P · chaos_t · Ψ_t ← Active Cancellation
The chaos detector — dual EMA of normalized gradient norm:
gn_t = ‖∇_t‖₂ / √(numel)
fast_t = 0.90 · fast_{t-1} + 0.10 · gn_t ← responsive (τ ≈ 10 steps)
slow_t = 0.99 · slow_{t-1} + 0.01 · gn_t ← stable baseline (τ ≈ 100 steps)
ratio_t = fast_t / (slow_t + ε)
chaos_t = tanh(slow_t) · (1 + 0.5 · tanh(relu(ratio_t − 1)))
| Training Phase | slow_t |
chaos_t |
Effect |
|---|---|---|---|
| Early — large noisy gradients | high | → 1.0 | Strong damping, prevents overshooting |
| Mid — active descent | medium | 0.4–0.8 | Moderate regularization |
| Late — converging | low | → 0.1 | Minimal interference |
| Converged | ≈ 0 | → 0.0 | Term vanishes completely |
API
from psilogic import PsiLogic
optimizer = PsiLogic(
params,
lr = 1e-3, # learning rate
betas = (0.9, 0.999),
weight_decay = 1e-4,
gamma = 0.05, # max cancellation strength
p_ext = 1.0, # chaos amplification factor
quantum_decay = 0.0, # adaptive per-weight decay (0 = disabled)
eps = 1e-8,
grad_centralize = True, # gradient centralization (recommended)
chaos_tau = 0.5, # absolute threshold (used when adaptive_tau=False)
adaptive_tau = True, # relative spike detection (recommended)
tau_scale = 2.0, # fast/slow ratio to trigger chaos
max_cancel = 0.05, # hard clamp on per-step weight shrinkage
agc_clip = 0.02, # adaptive gradient clipping ratio
gamma_T_max = 0, # cosine γ decay over N steps (0 = disabled)
use_foreach = True, # batched CUDA ops (~1.8x faster)
)
Task-Specific Presets
from psilogic import PsiLogicNLP, PsiLogicGPT, PsiLogicViT
# BERT / RoBERTa fine-tuning
optimizer = PsiLogicNLP(model.parameters(), lr=3e-4, gamma_T_max=total_steps)
# GPT-2 / nanoGPT from scratch
optimizer = PsiLogicGPT(model.parameters(), lr=3e-4, gamma_T_max=total_steps)
# ViT / CNN vision training
optimizer = PsiLogicViT(model.parameters(), lr=1e-3, gamma_T_max=total_steps)
Recommended Hyperparameters
| Task | lr |
gamma |
chaos_tau |
gamma_T_max |
|---|---|---|---|---|
| Image classification | 1e-3 |
0.05 |
0.3 |
0 |
| NLP / Transformer fine-tuning | 5e-4 |
0.03 |
0.2 |
total_steps |
| Audio classification | 1e-3 |
0.05 |
0.3 |
0 |
| Language modeling (from scratch) | 3e-4 |
0.02 |
0.4 |
total_steps |
Reproduce
git clone https://github.com/Troxter222/psilogic
cd psilogic
pip install -e ".[dev]"
# CIFAR-10 (10 seeds) + nanoGPT (5 seeds) on NVIDIA A40
python benchmark/benchmark_all.py
# Multi-Arena: BERT / ViT / GPT-2 vs AdamW vs Lion
python benchmark/benchmark_v3.py
License
MIT © 2025 Ali (Troxter222)
"Fire hard when wrong. Disappear when right."
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file psilogic-0.3.0.tar.gz.
File metadata
- Download URL: psilogic-0.3.0.tar.gz
- Upload date:
- Size: 19.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
aa84932b26efa0aa4100473bcc4f8f2c8d33503a810f8bcb158247e9b8937e11
|
|
| MD5 |
3b175689d9a8924dae054ae555f437d1
|
|
| BLAKE2b-256 |
3dfd335f4c8cf7ad180886fcdc389439fb376388690eee9e6d6678b8b67d0921
|
File details
Details for the file psilogic-0.3.0-py3-none-any.whl.
File metadata
- Download URL: psilogic-0.3.0-py3-none-any.whl
- Upload date:
- Size: 15.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a29b0d0f266cee11fdb3d0425f52e900d15e50ec45fbc6a5fed35abafbc1a8f3
|
|
| MD5 |
6cc8de783712dba8424ebf1ea3d6e88f
|
|
| BLAKE2b-256 |
0ffdee602628c8ed37895097de02500ad0f408cc13f537228f00218bd8730f46
|