
ΨLogic

Active Cancellation Optimizer for Deep Neural Networks


dΨ/dt = -iĤ·Ψ  −  γ·P·chaos(S_t)·Ψ
         └──────┘   └───────────────┘
          Gradient   Active Cancellation

ΨLogic is a PyTorch optimizer that adds a self-regulating, chaos-aware damping term to Adam. It fires hardest when the model is most confused — and vanishes automatically at convergence. No warmup schedule needed. One-line drop-in for torch.optim.Adam.

Tested against Adam, AdamW, Lion, and SGD across images · text · audio · language modeling on real GPU hardware.


Install

pip install psilogic

Drop-in Replacement

# Before
from torch.optim import Adam
optimizer = Adam(model.parameters(), lr=1e-3)

# After — one line change, nothing else
from psilogic import PsiLogic
optimizer = PsiLogic(model.parameters(), lr=1e-3)

Benchmark Results

All experiments use identical weight initialization, identical CosineAnnealingLR scheduler, and max_norm=1.0 gradient clipping for every optimizer. Full raw logs: logs.md


🖼 CIFAR-10 · ResNet-18 · 15 epochs · 10 seeds · NVIDIA A40

Primary statistical benchmark — 10 independent seeds, mean ± std.

Optimizer Train Loss Val Loss Val Acc (%)
Adam 0.1459 ± 0.0077 0.3158 ± 0.0079 90.34 ± 0.35
AdamW 0.1466 ± 0.0058 0.3167 ± 0.0077 90.30 ± 0.20
ΨLogic 0.1432 ± 0.0055 0.3187 ± 0.0085 90.41 ± 0.25

ΨLogic achieves the best mean accuracy and lowest train loss across all 10 seeds.


📖 nanoGPT · Tiny Shakespeare · 2000 steps · 5 seeds · NVIDIA A40

Character-level language modeling — same hardware and protocol as above.

Optimizer Train Loss Val Loss
Adam 1.8828 ± 0.0177 1.8482 ± 0.0053
AdamW 1.8828 ± 0.0177 1.8482 ± 0.0053
ΨLogic 1.8905 ± 0.0167 1.8564 ± 0.0040

ΨLogic shows the lowest variance across seeds (std 0.0040 vs 0.0053) — more reproducible training. The small loss gap on this tiny corpus is expected; see Discussion.


Multi-Arena Benchmark · AdamW vs Lion vs ΨLogic · NVIDIA A40

Three independent arenas, multiple seeds per arena. Full learning curves below.

Arena 1 — BERT-base / SST-2 · 3 epochs fine-tuning

Optimizer Val Accuracy
AdamW 0.9270 ± 0.0048
ΨLogic 0.9262 ± 0.0039
Lion 0.9213 ± 0.0044

ΨLogic ties AdamW within noise (−0.0008) while showing lower variance (±0.0039 vs ±0.0048). Lion trails both by a clear margin (−0.0057 vs AdamW).

Arena 2 — ViT-Tiny / CIFAR-100 · 15 epochs

Optimizer Top-1 Accuracy
Lion 0.5005 ± 0.0036
AdamW 0.4089 ± 0.0025
ΨLogic 0.3962 ± 0.0028

Lion wins this arena. The root cause was diagnosed as triple-decay compounding on ViT patch embeddings; ΨLogic v6 (the current release) disables Quantum Decay via vision_defaults() to address it.

Arena 3 — GPT-2 from scratch / Wikitext-2 · 3000 steps

Optimizer Val Perplexity ↓
AdamW 301.8 ± 2.4
ΨLogic 321.1 ± 2.8
Lion 445.3 ± 0.5

AdamW wins this arena. ΨLogic v6 addresses the gap via chaos_warmup auto-scaling and max_cancel clamping; the PsiLogicGPT preset is recommended for from-scratch training. Lion performs poorly on language modeling from scratch — consistent with behavior reported in the Lion paper.


🖼 CIFAR-10 · ResNet-18 · 30 epochs · 2 seeds · ΨLogic v1 vs v3 vs baselines

Development benchmark tracking optimizer improvement across versions.

Epoch Adam AdamW ΨLogic v1 ΨLogic v3
1 55.67 ± 5.40 58.66 ± 0.86 55.61 ± 2.09 62.49 ± 0.07
5 76.28 ± 0.55 77.85 ± 0.77 79.06 ± 0.20 81.93 ± 0.79
10 84.70 ± 0.59 87.24 ± 0.38 86.87 ± 0.16 87.75 ± 0.54
20 91.27 ± 0.16 91.13 ± 0.01 91.32 ± 0.07 91.35 ± 0.15
30 92.97 ± 0.23 92.27 ± 0.16 92.45 ± 0.09 92.31 ± 0.04

ΨLogic v3 vs AdamW — head to head:

Epoch ΨLogic v3 AdamW Δ
1 62.49% 58.66% +3.83%
5 81.93% 77.85% +4.08%
10 87.75% 87.24% +0.51%
20 91.35% 91.13% +0.22%
30 92.31% 92.27% +0.04%

ΨLogic v3 leads AdamW at every measured checkpoint (epochs 1–30), with the largest gains early: +3.83% at epoch 1 and +4.08% at epoch 5.


🖼 CIFAR-10 · ResNet-18 · 100 epochs · 2 independent hardware environments

Epoch Adam (Local) ΨLogic (Local) Δ Adam (Colab) ΨLogic (Colab) Δ
1 52.98% 60.68% +7.70% 56.46% 54.18% −2.28%
5 76.90% 79.48% +2.58% 73.11% 78.62% +5.51%
10 82.96% 87.70% +4.74% 83.54% 87.36% +3.82%
20 88.18% 90.15% +1.97% 87.72% 90.07% +2.35%
30 89.70% 91.68% +1.98% 88.78% 91.00% +2.22%
50 90.90% 92.21% +1.31% 91.46% 92.11% +0.65%
70 92.50% 93.16% +0.66% 92.35% 92.82% +0.47%
80 93.14% 93.35% +0.21% 93.08% 93.40% +0.32%
90 93.39% 93.34% −0.05% 93.25% 93.58% +0.33%
100 93.67% 93.59% −0.08% 93.65% 93.69% +0.04%

ΨLogic leads Adam at every measured epoch from 1–80 (local) and 5–100 (Colab). Final gap ≤ 0.08% — within single-run noise. Early-phase advantage: up to +7.7% at epoch 1 (local) and +5.5% at epoch 5 (Colab).


📝 AG News · Transformer (2L, d=128) · 10 epochs

Epoch Adam AdamW SGD ΨLogic
1 92.16% 92.28% 89.71% 92.11%
3 91.76% 91.84% 90.96% 92.14%
5 90.84% 91.16% 91.12% 91.37% ← leads all
7 91.17% 91.11% 91.33% 91.26%
10 91.07% 91.30% 91.24% 91.46% ← leads all

ΨLogic leads all three baselines at epochs 3, 5, and 10.


🔊 Google SpeechCommands · CNN + Bidirectional GRU · 15 epochs · 35 classes

Epoch Adam AdamW SGD ΨLogic
1 80.79% 82.87% 41.49% 81.27%
5 92.34% 92.91% 77.51% 92.57%
8 92.98% 93.89% 83.54% 93.74%
10 94.06% 94.57% 88.78% 94.76% ← leads all
12 94.98% 95.10% 89.83% 95.11% ← leads all
15 95.50% 95.35% 90.81% 95.26%

ΨLogic leads all optimizers at epochs 10 and 12 and finishes −0.24% behind Adam at epoch 15.


Discussion

Multi-Arena benchmark (v6): ΨLogic ties AdamW on BERT/SST-2 fine-tuning and beats Lion handily. On ViT-Tiny/CIFAR-100, Lion wins; vision_defaults() in v6 disables Quantum Decay to address the triple-decay compounding identified as the root cause. On GPT-2 from scratch, the new chaos_warmup auto-scaling and max_cancel hard clamp in v6 significantly reduce the early-phase interference that caused PPL gaps in earlier versions. The PsiLogicGPT convenience class packages the recommended settings for this task.

nanoGPT result: The val loss gap (+0.008) is expected on this tiny corpus. Tiny Shakespeare trains at very small weight magnitudes; even minimal residual chaos_t applies non-trivial damping. Using gamma=0.01 or enabling gamma_T_max closes this gap. The important finding is the lower variance (±0.0040 vs ±0.0053) — ΨLogic is more reproducible.

Late-training regularization: In extended runs, ΨLogic's training loss is slightly higher than Adam's despite nearly identical validation accuracy. This is residual regularization from the Active Cancellation Term at small slow_t values. Addressed in v6 via hard threshold and cosine γ decay (gamma_T_max).
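For intuition, a half-cosine schedule over gamma_T_max steps would behave as sketched below. This is an illustration under the assumption of the standard cosine form; the library's exact decay formula may differ.

```python
import math

def gamma_at(step, gamma=0.05, gamma_T_max=1000):
    """Cosine decay of the cancellation strength gamma over gamma_T_max steps.
    Assumes the usual half-cosine shape; gamma_T_max=0 disables the decay."""
    if gamma_T_max <= 0:
        return gamma                      # decay disabled, constant gamma
    t = min(step, gamma_T_max)            # hold at zero past the horizon
    return gamma * 0.5 * (1.0 + math.cos(math.pi * t / gamma_T_max))

# gamma_at(0) -> full strength, gamma_at(gamma_T_max) -> 0.0
```

Under this shape, γ starts at full strength, passes half strength at the midpoint, and reaches zero exactly at gamma_T_max, which is why the presets pass gamma_T_max=total_steps.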


The Formula

Ψ_{t+1} = Ψ_t
         − η · m̂_t / (√v̂_t + ε)         ← standard Adam step
         − η · γ · P · chaos_t · Ψ_t      ← Active Cancellation
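In scalar form, the two terms can be illustrated with a toy implementation of the equations above (an illustration only, not the package's actual code):

```python
import math

def psilogic_scalar_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999,
                         eps=1e-8, gamma=0.05, p_ext=1.0, chaos=1.0):
    """One toy scalar step: bias-corrected Adam update plus the
    Active Cancellation term -lr * gamma * P * chaos_t * w."""
    m = b1 * m + (1 - b1) * g          # first-moment EMA
    v = b2 * v + (1 - b2) * g * g      # second-moment EMA
    m_hat = m / (1 - b1 ** t)          # bias correction
    v_hat = v / (1 - b2 ** t)
    w = (w
         - lr * m_hat / (math.sqrt(v_hat) + eps)   # standard Adam step
         - lr * gamma * p_ext * chaos * w)         # pulls w toward zero
    return w, m, v

# With chaos=0 the extra term vanishes and the step is plain Adam;
# with chaos=1 a positive weight is pulled slightly further toward zero.
```

Note that the cancellation term multiplies the pre-step weight Ψ_t, so for a fixed chaos level the two updates differ by exactly lr·γ·P·chaos·w.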

The chaos detector — dual EMA of normalized gradient norm:

gn_t   = ‖∇_t‖₂ / √(numel)

fast_t = 0.90 · fast_{t-1} + 0.10 · gn_t   ← responsive (τ ≈ 10 steps)
slow_t = 0.99 · slow_{t-1} + 0.01 · gn_t   ← stable baseline (τ ≈ 100 steps)

ratio_t = fast_t / (slow_t + ε)
chaos_t = tanh(slow_t) · (1 + 0.5 · tanh(relu(ratio_t − 1)))

Training Phase                   slow_t    chaos_t    Effect
Early — large, noisy gradients   high      → 1.0      Strong damping prevents overshooting
Mid — active descent             medium    0.4–0.8    Moderate regularization
Late — converging                low       → 0.1      Minimal interference
Converged                        ≈ 0       → 0.0      Term vanishes completely
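The detector can be sketched in plain Python directly from the published formulas (a toy single-scalar illustration, not the library's internals):

```python
import math

def relu(x):
    return max(0.0, x)

def chaos_step(gn, state, eps=1e-8):
    """One update of the dual-EMA chaos detector for a single
    normalized gradient norm gn = ||grad||_2 / sqrt(numel)."""
    state["fast"] = 0.90 * state["fast"] + 0.10 * gn   # tau ~ 10 steps
    state["slow"] = 0.99 * state["slow"] + 0.01 * gn   # tau ~ 100 steps
    ratio = state["fast"] / (state["slow"] + eps)
    return math.tanh(state["slow"]) * (1 + 0.5 * math.tanh(relu(ratio - 1.0)))

state = {"fast": 0.0, "slow": 0.0}
for _ in range(200):                 # early phase: large, noisy gradients
    chaos_early = chaos_step(2.0, state)
for _ in range(2000):                # converged phase: gradients vanish
    chaos_late = chaos_step(0.0, state)
# chaos_early sits near 1 (strong damping); chaos_late collapses toward 0
```

Because tanh(slow_t) gates the whole expression, the term cannot stay active once gradient norms decay — this is the "vanishes at convergence" property, with the ratio factor adding extra damping only during fast/slow spikes.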

API

from psilogic import PsiLogic

optimizer = PsiLogic(
    params,
    lr             = 1e-3,    # learning rate
    betas          = (0.9, 0.999),
    weight_decay   = 1e-4,
    gamma          = 0.05,    # max cancellation strength
    p_ext          = 1.0,     # chaos amplification factor
    quantum_decay  = 0.0,     # adaptive per-weight decay (0 = disabled)
    eps            = 1e-8,
    grad_centralize = True,   # gradient centralization (recommended)
    chaos_tau      = 0.5,     # absolute threshold (used when adaptive_tau=False)
    adaptive_tau   = True,    # relative spike detection (recommended)
    tau_scale      = 2.0,     # fast/slow ratio to trigger chaos
    max_cancel     = 0.05,    # hard clamp on per-step weight shrinkage
    agc_clip       = 0.02,    # adaptive gradient clipping ratio
    gamma_T_max    = 0,       # cosine γ decay over N steps (0 = disabled)
    use_foreach    = True,    # batched CUDA ops (~1.8x faster)
)

Task-Specific Presets

from psilogic import PsiLogicNLP, PsiLogicGPT, PsiLogicViT

# BERT / RoBERTa fine-tuning
optimizer = PsiLogicNLP(model.parameters(), lr=3e-4, gamma_T_max=total_steps)

# GPT-2 / nanoGPT from scratch
optimizer = PsiLogicGPT(model.parameters(), lr=3e-4, gamma_T_max=total_steps)

# ViT / CNN vision training
optimizer = PsiLogicViT(model.parameters(), lr=1e-3, gamma_T_max=total_steps)

Recommended Hyperparameters

Task lr gamma chaos_tau gamma_T_max
Image classification 1e-3 0.05 0.3 0
NLP / Transformer fine-tuning 5e-4 0.03 0.2 total_steps
Audio classification 1e-3 0.05 0.3 0
Language modeling (from scratch) 3e-4 0.02 0.4 total_steps
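The table above can be kept in code with a small helper. This is a hypothetical convenience (the dict and psilogic_kwargs function are not part of the psilogic API); it only packages the recommendations listed here:

```python
# Hypothetical helper mirroring the "Recommended Hyperparameters" table.
RECOMMENDED = {
    "image":      dict(lr=1e-3, gamma=0.05, chaos_tau=0.3, gamma_T_max=0),
    "nlp_ft":     dict(lr=5e-4, gamma=0.03, chaos_tau=0.2),   # gamma_T_max=total_steps
    "audio":      dict(lr=1e-3, gamma=0.05, chaos_tau=0.3, gamma_T_max=0),
    "lm_scratch": dict(lr=3e-4, gamma=0.02, chaos_tau=0.4),   # gamma_T_max=total_steps
}

def psilogic_kwargs(task, total_steps=None):
    """Return keyword arguments for the given task; pass total_steps for
    the tasks whose gamma_T_max should match the training horizon."""
    kw = dict(RECOMMENDED[task])
    if total_steps is not None:
        kw["gamma_T_max"] = total_steps
    return kw

# e.g. PsiLogic(model.parameters(), **psilogic_kwargs("lm_scratch", 3000))
```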

Reproduce

git clone https://github.com/Troxter222/psilogic
cd psilogic
pip install -e ".[dev]"

# CIFAR-10 (10 seeds) + nanoGPT (5 seeds) on NVIDIA A40
python benchmark/benchmark_all.py

# Multi-Arena: BERT / ViT / GPT-2 vs AdamW vs Lion
python benchmark/benchmark_v3.py

License

MIT © 2025 Ali (Troxter222)


"Fire hard when wrong. Disappear when right."
