FANoS-v2: feedback-controlled momentum optimizer for PyTorch
FANoS-v2
FANoS-v2 is a PyTorch optimizer for experiments with feedback-controlled momentum on stiff objectives. It is not a default replacement for AdamW. The goal of this implementation is consistency, stability instrumentation, and a clear path toward lower-memory variants.
Install
pip install fanos
For editable development:
python3 -m pip install virtualenv
python3 -m virtualenv fanos_env
source fanos_env/bin/activate
pip install -r requirements.txt
pip install -e .
The checked local environment uses Python 3.13.5 and PyTorch 2.8.0. TensorFlow is not required for the PyTorch optimizer core. Add it separately only for TensorFlow-specific experiments.
Quickstart
import torch
from fanos import FANoS
from fanos_v2 import FANoSV2, FANoSV2Fast
model = torch.nn.Linear(10, 1)
opt = FANoS(model.parameters(), lr=1e-3, grad_clip=1.0)
x = torch.randn(64, 10)
y = torch.randn(64, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
opt.zero_grad()
print(opt.diagnostics()[0])
For the current best general guardrails, use:
opt = FANoSV2(model.parameters(), lr=1e-3, grad_clip=1.0, preset="auto")
preset="auto" keeps the standard parameter-unit update, starts with low momentum, delays thermostat damping, and lets the RMS preconditioner soften only when the feedback controller sees unstable update energy. It is meant as a safer general preset, not a replacement for task-specific tuning.
Core Update
FANoS-v2 defaults to an update buffer u in parameter units:
pre_g = g / (sqrt(s) + eps)
rho = momentum * exp(-lr * zeta)
u = rho * u - lr * pre_g
theta = theta + u
The thermostat compares update energy with a target proposed-step energy and adjusts non-negative friction zeta using a clipped log-ratio controller.
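A minimal sketch of what a clipped log-ratio controller could look like, using only the standard library. The names `thermostat_step`, `gain`, `clip`, and `tiny` are hypothetical and chosen for illustration; the packaged controller may use different constants, smoothing, or energy definitions.

```python
import math

def thermostat_step(zeta, update_energy, target_energy,
                    gain=0.1, clip=2.0, tiny=1e-12):
    """One clipped log-ratio update of the friction zeta (illustrative only)."""
    # Log-ratio error: positive when updates carry more energy than the target.
    err = math.log((update_energy + tiny) / (target_energy + tiny))
    # Clip the error so a single outlier step cannot swing the friction.
    err = max(-clip, min(clip, err))
    # Integrate the error and keep the friction non-negative.
    return max(0.0, zeta + gain * err)

def effective_momentum(momentum, lr, zeta):
    # rho = momentum * exp(-lr * zeta), as in the update above.
    return momentum * math.exp(-lr * zeta)

zeta = 0.0
for energy in (4.0, 4.0, 0.25):   # illustrative update energies
    zeta = thermostat_step(zeta, update_energy=energy, target_energy=1.0)
print(zeta, effective_momentum(0.9, 1e-3, zeta))
```

In this sketch, friction rises while the update energy exceeds the target and relaxes back toward zero when it falls below, so rho never exceeds the configured momentum.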
The RMS preconditioner uses bias correction by default, which makes early steps much less brittle when beta2 is close to one.
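The effect of bias correction on the first step can be checked by hand. The sketch below assumes an Adam-style exponential moving average, s = beta2 * s + (1 - beta2) * g^2 with correction s / (1 - beta2^t); that form is an assumption for illustration, not taken from the package source.

```python
import math

def preconditioned_grad(g, s, eps=1e-8):
    # pre_g = g / (sqrt(s) + eps), as in the update above.
    return g / (math.sqrt(s) + eps)

beta2 = 0.999
g = 0.01                      # a small first gradient (illustrative)
s = (1.0 - beta2) * g * g     # assumed EMA after one step, starting from s = 0

raw = preconditioned_grad(g, s)
corrected = preconditioned_grad(g, s / (1.0 - beta2 ** 1))  # bias-corrected s

print(f"uncorrected pre_g = {raw:.2f}")
print(f"corrected   pre_g = {corrected:.2f}")
```

Without correction the first preconditioned gradient is inflated by roughly 1/sqrt(1 - beta2), about 31.6 for beta2 = 0.999, regardless of the gradient's size. With correction it is close to 1.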
For residual-heavy scientific objectives such as PINNs, use preconditioner_power < 1 or preconditioner_power=0 to avoid over-normalizing PDE and boundary-loss gradients.
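One plausible reading of preconditioner_power is an exponent on the RMS scale. The sketch below assumes pre_g = g / (sqrt(s) + eps) ** power; the function name `soft_preconditioned_grad` and that exact formula are assumptions, so check docs/math.md for the definition the package actually uses.

```python
import math

def soft_preconditioned_grad(g, s, power=1.0, eps=1e-8):
    # Hypothetical form: divide by the RMS scale raised to `power`.
    return g / (math.sqrt(s) + eps) ** power

# Two gradient components whose magnitudes differ by 100x, for example a PDE
# residual term and a boundary term (values are illustrative).
grads = [1.0, 0.01]
second_moments = [g * g for g in grads]

for power in (1.0, 0.5, 0.0):
    scaled = [soft_preconditioned_grad(g, s, power)
              for g, s in zip(grads, second_moments)]
    print(power, [round(x, 4) for x in scaled])
```

Under this assumption, power=1 flattens both components to about the same size, power=0 leaves the raw gradients untouched, and intermediate values keep part of the original magnitude ratio.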
The preset="auto" path keeps preconditioner_power=1.0 for ordinary training, but enables adaptive softening when the previous thermostat error is large. This avoided the sequence-memory stall in smoke tests while preserving normal image-classification startup.
For paper-equation audits, use:
opt = FANoSV2(model.parameters(), lr=1e-3, update_mode="physical")
That mode stores a descent velocity v and applies:
v = rho * v + pre_g
theta = theta - lr * v
The default update_mode="parameter" is recommended for public training because it removes the old theta += v versus theta += lr*v ambiguity.
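The two update rules above can be compared directly. The sketch below is a scalar toy that holds lr and rho fixed and feeds both modes the same preconditioned gradients; the function names and gradient values are illustrative, and the real optimizer recomputes rho from the thermostat at each step.

```python
def run_parameter_mode(pre_grads, lr, rho, theta=0.0):
    u = 0.0
    for pre_g in pre_grads:
        u = rho * u - lr * pre_g   # buffer already in parameter units
        theta = theta + u
    return theta

def run_physical_mode(pre_grads, lr, rho, theta=0.0):
    v = 0.0
    for pre_g in pre_grads:
        v = rho * v + pre_g        # descent velocity, scaled by lr on use
        theta = theta - lr * v
    return theta

pre_grads = [0.5, -0.2, 0.1, 0.4]  # illustrative preconditioned gradients
a = run_parameter_mode(pre_grads, lr=1e-3, rho=0.9)
b = run_physical_mode(pre_grads, lr=1e-3, rho=0.9)
print(abs(a - b) < 1e-12)
```

With a constant learning rate the buffers satisfy u = -lr * v, so both modes trace the same trajectory up to rounding. They diverge once lr changes between steps: the parameter-unit buffer keeps past contributions scaled by the lr in effect when they were added, while the physical velocity rescales its whole history by the current lr.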
See docs/math.md for the mathematical notes.
Efficiency Options
- preconditioner="diag": full diagonal RMS state plus update buffer.
- preconditioner="factored": row/column second-moment factors for matrix-like tensors.
- preconditioner="none": feedback momentum without RMS preconditioning.
- state_dtype=torch.bfloat16: optional lower-precision optimizer state.
- adaptive_lr=True: optional gradient-stability learning-rate modulation with lr_bounds.
Use optimizer.state_size_bytes() to estimate tensor-state memory.
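For a rough sense of what the preconditioner choice saves before running anything, the second-moment entries can be counted by hand. This sketch assumes that "factored" stores one row vector and one column vector for 2-D tensors and falls back to full diagonal state otherwise; the helper `second_moment_entries` is hypothetical, and the update buffer is not included in the count.

```python
def second_moment_entries(shape, preconditioner="diag"):
    """Count second-moment entries for one tensor (rough estimate)."""
    numel = 1
    for dim in shape:
        numel *= dim
    if preconditioner == "none":
        return 0
    if preconditioner == "factored" and len(shape) == 2:
        rows, cols = shape
        return rows + cols          # one row factor plus one column factor
    return numel                    # full diagonal state

shape = (4096, 1024)                # illustrative weight matrix
bytes_per_entry = 4                 # float32
for mode in ("diag", "factored", "none"):
    entries = second_moment_entries(shape, mode)
    print(mode, entries * bytes_per_entry / 2**20, "MiB")
```

The numbers reported by state_size_bytes() on a real optimizer instance remain the authoritative figure; this is only a back-of-envelope check.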
The package also exports experimental memory/communication helpers:
from fanos_v2 import (
low_rank_approximation,
quantize_4bit,
dequantize_4bit,
sparsify_topk,
densify_topk,
dynamic_variance_clip,
)
These are intended for benchmark and distributed-training experiments. They are deliberately separate from the optimizer step so convergence behavior stays auditable.
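To make the idea behind the top-k helpers concrete, here is a standalone pure-Python sketch of magnitude-based sparsification and its inverse. The functions `topk_sparsify` and `topk_densify` are written for this example and are not the package's `sparsify_topk` / `densify_topk`, whose signatures and return types may differ.

```python
def topk_sparsify(values, k):
    """Keep the k largest-magnitude entries; return (indices, kept, length)."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    keep = sorted(order[:k])
    return keep, [values[i] for i in keep], len(values)

def topk_densify(indices, kept_values, length):
    """Rebuild a dense list, filling dropped positions with zeros."""
    dense = [0.0] * length
    for i, v in zip(indices, kept_values):
        dense[i] = v
    return dense

grad = [0.02, -1.5, 0.3, 0.0, 0.9]   # illustrative gradient
idx, vals, n = topk_sparsify(grad, k=2)
restored = topk_densify(idx, vals, n)
print(idx, vals, restored)
```

Only the indices and kept values need to be communicated; the round trip is lossy, which is one reason to keep such helpers outside the optimizer step when auditing convergence.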
Examples
python examples/rosenbrock_demo.py
python tools/fetch_datasets.py --dataset mnist
python tools/fetch_datasets.py --dataset fashionmnist
python tools/fetch_datasets.py --dataset cifar10
python tools/fetch_datasets.py --dataset eegbci --subject 1 --runs 3 4
python tools/fetch_datasets.py --dataset eegbci --subject 2 --runs 3 4
python benchmarks/quadratic_compare.py --steps 500
python benchmarks/vision_benchmark.py --epochs 1 --train-samples 512 --test-samples 256
python benchmarks/vision_benchmark.py --dataset fashionmnist --epochs 1 --train-samples 1024 --test-samples 512 --optimizers fanosv2 adamw --fanos-preset auto
python benchmarks/eeg_eegbci_benchmark.py --train-subjects 1 --test-subject 2 --runs 3 4 --epochs 1
pytest
One-Command Benchmark Sweep
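From fanos_v2_project (these commands assume the virtualenv lives at ./fanos_virtualenv; if you created fanos_env in the install steps above, substitute that path):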
./fanos_virtualenv/bin/python tools/run_all_benchmarks.py --profile full --device auto
This will fetch missing datasets into ../datasets, write CSVs/logs into ../results, and generate:
../reports/fanos_v2_benchmark_report.md
The default full profile runs:
- quadratic benchmark: 2048 dimensions, 2000 steps
- MNIST benchmark: 60,000 train samples, 10,000 test samples, 5 epochs
- EEGBCI benchmark: train subjects 1-4, test subject 5, runs 3 and 4, 10 epochs
For a faster check:
./fanos_virtualenv/bin/python tools/run_all_benchmarks.py --profile smoke
Full Research Run
This is the one-command runner for leaving the machine overnight. It can fetch datasets, run MNIST, FashionMNIST, CIFAR-10, stiff objectives, the PINN preset, optional EEG, and build reports.
./fanos_virtualenv/bin/python tools/run_full_research_study.py \
--blocks vision stiff pinn \
--vision-datasets mnist fashionmnist cifar10 \
--seeds 0 1 2 3 4 \
--configs low_lr auto stable vision_sweep_best \
--device mps \
--vision-epochs 5 \
--stiff-steps 2000 \
--results-root ../results/full_research_mps \
--report-root ../reports
Use --skip-download after the datasets are already present. Add eeg to --blocks if you also want the EEGBCI study in the same run.
For a quick command preview without running:
./fanos_virtualenv/bin/python tools/run_full_research_study.py --dry-run
Overnight Study
This is the better command for serious tuning evidence. It repeats seeds, compares baselines against several fixed FANoS presets, and writes aggregate mean/std tables:
./fanos_virtualenv/bin/python tools/run_night_study.py \
--tasks vision eeg \
--seeds 0 1 2 3 4 \
--configs low_lr auto stable vision_sweep_best eeg_sweep_best \
--device cpu \
--vision-dataset mnist \
--vision-epochs 5 \
--vision-train-samples 60000 \
--vision-test-samples 10000 \
--eeg-epochs 10
It writes:
../results/night_study/night_study_raw.csv
../results/night_study/night_study_summary.csv
../reports/fanos_night_study_report.md
For GPU or accelerator auto-detection:
./fanos_virtualenv/bin/python tools/run_all_benchmarks.py --profile full --device auto
For Apple Silicon, use --device mps or --device auto. In this checked Mac environment, PyTorch reports mps_built=True but mps_available=False, so the runners currently fall back to CPU. Verify with:
./fanos_virtualenv/bin/python - <<'PY'
import torch
print(torch.__version__)
print("mps built:", torch.backends.mps.is_built())
print("mps available:", torch.backends.mps.is_available())
PY
Use --skip-download to resume after datasets are already present.
To investigate the current speed bottleneck, compare exact FANoS, fast-sync FANoS, and AdamW on the same small run:
bash tools/run_speed_check.sh mps
The fast-sync path uses --fanos-thermostat-interval 8, --fanos-grad-norm-interval 8, and --no-fanos-sanitize-gradients. Treat it as a performance candidate until its accuracy has been revalidated on repeated seeds.
For the real optimizer refactor path, compare the exact reference optimizer against the opt-in fanosv2fast class:
bash tools/run_fast_refactor_check.sh mps mnist
fanosv2fast keeps FANoSV2 untouched and uses faster training defaults: preset="auto", no adaptive LR, no gradient clipping, thermostat updates every 4 steps, and diagnostics off by default. Treat it as an experimental speed preset until it is validated outside the lightweight vision suite.
For optimizer experiments that intentionally remove gradient-norm scalar synchronization, pass --fanos-grad-clip 0 --no-fanos-adaptive-lr. This is an accuracy-risky speed test, not a recommended default.
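The synchronization cost comes from the fact that global-norm clipping reduces every gradient element to a single scalar before any parameter can be updated. A minimal pure-Python sketch of that computation follows; `global_clip_scale` and its `eps` default are illustrative and are not the package's implementation.

```python
import math

def global_clip_scale(grads, max_norm, eps=1e-6):
    """Return (total_norm, scale) for global-norm gradient clipping."""
    # Every gradient element feeds one scalar; on an accelerator, reading
    # that scalar back is the kind of sync point the flags above avoid.
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    scale = min(1.0, max_norm / (total_norm + eps))
    return total_norm, scale

grads = [[3.0, 0.0], [0.0, 4.0]]     # two illustrative "tensors"
norm, scale = global_clip_scale(grads, max_norm=1.0)
clipped = [[g * scale for g in tensor] for tensor in grads]
print(norm, scale)
```

Dropping the clip removes that reduction but also removes the guard against a single oversized step, which is why the flags are described as accuracy-risky.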
Large benchmark targets such as ResNet-50, ViT-S, Llama-60m, HMC, and ADFTD should live in separate reproducible experiment configs with fixed seeds, exact datasets, hardware notes, and baseline sweeps. The current repository includes the optimizer core and lightweight sanity benchmarks only.
See docs/benchmarking.md for dataset and benchmark details.
Current Smoke Results
These are tiny CPU smoke runs, not claims of superiority.
MNIST subset, one epoch, 512 train samples, 256 test samples:
fanosv2 loss=2.6516 top1=0.129 time=0.13s state=0.808MiB
adamw loss=2.2398 top1=0.168 time=0.11s state=0.808MiB
sgd loss=2.2936 top1=0.137 time=0.11s state=0.404MiB
rmsprop loss=1.2470 top1=0.598 time=0.11s state=0.808MiB
EEGBCI train subject 1, test subject 2, one epoch:
fanosv2 loss=2.1143 top1=0.500 time=0.03s state=0.553MiB
adamw loss=0.7926 top1=0.500 time=0.01s state=0.553MiB
The 10-seed MNIST CPU study now shows low_lr FANoS ahead of AdamW on mean top-1, but slower per run:
FANoS low_lr top1_mean=0.9899 seconds_mean=70.9
AdamW baseline top1_mean=0.9879 seconds_mean=65.0
RMSProp baseline top1_mean=0.9817 seconds_mean=65.0
SGD baseline top1_mean=0.9675 seconds_mean=63.2
Critical interpretation: this is a real positive signal on MNIST, not proof of a universal optimizer. FANoS-v2 is strongest today on Rosenbrock/stiff nonconvex tests, competitive on MNIST after tuning, repaired on the sequence-memory smoke test once warmup is enabled, and promising for PINNs only with the softer pinn preset. EEGBCI and ill-conditioned quadratics remain weak or inconclusive.
Reproducibility Checklist
- Set random seeds in each experiment.
- Report learning-rate sweeps, not only the best run.
- Log zeta, rho, update energy, target energy, gradient norm, and clip scale.
- Compare against AdamW with gradient clipping for serious claims.
- Report wall-clock time, peak memory, and energy-to-target when hardware counters are available.
- For EEG tasks such as HMC or ADFTD, report dataset split protocol, preprocessing, model architecture, and seed-level confidence intervals.
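Seed-level summaries of the kind the checklist asks for can be computed with the standard library. The sketch below uses a normal-approximation 95% interval, which is optimistic for a handful of seeds (a t-interval would be wider); `summarize` is a hypothetical helper and the per-seed numbers are placeholders, not results from this project.

```python
import math
import statistics

def summarize(values, z=1.96):
    """Mean, sample std, and a normal-approximation 95% interval."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    half_width = z * std / math.sqrt(len(values))
    return mean, std, (mean - half_width, mean + half_width)

# Placeholder per-seed top-1 values, NOT results from this project.
top1_by_seed = [0.90, 0.92, 0.91, 0.89, 0.93]
mean, std, (lo, hi) = summarize(top1_by_seed)
print(f"mean={mean:.4f} std={std:.4f} ci95=({lo:.4f}, {hi:.4f})")
```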
Contributing
See CONTRIBUTING.md.
File details
Details for the file fanos-0.4.0.tar.gz.
File metadata
- Download URL: fanos-0.4.0.tar.gz
- Upload date:
- Size: 18.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 62980e2dfbfbd9f07b45f65fc2c933a4e32b79ddde9230b39b2fcf0d758f80a8 |
| MD5 | feea3083f52b5f56f51891294e6f58ff |
| BLAKE2b-256 | adebfe275a2966d2e3c2ca72617b8c3ac64f20005d2f67cc3f41c025c70deee6 |
File details
Details for the file fanos-0.4.0-py3-none-any.whl.
File metadata
- Download URL: fanos-0.4.0-py3-none-any.whl
- Upload date:
- Size: 17.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5af3af862e698c2f0cae1ef27b2227ff972760ff28786821981d0bdef4db25f8 |
| MD5 | e5c9a17c4f748444683ec9f557d9e4d8 |
| BLAKE2b-256 | cdf59739372187b178d83c31204471d85054e47281a618689e40f22af265b70b |