kbeta – Kourkoutas‑β Optimiser   🌞🦎🚀📈

Reference implementation of Kourkoutas‑β: A Sunspike‑Driven Adam Optimizer with Desert Flair, published as arXiv:2508.12996.

This repository provides the optimiser implementation together with example workloads for reproducibility.


Table of Contents

  1. Key ideas
  2. Project layout
  3. Quick start
  4. Installation
  5. Using Kourkoutas‑β in your own model
  6. Example workloads
  7. Dataset and creation (verifiable)
  8. Model and training protocol
  9. Optimizers and settings
  10. Companion repositories
  11. Tests & linting
  12. Citation
  13. License
  14. Contributing & roadmap

Key ideas

  • Layer‑wise dynamic β₂ driven by a bounded sun‑spike signal (gradient norm vs. EMA).
  • Two β₂ parameters: β₂_min for agility under spikes, β₂_max for stability when calm.
  • Optional features: soft‑max AMSGrad, trust‑region clipping, adaptive tiny term.
  • Drop‑in compatibility: recovers exact Adam when dynamic β₂ and the extras are disabled (see the sketch below).
  • 100 % Apple MLX compatible – no PyTorch required.

See the paper for derivations, experiments, and theoretical analysis.
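
As a concrete illustration of the drop‑in claim, here is a minimal sketch: pinning the dynamic range to a single value should reduce the update rule to plain Adam. The keyword names beta2_min/beta2_max are assumptions based on this README; consult the package docstrings for the exact API.

from kbeta import KourkoutasBeta

# Sketch only: equal bounds make beta2 constant, recovering Adam-style
# updates (kwarg names assumed from this README, not verified).
opt = KourkoutasBeta(
    learning_rate=1e-3,
    beta2_min=0.999,   # lower bound of the dynamic beta2 range
    beta2_max=0.999,   # upper bound; equal bounds => constant beta2
)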


Conceptual overview

High‑level intuition – the “desert lizard” view

Kourkoutas‑β is an Adam‑style optimiser whose second‑moment decay β₂ is no longer a hard‑wired constant. Instead, every update computes a sun‑spike score—a single, cheap scalar that compares the current gradient magnitude to its exponentially‑weighted history. We then map that score to β₂ on the fly:

| Sun‑spike | Lizard metaphor | Adaptive behaviour |
| --- | --- | --- |
| High | The desert sun is scorching — the lizard is “fully warmed up” and sprints. | Lower β₂ toward β₂,min → second‑moment memory shortens, allowing rapid, large parameter moves. |
| Low | It’s cool; the lizard feels sluggish and takes cautious steps. | Raise β₂ toward β₂,max → longer memory, filtering noise and producing steadier updates. |

Because the sun‑spike diagnostic exists only in Kourkoutas‑β, the method can be viewed as Adam with a temperature‑controlled β₂ schedule: warm gradients trigger exploration; cooler gradients favour exploitation and stability.
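
To make the mapping concrete, here is a minimal sketch of a sun‑spike → β₂ rule of the kind described above (an illustration, not the package's exact internals): the score compares the current gradient norm to its EMA, is bounded to [0, 1), and β₂ interpolates linearly between the two bounds.

def dynamic_beta2(grad_norm: float, ema_norm: float,
                  beta2_min: float = 0.88, beta2_max: float = 0.999,
                  tiny: float = 1e-8) -> float:
    r = grad_norm / (ema_norm + tiny)   # raw spike ratio: "how hot is the sun?"
    s = r / (1.0 + r)                   # bounded sun-spike score in [0, 1)
    return beta2_max - s * (beta2_max - beta2_min)

print(dynamic_beta2(10.0, 1.0))  # spiky gradient: ~0.891, near beta2_min
print(dynamic_beta2(0.1, 1.0))   # calm gradient:  ~0.988, near beta2_max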


Project layout

kbeta
├── src/kbeta/                   # pip package
│   ├── __init__.py              # exports KourkoutasBeta / KourkoutasSoftmaxFlex
│   └── optim/
│       └── kbeta_softmax.py     # implementation
│
├── examples/
│   └── transformer_char_lm/     # Testbed D: character‑level LM on small‑enwik8
│
├── tests/                       # pytest suite (smoke + ablation tests)
├── assets/                      # logo and figures
├── pyproject.toml
└── README.md                    # you are here

Quick start

# 1. clone your fork
git clone git@github.com:<YOUR-USERNAME>/kbeta.git
cd kbeta

# 2. create a fresh virtualenv
python -m venv .venv && source .venv/bin/activate

# 3. editable install + dev extras
pip install -e ".[dev]"

# 4. run the smoke + ablation tests
pytest -q

Installation

Option 1: PyPI wheels (end-users)

If you only want the optimiser in your own MLX projects, install from PyPI:

pip install kbeta

This gives you just the kbeta package with the latest MLX.

For development tools and examples:

pip install "kbeta[dev]"

For exact reproducibility of the paper results (MLX 0.26.3, Adam-95/999 baselines):

pip install "kbeta[repro]"

Option 2: Cloning the repo (researchers / contributors)

If you want to run the example workloads or contribute to development, clone the repo:

git clone https://github.com/sck-at-ucy/kbeta.git
cd kbeta
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

This installs the package in editable mode and makes all example scripts available.


Minimal example

import time
import mlx.core as mx
import mlx.nn as nn
from kbeta import KourkoutasBeta

num_features, num_examples, num_iters, lr = 100, 1000, 1000, 0.01

# True parameters and data
w_star = mx.random.normal((num_features,))
X = mx.random.normal((num_examples, num_features))
y = X @ w_star + 1e-2 * mx.random.normal((num_examples,))

# Simple model with one parameter
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = mx.zeros((num_features,))

    def __call__(self, x):
        return x @ self.w

model = Model()

def loss_fn(m):
    return 0.5 * mx.mean(mx.square(m(X) - y))

opt = KourkoutasBeta(learning_rate=lr)
opt.init(model.parameters())

grad_fn = nn.value_and_grad(model, loss_fn)

tic = time.time()
for _ in range(num_iters):
    loss, grads = grad_fn(model)
    opt.update(model, grads)
    mx.eval(model.parameters())
toc = time.time()

error_norm = float(mx.linalg.norm(model.w - w_star))
print(f"Loss={loss.item():.5f}, L2|w-w*|={error_norm:.5f},
      Throughput={num_iters/(toc-tic):.1f} it/s")
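
Note that MLX evaluates lazily: the mx.eval(model.parameters()) call inside the loop is what actually materialises each parameter update.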

Example workloads

Important 👉 The 2‑D Transformer (Heat2D, Testbed A) and 3‑D PINN (Heat3D, Testbed B) of the paper are released as separate repositories (see Companion repositories below).

This repo includes the Transformer – Testbed D (char‑level LM on small‑enwik8):

| Folder | Paper section | What it shows | How to run |
| --- | --- | --- | --- |
| examples/transformer_char_lm | § 6.4 (Testbed D) | Character‑level LM on small‑enwik8 | python examples/transformer_char_lm/testbed_d.py --text ./data/small-enwik8.txt --opt kbeta |

Running Transformer – Testbed D (Char-level LM on small-enwik8)

All commands assume you are running from the repo root (adjust paths accordingly). 👉 Make sure you have generated ./data/small-enwik8.txt and created the ./logs_enwik directory as described below.

Run the Transformer training with the same options used in the paper (adapted to the repo paths):

  python -u src/kbeta/examples/transformer_char_lm/testbed_d.py \
      --text ./data/small-enwik8.txt \
      --steps 50001 --batch 4 --d_model 512 --n_layer 6 --n_head 8 \
      --ctx 512 --lmin 16 --lmax 512 --warmup 250 --opt kbeta --adam_beta2 0.95 \
      --layer_bucket per-array --barrier_every 100 --eval_every 500 \
      --lr 1e-3 \
      --seed 0 --fixed_eval_seed 1234 --deterministic --compile \
      --wd 0.0 --lr_schedule "1:1e-3,30000:5e-4,40000:1e-4,60000:1e-5" \
      2>&1 | tee "logs_enwik/kbeta_seed0.log"

This reproduces a run that mirrors the testbed reported in the paper, with full logging under logs_enwik/.


Dataset and creation (verifiable)

We use the first 30 MB of enwik8 (the classic Hutter Prize corpus). The slice is created deterministically:

curl -L -o enwik8.zip https://data.deepai.org/enwik8.zip
unzip enwik8.zip
head -c 30000000 enwik8 > small-enwik8.txt
mkdir -p data && mv small-enwik8.txt data/
mkdir ./logs_enwik

Checksums on our machine:

sha256sum enwik8
# 2b49720e...c024a8

sha256sum data/small-enwik8.txt
# e0152eee...298b7

Re-creating small-enwik8.txt reproduced the same SHA‑256 (bit‑for‑bit identity).
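
If you want to verify your own copy, here is a small sketch for computing the full digest (the hash prefixes above are truncated for display):

import hashlib

# Stream the file in 1 MiB chunks and print its full SHA-256 digest.
with open("data/small-enwik8.txt", "rb") as f:
    digest = hashlib.sha256()
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)
print(digest.hexdigest())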


Model and training protocol

As in the provided script, we train:

  • Architecture: 6‑block Transformer (d_model=512, n_head=8, FFN width = 4·d_model) with GELU, LayerNorm, and causal self‑attention; no dropout or weight decay.
  • Data schedule: variable sequence length with deterministic bucketing (L ∈ [16, 512], rounded to multiples of 32); batch = 4; context window = 512.
  • Steps: 50,001
  • Learning rate schedule (see the sketch after this list):
    • 1e‑3 for steps 1 ≤ s < 30k
    • 5e‑4 for 30k ≤ s < 40k
    • 1e‑4 for 40k ≤ s ≤ 50k
  • Evaluation: fixed held‑out batch (length = 256, B = 128) reporting cross‑entropy and BPC.
  • Runs: 10 matched seeds (0–9).
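
For clarity, the piecewise‑constant schedule above written as a function of the step counter (a sketch; the training script parses the --lr_schedule flag rather than calling a function like this):

def lr_at(step: int) -> float:
    # Piecewise-constant LR matching the paper's Testbed-D runs.
    if step < 30_000:
        return 1e-3
    if step < 40_000:
        return 5e-4
    return 1e-4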

Optimizers and settings

  • Kourkoutas‑β (ours): β₁=0.9; dynamic β₂∈[0.88, 0.999]; α=0.93 (EMA for the sun‑spike); ε=1e‑8; warm‑up=250 steps; bias_correction="beta2max"; per‑array stable buckets; no AMSGrad/clip/adaptive‑tiny; diagnostics off. (See the constructor sketch after this list.)

  • Adam‑95: MLX Adam (β₁=0.9, β₂=0.95, ε=1e‑8), bias correction on.

  • Adam‑999: MLX Adam (β₁=0.9, β₂=0.999, ε=1e‑8), bias correction on.
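
As a concrete reference, a constructor sketch for the Kourkoutas‑β settings above. The keyword names are assumptions inferred from this README, not a verified API; check the package docstrings before copying.

from kbeta import KourkoutasBeta

opt = KourkoutasBeta(
    learning_rate=1e-3,          # initial LR (scheduled as described above)
    beta1=0.9,
    beta2_min=0.88,              # agile end of the dynamic beta2 range
    beta2_max=0.999,             # stable end of the dynamic beta2 range
    alpha=0.93,                  # EMA decay for the sun-spike signal
    eps=1e-8,
    warmup=250,                  # warm-up steps before dynamic beta2 engages
    bias_correction="beta2max",  # per the paper's Testbed-D configuration
)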


Companion repositories

This repository hosts the core optimizer implementation and the char-level Transformer example (Testbed D).

Other workloads from the paper (Heat2D – Testbed A, Heat3D PINN – Testbed B) live in dedicated repositories.

These companion repos share the same optimizer API and training protocol, so you can directly apply KourkoutasBeta with no code changes.


Tests & linting

pytest                 # unit & ablation tests
ruff check .           # style / imports / naming
pre-commit run --all   # run all hooks (if installed)

Continuous Integration (CI) runs these checks automatically.


Citation

If you use this code or method in your research, please cite:

@article{Kassinos2025Kourkoutas,
  title   = {Kourkoutas-β: A Sunspike-Driven Adam Optimizer with Desert Flair},
  author  = {Stavros Kassinos},
  journal = {arXiv preprint arXiv:2508.12996},
  year    = {2025},
  url     = {https://arxiv.org/abs/2508.12996}
}

License

This work is distributed under the MIT License—see LICENSE for details.


Contributing & roadmap

We welcome issues & PRs!

Planned milestones:

  1. v0.1.0 – optimiser + char‑LM demo (public).
  2. v0.2.0 – PDE workloads migrated to their own repos.
  3. v1.0.0 – journal publication, pip wheels for macOS/Apple Silicon & Linux.

Happy sprinting in the (numerical) desert 🌞🦎🚀📈
