KOLMGformers: Unified KAN attention-free sequence modeling (KOLMOGformers) + Parallel Diffusion LM (OMGformers)

KOLMGformers v0.0.3

Unified Python library merging two research-grade model families:

Family	Architecture	Key property
KOLMOG	Kolmogorov-Arnold Cumulative Context	Attention-free, O(d) memory
OMG	Parallel Diffusion Transformer	Masked diffusion; configurable attention variants, MoE, and LoRA/DoRA

What's New in v0.0.3 — Bug Fixes & Improvements

Bug Fixes (KOLMOG)

ID	Component	Issue	Fix
#K5	`KOLMOGformerLayer`	`phi` received raw `kappa_out` and `context` — shape contract was implicit and fragile	Documented and type-checked; context extractor now propagates `attention_mask`
#K6	`generate()`	Repetition penalty used a Python loop over `set(generated[b].tolist())` — O(seq·vocab) per step	Vectorised with `tensor.unique()` + `scatter_`
#K7	`generate()`	Top-p nucleus sampling: `cumprobs − softmax(sorted)` double-subtracted the pivot token, causing off-by-one exclusions	Replaced with correct shifted-cumsum implementation
#K8	`PLKANLayer`	`breakpoints` initialized via `expand().clone()` left non-contiguous memory; subtle autograd issues under in-place ops	Replaced with `linspace(...).repeat()` → always contiguous
#K9	`InnerKolmogorovFunction`	Always used slow B-spline `KANLayer` for φ layers, ignoring `config.use_plkan`	`build_kan_layer` factory now honoured for φ too (~3–5× speedup with PLKAN)
#K10	`CumulativeContextExtractor`	Causal pad+shift produced wrong exclusive prefix at position 0 (current token leaked into its own context)	Replaced with correct exclusive cumsum: `C^{<i} = cumsum[i] − kappa_w[i]`
#K11	`CumulativeContextExtractor`	`attention_mask` was accepted by model but never threaded to the context extractor; pad tokens polluted context vectors	Mask is now applied to `kappa` before accumulation at every layer
#K12	`KOLMOGformerForCausalLM`	Logits returned only `[:, :-1, :]` (already shifted), breaking downstream use of the full logit tensor	Full-sequence logits returned; shift applied only inside loss computation (matches HF API)
#K13	`save_pretrained`	Direct `torch.save` to final path could leave a corrupted checkpoint on interruption	Atomic write via `tempfile` + `os.replace`; safetensors support added
#K14	`KOLMOGformerModel`	No gradient checkpointing — OOM on long sequences during training	`enable_gradient_checkpointing()` added; controlled via `TrainingArguments.gradient_checkpointing`
#K15	`KANLayer.b_splines`	Grid buffer could be float32 while activations are bfloat16/float16, causing dtype mismatch	Grid is cast to `x.dtype` on every forward pass
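
Fix #K7 concerns the nucleus (top-p) cutoff. As a minimal sketch of the shifted-cumsum idea, and not the library's actual `generate()` code, the helper below (the name `top_p_filter` is hypothetical) keeps every token up to and including the one whose cumulative probability first exceeds `top_p`. It does so by shifting the removal mask one position to the right in sorted order:

```python
import torch


def top_p_filter(logits: torch.Tensor, top_p: float) -> torch.Tensor:
    """Set logits outside the top-p nucleus to -inf (hypothetical helper)."""
    # Sort so the most likely tokens come first.
    sorted_logits, sorted_idx = torch.sort(logits, dim=-1, descending=True)
    cumprobs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)

    # Mark positions whose inclusive cumulative mass already exceeds top_p ...
    remove_sorted = cumprobs > top_p
    # ... then shift right by one so the pivot token that crosses the
    # threshold is kept, and the top-1 token is never removed.
    remove_sorted = torch.roll(remove_sorted, shifts=1, dims=-1)
    remove_sorted[..., 0] = False

    # Map the mask from sorted order back to vocabulary order.
    remove = torch.zeros_like(remove_sorted).scatter(-1, sorted_idx, remove_sorted)
    return logits.masked_fill(remove, float("-inf"))


probs = torch.tensor([[0.15, 0.5, 0.05, 0.3]])
filtered = top_p_filter(torch.log(probs), top_p=0.7)
kept = torch.isfinite(filtered)[0].tolist()
```

With these probabilities and `top_p=0.7`, the tokens with probability 0.5 and 0.3 survive: the 0.3 token is the pivot that pushes the cumulative mass past the threshold, and the shift is what keeps it.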

Bug Fixes (Training)

ID	Component	Issue	Fix
#T1	`Trainer`	No early stopping — training continued even after convergence	`early_stopping_patience` added to `TrainingArguments`
#T2	`DataCollatorForCausalLM`	Over-length sequences were silently truncated or accepted without warning	New `warn_length` parameter warns when sequences in a batch exceed the model's position limit
#T3	`Trainer._save_checkpoint`	Saved only model weights — optimizer/scheduler lost, training couldn't truly resume	Optimizer + scheduler state saved in `trainer_state.pt`
#T4	`Trainer.load_checkpoint`	Restored only model weights and step count	Now restores optimizer, scheduler, early-stopping state
#T5	`get_scheduler`	Missing `"constant_with_warmup"` type	Added
#T6	`Trainer`	`bf16=True` on CPU silently fell back to float32 with no warning	Warning emitted; autocast errors caught gracefully
#T7	`DataCollatorForMaskedLM`	Random-replacement tokens were drawn from `[0, vocab_size)` including `[PAD]`/`[BOS]`/`[EOS]`	Now draws from `[num_special_tokens, vocab_size)`
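
For #T5, a "constant with warmup" schedule is commonly defined as a linear ramp of the learning-rate multiplier from 0 to 1 over the warmup steps, followed by a constant multiplier of 1. The sketch below assumes that definition; the function name and signature are hypothetical and are not taken from `get_scheduler`:

```python
def constant_with_warmup_factor(step: int, warmup_steps: int) -> float:
    """Learning-rate multiplier: linear warmup, then constant 1.0.

    Hypothetical helper; the library's own scheduler may differ in detail.
    """
    if warmup_steps <= 0:
        # No warmup requested: use the full learning rate from step 0.
        return 1.0
    return min(1.0, step / warmup_steps)


factors = [constant_with_warmup_factor(s, warmup_steps=4) for s in range(6)]
```

A function of this shape can be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, for example via `functools.partial(constant_with_warmup_factor, warmup_steps=1000)`.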

New Features (v0.0.3)

KOLMOGformerConfig additions:

`context_dropout`: independent dropout on the context-vector path (default `0.0`).
`ffn_type`: FFN type, one of `"gelu"` (default), `"silu"`, or `"swiglu"`.
`max_position_embeddings_dynamic`: the RoPE cache auto-extends past the configured limit instead of raising an error (default `True`).
`validate()`: called from `__post_init__`; surfaces config errors early with helpful messages.
`__repr__`: readable summary of the key config fields.
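
The validate-on-construction pattern described above can be sketched with a plain dataclass. This is an illustration only: `SketchConfig` is a hypothetical stand-in for `KOLMOGformerConfig`, and the specific checks (positive sizes, dropout range) are assumptions; only the `ffn_type` choices come from the list above.

```python
from dataclasses import dataclass

ALLOWED_FFN_TYPES = {"gelu", "silu", "swiglu"}


@dataclass
class SketchConfig:
    """Hypothetical config; not the library's KOLMOGformerConfig."""
    hidden_size: int = 512
    num_layers: int = 6
    context_dropout: float = 0.0
    ffn_type: str = "gelu"

    def __post_init__(self) -> None:
        # Fail at construction time rather than deep inside a forward pass.
        self.validate()

    def validate(self) -> None:
        errors = []
        if self.hidden_size <= 0:
            errors.append(f"hidden_size must be positive, got {self.hidden_size}")
        if self.num_layers <= 0:
            errors.append(f"num_layers must be positive, got {self.num_layers}")
        if not 0.0 <= self.context_dropout < 1.0:
            errors.append(
                f"context_dropout must be in [0, 1), got {self.context_dropout}"
            )
        if self.ffn_type not in ALLOWED_FFN_TYPES:
            errors.append(
                f"ffn_type must be one of {sorted(ALLOWED_FFN_TYPES)}, "
                f"got {self.ffn_type!r}"
            )
        if errors:
            # Report every problem at once so the user can fix them together.
            raise ValueError("Invalid config:\n  " + "\n  ".join(errors))


ok = SketchConfig(ffn_type="swiglu")
```

Collecting all errors before raising means a single construction attempt reports every invalid field, which is one way to read "surfaces config errors early".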

TrainingArguments additions:

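`early_stopping_patience`: stop training after N evaluations without improvement.
`gradient_checkpointing`: turns on the model's `enable_gradient_checkpointing()`, reducing activation memory on long sequences at the cost of recomputation in the backward pass.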
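
The patience rule behind `early_stopping_patience` can be sketched as a small counter. The class below is hypothetical and makes two assumptions that the notes above do not state: the monitored metric is an evaluation loss (lower is better), and the counter resets whenever the loss improves.

```python
class EarlyStopper:
    """Hypothetical patience tracker; not the library's Trainer logic."""

    def __init__(self, patience: int, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_evals = 0

    def step(self, eval_loss: float) -> bool:
        """Record one evaluation; return True when training should stop."""
        if eval_loss < self.best - self.min_delta:
            # Improvement: remember it and reset the counter.
            self.best = eval_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


stopper = EarlyStopper(patience=3)
decisions = [stopper.step(loss) for loss in [1.0, 0.9, 0.95, 0.92, 0.91]]
```

In this run the best loss is 0.9 at the second evaluation; the next three evaluations fail to beat it, so the fifth call returns `True`. For the counter to survive a resume, `best` and `bad_evals` would need to be stored with the checkpoint, which is what fix #T4 describes for the real trainer.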

Installation

# From PyPI
pip install kolmgformers

# From a source checkout (editable install)
pip install -e .
# Optional extras
pip install -e ".[hf]"       # HuggingFace tokenizers
pip install -e ".[flash]"    # Flash Attention 2
pip install -e ".[all]"      # Everything

Quick Start

KOLMOG — Attention-Free Causal LM

from kolmgformers import KOLMOGformerConfig, KOLMOGformerForCausalLM
import torch

config = KOLMOGformerConfig(
    vocab_size=32000,
    hidden_size=512,
    num_channels=8,
    num_layers=6,
    causal=True,
    use_nce=True,    # Normalized Context Extraction
    use_wcc=True,    # Weighted Cumulative Context (v0.0.2+)
    use_plkan=True,  # Piecewise Linear KAN — 3-5x faster (v0.0.2+)
)
model = KOLMOGformerForCausalLM(config)
print(config)  # v0.0.3: readable repr

ids = torch.tensor([[1, 42, 100]])
out = model.generate(ids, max_new_tokens=50, temperature=0.8)

KOLMOG — Training with Early Stopping

from kolmgformers import (
    KOLMOGTrainer, KOLMOGTrainingArguments,
    KOLMOGDataCollatorForCausalLM,
)

args = KOLMOGTrainingArguments(
    output_dir="runs/my_run",
    num_train_epochs=10,
    early_stopping_patience=3,   # v0.0.3: stop after 3 bad evals
    gradient_checkpointing=True, # v0.0.3: save memory on long seqs
    evaluation_strategy="steps",
    eval_steps=500,
)
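# `model` is the KOLMOGformerForCausalLM built in the previous snippet;
# `train_ds` and `val_ds` are your own tokenized datasets (not defined here).
trainer = KOLMOGTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    data_collator=KOLMOGDataCollatorForCausalLM(pad_token_id=0),
)
trainer.train()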

KOLMOG — True Checkpoint Resume

# v0.0.3: optimizer + scheduler state is saved, enabling true resume
trainer.load_checkpoint("runs/my_run/checkpoint-5000")
trainer.train()  # continues from exact state
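
Resuming only works if the checkpoint on disk is complete. Fix #K13 describes an atomic write via `tempfile` + `os.replace`; the sketch below shows that general pattern with the standard library only. The function name is hypothetical, and the bytes payload stands in for a serialized state dict; this is not the library's `save_pretrained` code.

```python
import os
import tempfile


def atomic_write_bytes(path: str, data: bytes) -> None:
    """Write data to path so readers never see a partial file (sketch)."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # The temp file must live in the target directory: os.replace is only
    # atomic when source and destination are on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomically swap in the finished file
    except BaseException:
        # On any failure (including Ctrl-C) remove the partial temp file
        # and leave the previous checkpoint, if any, untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "checkpoint-5000", "model.bin")
    atomic_write_bytes(target, b"first version")
    atomic_write_bytes(target, b"second version")
    with open(target, "rb") as f:
        final_contents = f.read()
    leftover = os.listdir(os.path.dirname(target))
```

If the process is interrupted before `os.replace`, the old checkpoint is still intact; after it, the new one is fully in place. There is no intermediate state in which the final path holds a truncated file.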

OMG — Diffusion LM

import torch
from kolmgformers import OMGConfig, OMGModel

config = OMGConfig(vocab_size=32000, hidden_size=768, num_layers=12)
model = OMGModel(config)

prompt = torch.tensor([[1, 42]])
out = model.generate(prompt, new_tokens=128, steps=10)

Architecture: KOLMOG

Based on the Kolmogorov-Arnold representation theorem:

F(X) = Σ_q Φ_q( Σ_i φ_{q,i}( xᵢ ⊕ eᵢ ⊕ c_{q,i} ) )

Key innovations:

KAN layers: learnable B-spline activations per edge (not fixed non-linearities)
NCE (Normalized Context Extraction): jackknife leave-one-out mean context
WCC (Weighted Cumulative Context): attention-like token selectivity at O(n·d)
PLKAN (Piecewise Linear KAN): 3–5× faster than B-spline, same expressivity
No attention: O(n·d) time; the running context state takes O(d) memory, independent of sequence length
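
The causal context used here is an exclusive prefix sum: position i sees the sum of the features before it, never its own. The sketch below follows the formula from fix #K10 (`C^{<i} = cumsum[i] − kappa_w[i]`) and applies the padding mask before accumulation as in #K11. The function names and tensor shapes are assumptions, and NCE normalization and WCC weighting are left out.

```python
import torch


def exclusive_prefix_context(kappa_w: torch.Tensor,
                             attention_mask: torch.Tensor) -> torch.Tensor:
    """Parallel form. kappa_w: (batch, seq, dim); attention_mask: (batch, seq)."""
    mask = attention_mask.unsqueeze(-1).to(kappa_w.dtype)
    masked = kappa_w * mask               # pad tokens contribute nothing
    inclusive = masked.cumsum(dim=1)      # sum over positions <= i
    return inclusive - masked             # drop position i itself


def streaming_context(kappa_w: torch.Tensor,
                      attention_mask: torch.Tensor) -> torch.Tensor:
    """Token-by-token form that carries only one (batch, dim) state."""
    batch, seq_len, dim = kappa_w.shape
    state = torch.zeros(batch, dim, dtype=kappa_w.dtype, device=kappa_w.device)
    contexts = []
    for i in range(seq_len):
        contexts.append(state.clone())    # context excludes the current token
        keep = attention_mask[:, i, None].to(kappa_w.dtype)
        state = state + kappa_w[:, i] * keep
    return torch.stack(contexts, dim=1)


kappa = torch.tensor([[[1.0], [2.0], [3.0], [4.0]]])
mask = torch.tensor([[1, 1, 0, 1]])       # third token is padding
parallel = exclusive_prefix_context(kappa, mask)
streamed = streaming_context(kappa, mask)
```

Both forms give contexts `[0, 1, 3, 3]` for this input: position 0 sees nothing, and the padded token adds nothing to later contexts. The streaming form is what makes the O(d) state claim concrete, since generation only needs the running sum rather than a cache that grows with sequence length.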

Architecture: OMG

Parallel Diffusion Language Model with:

GQA / MLA / Sliding-Window / Linear / Block-Sparse attention
MoE (dense + soft MoE)
DS-PDLM dual-stream (understanding + generation)
LoRA / DoRA PEFT
TASA + MFS + DI efficiency trilogy
TWE temporal embeddings, NCA neuro-creative routing

Bug Fix History

Version	Fixes
v0.0.1	#K1–#K4 (config, imports, KANLayer rightmost knot)
v0.0.2	WCC + PLKANLayer added
v0.0.3	#K5–#K15 (context mask, generate, PLKAN, phi layers, causal prefix, gradient checkpointing) + #T1–#T7 (early stopping, optimizer save/load, bf16 CPU, special token masking)

License

Apache-2.0

Release

v0.0.3, released May 15, 2026
