KOLMGformers: Unified KAN attention-free sequence modeling (KOLMOGformers) + Parallel Diffusion LM (OMGformers)
KOLMGformers v0.0.3
Unified Python library merging two research-grade model families:
| Family | Architecture | Key property |
|---|---|---|
| KOLMOG | Kolmogorov-Arnold Cumulative Context | Attention-free, O(d) memory |
| OMG | Parallel Diffusion Transformer | Masked diffusion, full feature set |
What's New in v0.0.3 — Bug Fixes & Improvements
Bug Fixes (KOLMOG)
| ID | Component | Issue | Fix |
|---|---|---|---|
| #K5 | KOLMOGformerLayer | phi received raw kappa_out and context; the shape contract was implicit and fragile | Documented and type-checked; context extractor now propagates attention_mask |
| #K6 | generate() | Repetition penalty used a Python loop over set(generated[b].tolist()), costing O(seq·vocab) per step | Vectorised with tensor.unique() + scatter_ |
| #K7 | generate() | Top-p nucleus sampling: cumprobs − softmax(sorted) double-subtracted the pivot token, causing off-by-one exclusions | Replaced with correct shifted-cumsum implementation |
| #K8 | PLKANLayer | breakpoints initialized via expand().clone() left non-contiguous memory; subtle autograd issues under in-place ops | Replaced with linspace(...).repeat(), which is always contiguous |
| #K9 | InnerKolmogorovFunction | Always used slow B-spline KANLayer for φ layers, ignoring config.use_plkan | build_kan_layer factory now honoured for φ too (~3–5× speedup with PLKAN) |
| #K10 | CumulativeContextExtractor | Causal pad+shift produced wrong exclusive prefix at position 0 (current token leaked into its own context) | Replaced with correct exclusive cumsum: C^{<i} = cumsum[i] − kappa_w[i] |
| #K11 | CumulativeContextExtractor | attention_mask was accepted by the model but never threaded to the context extractor; pad tokens polluted context vectors | Mask is now applied to kappa before accumulation at every layer |
| #K12 | KOLMOGformerForCausalLM | Logits returned only [:, :-1, :] (already shifted), breaking downstream use of the full logit tensor | Full-sequence logits returned; shift applied only inside loss computation (matches HF API) |
| #K13 | save_pretrained | Direct torch.save to final path could leave a corrupted checkpoint on interruption | Atomic write via tempfile + os.replace; safetensors support added |
| #K14 | KOLMOGformerModel | No gradient checkpointing, so long sequences ran out of memory during training | enable_gradient_checkpointing() added; controlled via TrainingArguments.gradient_checkpointing |
| #K15 | KANLayer.b_splines | Grid buffer could be float32 while activations are bfloat16/float16, causing dtype mismatch | Grid is cast to x.dtype on every forward pass |
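The #K7 row describes the top-p fix only as a "shifted-cumsum implementation". The following is a minimal sketch of that idea, not the library's code: a token is kept while the probability mass strictly before it (in descending order) is still below top_p, so the token that crosses the threshold stays in the nucleus. The names top_p_keep_mask and renormalize are hypothetical, and plain Python lists stand in for tensors so the example runs without PyTorch.

```python
def top_p_keep_mask(probs, top_p):
    """Return one keep/drop flag per token for nucleus (top-p) sampling.

    probs is a list of probabilities that sums to 1 (illustrative stand-in
    for a softmax output tensor).
    """
    # Visit token indices in order of descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = [False] * len(probs)
    mass_before = 0.0  # shifted cumsum: mass of all tokens ranked above this one
    for idx in order:
        # Compare against the mass *before* the token, not including it,
        # so the token that crosses top_p is kept exactly once.
        if mass_before < top_p:
            keep[idx] = True
        mass_before += probs[idx]
    return keep


def renormalize(probs, keep):
    """Zero out dropped tokens and rescale the kept ones to sum to 1."""
    total = sum(p for p, k in zip(probs, keep) if k)
    return [p / total if k else 0.0 for p, k in zip(probs, keep)]
```

With this rule the highest-probability token is always kept, because the mass before it is zero.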
Bug Fixes (Training)
| ID | Component | Issue | Fix |
|---|---|---|---|
| #T1 | Trainer | No early stopping; training continued even after convergence | early_stopping_patience added to TrainingArguments |
| #T2 | DataCollatorForCausalLM | Sequences silently truncated or accepted without warning | warn_length parameter warns when batch sequences exceed the model's position limit |
| #T3 | Trainer._save_checkpoint | Saved only model weights; optimizer and scheduler state were lost, so training could not truly resume | Optimizer + scheduler state saved in trainer_state.pt |
| #T4 | Trainer.load_checkpoint | Restored only model weights and step count | Now restores optimizer, scheduler, early-stopping state |
| #T5 | get_scheduler | Missing "constant_with_warmup" type | Added |
| #T6 | Trainer | bf16=True on CPU silently fell back to float32 with no warning | Warning emitted; autocast errors caught gracefully |
| #T7 | DataCollatorForMaskedLM | Random-replacement tokens were drawn from [0, vocab_size) including [PAD]/[BOS]/[EOS] | Now draws from [num_special_tokens, vocab_size) |
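The "constant_with_warmup" schedule added in #T5 is not spelled out in this README. As a rough sketch of what such a schedule conventionally means (a linear ramp followed by a flat learning rate), the function below returns the multiplier applied to the base learning rate. The function name and its parameters are assumptions for illustration; the signature of the library's get_scheduler is not shown in the source.

```python
def constant_with_warmup_factor(step, num_warmup_steps):
    """Multiplier on the base learning rate at a given optimizer step.

    Hypothetical helper: ramps linearly from 0 to 1 during warmup,
    then stays at 1 for the rest of training.
    """
    if num_warmup_steps > 0 and step < num_warmup_steps:
        # Linear ramp over the warmup window.
        return step / num_warmup_steps
    # After warmup the learning rate stays at its base value.
    return 1.0
```

A function of this shape can be passed as lr_lambda to torch.optim.lr_scheduler.LambdaLR; whether kolmgformers wires it up that way is not stated here.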
New Features (v0.0.3)
KOLMOGformerConfig additions:
- context_dropout: independent dropout on the context vector path (default 0.0).
- ffn_type: FFN activation, one of "gelu" (default), "silu", or "swiglu".
- max_position_embeddings_dynamic: RoPE cache auto-extends beyond the limit instead of erroring (default True).
- validate(): called in __post_init__; surfaces config errors early with helpful messages.
- __repr__: readable summary of key config fields.
TrainingArguments additions:
- early_stopping_patience: stop after N evaluations without improvement.
- gradient_checkpointing: enable memory-efficient training automatically.
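To illustrate the validate-in-__post_init__ pattern mentioned for KOLMOGformerConfig, here is a small sketch using a standard dataclass. SketchConfig is a hypothetical stand-in that models only two of the documented fields; the specific checks and error messages are assumptions, not the library's actual validation rules.

```python
from dataclasses import dataclass

ALLOWED_FFN_TYPES = ("gelu", "silu", "swiglu")  # values listed in this README


@dataclass
class SketchConfig:
    # Hypothetical stand-in for KOLMOGformerConfig (illustration only).
    context_dropout: float = 0.0
    ffn_type: str = "gelu"

    def __post_init__(self):
        # Validate at construction time so bad values fail early.
        self.validate()

    def validate(self):
        if not 0.0 <= self.context_dropout < 1.0:
            raise ValueError(
                f"context_dropout must be in [0, 1), got {self.context_dropout}"
            )
        if self.ffn_type not in ALLOWED_FFN_TYPES:
            raise ValueError(
                f"ffn_type must be one of {ALLOWED_FFN_TYPES}, got {self.ffn_type!r}"
            )
```

Because it is a dataclass, SketchConfig also gets a generated __repr__ listing its fields, which is one simple way to obtain a readable summary.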
Installation
pip install -e .
# Optional extras
pip install -e ".[hf]" # HuggingFace tokenizers
pip install -e ".[flash]" # Flash Attention 2
pip install -e ".[all]" # Everything
Quick Start
KOLMOG — Attention-Free Causal LM
from kolmgformers import KOLMOGformerConfig, KOLMOGformerForCausalLM
import torch
config = KOLMOGformerConfig(
    vocab_size=32000,
    hidden_size=512,
    num_channels=8,
    num_layers=6,
    causal=True,
    use_nce=True,    # Normalized Context Extraction
    use_wcc=True,    # Weighted Cumulative Context (v0.0.2+)
    use_plkan=True,  # Piecewise Linear KAN, 3-5x faster (v0.0.2+)
)
model = KOLMOGformerForCausalLM(config)
print(config) # v0.0.3: readable repr
ids = torch.tensor([[1, 42, 100]])
out = model.generate(ids, max_new_tokens=50, temperature=0.8)
KOLMOG — Training with Early Stopping
from kolmgformers import (
    KOLMOGTrainer, KOLMOGTrainingArguments,
    KOLMOGDataCollatorForCausalLM,
)
args = KOLMOGTrainingArguments(
    output_dir="runs/my_run",
    num_train_epochs=10,
    early_stopping_patience=3,    # v0.0.3: stop after 3 bad evals
    gradient_checkpointing=True,  # v0.0.3: save memory on long seqs
    evaluation_strategy="steps",
    eval_steps=500,
)
# train_ds and val_ds are your own datasets, defined elsewhere
trainer = KOLMOGTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    data_collator=KOLMOGDataCollatorForCausalLM(pad_token_id=0),
)
trainer.train()
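The patience rule behind early_stopping_patience ("stop after N evaluations without improvement") can be sketched as a small counter. This is an illustration under assumptions: the class name EarlyStopping is hypothetical, and it assumes the monitored metric is an evaluation loss where lower is better, which the README does not specify.

```python
class EarlyStopping:
    """Hypothetical patience tracker (not the kolmgformers implementation)."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("inf")  # best (lowest) eval loss seen so far
        self.bad_evals = 0        # consecutive evaluations without improvement

    def update(self, eval_loss):
        """Record one evaluation; return True when training should stop."""
        if eval_loss < self.best:
            # Improvement: remember it and reset the counter.
            self.best = eval_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

Restoring best and bad_evals from a checkpoint is what makes the early-stopping state resumable, as described for #T4.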
KOLMOG — True Checkpoint Resume
# v0.0.3: optimizer + scheduler state is saved, enabling true resume
trainer.load_checkpoint("runs/my_run/checkpoint-5000")
trainer.train() # continues from exact state
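A resumable checkpoint is only useful if it is never left half-written. The changelog mentions an atomic write via tempfile + os.replace; the sketch below shows that pattern with the standard library, using JSON in place of torch.save so it runs without PyTorch. The helper name atomic_write_json and the file name used in the usage note are assumptions for illustration.

```python
import json
import os
import tempfile


def atomic_write_json(state, path):
    """Write state to path so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so os.replace stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push bytes to disk before the rename
        os.replace(tmp_path, path)  # swap the finished file into place
    except BaseException:
        # On any failure, remove the partial temp file and re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

For example, atomic_write_json({"step": 5000}, "trainer_state.json") either replaces the file completely or leaves the previous version untouched if the process is interrupted mid-write.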
OMG — Diffusion LM
import torch
from kolmgformers import OMGConfig, OMGModel
config = OMGConfig(vocab_size=32000, hidden_size=768, num_layers=12)
model = OMGModel(config)
prompt = torch.tensor([[1, 42]])
out = model.generate(prompt, new_tokens=128, steps=10)
Architecture: KOLMOG
Based on the Kolmogorov-Arnold representation theorem:
F(X) = Σ_q Φ_q( Σ_i φ_{q,i}( xᵢ ⊕ eᵢ ⊕ c_{q,i} ) )
Key innovations:
- KAN layers — learnable B-spline activations per edge (not fixed non-linearities)
- NCE — Normalized Context Extraction: jackknife leave-one-out mean context
- WCC — Weighted Cumulative Context: attention-like token selectivity at O(n·d)
- PLKAN — Piecewise Linear KAN: 3–5× faster than B-spline, same expressivity
- No attention — O(n·d) time, O(d) memory (independent of sequence length)
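The O(d) memory claim and the exclusive prefix from #K10 (C^{<i} = cumsum[i] − kappa[i]) can be illustrated with a running sum. This is a simplified sketch, not the library's extractor: it is unweighted (no WCC weights), uses plain lists instead of tensors, and the function name exclusive_prefix_context and the mask argument are hypothetical.

```python
def exclusive_prefix_context(kappa, mask=None):
    """Return context[i] = sum of kappa[j] for j < i, skipping masked tokens.

    kappa: list of n vectors, each a list of d floats.
    mask:  optional list of n flags; falsy entries (e.g. padding) add nothing.
    """
    d = len(kappa[0]) if kappa else 0
    running = [0.0] * d  # only state carried across positions: O(d)
    contexts = []
    for i, vec in enumerate(kappa):
        # Emit the context *before* adding the current token, so position i
        # never sees its own kappa (position 0 gets an all-zero context).
        contexts.append(list(running))
        if mask is None or mask[i]:
            running = [r + v for r, v in zip(running, vec)]
    return contexts
```

The returned list is O(n·d) because it materialises every position for inspection; during step-by-step generation only the running vector needs to be kept, which is where the sequence-length-independent memory comes from.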
Architecture: OMG
Parallel Diffusion Language Model with:
- GQA / MLA / Sliding-Window / Linear / Block-Sparse attention
- MoE (dense + soft MoE)
- DS-PDLM dual-stream (understanding + generation)
- LoRA / DoRA PEFT
- TASA + MFS + DI efficiency trilogy
- TWE temporal embeddings, NCA neuro-creative routing
Bug Fix History
| Version | Fixes |
|---|---|
| v0.0.1 | #K1–#K4 (config, imports, KANLayer rightmost knot) |
| v0.0.2 | WCC + PLKANLayer added |
| v0.0.3 | #K5–#K15 (context mask, generate, PLKAN, phi layers, causal prefix, gradient checkpointing) + #T1–#T7 (early stopping, optimizer save/load, bf16 CPU, special token masking) |
License
Apache-2.0