
KOLMGformers: Unified KAN attention-free sequence modeling (KOLMOGformers) + Parallel Diffusion LM (OMGformers)


KOLMGformers v0.0.3

Unified Python library merging two research-grade model families:

| Family | Architecture | Key property |
|---|---|---|
| KOLMOG | Kolmogorov-Arnold Cumulative Context | Attention-free, O(d) memory |
| OMG | Parallel Diffusion Transformer | Masked diffusion, full feature set |

What's New in v0.0.3 — Bug Fixes & Improvements

Bug Fixes (KOLMOG)

| ID | Component | Issue | Fix |
|---|---|---|---|
| #K5 | KOLMOGformerLayer | phi received the raw kappa_out and context tensors; the shape contract was implicit and fragile | Shape contract documented and type-checked; the context extractor now propagates attention_mask |
| #K6 | generate() | Repetition penalty used a Python loop over set(generated[b].tolist()): O(seq·vocab) per step | Vectorised with tensor.unique() + scatter_ |
| #K7 | generate() | Top-p (nucleus) sampling: cumprobs − softmax(sorted) double-subtracted the pivot token, causing off-by-one exclusions | Replaced with a correct shifted-cumsum implementation |
| #K8 | PLKANLayer | Breakpoints initialized via expand().clone() left non-contiguous memory, causing subtle autograd issues under in-place ops | Replaced with linspace(...).repeat(), which is always contiguous |
| #K9 | InnerKolmogorovFunction | Always used the slow B-spline KANLayer for φ layers, ignoring config.use_plkan | The build_kan_layer factory is now honoured for φ as well (~3–5× speedup with PLKAN) |
| #K10 | CumulativeContextExtractor | Causal pad+shift produced the wrong exclusive prefix at position 0 (the current token leaked into its own context) | Replaced with a correct exclusive cumsum: C^{<i} = cumsum[i] − kappa_w[i] |
| #K11 | CumulativeContextExtractor | attention_mask was accepted by the model but never passed to the context extractor, so pad tokens polluted context vectors | The mask is now applied to kappa before accumulation at every layer |
| #K12 | KOLMOGformerForCausalLM | Logits returned only [:, :-1, :] (already shifted), breaking downstream use of the full logit tensor | Full-sequence logits returned; the shift is applied only inside the loss computation (matches the HF API) |
| #K13 | save_pretrained | Direct torch.save to the final path could leave a corrupted checkpoint if interrupted | Atomic write via tempfile + os.replace; safetensors support added |
| #K14 | KOLMOGformerModel | No gradient checkpointing, causing OOM on long sequences during training | enable_gradient_checkpointing() added; controlled via TrainingArguments.gradient_checkpointing |
| #K15 | KANLayer.b_splines | Grid buffer could be float32 while activations are bfloat16/float16, causing a dtype mismatch | Grid is cast to x.dtype on every forward pass |
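To make the #K6 and #K7 fixes concrete, here is a minimal PyTorch sketch of a vectorised repetition penalty and of top-p filtering with a shifted cumulative sum. It is not the library's generate() code: both function names are hypothetical, and the penalty uses gather + scatter_ (which tolerates duplicate ids) rather than the tensor.unique() call the changelog mentions.

```python
import torch


def apply_repetition_penalty(logits: torch.Tensor, generated: torch.Tensor, penalty: float) -> torch.Tensor:
    """Penalise every token id already in `generated`, for the whole batch at once.

    logits:    [batch, vocab] next-token logits
    generated: [batch, seq]   previously generated token ids (int64)
    """
    prev = torch.gather(logits, 1, generated)
    # Positive logits shrink, negative logits grow more negative: both become less likely.
    prev = torch.where(prev > 0, prev / penalty, prev * penalty)
    # Duplicate ids write identical values, so scatter_ is safe without de-duplication.
    return logits.clone().scatter_(1, generated, prev)


def top_p_filter(logits: torch.Tensor, top_p: float) -> torch.Tensor:
    """Keep the smallest set of tokens whose cumulative probability reaches top_p."""
    sorted_logits, sorted_idx = torch.sort(logits, dim=-1, descending=True)
    cumprobs = torch.cumsum(torch.softmax(sorted_logits, dim=-1), dim=-1)
    sorted_remove = cumprobs > top_p
    # Shift right by one: the token that first crosses top_p stays in the nucleus,
    # and the most likely token is never removed.
    sorted_remove[..., 1:] = sorted_remove[..., :-1].clone()
    sorted_remove[..., 0] = False
    # Map the decision back from sorted order to vocabulary order.
    remove = sorted_remove.scatter(-1, sorted_idx, sorted_remove)
    return logits.masked_fill(remove, float("-inf"))
```

The filtered logits can then be sampled with torch.multinomial(torch.softmax(logits, dim=-1), 1); removed tokens carry -inf and therefore zero probability.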

Bug Fixes (Training)

| ID | Component | Issue | Fix |
|---|---|---|---|
| #T1 | Trainer | No early stopping; training continued even after convergence | early_stopping_patience added to TrainingArguments |
| #T2 | DataCollatorForCausalLM | Over-length sequences were silently truncated or accepted without warning | New warn_length parameter warns when batch sequences exceed the model's position limit |
| #T3 | Trainer._save_checkpoint | Saved only model weights; optimizer/scheduler state was lost, so training could not truly resume | Optimizer + scheduler state saved in trainer_state.pt |
| #T4 | Trainer.load_checkpoint | Restored only model weights and step count | Now also restores optimizer, scheduler, and early-stopping state |
| #T5 | get_scheduler | Missing "constant_with_warmup" type | Added |
| #T6 | Trainer | bf16=True on CPU silently fell back to float32 with no warning | Warning emitted; autocast errors caught gracefully |
| #T7 | DataCollatorForMaskedLM | Random-replacement tokens were drawn from [0, vocab_size), including [PAD]/[BOS]/[EOS] | Now drawn from [num_special_tokens, vocab_size) |
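A sketch of what the #T7 fix means in practice: BERT-style masking where random replacements come only from [num_special_tokens, vocab_size). The 80/10/10 split, the -100 ignore label, and the assumption that special tokens occupy the lowest ids are conventions borrowed from BERT-style MLM, not confirmed details of DataCollatorForMaskedLM; mask_tokens is a hypothetical helper.

```python
import torch


def mask_tokens(input_ids, mask_token_id, vocab_size, num_special_tokens,
                mlm_probability=0.15, generator=None):
    """BERT-style 80/10/10 corruption; random replacements never hit special-token ids.

    Assumes special tokens occupy ids [0, num_special_tokens).
    """
    labels = input_ids.clone()
    inputs = input_ids.clone()
    # Special tokens are never selected as prediction targets.
    special = inputs < num_special_tokens
    prob = torch.full(inputs.shape, mlm_probability)
    prob.masked_fill_(special, 0.0)
    selected = torch.bernoulli(prob, generator=generator).bool()
    labels[~selected] = -100  # ignored by cross-entropy

    # 80% of selected positions -> [MASK]
    to_mask = torch.bernoulli(torch.full(inputs.shape, 0.8), generator=generator).bool() & selected
    inputs[to_mask] = mask_token_id

    # Half of the remainder (10% overall) -> a random *non-special* token.
    coin = torch.bernoulli(torch.full(inputs.shape, 0.5), generator=generator).bool()
    to_random = coin & selected & ~to_mask
    random_ids = torch.randint(num_special_tokens, vocab_size, inputs.shape, generator=generator)
    inputs[to_random] = random_ids[to_random]
    # The last 10% of selected positions keep their original token.
    return inputs, labels
```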

New Features (v0.0.3)

KOLMOGformerConfig additions:

  • context_dropout — independent dropout on the context vector path (default 0.0).
  • ffn_type — FFN activation: "gelu" (default) | "silu" | "swiglu".
  • max_position_embeddings_dynamic — RoPE cache auto-extends beyond limit instead of erroring (default True).
  • validate() — called in __post_init__; surfaces config errors early with helpful messages.
  • __repr__ — readable summary of key config fields.
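A minimal sketch of the validate-in-__post_init__ pattern, using a hypothetical ConfigSketch dataclass that mirrors only a few of the fields listed above. The divisibility check between hidden_size and num_channels is an illustrative assumption, not a documented KOLMOGformerConfig rule.

```python
from dataclasses import dataclass


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for KOLMOGformerConfig; only a few fields are mirrored."""
    hidden_size: int = 512
    num_channels: int = 8
    context_dropout: float = 0.0
    ffn_type: str = "gelu"
    max_position_embeddings_dynamic: bool = True

    def __post_init__(self) -> None:
        # Validate eagerly so a bad config fails at construction, not mid-training.
        self.validate()

    def validate(self) -> None:
        if self.ffn_type not in {"gelu", "silu", "swiglu"}:
            raise ValueError(f"ffn_type must be 'gelu', 'silu' or 'swiglu', got {self.ffn_type!r}")
        if not 0.0 <= self.context_dropout < 1.0:
            raise ValueError(f"context_dropout must be in [0, 1), got {self.context_dropout}")
        # Assumption (illustrative only): hidden_size is split evenly across channels.
        if self.hidden_size % self.num_channels != 0:
            raise ValueError(
                f"hidden_size ({self.hidden_size}) must be divisible by num_channels ({self.num_channels})"
            )
```

The dataclass decorator also generates a field-by-field __repr__, which is one simple way to get a readable config summary.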

TrainingArguments additions:

  • early_stopping_patience — stop after N evaluations without improvement.
  • gradient_checkpointing — enable memory-efficient training automatically.
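The patience logic behind early_stopping_patience can be sketched in plain Python. EarlyStopping is a hypothetical class, and treating a lower eval loss as better (with an optional min_delta) is an assumption about how improvement is measured.

```python
class EarlyStopping:
    """Patience counter: stop after `patience` evaluations without improvement."""

    def __init__(self, patience: int, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")  # lower eval loss is better
        self.bad_evals = 0

    def step(self, eval_loss: float) -> bool:
        """Record one evaluation; return True when training should stop."""
        if eval_loss < self.best - self.min_delta:
            self.best = eval_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience

    def state_dict(self) -> dict:
        # Saved with the optimizer/scheduler state so a resumed run keeps its patience count.
        return {"best": self.best, "bad_evals": self.bad_evals}

    def load_state_dict(self, state: dict) -> None:
        self.best = state["best"]
        self.bad_evals = state["bad_evals"]
```

With patience=3 and eval losses 2.0, 1.5, 1.6, 1.55, 1.7, the counter reaches 3 on the fifth evaluation, which is where step() first returns True.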

Installation

pip install -e .
# Optional extras
pip install -e ".[hf]"       # HuggingFace tokenizers
pip install -e ".[flash]"    # Flash Attention 2
pip install -e ".[all]"      # Everything

Quick Start

KOLMOG — Attention-Free Causal LM

from kolmgformers import KOLMOGformerConfig, KOLMOGformerForCausalLM
import torch

config = KOLMOGformerConfig(
    vocab_size=32000,
    hidden_size=512,
    num_channels=8,
    num_layers=6,
    causal=True,
    use_nce=True,    # Normalized Context Extraction
    use_wcc=True,    # Weighted Cumulative Context (v0.0.2+)
    use_plkan=True,  # Piecewise Linear KAN — 3-5x faster (v0.0.2+)
)
model = KOLMOGformerForCausalLM(config)
print(config)  # v0.0.3: readable repr

ids = torch.tensor([[1, 42, 100]])
out = model.generate(ids, max_new_tokens=50, temperature=0.8)

KOLMOG — Training with Early Stopping

from kolmgformers import (
    KOLMOGTrainer, KOLMOGTrainingArguments,
    KOLMOGDataCollatorForCausalLM,
)

# `model` comes from the previous example; train_ds and val_ds are your own datasets
args = KOLMOGTrainingArguments(
    output_dir="runs/my_run",
    num_train_epochs=10,
    early_stopping_patience=3,   # v0.0.3: stop after 3 bad evals
    gradient_checkpointing=True, # v0.0.3: save memory on long seqs
    evaluation_strategy="steps",
    eval_steps=500,
)
trainer = KOLMOGTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    data_collator=KOLMOGDataCollatorForCausalLM(pad_token_id=0),
)
trainer.train()

KOLMOG — True Checkpoint Resume

# v0.0.3: optimizer + scheduler state is saved, enabling true resume
trainer.load_checkpoint("runs/my_run/checkpoint-5000")
trainer.train()  # continues from exact state
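A sketch of the two ideas behind #K13 and #T3: write each checkpoint file to a temporary file and os.replace() it into place, and store optimizer and scheduler state next to the weights. The helper names and the model.pt file name are hypothetical; only trainer_state.pt comes from the changelog.

```python
import os
import tempfile

import torch


def atomic_torch_save(obj, path: str) -> None:
    """Write to a temp file in the target directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # Same directory => same filesystem, which os.replace needs for an atomic rename on POSIX.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            torch.save(obj, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach disk before the rename
        os.replace(tmp_path, path)
    except BaseException:
        # An interrupted save leaves any previous checkpoint at `path` untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def save_training_state(model, optimizer, scheduler, step: int, ckpt_dir: str) -> None:
    """Save weights plus the optimizer/scheduler state needed for a true resume."""
    atomic_torch_save(model.state_dict(), os.path.join(ckpt_dir, "model.pt"))
    atomic_torch_save(
        {"optimizer": optimizer.state_dict(), "scheduler": scheduler.state_dict(), "step": step},
        os.path.join(ckpt_dir, "trainer_state.pt"),
    )


def load_training_state(model, optimizer, scheduler, ckpt_dir: str) -> int:
    """Restore everything save_training_state wrote; returns the saved step."""
    model.load_state_dict(torch.load(os.path.join(ckpt_dir, "model.pt")))
    state = torch.load(os.path.join(ckpt_dir, "trainer_state.pt"))
    optimizer.load_state_dict(state["optimizer"])
    scheduler.load_state_dict(state["scheduler"])
    return state["step"]
```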

OMG — Diffusion LM

import torch
from kolmgformers import OMGConfig, OMGModel

config = OMGConfig(vocab_size=32000, hidden_size=768, num_layers=12)
model = OMGModel(config)

prompt = torch.tensor([[1, 42]])
out = model.generate(prompt, new_tokens=128, steps=10)
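This README does not describe how OMGModel.generate schedules its steps. As an illustration only, here is a generic confidence-based masked-diffusion decoding loop in the spirit of MaskGIT/LLaDA: new_tokens [MASK] slots are appended to the prompt, and in each of `steps` rounds the most confident predictions among still-masked slots are committed in parallel. The denoiser callable, mask_id, and the even per-round schedule are assumptions.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def masked_diffusion_generate(denoiser, prompt, new_tokens: int, steps: int, mask_id: int):
    """Fill `new_tokens` [MASK] slots after `prompt` in `steps` parallel refinement rounds.

    denoiser: hypothetical callable, ids [batch, seq] -> logits [batch, seq, vocab]
    """
    start = prompt.size(1)
    canvas = torch.full((prompt.size(0), new_tokens), mask_id, dtype=prompt.dtype, device=prompt.device)
    x = torch.cat([prompt, canvas], dim=1)
    # Even schedule: how many slots to commit per round (sums to new_tokens).
    per_step = [new_tokens // steps + (1 if s < new_tokens % steps else 0) for s in range(steps)]
    for k in per_step:
        if k == 0:
            continue
        gen = x[:, start:]
        masked = gen == mask_id
        probs = F.softmax(denoiser(x)[:, start:], dim=-1)
        conf, pred = probs.max(dim=-1)
        # Already-committed slots get confidence -1 so top-k only picks masked slots.
        conf = conf.masked_fill(~masked, -1.0)
        idx = conf.topk(k, dim=-1).indices
        x[:, start:] = gen.scatter(1, idx, pred.gather(1, idx).to(x.dtype))
    return x


# Toy usage: a stand-in denoiser that always predicts token (position % 5) over a 100-token vocab.
def toy_denoiser(ids):
    batch, seq = ids.shape
    logits = F.one_hot(torch.arange(seq) % 5, num_classes=100).float() * 5.0
    return logits.unsqueeze(0).expand(batch, -1, -1)


out = masked_diffusion_generate(toy_denoiser, torch.tensor([[1, 42]]), new_tokens=6, steps=3, mask_id=99)
```

Committing several tokens per round is what makes decoding parallel: with new_tokens=128 and steps=10, each round fills roughly 13 positions instead of one.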

Architecture: KOLMOG

Based on the Kolmogorov-Arnold representation theorem:

F(X) = Σ_q Φ_q( Σ_i φ_{q,i}( xᵢ ⊕ eᵢ ⊕ c_{q,i} ) )
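In causal mode, the context term c_{q,i} in this formula should depend only on tokens before position i (#K10) and ignore padding (#K11). Below is a minimal sketch of an exclusive, masked, weighted prefix mean that runs in O(n·d). Where the weights come from, the normalisation by the weight sum, and eps are assumptions; this is not the library's CumulativeContextExtractor.

```python
import torch


def exclusive_weighted_context(kappa, weights, attention_mask, eps=1e-6):
    """Causal context c_i built from tokens j < i only (exclusive prefix), ignoring padding.

    kappa:          [batch, seq, dim] per-token features
    weights:        [batch, seq]      non-negative per-token weights (WCC-style selectivity)
    attention_mask: [batch, seq]      1 for real tokens, 0 for padding
    """
    w = weights * attention_mask.to(weights.dtype)  # padding contributes nothing
    kappa_w = kappa * w.unsqueeze(-1)
    # Exclusive prefix sum: inclusive cumsum minus the current term,
    # so token i never sees itself (position 0 gets an empty context).
    num = torch.cumsum(kappa_w, dim=1) - kappa_w
    den = torch.cumsum(w, dim=1) - w
    return num / (den.unsqueeze(-1) + eps)  # weighted mean over the prefix
```

For kappa = [1, 2, 4, 8] with unit weights and the third token masked out, the contexts are [0, 1, 1.5, 1.5]: the padded token never enters any later context.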

Key innovations:

  • KAN layers — learnable B-spline activations per edge (not fixed non-linearities)
  • NCE — Normalized Context Extraction: jackknife leave-one-out mean context
  • WCC — Weighted Cumulative Context: attention-like token selectivity at O(n·d)
  • PLKAN — Piecewise Linear KAN: 3–5× faster than B-spline, same expressivity
  • No attention — O(n·d) time, O(d) memory (independent of sequence length)
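A minimal sketch of a piecewise-linear KAN layer in the spirit of PLKAN: each edge i → o carries learnable values at uniform knots, and y_o = Σ_i φ_{o,i}(x_i) is evaluated by linear interpolation. The uniform knot range, input clamping, and the class name PiecewiseLinearKAN are assumptions; the contiguous linspace(...).repeat() grid and the cast to x.dtype echo #K8 and #K15.

```python
import torch
from torch import nn


class PiecewiseLinearKAN(nn.Module):
    """y_o = sum_i phi_{o,i}(x_i), each phi a learnable piecewise-linear function."""

    def __init__(self, in_features: int, out_features: int, num_knots: int = 8,
                 lo: float = -2.0, hi: float = 2.0):
        super().__init__()
        # linspace(...).repeat(...) allocates a fresh, contiguous [in, K] grid.
        self.register_buffer("grid", torch.linspace(lo, hi, num_knots).repeat(in_features, 1))
        # Learnable function value at every knot of every edge: [out, in, K].
        self.values = nn.Parameter(0.1 * torch.randn(out_features, in_features, num_knots))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [batch, in]
        grid = self.grid.to(x.dtype)                      # follow the activation dtype
        values = self.values.to(x.dtype)
        num_knots = grid.size(1)
        lo, hi = grid[:, 0], grid[:, -1]                  # [in]
        spacing = (hi - lo) / (num_knots - 1)             # uniform knot spacing, [in]
        x = torch.minimum(torch.maximum(x, lo), hi)       # clamp into the knot range
        pos = (x - lo) / spacing                          # fractional knot index
        # Left knot of the active segment; clamping to K-2 handles x == hi (rightmost knot).
        seg = pos.floor().long().clamp(0, num_knots - 2)  # [batch, in]
        t = pos - seg.to(x.dtype)                         # position inside the segment, in [0, 1]
        cols = torch.arange(x.size(1), device=x.device)   # [in], broadcast against seg
        left = values[:, cols, seg]                       # [out, batch, in]
        right = values[:, cols, seg + 1]
        phi = left + t * (right - left)                   # linear interpolation per edge
        return phi.sum(dim=-1).transpose(0, 1)            # sum over inputs -> [batch, out]
```

Setting every edge's knot values equal to the grid makes each φ the identity on [lo, hi], which is a handy sanity check: the layer then returns the sum of the clamped inputs.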

Architecture: OMG

Parallel Diffusion Language Model with:

  • GQA / MLA / Sliding-Window / Linear / Block-Sparse attention
  • MoE (dense + soft MoE)
  • DS-PDLM dual-stream (understanding + generation)
  • LoRA / DoRA PEFT
  • TASA + MFS + DI efficiency trilogy
  • TWE temporal embeddings, NCA neuro-creative routing

Bug Fix History

| Version | Fixes |
|---|---|
| v0.0.1 | #K1–#K4 (config, imports, KANLayer rightmost knot) |
| v0.0.2 | WCC + PLKANLayer added |
| v0.0.3 | #K5–#K15 (context mask, generate, PLKAN, phi layers, causal prefix, gradient checkpointing) + #T1–#T7 (early stopping, optimizer save/load, bf16 CPU, special token masking) |

License

Apache-2.0



Download files


Source Distribution

kolmgformers-0.0.3.tar.gz (169.3 kB)

Built Distribution


kolmgformers-0.0.3-py3-none-any.whl (183.6 kB)

File details

Details for the file kolmgformers-0.0.3.tar.gz.

File metadata

  • Download URL: kolmgformers-0.0.3.tar.gz
  • Size: 169.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for kolmgformers-0.0.3.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | ea6023aca6078f777baa572363e4a88eed64748765f57685b539db2da95bdc33 |
| MD5 | 8c4bde66c17ad6f60c695af2f4616b42 |
| BLAKE2b-256 | 5583ec0a30efaa69ad3a08987a6aad8aca52e9c46ecdc567b4eded55b7664f3b |


File details

Details for the file kolmgformers-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: kolmgformers-0.0.3-py3-none-any.whl
  • Size: 183.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for kolmgformers-0.0.3-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | fe8b1acda5eb69de90c60a9347a4daf95e6aa1c30ad7ab98da253cc59e2bf2f8 |
| MD5 | 1976a60eb84872c3c50755a3157e43d7 |
| BLAKE2b-256 | 7ee3c1f795d6768127ac0fae5035c3cb90dba3bcc2dacf14c4ea440354652348 |

