KOLMG-LoRA: Kolmogorov-Arnold Low-Rank Adaptation — a non-linear LoRA variant for kolmgformers and any PyTorch model

kolmg-lora

KOLMG-LoRA (Kolmogorov-Arnold Low-Rank Adaptation) is a LoRA variant that replaces the standard linear bottleneck (B×A) with a two-layer PL-KAN (Piecewise-Linear Kolmogorov-Arnold Network). The adapter path becomes non-linear while its parameter cost stays similar to standard LoRA.

Designed as the native fine-tuning method for the kolmgformers package, it also works with any PyTorch nn.Module, including Hugging Face Transformers models.


Install

pip install kolmg-lora                         # PyTorch only
pip install "kolmg-lora[kolmgformers]"         # + kolmgformers integration
pip install "kolmg-lora[safetensors]"          # + fast weight I/O
pip install "kolmg-lora[all]"                  # everything (quotes keep shells such as zsh from expanding the brackets)

What makes KOLMG-LoRA different?

Variant      Adapter path        Non-linear   Scaling
LoRA         B×A (linear)        No           α/r
rsLoRA       B×A (linear)        No           α/√r
LoRA+        B×A (split LR)      No           α/r
DoRA         B×A + magnitude     No           α/r
QLoRA        B×A on 4-bit base   No           α/r
KOLMG-LoRA   KAN bottleneck      Yes ✓        α/r (or α/√r)
KOLMG-DoRA   KAN + magnitude     Yes ✓        α/r (or α/√r)

Standard LoRA learns: ΔW·x = B × A × x — always linear.

KOLMG-LoRA learns: ΔW·x = φ_out(φ_in(x)) where each φ is a mini KAN:

φ(x) = SiLU(x)·W_base  +  Σ_k  c_k · B_k(x)

B_k are B-spline basis functions on a uniform grid (piecewise-linear hat functions at the default spline_order=1). Each rank dimension gets its own learned activation shape, which no linear bottleneck can express at any rank.
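As a rough illustration of the formula above, the following pure-Python sketch evaluates a scalar φ with order-1 (piecewise-linear) B-splines on a uniform grid. The helper names (hat_basis, phi), the knot placement, and the coefficient values are assumptions made for this example; they are not taken from the kolmg-lora source, which may organise the computation differently.

```python
import math

def silu(x: float) -> float:
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def hat_basis(x: float, grid_size: int = 4, lo: float = -1.0, hi: float = 1.0) -> list:
    """Order-1 B-spline (hat) basis values at x on a uniform grid.

    The grid has `grid_size` intervals, hence grid_size + 1 knots, with one
    hat function centred on each knot (illustrative layout, an assumption).
    """
    h = (hi - lo) / grid_size
    knots = [lo + k * h for k in range(grid_size + 1)]
    return [max(0.0, 1.0 - abs(x - t) / h) for t in knots]

def phi(x: float, w_base: float, coeffs: list, grid_size: int = 4,
        lo: float = -1.0, hi: float = 1.0) -> float:
    """phi(x) = SiLU(x) * w_base + sum_k c_k * B_k(x) for a single scalar input."""
    basis = hat_basis(x, grid_size, lo, hi)
    return silu(x) * w_base + sum(c * b for c, b in zip(coeffs, basis))

# Hypothetical learned coefficients, one per knot.
coeffs = [0.0, 0.5, -0.25, 0.5, 0.0]

basis_at_quarter = hat_basis(0.25)                   # only the two neighbouring hats are non-zero
value_at_knot = phi(0.5, w_base=0.0, coeffs=coeffs)  # at a knot the spline returns that knot's coefficient
```

Inside the grid range the hat functions sum to 1, so with w_base = 0 the spline part linearly interpolates the coefficients between knots. Changing the coefficients reshapes the activation, which is the "learned activation shape" described above.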


Quick start

import torch

from kolmg_lora import KOLMGLoRAConfig, add_kolmg_lora, merge_lora, save_lora, load_lora

# 1. Configure
cfg = KOLMGLoRAConfig(
    rank       = 16,
    alpha      = 32.0,
    grid_size  = 4,      # KAN expressiveness knob (3–8 recommended)
    dropout    = 0.05,
)

# 2. Apply to any model
model = add_kolmg_lora(model, cfg)
# [kolmg-lora KOLMG-LoRA] 4 layers wrapped | rank=16 | grid=4 | order=1 | trainable: ...

# 3. Train normally — only KAN parameters update
optimizer = torch.optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4
)

# 4. Save adapter (~few MB)
save_lora(model, "./my_adapter")

# 5. Merge for deployment (zero inference overhead)
model = merge_lora(model)
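
To check step 3 on your own model, it can help to count trainable parameters after wrapping. The sketch below uses a toy stand-in, not the kolmg-lora wrapper: the class ToyAdapterLinear and its down/up layers are hypothetical and only illustrate the usual pattern of a frozen base layer plus a small trainable adapter.

```python
import torch
from torch import nn

class ToyAdapterLinear(nn.Module):
    """Hypothetical wrapper: frozen base layer plus a small trainable non-linear adapter."""

    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)      # base weights stay frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)   # adapter starts as a no-op

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.up(nn.functional.silu(self.down(x)))

layer = ToyAdapterLinear(nn.Linear(32, 32), rank=4)

# Same filter as in the optimizer call above: only parameters with requires_grad=True.
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable} / {total}")
```

The same two sums can be run on a model returned by add_kolmg_lora to confirm that the trainable share is small before starting a training run.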

With kolmgformers

from kolmgformers import KOLMOGformerForCausalLM, KOLMOGformerConfig
from kolmg_lora import KOLMGLoRAConfig, add_kolmg_lora

model = KOLMOGformerForCausalLM(KOLMOGformerConfig(
    vocab_size=32000, hidden_size=512, num_channels=8, num_layers=6
))

cfg   = KOLMGLoRAConfig(rank=16, alpha=32.0, train_ffn=True)
model = add_kolmg_lora(model, cfg)

With HuggingFace transformers

from transformers import AutoModelForCausalLM
from kolmg_lora import add_kolmg_lora

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
model = add_kolmg_lora(model, rank=16, alpha=32.0)

Combining with other techniques

Each technique below is a separate KOLMGLoRAConfig option, and the options can be combined:

# KOLMG-LoRA + rsLoRA (rank-stabilised scaling α/√r)
cfg = KOLMGLoRAConfig(rank=16, rs_lora=True)

# KOLMG-LoRA + DoRA (magnitude decomposition)
cfg = KOLMGLoRAConfig(rank=16, use_dora=True)

# KOLMG-LoRA + LoRA+ (higher LR for φ_out)
cfg = KOLMGLoRAConfig(rank=16, lora_plus_ratio=16.0)

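# LoRA+ with per-layer param groups
import torch
from kolmg_lora import KOLMGLoRALinear  # assumed to be importable from the package top level

groups = []
for m in model.modules():
    if isinstance(m, KOLMGLoRALinear):
        # each wrapped layer returns its own optimizer param groups
        groups += m.get_lora_plus_param_groups(base_lr=1e-4)
optimizer = torch.optim.AdamW(groups)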
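
For readers unfamiliar with per-parameter learning rates, here is a minimal sketch of what LoRA+ style param groups look like in plain PyTorch. It does not reproduce get_lora_plus_param_groups: the ToyAdapter class, the phi_in/phi_out attribute names, and the lora_plus_groups helper are assumptions chosen to mirror the description above, in which φ_out gets a higher learning rate.

```python
import torch
from torch import nn

class ToyAdapter(nn.Module):
    """Hypothetical adapter with an input stage and an output stage."""

    def __init__(self, dim: int = 8, rank: int = 2):
        super().__init__()
        self.phi_in = nn.Linear(dim, rank, bias=False)
        self.phi_out = nn.Linear(rank, dim, bias=False)

def lora_plus_groups(adapter: ToyAdapter, base_lr: float, ratio: float) -> list:
    """Build optimizer param groups: phi_out trains at base_lr * ratio."""
    return [
        {"params": list(adapter.phi_in.parameters()), "lr": base_lr},
        {"params": list(adapter.phi_out.parameters()), "lr": base_lr * ratio},
    ]

groups = lora_plus_groups(ToyAdapter(), base_lr=1e-4, ratio=16.0)
optimizer = torch.optim.AdamW(groups)

# Each group keeps its own learning rate inside the optimizer.
lrs = [g["lr"] for g in optimizer.param_groups]
```

torch.optim optimizers accept a list of dicts like this, so groups collected from several layers can simply be concatenated, as in the loop above.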

Config reference

KOLMGLoRAConfig(
    rank             = 16,       # KAN bottleneck width
    alpha            = 32.0,     # scaling = alpha / rank  (or / √rank with rs_lora)
    dropout          = 0.05,     # dropout on adapter input
    target_modules   = None,     # None → ["q_proj","k_proj","v_proj","out"]
    train_ffn        = False,    # also wrap gate/up/down FFN projections
    rs_lora          = False,    # rank-stabilised scaling
    use_dora         = False,    # DoRA magnitude decomposition
    lora_plus_ratio  = 1.0,      # LoRA+ LR multiplier for φ_out
    grid_size        = 4,        # KAN grid intervals (3=fast, 8=expressive)
    spline_order     = 1,        # 1=piecewise-linear (fast), 3=cubic (smooth)
    grid_range       = (-1., 1.),# KAN input domain
    kan_scale_noise  = 0.1,      # spline weight init noise
    kan_scale_base   = 1.0,      # SiLU base path init scale
)
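
The two scaling rules mentioned for alpha differ only in the denominator. A small worked example follows; the adapter_scale helper is written for illustration and is not part of the package.

```python
import math

def adapter_scale(alpha: float, rank: int, rs_lora: bool = False) -> float:
    """Scaling applied to the adapter output: alpha / rank, or alpha / sqrt(rank) with rs_lora."""
    return alpha / math.sqrt(rank) if rs_lora else alpha / rank

standard_16 = adapter_scale(32.0, 16)                  # 32 / 16  = 2.0
stabilised_16 = adapter_scale(32.0, 16, rs_lora=True)  # 32 / 4   = 8.0
standard_64 = adapter_scale(32.0, 64)                  # 32 / 64  = 0.5
stabilised_64 = adapter_scale(32.0, 64, rs_lora=True)  # 32 / 8   = 4.0
```

At a fixed alpha, quadrupling the rank cuts the standard scale by 4 but the rank-stabilised scale by only 2, which is why alpha usually needs re-tuning when switching rs_lora on or off.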

License

Apache-2.0
