GEKO: Gradient-Efficient Knowledge Optimization

Smart training for any LLM

A plug-and-play training framework that makes LLM training more efficient.

Just as LoRA made fine-tuning parameter-efficient, GEKO aims to make training compute-efficient.


Key Insight


Traditional training treats all samples equally:

$$\mathcal{L}_{standard} = \frac{1}{N} \sum_{i=1}^{N} \ell(x_i, y_i)$$

GEKO weights samples by their learning value:

$$\mathcal{L}_{GEKO} = \frac{1}{N} \sum_{i=1}^{N} w_i \cdot \ell(x_i, y_i) \quad \text{where} \quad w_i = f(\text{bucket}_i)$$
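
The weighted loss above is straightforward to compute. A minimal sketch, assuming per-sample losses and bucket weights are already available as plain numbers (`geko_loss` is an illustrative name, not part of the GEKO API):

```python
def geko_loss(losses, weights):
    """Weighted mean of per-sample losses: (1/N) * sum(w_i * l_i).

    Samples with weight 0 (e.g. the FREEZE bucket) contribute no gradient.
    """
    n = len(losses)
    return sum(w * l for w, l in zip(weights, losses)) / n
```

With weights of 0 for mastered samples, their losses drop out of the objective entirely.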


Installation

```bash
pip install gekolib
```

Quick Start

```python
from geko import GEKOTrainer, GEKOConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

trainer = GEKOTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=your_dataset,  # replace with your tokenized dataset
)

trainer.train()
print(trainer.get_efficiency_report())
```

The GEKO Algorithm

Sample Partitioning

Bucket Classification

```mermaid
flowchart TD
    A[Sample] --> B{Correct?}
    B -->|Yes| C{Confident & High Quality?}
    B -->|No| D{High Confidence?}
    C -->|Yes| E[🔵 FREEZE<br/>w = 0<br/>Never train]
    C -->|No| F[🟢 LIGHT<br/>w = 0<br/>Low priority]
    D -->|Yes| G[🔴 HARD<br/>w = 3<br/>Highest priority]
    D -->|No| H[🟠 FOCUS<br/>w = 1<br/>Medium priority]

    style E fill:#3498db,color:#fff
    style F fill:#2ecc71,color:#fff
    style G fill:#e74c3c,color:#fff
    style H fill:#f39c12,color:#fff
```

Bucket Definitions

In the conditions below, $c$ is the model's confidence on the sample and $q$ is the sample's quality score.

| Bucket | Condition | Weight | Description |
| --- | --- | --- | --- |
| 🔵 FREEZE | $correct \land c > 0.85 \land q > 0.80$ | $w = 0$ | Mastered; skipped entirely |
| 🟢 LIGHT | $correct \land (c \leq 0.85 \lor q \leq 0.80)$ | $w = 0$ | Correct but uncertain |
| 🟠 FOCUS | $\neg correct \land c \leq 0.60$ | $w = 1$ | Wrong, low confidence |
| 🔴 HARD | $\neg correct \land c > 0.60$ | $w = 3$ | Confidently wrong |
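
The bucket rules above translate directly into code. A sketch of the classification logic (`classify_bucket` is an illustrative name, not necessarily GEKO's internal function):

```python
def classify_bucket(correct, confidence, quality):
    """Map a sample to a GEKO bucket and base weight per the table above."""
    if correct:
        if confidence > 0.85 and quality > 0.80:
            return "FREEZE", 0   # mastered: skip training
        return "LIGHT", 0        # correct but uncertain
    if confidence > 0.60:
        return "HARD", 3         # confidently wrong: highest priority
    return "FOCUS", 1            # wrong, low confidence
```

Note that the four conditions partition all (correct, confidence, quality) combinations, so every sample lands in exactly one bucket.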

Mountain Curriculum


```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#3498db'}}}%%
xychart-beta
    title "Mountain Curriculum - Difficulty vs Progress"
    x-axis "Training Progress" [0, 0.15, 0.35, 0.65, 0.85, 1.0]
    y-axis "Difficulty" 0 --> 1
    line [0.2, 0.5, 1.0, 1.0, 0.5, 0.2]
```
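
The curve can be reproduced by interpolating between its control points. A sketch assuming piecewise-linear interpolation, which matches the chart but is not necessarily what GEKO uses internally:

```python
# Control points read off the chart: (training progress, target difficulty).
POINTS = [(0.0, 0.2), (0.15, 0.5), (0.35, 1.0),
          (0.65, 1.0), (0.85, 0.5), (1.0, 0.2)]

def mountain_difficulty(progress):
    """Target difficulty at a given training progress in [0, 1]."""
    for (x0, y0), (x1, y1) in zip(POINTS, POINTS[1:]):
        if x0 <= progress <= x1:
            t = (progress - x0) / (x1 - x0)  # position within the segment
            return y0 + t * (y1 - y0)
    raise ValueError("progress must lie in [0, 1]")
```

Difficulty rises from 0.2 to a plateau of 1.0 at the peak (35-65% of training), then descends symmetrically.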

Five Phases

```mermaid
gantt
    title Mountain Curriculum Phases
    dateFormat X
    axisFormat %s

    section Difficulty
    WARMUP (Easy)       :a1, 0, 15
    ASCENT (Medium)     :a2, 15, 35
    PEAK (Hard)         :a3, 35, 65
    DESCENT (Medium)    :a4, 65, 85
    CONSOLIDATE (Easy)  :a5, 85, 100
```

Bucket weights vary by phase:

| Phase | Progress | HARD | FOCUS | LIGHT | Strategy |
| --- | --- | --- | --- | --- | --- |
| WARMUP | 0-15% | 1 | 2 | 3 | Build foundation |
| ASCENT | 15-35% | 2 | 3 | 1 | Increase difficulty |
| PEAK | 35-65% | 5 | 2 | 0 | Maximum learning |
| DESCENT | 65-85% | 2 | 3 | 1 | Reduce difficulty |
| CONSOLIDATE | 85-100% | 1 | 2 | 3 | Reinforce |
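
The phase schedule above amounts to a lookup on training progress. A sketch with boundaries and weights transcribed from the table; names like `phase_weights` are illustrative, not part of the GEKO API:

```python
# (upper progress bound, phase name, per-bucket weights)
PHASES = [
    (0.15, "WARMUP",      {"HARD": 1, "FOCUS": 2, "LIGHT": 3}),
    (0.35, "ASCENT",      {"HARD": 2, "FOCUS": 3, "LIGHT": 1}),
    (0.65, "PEAK",        {"HARD": 5, "FOCUS": 2, "LIGHT": 0}),
    (0.85, "DESCENT",     {"HARD": 2, "FOCUS": 3, "LIGHT": 1}),
    (1.00, "CONSOLIDATE", {"HARD": 1, "FOCUS": 2, "LIGHT": 3}),
]

def phase_weights(progress):
    """Return (phase_name, bucket_weights) for progress in [0, 1]."""
    for upper, name, weights in PHASES:
        if progress < upper:
            return name, weights
    return PHASES[-1][1], PHASES[-1][2]  # progress == 1.0
```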

Q-Value Learning


Each sample maintains a Q-value representing "learnability":

$$Q_{t+1}(s) = (1 - \alpha) \cdot Q_t(s) + \alpha \cdot \left(1 - \frac{\ell_t(s)}{\ell_{max}}\right)$$
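
The update is an exponential moving average toward a normalized "how well was this sample learned" signal. A direct transcription of the formula (`update_q` is an illustrative name):

```python
def update_q(q, loss, loss_max, alpha=0.1):
    """One Q-value step: Q_{t+1} = (1 - a)*Q_t + a*(1 - loss/loss_max).

    A low loss pushes Q toward 1 (learned), a high loss toward 0.
    """
    return (1 - alpha) * q + alpha * (1 - loss / loss_max)
```

Samples whose Q-value crosses a threshold are promoted to FREEZE, as the diagram below shows.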

```mermaid
graph LR
    A[Sample Loss ↓] --> B[Q-Value ↑]
    B --> C{Q > threshold?}
    C -->|Yes| D[Move to FREEZE]
    C -->|No| E[Stay trainable]

    style D fill:#3498db,color:#fff
    style E fill:#f39c12,color:#fff
```

Efficiency Analysis


Compute Savings Over Time

```mermaid
%%{init: {'theme': 'base'}}%%
pie showData
    title "Bucket Distribution (Epoch 10)"
    "FREEZE (Saved)" : 80
    "LIGHT" : 15
    "FOCUS" : 4
    "HARD" : 1
```

Training Progression

| Epoch | FREEZE | LIGHT | FOCUS | HARD | Compute Saved |
| --- | --- | --- | --- | --- | --- |
| 1 | 0% | 20% | 60% | 20% | 0% |
| 2 | 15% | 25% | 45% | 15% | 15% |
| 3 | 35% | 30% | 25% | 10% | 35% |
| 5 | 55% | 25% | 15% | 5% | 55% |
| 10 | 80% | 15% | 4% | 1% | 80% |

Architecture


```mermaid
flowchart TB
    subgraph Input
        A[Any LLM Model]
        B[Training Dataset]
    end

    subgraph GEKO["GEKO Framework"]
        C[GEKOTrainer]
        D[Sample Partitioner]
        E[Mountain Curriculum]
        F[Sample States]

        C --> D
        C --> E
        D --> F
        E --> F
    end

    subgraph Output
        G[Efficient Training]
        H[Compute Savings]
    end

    A --> C
    B --> C
    C --> G
    C --> H

    style GEKO fill:#f5f5f5,stroke:#333
```

Theoretical Guarantees

Convergence

Under standard stochastic-approximation assumptions, GEKO converges, since every sample outside the FREEZE bucket accumulates unbounded total weight over training:

$$\sum_{t=1}^{\infty} w_t^{(s)} = \infty \quad \forall s \notin \text{FREEZE}$$

Efficiency Bound

$$T_{GEKO} \leq T_{standard} \cdot (1 - \mathbb{E}[F])$$

where $\mathbb{E}[F]$ is the expected fraction of samples in the FREEZE bucket.
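
As a worked instance of the bound: with 80% of samples frozen, at most 20% of the standard training cost remains (`geko_time_bound` is an illustrative helper, not GEKO API):

```python
def geko_time_bound(t_standard, freeze_fraction):
    """Upper bound on GEKO training time: T_standard * (1 - E[F])."""
    return t_standard * (1 - freeze_fraction)
```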


Results


| Metric | Standard | GEKO | Improvement |
| --- | --- | --- | --- |
| Training Time | 100% | 50-70% | 30-50% faster |
| Compute Cost | 100% | 50-70% | 30-50% cheaper |
| Final Loss | $\ell^*$ | $\leq \ell^*$ | Equal or better |

Citation

```bibtex
@software{geko2026,
  author = {Syed Abdur Rehman},
  title = {GEKO: Gradient-Efficient Knowledge Optimization},
  year = {2026},
  url = {https://github.com/ra2157218-boop/GEKO}
}
```

License

Apache 2.0


GEKO - Train smarter, not harder.
