Gradient-Efficient Knowledge Optimization - Smart training for any LLM
GEKO: Gradient-Efficient Knowledge Optimization
A plug-and-play training framework that makes LLM training more efficient.
Just as LoRA revolutionized fine-tuning, GEKO aims to revolutionize training.
Key Insight
Traditional training treats all samples equally:
$$\mathcal{L}_{standard} = \frac{1}{N} \sum_{i=1}^{N} \ell(x_i, y_i)$$
GEKO weights samples by their learning value:
$$\mathcal{L}_{GEKO} = \frac{1}{N} \sum_{i=1}^{N} w_i \cdot \ell(x_i, y_i) \quad \text{where} \quad w_i = f(\text{bucket}_i)$$
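In plain Python, the weighted objective looks like this. The bucket names and weights follow the bucket table later in this README; `weighted_loss` itself is an illustrative sketch, not GEKO's actual API:

```python
def weighted_loss(losses, buckets):
    """Mean per-sample loss scaled by w_i = f(bucket_i).

    Illustrative sketch: weights follow the GEKO bucket table
    (FREEZE/LIGHT contribute nothing; HARD counts triple).
    """
    bucket_weight = {"FREEZE": 0.0, "LIGHT": 0.0, "FOCUS": 1.0, "HARD": 3.0}
    return sum(bucket_weight[b] * l for b, l in zip(buckets, losses)) / len(losses)

# Frozen samples drop out of the average; hard ones dominate it.
print(weighted_loss([0.1, 0.5, 2.0, 3.0], ["FREEZE", "LIGHT", "FOCUS", "HARD"]))  # 2.75
```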
Installation
```shell
pip install gekolib
```
Quick Start
```python
from geko import GEKOTrainer, GEKOConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

trainer = GEKOTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=your_dataset,
)
trainer.train()
print(trainer.get_efficiency_report())
```
The GEKO Algorithm
Sample Partitioning
```mermaid
flowchart TD
    A[Sample] --> B{Correct?}
    B -->|Yes| C{Confident & High Quality?}
    B -->|No| D{High Confidence?}
    C -->|Yes| E[🔵 FREEZE<br/>w = 0<br/>Never train]
    C -->|No| F[🟢 LIGHT<br/>w = 0<br/>Low priority]
    D -->|Yes| G[🔴 HARD<br/>w = 3<br/>Highest priority]
    D -->|No| H[🟠 FOCUS<br/>w = 1<br/>Medium priority]
    style E fill:#3498db,color:#fff
    style F fill:#2ecc71,color:#fff
    style G fill:#e74c3c,color:#fff
    style H fill:#f39c12,color:#fff
```
Bucket Definitions
| Bucket | Condition | Weight | Description |
|---|---|---|---|
| 🔵 FREEZE | $correct \land c > 0.85 \land q > 0.80$ | $w = 0$ | Mastered; excluded from training |
| 🟢 LIGHT | $correct \land (c \leq 0.85 \lor q \leq 0.80)$ | $w = 0$ | Correct but uncertain |
| 🟠 FOCUS | $\neg correct \land c \leq 0.60$ | $w = 1$ | Wrong, low confidence |
| 🔴 HARD | $\neg correct \land c > 0.60$ | $w = 3$ | Confidently wrong |
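The bucket rules above can be expressed as a small pure function. This is a sketch using the thresholds from the table; the function name and signature are illustrative, not GEKO's actual API:

```python
def assign_bucket(correct, confidence, quality):
    """Map a sample's stats to (bucket, weight) using the table's thresholds.

    `confidence` is the model confidence c and `quality` the sample-quality
    score q, both in [0, 1]. Illustrative sketch, not GEKO's real partitioner.
    """
    if correct:
        if confidence > 0.85 and quality > 0.80:
            return "FREEZE", 0.0  # mastered: skip entirely
        return "LIGHT", 0.0       # correct but uncertain
    if confidence > 0.60:
        return "HARD", 3.0        # confidently wrong: highest priority
    return "FOCUS", 1.0           # wrong with low confidence
```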
Mountain Curriculum
```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#3498db'}}}%%
xychart-beta
    title "Mountain Curriculum - Difficulty vs Progress"
    x-axis "Training Progress" [0, 0.15, 0.35, 0.65, 0.85, 1.0]
    y-axis "Difficulty" 0 --> 1
    line [0.2, 0.5, 1.0, 1.0, 0.5, 0.2]
```
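The curve is piecewise linear between the six anchor points plotted above, so the target difficulty at any progress value can be recovered by interpolation. This is a sketch of that schedule; GEKO's internal implementation may differ:

```python
def curriculum_difficulty(progress):
    """Piecewise-linear difficulty for progress in [0, 1], matching the chart."""
    xs = [0.0, 0.15, 0.35, 0.65, 0.85, 1.0]   # training progress anchors
    ys = [0.2, 0.5, 1.0, 1.0, 0.5, 0.2]       # difficulty at each anchor
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if progress <= x1:
            return y0 + (y1 - y0) * (progress - x0) / (x1 - x0)
    return ys[-1]

# Difficulty peaks in the middle of training and tapers at both ends.
print(curriculum_difficulty(0.5))  # 1.0
```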
Five Phases
```mermaid
gantt
    title Mountain Curriculum Phases
    dateFormat X
    axisFormat %s
    section Difficulty
    WARMUP (Easy)       :a1, 0, 15
    ASCENT (Medium)     :a2, 15, 35
    PEAK (Hard)         :a3, 35, 65
    DESCENT (Medium)    :a4, 65, 85
    CONSOLIDATE (Easy)  :a5, 85, 100
```
| Phase | Progress | HARD | FOCUS | LIGHT | Strategy |
|---|---|---|---|---|---|
| WARMUP | 0-15% | 1 | 2 | 3 | Build foundation |
| ASCENT | 15-35% | 2 | 3 | 1 | Increase difficulty |
| PEAK | 35-65% | 5 | 2 | 0 | Maximum learning |
| DESCENT | 65-85% | 2 | 3 | 1 | Reduce difficulty |
| CONSOLIDATE | 85-100% | 1 | 2 | 3 | Reinforce |
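A minimal lookup for the phase table, treating the HARD/FOCUS/LIGHT columns as per-bucket sampling weights. The data comes from the table above; the structure and names are illustrative:

```python
PHASES = [
    # (progress upper bound, phase name, per-bucket weight)
    (0.15, "WARMUP",      {"HARD": 1, "FOCUS": 2, "LIGHT": 3}),
    (0.35, "ASCENT",      {"HARD": 2, "FOCUS": 3, "LIGHT": 1}),
    (0.65, "PEAK",        {"HARD": 5, "FOCUS": 2, "LIGHT": 0}),
    (0.85, "DESCENT",     {"HARD": 2, "FOCUS": 3, "LIGHT": 1}),
    (1.00, "CONSOLIDATE", {"HARD": 1, "FOCUS": 2, "LIGHT": 3}),
]

def phase_weights(progress):
    """Return (phase name, bucket weights) for training progress in [0, 1]."""
    for upper, name, weights in PHASES:
        if progress <= upper:
            return name, weights
    return PHASES[-1][1], PHASES[-1][2]
```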
Q-Value Learning
Each sample maintains a Q-value representing "learnability":
$$Q_{t+1}(s) = (1 - \alpha) \cdot Q_t(s) + \alpha \cdot \left(1 - \frac{\ell_t(s)}{\ell_{max}}\right)$$
```mermaid
graph LR
    A[Sample Loss ↓] --> B[Q-Value ↑]
    B --> C{Q > threshold?}
    C -->|Yes| D[Move to FREEZE]
    C -->|No| E[Stay trainable]
    style D fill:#3498db,color:#fff
    style E fill:#f39c12,color:#fff
```
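The exponential-moving-average update above is one line of code. Note that `alpha=0.1` is an assumed default for illustration; the source does not specify a value:

```python
def update_q(q, loss, loss_max, alpha=0.1):
    """Q_{t+1} = (1 - alpha) * Q_t + alpha * (1 - loss / loss_max).

    Low recent loss pushes Q toward 1 (near-mastered); high loss pulls
    it toward 0 (still learnable). alpha=0.1 is an assumed default.
    """
    return (1 - alpha) * q + alpha * (1 - loss / loss_max)
```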
Efficiency Analysis
Compute Savings Over Time
```mermaid
%%{init: {'theme': 'base'}}%%
pie showData
    title "Bucket Distribution (Epoch 10)"
    "FREEZE (Saved)" : 80
    "LIGHT" : 15
    "FOCUS" : 4
    "HARD" : 1
```
Training Progression
| Epoch | FREEZE | LIGHT | FOCUS | HARD | Compute Saved |
|---|---|---|---|---|---|
| 1 | 0% | 20% | 60% | 20% | 0% |
| 2 | 15% | 25% | 45% | 15% | 15% |
| 3 | 35% | 30% | 25% | 10% | 35% |
| 5 | 55% | 25% | 15% | 5% | 55% |
| 10 | 80% | 15% | 4% | 1% | 80% |
Architecture
```mermaid
flowchart TB
    subgraph Input
        A[Any LLM Model]
        B[Training Dataset]
    end
    subgraph GEKO["GEKO Framework"]
        C[GEKOTrainer]
        D[Sample Partitioner]
        E[Mountain Curriculum]
        F[Sample States]
        C --> D
        C --> E
        D --> F
        E --> F
    end
    subgraph Output
        G[Efficient Training]
        H[Compute Savings]
    end
    A --> C
    B --> C
    C --> G
    C --> H
    style GEKO fill:#f5f5f5,stroke:#333
```
Theoretical Guarantees
Convergence
Under standard stochastic-approximation assumptions, GEKO converges: every sample outside the FREEZE bucket receives unbounded cumulative weight, so no trainable sample is starved of updates:
$$\sum_{t=1}^{\infty} w_t^{(s)} = \infty \quad \forall s \notin \text{FREEZE}$$
Efficiency Bound
$$T_{GEKO} \leq T_{standard} \cdot (1 - \mathbb{E}[F])$$
where $\mathbb{E}[F]$ is the expected fraction of samples in the FREEZE bucket.
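As a worked example of the bound (the numbers are hypothetical, not measured results):

```python
def geko_time_bound(t_standard, expected_freeze_frac):
    """Upper bound on GEKO training time: T_standard * (1 - E[F])."""
    return t_standard * (1 - expected_freeze_frac)

# If 55% of samples are expected to freeze, a 100-GPU-hour baseline run
# is bounded by roughly 45 GPU-hours under GEKO.
print(geko_time_bound(100.0, 0.55))
```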
Results
| Metric | Standard | GEKO | Improvement |
|---|---|---|---|
| Training Time | 100% | 50-70% | 30-50% faster |
| Compute Cost | 100% | 50-70% | 30-50% cheaper |
| Final Loss | $\ell^*$ | $\leq \ell^*$ | Equal or better |
Citation
```bibtex
@software{geko2026,
  author = {Syed Abdur Rehman},
  title  = {GEKO: Gradient-Efficient Knowledge Optimization},
  year   = {2026},
  url    = {https://github.com/ra2157218-boop/GEKO}
}
```
License
Apache 2.0
GEKO - Train smarter, not harder.
File details
Details for the file gekolib-0.1.0.tar.gz.
File metadata
- Download URL: gekolib-0.1.0.tar.gz
- Upload date:
- Size: 21.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `2d6dd63c8a1ec7a45b194cb227f376e9837529f36a5e9c38306bc814d57a12af` |
| MD5 | `3e927029ff20dcf3af4bb412c82a9781` |
| BLAKE2b-256 | `42df252b9e4689339f6562ad1f36c691cbaf1ab35adcf3e22d0cc6188707410d` |
File details
Details for the file gekolib-0.1.0-py3-none-any.whl.
File metadata
- Download URL: gekolib-0.1.0-py3-none-any.whl
- Upload date:
- Size: 20.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `9861d6b4252d3246694a2bce5e56abe39ca2c39f5357eaa4e9821297886cdbc8` |
| MD5 | `a601aeb8984637f519d0a3f69c8052fe` |
| BLAKE2b-256 | `7b85fb01550877e26b9e342a843eae2295e58f4b4aa041f893ced13eac7a7b17` |