Keep your local LLM fresh without forgetting what it already knows — replay buffers + forgetting metrics + auto-rollback for continual fine-tuning
llm-refresh-wheel
Keep your local LLM fresh without forgetting what it already knows.
Fine-tune your LLM on new data continuously while guarding against catastrophic forgetting. llm-refresh-wheel wraps Hugging Face PEFT/TRL with replay buffers, concrete forgetting metrics, and auto-rollback when forgetting is detected.
The Problem
When you fine-tune an LLM on new data, it can lose knowledge it previously had. This is called catastrophic forgetting. Current tools (PEFT, TRL) handle the training itself but provide no safety net against it.
llm-refresh-wheel gives you:
- Replay buffers — mix old examples back in during each training cycle
- Forgetting metrics — measure exactly how much knowledge was lost (BWT, FWT, KRS)
- Auto-rollback — automatically revert training if forgetting exceeds your threshold
New Data ──┐
           ├──► Build Training Set ──► Train (LoRA) ──► Eval on Anchor
Replay ────┘                                                  │
Buffer ◄──────────────────────────────────── Add New ◄────────┤
                                                              │
                                                      BWT < threshold?
                                                              └──► Rollback ✓
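The loop in the diagram can be sketched in plain Python. This is an illustration only, not the package's internals: `replay_needed`, `build_training_set`, and `should_rollback` are hypothetical names, and the sketch assumes `min_replay_ratio` means the replayed fraction of the whole training set.

```python
import math
import random
from fractions import Fraction


def replay_needed(n_new, min_replay_ratio):
    """Smallest replay count r with r / (r + n_new) >= min_replay_ratio.

    Assumption: the ratio is measured against the full training set.
    """
    ratio = Fraction(str(min_replay_ratio))  # exact arithmetic, e.g. 0.3 -> 3/10
    if not 0 <= ratio < 1:
        raise ValueError("min_replay_ratio must be in [0, 1)")
    return math.ceil(ratio * n_new / (1 - ratio))


def build_training_set(new_examples, buffer, min_replay_ratio, rng):
    """Mix new examples with a random sample of old ones from the buffer."""
    k = min(len(buffer), replay_needed(len(new_examples), min_replay_ratio))
    return list(new_examples) + rng.sample(buffer, k)


def should_rollback(ppl_before, ppl_after, threshold):
    """Roll back when BWT = ppl_before - ppl_after falls below the threshold."""
    bwt = ppl_before - ppl_after
    return bwt < threshold


if __name__ == "__main__":
    rng = random.Random(0)
    old = [{"text": f"old example {i}"} for i in range(1000)]
    new = [{"text": f"new example {i}"} for i in range(350)]
    train = build_training_set(new, old, 0.3, rng)
    print(len(train))
    print(should_rollback(12.0, 12.5, threshold=-0.1))
```

Under this reading, 350 new examples with a ratio of 0.3 call for 150 replay examples, and a perplexity rise from 12.0 to 12.5 (BWT of -0.5) would trigger a rollback at a threshold of -0.1.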
Quick Start
# Install (no ML deps — CLI only)
pip install llm-refresh-wheel
# Install with training support
pip install "llm-refresh-wheel[train]"
# Initialize config
refresh-wheel init
# Add your anchor evaluation set (JSONL: one {"text": "..."} per line)
refresh-wheel anchor anchor.jsonl
# Add new training data
refresh-wheel add new_data.jsonl
# Run a refresh cycle
refresh-wheel refresh --model microsoft/phi-2
# Check your model's health
refresh-wheel status
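The anchor and training files above are JSONL with one {"text": "..."} object per line. A minimal standard-library sketch for producing and checking such a file could look like this; `write_jsonl` and `read_jsonl` are hypothetical helpers, not part of llm-refresh-wheel.

```python
import json
from pathlib import Path


def write_jsonl(path, texts):
    """Write one {"text": ...} JSON object per line."""
    with Path(path).open("w", encoding="utf-8") as f:
        for text in texts:
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read records back, rejecting lines without a string "text" field."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            if not isinstance(record, dict) or not isinstance(record.get("text"), str):
                raise ValueError(f"line {lineno}: expected an object with a string 'text' field")
            records.append(record)
    return records


if __name__ == "__main__":
    write_jsonl("anchor.jsonl", ["Paris is the capital of France.", "Water boils at 100 °C at sea level."])
    print(len(read_jsonl("anchor.jsonl")))
```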
Forgetting Metrics Explained
| Metric | What it measures | Good value |
|---|---|---|
| BWT (Backward Transfer) | Did perplexity on old data increase after training? | ≥ 0.0 (no forgetting) |
| FWT (Forward Transfer) | Did prior training help on new data? | > 0.0 (positive transfer) |
| KRS (Knowledge Retention Score) | Overall knowledge retention [0–100] | ≥ 80 |
Math
BWT_t = PPL_anchor_before - PPL_anchor_after
(negative = perplexity went up = forgetting happened)
FWT_t = PPL_pre_(t-1) - PPL_pre_t
(positive = prior cycles helped on new tasks)
KRS = 100 × exp(−λ × Σ_t |BWT_t|), summing only over cycles t where BWT_t < 0 (λ is eval.krs_lambda)
(100 = perfect retention, decays exponentially with cumulative forgetting)
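The three formulas can be transcribed directly into Python. The function names below are hypothetical and chosen for illustration; they are not the package's API, and the inputs are assumed to be perplexities you have already measured.

```python
import math


def backward_transfer(ppl_anchor_before, ppl_anchor_after):
    """BWT_t: negative when anchor perplexity went up (forgetting)."""
    return ppl_anchor_before - ppl_anchor_after


def forward_transfer(ppl_pre_previous, ppl_pre_current):
    """FWT_t: positive when the pre-training perplexity dropped since the last cycle."""
    return ppl_pre_previous - ppl_pre_current


def knowledge_retention_score(bwt_history, krs_lambda=0.01):
    """KRS: 100 * exp(-lambda * total forgetting), counting only negative BWT values."""
    forgetting = sum(abs(bwt) for bwt in bwt_history if bwt < 0)
    return 100.0 * math.exp(-krs_lambda * forgetting)


if __name__ == "__main__":
    history = [-1.0, 0.5, -2.0]  # two forgetting cycles, one improving cycle
    print(round(knowledge_retention_score(history), 2))
```

With the default λ of 0.01, a cumulative forgetting of 3.0 perplexity points gives a KRS of about 97.04, and cycles with positive BWT do not raise the score back up.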
Buffer Strategies
| Strategy | How it works | Best for |
|---|---|---|
| reservoir (default) | Vitter's Algorithm R — uniform random sample over all seen data | General use, unknown data distribution |
| prioritized | Keeps highest-loss (hardest) examples; evicts easy ones | When hard examples matter most |
| diverse | Hash-based bucketing into 64 slots — prevents topic dominance | When data has many distinct topics |
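To make the reservoir and diverse rows concrete, here is a small sketch of Algorithm R and of hash-based bucket assignment. It assumes nothing about how llm-refresh-wheel implements them: `ReservoirBuffer` and `bucket_of` are hypothetical names, and hashing the raw text with SHA-256 is an assumption about the bucketing key.

```python
import hashlib
import random


class ReservoirBuffer:
    """Algorithm R: each item seen so far stays with probability min(1, max_size / seen)."""

    def __init__(self, max_size, seed=None):
        self.max_size = max_size
        self.items = []
        self.seen = 0
        self._rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.max_size:
            self.items.append(item)  # buffer not full yet: always keep
            return
        # Draw a slot in [0, seen); replace only if it lands inside the buffer.
        j = self._rng.randrange(self.seen)
        if j < self.max_size:
            self.items[j] = item


def bucket_of(text, n_buckets=64):
    """Map a text to a stable bucket index in [0, n_buckets)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


if __name__ == "__main__":
    buf = ReservoirBuffer(max_size=10, seed=0)
    for i in range(1000):
        buf.add({"text": f"example {i}"})
    print(len(buf.items), buf.seen)
    print(bucket_of("example one") == bucket_of("example one"))
```

The reservoir never grows past `max_size`, yet late arrivals still have a chance to displace earlier ones, which is what makes the sample uniform over everything seen.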
CLI Reference
| Command | Description |
|---|---|
| refresh-wheel init | Write default config.toml |
| refresh-wheel add <file.jsonl> | Add data to replay buffer |
| refresh-wheel anchor <file.jsonl> | Set anchor evaluation dataset |
| refresh-wheel refresh [--model NAME] [--epochs N] [--dry-run] | Run one refresh cycle |
| refresh-wheel status | Buffer stats + KRS + last refresh time |
| refresh-wheel metrics | Full forgetting history as a table |
| refresh-wheel eval | Compute perplexity on anchor set |
| refresh-wheel schedule --every 24 | Start daemon (refreshes every N hours) |
| refresh-wheel config show | Pretty-print current config |
| refresh-wheel config set KEY VALUE | Dot-notation config update |
Config Examples
# Change buffer strategy
refresh-wheel config set buffer.strategy prioritized
# Adjust forgetting threshold
refresh-wheel config set eval.forgetting_threshold -0.2
# Disable auto-rollback
refresh-wheel config set eval.auto_rollback false
# Use a different model
refresh-wheel config set model.name meta-llama/Llama-3.2-1B
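A dot-notation key such as eval.forgetting_threshold maps naturally onto a nested config table. The sketch below shows one way such an update could work; `set_by_path` and `parse_scalar` are hypothetical helpers, and the type-coercion rules are an assumption rather than the CLI's documented behavior.

```python
def parse_scalar(raw):
    """Best-effort conversion of a CLI string to bool, int, float, or str."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def set_by_path(config, dotted_key, value):
    """Set config["a"]["b"] = value for dotted_key "a.b", creating tables as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
        if not isinstance(node, dict):
            raise TypeError(f"{part!r} is not a table in the config")
    node[leaf] = value
    return config


if __name__ == "__main__":
    config = {"eval": {"auto_rollback": True}}
    set_by_path(config, "eval.forgetting_threshold", parse_scalar("-0.2"))
    set_by_path(config, "eval.auto_rollback", parse_scalar("false"))
    set_by_path(config, "buffer.strategy", parse_scalar("prioritized"))
    print(config)
```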
Python API
import json
from llm_refresh import RefreshWheel, BufferStrategy
# Initialize
rw = RefreshWheel(
    model_name="microsoft/phi-2",
    buffer_strategy=BufferStrategy.RESERVOIR,
    state_path="~/.local/share/llm_refresh/myproject",
)
# Set anchor evaluation set (never changes; measures forgetting)
with open("anchor.jsonl") as f:
    anchor = [json.loads(line) for line in f if line.strip()]
rw.set_anchor(anchor)
# Add new training data
rw.add_data([
    {"text": "New fact: The Eiffel Tower was completed in 1889."},
    {"text": "New fact: Python was created by Guido van Rossum."},
])
# Run a refresh cycle
result = rw.refresh(epochs=1)
print(f"BWT: {result.bwt:.4f}") # negative = forgetting
print(f"KRS: {result.krs:.1f}") # 0-100
print(f"Rolled back: {result.rolled_back}")
# Check overall health
status = rw.status()
print(status["metrics"])
# Save state (buffer + history, not model weights)
rw.save("~/.local/share/llm_refresh/myproject")
# Restore later
rw2 = RefreshWheel(model_name="microsoft/phi-2")
rw2.load("~/.local/share/llm_refresh/myproject")
Using Just the Buffer (No GPU Required)
from llm_refresh import create_buffer, BufferStrategy
buf = create_buffer(BufferStrategy.DIVERSE, max_size=10_000, n_buckets=64)
buf.add([{"text": "example one"}, {"text": "example two"}])
samples = buf.sample(100)
print(buf.stats())
buf.save("buffer.json")
Using Just the Metrics Tracker
from llm_refresh import ForgettingTracker
from llm_refresh.models import EvalResult, RefreshResult
tracker = ForgettingTracker(krs_lambda=0.01)
# Record a refresh cycle's results
tracker.record_result(RefreshResult.new(
    examples_trained=500,
    replay_examples=150,
    new_examples=350,
    pre_eval=EvalResult.now(perplexity=12.3, loss=2.5, dataset_size=200),
    post_eval=EvalResult.now(perplexity=11.8, loss=2.4, dataset_size=200),
    bwt=0.5,  # perplexity improved
    fwt=0.2,
    krs=100.0,
    rolled_back=False,
    rollback_reason="",
))
print(tracker.summary())
# {'cycles': 1, 'bwt': 0.5, 'fwt': 0.0, 'krs': 100.0, ...}
Installation Options
# Core CLI only (no ML deps)
pip install llm-refresh-wheel
# With PyTorch
pip install "llm-refresh-wheel[torch]"
# With Transformers
pip install "llm-refresh-wheel[transformers]"
# With PEFT (LoRA)
pip install "llm-refresh-wheel[peft]"
# Full training stack (torch + transformers + peft + trl + datasets)
pip install "llm-refresh-wheel[train]"
# With scheduler daemon support
pip install "llm-refresh-wheel[schedule]"
Configuration
Config file lives at ~/.config/llm_refresh/config.toml. Run refresh-wheel init to create it.
[model]
name = "microsoft/phi-2"
rank = 16
lora_alpha = 32
lora_dropout = 0.05
target_modules = ["q_proj", "v_proj"]
[buffer]
strategy = "reservoir"
max_size = 10000
min_replay_ratio = 0.3
n_buckets = 64
[training]
batch_size = 4
gradient_accumulation_steps = 4
learning_rate = 0.0002
warmup_ratio = 0.03
max_seq_length = 512
[eval]
anchor_size = 200
forgetting_threshold = -0.1
auto_rollback = true
batch_size = 8
krs_lambda = 0.01
[schedule]
interval_hours = 24.0
Override any setting with environment variables using double underscore notation:
export LLM_REFRESH__MODEL__NAME="meta-llama/Llama-3.2-1B"
export LLM_REFRESH__EVAL__AUTO_ROLLBACK=false
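The double underscore separates config sections from keys, which leaves single underscores free for names like auto_rollback. A rough sketch of how such variables could be folded into a nested dict follows; `env_overrides` is a hypothetical helper, and leaving values as strings is an assumption, since the package's type conversion is not described here.

```python
import os


def env_overrides(environ, prefix="LLM_REFRESH__"):
    """Collect PREFIX + SECTION__KEY=value variables into a nested dict."""
    overrides = {}
    for name, raw in environ.items():
        if not name.startswith(prefix):
            continue
        # "EVAL__AUTO_ROLLBACK" -> ["eval", "auto_rollback"]
        path = [part.lower() for part in name[len(prefix):].split("__")]
        node = overrides
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = raw  # values stay strings in this sketch
    return overrides


if __name__ == "__main__":
    print(env_overrides(os.environ))
```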
License
MIT
Support
If this tool saves you from a catastrophic forgetting disaster, consider buying me a coffee:
File details
Details for the file llm_refresh_wheel-0.1.0.tar.gz.
File metadata
- Download URL: llm_refresh_wheel-0.1.0.tar.gz
- Upload date:
- Size: 21.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3013e7ae17ea27e1a79bd36ed2ea368a06862000e6de323593d62a8f0f3f0a7b |
| MD5 | fb158118ee43ed43fe7bf2af5b0dd1bc |
| BLAKE2b-256 | 5e3810d705eb41a8a0f860e36ae36a378c0577968229e3b96fdb3281d45e4f9f |
File details
Details for the file llm_refresh_wheel-0.1.0-py3-none-any.whl.
File metadata
- Download URL: llm_refresh_wheel-0.1.0-py3-none-any.whl
- Upload date:
- Size: 22.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a61ec0ce2c46285378ffaf1dbae69e2eee7111306ffc5bb4f5bcd09f69c41998 |
| MD5 | 5a217bf4b7776791d66ec16755a2783f |
| BLAKE2b-256 | 711c4d40b5d473e20d188014140eda532ba5493c220052479467a49c6f7a5eb2 |