Keep your local LLM fresh without forgetting what it already knows — replay buffers + forgetting metrics + auto-rollback for continual fine-tuning

llm-refresh-wheel

Keep your local LLM fresh without forgetting what it already knows.

Fine-tune your LLM on new data continuously while guarding against catastrophic forgetting. llm-refresh-wheel wraps HuggingFace PEFT/TRL with replay buffers, forgetting metrics (BWT, FWT, KRS), and automatic rollback when forgetting is detected.


The Problem

When you fine-tune an LLM on new data, it can lose knowledge it had before. This is called catastrophic forgetting. Existing tools (PEFT, TRL) handle the training itself but offer no built-in way to detect or undo forgetting.

llm-refresh-wheel gives you:

  • Replay buffers — mix old examples back in during each training cycle
  • Forgetting metrics — measure exactly how much knowledge was lost (BWT, FWT, KRS)
  • Auto-rollback — automatically revert training if forgetting exceeds your threshold
New Data ──┐
           ├──► Build Training Set ──► Train (LoRA) ──► Eval on Anchor
Replay ────┘                                                  │
Buffer ◄────────────────────────────────────────── Add New ◄──┤
                                                              │
                                                    BWT < threshold?
                                                    └──► Rollback ✓
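
The "Build Training Set" step in the diagram can be sketched as follows. This is an illustration, not the package's code: `build_training_set` is a hypothetical helper, and it assumes that `min_replay_ratio` (see Configuration) is the minimum share of each training set drawn from the replay buffer.

```python
import math
import random
from fractions import Fraction


def build_training_set(new_examples, replay_buffer, min_replay_ratio="0.3", seed=0):
    """Mix replayed examples into new data (hypothetical helper, not the package API).

    Assumption: min_replay_ratio is the minimum fraction of the final
    training set that must come from the replay buffer.
    """
    ratio = Fraction(str(min_replay_ratio))  # exact arithmetic, no float rounding
    if not 0 <= ratio < 1:
        raise ValueError("min_replay_ratio must be in [0, 1)")
    # Smallest r with r / (r + n_new) >= ratio, i.e. r >= ratio * n_new / (1 - ratio)
    wanted = math.ceil(ratio * len(new_examples) / (1 - ratio))
    n_replay = min(wanted, len(replay_buffer))  # cannot replay more than is stored
    rng = random.Random(seed)
    replayed = rng.sample(list(replay_buffer), n_replay)
    mixed = list(new_examples) + replayed
    rng.shuffle(mixed)  # interleave old and new examples
    return mixed, n_replay


new = [{"text": f"new {i}"} for i in range(350)]
old = [{"text": f"old {i}"} for i in range(1000)]
train_set, n_replay = build_training_set(new, old)
print(len(train_set), n_replay)
```

With 350 new examples and a ratio of 0.3, this draws 150 replayed examples for a training set of 500, which matches the 350/150 split used in the metrics tracker example in this README.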

Quick Start

# Install (no ML deps — CLI only)
pip install llm-refresh-wheel

# Install with training support
pip install "llm-refresh-wheel[train]"

# Initialize config
refresh-wheel init

# Add your anchor evaluation set (JSONL: one {"text": "..."} per line)
refresh-wheel anchor anchor.jsonl

# Add new training data
refresh-wheel add new_data.jsonl

# Run a refresh cycle
refresh-wheel refresh --model microsoft/phi-2

# Check your model's health
refresh-wheel status
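
The anchor file is JSONL with one {"text": "..."} object per line. A minimal sketch for producing and checking such a file from Python; `write_jsonl`, `read_jsonl`, and the sample sentences are illustrative and not part of the package.

```python
import json
import os
import tempfile


def write_jsonl(path, texts):
    """Write one {"text": ...} JSON object per line (illustrative helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for text in texts:
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Write to a temporary directory so the sketch does not touch real project files.
path = os.path.join(tempfile.mkdtemp(), "anchor.jsonl")
write_jsonl(path, ["Paris is the capital of France.", "Water boils at 100 °C at sea level."])
records = read_jsonl(path)
print(records[0]["text"])
```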

Forgetting Metrics Explained

| Metric | What it measures | Good value |
|---|---|---|
| BWT (Backward Transfer) | Did perplexity on old data increase after training? | ≥ 0.0 (no forgetting) |
| FWT (Forward Transfer) | Did prior training help on new data? | > 0.0 (positive transfer) |
| KRS (Knowledge Retention Score) | Overall knowledge retention [0–100] | ≥ 80 |

Math

BWT_t  = PPL_anchor_before - PPL_anchor_after
         (negative = perplexity went up = forgetting happened)

FWT_t  = PPL_pre_(t-1) - PPL_pre_t
         (positive = prior cycles helped on new tasks)

KRS    = 100 × exp(−λ × Σ_{t : BWT_t < 0} |BWT_t|)
         (100 = perfect retention, decays exponentially with cumulative forgetting)
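
The three formulas can be checked by hand with a few lines of Python. This is a sketch that follows the definitions above, not the package's implementation: the function names and perplexity values are made up for illustration, and the default `lam=0.01` simply mirrors the `krs_lambda` value in the default config.

```python
import math


def bwt(ppl_anchor_before, ppl_anchor_after):
    """Backward transfer: negative means anchor perplexity rose (forgetting)."""
    return ppl_anchor_before - ppl_anchor_after


def fwt(ppl_pre_prev, ppl_pre_curr):
    """Forward transfer: positive means the pre-training perplexity dropped
    relative to the previous cycle."""
    return ppl_pre_prev - ppl_pre_curr


def krs(bwt_history, lam=0.01):
    """Knowledge Retention Score: 100 * exp(-lam * total forgetting).

    Only cycles with negative BWT contribute; improvements do not offset losses.
    """
    forgetting = sum(-b for b in bwt_history if b < 0)
    return 100.0 * math.exp(-lam * forgetting)


# Three illustrative cycles: one improvement, then two cycles with forgetting.
history = [bwt(12.3, 11.8), bwt(11.8, 12.6), bwt(12.6, 14.0)]
print([round(b, 2) for b in history])
print(round(krs(history), 2))
```

Because only negative BWT values enter the sum, KRS can only stay flat or decrease over time under this definition.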

Buffer Strategies

| Strategy | How it works | Best for |
|---|---|---|
| reservoir (default) | Vitter's Algorithm R: uniform random sample over all seen data | General use, unknown data distribution |
| prioritized | Keeps highest-loss (hardest) examples; evicts easy ones | When hard examples matter most |
| diverse | Hash-based bucketing into 64 slots; prevents topic dominance | When data has many distinct topics |
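
For readers unfamiliar with the default strategy, here is a textbook sketch of Algorithm R. It shows the technique only; `ReservoirBuffer` is a hypothetical class and may differ from how the package's reservoir buffer is written.

```python
import random


class ReservoirBuffer:
    """Fixed-size uniform sample over a stream (Algorithm R). Illustrative only."""

    def __init__(self, max_size, seed=None):
        self.max_size = max_size
        self.items = []
        self.seen = 0  # total number of examples offered so far
        self._rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.items) < self.max_size:
            # Buffer not full yet: always keep the example.
            self.items.append(example)
            return
        # Buffer full: keep the new example with probability max_size / seen,
        # overwriting a uniformly chosen stored example.
        j = self._rng.randrange(self.seen)
        if j < self.max_size:
            self.items[j] = example


buf = ReservoirBuffer(max_size=100, seed=0)
for i in range(10_000):
    buf.add({"text": f"example {i}"})
print(len(buf.items), buf.seen)
```

Every example seen so far has the same chance of being in the buffer, which is why this strategy is a reasonable default when the data distribution is unknown.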

CLI Reference

| Command | Description |
|---|---|
| `refresh-wheel init` | Write default config.toml |
| `refresh-wheel add <file.jsonl>` | Add data to replay buffer |
| `refresh-wheel anchor <file.jsonl>` | Set anchor evaluation dataset |
| `refresh-wheel refresh [--model NAME] [--epochs N] [--dry-run]` | Run one refresh cycle |
| `refresh-wheel status` | Buffer stats + KRS + last refresh time |
| `refresh-wheel metrics` | Full forgetting history as a table |
| `refresh-wheel eval` | Compute perplexity on anchor set |
| `refresh-wheel schedule --every 24` | Start daemon (refreshes every N hours) |
| `refresh-wheel config show` | Pretty-print current config |
| `refresh-wheel config set KEY VALUE` | Dot-notation config update |

Config Examples

# Change buffer strategy
refresh-wheel config set buffer.strategy prioritized

# Adjust forgetting threshold
refresh-wheel config set eval.forgetting_threshold -0.2

# Disable auto-rollback
refresh-wheel config set eval.auto_rollback false

# Use a different model
refresh-wheel config set model.name meta-llama/Llama-3.2-1B
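
How the threshold and the rollback switch interact can be sketched as below, following the "BWT < threshold?" check in the diagram. `should_rollback` is a hypothetical helper written for this README, not a function exported by the package.

```python
def should_rollback(bwt, forgetting_threshold=-0.1, auto_rollback=True):
    """Return (rollback, reason) for one refresh cycle (hypothetical helper).

    A cycle is reverted only if auto-rollback is enabled and BWT dropped
    strictly below the (negative) threshold.
    """
    if not auto_rollback:
        return False, ""
    if bwt < forgetting_threshold:
        return True, f"BWT {bwt:.4f} below threshold {forgetting_threshold:.4f}"
    return False, ""


for value in (0.5, -0.05, -0.15):
    print(value, should_rollback(value))
```

Lowering the threshold to -0.2, as in the config example above, makes the check more tolerant: a cycle with BWT of -0.15 would then be kept.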

Python API

import json

from llm_refresh import RefreshWheel, BufferStrategy

# Initialize
rw = RefreshWheel(
    model_name="microsoft/phi-2",
    buffer_strategy=BufferStrategy.RESERVOIR,
    state_path="~/.local/share/llm_refresh/myproject",
)

# Set anchor evaluation set (never changes — measures forgetting)
with open("anchor.jsonl") as f:
    anchor = [json.loads(line) for line in f]
rw.set_anchor(anchor)

# Add new training data
rw.add_data([
    {"text": "New fact: The Eiffel Tower was completed in 1889."},
    {"text": "New fact: Python was created by Guido van Rossum."},
])

# Run a refresh cycle
result = rw.refresh(epochs=1)
print(f"BWT: {result.bwt:.4f}")   # negative = forgetting
print(f"KRS: {result.krs:.1f}")   # 0-100
print(f"Rolled back: {result.rolled_back}")

# Check overall health
status = rw.status()
print(status["metrics"])

# Save state (buffer + history, not model weights)
rw.save("~/.local/share/llm_refresh/myproject")

# Restore later
rw2 = RefreshWheel(model_name="microsoft/phi-2")
rw2.load("~/.local/share/llm_refresh/myproject")

Using Just the Buffer (No GPU Required)

from llm_refresh import create_buffer, BufferStrategy

buf = create_buffer(BufferStrategy.DIVERSE, max_size=10_000, n_buckets=64)
buf.add([{"text": "example one"}, {"text": "example two"}])
samples = buf.sample(100)
print(buf.stats())
buf.save("buffer.json")

Using Just the Metrics Tracker

from llm_refresh import ForgettingTracker
from llm_refresh.models import EvalResult, RefreshResult

tracker = ForgettingTracker(krs_lambda=0.01)

# Record a refresh cycle's results
tracker.record_result(RefreshResult.new(
    examples_trained=500,
    replay_examples=150,
    new_examples=350,
    pre_eval=EvalResult.now(perplexity=12.3, loss=2.5, dataset_size=200),
    post_eval=EvalResult.now(perplexity=11.8, loss=2.4, dataset_size=200),
    bwt=0.5,    # perplexity improved
    fwt=0.2,
    krs=100.0,
    rolled_back=False,
    rollback_reason="",
))

print(tracker.summary())
# {'cycles': 1, 'bwt': 0.5, 'fwt': 0.0, 'krs': 100.0, ...}

Installation Options

# Core CLI only (no ML deps)
pip install llm-refresh-wheel

# With PyTorch
pip install "llm-refresh-wheel[torch]"

# With Transformers
pip install "llm-refresh-wheel[transformers]"

# With PEFT (LoRA)
pip install "llm-refresh-wheel[peft]"

# Full training stack (torch + transformers + peft + trl + datasets)
pip install "llm-refresh-wheel[train]"

# With scheduler daemon support
pip install "llm-refresh-wheel[schedule]"

Configuration

Config file lives at ~/.config/llm_refresh/config.toml. Run refresh-wheel init to create it.

[model]
name = "microsoft/phi-2"
rank = 16
lora_alpha = 32
lora_dropout = 0.05
target_modules = ["q_proj", "v_proj"]

[buffer]
strategy = "reservoir"
max_size = 10000
min_replay_ratio = 0.3
n_buckets = 64

[training]
batch_size = 4
gradient_accumulation_steps = 4
learning_rate = 0.0002
warmup_ratio = 0.03
max_seq_length = 512

[eval]
anchor_size = 200
forgetting_threshold = -0.1
auto_rollback = true
batch_size = 8
krs_lambda = 0.01

[schedule]
interval_hours = 24.0
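
A few quantities implied by the defaults can be worked out directly. The sketch below assumes the `[model]` and `[training]` values carry their usual PEFT/Transformers meaning (standard LoRA scaling of alpha / r, single device); the helper names and the layer width of 2560 are illustrative, so substitute your own model's dimensions.

```python
def lora_scaling(rank, lora_alpha):
    """Standard LoRA scaling factor applied to the low-rank update: alpha / r."""
    return lora_alpha / rank


def lora_params_per_module(rank, d_in, d_out):
    """Trainable parameters added to one linear layer: A is r x d_in, B is d_out x r."""
    return rank * (d_in + d_out)


def effective_batch_size(batch_size, gradient_accumulation_steps, n_devices=1):
    """Examples contributing to each optimizer step."""
    return batch_size * gradient_accumulation_steps * n_devices


# Values copied from the default config above.
scale = lora_scaling(rank=16, lora_alpha=32)
per_step = effective_batch_size(batch_size=4, gradient_accumulation_steps=4)
# Illustrative square projection of width 2560 (an assumption, not read from a model).
extra_params = lora_params_per_module(rank=16, d_in=2560, d_out=2560)
print(scale, per_step, extra_params)
```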

Override any setting with environment variables using double underscore notation:

export LLM_REFRESH__MODEL__NAME="meta-llama/Llama-3.2-1B"
export LLM_REFRESH__EVAL__AUTO_ROLLBACK=false

License

MIT


Support

If this tool saves you from a catastrophic forgetting disaster, consider buying me a coffee.

