Fine-tune LFM 2.5 1.2B for coding tasks on Kaggle multi-GPU with auto-publish to Hugging Face
LFM Trainer
Fine-tune Liquid LFM 2.5 1.2B for coding tasks on Kaggle multi-GPU – with automatic checkpoint publishing to Hugging Face on errors, and post-training GGUF + MLX quantization.
Features
- Multi-GPU training via HuggingFace Accelerate / DDP (Kaggle P100 / 2×T4)
- LoRA / PEFT for memory-efficient fine-tuning
- Structured dataset loading – CSV, Parquet, JSONL, HuggingFace Hub IDs, or direct `pd.DataFrame` objects
- Auto-format detection – Alpaca, prompt/response, conversational/chat (DataClaw), single text column
- Tool calling support – LFM 2.5 native `<|tool_call_start|>` / `<|tool_call_end|>` tokens; handles OpenAI and DataClaw tool call formats
- Error-resilient training – auto-publishes versioned checkpoints on OOM, SIGTERM (Kaggle timeout), KeyboardInterrupt, or any exception
- Flexible HF auth – CLI arg, `HF_TOKEN` env var, or Kaggle Secrets
- GGUF export – Q4_K_M, Q6_K, Q8_0 via llama.cpp
- MLX export – 4-bit, 6-bit, 8-bit via mlx-lm
- Shared versioning – base model + all quantized variants tagged with the same version
- Full fine-tuning – train all parameters (no LoRA) for maximum quality
- Auto-benchmarking – HumanEval, MBPP, MultiPL-E, BigCodeBench, EvalPlus, Tool Calling, GSM8K, ARC
- Reasoning (`<think>` tags) – train models that think before acting, with `<think>...</think>` traces
- Data quality filters – auto-remove duplicates, empty rows, and length outliers
- Eval split – hold out a % for validation loss tracking during training
- Auto model card – generates a HuggingFace README.md with config, benchmarks, and hardware
- W&B / TensorBoard – optional training metric logging
- DPO / PPO / GRPO alignment – preference tuning after SFT via DPO, classic RLHF (PPO), or DeepSeek-style GRPO
- Continued Pre-Training (CPT) – train on raw text (books, PDFs, code) to inject domain knowledge
- LoRA adapter merging – combine multiple adapters into a single model with weighted blending
- Auto HP search – try multiple learning rates and pick the best based on eval loss
- DeepSpeed ZeRO – ZeRO-2 and ZeRO-3 for multi-GPU training (optimizer + gradient + weight sharding)
- Model Distillation – compress a large teacher into a smaller student via KL-divergence
- Structured Output – train models to generate valid JSON conforming to schemas
Installation
```shell
pip install lfm-trainer
```
Quick Start (Kaggle Notebook)
```shell
# Cell 1: Install
!pip install lfm-trainer

# Cell 2: Train on a coding dataset with GGUF export
!lfm-train \
  --dataset peteromallet/dataclaw-peteromallet \
  --hub-repo your-username/lfm-code \
  --export-gguf

# Train on multiple datasets
!lfm-train \
  --dataset code_data.csv \
  --dataset more_code.parquet \
  --dataset sahil2801/CodeAlpaca-20k \
  --hub-repo your-username/lfm-code \
  --epochs 3 \
  --batch-size 2 \
  --export-gguf
```
The HF token is automatically picked up from Kaggle Secrets (key: HF_TOKEN).
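That fallback chain (CLI flag, then `HF_TOKEN` env var, then Kaggle Secrets) can be sketched as follows. This is illustrative only: `resolve_hf_token` is not lfm-trainer's actual internal API, and the `kaggle_secrets` import only works inside Kaggle notebooks.

```python
import os

def resolve_hf_token(cli_token=None):
    """Resolve an HF token: CLI arg > HF_TOKEN env var > Kaggle Secrets."""
    if cli_token:
        return cli_token
    token = os.environ.get("HF_TOKEN")
    if token:
        return token
    try:
        # kaggle_secrets is only importable inside Kaggle notebooks.
        from kaggle_secrets import UserSecretsClient
        return UserSecretsClient().get_secret("HF_TOKEN")
    except Exception:
        return None

os.environ["HF_TOKEN"] = "hf_example"
print(resolve_hf_token())  # outside Kaggle, falls through to the env var
```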
Python API
```python
import pandas as pd
from lfm_trainer import run_training, load_datasets
from lfm_trainer.config import TrainingConfig

# Load multiple sources including DataFrames
df = pd.read_csv("my_code_data.csv")
dataset = load_datasets([
    df,                                    # Direct DataFrame
    "code_data.parquet",                   # Local file
    "peteromallet/dataclaw-peteromallet",  # HuggingFace Hub (conversational)
])

# Or use the full pipeline
cfg = TrainingConfig(
    model_name="liquid/LFM2.5-1.2B-Base",
    dataset_paths=["peteromallet/dataclaw-peteromallet"],
    hub_repo_id="your-username/lfm-code",
    quality_filter=True,
    eval_split=0.1,
    run_benchmark=True,
    benchmark_before_after=True,
    export_gguf=True,
)
run_training(cfg)
```
CLI Reference
```shell
lfm-train --help
```
| Flag | Default | Description |
|---|---|---|
| `--dataset` | (required) | Dataset path or Hub ID (repeatable) |
| `--model` | `liquid/LFM2.5-1.2B-Base` | Model to fine-tune |
| `--resume-from` | – | Path or Hub ID of a prior adapter for continual training |
| `--tool-calling-only` | off | Keep only samples with tool calls |
| `--quality-filter` | off | Remove empty rows, dupes, length outliers |
| `--eval-split` | 0.0 | Hold out a fraction for eval (e.g. 0.1 = 10%) |
| `--hf-token` | auto-detect | HuggingFace token |
| `--hub-repo` | auto | Hub repo to push to |
| `--no-push` | off | Save locally only, skip Hub push |
| `--epochs` | 3 | Training epochs |
| `--batch-size` | 2 | Per-device batch size |
| `--lr` | 2e-4 | Learning rate |
| `--max-seq-length` | 2048 | Max sequence length |
| `--lora-r` | 16 | LoRA rank |
| `--lora-alpha` | 32 | LoRA alpha |
| `--bf16` | off | Use bfloat16 |
| `--full-finetune` | off | Train all params (no LoRA), needs more VRAM |
| `--report-to` | none | `none`, `wandb`, or `tensorboard` |
| `--benchmark` | off | Run HumanEval + MBPP after training |
| `--benchmark-compare` | off | Also benchmark base model for delta |
| `--benchmark-max` | all | Cap problems for quick testing |
| `--no-model-card` | off | Skip auto model card generation |
| `--export-gguf` | off | Export GGUF (Q4_K_M, Q6_K, Q8_0) |
| `--export-mlx` | off | Export MLX (4/6/8-bit) |
| `--export-dir` | `./lfm-exports` | Export scratch directory |
| `--alignment-method` | `dpo` | Alignment method: `dpo`, `ppo`, or `grpo` |
| `--alignment-dataset` | none | HF dataset for alignment (DPO: chosen/rejected; PPO/GRPO: prompts) |
| `--dpo-beta` | 0.1 | DPO β – higher = more conservative |
| `--reward-model` | none | HF reward model for PPO |
| `--grpo-generations` | 4 | Completions per prompt for GRPO |
| `--cpt-sources` | none | Raw text sources for CPT (files, dirs, HF datasets) |
| `--cpt-chunk-size` | 2048 | Characters per chunk for CPT |
| `--enable-reasoning` | off | Enable `<think>` reasoning tags in training data |
| `--reasoning-dataset` | none | HF dataset for reasoning (e.g., LLM360/TxT360-3efforts) |
| `--reasoning-max-samples` | 100000 | Max samples from reasoning dataset |
| `--auto-hp-search` | off | Run auto hyperparameter search before training |
| `--hp-trial-steps` | 50 | Steps per HP search trial |
| `--deepspeed` | none | DeepSpeed config: `zero2`, `zero3`, or path to JSON |
| `--distill-teacher` | none | HF model ID of teacher for knowledge distillation |
| `--distill-temperature` | 2.0 | Distillation softmax temperature |
| `--distill-alpha` | 0.5 | Blend factor: 0 = CE only, 1 = KL only |
| `--structured-output` | off | Mix in JSON schema training data for structured output |
| `--merge-adapters` | none | Merge multiple LoRA adapters (skip training) |
| `--merge-output` | `./lfm-merged` | Output dir for merged model |
| `--simulate-error` | off | Test auto-publish mechanism |
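For intuition on `--distill-temperature` and `--distill-alpha`, here is a minimal pure-Python sketch of the temperature-scaled CE/KL blend used in logit distillation. The function names are illustrative, not lfm-trainer internals, and a real implementation would operate on tensors of logits.

```python
import math

def softmax(logits, T=1.0):
    """Softmax with temperature T (higher T = softer distribution)."""
    exps = [math.exp(x / T) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_div(p, q):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill_loss(student_logits, teacher_logits, label, T=2.0, alpha=0.5):
    ce = -math.log(softmax(student_logits)[label])       # hard-label CE at T=1
    kl = kl_div(softmax(teacher_logits, T),
                softmax(student_logits, T)) * T * T      # soft-label KL, scaled by T^2
    return alpha * kl + (1 - alpha) * ce                 # alpha=0 -> CE only, 1 -> KL only

# When student and teacher logits agree exactly, the KL term vanishes:
print(distill_loss([2.0, 0.5], [2.0, 0.5], label=0, alpha=1.0))  # 0.0
```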
Supported Dataset Formats
The data loader auto-detects column layouts:
| Format | Detected By | Example |
|---|---|---|
| Alpaca | `instruction` + `output` columns | `iamtarun/python_code_instructions_18k_alpaca` |
| Prompt/Response | `prompt` + `response` columns | Generic Q&A datasets |
| Conversational | `messages` column (with tool calls) | `peteromallet/dataclaw-peteromallet` |
| Text | Single `text` column | `jdaddyalbs/playwright-mcp-toolcalling` |
| DataFrame | Direct `pd.DataFrame` objects | In-memory data |
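The detection in the table above amounts to a column-name check, roughly like this (a sketch; `detect_format` is not the library's public API):

```python
def detect_format(columns):
    """Guess a dataset layout from its column names (illustrative sketch)."""
    cols = set(columns)
    if {"instruction", "output"} <= cols:
        return "alpaca"
    if {"prompt", "response"} <= cols:
        return "prompt_response"
    if "messages" in cols:
        return "conversational"
    if "text" in cols:
        return "text"
    return "unknown"

print(detect_format(["instruction", "input", "output"]))  # alpaca
```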
Tool-Calling Training
Train exclusively on tool-calling examples using `--tool-calling-only`:
```shell
lfm-train \
  --dataset jdaddyalbs/playwright-mcp-toolcalling \
  --tool-calling-only \
  --hub-repo your-username/lfm-tools \
  --max-seq-length 4096
```
The filter keeps only samples containing tool call patterns (`<|tool_call_start|>`, `tool_calls`, `function_call`, etc.). Works with any dataset – pre-formatted or auto-formatted.
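Conceptually the filter is a substring check against those marker patterns; a minimal sketch (not the exact implementation):

```python
# Markers taken from the patterns listed above.
TOOL_CALL_MARKERS = ("<|tool_call_start|>", "tool_calls", "function_call")

def has_tool_call(sample_text):
    """True if the serialized sample contains any known tool-call marker."""
    return any(marker in sample_text for marker in TOOL_CALL_MARKERS)

rows = [
    '<|tool_call_start|>{"name": "search"}<|tool_call_end|>',
    "Just a plain answer with no tools.",
]
print([has_tool_call(r) for r in rows])  # [True, False]
```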
Continual Training
Train iteratively across multiple datasets, saving locally between rounds:
```shell
# Round 1: Coding skills -> save locally
lfm-train --dataset sahil2801/CodeAlpaca-20k --no-push

# Round 2: Add tool-calling on top -> save locally
lfm-train \
  --resume-from ./lfm-checkpoints/final-adapter \
  --dataset jdaddyalbs/playwright-mcp-toolcalling \
  --tool-calling-only \
  --output-dir ./lfm-checkpoints-r2 \
  --no-push

# Round 3: Final round -> push to Hub + export
lfm-train \
  --resume-from ./lfm-checkpoints-r2/final-adapter \
  --dataset peteromallet/dataclaw-peteromallet \
  --hub-repo your-username/lfm-final \
  --export-gguf
```
Each `--resume-from` loads the prior adapter and continues learning from where it left off.
How Auto-Publish Works
Training is wrapped in an error handler inspired by Unsloth:
- SIGTERM (Kaggle timeout) → saves + pushes immediately, then exits
- CUDA OOM → clears cache, saves + pushes
- KeyboardInterrupt → saves + pushes
- Any Exception → saves + pushes, then re-raises
Each checkpoint is tagged with a UTC timestamp (e.g., v20260302-153000) so versions never collide.
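A stripped-down sketch of that wrapper and the timestamp tag (the real handler also installs a SIGTERM handler and clears the CUDA cache on OOM; names here are illustrative):

```python
import datetime

def version_tag(now=None):
    """UTC timestamp tag, e.g. v20260302-153000."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return now.strftime("v%Y%m%d-%H%M%S")

def train_with_autopublish(train_fn, publish_fn):
    """Run training; publish a versioned checkpoint on every exit path."""
    try:
        train_fn()
    except KeyboardInterrupt:
        publish_fn(version_tag())      # user stopped the run
    except Exception:
        publish_fn(version_tag())      # crash: publish, then re-raise
        raise
    else:
        publish_fn(version_tag())      # normal completion

ts = datetime.datetime(2026, 3, 2, 15, 30, 0, tzinfo=datetime.timezone.utc)
print(version_tag(ts))  # v20260302-153000
```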
Post-Training Export
When `--export-gguf` or `--export-mlx` is enabled:
- LoRA adapters are merged into the base model
- GGUF: converted via llama.cpp → Q4_K_M, Q6_K, Q8_0 → pushed to `{repo}-GGUF`
- MLX: converted via mlx-lm → 4-bit, 6-bit, 8-bit → pushed to `{repo}-MLX-{N}bit`
- All repos (base + quants) share the same version tag

Note: MLX export requires Apple Silicon. Use `--export-gguf` on Kaggle (Linux), and `--export-mlx` locally on Mac.
Examples
See the examples/ directory for ready-to-run scripts:
getting_started/ – First steps
| Example | Description |
|---|---|
| `basic_training.py` | Simple Alpaca coding fine-tune |
| `kaggle_notebook.py` | Copy-paste Kaggle cells |
| `wandb_training.py` | W&B logging with auto API key detection |
| `full_finetune.py` | Full parameter training (no LoRA) |
| `continual_training.py` | Multi-round training with local saves |
| `multi_dataset_training.py` | Combining Hub + local + DataFrame sources |
| `tool_calling_training.py` | Tool-calling-only with playwright MCP |
| `auto_hp_search.py` | Auto learning rate search before training |
training_modes/ – Advanced training techniques
| Example | Description |
|---|---|
| `deepspeed_training.py` | DeepSpeed ZeRO-2 and ZeRO-3 multi-GPU |
| `distillation.py` | Distill 7B teacher → 1.2B student |
| `dpo_alignment.py` | DPO / PPO / GRPO alignment after SFT |
| `merge_adapters.py` | Merge multiple LoRA adapters (with weights) |
| `cpt_raw_text.py` | Train on books, PDFs, or raw text (CPT) |
| `structured_output.py` | JSON schema training + validation |
domain_specialists/ – Domain-specific fine-tuning
| Example | Description |
|---|---|
| `terminal_agent.py` | Terminal agent with Nemotron-Terminal-Corpus |
| `medical_assistant.py` | Healthcare: medical Q&A, patient-doctor |
| `sql_specialist.py` | Text-to-SQL & data analysis |
| `legal_assistant.py` | Legal reasoning, contract analysis |
| `finance_assistant.py` | Financial analysis, sentiment |
| `cybersecurity.py` | Security analysis, CTF, pentesting |
recipes/ – End-to-end model recipes
| Example | Description |
|---|---|
| `recipe_tool_calling.py` | Tool calling specialist |
| `recipe_reasoning_tools.py` | Reasoning + tool calling (TxT360) |
| `recipe_from_scratch.py` | Domain expert from books/blogs |
| `chatbot_assistant.py` | Multi-turn chat + DPO alignment |
| `api_builder.py` | API dev: REST, OpenAPI, function calling |
| `math_reasoning.py` | Math reasoning with `<think>` traces |
| `multilang_coding.py` | Multi-language coder, code reviewer |
advanced/ – Benchmarking & export
| Example | Description |
|---|---|
| `benchmark_training.py` | Train + HumanEval/MBPP benchmark + auto-upload |
| `full_benchmark_suite.py` | All 9 benchmarks with before/after comparison |
| `benchmark_only.py` | Evaluate any model without training |
| `export_only.py` | Standalone GGUF/MLX quantization |
Documentation
New to LLM fine-tuning? Start here – no prerequisites beyond basic Python:
| # | Guide | What you'll learn |
|---|---|---|
| 1 | What is an LLM? | Tokens, embeddings, attention, transformers |
| 2 | How Training Works | Loss functions, backprop, gradient descent |
| 3 | Fine-Tuning vs Scratch | Transfer learning, catastrophic forgetting |
| 4 | LoRA Explained | Low-rank adapters, the math, pure Python impl |
| 5 | Full Fine-Tuning | Gradient checkpointing, memory management |
| 6 | Data Preparation | Formats, tokenization, quality filters |
| 7 | Evaluation & Benchmarks | HumanEval, MBPP, pass@k metric |
| 8 | Quantization & Export | GGUF, MLX, INT4/INT8 |
| 9 | Architecture Deep-Dive | How lfm-trainer is built |
| 10 | DPO, PPO, GRPO & Alignment | DPO/PPO/GRPO math, datasets, pipeline |
| 11 | Continued Pre-Training (CPT) | Train on books, raw text, domain knowledge |
| 12 | Reasoning & Thinking | <think> tags, TxT360, model recipes, benchmarks |
| 13 | DeepSpeed & Distillation | ZeRO-2/3, knowledge distillation, teacher→student |
| 14 | Structured Output | JSON-mode training, schema validation, benchmarks |
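Guide 7 covers HumanEval/MBPP scoring; the unbiased pass@k estimator it refers to is compact enough to show inline. This is the generic formula from the HumanEval paper, not lfm-trainer code: with n generations per problem and c of them correct, it gives the probability that at least one of k sampled generations passes.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer failures than k samples: some sample must pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# 10 generations, 3 correct: pass@1 is the plain success rate.
print(round(pass_at_k(10, 3, 1), 4))  # 0.3
```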
License
MIT
File details
Details for the file lfm_trainer-0.14.0.tar.gz.
File metadata
- Download URL: lfm_trainer-0.14.0.tar.gz
- Size: 271.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.24 (from macOS)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `b7cc1beb76616c3092ee0df8664b0c83148045ef38b9c8c8110e4c59cd23cbf1` |
| MD5 | `8d6fb35c24c9de0775a6cc6367323460` |
| BLAKE2b-256 | `669b5034170d17953b8c6a46c3946f31c914b7048640a0c063b1a7c01c06382d` |
File details
Details for the file lfm_trainer-0.14.0-py3-none-any.whl.
File metadata
- Download URL: lfm_trainer-0.14.0-py3-none-any.whl
- Size: 61.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.24 (from macOS)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `b4c2b228ac115c02b63e7c0cab33f3c8c0d1bad0f8186867843e8f60f838b2aa` |
| MD5 | `4afa679a5ef0b16f29b62a0c1dc118ba` |
| BLAKE2b-256 | `7910331c17395532a3d573eea3ec8168ac4dbb305a3f177d751d5503256a6708` |