Fine-tune LFM 2.5 1.2B for coding tasks on Kaggle multi-GPU with auto-publish to Hugging Face

Project description

LFM Trainer

Fine-tune Liquid LFM 2.5 1.2B for coding tasks on Kaggle multi-GPU, with automatic checkpoint publishing to Hugging Face on errors and post-training GGUF + MLX quantization.

Features

  • 🚀 Multi-GPU training via HuggingFace Accelerate / DDP (Kaggle P100 / 2×T4)
  • 🧠 LoRA / PEFT for memory-efficient fine-tuning
  • 📦 Structured dataset loading: CSV, Parquet, JSONL, HuggingFace Hub IDs, or direct pd.DataFrame objects
  • 🔍 Auto-format detection: Alpaca, prompt/response, conversational/chat (DataClaw), single text column
  • 🛠️ Tool calling support: LFM 2.5 native <|tool_call_start|> / <|tool_call_end|> tokens; handles OpenAI and DataClaw tool call formats
  • 🛡️ Error-resilient training: auto-publishes versioned checkpoints on OOM, SIGTERM (Kaggle timeout), KeyboardInterrupt, or any exception
  • 🔑 Flexible HF auth: CLI arg, HF_TOKEN env var, or Kaggle Secrets
  • 📁 GGUF export: Q4_K_M, Q6_K, Q8_0 via llama.cpp
  • 🍎 MLX export: 4-bit, 6-bit, 8-bit via mlx-lm
  • 🏷️ Shared versioning: base model + all quantized variants tagged with the same version
  • 🔓 Full fine-tuning: train all parameters (no LoRA) for maximum quality
  • 📊 Auto-benchmarking: HumanEval, MBPP, MultiPL-E, BigCodeBench, EvalPlus, Tool Calling, GSM8K, ARC
  • 🧠 Reasoning (<think> tags): train models that think before acting, with <think>...</think> traces
  • 🧹 Data quality filters: auto-remove duplicates, empty rows, and length outliers
  • 📈 Eval split: hold out a % for validation loss tracking during training
  • 📝 Auto model card: generates a HuggingFace README.md with config, benchmarks, and hardware
  • 📉 W&B / TensorBoard: optional training metric logging
  • 🎯 DPO / PPO / GRPO alignment: preference tuning after SFT via DPO, classic RLHF (PPO), or DeepSeek-style GRPO
  • 📚 Continued Pre-Training (CPT): train on raw text (books, PDFs, code) to inject domain knowledge
  • 🔗 LoRA adapter merging: combine multiple adapters into a single model with weighted blending
  • 🔍 Auto HP search: try multiple learning rates and pick the best based on eval loss
  • ⚡ DeepSpeed ZeRO: ZeRO-2 and ZeRO-3 for multi-GPU training (optimizer + gradient + weight sharding)
  • 🎯 Model Distillation: compress a large teacher into a smaller student via KL-divergence
  • 📋 Structured Output: train models to generate valid JSON conforming to schemas

Installation

pip install lfm-trainer

Quick Start (Kaggle Notebook)

# Cell 1: Install
!pip install lfm-trainer

# Cell 2: Train on a coding dataset with GGUF export
!lfm-train \
    --dataset peteromallet/dataclaw-peteromallet \
    --hub-repo your-username/lfm-code \
    --export-gguf

# Train on multiple datasets
!lfm-train \
    --dataset code_data.csv \
    --dataset more_code.parquet \
    --dataset sahil2801/CodeAlpaca-20k \
    --hub-repo your-username/lfm-code \
    --epochs 3 \
    --batch-size 2 \
    --export-gguf

The HF token is automatically picked up from Kaggle Secrets (key: HF_TOKEN).
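
To confirm the token is actually reachable before launching a long run, the same lookup order can be reproduced in a notebook cell. This is an illustrative sketch, not lfm-trainer's internal code: resolve_hf_token is a hypothetical helper, while kaggle_secrets.UserSecretsClient is the standard Kaggle notebook API.

import os

def resolve_hf_token(cli_token=None):
    # Hypothetical helper mirroring the documented lookup order:
    # CLI arg, then HF_TOKEN env var, then Kaggle Secrets.
    if cli_token:
        return cli_token
    if os.environ.get("HF_TOKEN"):
        return os.environ["HF_TOKEN"]
    try:
        from kaggle_secrets import UserSecretsClient  # available inside Kaggle notebooks
        return UserSecretsClient().get_secret("HF_TOKEN")
    except Exception:
        return None

print("HF token found:", resolve_hf_token() is not None)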

Python API

import pandas as pd
from lfm_trainer import run_training, load_datasets
from lfm_trainer.config import TrainingConfig

# Load multiple sources including DataFrames
df = pd.read_csv("my_code_data.csv")
dataset = load_datasets([
    df,                                    # Direct DataFrame
    "code_data.parquet",                   # Local file
    "peteromallet/dataclaw-peteromallet",   # HuggingFace Hub (conversational)
])

# Or use the full pipeline
cfg = TrainingConfig(
    model_name="liquid/LFM2.5-1.2B-Base",
    dataset_paths=["peteromallet/dataclaw-peteromallet"],
    hub_repo_id="your-username/lfm-code",
    quality_filter=True,
    eval_split=0.1,
    run_benchmark=True,
    benchmark_before_after=True,
    export_gguf=True,
)
run_training(cfg)

CLI Reference

lfm-train --help
Flag Default Description
--dataset (required) Dataset path or Hub ID (repeatable)
--model liquid/LFM2.5-1.2B-Base Model to fine-tune
--resume-from none Path or Hub ID of a prior adapter for continual training
--tool-calling-only off Keep only samples with tool calls
--quality-filter off Remove empty rows, dupes, length outliers
--eval-split 0.0 Hold out a fraction for eval (e.g. 0.1 = 10%)
--hf-token auto-detect HuggingFace token
--hub-repo auto Hub repo to push to
--no-push off Save locally only, skip Hub push
--epochs 3 Training epochs
--batch-size 2 Per-device batch size
--lr 2e-4 Learning rate
--max-seq-length 2048 Max sequence length
--lora-r 16 LoRA rank
--lora-alpha 32 LoRA alpha
--bf16 off Use bfloat16
--full-finetune off Train all params (no LoRA), needs more VRAM
--report-to none none, wandb, or tensorboard
--benchmark off Run HumanEval + MBPP after training
--benchmark-compare off Also benchmark base model for delta
--benchmark-max all Cap problems for quick testing
--no-model-card off Skip auto model card generation
--export-gguf off Export GGUF (Q4_K_M, Q6_K, Q8_0)
--export-mlx off Export MLX (4/6/8-bit)
--export-dir ./lfm-exports Export scratch directory
--alignment-method dpo Alignment method: dpo, ppo, or grpo
--alignment-dataset none HF dataset for alignment (DPO: chosen/rejected; PPO/GRPO: prompts)
--dpo-beta 0.1 DPO β (higher = more conservative)
--reward-model none HF reward model for PPO
--grpo-generations 4 Completions per prompt for GRPO
--cpt-sources none Raw text sources for CPT (files, dirs, HF datasets)
--cpt-chunk-size 2048 Characters per chunk for CPT
--enable-reasoning off Enable <think> reasoning tags in training data
--reasoning-dataset none HF dataset for reasoning (e.g., LLM360/TxT360-3efforts)
--reasoning-max-samples 100000 Max samples from reasoning dataset
--auto-hp-search off Run auto hyperparameter search before training
--hp-trial-steps 50 Steps per HP search trial
--deepspeed none DeepSpeed config: zero2, zero3, or path to JSON
--distill-teacher none HF model ID of teacher for knowledge distillation
--distill-temperature 2.0 Distillation softmax temperature
--distill-alpha 0.5 Blend factor: 0=CE only, 1=KL only
--structured-output off Mix in JSON schema training data for structured output
--merge-adapters none Merge multiple LoRA adapters (skip training)
--merge-output ./lfm-merged Output dir for merged model
--simulate-error off Test auto-publish mechanism

Supported Dataset Formats

The data loader auto-detects column layouts:

Format Detected By Example
Alpaca instruction + output columns iamtarun/python_code_instructions_18k_alpaca
Prompt/Response prompt + response columns Generic Q&A datasets
Conversational messages column (with tool calls) peteromallet/dataclaw-peteromallet
Text Single text column jdaddyalbs/playwright-mcp-toolcalling
DataFrame Direct pd.DataFrame objects In-memory data
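
To see what these layouts look like in memory, here is a small sketch using the load_datasets helper shown in the Python API section. The rows are made up, and passing two differently shaped DataFrames in a single call is an assumption based on the multi-source support described above.

import pandas as pd
from lfm_trainer import load_datasets

# Alpaca layout: detected by the instruction + output columns.
alpaca_df = pd.DataFrame([{
    "instruction": "Write a function that reverses a string.",
    "input": "",
    "output": "def reverse(s):\n    return s[::-1]",
}])

# Prompt/response layout: detected by the prompt + response columns.
qa_df = pd.DataFrame([{
    "prompt": "How do I read a CSV in pandas?",
    "response": "Use pd.read_csv('file.csv').",
}])

# DataFrames, local files, and Hub IDs can all be passed together.
dataset = load_datasets([alpaca_df, qa_df])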

Tool-Calling Training

Train exclusively on tool-calling examples using --tool-calling-only:

lfm-train \
    --dataset jdaddyalbs/playwright-mcp-toolcalling \
    --tool-calling-only \
    --hub-repo your-username/lfm-tools \
    --max-seq-length 4096

The filter keeps only samples containing tool call patterns (<|tool_call_start|>, tool_calls, function_call, etc.). Works with any dataset, pre-formatted or auto-formatted.
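
As a rough illustration of what that filter checks, the sketch below scans each sample's serialized text for the listed markers. has_tool_call is a hypothetical helper, not the library's internal function.

import json

TOOL_CALL_MARKERS = ("<|tool_call_start|>", "tool_calls", "function_call")

def has_tool_call(sample: dict) -> bool:
    # True if any tool-call marker appears anywhere in the serialized sample.
    text = json.dumps(sample, ensure_ascii=False)
    return any(marker in text for marker in TOOL_CALL_MARKERS)

samples = [
    {"messages": [{"role": "assistant", "tool_calls": [{"name": "browser.click"}]}]},
    {"messages": [{"role": "assistant", "content": "Plain answer, no tools."}]},
]
filtered = [s for s in samples if has_tool_call(s)]  # keeps only the first sample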

Continual Training

Train iteratively across multiple datasets, saving locally between rounds:

# Round 1: Coding skills → save locally
lfm-train --dataset sahil2801/CodeAlpaca-20k --no-push

# Round 2: Add tool-calling on top → save locally
lfm-train \
    --resume-from ./lfm-checkpoints/final-adapter \
    --dataset jdaddyalbs/playwright-mcp-toolcalling \
    --tool-calling-only \
    --output-dir ./lfm-checkpoints-r2 \
    --no-push

# Round 3: Final round → push to Hub + export
lfm-train \
    --resume-from ./lfm-checkpoints-r2/final-adapter \
    --dataset peteromallet/dataclaw-peteromallet \
    --hub-repo your-username/lfm-final \
    --export-gguf

Each --resume-from loads the prior adapter and continues learning from where it left off.
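
Conceptually, resuming from an adapter means reattaching the saved LoRA weights to the base model in trainable mode. A minimal sketch of that idea using the peft library directly (not lfm-trainer's actual code path):

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model, then attach the previously trained adapter in trainable mode.
base = AutoModelForCausalLM.from_pretrained("liquid/LFM2.5-1.2B-Base")
model = PeftModel.from_pretrained(
    base,
    "./lfm-checkpoints/final-adapter",  # adapter saved by the previous round
    is_trainable=True,                  # keep the LoRA weights updatable
)
# `model` can now be handed to a trainer for the next dataset.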

How Auto-Publish Works

Training is wrapped in an error handler inspired by Unsloth:

  1. SIGTERM (Kaggle timeout) → saves + pushes immediately, then exits
  2. CUDA OOM → clears cache, saves + pushes
  3. KeyboardInterrupt → saves + pushes
  4. Any Exception → saves + pushes, then re-raises

Each checkpoint is tagged with a UTC timestamp (e.g., v20260302-153000) so versions never collide.
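
A stripped-down sketch of that guard is shown below. train_with_guard and save_and_push are hypothetical stand-ins for the library's internals, but the signal handler and exception branches mirror the four cases above.

import signal
import datetime
import torch

def save_and_push(model, reason):
    # Hypothetical stand-in: save a checkpoint and push it with a UTC version tag.
    version = datetime.datetime.now(datetime.timezone.utc).strftime("v%Y%m%d-%H%M%S")
    print(f"[{reason}] publishing checkpoint {version}")

def train_with_guard(model, train_fn):
    def on_sigterm(signum, frame):
        save_and_push(model, "SIGTERM (Kaggle timeout)")
        raise SystemExit(0)

    signal.signal(signal.SIGTERM, on_sigterm)
    try:
        train_fn()
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()
        save_and_push(model, "CUDA OOM")
    except KeyboardInterrupt:
        save_and_push(model, "KeyboardInterrupt")
    except Exception as exc:
        save_and_push(model, type(exc).__name__)
        raise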

Post-Training Export

When --export-gguf or --export-mlx is enabled:

  1. LoRA adapters are merged into the base model
  2. GGUF: Converted via llama.cpp → Q4_K_M, Q6_K, Q8_0 → pushed to {repo}-GGUF
  3. MLX: Converted via mlx-lm → 4-bit, 6-bit, 8-bit → pushed to {repo}-MLX-{N}bit
  4. All repos (base + quants) share the same version tag

Note: MLX export requires Apple Silicon. Use --export-gguf on Kaggle (Linux), and --export-mlx locally on Mac.
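
For reference, the merge-then-convert flow can be reproduced by hand roughly as follows. This is a sketch, not the tool's internal code: the llama.cpp script and binary paths assume a local llama.cpp checkout, and the output filenames are made up.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import subprocess

# 1. Merge the LoRA adapter into the base weights and save the merged model.
base = AutoModelForCausalLM.from_pretrained("liquid/LFM2.5-1.2B-Base")
merged = PeftModel.from_pretrained(base, "./lfm-checkpoints/final-adapter").merge_and_unload()
merged.save_pretrained("./lfm-merged")
AutoTokenizer.from_pretrained("liquid/LFM2.5-1.2B-Base").save_pretrained("./lfm-merged")

# 2. Convert to GGUF and quantize with llama.cpp (paths assume a local checkout).
subprocess.run(["python", "llama.cpp/convert_hf_to_gguf.py", "./lfm-merged",
                "--outfile", "./lfm-exports/lfm-f16.gguf"], check=True)
subprocess.run(["llama.cpp/build/bin/llama-quantize",
                "./lfm-exports/lfm-f16.gguf", "./lfm-exports/lfm-Q4_K_M.gguf", "Q4_K_M"],
               check=True)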

Examples

See the examples/ directory for ready-to-run scripts:

Example Description
basic_training.py Simple Alpaca coding fine-tune
tool_calling_training.py Tool-calling-only with playwright MCP
multi_dataset_training.py Combining Hub + local + DataFrame sources
continual_training.py Multi-round training with local saves
benchmark_training.py Train + HumanEval/MBPP benchmark + auto-upload
full_benchmark_suite.py All 5 benchmarks with before/after comparison
benchmark_only.py Evaluate any model without training
full_finetune.py Full parameter training (no LoRA)
wandb_training.py W&B logging with auto API key detection
export_only.py Standalone GGUF/MLX quantization
kaggle_notebook.py Copy-paste Kaggle cells
cpt_raw_text.py Train on books, PDFs, or raw text (CPT)
dpo_alignment.py DPO / PPO / GRPO alignment after SFT
merge_adapters.py Merge multiple LoRA adapters (with weights)
auto_hp_search.py Auto learning rate search before training
recipe_tool_calling.py 🍳 Recipe: tool calling specialist
recipe_reasoning_tools.py 🍳 Recipe: reasoning + tool calling (TxT360)
recipe_from_scratch.py 🍳 Recipe: domain expert from books/blogs
deepspeed_training.py DeepSpeed ZeRO-2 and ZeRO-3 multi-GPU
distillation.py Distill 7B teacher → 1.2B student
structured_output.py JSON schema training + validation

📚 Documentation

New to LLM fine-tuning? Start here; no prerequisites beyond basic Python:

# Guide What you'll learn
1 What is an LLM? Tokens, embeddings, attention, transformers
2 How Training Works Loss functions, backprop, gradient descent
3 Fine-Tuning vs Scratch Transfer learning, catastrophic forgetting
4 LoRA Explained Low-rank adapters, the math, pure Python impl
5 Full Fine-Tuning Gradient checkpointing, memory management
6 Data Preparation Formats, tokenization, quality filters
7 Evaluation & Benchmarks HumanEval, MBPP, pass@k metric
8 Quantization & Export GGUF, MLX, INT4/INT8
9 Architecture Deep-Dive How lfm-trainer is built
10 DPO, PPO, GRPO & Alignment DPO/PPO/GRPO math, datasets, pipeline
11 Continued Pre-Training (CPT) Train on books, raw text, domain knowledge
12 Reasoning & Thinking <think> tags, TxT360, model recipes, benchmarks
13 DeepSpeed & Distillation ZeRO-2/3, knowledge distillation, teacher → student
14 Structured Output JSON-mode training, schema validation, benchmarks

License

MIT


Download files

Download the file for your platform.

Source Distribution

lfm_trainer-0.12.0.tar.gz (265.8 kB)

Built Distribution

lfm_trainer-0.12.0-py3-none-any.whl (60.8 kB)

File details

Details for the file lfm_trainer-0.12.0.tar.gz.

File metadata

  • Download URL: lfm_trainer-0.12.0.tar.gz
  • Size: 265.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.24 (macOS)

File hashes

Hashes for lfm_trainer-0.12.0.tar.gz
Algorithm Hash digest
SHA256 67c37918cbe39bf6a327ae95b6dc34ade5810af3138ab9bf808c4966606bb65f
MD5 cff446d93f703b3233e04c640f881f53
BLAKE2b-256 36b95aadcb9c0c553687b1425ea61a3c606b5939801e244c32169e64ef7595c6

File details

Details for the file lfm_trainer-0.12.0-py3-none-any.whl.

File metadata

  • Download URL: lfm_trainer-0.12.0-py3-none-any.whl
  • Size: 60.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.24 (macOS)

File hashes

Hashes for lfm_trainer-0.12.0-py3-none-any.whl
Algorithm Hash digest
SHA256 bdcfe2b91e234ecccf031a50ed382f69b320e3e804cf45cd569f69848e81c8c9
MD5 6ac42aae86b3aa91ba0705c577d1ee5c
BLAKE2b-256 af152b2d016b243d388b8a82b9c54bfb4cad2e3cb4a7225368f167c84a72a5a1
