Fine-tune LFM 2.5 1.2B for coding tasks on Kaggle multi-GPU with auto-publish to Hugging Face
Project description
LFM Trainer
Fine-tune Liquid LFM 2.5 1.2B for coding tasks on Kaggle multi-GPU — with automatic checkpoint publishing to Hugging Face on errors.
Features
- 🚀 Multi-GPU training via Hugging Face Accelerate / DDP
- 🧠 LoRA / PEFT for memory-efficient fine-tuning on Kaggle T4s
- 📦 Structured dataset loading — CSV, Parquet, JSONL, or Hugging Face Hub IDs — with auto-format detection
- 🛡️ Error-resilient training — auto-publishes versioned checkpoints to Hugging Face on OOM, SIGTERM (Kaggle timeout), KeyboardInterrupt, or any exception
- 🔑 Flexible HF auth — CLI arg, HF_TOKEN env var, or Kaggle Secrets
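The auto-format detection mentioned in the features can be pictured as a small dispatch on the file extension, falling back to treating the spec as a Hub ID. This is an illustrative sketch only — `detect_format` is a hypothetical helper, and the package's real loader logic may differ:

```python
from pathlib import Path

def detect_format(spec: str) -> str:
    """Guess how to load a dataset spec: local file by extension, else a Hub ID."""
    suffix = Path(spec).suffix.lower()
    if suffix == ".csv":
        return "csv"
    if suffix == ".parquet":
        return "parquet"
    if suffix in (".jsonl", ".json"):
        return "json"
    # No recognised extension: treat "user/name" strings as Hugging Face Hub IDs
    return "hub"
```

With this scheme, `detect_format("code_data.csv")` resolves to `"csv"` while `detect_format("sahil2801/CodeAlpaca-20k")` resolves to `"hub"`, which is why the CLI can accept both kinds of `--dataset` values interchangeably.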
Installation
pip install lfm-trainer
Quick Start (Kaggle Notebook)
# In a Kaggle cell:
!pip install lfm-trainer
# Train on a single dataset
!lfm-train --dataset iamtarun/python_code_instructions_18k_alpaca --hub-repo your-username/lfm-code
# Train on multiple datasets
!lfm-train \
--dataset code_data.csv \
--dataset more_code.parquet \
--dataset sahil2801/CodeAlpaca-20k \
--hub-repo your-username/lfm-code \
--epochs 3 \
--batch-size 2
The HF token is automatically picked up from Kaggle Secrets (key: HF_TOKEN).
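The token lookup can be thought of as a priority chain: explicit CLI arg, then the HF_TOKEN environment variable, then Kaggle Secrets. The sketch below is illustrative — `resolve_hf_token` is a hypothetical helper, not the package's actual code:

```python
import os

def resolve_hf_token(cli_token=None):
    """Resolve the HF token: CLI arg > HF_TOKEN env var > Kaggle Secrets."""
    if cli_token:
        return cli_token
    env_token = os.environ.get("HF_TOKEN")
    if env_token:
        return env_token
    try:
        # kaggle_secrets is only importable inside Kaggle notebooks
        from kaggle_secrets import UserSecretsClient
        return UserSecretsClient().get_secret("HF_TOKEN")
    except Exception:
        return None
```

Outside Kaggle, the final fallback simply returns `None`, at which point pushing to the Hub would require one of the first two options.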
CLI Reference
lfm-train --help
| Flag | Default | Description |
|---|---|---|
| --dataset | (required) | Dataset path or Hub ID (repeatable) |
| --model | liquid/LFM2.5-1.2B-Base | Model to fine-tune |
| --hf-token | auto-detect | Hugging Face token |
| --hub-repo | auto | Hub repo to push to |
| --epochs | 3 | Training epochs |
| --batch-size | 2 | Per-device batch size |
| --lr | 2e-4 | Learning rate |
| --max-seq-length | 2048 | Max sequence length |
| --lora-r | 16 | LoRA rank |
| --lora-alpha | 32 | LoRA alpha |
| --bf16 | off | Use bfloat16 |
| --simulate-error | off | Test auto-publish |
How Auto-Publish Works
The training loop is wrapped in an error handler inspired by Unsloth:
- SIGTERM (Kaggle timeout) → saves + pushes immediately, then exits
- CUDA OOM → clears cache, saves + pushes
- KeyboardInterrupt → saves + pushes
- Any Exception → saves + pushes, then re-raises
Each checkpoint is tagged with a UTC timestamp (e.g., v20260302-153000) so versions never collide.
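The behavior above can be sketched as a thin wrapper around the training loop. This is a minimal sketch, not the package's real API: `train_step` and `save_and_push` are hypothetical hooks, and the real handler additionally clears the CUDA cache on OOM before pushing:

```python
import signal
import sys
import time

def make_checkpoint_tag() -> str:
    # UTC timestamp tag, e.g. v20260302-153000, so versions never collide
    return time.strftime("v%Y%m%d-%H%M%S", time.gmtime())

def train_with_rescue(train_step, save_and_push):
    """Run train_step(); on any interruption, publish a checkpoint first."""
    def on_sigterm(signum, frame):
        # Kaggle timeout: save + push immediately, then exit
        save_and_push(make_checkpoint_tag())
        sys.exit(0)

    signal.signal(signal.SIGTERM, on_sigterm)
    try:
        train_step()
    except KeyboardInterrupt:
        save_and_push(make_checkpoint_tag())
    except Exception:
        save_and_push(make_checkpoint_tag())
        raise  # re-raise after publishing, as described above
```

Re-raising after the push preserves the original traceback in the notebook output while still guaranteeing a versioned checkpoint lands on the Hub.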
License
MIT
Download files
Download the file for your platform.
Source Distribution
Built Distribution
File details
Details for the file lfm_trainer-0.1.0.tar.gz.
File metadata
- Download URL: lfm_trainer-0.1.0.tar.gz
- Upload date:
- Size: 172.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.24
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 24aeba7b0f0116dc78bcb41c14d8d82c07b5172adcbcad8ae5aeb05ecd726964 |
| MD5 | 829b836b41efe5a317e6c3be72ff6939 |
| BLAKE2b-256 | 2e24b8f68131e29b1c1638f84a3dabc026263164d002b5a398b8e43f86680b0e |
File details
Details for the file lfm_trainer-0.1.0-py3-none-any.whl.
File metadata
- Download URL: lfm_trainer-0.1.0-py3-none-any.whl
- Upload date:
- Size: 12.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.24
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 90244f4929846004430e0299bf5f4b2fe93d4bd5d3e03ee8baa1102f29743715 |
| MD5 | a235c8b59a43dedddbb6c34274396107 |
| BLAKE2b-256 | 453c39e0498fe47fd44777d2897c905566888591461e181097e9d95b314c30ef |