
LFM Trainer

Fine-tune Liquid LFM 2.5 1.2B for coding tasks on Kaggle multi-GPU — with automatic checkpoint publishing to Hugging Face on errors.

Features

  • 🚀 Multi-GPU training via Hugging Face Accelerate / DDP
  • 🧠 LoRA / PEFT for memory-efficient fine-tuning on Kaggle T4s
  • 📦 Structured dataset loading — CSV, Parquet, JSONL, or Hugging Face Hub IDs — with auto-format detection (sketched after this list)
  • 🛡️ Error-resilient training — auto-publishes versioned checkpoints to Hugging Face on CUDA OOM, SIGTERM (Kaggle timeout), KeyboardInterrupt, or any other exception
  • 🔑 Flexible HF auth — CLI arg, HF_TOKEN env var, or Kaggle Secrets
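
As a rough illustration, auto-format detection can be as simple as an extension check. A minimal sketch, assuming the Hugging Face datasets library (load_any is a hypothetical helper, not the package's actual internals):

from datasets import load_dataset

def load_any(path_or_id: str):
    # Pick a loader from the file extension; anything else is
    # treated as a Hugging Face Hub dataset ID.
    if path_or_id.endswith(".csv"):
        return load_dataset("csv", data_files=path_or_id, split="train")
    if path_or_id.endswith(".parquet"):
        return load_dataset("parquet", data_files=path_or_id, split="train")
    if path_or_id.endswith((".json", ".jsonl")):
        return load_dataset("json", data_files=path_or_id, split="train")
    return load_dataset(path_or_id, split="train")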

Installation

pip install lfm-trainer

Quick Start (Kaggle Notebook)

# In a Kaggle cell:
!pip install lfm-trainer

# Train on a single dataset
!lfm-train --dataset iamtarun/python_code_instructions_18k_alpaca --hub-repo your-username/lfm-code

# Train on multiple datasets
!lfm-train \
    --dataset code_data.csv \
    --dataset more_code.parquet \
    --dataset sahil2801/CodeAlpaca-20k \
    --hub-repo your-username/lfm-code \
    --epochs 3 \
    --batch-size 2

The HF token is automatically picked up from Kaggle Secrets (key: HF_TOKEN).
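
Per the feature list above, token lookup presumably falls back CLI arg → HF_TOKEN env var → Kaggle Secrets. A minimal sketch of that chain (resolve_hf_token is a hypothetical name; UserSecretsClient is the real Kaggle notebook API):

import os

def resolve_hf_token(cli_token=None):
    # 1. Explicit --hf-token wins.
    if cli_token:
        return cli_token
    # 2. Fall back to the HF_TOKEN environment variable.
    if os.environ.get("HF_TOKEN"):
        return os.environ["HF_TOKEN"]
    # 3. Finally, try Kaggle Secrets (import lazily: the module
    #    only exists inside Kaggle notebooks).
    try:
        from kaggle_secrets import UserSecretsClient
        return UserSecretsClient().get_secret("HF_TOKEN")
    except Exception:
        return None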

CLI Reference

lfm-train --help
| Flag | Default | Description |
| --- | --- | --- |
| --dataset | (required) | Dataset path or Hub ID (repeatable) |
| --model | liquid/LFM2.5-1.2B-Base | Model to fine-tune |
| --hf-token | auto-detect | Hugging Face token |
| --hub-repo | auto | Hub repo to push to |
| --epochs | 3 | Training epochs |
| --batch-size | 2 | Per-device batch size |
| --lr | 2e-4 | Learning rate |
| --max-seq-length | 2048 | Max sequence length |
| --lora-r | 16 | LoRA rank |
| --lora-alpha | 32 | LoRA alpha |
| --bf16 | off | Use bfloat16 |
| --simulate-error | off | Test auto-publish |

How Auto-Publish Works

The training loop is wrapped in an error handler inspired by Unsloth:

  1. SIGTERM (Kaggle timeout) → saves + pushes immediately, then exits
  2. CUDA OOM → clears cache, saves + pushes
  3. KeyboardInterrupt → saves + pushes
  4. Any Exception → saves + pushes, then re-raises

Each checkpoint is tagged with a UTC timestamp (e.g., v20260302-153000) so versions never collide.
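
A minimal sketch of what such a wrapper can look like (run_guarded, save_and_push, and trainer are illustrative stand-ins, not the package's actual names):

import signal
import sys
from datetime import datetime, timezone

import torch

def checkpoint_tag():
    # UTC timestamp, e.g. v20260302-153000, so versions never collide.
    return datetime.now(timezone.utc).strftime("v%Y%m%d-%H%M%S")

def run_guarded(trainer, save_and_push):
    # 1. SIGTERM (Kaggle timeout): save + push immediately, then exit.
    def on_sigterm(signum, frame):
        save_and_push(tag=checkpoint_tag())
        sys.exit(0)
    signal.signal(signal.SIGTERM, on_sigterm)

    try:
        trainer.train()
    except torch.cuda.OutOfMemoryError:
        # 2. CUDA OOM: clear the cache, then save + push.
        torch.cuda.empty_cache()
        save_and_push(tag=checkpoint_tag())
    except KeyboardInterrupt:
        # 3. Manual interrupt: save + push.
        save_and_push(tag=checkpoint_tag())
    except Exception:
        # 4. Anything else: save + push, then re-raise.
        save_and_push(tag=checkpoint_tag())
        raise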

License

MIT
