Fine-tune LFM 2.5 1.2B for coding tasks on Kaggle multi-GPU with auto-publish to Hugging Face

Project description

LFM Trainer

Fine-tune Liquid LFM 2.5 1.2B for coding tasks on Kaggle multi-GPU — with automatic checkpoint publishing to Hugging Face on errors.

Features

  • 🚀 Multi-GPU training via HuggingFace Accelerate / DDP
  • 🧠 LoRA / PEFT for memory-efficient fine-tuning on Kaggle T4s
  • 📦 Structured dataset loading — CSV, Parquet, JSONL, or HuggingFace Hub IDs — with auto-format detection (sketched after this list)
  • 🛡️ Error-resilient training — auto-publishes versioned checkpoints to Hugging Face on OOM, SIGTERM (Kaggle timeout), KeyboardInterrupt, or any exception
  • 🔑 Flexible HF auth — CLI arg, HF_TOKEN env var, or Kaggle Secrets
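
The auto-format detection amounts to a dispatch on file extension, falling back to treating the argument as a Hub dataset ID. A minimal sketch using the datasets library — load_any is a hypothetical helper, not the package's actual internals:

# Sketch of extension-based dataset loading (hypothetical helper).
from datasets import load_dataset

def load_any(path_or_id: str):
    """Load a CSV/Parquet/JSONL file, or fall back to a HF Hub dataset ID."""
    if path_or_id.endswith(".csv"):
        return load_dataset("csv", data_files=path_or_id, split="train")
    if path_or_id.endswith(".parquet"):
        return load_dataset("parquet", data_files=path_or_id, split="train")
    if path_or_id.endswith((".jsonl", ".json")):
        return load_dataset("json", data_files=path_or_id, split="train")
    # Anything else is treated as a Hub ID, e.g. "sahil2801/CodeAlpaca-20k"
    return load_dataset(path_or_id, split="train")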

Installation

pip install lfm-trainer

Quick Start (Kaggle Notebook)

# In a Kaggle cell:
!pip install lfm-trainer

# Train on a single dataset
!lfm-train --dataset iamtarun/python_code_instructions_18k_alpaca --hub-repo your-username/lfm-code

# Train on multiple datasets
!lfm-train \
    --dataset code_data.csv \
    --dataset more_code.parquet \
    --dataset sahil2801/CodeAlpaca-20k \
    --hub-repo your-username/lfm-code \
    --epochs 3 \
    --batch-size 2

The HF token is automatically picked up from Kaggle Secrets (key: HF_TOKEN).
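
The lookup order is: explicit --hf-token flag, then the HF_TOKEN environment variable, then Kaggle Secrets. A sketch of that resolution — resolve_hf_token is a hypothetical name, though UserSecretsClient is Kaggle's real secrets client:

import os

def resolve_hf_token(cli_token=None):
    # 1. Explicit --hf-token flag wins
    if cli_token:
        return cli_token
    # 2. HF_TOKEN environment variable
    if os.environ.get("HF_TOKEN"):
        return os.environ["HF_TOKEN"]
    # 3. Kaggle Secrets (only importable inside a Kaggle notebook)
    try:
        from kaggle_secrets import UserSecretsClient
        return UserSecretsClient().get_secret("HF_TOKEN")
    except Exception:
        return None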

CLI Reference

lfm-train --help

Flag               Default                   Description
--dataset          (required)                Dataset path or Hub ID (repeatable)
--model            liquid/LFM2.5-1.2B-Base   Model to fine-tune
--hf-token         auto-detect               HuggingFace token
--hub-repo         auto                      Hub repo to push to
--epochs           3                         Training epochs
--batch-size       2                         Per-device batch size
--lr               2e-4                      Learning rate
--max-seq-length   2048                      Max sequence length
--lora-r           16                        LoRA rank
--lora-alpha       32                        LoRA alpha
--bf16             off                       Use bfloat16
--simulate-error   off                       Test auto-publish
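
For orientation, the LoRA defaults above map onto a peft.LoraConfig roughly as follows. This is a sketch, not the package's code; in particular, target_modules is an illustrative assumption, since the real module names depend on the model architecture:

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                                 # --lora-r
    lora_alpha=32,                        # --lora-alpha
    target_modules=["q_proj", "v_proj"],  # assumption: illustrative only
    task_type="CAUSAL_LM",
)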

How Auto-Publish Works

The training loop is wrapped in an error handler inspired by Unsloth:

  1. SIGTERM (Kaggle timeout) → saves + pushes immediately, then exits
  2. CUDA OOM → clears cache, saves + pushes
  3. KeyboardInterrupt → saves + pushes
  4. Any Exception → saves + pushes, then re-raises

Each checkpoint is tagged with a UTC timestamp (e.g., v20260302-153000) so versions never collide.
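
Condensed into a sketch — trainer and publish stand in for the package's internals; only the control flow is taken from the list above:

import signal
import sys
import torch
from datetime import datetime, timezone

def version_tag():
    # e.g. "v20260302-153000", as in the example above
    return datetime.now(timezone.utc).strftime("v%Y%m%d-%H%M%S")

def train_with_autopublish(trainer, publish):
    def on_sigterm(signum, frame):        # 1. Kaggle timeout
        publish(tag=version_tag())
        sys.exit(0)
    signal.signal(signal.SIGTERM, on_sigterm)
    try:
        trainer.train()
    except torch.cuda.OutOfMemoryError:   # 2. CUDA OOM
        torch.cuda.empty_cache()
        publish(tag=version_tag())
    except KeyboardInterrupt:             # 3. manual interrupt
        publish(tag=version_tag())
    except Exception:                     # 4. any other error
        publish(tag=version_tag())
        raise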

License

MIT


Download files

Download the file for your platform.

Source Distribution

lfm_trainer-0.1.0.tar.gz (172.4 kB)

Built Distribution

lfm_trainer-0.1.0-py3-none-any.whl (12.5 kB)

File details

Details for the file lfm_trainer-0.1.0.tar.gz.

File metadata

  • Download URL: lfm_trainer-0.1.0.tar.gz
  • Upload date:
  • Size: 172.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.24

File hashes

Hashes for lfm_trainer-0.1.0.tar.gz

Algorithm    Hash digest
SHA256       24aeba7b0f0116dc78bcb41c14d8d82c07b5172adcbcad8ae5aeb05ecd726964
MD5          829b836b41efe5a317e6c3be72ff6939
BLAKE2b-256  2e24b8f68131e29b1c1638f84a3dabc026263164d002b5a398b8e43f86680b0e

File details

Details for the file lfm_trainer-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: lfm_trainer-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 12.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.24

File hashes

Hashes for lfm_trainer-0.1.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       90244f4929846004430e0299bf5f4b2fe93d4bd5d3e03ee8baa1102f29743715
MD5          a235c8b59a43dedddbb6c34274396107
BLAKE2b-256  453c39e0498fe47fd44777d2897c905566888591461e181097e9d95b314c30ef
