
Extract datasets from models and train slimmer LoRAs on them


unfat

Automates training small, slim Llama 3.1-based LoRAs with known-good configs for sequences up to 8192 tokens, so you don't have to think about the system-level details of model training: you can focus on curating good datasets and selecting training parameters instead of fiddling with batch sizes and gradient accumulation steps just to get your training job to run. Multi-GPU training is handled automatically when necessary.

Includes helpers for:

  • Extracting distillation data from existing models
  • Pulling training data from Hugging Face datasets and/or JSONL files
  • Training models with known-good configurations on your own GPUs, or on Together.ai's cloud-hosted finetuning platform
  • Tracking training and eval progress on Weights & Biases
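
To get started, install unfat from PyPI (or use your preferred Python package manager):

pip install unfat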

Why LoRAs?

LoRAs are fast and cheap to train, and produce tiny adapter files that can be kept in VRAM efficiently while still significantly improving task performance over the underlying base model. For example, this R1 distill LoRA of Llama 3.1 70B Instruct improves MATH-500 and GPQA-Diamond performance by 50%, and doubles AIME24 performance, compared to the untrained model. Sites like GLHF support running arbitrary LoRAs of certain base models at the same cheap per-token prices as the underlying base models; typically this is a lot cheaper than renting enough GPUs to run a full-parameter finetune.

You can do much more than improve benchmark scores, though: you can modify models pretty much however you want. For example, this 70B LoRA uncensors Llama 3.1 70B by distilling from a larger uncensored model, something that isn't possible with prompt engineering alone.


Extracting distillation data

Let's train a quick Llama 3.1 8B Instruct LoRA by distilling DeepSeek-R1. First, we'll get some datasets and extract completions from R1 by querying the glhf.chat API:

from unfat.datasets import hub_prompts, hub_subsets, HubSplit, Dataset, HubSubset
from unfat.extract import Extractor
from unfat.client import OpenAiCompatClient
import os

output_dir = "output"
extractor = Extractor(
    max_concurrent=30,
    output_dir=output_dir,
    client=OpenAiCompatClient(
        model="hf:deepseek-ai/DeepSeek-R1",
        base_url="https://glhf.chat/api/openai/v1",
        api_key=os.environ["GLHF_API_KEY"],
    ),
    dataset=Dataset(
        train=[
            # Use some simple chat messages to extract prompts that need less
            # thinking:
            hub_prompts(
                name="mlabonne/harmless_alpaca",
                text_field="text",
                split=HubSplit(name="train", max_rows=100),
            ),
            # Use a few rows of each subset of the train set of hendrycks_math
            # to extract harder prompts:
            hub_subsets(
                name="EleutherAI/hendrycks_math",
                text_field="problem",
                subsets=[
                    HubSubset(
                        name="geometry",
                        split=HubSplit(name="train", max_rows=30),
                    ),
                    HubSubset(
                        name="intermediate_algebra",
                        split=HubSplit(name="train", max_rows=30),
                    ),
                    HubSubset(
                        name="number_theory",
                        split=HubSplit(name="train", max_rows=30),
                    ),
                    HubSubset(
                        name="precalculus",
                        split=HubSplit("train", max_rows=30),
                    ),
                ],
            ),
        ],
        eval=[
            # Test on the test sets
            hub_prompts(
                name="mlabonne/harmless_alpaca",
                text_field="text",
                split=HubSplit(name="test", max_rows=10),
            ),
            hub_subsets(
                name="EleutherAI/hendrycks_math",
                text_field="problem",
                subsets=[
                    HubSubset(
                        name="geometry",
                        split=HubSplit(name="test", max_rows=30),
                    ),
                    HubSubset(
                        name="intermediate_algebra",
                        split=HubSplit(name="test", max_rows=30),
                    ),
                    HubSubset(
                        name="number_theory",
                        split=HubSplit(name="test", max_rows=30),
                    ),
                    HubSubset(
                        name="precalculus",
                        split=HubSplit("test", max_rows=30),
                    ),
                ],
            ),
        ],
    ),
)

Next, let's run the extraction. This should take around 10 minutes and cost around $7 in API credits:

extractor.run()

Now you should have all the data you need for training. Unfat can generate training jobs for you in two ways:

  1. By generating Axolotl configs you can run on A100s/H100s, or
  2. By creating jobs on Together.ai's fine-tuning platform.

If you have your own A100/H100 GPUs, we recommend using Axolotl. Otherwise, we recommend running the jobs on Together.ai for simplicity.

Finetune using Axolotl

Axolotl is an open-source fine-tuning framework. Unfat can automatically generate Axolotl training configs for you by making some assumptions:

  • For Llama 3.1 8B finetunes, we assume one H100/A100 GPU is being used.
  • For Llama 3.1 70B finetunes, we assume 8xH100s or 8xA100s.

If you don't have machines of this size yourself, we recommend using Runpod to rent them.

To generate the configs:

from unfat.axolotl import llama_3_1_8b_axolotl
from unfat.lora import LoraSettings

lora_settings = LoraSettings(
    rank=32,
    alpha=16,
    dropout=0.01,
    num_epochs=8,
    learning_rate=4e-4,
)
train_config = llama_3_1_8b_axolotl(
    dataset=extractor.output_dataset(),
    settings=lora_settings,
    warmup_steps=10,
)

train_config.save(output_dir)

Now you should have a config.yaml in your output/ directory. Once you've installed and set up Axolotl according to its setup guide, simply run:

axolotl train ./output/config.yaml

Finetune using Together.ai

If you don't want to manage GPUs yourself, Unfat supports automatically uploading and starting jobs on Together.ai's finetuning platform. First, create an account and export a TOGETHER_API_KEY in your shell environment. Then you can create and launch the job:

from unfat.together import llama_8b_together
from unfat.lora import LoraSettings

together_config = llama_8b_together(
    output_dir=output_dir,
    dataset=extractor.output_dataset(),
    settings=LoraSettings(
        rank=32,
        alpha=16,
        dropout=0.01,
        num_epochs=8,
        learning_rate=4e-4,
    ),
    api_key=os.environ["TOGETHER_API_KEY"],
)
uploaded_files = together_config.upload_files()
together_config.finetune(uploaded_files)

This should take around 10 minutes and cost around $6 in credits.

Once it's done, you can log into your Together account and download the final LoRA checkpoint. Together (unfortunately) generates an invalid adapter_config.json: it sets base_model_name_or_path to an internally hosted model rather than the actual base model. Make sure to rewrite it to "meta-llama/Meta-Llama-3.1-8B-Instruct" before publishing or pushing to Hugging Face.
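
A minimal way to patch it, assuming the checkpoint was downloaded to ./lora-checkpoint (path is a placeholder):

import json

config_path = "./lora-checkpoint/adapter_config.json"  # hypothetical download location

with open(config_path) as f:
    adapter_config = json.load(f)

# Point the adapter at the public base model instead of Together's internal name
adapter_config["base_model_name_or_path"] = "meta-llama/Meta-Llama-3.1-8B-Instruct"

with open(config_path, "w") as f:
    json.dump(adapter_config, f, indent=2)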

Run on GLHF

Push your model to Hugging Face, and then copy+paste the link to your Hugging Face repo into GLHF. That's it!
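
If you haven't pushed an adapter before, here's a minimal sketch using the huggingface_hub library; the repo name and local checkpoint path are placeholders:

from huggingface_hub import HfApi

api = HfApi()  # authenticates via HF_TOKEN or a cached `huggingface-cli login`

# Create the repo if it doesn't exist yet, then upload the adapter files
api.create_repo("your-username/your-lora-name", exist_ok=True)
api.upload_folder(
    folder_path="./lora-checkpoint",
    repo_id="your-username/your-lora-name",
)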

Run locally with Ollama

First, you'll need to convert the LoRA to GGUF using llama.cpp. Clone the repo and install its dependencies:

git clone git@github.com:ggml-org/llama.cpp.git
cd llama.cpp

# Install Python deps
python -m venv llamacpp
source llamacpp/bin/activate
python -m pip install -r requirements.txt

Then, convert the LoRA adapter to GGUF:

python convert_lora_to_gguf.py ./path-to-your-lora-directory

Next, create an Ollama Modelfile file with the following contents:

# The base model must match the one your LoRA was trained on, e.g. llama3.1:8b
FROM modelname:version
ADAPTER ./path-to-gguf-file

Then, register your new model locally:

ollama create your-model-name -f ./Modelfile

Finally, start the server:

ollama serve

This serves your model over Ollama's local API.
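
With the server running, you can chat with your new model from another terminal (using whatever name you registered above):

ollama run your-model-name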

Training on your own JSONL files

You're not limited to distilling from larger models! You can also train on local JSONL-formatted files. Each line should be a JSON object of the following form:

{ messages: Array<{ role: "user" | "assistant", content: string }> }
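
For example, a single line might look like this (contents purely illustrative):

{"messages": [{"role": "user", "content": "What is the capital of France?"}, {"role": "assistant", "content": "The capital of France is Paris."}]}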

The model will learn to produce the assistant messages. To train on JSONL files, use the following:

from unfat.datasets import Dataset, JsonlConvos
dataset = Dataset(
  train=[
    JsonlConvos(path="./path/to/jsonl/file.jsonl"),
  ]
)

Datasets can be merged, so if you have some distillation data and a local JSONL file, you could do something like:

dataset = extractor.output_dataset().merge(Dataset(
  train=[
    JsonlConvos(path="./path/to/jsonl/file.jsonl"),
  ],
))

Training on Hugging Face datasets

You can also train on datasets from the Hugging Face hub. We expose two kinds of Hugging Face datasets: instruction-formatted datasets, and conversation-formatted datasets. For instruction-formatted datasets, use:

from unfat.datasets import HubInstructConvos

dataset = HubInstructConvos(
  name="vicgalle/alpaca-gpt4",
  splits=["train"],

  instruction_field="instruction", # optional -- this is the default
  input_field="input", # optional -- this is the default
  output_field="output", # optional -- this is the default
)

The model will learn to give the output when prompted with the instruction + input fields.

You can also use conversational Hugging Face datasets like so:

from unfat.datasets import HubMessageConvos

dataset = HubMessageConvos(
  name="cgato/SlimOrcaDedupCleaned",
  splits=["train"],
  messages_field="conversations", # optional -- the default is "messages"
  role_field="from", # optional -- the default is "role"
  content_field="value", # optional -- the default is "content"
  user_role="human", # optional -- the default is "user"
  assistant_role="gpt", # optional -- the default is "assistant"
  system_role="system", # optional -- this is the default
)

Tracking with Weights & Biases

The LoraSettings dataclass can take a W&B project name and API key:

lora_settings = LoraSettings(
    rank=32,
    alpha=16,
    dropout=0.01,
    num_epochs=8,
    learning_rate=4e-4,
    wandb_project="r1-8b-distill",
    wandb_api_key=os.environ["WANDB_API_KEY"],
)

The wandb_api_key will be automatically used by the Together finetuner, but for the Axolotl trainer, you'll have to make sure to export a WANDB_API_KEY environment variable wherever you run the Axolotl config.
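
For example, when launching the Axolotl job (the key value is a placeholder):

export WANDB_API_KEY=your-wandb-api-key
axolotl train ./output/config.yaml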

Anthropic-compatible clients

Unfat also supports distilling from Anthropic-compatible APIs. Instead of using the OpenAiCompatClient, use the AnthropicCompatClient:

AnthropicCompatClient(
    model="claude-3-7-sonnet-20250219",
    max_tokens=4096,
    thinking_budget=2048,
    api_key=os.environ["ANTHROPIC_API_KEY"],
)
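
You pass it to the Extractor exactly like the OpenAI-compatible client. A minimal sketch, reusing the dataset defined in the extraction example above:

from unfat.extract import Extractor
from unfat.client import AnthropicCompatClient
import os

extractor = Extractor(
    max_concurrent=30,
    output_dir="output",
    client=AnthropicCompatClient(
        model="claude-3-7-sonnet-20250219",
        max_tokens=4096,
        thinking_budget=2048,
        api_key=os.environ["ANTHROPIC_API_KEY"],
    ),
    dataset=dataset,  # the same Dataset object built in the extraction example
)
extractor.run()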
