Fine-tune an LLM with Unsloth

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

local_llm_finetune

Fine-tune an LLM with Unsloth.

Installation

Install Unsloth with:

conda create --name unsloth_env \
    python=3.11 \
    pytorch-cuda=12.1 \
    pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers \
    -y
conda activate unsloth_env

pip install unsloth

Then install the library with:

pip install local-llm-finetune

Usage

Create training data

The library uses a pandas DataFrame as the basis of the training data. The DataFrame should have two columns, query and response, holding what you ask the model and what it should answer. If your data is not in this shape and you only have raw documents, the library can convert them into this format.
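If you already have question-answer pairs, a DataFrame in the expected shape can be built directly. The following is a minimal sketch; the example rows are made up for illustration, and only the two column names come from the description above.

```python
import pandas as pd

# Hypothetical example rows; replace with your own query/response pairs.
dataset = pd.DataFrame(
    {
        "query": [
            "What is the capital of France?",
            "Summarize: The quick brown fox jumps over the lazy dog.",
        ],
        "response": [
            "The capital of France is Paris.",
            "A fox jumps over a dog.",
        ],
    }
)

# Basic sanity checks before training: both columns present, no missing values.
assert {"query", "response"}.issubset(dataset.columns)
assert not dataset[["query", "response"]].isna().any().any()
```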

First, convert your documents into .txt format. Then, create a metadata.csv file. This file can contain any columns you like, but it must have at least one column named filepath, which contains the filenames of the .txt files.
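One way to produce such a file is to list the .txt files in a directory and write their names to a CSV. The sketch below uses only the standard library; the helper name write_metadata and the extra title column are illustrative assumptions, not part of the library. Only the filepath column is required by the description above.

```python
import csv
from pathlib import Path


def write_metadata(files_dir, output_csv):
    """Write a metadata CSV with one row per .txt file found in files_dir."""
    txt_files = sorted(p.name for p in Path(files_dir).glob("*.txt"))
    with open(output_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["filepath", "title"])
        writer.writeheader()
        for name in txt_files:
            # "title" is an optional, illustrative extra column.
            writer.writerow({"filepath": name, "title": Path(name).stem})
    return txt_files
```

Calling write_metadata("path_to_txt_files/", "path/metadata.csv") would then produce a file that can be loaded with pd.read_csv.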

There are two methods for converting the documents to query-response pairs:

1. Synthetic LLM responses
2. Text completion responses

In both cases, the documents are split into chunks whose length is set by the chunk_size parameter.
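To make the chunking idea concrete, here is a minimal sketch of overlapping chunking. It is not the library's implementation: the function name chunk_words is hypothetical, and it assumes that chunk sizes are measured in whitespace-separated words and that consecutive chunks share a fixed number of words.

```python
def chunk_words(text, chunk_size, chunk_overlap=0):
    """Split text into chunks of chunk_size words, overlapping by chunk_overlap words."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

A larger overlap keeps more shared context between neighbouring chunks at the cost of producing more, partly redundant, training rows.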

The first method queries an existing LLM with a prompt built from the prompt_format parameter. This string should look like "Metadata: {}, instruction: {}", where the first pair of braces is filled in with the document's metadata and the second with the chunk's content. The filled-in prompt becomes the query column of the output dataset, and the LLM's response becomes the response column.
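The two placeholders behave like positional fields in Python's str.format. The sketch below shows that substitution with made-up values; how the library actually renders a metadata row into text is not documented here, so passing a plain dict is an assumption for illustration.

```python
prompt_format = "Metadata: {}, instruction: {}"

# Hypothetical metadata row and chunk, for illustration only.
metadata_row = {"filepath": "report.txt", "year": 2024}
chunk = "Quarterly revenue rose by five percent."

# The first {} receives the metadata, the second {} receives the chunk text.
query = prompt_format.format(metadata_row, chunk)
print(query)
```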

The second method splits each chunk into two sections, with the ratio determined by the completion_ratio parameter. For example, if the parameter is 0.75, the first 75% of the chunk goes into the prompt and the remaining 25% becomes the response. Below is a code example of both methods.

from local_llm_finetune.data_prep import process_data
import pandas as pd

metadata = pd.read_csv("path/metadata.csv")
files_path = "path_to_txt_files/"

# connection details for the LLM that generates the responses
llm_url = "http://localhost:8081/v1/" # a local llama.cpp server; for Google use 'https://generativelanguage.googleapis.com/v1beta/openai/'
model_name = "local-model" # can be anything for a local LLM; for a cloud provider, the name of the model, e.g., 'gemini-2.0-flash'
api_key = "sk-no-key-required" # API key for the cloud provider; 'sk-no-key-required' for a local llama.cpp server

# LLM dataset
dataset = process_data(
    metadata=metadata,
    files_path=files_path,
    chunk_size=500,
    chunk_overlap=150,
    prompt_format="This is an excerpt from the document with the following metadata: {}. It is currently about 500 words long. Summarize the information to about 100 words. Here is the excerpt:\n\n'{}'",
    llm_url=llm_url,
    llm_model=model_name,
    api_Key=api_key,
)

# text completion dataset
dataset = process_data(
    metadata=metadata,
    files_path=files_path,
    chunk_size=1000,
    chunk_overlap=200,
    prompt_format="This is an excerpt from the document with the following metadata: {}. It is currently about 750 words long. Complete the excerpt with a new text about one third as long. Here is the excerpt: '{}'",
    completion_ratio=0.75, # what % of each chunk to have as the question, 1-ratio = what proportion to have as the answer
)
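
The text completion split described earlier can be pictured with a short sketch. This is not the library's code: the function name split_for_completion is hypothetical, and it assumes the split is made on whitespace-separated words.

```python
def split_for_completion(chunk, completion_ratio):
    """Split a chunk into (prompt_part, response_part) at the given ratio."""
    if not 0 < completion_ratio < 1:
        raise ValueError("completion_ratio must be between 0 and 1 (exclusive)")
    words = chunk.split()
    cut = int(len(words) * completion_ratio)  # number of words kept in the prompt
    return " ".join(words[:cut]), " ".join(words[cut:])
```

With completion_ratio=0.75 and a 1000-word chunk, this would place 750 words in the prompt and 250 in the response, which matches the wording of the prompt_format used in the text completion example.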

Fine-tuning a model

from local_llm_finetune.modelling import format_training_data, initialize_model, save_model, setup_training, train

# set up the model training
unsloth_model, tokenizer, max_seq_length = initialize_model(
    base_model_name="unsloth/Llama-3.1-8B-Instruct-bnb-4bit", # set to whatever base model to finetune on HuggingFace
    max_seq_length=2048,
)

# format the training data
train_dataset = format_training_data(
    tokenizer,
    dataset,
    input_split=" Here is the excerpt:", # how to split the question into 'Instruction' and 'Input'
)

# configure the trainer
trainer = setup_training(
    unsloth_model,
    tokenizer,
    train_dataset,
    max_seq_length=max_seq_length, # reuse the value returned by initialize_model
    output_dir="outputs",
)

# train the model
trainer_stats = train(trainer)

# save the output to GGUF
save_model(
    unsloth_model, tokenizer, quantization_method="q5_k_m", output_dir="trained_model"
)
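
The input_split argument indicates where each query is divided into an instruction and an input. As a rough illustration of that idea only, and not the library's implementation, the sketch below uses str.partition; the function name split_instruction_input and the decision to drop the separator text from both parts are assumptions.

```python
def split_instruction_input(query, input_split):
    """Split a query into (instruction, input) at the first occurrence of input_split."""
    instruction, separator, model_input = query.partition(input_split)
    if not separator:
        # Separator not found: treat the whole query as the instruction.
        return query, ""
    return instruction.strip(), model_input.strip()


instruction, model_input = split_instruction_input(
    "Summarize the text. Here is the excerpt: 'Revenue rose.'",
    " Here is the excerpt:",
)
```

For this to work as intended, the separator string should appear in the prompt_format used when the dataset was created.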

Release history

0.0.2 (this version): Mar 27, 2025
0.0.1: May 10, 2024

Download files

Source Distribution

local_llm_finetune-0.0.2.tar.gz (6.4 kB)

Uploaded Mar 27, 2025 Source

File details

Details for the file local_llm_finetune-0.0.2.tar.gz.

File metadata

Download URL: local_llm_finetune-0.0.2.tar.gz
Upload date: Mar 27, 2025
Size: 6.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for local_llm_finetune-0.0.2.tar.gz
Algorithm	Hash digest
SHA256	`f950778f861f9fa65f63113548ee0084c39db16612baf3b3d2fb3be7db6b0c16`
MD5	`9862eb8c4e643bfcb661d8b84f68c889`
BLAKE2b-256	`3e97a36a1a13deb4fee7dd64206b9cae115dc59e9db2910d1ea25110aa85db6f`
