Library for fine-tuning large language models

Welcome to Tuningtron!

Tuningtron is a library built on top of Hugging Face Transformers that simplifies fine-tuning large language models (LLMs) for developers. It focuses on making LLM fine-tuning feasible even with limited computational resources, such as a single Nvidia GeForce RTX 3090 GPU. The library supports training on both GPUs and CPUs, and it can offload model weights to the CPU when only a single GPU is available. With one Nvidia GeForce RTX 3090 and 256 GB of RAM, Tuningtron can fine-tune models with up to roughly 70 billion parameters.

Environment

The Tuningtron library is compatible with Ubuntu 22.04. To set up the required environment, first install the system tools:

sudo apt install -y python3-pip ccache make cmake g++ mpich

To create the conda environment on a machine with a GPU, use:

conda env create -f environment.yml

On a machine without a GPU, use:

conda env create -f environment-cpu.yml

These steps ensure that all necessary dependencies are correctly configured, allowing the Tuningtron library to function optimally.

Installation

pip install tuningtron

Updating Nvidia drivers

The following commands remove the currently installed Nvidia driver and install the recommended one:

sudo rm -r /var/lib/dkms/nvidia
sudo dpkg -P --force-all $(dpkg -l | grep "nvidia-" | grep -v lib | awk '{print $2}')
sudo ubuntu-drivers install

Using swap

When fine-tuning models with a large number of parameters, it might be necessary to increase the operating system's swap space. This can be done using the following steps:

sudo swapoff -a
sudo fallocate -l 50G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

These commands create a 50 GB swap file, providing additional virtual memory to help cover the large memory requirements of model fine-tuning.

Swap should be used only in cases of extreme necessity, as it can significantly slow down the training process. To keep swap usage to a minimum, add the following line to the /etc/sysctl.conf file: vm.swappiness=1. This setting makes the kernel much less likely to move pages out of physical memory, so the system relies primarily on RAM, which is far faster than swap.
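
To apply the setting immediately, without waiting for a reboot, the standard sysctl tooling can be used (these are generic Linux commands, not specific to Tuningtron):

echo 'vm.swappiness=1' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p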

Conventions

  • If a GPU is available, the Tuningtron library automatically leverages DeepSpeed to offload model weights to RAM. This optimization allows for efficient management of memory resources, enabling the fine-tuning of larger models even with limited GPU memory.
  • The Tuningtron library supports only a specific dataset format, which must include the columns "instruct", "input", and "output"; these columns structure the data so the model can interpret and learn from it effectively (see the sketch after this list). If the dataset instead contains a column named "text", the library uses only that column and takes its contents as-is.
  • If the eval=True parameter is passed to the prepare_dataset method, the Tuningtron library automatically reserves 10% of the data as an evaluation dataset, which is used to assess the model's performance during training.
  • The Tuningtron library fundamentally avoids using quantization during the fine-tuning process to prevent any potential loss of quality. This approach ensures that the experiments remain straightforward and maintain the highest possible model accuracy.
  • For combining LoRA adapters, the Tuningtron library supports only the "cat" method. In this method, the LoRA matrices are concatenated, providing a straightforward and effective approach for merging adapters.
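
For reference, a dataset in the expected format can be assembled with the Hugging Face datasets library. The following is a minimal sketch: the column names follow the convention above, while the example rows, the local path, and the Hub repository id are hypothetical placeholders.

from datasets import Dataset

# Hypothetical rows illustrating the required columns:
# "instruct", "input", and "output".
rows = [
    {"instruct": "Translate the text into German.",
     "input": "Good morning!",
     "output": "Guten Morgen!"},
    {"instruct": "Translate the text into German.",
     "input": "Thank you very much.",
     "output": "Vielen Dank."},
]

dataset = Dataset.from_list(rows)
dataset.save_to_disk("translator_sft")            # keep a local copy
# dataset.push_to_hub("your-org/translator_sft")  # or publish to the Hub (hypothetical repo id)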

Supported Models

The following LLM models are supported:

  • Cohere Family
  • Gemma Family
  • Qwen Family

SFT fine-tuning example

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import logging
from tuningtron import Tuner

logging.basicConfig(level=logging.INFO)
os.environ["HF_TOKEN"] = "xxx"

tuner = Tuner("google/gemma-2-9b-it")
tuner.sft("equiron-ai/translator_sft", "adapter_gemma_sft", rank=64, batch_size=1, gradient_steps=1, learning_rate=1e-4)

DPO fine-tuning example

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
import logging
from tuningtron import Tuner

logging.basicConfig(level=logging.INFO)
os.environ["HF_TOKEN"] = "xxx"

tuner = Tuner("./gemma_sft", enable_deepspeed=False)
tuner.dpo("equiron-ai/translator_dpo", "adapter_gemma_dpo", rank=64, batch_size=1, gradient_steps=1, learning_rate=1e-4)

Combining/merging LoRA adapters

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
import logging
from tuningtron import Tuner

logging.basicConfig(level=logging.INFO)

tuner = Tuner("google/gemma-2-9b-it")
tuner.merge("gemma_sft", "adapter_gemma_sft")

Known issues

Model fine-tuning and adapter merging cannot be performed in the same bash script or Jupyter session; the two processes must be kept separate. When using JupyterLab, restart the kernel after completing each of these processes to ensure proper execution and avoid conflicts.

Convert to GGUF

python3 llama.cpp/convert-hf-to-gguf.py /path/to/model --outfile model.gguf --outtype f16
llama.cpp/build/bin/llama-quantize model.gguf model_q5_k_m.gguf q5_k_m

Run with llama.cpp server on GPU

llama.cpp/build/bin/llama-server -m model_q5_k_m.gguf -ngl 99 -fa -c 4096 --host 0.0.0.0 --port 8000
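
The running server can then be queried from Python. The sketch below assumes only that llama-server exposes its OpenAI-compatible /v1/chat/completions endpoint on the host and port used above; the prompt is a placeholder.

import requests

# Send a chat request to the llama.cpp server started above.
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Translate into German: Good morning!"}],
        "temperature": 0.2,
    },
    timeout=120,
)
print(response.json()["choices"][0]["message"]["content"])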

Install CUDA toolkit for llama.cpp compilation

Please note that the toolkit version must match the driver version, which can be found with the nvidia-smi command. For example, to install the toolkit for CUDA 12.4, run the following commands:

CUDA_TOOLKIT_VERSION=12-4
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt -y install cuda-toolkit-${CUDA_TOOLKIT_VERSION}
echo -e '
export CUDA_HOME=/usr/local/cuda
export PATH=${CUDA_HOME}/bin:${PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
' >> ~/.bashrc
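
After reloading the shell configuration, the installation can be verified with the standard CUDA tools:

source ~/.bashrc
nvcc --version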
