Green-Bit-LLM

A toolkit for fine-tuning, running inference with, and evaluating GreenBitAI's low-bit LLMs.

Introduction

This Python package uses Bitorch Engine for efficient operations on GreenBitAI's low-bit large language models (LLMs). It enables high-performance inference on both cloud and consumer-level GPUs, and supports full-parameter fine-tuning directly on quantized LLMs. The bundled evaluation tools can be used to validate model performance on mainstream benchmark datasets.

News

  • [2024/04]
    • We have launched over 200 low-bit LLMs in GreenBitAI's Hugging Face Model Zoo. The release includes high-precision 2.2/2.5/3-bit models across several LLM families, including LLaMA 2/3, 01-Yi, Qwen, Mistral, Phi-3, Gemma, and more.
    • We released Bitorch Engine for low-bit quantized neural network operations. It supports full-parameter fine-tuning and parameter-efficient fine-tuning (PEFT), even under extremely constrained GPU resources.
    • We released the gbx-lm Python package, which enables efficient execution of GreenBitAI's low-bit models on Apple devices with MLX.

LLMs

Family | Bits per weight (bpw) | Size | Hugging Face collection
Llama-3 | 4.0/3.0/2.5/2.2 | 8B/70B | GreenBitAI Llama-3
Llama-2 | 3.0/2.5/2.2 | 7B/13B/70B | GreenBitAI Llama-2
Qwen-1.5 | 4.0/3.0/2.5/2.2 | 0.5B/1.8B/4B/7B/14B/32B/110B | GreenBitAI Qwen 1.5
Phi-3 | 3.0/2.5/2.2 | mini | GreenBitAI Phi-3
Mistral | 3.0/2.5/2.2 | 7B | GreenBitAI Mistral
01-Yi | 3.0/2.5/2.2 | 6B/34B | GreenBitAI 01-Yi
Llama-3-instruct | 4.0/3.0/2.5/2.2 | 8B/70B | GreenBitAI Llama-3
Mistral-instruct | 3.0/2.5/2.2 | 7B | GreenBitAI Mistral
Phi-3-instruct | 3.0/2.5/2.2 | mini | GreenBitAI Phi-3
Qwen-1.5-Chat | 4.0/3.0/2.5/2.2 | 0.5B/1.8B/4B/7B/14B/32B/110B | GreenBitAI Qwen 1.5
01-Yi-Chat | 3.0/2.5/2.2 | 6B/34B | GreenBitAI 01-Yi
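
To give a rough sense of what the bits-per-weight (bpw) values mean for memory, the sketch below estimates weight storage as parameters × bpw / 8 bytes. This is a back-of-the-envelope approximation, not a figure published by GreenBitAI. It assumes nominal parameter counts (for example, 8B taken as 8e9) and ignores quantization metadata, any non-quantized layers, activations, optimizer state, and the KV cache. The function name is illustrative only.

```python
def approx_weight_gb(num_params: float, bpw: float) -> float:
    """Rough weight storage in GB (10^9 bytes): params * bits-per-weight / 8."""
    return num_params * bpw / 8 / 1e9


if __name__ == "__main__":
    # Nominal parameter counts read off the model names; real counts differ slightly.
    for name, params, bpw in [
        ("Llama-3 8B", 8e9, 2.2),
        ("01-Yi 34B", 34e9, 2.5),
        ("Llama-3 70B", 70e9, 4.0),
    ]:
        print(f"{name} @ {bpw} bpw: ~{approx_weight_gb(params, bpw):.1f} GB of weights")
```

Under these assumptions, a 34B model at 2.5 bpw needs on the order of 10 to 11 GB for its weights alone, which is consistent with such a model fitting on a 24 GB GPU with headroom for everything else.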

Demo

Full-parameter fine-tuning of the LLaMA-3 8B model on a single RTX 3090 GPU with 24 GB of graphics memory.

PEFT of the 01-Yi 34B model on a single RTX 3090 GPU with 24 GB of graphics memory.

Installation

This package can be installed in several ways. For every method except Docker, first install Bitorch Engine according to its official instructions.

Then choose one of the following installation methods:

Using Pip

pip install green-bit-llm

From source

Clone the repository, change into it, and install the required dependencies (requires Python >= 3.9):

git clone https://github.com/GreenBitAI/green-bit-llm.git
cd green-bit-llm
pip install -r requirements.txt

Afterward, install Flash Attention (flash-attn) according to its official instructions.

Conda

Alternatively, use the prepared conda environment configuration:

conda env create -f environment.yml
conda activate gbai_cuda_lm

Afterward, install Flash Attention (flash-attn) according to its official instructions.

You can also activate an existing conda environment and install the requirements with pip (as shown in the previous section).

Docker

To run in Docker, use the provided Dockerfile, which extends the bitorch-engine Docker image. Build the bitorch-engine image first, then run the following commands:

cd docker
cp -f ../requirements.txt .
docker build -t gbai/green-bit-llm .
docker run -it --rm --gpus all gbai/green-bit-llm

See the Docker README for options and more details.

Usage

Inference

Please see the description of the Inference package for details.

Evaluation

Please see the description of the Evaluation package for details.

Supervised fine-tuning (sft)

Please see the description of the sft package for details.

Requirements

  • Python >= 3.9
  • Bitorch Engine
  • See requirements.txt or environment.yml for a complete list of dependencies
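
As a quick sanity check before running the examples, the following sketch reports whether the interpreter version and the main packages are in place. The import names are assumptions: `green_bit_llm` matches the `python -m` commands in this README, `flash_attn` is the import name of the flash-attn package, and `bitorch_engine` is assumed to be the import name of Bitorch Engine. The helper itself is hypothetical and not part of the package.

```python
import importlib.util
import sys

# Assumed import names (see the note above); adjust if your installation differs.
REQUIRED_MODULES = ("bitorch_engine", "green_bit_llm", "flash_attn")
MIN_PYTHON = (3, 9)


def check_environment(modules=REQUIRED_MODULES, min_python=MIN_PYTHON):
    """Return a dict mapping each requirement to True/False without importing it."""
    report = {"python": sys.version_info[:2] >= min_python}
    for name in modules:
        # find_spec only locates the top-level module; it does not execute it.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{requirement}: {'ok' if ok else 'MISSING'}")
```

A missing entry only means the module could not be located in the current environment; it says nothing about whether the CUDA build of an installed package works.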

Examples

Simple Generation

Run the simple generation script as follows:

CUDA_VISIBLE_DEVICES=0 python -m green_bit_llm.inference.sim_gen --model GreenBitAI/Qwen-1.5-1.8B-layer-mix-bpw-3.0 --max-tokens 100 --use-flash-attention-2 --ignore-chat-template

PPL Evaluation

CUDA_VISIBLE_DEVICES=0 python -m green_bit_llm.evaluation.evaluate --model GreenBitAI/Qwen-1.5-4B-layer-mix-bpw-3.0 --trust-remote-code --eval-ppl --ppl-tasks wikitext2,c4_new,ptb
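
For readers unfamiliar with the metric, perplexity (PPL) is the exponential of the average negative log-likelihood per token, and lower is better. The sketch below shows that definition on a toy list of per-token log-probabilities. It is a generic illustration and does not claim to mirror how the evaluation package computes its numbers internally.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # A model that assigns probability 0.25 to every token is as uncertain
    # as a uniform choice among 4 options.
    uniform = [math.log(0.25)] * 10
    print(perplexity(uniform))
```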

Full-parameter fine-tuning

Run the script as follows to fine-tune the quantized weights of the model on the target dataset. The '--tune-qweight-only' parameter determines whether to fine-tune only the quantized weights or all weights, including non-quantized ones.

CUDA_VISIBLE_DEVICES=0 python -m green_bit_llm.sft.finetune --model GreenBitAI/Qwen-1.5-1.8B-layer-mix-bpw-3.0 --dataset tatsu-lab/alpaca --optimizer DiodeMix --tune-qweight-only
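
When scripting several runs, it can be convenient to assemble the command in Python. The sketch below builds the same invocation using only the flags shown above and toggles `--tune-qweight-only`. It assumes the flag is a plain on/off switch, so that omitting it tunes all weights; the helper names are hypothetical and not part of the package.

```python
import os
import subprocess
import sys


def build_finetune_cmd(model, dataset, tune_qweight_only=True, optimizer="DiodeMix"):
    """Assemble the argv list for green_bit_llm.sft.finetune from the flags shown above."""
    cmd = [
        sys.executable, "-m", "green_bit_llm.sft.finetune",
        "--model", model,
        "--dataset", dataset,
        "--optimizer", optimizer,
    ]
    if tune_qweight_only:
        # Assumption: the flag is a boolean switch that takes no value.
        cmd.append("--tune-qweight-only")
    return cmd


def run_on_gpu(cmd, gpu_id=0):
    """Run cmd with CUDA_VISIBLE_DEVICES restricted to one GPU; raises on failure."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    return subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    cmd = build_finetune_cmd(
        "GreenBitAI/Qwen-1.5-1.8B-layer-mix-bpw-3.0", "tatsu-lab/alpaca"
    )
    print(" ".join(cmd))  # inspect first; call run_on_gpu(cmd) to actually launch
```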

Parameter efficient fine-tuning

CUDA_VISIBLE_DEVICES=0 python -m green_bit_llm.sft.peft_lora --model GreenBitAI/Qwen-1.5-1.8B-layer-mix-bpw-3.0 --dataset tatsu-lab/alpaca --lr-fp 1e-6

License

We release our code under the Apache 2.0 License. In addition, three packages are partly based on third-party open-source code. For details, please refer to the description pages of the sub-projects.
