
The CLI tools for the mLoRA system.

Project description

mLoRA

An Efficient "Factory" to Build Multiple LoRA Adapters

mLoRA (a.k.a. Multi-LoRA Fine-Tune) is an open-source framework designed for efficient fine-tuning of multiple Large Language Models (LLMs) using LoRA and its variants. Key features of mLoRA include:

  • Concurrent fine-tuning of multiple LoRA adapters.

  • Shared base model among multiple LoRA adapters.

  • Efficient pipeline parallelism algorithm.

  • Support for multiple LoRA variant algorithms and various base models.

  • Support for multiple reinforcement learning preference alignment algorithms.
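To illustrate what a shared base model means, the sketch below applies the standard LoRA update h = W0 x + (alpha / r) B A x for several adapters that all reuse one frozen W0. It is a minimal, dependency-free illustration of the general LoRA technique, not mLoRA's batching or pipeline-parallel implementation; the adapter names and matrices are made up.

```python
# Minimal sketch of the standard LoRA forward pass with several adapters sharing one
# frozen base weight W0: h = W0 x + scale * B (A x), where scale = alpha / r.
# Plain lists keep it dependency-free; this is NOT mLoRA's actual implementation.
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(
    w0: Matrix,
    adapters: Dict[str, Tuple[Matrix, Matrix, float]],
    inputs: Dict[str, Vector],
) -> Dict[str, Vector]:
    """Apply the shared base weight plus each adapter's own low-rank update.

    adapters maps a hypothetical adapter name to (A, B, scale), with A of shape
    r x d_in and B of shape d_out x r.
    """
    outputs = {}
    for name, x in inputs.items():
        a, b, scale = adapters[name]
        base = matvec(w0, x)             # shared, frozen base projection
        delta = matvec(b, matvec(a, x))  # adapter-specific low-rank path
        outputs[name] = [h + scale * d for h, d in zip(base, delta)]
    return outputs
```

Only the small A and B matrices differ per adapter, so a single in-memory copy of W0 can serve every adapter that trains at the same time.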

The end-to-end architecture of mLoRA is shown in the figure:

Quickstart

First, clone this repository and install the dependencies (or use our Docker image):

# Clone Repository
git clone https://github.com/TUDB-Labs/mLoRA
cd mLoRA
# Installing the requirements needs Python >= 3.12
pip install .

The mlora_train.py script is the starting point for batch fine-tuning of LoRA adapters.

python mlora_train.py \
  --base_model TinyLlama/TinyLlama-1.1B-Chat-v0.4 \
  --config demo/lora/lora_case_1.yaml

You can find adapter configurations in the demo folder, including configurations for the different LoRA variants and reinforcement learning preference alignment algorithms.

For detailed usage information, use the --help option:

python mlora_train.py --help

Quickstart with Docker

mLoRA offers an official Docker image for quick start and development. The image is available on the Docker Hub registry.

First, pull the latest image (the same image is also used for development):

docker pull yezhengmaolove/mlora:latest

Deploy and enter a container to run mLoRA:

docker run -itd --runtime nvidia --gpus all \
    -v ~/your_dataset_dir:/dataset \
    -v ~/your_model_dir:/model \
    -p <host_port>:22 \
    --name mlora \
    yezhengmaolove/mlora:latest
# once the container has started, log in with ssh
# the default password is mlora@123
ssh root@localhost -p <host_port>
# pull the latest code and run mLoRA
cd /mLoRA
git pull
python mlora_train.py \
  --base_model TinyLlama/TinyLlama-1.1B-Chat-v0.4 \
  --config demo/lora/lora_case_1.yaml

Deploy as service with Docker

We can deploy mLoRA as a service that continuously receives user requests and performs fine-tuning tasks.

First, pull the latest image (the same image is used for deployment):

docker pull yezhengmaolove/mlora:latest

Deploy our mLoRA server:

docker run -itd --runtime nvidia --gpus all \
    -v ~/your_dataset_cache_dir:/cache \
    -v ~/your_model_dir:/model \
    -p <host_port>:8000 \
    --name mlora_server \
    -e "BASE_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v0.4" \
    -e "STORAGE_DIR=/cache" \
    yezhengmaolove/mlora:latest /bin/bash /opt/deploy.sh

Once the service is deployed, install the mlora-cli package and use the mlora_cli tool to interact with the server.

# install the client tools
pip install mlora-cli
# use the mlora cli tool to connect to mlora server
mlora_cli
(mLoRA) set port <host_port>
(mLoRA) set host http://<host_ip>
# and enjoy it!!
Step-by-step

Step 1. Download the mLoRA image and install mlora_cli

docker pull yezhengmaolove/mlora:latest
pip install mlora-cli


Step 2. Start the mLoRA server with Docker

# first, we create a cache directory on the host for cached files
mkdir ~/cache
# second, we manually download the model weights from Hugging Face
# (Git LFS must be installed, otherwise git clone only fetches pointer files for the weights)
mkdir ~/model && cd ~/model
git clone https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0
# we map port 8000 used by the mlora server to port 1288 on the host machine.
# the BASE_MODEL environment variable indicates the path of the base model used by mlora.
# the STORAGE_DIR environment variable indicates the path where datasets and lora adapters are stored.
# we use the script /opt/deploy.sh in container to start the server.
docker run -itd --runtime nvidia --gpus all \
    -v ~/cache:/cache \
    -v ~/model:/model \
    -p 1288:8000 \
    --name mlora_server \
    -e "BASE_MODEL=/model/TinyLlama-1.1B-Chat-v1.0" \
    -e "STORAGE_DIR=/cache" \
    yezhengmaolove/mlora:latest /bin/bash /opt/deploy.sh
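If you script the deployment from Python, the Step 2 `docker run` invocation can be assembled as an argument list. This is a hypothetical helper (build_server_cmd is not part of mLoRA) that only reuses the flags, image, port mapping and environment variables shown above; it expands `~` itself because no shell is involved.

```python
# Hypothetical helper: builds the Step 2 `docker run` command as an argv list.
# It does not execute anything; pass the list to subprocess.run(cmd, check=True) to start it.
import os
import shlex
from typing import Dict, List


def build_server_cmd(
    host_port: int,
    volumes: Dict[str, str],
    env: Dict[str, str],
    image: str = "yezhengmaolove/mlora:latest",
    name: str = "mlora_server",
) -> List[str]:
    """Return the `docker run` arguments for the mLoRA server container."""
    cmd = ["docker", "run", "-itd", "--runtime", "nvidia", "--gpus", "all"]
    for host_dir, container_dir in volumes.items():
        # no shell expands `~` in an argv list, so expand it here
        cmd += ["-v", f"{os.path.expanduser(host_dir)}:{container_dir}"]
    # the server listens on port 8000 inside the container
    cmd += ["-p", f"{host_port}:8000", "--name", name]
    for key, value in env.items():
        cmd += ["-e", f"{key}={value}"]
    # start the server with the deploy script shipped in the image
    cmd += [image, "/bin/bash", "/opt/deploy.sh"]
    return cmd


if __name__ == "__main__":
    cmd = build_server_cmd(
        1288,
        {"~/cache": "/cache", "~/model": "/model"},
        {"BASE_MODEL": "/model/TinyLlama-1.1B-Chat-v1.0", "STORAGE_DIR": "/cache"},
    )
    print(shlex.join(cmd))
```

In the list form the values are passed verbatim, so the double quotes around "BASE_MODEL=..." in the shell version are not needed.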


Step 3. Use the mlora_cli tool to connect to the mLoRA server

We use mlora_cli to connect to the server at http://127.0.0.1:1288 (the host must be given with the http:// scheme):

(mLoRA) set port 1288
(mLoRA) set host http://127.0.0.1
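Before running `set host` and `set port`, it can help to confirm that the address is well formed and that the mapped port is listening. The sketch below assumes only what Step 3 states (an http:// host plus a port); it checks plain TCP reachability and does not call any mLoRA HTTP endpoint. server_url and port_is_open are hypothetical names.

```python
# Sketch: build the Step 3 server address and check that its port accepts TCP connections.
# Standard library only; no mLoRA-specific HTTP route is assumed.
import socket
from urllib.parse import urlsplit


def server_url(host: str, port: int) -> str:
    """Combine the `set host` and `set port` values; the host must use the http:// scheme."""
    parts = urlsplit(host)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError(f"host must look like http://<host_ip>, got {host!r}")
    return f"http://{parts.hostname}:{port}"


def port_is_open(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    try:
        with socket.create_connection((parts.hostname, parts.port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    url = server_url("http://127.0.0.1", 1288)
    print(url, "is reachable" if port_is_open(url) else "is not reachable")
```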


Step 4. Upload a data file for training

We use the Stanford Alpaca dataset as a demo; the data looks like this:

[{"instruction": "", "input": "", "output": ""}, {...}]
(mLoRA) file upload
? file type: train data
? name: alpaca
? file path: /home/yezhengmao/alpaca-lora/alpaca_data.json
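A quick way to catch format problems before `file upload` is to load the file and check each record. The sketch assumes the Alpaca layout shown above (a JSON array of objects with instruction, input and output keys); load_alpaca is a hypothetical helper and the sample records are placeholders, not rows from the real dataset.

```python
# Sketch: write a tiny Alpaca-style file and check its shape before uploading it.
import json
from pathlib import Path
from typing import List

REQUIRED_KEYS = ("instruction", "input", "output")


def load_alpaca(path: Path) -> List[dict]:
    """Load a JSON array of objects and verify every record has the Alpaca keys."""
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError("top level must be a JSON array")
    for i, record in enumerate(data):
        if not isinstance(record, dict):
            raise ValueError(f"record {i} is not an object")
        missing = [k for k in REQUIRED_KEYS if k not in record]
        if missing:
            raise ValueError(f"record {i} is missing {missing}")
    return data


if __name__ == "__main__":
    sample = [  # placeholder records
        {"instruction": "Give a synonym for 'fast'.", "input": "", "output": "Quick."},
        {"instruction": "Translate to French.", "input": "Hello", "output": "Bonjour"},
    ]
    path = Path("alpaca_sample.json")
    path.write_text(json.dumps(sample, ensure_ascii=False, indent=2), encoding="utf-8")
    print(len(load_alpaca(path)), "records OK")
```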


Step 5. Upload a template that provides a structured format for generating prompts

The template is a YAML file written in the Jinja2 templating language; see the demo/prompt.yaml file.

The data file you upload is treated as an array whose elements are dictionaries. Each element is one data point that is filled into the template.

(mLoRA) file upload
? file type: prompt template
? name: simple_prompt
? file path: /home/yezhengmao/mLoRA/demo/prompt.yaml
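To show what "each element is one data point in the template" means, the sketch below renders one prompt per record. mLoRA's templates are Jinja2 YAML files; here Python's string.Template is swapped in only to keep the example dependency-free, the template text is invented (it is not the content of demo/prompt.yaml), and the if/else stands in for a Jinja2 conditional.

```python
# Sketch: render one prompt per data point. string.Template is a stand-in for Jinja2,
# and both templates below are made-up examples, not mLoRA's demo/prompt.yaml.
from string import Template
from typing import Dict, List

WITH_INPUT = Template(
    "### Instruction:\n$instruction\n\n### Input:\n$input\n\n### Response:\n$output"
)
NO_INPUT = Template("### Instruction:\n$instruction\n\n### Response:\n$output")


def render_prompts(data: List[Dict[str, str]]) -> List[str]:
    """Fill the template once per record; the branch mimics a Jinja2 {% if input %} block."""
    prompts = []
    for point in data:
        template = WITH_INPUT if point.get("input") else NO_INPUT
        prompts.append(template.substitute(point))  # extra keys in `point` are ignored
    return prompts
```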


Step 6. Create a dataset

We create a dataset, which consists of the data, a template, and the corresponding prompter. We can use the dataset showcase command to check whether the prompts are generated correctly.

(mLoRA) dataset create
? name: alpaca_dataset
? train data file: alpaca
? prompt template file: simple_prompt
? prompter: instruction
? data preprocessing: default
(mLoRA) dataset showcase
? dataset name: alpaca_dataset
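The dataset does not hold the data itself; it refers by name to files uploaded in Steps 4 and 5. The sketch below models that relationship under the assumption that a dataset is only valid when its references resolve to files of the right type. Dataset, create_dataset and the uploaded registry are hypothetical and are not mLoRA APIs; the field values mirror the session above.

```python
# Hypothetical model of Step 6: a dataset references uploaded files by name, so creating
# one should fail if a name is unknown or points at a file of the wrong type.
# None of these names are mLoRA APIs.
from dataclasses import dataclass
from typing import Dict

# name -> file type, as chosen during `file upload`
uploaded: Dict[str, str] = {"alpaca": "train data", "simple_prompt": "prompt template"}


@dataclass(frozen=True)
class Dataset:
    name: str
    train_data: str
    prompt_template: str
    prompter: str = "instruction"
    preprocess: str = "default"


def create_dataset(files: Dict[str, str], ds: Dataset) -> Dataset:
    """Check that each referenced file exists and has the expected type."""
    checks = ((ds.train_data, "train data"), (ds.prompt_template, "prompt template"))
    for ref, expected in checks:
        actual = files.get(ref)
        if actual != expected:
            raise ValueError(f"{ref!r} should be a {expected!r} file, found {actual!r}")
    return ds
```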


Step 7. Create an adapter

Now we can use the adapter create command to create an adapter for training.


Step 8. Submit a task to train!

Finally, we can submit a task to train our adapter on the defined dataset. NOTE: you can continuously submit or terminate training tasks; use adapter ls or task ls to check the tasks' status.


Why you should use mLoRA

Using mLoRA can save significant computational and memory resources when training multiple adapters simultaneously.

High performance on consumer hardware

We fine-tuned multiple LoRA adapters on four A6000 graphics cards with fp32 precision, without checkpointing or any quantization techniques:

Model                mLoRA (tokens/s)   PEFT-LoRA with FSDP (tokens/s)   PEFT-LoRA with TP (tokens/s)
llama-2-7b (fp32)    2364               1750                             1500
llama-2-13b (fp32)   1280               OOM                              875
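Read as ratios, the table puts mLoRA at roughly 1.35x and 1.58x the throughput of the FSDP and TP baselines on llama-2-7b, and about 1.46x the TP baseline on llama-2-13b, where FSDP ran out of memory. The arithmetic, using only the numbers in the table:

```python
# Throughput ratios computed from the benchmark table (tokens/s); None marks the OOM entry.
from typing import Dict, Optional

THROUGHPUT: Dict[str, Dict[str, Optional[int]]] = {
    "llama-2-7b": {"mLoRA": 2364, "PEFT-LoRA FSDP": 1750, "PEFT-LoRA TP": 1500},
    "llama-2-13b": {"mLoRA": 1280, "PEFT-LoRA FSDP": None, "PEFT-LoRA TP": 875},
}


def speedups(row: Dict[str, Optional[int]]) -> Dict[str, Optional[float]]:
    """mLoRA throughput divided by each baseline, rounded to two decimals."""
    ours = row["mLoRA"]
    return {k: (round(ours / v, 2) if v else None) for k, v in row.items() if k != "mLoRA"}


for model, row in THROUGHPUT.items():
    print(model, speedups(row))
# llama-2-7b {'PEFT-LoRA FSDP': 1.35, 'PEFT-LoRA TP': 1.58}
# llama-2-13b {'PEFT-LoRA FSDP': None, 'PEFT-LoRA TP': 1.46}
```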

Supported model

Model
LLaMA

Supported LoRA variants

Variant   Venue     Year
QLoRA     NeurIPS   2023
LoRA+     ICML      2024
VeRA      ICLR      2024
DoRA      ICML      2024

Supported preference alignment algorithms

Variant   Venue     Year
DPO       NeurIPS   2023
CPO       ICML      2024

Document

Contributing

We welcome contributions to improve this repository! Please review the contribution guidelines before submitting pull requests or issues.

  • Fork the repository.

  • Create a new branch for your feature or fix.

  • Submit a pull request with a detailed explanation of your changes.

You can use the pre-commit hook to check your code.

# Install requirements (the quotes stop shells such as zsh from expanding the brackets)
pip install ".[ci_test]"
ln -s ../../.github/workflows/pre-commit .git/hooks/pre-commit
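The `ln -s` target is relative, and a relative symlink target is resolved from the directory that contains the link (.git/hooks), so ../../ leads back to the repository root. The sketch below does the same with pathlib; install_pre_commit is a hypothetical helper, and like `ln -s` it fails instead of overwriting an existing hook.

```python
# Sketch: pathlib equivalent of `ln -s ../../.github/workflows/pre-commit .git/hooks/pre-commit`.
from pathlib import Path


def install_pre_commit(repo_root: Path) -> Path:
    """Create .git/hooks/pre-commit as a relative symlink to the repo's pre-commit script."""
    hooks_dir = repo_root / ".git" / "hooks"
    if not hooks_dir.is_dir():
        raise FileNotFoundError(f"{hooks_dir} not found; pass the repository root")
    hook = hooks_dir / "pre-commit"
    # stored verbatim and resolved relative to .git/hooks, i.e. <root>/.github/workflows/pre-commit
    target = Path("../../.github/workflows/pre-commit")
    hook.symlink_to(target)  # raises FileExistsError if a hook already exists, like `ln -s`
    return hook
```

Call install_pre_commit(Path.cwd()) from the repository root. On Windows, creating symlinks may require extra privileges.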

Or call the script directly to check your code:

.github/workflows/pre-commit

Citation

Please cite this repository if you use its code.

@misc{m-LoRA,
  author = {Ye, Zhengmao\textsuperscript{*} and Li, Dengchun\textsuperscript{*} and Tian, Jingqi and Lan, Tingfeng and Liang, Yanbo and Jiang, Yexi and Zuo, Jie and Lu, Hui and Duan, Lei and Tang, Mingjie},
  title = {m-LoRA: Efficient LLM Model Fine-tune and Inference via Multi-Lora Optimization},
  year = {2023},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/TUDB-Labs/mLoRA}},
  note={\textsuperscript{*}: these authors contributed equally to this work.}
}

Copyright

Copyright © 2024 All Rights Reserved.

This project is licensed under the Apache 2.0 License.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


Download files


Source Distribution

mlora_cli-0.2.3.tar.gz (17.7 kB)

Uploaded Source

Built Distribution


mlora_cli-0.2.3-py3-none-any.whl (16.8 kB)

Uploaded Python 3

File details

Details for the file mlora_cli-0.2.3.tar.gz.

File metadata

  • Download URL: mlora_cli-0.2.3.tar.gz
  • Upload date:
  • Size: 17.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for mlora_cli-0.2.3.tar.gz
Algorithm Hash digest
SHA256 575384f1c7b88cfa2699fcc8300149da6bef4ab7fbd054bdf3934b6f5987affb
MD5 654c95cf07d9f83f28c184c47d266c86
BLAKE2b-256 3412c95d987df77706faae8f955420fc750623c8fdba428730925478a04c7722
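To check a downloaded file against these digests, hash it locally and compare. A minimal sketch with the standard hashlib module; the file name and SHA256 value come from the table above, and sha256_of and verify are illustrative helper names.

```python
# Sketch: verify a downloaded distribution against its published SHA256 digest.
import hashlib
from pathlib import Path

EXPECTED_SDIST_SHA256 = "575384f1c7b88cfa2699fcc8300149da6bef4ab7fbd054bdf3934b6f5987affb"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare against a published hex digest, ignoring letter case."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    sdist = Path("mlora_cli-0.2.3.tar.gz")
    if sdist.exists():
        print("hash OK" if verify(sdist, EXPECTED_SDIST_SHA256) else "hash MISMATCH")
    else:
        print(f"{sdist} not found in the current directory")
```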


File details

Details for the file mlora_cli-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: mlora_cli-0.2.3-py3-none-any.whl
  • Upload date:
  • Size: 16.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for mlora_cli-0.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 69e5e94fae51c6d4af6332c376229bead54dbe3ced94f25395606242396b373d
MD5 21622ac7d5ff0989f58f7d8fd928956a
BLAKE2b-256 d18da7accec2d5f942bcb45e41062ad311b8172a0e0a529684598f45ce65588e

