Easy-to-use LLM fine-tuning framework

These details have not been verified by PyPI

Project links

Homepage

Project description

LLaMA Factory: Training and Evaluating Large Language Models with Minimal Effort

👋 Join our WeChat.

[ English | 中文 ]

LLaMA Board: A One-stop Web UI for Getting Started with LLaMA Factory

Launch LLaMA Board via CUDA_VISIBLE_DEVICES=0 python src/train_web.py. (multiple GPUs are not supported yet)

Here is an example of altering the self-cognition of an instruction-tuned language model within 10 minutes on a single GPU.

https://github.com/hiyouga/LLaMA-Factory/assets/16256802/6ba60acc-e2e2-4bec-b846-2d88920d5ba1

Changelog

[23/10/21] We supported NEFTune trick for fine-tuning. Try --neft_alpha argument to activate NEFTune, e.g., --neft_alpha 5.

[23/09/27] We supported $S^2$-Attn proposed by LongLoRA for the LLaMA models. Try --shift_attn argument to enable shift short attention.

[23/09/23] We integrated MMLU, C-Eval and CMMLU benchmarks in this repo. See this example to evaluate your models.

[23/09/10] We supported using FlashAttention-2 for the LLaMA models. Try --flash_attn argument to enable FlashAttention-2 if you are using RTX4090, A100 or H100 GPUs.

[23/08/12] We supported RoPE scaling to extend the context length of the LLaMA models. Try --rope_scaling linear argument in training and --rope_scaling dynamic argument at inference to extrapolate the position embeddings.

[23/08/11] We supported DPO training for instruction-tuned models. See this example to train your models.

[23/07/31] We supported dataset streaming. Try --streaming and --max_steps 10000 arguments to load your dataset in streaming mode.

[23/07/29] We released two instruction-tuned 13B models at Hugging Face. See these Hugging Face Repos (LLaMA-2 / Baichuan) for details.

[23/07/18] We developed an all-in-one Web UI for training, evaluation and inference. Try train_web.py to fine-tune models in your Web browser. Thank @KanadeSiina and @codemayq for their efforts in the development.

[23/07/09] We released FastEdit ⚡🩹, an easy-to-use package for editing the factual knowledge of large language models efficiently. Please follow FastEdit if you are interested.

[23/06/29] We provided a reproducible example of training a chat model using instruction-following datasets, see Baichuan-7B-sft for details.

[23/06/22] We aligned the demo API with the OpenAI's format where you can insert the fine-tuned model in arbitrary ChatGPT-based applications.

[23/06/03] We supported quantized training and inference (aka QLoRA). Try --quantization_bit 4/8 argument to work with quantized models.

Supported Models

Model	Model size	Default module	Template
Baichuan	7B/13B	W_pack	baichuan
Baichuan2	7B/13B	W_pack	baichuan2
BLOOM	560M/1.1B/1.7B/3B/7.1B/176B	query_key_value	-
BLOOMZ	560M/1.1B/1.7B/3B/7.1B/176B	query_key_value	-
ChatGLM3	6B	query_key_value	chatglm3
Falcon	7B/40B/180B	query_key_value	-
InternLM	7B/20B	q_proj,v_proj	intern
LLaMA	7B/13B/33B/65B	q_proj,v_proj	-
LLaMA-2	7B/13B/70B	q_proj,v_proj	llama2
Mistral	7B	q_proj,v_proj	mistral
Phi-1.5	1.3B	Wqkv	-
Qwen	7B/14B	c_attn	qwen
XVERSE	7B/13B/65B	q_proj,v_proj	xverse

[!NOTE] Default module is used for the --lora_target argument, you can use --lora_target all to specify all the available modules.

For the "base" models, the --template argument can be chosen from default, alpaca, vicuna etc. But make sure to use the corresponding template for the "chat" models.

Please refer to template.py for a full list of models we supported.

Supported Training Approaches

Approach	Full-parameter	Partial-parameter	LoRA	QLoRA
Pre-Training	:white_check_mark:	:white_check_mark:	:white_check_mark:	:white_check_mark:
Supervised Fine-Tuning	:white_check_mark:	:white_check_mark:	:white_check_mark:	:white_check_mark:
Reward Modeling			:white_check_mark:	:white_check_mark:
PPO Training			:white_check_mark:	:white_check_mark:
DPO Training	:white_check_mark:		:white_check_mark:	:white_check_mark:

[!NOTE] Use --quantization_bit 4/8 argument to enable QLoRA.

Provided Datasets

Pre-training datasets

Supervised fine-tuning datasets

Preference datasets

Please refer to data/README.md for details.

Some datasets require confirmation before using them, so we recommend logging in with your Hugging Face account using these commands.

pip install --upgrade huggingface_hub
huggingface-cli login

Requirement

Python 3.8+ and PyTorch 1.13.1+
🤗Transformers, Datasets, Accelerate, PEFT and TRL
sentencepiece, protobuf and tiktoken
fire, jieba, rouge-chinese and nltk (used at evaluation and predict)
gradio and matplotlib (used in web UI)
uvicorn, fastapi and sse-starlette (used in API)

And powerful GPUs!

Getting Started

Data Preparation (optional)

Please refer to data/README.md for checking the details about the format of dataset files. You can either use a single .json file or a dataset loading script with multiple files to create a custom dataset.

[!NOTE] Please update data/dataset_info.json to use your custom dataset. About the format of this file, please refer to data/README.md.

Dependence Installation (optional)

git clone https://github.com/hiyouga/LLaMA-Factory.git
conda create -n llama_factory python=3.10
conda activate llama_factory
cd LLaMA-Factory
pip install -r requirements.txt

If you want to enable the quantized LoRA (QLoRA) on the Windows platform, you will be required to install a pre-built version of bitsandbytes library, which supports CUDA 11.1 to 12.1.

pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.39.1-py3-none-win_amd64.whl

Train on a single GPU

[!IMPORTANT] If you want to train models on multiple GPUs, please refer to Distributed Training.

Pre-Training

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage pt \
    --model_name_or_path path_to_llama_model \
    --do_train \
    --dataset wiki_demo \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir path_to_pt_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16

Supervised Fine-Tuning

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --model_name_or_path path_to_llama_model \
    --do_train \
    --dataset alpaca_gpt4_en \
    --template default \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir path_to_sft_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16

Reward Modeling

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage rm \
    --model_name_or_path path_to_llama_model \
    --do_train \
    --dataset comparison_gpt4_en \
    --template default \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --resume_lora_training False \
    --checkpoint_dir path_to_sft_checkpoint \
    --output_dir path_to_rm_checkpoint \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-6 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16

PPO Training

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage ppo \
    --model_name_or_path path_to_llama_model \
    --do_train \
    --dataset alpaca_gpt4_en \
    --template default \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --resume_lora_training False \
    --checkpoint_dir path_to_sft_checkpoint \
    --reward_model path_to_rm_checkpoint \
    --output_dir path_to_ppo_checkpoint \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16

DPO Training

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage dpo \
    --model_name_or_path path_to_llama_model \
    --do_train \
    --dataset comparison_gpt4_en \
    --template default \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --resume_lora_training False \
    --checkpoint_dir path_to_sft_checkpoint \
    --output_dir path_to_dpo_checkpoint \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16

Distributed Training

Use Huggingface Accelerate

accelerate config # configure the environment
accelerate launch src/train_bash.py # arguments (same as above)

Example config for LoRA training

compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
downcast_bf16: 'no'
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

Use DeepSpeed

deepspeed --num_gpus 8 --master_port=9901 src/train_bash.py \
    --deepspeed ds_config.json \
    ... # arguments (same as above)

Example config for full-parameter training with DeepSpeed ZeRO-2

{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },  
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "overlap_comm": false,
    "contiguous_gradients": true
  }
}

Export model

python src/export_model.py \
    --model_name_or_path path_to_llama_model \
    --template default \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint \
    --export_dir path_to_export

API Demo

python src/api_demo.py \
    --model_name_or_path path_to_llama_model \
    --template default \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint

[!NOTE] Visit http://localhost:8000/docs for API documentation.

CLI Demo

python src/cli_demo.py \
    --model_name_or_path path_to_llama_model \
    --template default \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint

Web Demo

python src/web_demo.py \
    --model_name_or_path path_to_llama_model \
    --template default \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint

Evaluation

CUDA_VISIBLE_DEVICES=0 python src/evaluate.py \
    --model_name_or_path path_to_llama_model \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint \
    --template vanilla \
    --task mmlu \
    --split test \
    --lang en \
    --n_shot 5 \
    --batch_size 4

Predict

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --model_name_or_path path_to_llama_model \
    --do_predict \
    --dataset alpaca_gpt4_en \
    --template default \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint \
    --output_dir path_to_predict_result \
    --per_device_eval_batch_size 8 \
    --max_samples 100 \
    --predict_with_generate

[!NOTE] We recommend using --per_device_eval_batch_size=1 and --max_target_length 128 at 4/8-bit predict.

Projects using LLaMA Factory

StarWhisper: A large language model for Astronomy, based on ChatGLM2-6B and Qwen-14B.
DISC-LawLLM: A large language model specialized in Chinese legal domain, based on Baichuan-13B, is capable of retrieving and reasoning on legal knowledge.
Sunsimiao: A large language model specialized in Chinese medical domain, based on Baichuan-7B and ChatGLM-6B.
CareGPT: A series of large language models for Chinese medical domain, based on LLaMA2-7B and Baichuan-13B.

License

This repository is licensed under the Apache-2.0 License.

Please follow the model licenses to use the corresponding model weights: Baichuan / Baichuan2 / BLOOM / ChatGLM3 / Falcon / InternLM / LLaMA / LLaMA-2 / Mistral / Phi-1.5 / Qwen / XVERSE

Citation

If this work is helpful, please kindly cite as:

@Misc{llama-factory,
  title = {LLaMA Factory},
  author = {hiyouga},
  howpublished = {\url{https://github.com/hiyouga/LLaMA-Factory}},
  year = {2023}
}

Acknowledgement

This repo benefits from PEFT, QLoRA and FastChat. Thanks for their wonderful works.

Star History

Star History Chart

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

0.7.1

May 15, 2024

0.7.0

Apr 27, 2024

0.6.3

Apr 21, 2024

0.6.2

Apr 11, 2024

0.6.1

Mar 29, 2024

0.6.0

Mar 25, 2024

0.5.3

Feb 28, 2024

0.5.2

Feb 20, 2024

0.5.1

Jan 21, 2024

0.5.0

Jan 20, 2024

0.4.0

Dec 16, 2023

0.3.3

Dec 3, 2023

0.3.2

Nov 20, 2023

0.3.1

Nov 16, 2023

0.3.0

Nov 16, 2023

0.2.3

Nov 15, 2023

0.2.2

Nov 13, 2023

This version

0.2.1

Nov 9, 2023

0.2.0

Oct 15, 2023

0.1.8

Sep 11, 2023

0.1.7

Aug 18, 2023

0.1.6

Aug 11, 2023

0.1.5

Aug 2, 2023

0.1.4

Aug 1, 2023

0.1.3

Jul 21, 2023

0.1.2

Jul 20, 2023

0.1.1

Jul 18, 2023

0.1.0

Jul 17, 2023

0.0.9

Jul 15, 2023

0.0.1

Jul 14, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llmtuner-0.2.1.tar.gz (75.5 kB view details)

Uploaded Nov 9, 2023 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

llmtuner-0.2.1-py3-none-any.whl (92.9 kB view details)

Uploaded Nov 9, 2023 Python 3

File details

Details for the file llmtuner-0.2.1.tar.gz.

File metadata

Download URL: llmtuner-0.2.1.tar.gz
Upload date: Nov 9, 2023
Size: 75.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.10

File hashes

Hashes for llmtuner-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`109f0af21bce9003a5280ea2daab4384971cd46ac5de0d71fac1175618050da4`
MD5	`e8b36f1223a5ab0091080bb8685be7f8`
BLAKE2b-256	`bc79f332fd3d6e7171146c98533539eef847ec139832b93fc663dc9bbe851d5d`

See more details on using hashes here.

File details

Details for the file llmtuner-0.2.1-py3-none-any.whl.

File metadata

Download URL: llmtuner-0.2.1-py3-none-any.whl
Upload date: Nov 9, 2023
Size: 92.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.10

File hashes

Hashes for llmtuner-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ca8157a522657376b5fa9f63c70ce3d8f8b9891ef93efb825561bad6d8739613`
MD5	`479e3815589379ddd4b0885ab30dc295`
BLAKE2b-256	`f5fceb02f6bb49a7e563ca76539472778723b27cc2b826b022a0ee2b174fbf90`

See more details on using hashes here.

llmtuner 0.2.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

LLaMA Factory: Training and Evaluating Large Language Models with Minimal Effort

LLaMA Board: A One-stop Web UI for Getting Started with LLaMA Factory

Changelog

Supported Models

Supported Training Approaches

Provided Datasets

Requirement

Getting Started

Data Preparation (optional)

Dependence Installation (optional)

Train on a single GPU

Pre-Training

Supervised Fine-Tuning

Reward Modeling

PPO Training

DPO Training

Distributed Training

Use Huggingface Accelerate

Use DeepSpeed

Export model

API Demo

CLI Demo

Web Demo

Evaluation

Predict

Projects using LLaMA Factory

License

Citation

Acknowledgement

Star History

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes