Updated and improved implementation of the self-instruct system.
Project description
airoboros: using large language models to fine-tune large language models
This is my take on implementing the Self-Instruct paper. The approach is quite heavily modified, and does not use any human-generated seeds.
This updated implementation supports either the /v1/completions endpoint or /v1/chat/completions, which is particularly useful because it enables gpt-4 and gpt-3.5-turbo (the latter being 1/10 the cost of text-davinci-003).
Huge thank you to the folks over at a16z for sponsoring the costs associated with building models and associated tools!
Install
via pip:
pip install --no-build-isolation airoboros
from source (keeping the source):
git clone https://github.com/jondurbin/airoboros
pip install --no-build-isolation -e ./airoboros
Key differences from self-instruct/alpaca
- support for either /v1/completions or /v1/chat/completions APIs (which allows gpt-3.5-turbo instead of text-davinci-003, as well as gpt-4 if you have access)
- support for custom topics list, custom topic generation prompt, or completely random topics
- in-memory vector db (Chroma) for similarity comparison, which is much faster than calculating ROUGE scores for each generated instruction (see the sketch after this list)
- (seemingly) better prompts, which includes injection of random topics to relate the instructions to, which creates much more diverse synthetic instructions
- asyncio producers with configurable batch size
- several "instructors", each targetting specific use-cases, such as Orca style reasoning/math, role playing, etc.
- tries to ensure the context, if provided, is relevant to the topic and contains all the information that would be necessary to respond to the instruction, and nost just a link to article/etc.
- generally speaking, this implementation tries to reduce some of the noise
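As a rough illustration of the Chroma-based similarity check mentioned above, here is a minimal sketch (not the project's actual code); the collection name and distance threshold are made up for the example:

```python
# Minimal sketch: reject near-duplicate instructions with an in-memory Chroma
# collection. The 0.35 distance threshold is illustrative, not the project's
# actual setting.
import uuid
import chromadb

client = chromadb.Client()  # ephemeral, in-memory client
collection = client.create_collection("instructions")

def is_novel(instruction: str, min_distance: float = 0.35) -> bool:
    """True if no stored instruction is too similar to the candidate."""
    if collection.count() == 0:
        return True
    result = collection.query(query_texts=[instruction], n_results=1)
    return result["distances"][0][0] >= min_distance

def maybe_add(instruction: str) -> bool:
    """Store the instruction only if it passes the similarity check."""
    if not is_novel(instruction):
        return False
    collection.add(documents=[instruction], ids=[str(uuid.uuid4())])
    return True

maybe_add("Explain how photosynthesis works.")
print(maybe_add("Explain how photosynthesis works in plants."))  # likely False
```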
Goal of this project
Problem and proposed solution:
- Models can only ever be as good as the data they are trained on.
- High quality data is difficult to curate manually, so ideally the process can be automated by AI/LLMs.
- Large models (gpt-4, etc.) are pricey to build and run, out of reach for individuals and small/medium businesses, and subject to RLHF bias, censorship, and changes without notice.
- Smaller models (llama-2-70b, etc.), when trained on high quality data, can reach performance on specific tasks somewhat comparable to much larger models.
- The airoboros tool allows building datasets that are focused on specific tasks, which can then be used to build a plethora of individual expert models. This means we can crowdsource building experts.
- Using either a classifier model, or simply calculating vector embeddings for each item in the dataset and using faiss index/cosine similarity/etc. search, incoming requests can be routed to a particular expert (e.g. dynamically loading LoRAs) to get extremely high quality responses.
Progress:
- ✅ PoC that training via self-instruction, i.e. fine-tuning on datasets generated by language models, works reasonably well.
- ✅ Iterate on the PoC to use higher quality prompts, more variety of instructions, etc.
- ✅ Split the code into separate "instructors", for specializing in any particular task (creative writing, songs, roleplay, coding, execution planning, function calling, etc.)
- [in progress]: PoC that an ensemble of LoRAs split by the category (i.e., the instructor used in airoboros) has better performance than the same param count model tuned on all data
- [in progress]: Remove the dependency on OpenAI/gpt-4 to generate the training data so all datasets can be completely free and open source.
- [future]: Automatic splitting of experts at some threshold, e.g. "coding" is split into python, js, golang, etc.
- [future]: Hosted service/site to build and/or extend datasets or models using airoboros.
- [future]: Depending on success of all of the above, potentially a hosted inference option with an exchange for private/paid LoRAs.
LMoE
LMoE is the simplest architecture I can think of for a mixture of experts. It doesn't use a switch transformer, doesn't require slicing and merging layers with additional fine-tuning, etc. It just dynamically loads the best PEFT/LoRA adapter model based on the incoming request.
By using this method, we can theoretically crowdsource generation of dozens (or hundreds/thousands?) of very task-specific adapters and have an extremely powerful ensemble of models with very limited resources on top of a single base model (llama-2 7b/13b/70b).
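For concreteness, here is a minimal sketch of that idea with transformers + peft; the adapter paths and names are placeholders, and the real server additionally handles routing, prompt formatting, batching, etc.:

```python
# Minimal sketch: one base model, multiple LoRA adapters, activated per request.
# Paths and adapter names below are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "./llama-2-7b-hf"
ADAPTERS = {
    "coding": "./airoboros-lmoe-7b-2.1/adapters/coding",      # placeholder path
    "roleplay": "./airoboros-lmoe-7b-2.1/adapters/roleplay",  # placeholder path
}

tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.float16, device_map="auto"  # requires accelerate
)

# Attach one adapter, then register the rest so they can be swapped in cheaply.
model = PeftModel.from_pretrained(base, ADAPTERS["coding"], adapter_name="coding")
model.load_adapter(ADAPTERS["roleplay"], adapter_name="roleplay")

def generate(prompt: str, expert: str, max_new_tokens: int = 256) -> str:
    model.set_adapter(expert)  # activate whichever expert the router selected
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```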
Tuning the experts
The self-instruct code contained within this project uses many different "instructors" to generate training data to accomplish specific tasks. The output includes the instructor/category that generated the data. We can use this to automatically segment the training data to fine-tune specific "experts".
See scripts/segment_experts.py for an example of how the training data can be segmented, with a small sample from each of the other experts included to handle misrouting.
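For a rough idea of what that looks like (a simplified sketch, not the actual script), assuming a JSONL dataset where each line carries the category field mentioned above, with an illustrative 2% cross-sampling rate:

```python
# Minimal sketch: split a JSONL dataset into per-expert files, mixing in a
# small random sample from the other categories to tolerate misrouting.
import json
import random
from collections import defaultdict

def segment(path: str, other_sample_rate: float = 0.02) -> None:
    by_category = defaultdict(list)
    with open(path) as infile:
        for line in infile:
            item = json.loads(line)
            by_category[item["category"]].append(item)

    for category, items in by_category.items():
        # Everything generated by the *other* instructors.
        others = [i for c, rows in by_category.items() if c != category for i in rows]
        sampled = random.sample(others, int(len(others) * other_sample_rate))
        with open(f"expert_{category}.jsonl", "w") as outfile:
            for item in items + sampled:
                outfile.write(json.dumps(item) + "\n")

segment("instructions.jsonl")  # hypothetical input filename
```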
See scripts/tune_expert.py for an example of creating the adapter models (with positional args for expert name, model size, etc.)
NOTE: this assumes use of my fork of qlora https://github.com/jondurbin/qlora
Routing requests to the expert
The "best" routing mechanism would probably be to train a classifier based on the instructions for each category, with the category/expert being the label, but that prohibits dynamic loading of new experts.
Instead, this supports 3 options:
- faiss index similarity search using the training data for each expert (default; see the sketch after this list)
- agent-based router using the "function" expert (query the LLM with a list of available experts and their descriptions, ask which would be best based on the user's input)
- specify the agent in the JSON request
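As a rough sketch of the first option (not the router's exact implementation), using sentence-transformers for the embeddings and a flat inner-product faiss index over a handful of hypothetical training samples:

```python
# Minimal sketch: route a request to the expert whose training instructions
# are most similar to it. Model name, experts, and samples are illustrative.
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# In practice these would be sampled from each expert's training data,
# e.g. up to --router-max-samples items per expert.
experts = {
    "coding": ["Write a Python function to merge two sorted lists."],
    "roleplay": ["You are a pirate captain. Describe your ship to the crew."],
}

index = faiss.IndexFlatIP(encoder.get_sentence_embedding_dimension())
labels = []
for name, samples in experts.items():
    emb = encoder.encode(samples, normalize_embeddings=True)
    index.add(np.asarray(emb, dtype="float32"))
    labels.extend([name] * len(samples))

def route(instruction: str, k: int = 5) -> str:
    """Pick the expert with the most similar training samples (majority vote)."""
    query = encoder.encode([instruction], normalize_embeddings=True)
    _, idx = index.search(np.asarray(query, dtype="float32"), min(k, len(labels)))
    votes = [labels[i] for i in idx[0] if i != -1]
    return max(set(votes), key=votes.count)

print(route("Implement quicksort in Python"))  # most likely "coding"
```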
Running the API server
First, download the base llama-2 model for whichever model size you want, e.g.: llama-2-7b-hf
Next, download the LMoE package that corresponds to that base model, e.g.: airoboros-lmoe-7b-2.1
NOTE: 13b also available, 70b in progress
Here's an example command to start the server:
python -m airoboros.lmoe.api \
--base-model ./llama-2-7b-hf \
--lmoe ./airoboros-lmoe-7b-2.1 \
--router-max-samples 1000 \
--router-k 25 \
--port 8000 \
--host 127.0.0.1
To use the agent-based router, add --agent-router to the arguments.
This uses flash attention via BetterTransformer (in optimum). You may need to install a torch nightly build if you see an error like 'no kernel available', e.g.:
pip install -U --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118
Once started, you can infer using the same API scheme you'd query OpenAI API with, e.g.:
curl -H 'content-type: application/json' http://127.0.0.1:8000/v1/chat/completions -d '
{
"model": "llama-2-7b-hf",
"temperature": 0.7,
"max_tokens": 2048,
"messages": [
{
"role": "system",
"content": "A chat."
},
{
"role": "user",
"content": "How much wood would a woodchuck chuck if a woodchuck could chuck wood?"
}
]
}'
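Since the server mimics the OpenAI API, you should also be able to use the openai Python client (v1+) by overriding its base URL; the api_key value below is a dummy, on the assumption that the local server doesn't validate it:

```python
# Same request as the curl example above, via the openai client.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama-2-7b-hf",
    temperature=0.7,
    max_tokens=2048,
    messages=[
        {"role": "system", "content": "A chat."},
        {
            "role": "user",
            "content": "How much wood would a woodchuck chuck if a woodchuck could chuck wood?",
        },
    ],
)
print(response.choices[0].message.content)
```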
I've also added a vLLM-based server, but the results aren't quite as good (not sure why yet). To use it, make sure you install vllm and fschat, or run pip install airoboros[vllm]:
python -m airoboros.lmoe.vllm \
--model ./llama-2-7b-hf \
--lmoe-path ../airoboros-lmoe-7b-2.1 \
--router-max-samples 100 \
--router-k 25 \
--port 8000 \
--host 127.0.0.1
Generating instructions
NEW - 2023-07-18
To better accommodate the plethora of options, the configuration has been moved to a YAML config file.
Please create a copy of example-config.yaml and configure as desired.
Once you have the desired configuration, run:
airoboros generate-instructions --config-path /path/to/config.yaml
Generating topics
NEW - 2023-07-18
Again, this is now all YAML configuration based! Please create a customized version of the YAML config file, then run:
airoboros generate-topics --config-path /path/to/config.yaml
You can override the topic_prompt string in the configuration to use a different topic generation prompt.
Support the work
ETH 0xce914eAFC2fe52FdceE59565Dd92c06f776fcb11
BTC bc1qdwuth4vlg8x37ggntlxu5cjfwgmdy5zaa7pswf
Models (research use only):
- gpt-4 versions
  - llama-2 base model
    - 2.1 dataset
    - 2.0/m2.0
      - airoboros-l2-7b-gpt4-2.0
      - airoboros-l2-7b-gpt4-m2.0
      - airoboros-l2-13b-gpt4-2.0
      - airoboros-l2-13b-gpt4-m2.0
    - Previous generation (1.4.1 dataset)
  - original llama base model
    - Latest version (2.0 / m2.0 datasets)
    - Previous generation (1.4.1 dataset)
      - airoboros-65b-gpt4-1.4
      - airoboros-33b-gpt4-1.4
      - airoboros-13b-gpt4-1.4
      - airoboros-7b-gpt4-1.4
      - older versions on HF as well
  - mpt-30b base model
- gpt-3.5-turbo versions
Datasets
Download files
Download the file for your platform.
Source Distribution
Built Distribution
File details
Details for the file airoboros-2.2.2.tar.gz.
File metadata
- Download URL: airoboros-2.2.2.tar.gz
- Upload date:
- Size: 87.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 169b74498724e7950d5f2e0c3789c64853c51e4209e513d69bbe354e22c70480 |
| MD5 | 89cb8190157695d4ea4916c9936865ab |
| BLAKE2b-256 | 8fb87940c85d8260206a2627ac62ae932f2161caffd200417241f8400e0e2db2 |
File details
Details for the file airoboros-2.2.2-py3-none-any.whl.
File metadata
- Download URL: airoboros-2.2.2-py3-none-any.whl
- Upload date:
- Size: 116.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2b72476aa86355bcd0cf986d3966c7980c61aa723d36ab2057a347cc81372fa7 |
| MD5 | de6d7927101e5ce267d95cf2ca95f07e |
| BLAKE2b-256 | 7ab01ae636a273eaaae926cbef589cb8e284cb58ab615ce6786270675f0ec897 |