No project description provided
Project description
LLM-ATC (Air Traffic Controller) is a CLI for fine tuning and serving open source models using your own cloud credentials. We hope that this project can lower the cognitive overhead of orchestration for fine tuning and model serving.
Installation
Follow the instructions here to install Skypilot and provide cloud credentials. We use Skypilot for cloud orchestration. Steps to setup an environment is shown below.
# create a fresh environment
conda create -n "llm-atc" python=3.10
conda activate sky
# For Macs, macOS >= 10.15 is required to install SkyPilot. For Apple Silicon-based devices (e.g. Apple M1)
pip uninstall grpcio; conda install -c conda-forge grpcio=1.43.0
# install the skypilot cli and dependency, for the clouds you want, e.g. GCP
pip install skypilot[gcp] # for aws, skypilot[aws]
# Configure your cloud credentials. This is a GCP example. See https://skypilot.readthedocs.io/en/latest/getting-started/ installation.html for examples with other cloud providers.
pip install google-api-python-client
conda install -c conda-forge google-cloud-sdk
gcloud init
gcloud auth application-default login
From PyPi
pip install llm-atc
From source
python -m pip install skypilot
poetry install
Finetuning
Supported fine-tune methods.
- Vicuna (chat-finetuning)
To start finetuning a model. Use llm-atc train
. For example
llm-atc train --model_type vicuna --finetune_data ./vicuna_test.json --name myvicuna --description "This is a finetuned model that just says its name is vicuna" -c mycluster --cloud gcp --envs "MODEL_SIZE=7 WANDB_API_KEY=<my wandb key>" --accelerator A100-80G:4
If your client disconnects from the train, the train run will continue. You can check it's status with sky queue mycluster
When training completes, by default, your model, will be saved to an object store corresponding to the cloud provider which launched the training instance. For instance,
# s3 location
s3://llm-atc/myvicuna
# gcp location
g3://llm-atc/myvicuna
Serving
llm-atc
can serve both models from HuggingFace or that you've trained through llm-atc serve
. For example
# serve an llm-atc finetuned model, requires `llm-atc/` prefix and grabs model checkpoint from object store
llm-atc serve --name llm-atc/myvicuna --accelerator A100:1 -c serveCluster --cloud gcp --region asia-southeast1
# serve a HuggingFace model, e.g. `lmsys/vicuna-13b-v1.3`
llm-atc serve --name lmsys/vicuna-13b-v1.3 --accelerator A100:1 -c serveCluster --cloud gcp --region asia-southeast1
This creates a OpenAI API server on port 8000 on the cluster head and one model worker. Forward this port to your laptop with
# Forward port 8000 to your localhost
ssh -N -L 8000:localhost:8000 serveCluster
# test which models are available
curl http://localhost:8000/v1/models
and you can connect to this server and develop your using your finetuned models with your favorite LLM frameworks like LangChain. An example of how integrate Langchain (through Fastchat) is linked here. Example with ATC coming soon
How does it work?
Training, serving, and orchestration are powered by SkyPilot, FastChat, and vLLM. We've made this decision since we believe this will allow people to train and deploy custom LLMs without cloud-lockin.
We currently rely on default hyperparameters from other training code repositories, but we will add options to overwrite these so that users have more control over training, but for now, we think the defaults should suffice for most use cases.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.