Efficiently serve LoRA tuned models
Frequency
Efficiently serve LoRA tuned models.
Frequency hot-swaps LoRA layers in ML models at inference time, so that a single large base model can be used efficiently across many adapters.
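To make the hot-swapping idea concrete, here is a minimal plain-Python sketch of the underlying LoRA arithmetic: the base weights are computed once and stay frozen, while a small low-rank update is selected per request. This is not Frequency's implementation; the `LoRALinear` class, its `cache_adapter` and `forward` methods, and the toy matrices are all assumptions made for illustration.

```python
# Illustrative only: plain-Python LoRA math, not Frequency's actual code.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """A frozen base layer plus named low-rank adapters (hypothetical class)."""

    def __init__(self, base_weight):
        self.base_weight = base_weight  # shape: out x in, never modified
        self.adapters = {}              # name -> (A, B, scale)

    def cache_adapter(self, name, A, B, scale=1.0):
        # A has shape r x in and B has shape out x r, with r much smaller
        # than in/out, so an adapter is cheap to store next to the base.
        self.adapters[name] = (A, B, scale)

    def forward(self, x, adapter=None):
        y = matvec(self.base_weight, x)  # shared base computation
        if adapter is not None:
            A, B, scale = self.adapters[adapter]
            delta = matvec(B, matvec(A, x))  # low-rank update B(Ax)
            y = [yi + scale * di for yi, di in zip(y, delta)]
        return y


layer = LoRALinear(base_weight=[[1.0, 0.0], [0.0, 1.0]])
layer.cache_adapter("dog", A=[[1.0, 1.0]], B=[[0.5], [0.0]])
layer.cache_adapter("cat", A=[[1.0, -1.0]], B=[[0.0], [2.0]])

x = [2.0, 3.0]
print(layer.forward(x))                 # base model only
print(layer.forward(x, adapter="dog"))  # same base weights, dog adapter
print(layer.forward(x, adapter="cat"))  # same base weights, cat adapter
```

Because the adapter is chosen per call, switching from "dog" to "cat" never copies or reloads the base weights, which is what makes sharing one large base model practical.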
Install
pip install frequency-ai
Install the server component on Kubernetes
helm install frequency oci://artifact.frequency.ai/frequency-server:0.0.1
Usage
Load a HuggingFace model and use adapters
from transformers import AutoModelForCausalLM, AutoTokenizer
from frequency import Client
# Connect to the frequency server
client = Client("localhost:9000")
# Load a Hugging Face model onto the server
model = client.load_model(name="qwen-vl-chat", hf_repo="Qwen/Qwen-VL-Chat", type=AutoModelForCausalLM)
# Cache an adapter on the server that was trained on dog images
resp = model.cache_adapter(name="dog", hf_repo="Anima-ai/dog_lora")
# Qwen expects a specific format for describing images
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
query = tokenizer.from_list_format([
{'image': 'https://hips.hearstapps.com/ghk.h-cdn.co/assets/17/30/pembroke-welsh-corgi.jpg'},
{'text': 'What is this?'},
])
# Chat with the model using the dog adapter
response, history = model.chat(query=query, adapters=["dog"])
#> Here is a picture of a Corgi
# Cache an adapter on the server that was trained on cat images
resp = model.cache_adapter(name="cat", hf_repo="Anima-ai/cat_lora")
print(resp)
query = tokenizer.from_list_format([
{'image': 'https://www.catster.com/wp-content/uploads/2023/11/Brown-tabby-cat-that-curls-up-outdoors_viper-zero_Shutterstock-800x533.jpg'},
{'text': 'What is this?'},
])
# Chat with the same model using the new cat adapter
response, history = model.chat(query=query, adapters=["cat"])
#> Here is a picture of a tabby cat
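The two chat calls above differ only in the image URL and the adapter name, so the pattern can be wrapped in a small helper. The sketch below assumes that `tokenizer.from_list_format` and `model.chat(query=..., adapters=[...])` behave as in the usage example; `describe_image` is a hypothetical name and is not part of the `frequency` package.

```python
def describe_image(model, tokenizer, image_url, adapter, question="What is this?"):
    """Ask `model` about one image using a single named adapter.

    Hypothetical helper: it relies only on the `from_list_format` and
    `chat` calls shown in the usage example.
    """
    # Build the Qwen-style query from an image URL and a text prompt.
    query = tokenizer.from_list_format([
        {"image": image_url},
        {"text": question},
    ])
    # Route the request through the named adapter; the chat history is discarded.
    response, _history = model.chat(query=query, adapters=[adapter])
    return response
```

With the objects from the usage example, a call such as `describe_image(model, tokenizer, image_url, "dog")` would send the same query format through whichever cached adapter is named.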
Roadmap
- Tenancy