Efficiently serve LoRA tuned models
Frequency
Efficiently serve LoRA tuned models.
Frequency hot-swaps LoRA layers in ML models at inference time, so a single large base model can be loaded once and reused efficiently with different adapters.
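To make the hot-swap idea concrete, here is a minimal pure-Python sketch of a linear layer whose base weight stays fixed while low-rank adapters are selected per call. It illustrates the general LoRA technique (output = W x + scale * B(A x)) and is not Frequency's implementation. The `LoRALinear` class, its method names, and the toy weights are all assumptions made for this example.

```python
# Illustrative sketch only: this is NOT Frequency's implementation.
# It shows the general LoRA idea: y = W x + scale * B (A x), with W shared.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Toy linear layer with a fixed base weight and swappable low-rank adapters."""

    def __init__(self, base_weight):
        self.base_weight = base_weight  # shared across all adapters, never modified
        self.adapters = {}              # name -> (A, B, scale)

    def cache_adapter(self, name, a, b, scale=1.0):
        # A has shape (rank, in_features); B has shape (out_features, rank).
        self.adapters[name] = (a, b, scale)

    def forward(self, x, adapter=None):
        y = matvec(self.base_weight, x)
        if adapter is None:
            return y
        a, b, scale = self.adapters[adapter]
        delta = matvec(b, matvec(a, x))  # low-rank update B(Ax)
        return [yi + scale * di for yi, di in zip(y, delta)]


layer = LoRALinear(base_weight=[[1.0, 0.0], [0.0, 1.0]])
layer.cache_adapter("dog", a=[[1.0, 1.0]], b=[[1.0], [0.0]])
layer.cache_adapter("cat", a=[[1.0, -1.0]], b=[[0.0], [2.0]])

x = [3.0, 2.0]
print(layer.forward(x))                 # base model only: [3.0, 2.0]
print(layer.forward(x, adapter="dog"))  # same base weight, dog adapter: [8.0, 2.0]
print(layer.forward(x, adapter="cat"))  # same base weight, cat adapter: [3.0, 4.0]
```

Because the adapter is chosen at call time and the base weight is never rewritten, switching adapters in this sketch costs only a dictionary lookup, which is the property that makes sharing one large base model attractive.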
Install
pip install frequency-ai
Install the server component on Kubernetes
helm install frequency oci://artifact.frequency.ai/frequency-server:0.0.1
Usage
Load a HuggingFace model and use adapters
from transformers import AutoModelForCausalLM, AutoTokenizer
from frequency import Client
# Connect to the frequency server
client = Client("localhost:9000")
# Load a Hugging Face model onto the server
model = client.load_model(name="qwen-vl-chat", hf_repo="Qwen/Qwen-VL-Chat", type=AutoModelForCausalLM)
# Cache an adapter on the server that was trained on dog images
resp = model.cache_adapter(name="dog", hf_repo="Anima-ai/dog_lora")
# Qwen expects a specific format for describing images
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
query = tokenizer.from_list_format([
    {'image': 'https://hips.hearstapps.com/ghk.h-cdn.co/assets/17/30/pembroke-welsh-corgi.jpg'},
    {'text': 'What is this?'},
])
# Chat with the model using the dog adapter
response, history = model.chat(query=query, adapters=["dog"])
#> Here is a picture of a Corgi
# Cache an adapter on the server that was trained on cat images
resp = model.cache_adapter(name="cat", hf_repo="Anima-ai/cat_lora")
print(resp)
query = tokenizer.from_list_format([
    {'image': 'https://www.catster.com/wp-content/uploads/2023/11/Brown-tabby-cat-that-curls-up-outdoors_viper-zero_Shutterstock-800x533.jpg'},
    {'text': 'What is this?'},
])
# Chat with the same model using the new cat adapter
response, history = model.chat(query=query, adapters=["cat"])
#> Here is a picture of a tabby cat
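The usage above sends one query per adapter by hand. A small helper can do the same in a loop, as sketched below. Only the call shape `model.chat(query=..., adapters=[...])` returning `(response, history)` is taken from the usage above. The helper `chat_with_each_adapter` and the `FakeModel` stand-in are hypothetical, added so the sketch runs without a Frequency server.

```python
# Hypothetical helper: only `model.chat(query=..., adapters=[...])` comes from
# the usage above; everything else here is an assumption for illustration.

def chat_with_each_adapter(model, query, adapter_names):
    """Send the same query once per adapter and collect the responses by name."""
    responses = {}
    for name in adapter_names:
        response, _history = model.chat(query=query, adapters=[name])
        responses[name] = response
    return responses


class FakeModel:
    """Stand-in for the server-side model handle so the sketch runs offline."""

    def chat(self, query, adapters):
        reply = f"answered {query!r} with adapters {adapters}"
        return reply, [(query, reply)]


results = chat_with_each_adapter(FakeModel(), "What is this?", ["dog", "cat"])
for name, reply in results.items():
    print(name, "->", reply)
```

With a real server, the object returned by `client.load_model(...)` would take the place of `FakeModel()`, assuming the adapters named in `adapter_names` have already been cached.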
Roadmap
- Tenancy