byzerllm

ByzerLLM: Byzer LLM

These details have not been verified by PyPI

Project description

Byzer-LLM

Easy, fast, and cheap pretraining, finetuning, and serving for everyone

Latest News 🔥

[2023/12] Release Byzer-LLM 0.1.20
[2023/11] Release Byzer-LLM 0.1.16

Byzer-LLM is a Ray-based, full-lifecycle solution for LLMs that covers pretraining, finetuning, deployment, and serving.

The key features of Byzer-LLM are:

Full lifecycle: pretraining, finetuning, deployment, and serving
Python and SQL APIs
Ray-based, easy to scale

Versions
Installation
Quick Start
Quantization
Supported Models
vLLM Support
DeepSpeed Support
Function Calling
Respond with pydantic class
LLM-Friendly Function/DataClass
SQL Support
SaaS Models
Pretrain
Finetune
Stream Chat
Contributing

Versions

0.1.20: Function calling support / Respond with pydantic class
0.1.19: Fix embedding bugs
0.1.18: Support stream chat / Support model templates
0.1.17: None
0.1.16: Enhance the API for byzer-retrieval
0.1.14: Add get_tables/get_databases API for byzer-retrieval
0.1.13: Support shutting down the cluster for byzer-retrieval
0.1.12: Support Python API (alpha)
0.1.5: Support Python wrapper for byzer-retrieval

Installation

pip install -r requirements.txt
pip install -U vllm
pip install -U byzerllm
ray start --head

Quick Start

import ray
from byzerllm.utils.client import ByzerLLM,LLMRequest,InferBackend

ray.init(address="auto",namespace="default",ignore_reinit_error=True)

llm = ByzerLLM()

llm.setup_gpus_per_worker(4).setup_num_workers(1)
llm.setup_infer_backend(InferBackend.Transformers)

llm.deploy(model_path="/home/byzerllm/models/openbuddy-llama2-13b64k-v15",
           pretrained_model_type="custom/llama2",
           udf_name="llama2_chat",infer_params={})

llm.chat("llama2_chat",LLMRequest(instruction="hello world"))[0].output

The code above deploys a llama2 model and then uses it to run inference on the input text. With the transformers backend, you must specify pretrained_model_type manually, because this backend cannot detect the model type automatically.

Byzer-LLM can also deploy SaaS models in the same way, which gives open-source and SaaS models a single, unified interface. The following code deploys an Azure OpenAI model and then uses it to run inference on the input text.

import ray
from byzerllm.utils.client import ByzerLLM,LLMRequest,InferBackend
ray.init(address="auto",namespace="default",ignore_reinit_error=True)

llm = ByzerLLM()

llm.setup_gpus_per_worker(0).setup_num_workers(10)
llm.setup_infer_backend(InferBackend.Transformers)

llm.deploy(pretrained_model_type="saas/azure_openai",
           udf_name="azure_openai",
           infer_params={
            "saas.api_type":"azure",
            "saas.api_key":"xxx",
            "saas.api_base":"xxx",
            "saas.api_version":"2023-07-01-preview",
            "saas.deployment_id":"xxxxxx"
           })

llm.chat("azure_openai",LLMRequest(instruction="hello world"))[0].output

Note that SaaS models do not need GPUs, so setup_gpus_per_worker is set to 0. You can use setup_num_workers to control the maximum concurrency; however, the SaaS provider enforces its own concurrency limit, and setup_num_workers only controls the maximum concurrency that Byzer-LLM accepts.
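The deploy call above hardcodes placeholder credentials. One option is to read them from the environment instead. The sketch below is hypothetical: `build_azure_infer_params` and the `AZURE_OPENAI_*` variable names are our own choices; only the `saas.*` keys come from the example above.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical helper: the saas.* keys match the Azure OpenAI deploy example;
# the environment variable names are an assumption of this sketch.
def build_azure_infer_params(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect Azure OpenAI settings for llm.deploy(infer_params=...)."""
    env = os.environ if env is None else env
    required = {
        "saas.api_key": "AZURE_OPENAI_API_KEY",
        "saas.api_base": "AZURE_OPENAI_API_BASE",
        "saas.deployment_id": "AZURE_OPENAI_DEPLOYMENT_ID",
    }
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    params = {
        "saas.api_type": "azure",
        # Falls back to the API version used in the example above.
        "saas.api_version": env.get("AZURE_OPENAI_API_VERSION", "2023-07-01-preview"),
    }
    for key, var in required.items():
        params[key] = env[var]
    return params
```

You would then pass `infer_params=build_azure_infer_params()` to `llm.deploy(...)`, keeping secrets out of notebooks and version control.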

Quantization

Currently, only the InferBackend.Transformers backend supports quantization configuration. Here is an example with Baichuan2:

llm.setup_gpus_per_worker(2).setup_num_workers(1).setup_infer_backend(InferBackend.Transformers)
llm.deploy(
    model_path=model_location,
    pretrained_model_type="custom/baichuan2",
    udf_name="baichuan2_13_chat",
    infer_params={"quatization":"4"}
)

The available values for the quatization parameter are:

4
8
true/false

When it is set to true, int4 is used.
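To make the documented meaning of these values concrete, here is a small sketch that maps each accepted `quatization` value to a bit width. It only mirrors the rules stated above (4, 8, true/false, with true meaning int4); `quantization_bits` is a hypothetical name, not Byzer-LLM's internal code.

```python
from typing import Optional, Union

# Illustration only: encodes the documented semantics of infer_params["quatization"].
def quantization_bits(value: Union[str, bool, int]) -> Optional[int]:
    """Return 4 or 8 when quantization is enabled, or None when it is off."""
    normalized = str(value).strip().lower()
    if normalized in ("4", "true"):
        return 4  # "true" selects int4
    if normalized == "8":
        return 8
    if normalized == "false":
        return None
    raise ValueError(f"unsupported quatization value: {value!r}")
```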

Supported Models

The supported open-source pretrained_model_type values are:

custom/llama2
bark
whisper
chatglm6b
custom/chatglm2
moss
custom/alpha_moss
dolly
falcon
llama
custom/starcode
custom/visualglm
custom/m3e
custom/baichuan
custom/bge
custom/qwen_vl_chat
custom/stable_diffusion
custom/zephyr

The supported SaaS pretrained_model_type values are:

saas/chatglm ChatGLM-130B
saas/sparkdesk iFlytek Spark (星火大模型)
saas/baichuan Baichuan (百川大模型)
saas/zhipu Zhipu AI (智谱大模型)
saas/minimax MiniMax
saas/qianfan ERNIE Bot (文心一言)
saas/azure_openai
saas/openai

Models derived from llama, llama2, and StarCoder are also supported. For example, you can use llama to load a Vicuna model.

vLLM Support

Byzer-LLM also supports vLLM as an inference backend. The following code deploys a model with vLLM and then uses it to run inference on the input text.

import ray
from byzerllm.utils.retrieval import ByzerRetrieval
from byzerllm.utils.client import ByzerLLM,LLMRequest,InferBackend

ray.init(address="auto",namespace="default",ignore_reinit_error=True)
llm = ByzerLLM()

llm.setup_gpus_per_worker(2)
llm.setup_num_workers(1)
llm.setup_infer_backend(InferBackend.VLLM)

llm.deploy(
    model_path="/home/byzerllm/models/openbuddy-zephyr-7b-v14.1",
    pretrained_model_type="custom/auto",
    udf_name="zephyr_chat",
    infer_params={"backend.max_num_batched_tokens":32768}
)

llm.chat("zephyr_chat",LLMRequest(instruction="hello world"))[0].output

There are a few small differences between the vLLM and transformers backends:

The pretrained_model_type is fixed to custom/auto for vLLM, since vLLM detects the model type automatically.
Use setup_infer_backend to specify InferBackend.VLLM as the inference backend.
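The rule above can be captured in a tiny helper that picks pretrained_model_type per backend. This is a hypothetical sketch: backend names are plain strings so it runs without Byzer-LLM installed, and treating DeepSpeed like vLLM follows the DeepSpeed example later in this document.

```python
from typing import Optional

# Hypothetical helper; in real code you would pass InferBackend members instead of strings.
AUTO_DETECT_BACKENDS = {"vllm", "deepspeed"}

def resolve_model_type(backend: str, model_type: Optional[str] = None) -> str:
    """Pick the pretrained_model_type to pass to llm.deploy for a given backend."""
    if backend.lower() in AUTO_DETECT_BACKENDS:
        # These backends detect the model type themselves.
        return "custom/auto"
    if not model_type:
        raise ValueError(
            "the transformers backend cannot detect the model type; "
            "pass an explicit pretrained_model_type such as 'custom/llama2'"
        )
    return model_type
```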

Stream Chat

If the model is deployed with the vLLM backend, it also supports stream chat: stream_chat_oai returns a generator that you can iterate over to get the output text.

t = llm.stream_chat_oai(conversations=[{
    "role":"user",
    "content":"Hello, how are you?"
}])

for line in t:
    print(line + "\n")
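The loop above prints each item as it arrives. If you also need the complete reply, you can accumulate the pieces as you go. The sketch below uses a hypothetical `fake_stream_chat` generator in place of `llm.stream_chat_oai` so it runs standalone, and it assumes each yielded item is an incremental text chunk; if the real generator yields the cumulative text instead, keep only the last item.

```python
from typing import Iterable, Iterator

def fake_stream_chat(chunks: Iterable[str]) -> Iterator[str]:
    """Stand-in for llm.stream_chat_oai(...): yields text pieces one at a time."""
    for chunk in chunks:
        yield chunk

def collect_stream(stream: Iterator[str], echo: bool = True) -> str:
    """Optionally print each piece as it arrives, and return the full text."""
    parts = []
    for piece in stream:
        if echo:
            print(piece, end="", flush=True)
        parts.append(piece)
    if echo:
        print()
    return "".join(parts)

reply = collect_stream(fake_stream_chat(["Hello", ", how ", "can I help?"]))
```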

DeepSpeed Support

Byzer-LLM also supports DeepSpeed as an inference backend. The following code deploys a model with DeepSpeed and then uses it to run inference on the input text.

import ray
from byzerllm.utils.retrieval import ByzerRetrieval
from byzerllm.utils.client import ByzerLLM,LLMRequest,InferBackend

ray.init(address="auto",namespace="default",ignore_reinit_error=True)
llm = ByzerLLM()

llm.setup_gpus_per_worker(4)
llm.setup_num_workers(1)
llm.setup_infer_backend(InferBackend.DeepSpeed)

llm.deploy(
    model_path="/home/byzerllm/models/openbuddy-llama-13b-v5-fp16",
    pretrained_model_type="custom/auto",
    udf_name="llama_chat",
    infer_params={}
)

llm.chat("llama_chat",LLMRequest(instruction="hello world"))[0].output

Apart from the model path and GPU count, the code above is the same as the vLLM example; the only backend-specific change is InferBackend.DeepSpeed.

Function Calling

Here is a simple function-calling example based on Qwen-72B-Chat.

Deploy Model:

import ray
from byzerllm.utils.client import ByzerLLM, InferBackend, Templates

ray.init(address="auto",namespace="default",ignore_reinit_error=True)
llm = ByzerLLM()

model_location="/home/byzerllm/models/Qwen-72B-Chat"
max_model_len = 24000
# The UDF name used below for the default model, max length, and template.
chat_model_name = "chat"

llm.setup_gpus_per_worker(8).setup_num_workers(1).setup_infer_backend(InferBackend.VLLM)
llm.deploy(
    model_path=model_location,
    pretrained_model_type="custom/auto",
    udf_name=chat_model_name,
    infer_params={"backend.max_num_batched_tokens":max_model_len,
                  "backend.max_model_len":max_model_len}
)

llm.setup_default_model_name(chat_model_name)
llm.setup_max_model_length(chat_model_name,max_model_len)
llm.setup_template(chat_model_name,Templates.qwen())
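The deploy call sets backend.max_num_batched_tokens and backend.max_model_len to the same value. To our knowledge, vLLM versions of this era (without chunked prefill) refuse to start when max_num_batched_tokens is smaller than max_model_len, so keeping them equal is a safe choice. A hypothetical pre-flight check, not part of Byzer-LLM or vLLM, might look like this:

```python
from typing import Dict, List

def vllm_limit_problems(infer_params: Dict[str, int]) -> List[str]:
    """Return problems with the vLLM length settings (empty list if none)."""
    problems = []
    batched = infer_params.get("backend.max_num_batched_tokens")
    model_len = infer_params.get("backend.max_model_len")
    if batched is not None and model_len is not None and int(batched) < int(model_len):
        problems.append(
            f"max_num_batched_tokens ({batched}) < max_model_len ({model_len}); "
            "older vLLM releases reject this combination at startup"
        )
    return problems

print(vllm_limit_problems({"backend.max_num_batched_tokens": 24000,
                           "backend.max_model_len": 24000}))  # []
```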

Next, define some Python functions:

from typing import List,Dict,Any,Annotated
import pydantic 
import datetime
from dateutil.relativedelta import relativedelta

def compute_date_range(count:Annotated[int,"time span, numeric"],
                       unit:Annotated[str,"time unit, string",{"enum":["day","week","month","year"]}])->List[str]:
    '''
    Compute a date range

    Args:
        count: time span, numeric
        unit: time unit, string; one of day, week, month, year
    '''
    now = datetime.datetime.now()
    now_str = now.strftime("%Y-%m-%d %H:%M:%S")
    if unit == "day":
        return [(now - relativedelta(days=count)).strftime("%Y-%m-%d %H:%M:%S"),now_str]
    elif unit == "week":
        return [(now - relativedelta(weeks=count)).strftime("%Y-%m-%d %H:%M:%S"),now_str]
    elif unit == "month":
        return [(now - relativedelta(months=count)).strftime("%Y-%m-%d %H:%M:%S"),now_str]
    elif unit == "year":
        return [(now - relativedelta(years=count)).strftime("%Y-%m-%d %H:%M:%S"),now_str]
    return ["",""]

def compute_now()->str:
    '''
    Get the current time
    '''
    return datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")

Here we provide two functions:

compute_date_range: compute the date range based on the count and unit
compute_now: get the current time

The model decides which tool to call based on the user's question:

t = llm.chat_oai([{
    "content":'''What is the current time?''',
    "role":"user"    
}],tools=[compute_date_range,compute_now],execute_tool=True)

t[0].values

## output: ['2023-12-18 17:30:49']

t = llm.chat_oai([{
    "content":'''Trend over the last three months''',
    "role":"user"    
}],tools=[compute_date_range,compute_now],execute_tool=True)

t[0].values

## output: [['2023-09-18 17:31:21', '2023-12-18 17:31:21']]

t = llm.chat_oai([{
    "content":'''The last three days''',
    "role":"user"    
}],tools=[compute_date_range,compute_now],execute_tool=True)

t[0].values

## output: [['2023-12-15 17:23:38', '2023-12-18 17:23:38']]

t = llm.chat_oai([{
    "content":'''Have you eaten yet?''',
    "role":"user"    
}],tools=[compute_date_range,compute_now],execute_tool=True)

if t[0].values:
    print(t[0].values[0])
else:
    print(t[0].response.output)   

## output (translated from Chinese): 'Hello, I am an AI language model, so I cannot eat.'
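With execute_tool=True, Byzer-LLM runs the tool the model selects and puts the result in t[0].values. The sketch below illustrates that dispatch step conceptually. The OpenAI-style `{"name": ..., "arguments": "<json>"}` call shape and the `execute_tool_call` and `add` names are assumptions for illustration, not Byzer-LLM internals.

```python
import json
from typing import Any, Callable, Dict, List

def execute_tool_call(call: Dict[str, Any], tools: List[Callable[..., Any]]) -> Any:
    """Look up the function the model chose and call it with the model's arguments."""
    registry = {fn.__name__: fn for fn in tools}
    fn = registry.get(call["name"])
    if fn is None:
        raise KeyError(f"model requested unknown tool {call['name']!r}")
    args = call.get("arguments") or "{}"
    # Arguments may arrive as a JSON string or as an already-parsed dict.
    kwargs = json.loads(args) if isinstance(args, str) else dict(args)
    return fn(**kwargs)

def add(a: int, b: int) -> int:
    return a + b

print(execute_tool_call({"name": "add", "arguments": '{"a": 2, "b": 3}'}, [add]))  # 5
```

Resolving tools by `__name__` is why clear, descriptive function names matter: the name is what the model emits.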

Respond with pydantic class

When chatting with the LLM, you can specify a response class:

import pydantic 

class Story(pydantic.BaseModel):
    '''
    A story
    '''

    title: str = pydantic.Field(description="The title of the story")
    body: str = pydantic.Field(description="The main body of the story")



t = llm.chat_oai([
{
    "content":'''Tell me a story in two parts: a title and a main body''',
    "role":"user"
},
],response_class=Story)

t[0].value

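## output (translated from Chinese): Story(title='The Brave Little Rabbit', body='In a beautiful forest lived a cute little rabbit. The little rabbit was very brave. One day, all the animals in the forest were terrified by the big gray wolf. Only the little rabbit stood up and, with wisdom and courage, defeated the big gray wolf and protected all the animals. From then on, the little rabbit became the hero of the forest.')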

The code above asks the LLM to generate the Story object directly. Sometimes, however, it is better to let the LLM generate free text first and then extract the structure from that text. Set response_after_chat=True to enable this behavior; note that it costs an additional inference call.

t = llm.chat_oai([
{
    "content":'''Tell me a story in two parts: a title and a main body''',
    "role":"user"
},
],response_class=Story,response_after_chat=True)

t[0].value
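## output (translated from Chinese): Story(title='The Guardian Under the Moonlight', body='In a remote, ancient village lived a young man named Aming. Aming was an orphan who had grown up in the village and made his living by farming. He was kind and hardworking, and the villagers loved him dearly.\n\nThe village had a legend: whenever the moon was full, the Moon Goddess would appear beneath the ancient tree on the hill behind the village and bless the kind-hearted. Yet only the purest of hearts could see her. So every year at this time, Aming went alone to the hill, hoping to receive the blessing of the goddess.\n\nThat year, the village suffered a severe drought. The crops withered and the people struggled to get by. Aming decided to pray to the Moon Goddess for rain to save the village. He prayed devoutly in the moonlight, hoping the goddess would hear his call.\n\nAt that very moment, the Moon Goddess appeared. Moved by the kindness and persistence of Aming, she granted his request. The next morning, dark clouds gathered, heavy rain poured down, the parched land was nourished, and the crops came back to life.\n\nFrom then on, on every full-moon night, Aming went to the hill to wait for the Moon Goddess. He became a guardian in the hearts of the villagers, protecting the whole village with his kindness and persistence. And he finally understood that a true guardian needs no extraordinary power, only a heart full of love and kindness.')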
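Whichever mode you use, the structured step ultimately turns model text into a validated object. The sketch below shows one common way to do that: extract the JSON object from the reply (models often wrap it in prose) and check the required fields. A standard-library `dataclass` stands in for the pydantic `Story` so the sketch has no dependencies; `parse_story` is a hypothetical name, and this is not how Byzer-LLM necessarily prompts or parses.

```python
import json
from dataclasses import dataclass

@dataclass
class Story:
    title: str
    body: str

def parse_story(raw: str) -> Story:
    """Extract the outermost JSON object from model text and validate it as a Story."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start:end + 1])
    missing = {"title", "body"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return Story(title=str(data["title"]), body=str(data["body"]))

story = parse_story('Sure!\n{"title": "T", "body": "B"}\nHope you like it.')
```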

LLM-Friendly Function/DataClass

To make function calling and response classes more reliable, make your functions (tools) and data classes LLM-friendly.

Let's take a look at the following python code:

def compute_date_range(count:int, unit:str)->List[str]:                   
    now = datetime.datetime.now()
    ....

This is not an LLM-friendly function: it is hard to tell what the function is for and what its input parameters mean.

Like a human, the LLM cannot easily tell when or how to invoke this function. In particular, the unit parameter is actually an enum, but the LLM has no way to learn that.

So, to help the LLM understand a function in Byzer-LLM, follow these requirements:

Add a Pythonic docstring to the function.
Use Annotated to provide a type and a description for every parameter; if a parameter is an enum, also provide its allowed values.

Here is the LLM-friendly function definition:

def compute_date_range(count:Annotated[int,"time span, numeric"],
                       unit:Annotated[str,"time unit, string",{"enum":["day","week","month","year"]}])->List[str]:
    '''
    Compute a date range

    Args:
        count: time span, numeric
        unit: time unit, string; one of day, week, month, year
    '''
    now = datetime.datetime.now()
    ....

If the LLM calls your function incorrectly (e.g. with bad parameters), try improving the function docstring and the Annotated descriptions of its parameters.
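To see why these annotations help, here is a hypothetical sketch of how a function signature could be turned into a JSON-schema-style tool description for the model. It uses only the standard `inspect` and `typing` modules (Python 3.9+); `describe_tool` is our own name and the output layout is an assumption, not the exact format Byzer-LLM sends.

```python
import inspect
from typing import Annotated, Any, Dict, List, get_args, get_origin, get_type_hints

_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}

def describe_tool(fn) -> Dict[str, Any]:
    """Build a JSON-schema-like description from a docstring and Annotated hints."""
    hints = get_type_hints(fn, include_extras=True)  # keep Annotated metadata
    properties: Dict[str, Any] = {}
    for name in inspect.signature(fn).parameters:
        hint = hints.get(name, Any)
        prop: Dict[str, Any] = {}
        if get_origin(hint) is Annotated:
            base, *extras = get_args(hint)
            for extra in extras:
                if isinstance(extra, str):
                    prop["description"] = extra
                elif isinstance(extra, dict):
                    prop.update(extra)  # e.g. {"enum": [...]}
        else:
            base = hint
        prop["type"] = _JSON_TYPES.get(base, "string")
        properties[name] = prop
    doc = inspect.getdoc(fn) or ""
    return {
        "name": fn.__name__,
        "description": doc.split("\n", 1)[0],
        "parameters": {"type": "object", "properties": properties, "required": list(properties)},
    }

def compute_date_range(count: Annotated[int, "time span, numeric"],
                       unit: Annotated[str, "time unit", {"enum": ["day", "week", "month", "year"]}]) -> List[str]:
    """
    Compute a date range.

    Args:
        count: time span, numeric
        unit: time unit; one of day, week, month, year
    """
    return []  # body omitted; only the signature and docstring matter here

schema = describe_tool(compute_date_range)
```

Without the annotations, `unit` would become a bare string with no description or enum, which is exactly the information gap described above.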

SQL Support

In addition to the Python API, Byzer-LLM also supports a SQL API. To use the SQL API, install the Byzer-SQL language first.

Install Byzer-SQL with the following commands:

git clone https://gitee.com/allwefantasy/byzer-llm
cd byzer-llm/setup-machine
sudo -i 
ROLE=master ./setup-machine.sh

After the installation, you can visit the Byzer Console at http://localhost:9002.

In the Byzer Console, you can run the following SQL to deploy a llama2 model, with the same effect as the Python code above.

!byzerllm setup single;
!byzerllm setup "num_gpus=4";
!byzerllm setup "maxConcurrency=1";
!byzerllm setup "infer_backend=transformers";

run command as LLM.`` where 
action="infer"
and pretrainedModelType="custom/llama2"
and localModelDir="/home/byzerllm/models/openbuddy-llama-13b-v5-fp16"
and reconnect="false"
and udfName="llama2_chat"
and modelTable="command";

Then you can invoke the model through the llama2_chat UDF:

select 
llama2_chat(llm_param(map(
              "user_role","User",
              "assistant_role","Assistant",
              "system_msg",'You are a helpful assistant. Think it over and answer the user question correctly.',
              "instruction",llm_prompt('
Please remember my name: {0}
',array("Zhu William"))

))) as q 
as q1;

Once you deploy a model with run command as LLM, you can use it as a SQL function. This is very useful for data scientists who want to use LLMs in their data analysis, and for data engineers who want to use LLMs in their data pipelines.

QWen

If you use Qwen in Byzer-LLM, you need to handle the following manually:

the role mapping
the stop_token_ids
trimming the stop tokens from the output

To simplify this, Byzer-LLM provides a template. Try the following code:

from byzerllm.utils.client import Templates

# Set up the Qwen template for the model named "chat"
llm.setup_template("chat",Templates.qwen())

t = llm.chat_oai(conversations=[{
    "role":"user",
    "content":"Hello, tell me a joke of about 100 characters."
}])
print(t)
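To show what such a template takes care of, here is a sketch of the ChatML layout that Qwen chat models use, plus trimming of the stop tokens from generated text. The token strings `<|im_start|>`, `<|im_end|>`, and `<|endoftext|>` are Qwen's ChatML markers; the helper names and default system prompt are our own, and this is an illustration, not the code behind `Templates.qwen()`.

```python
from typing import Dict, List, Sequence

IM_START, IM_END, END_OF_TEXT = "<|im_start|>", "<|im_end|>", "<|endoftext|>"

def to_chatml(conversations: List[Dict[str, str]],
              system: str = "You are a helpful assistant.") -> str:
    """Render OpenAI-style messages as a ChatML prompt ending with an open assistant turn."""
    lines = [f"{IM_START}system\n{system}{IM_END}"]
    for msg in conversations:
        # Role mapping: user/assistant roles are written directly after <|im_start|>.
        lines.append(f"{IM_START}{msg['role']}\n{msg['content']}{IM_END}")
    lines.append(f"{IM_START}assistant\n")
    return "\n".join(lines)

def trim_stop_tokens(text: str, stop_tokens: Sequence[str] = (IM_END, END_OF_TEXT)) -> str:
    """Cut generated text at the first stop token, if any."""
    cut = len(text)
    for token in stop_tokens:
        idx = text.find(token)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()

prompt = to_chatml([{"role": "user", "content": "hi"}])
```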

SaaS Models

Since different SaaS models take different parameters, here are templates for deploying each of them.
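Every template below follows the same shape, so you could also generate them from a dictionary. The helper below is a hypothetical convenience, not part of Byzer-LLM; it only reproduces the statement layout shown in this section, back-quoting dotted keys such as `saas.api_key`.

```python
from typing import Dict, Optional

def render_llm_command(params: Dict[str, str], udf_name: str,
                       setup: Optional[Dict[str, str]] = None) -> str:
    """Render a `run command as LLM` deploy statement in the style of the templates below."""
    setup = setup if setup is not None else {"num_gpus": "0", "maxConcurrency": "10"}
    lines = ["!byzerllm setup single;"]
    lines += [f'!byzerllm setup "{key}={value}";' for key, value in setup.items()]
    lines += ["", "run command as LLM.`` where", 'action="infer"']
    for key, value in params.items():
        # Dotted keys must be back-quoted; plain keys are written as-is.
        name = f"`{key}`" if "." in key else key
        lines.append(f'and {name}="{value}"')
    lines.append(f'and udfName="{udf_name}"')
    lines.append('and modelTable="command";')
    return "\n".join(lines)

print(render_llm_command(
    {"pretrainedModelType": "saas/zhipu", "saas.api_key": "xxx", "saas.model": "chatglm_lite"},
    udf_name="zhipu_saas",
))
```

Values are inserted without escaping, so this sketch assumes they contain no double quotes.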

qianfan

!byzerllm setup single;
!byzerllm setup "num_gpus=0";
!byzerllm setup "maxConcurrency=10";

run command as LLM.`` where
action="infer"
and pretrainedModelType="saas/qianfan"
and `saas.api_key`="xxxxxxxxxxxxxxxxxx"
and `saas.secret_key`="xxxxxxxxxxxxxxxx"
and `saas.model`="ERNIE-Bot-turbo"
and `saas.retry_count`="3"
and `saas.request_timeout`="120"
and reconnect="false"
and udfName="qianfan_saas"
and modelTable="command";

azure openai

!byzerllm setup single;
!byzerllm setup "num_gpus=0";
!byzerllm setup "maxConcurrency=10";

run command as LLM.`` where
action="infer"
and pretrainedModelType="saas/azure_openai"
and `saas.api_type`="azure"
and `saas.api_key`="xxx"
and `saas.api_base`="xxx"
and `saas.api_version`="2023-07-01-preview"
and `saas.deployment_id`="xxxxx"
and udfName="azure_openai"
and modelTable="command";

openai

!byzerllm setup single;
!byzerllm setup "num_gpus=0";
!byzerllm setup "maxConcurrency=10";

run command as LLM.`` where
action="infer"
and pretrainedModelType="saas/openai"
and `saas.api_key`="xxx"
and `saas.api_base`="xxx"
and `saas.api_version`="xxxxx"
and `saas.model`="xxxxx"
and udfName="openai_saas"
and modelTable="command";

zhipu

!byzerllm setup single;
!byzerllm setup "num_gpus=0";
!byzerllm setup "maxConcurrency=10";

run command as LLM.`` where
action="infer"
and pretrainedModelType="saas/zhipu"
and `saas.api_key`="xxxxxxxxxxxxxxxxxx"
and `saas.secret_key`="xxxxxxxxxxxxxxxx"
and `saas.model`="chatglm_lite"
and udfName="zhipu_saas"
and modelTable="command";

minimax

!byzerllm setup single;
!byzerllm setup "num_gpus=0";
!byzerllm setup "maxConcurrency=10";

run command as LLM.`` where
action="infer"
and pretrainedModelType="saas/minimax"
and `saas.api_key`="xxxxxxxxxxxxxxxxxx"
and `saas.group_id`="xxxxxxxxxxxxxxxx"
and `saas.model`="abab5.5-chat"
and `saas.api_url`="https://api.minimax.chat/v1/text/chatcompletion_pro"
and udfName="minimax_saas"
and modelTable="command";

sparkdesk

!byzerllm setup single;
!byzerllm setup "num_gpus=0";
!byzerllm setup "maxConcurrency=10";

run command as LLM.`` where
action="infer"
and pretrainedModelType="saas/sparkdesk"
and `saas.appid`="xxxxxxxxxxxxxxxxxx"
and `saas.api_key`="xxxxxxxxxxxxxxxx"
and `saas.api_secret`="xxxx"
and `gpt_url`="ws://spark-api.xf-yun.com/v1.1/chat"
and udfName="sparkdesk_saas"
and modelTable="command";

baichuan

!byzerllm setup single;
!byzerllm setup "num_gpus=0";
!byzerllm setup "maxConcurrency=10";

run command as LLM.`` where
action="infer"
and pretrainedModelType="saas/baichuan"
and `saas.api_key`="xxxxxxxxxxxxxxxxxx"
and `saas.secret_key`="xxxxxxxxxxxxxxxx"
and `saas.baichuan_api_url`="https://api.baichuan-ai.com/v1/chat"
and `saas.model`="Baichuan2-53B"
and udfName="baichuan_saas"
and modelTable="command";

Pretrain

This section introduces how to pretrain an LLM with Byzer-LLM. For now, the pretraining feature is more mature in Byzer-SQL, so the example below uses Byzer-SQL.

-- Deepspeed Config
set ds_config='''
{
  "gradient_accumulation_steps": 1,
  "train_micro_batch_size_per_gpu": 1,
  "prescale_gradients": false,
  "zero_allow_untested_optimizer": true,
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": 1e-8,
      "eps": 1.0e-8,
      "betas": [
        0.9,
        0.95
      ],
      "weight_decay": 0.1
    }
  },
  "tensorboard": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
         "device": "cpu"         
     },           
    "offload_param": {
         "device": "cpu"
    },
    "contiguous_gradients": true,
    "allgather_bucket_size": 1e8,
    "reduce_bucket_size": 1e8,
    "overlap_comm": true,
    "reduce_scatter": true
  },
  "steps_per_print": 16,
  "gradient_clipping": 1.0,
  "wall_clock_breakdown": true,
  "bf16": {
    "enabled": true
  }
}
''';

-- load data
load text.`file:///home/byzerllm/data/raw_data/*`
where wholetext="true" as trainData;

select value as text,file from trainData  as newTrainData;

-- split the data into 12 partitions
run newTrainData as TableRepartition.`` where partitionNum="12" and partitionCols="file" 
as finalTrainData;


-- set up the environment; we use 12 GPUs to pretrain the model
!byzerllm setup sfft;
!byzerllm setup "num_gpus=12";

-- specify the pretrain model type and the pretrained model path
run command as LLM.`` where 
and localPathPrefix="/home/byzerllm/models/sfft/jobs"
and pretrainedModelType="sfft/llama2"
-- the base model to start pretraining from
and localModelDir="/home/byzerllm/models/Llama-2-7b-chat-hf"
-- and localDataDir="/home/byzerllm/data/raw_data"

-- use detached (async) mode, since pretraining can take several days or weeks
-- the Ray Dashboard shows the TensorBoard address, where you can monitor the loss
and detached="true"
and keepPartitionNum="true"

-- use deepspeed config, this is optional
and deepspeedConfig='''${ds_config}'''


-- the pretrain data is from finalTrainData table
and inputTable="finalTrainData"
and outputTable="llama2_cn"
and model="command"
-- some hyper parameters
and `sfft.int.max_length`="128"
and `sfft.bool.setup_nccl_socket_ifname_by_ip`="true"
;
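As a quick sanity check on the settings above: DeepSpeed's global batch size is train_micro_batch_size_per_gpu x gradient_accumulation_steps x the number of data-parallel GPUs. Assuming all 12 GPUs act as data-parallel workers (ZeRO stage 3 shards parameters but is still data parallel), each optimizer step sees 12 sequences, and with `sfft.int.max_length`=128 at most 1536 tokens. `global_batch_size` is a hypothetical helper for this arithmetic.

```python
import json

# Only the two batch-related fields of ds_config above are needed here.
ds_config = json.loads('{"gradient_accumulation_steps": 1, "train_micro_batch_size_per_gpu": 1}')

def global_batch_size(config: dict, num_gpus: int) -> int:
    """Micro-batch per GPU x accumulation steps x data-parallel GPUs."""
    return (config["train_micro_batch_size_per_gpu"]
            * config["gradient_accumulation_steps"]
            * num_gpus)

batch = global_batch_size(ds_config, num_gpus=12)
max_length = 128  # from `sfft.int.max_length`
print(batch, batch * max_length)  # 12 1536
```

With a batch this small, raising gradient_accumulation_steps is the usual way to grow the effective batch without more GPU memory.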

Since DeepSpeed checkpoints are not compatible with Hugging Face checkpoints, convert the DeepSpeed checkpoint to Hugging Face format with the following code:

!byzerllm setup single;

run command as LLM.`` where 
action="convert"
and pretrainedModelType="deepspeed/llama3b"
and modelNameOrPath="/home/byzerllm/models/base_model"
and checkpointDir="/home/byzerllm/data/checkpoints"
and tag="Epoch-1"
and savePath="/home/byzerllm/models/my_3b_test2";

Now you can deploy the converted model:

-- Deploy the Hugging Face model
!byzerllm setup single;

set node="master";
!byzerllm setup "num_gpus=2";
!byzerllm setup "workerMaxConcurrency=1";

run command as LLM.`` where 
action="infer"
and pretrainedModelType="custom/auto"
and localModelDir="/home/byzerllm/models/my_3b_test2"
and reconnect="false"
and udfName="my_3b_chat"
and modelTable="command";

Finetune

-- load data; here we use dummy data for finetuning
-- data formats supported by Byzer-SQL: https://docs.byzer.org/#/byzer-lang/zh-cn/byzer-llm/model-sft

load json.`/tmp/upload/dummy_data.jsonl` where
inferSchema="true"
as sft_data;

-- Finetune Llama2
!byzerllm setup sft;
!byzerllm setup "num_gpus=4";

run command as LLM.`` where
and localPathPrefix="/home/byzerllm/models/sft/jobs"

-- specify the model type
and pretrainedModelType="sft/llama2"

-- specify the base model
and localModelDir="/home/byzerllm/models/Llama-2-7b-chat-hf"
and model="command"

-- specify the finetuning data table
and inputTable="sft_data"

-- output table for the new model
and outputTable="llama2_300"

-- finetuning parameters
and detached="true"
and `sft.int.max_seq_length`="512";

You can monitor the finetuning actor in the Ray Dashboard; its name looks like sft-william-xxxxx.

Once the finetuning actor has finished, its log shows the model path, and you can deploy the finetuned model from there.

Here is the log of the finetune actor:

Loading data: /home/byzerllm/projects/sft/jobs/sft-william-20230809-13-04-48-674fd1b9-2fc1-45b9-9d75-7abf07cb84cb/finetune_data/data.jsonl
there are 33 data in dataset
*** starting training ***
{'train_runtime': 19.0203, 'train_samples_per_second': 1.735, 'train_steps_per_second': 0.105, 'train_loss': 3.0778136253356934, 'epoch': 0.97}

***** train metrics *****
epoch                    =       0.97
train_loss               =     3.0778
train_runtime            = 0:00:19.02
train_samples_per_second =      1.735
train_steps_per_second   =      0.105

[sft-william] Copy /home/byzerllm/models/Llama-2-7b-chat-hf to /home/byzerllm/projects/sft/jobs/sft-william-20230809-13-04-48-674fd1b9-2fc1-45b9-9d75-7abf07cb84cb/finetune_model/final/pretrained_model
[sft-william] Train Actor is already finished. You can check the model in: /home/byzerllm/projects/sft/jobs/sft-william-20230809-13-04-48-674fd1b9-2fc1-45b9-9d75-7abf07cb84cb/finetune_model/final
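The throughput figures in this log can be cross-checked with simple arithmetic: samples per second times runtime should recover the dataset size, and steps per second times runtime gives the number of optimizer steps. The effective batch size of 16 used in the last line is our assumption (it is not printed in the log); with it, 2 steps cover 32 of the 33 samples, which matches the reported epoch of 0.97.

```python
# Figures copied from the training log above.
train_runtime_s = 19.0203
samples_per_s = 1.735
steps_per_s = 0.105

samples_seen = samples_per_s * train_runtime_s   # about 33.0, matching "there are 33 data"
optimizer_steps = steps_per_s * train_runtime_s  # about 2.0

assumed_batch = 16  # hypothetical effective batch size, not stated in the log
epoch_fraction = round(optimizer_steps) * assumed_batch / round(samples_seen)

print(round(samples_seen), round(optimizer_steps), round(epoch_fraction, 2))  # 33 2 0.97
```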

You can download the finetuned model from /home/byzerllm/projects/sft/jobs/sft-william-20230809-13-04-48-674fd1b9-2fc1-45b9-9d75-7abf07cb84cb/finetune_model/final, or copy it to all other nodes in the Ray cluster.

Deploy the finetuned model:

!byzerllm setup single;
run command as LLM.`` where 
action="infer"
and localPathPrefix="/home/byzerllm/models/infer/jobs"
and localModelDir="/home/byzerllm/models/sft/jobs/sft-william-llama2-alpaca-data-ccb8fb55-382c-49fb-af04-5cbb3966c4e6/finetune_model/final"
and pretrainedModelType="custom/llama2"
and udfName="fintune_llama2_chat"
and modelTable="command";

Byzer-LLM uses QLoRA to finetune the model. You can merge the finetuned LoRA weights with the original model using the following code:

-- Merge the LoRA model with the base model

!byzerllm setup single;

run command as LLM.`` where 
action="convert"
and pretrainedModelType="deepspeed/llama"
and model_dir="/home/byzerllm/models/sft/jobs/sft-william-20230912-21-50-10-2529bf9f-493e-40a3-b20f-0369bd01d75d/finetune_model/final/pretrained_model"
and checkpoint_dir="/home/byzerllm/models/sft/jobs/sft-william-20230912-21-50-10-2529bf9f-493e-40a3-b20f-0369bd01d75d/finetune_model/final"
and savePath="/home/byzerllm/models/sft/jobs/sft-william-20230912-21-50-10-2529bf9f-493e-40a3-b20f-0369bd01d75d/finetune_model/merge";
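For intuition about what merging means: QLoRA trains a small pair of low-rank matrices A and B for each adapted weight while the base model stays frozen, and merging folds their scaled product back into the base weight, W' = W + (alpha / r) B A. The pure-Python sketch below shows that arithmetic on toy 2x2 matrices. It is a conceptual illustration with made-up values, not what Byzer-LLM's convert action runs (that operates on checkpoint files).

```python
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x n) and b (n x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def merge_lora(w: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float, r: int) -> Matrix:
    """Return W + (alpha / r) * B @ A, with W d x k, B d x r, A r x k."""
    delta = matmul(lora_b, lora_a)
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]

W = [[1.0, 0.0], [0.0, 1.0]]  # toy base weight
B = [[1.0], [2.0]]            # d x r with rank r = 1
A = [[3.0, 4.0]]              # r x k
merged = merge_lora(W, A, B, alpha=2.0, r=1)
print(merged)  # [[7.0, 8.0], [12.0, 17.0]]
```

After merging, the model no longer needs the adapter at inference time, which is why the merged directory can be deployed like any other Hugging Face model.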

Project details


Release history

0.1.202

Jan 3, 2026

0.1.201

Jan 3, 2026

0.1.200

Dec 6, 2025

0.1.199

Dec 2, 2025

0.1.198

Nov 17, 2025

0.1.197

Sep 9, 2025

0.1.196

Aug 5, 2025

0.1.195

Aug 4, 2025

0.1.193

Jul 19, 2025

0.1.192

Jul 19, 2025

0.1.190

Jun 17, 2025

0.1.189

Jun 3, 2025

0.1.188

Jun 3, 2025

0.1.187

May 25, 2025

0.1.186

May 24, 2025

0.1.185

May 17, 2025

0.1.184

May 15, 2025

0.1.183

May 14, 2025

0.1.182

May 8, 2025

0.1.181

Apr 6, 2025

0.1.180

Mar 26, 2025

0.1.179

Mar 26, 2025

0.1.178

Mar 19, 2025

0.1.177

Mar 13, 2025

0.1.176

Mar 13, 2025

0.1.175

Mar 12, 2025

0.1.174

Mar 11, 2025

0.1.173

Mar 8, 2025

0.1.172

Mar 7, 2025

0.1.171

Mar 6, 2025

0.1.170

Mar 3, 2025

0.1.169

Feb 26, 2025

0.1.168

Feb 21, 2025

0.1.167

Feb 19, 2025

0.1.166

Feb 19, 2025

0.1.165

Feb 13, 2025

0.1.164

Feb 13, 2025

0.1.163

Feb 9, 2025

0.1.162

Feb 8, 2025

0.1.161

Feb 5, 2025

0.1.160

Feb 5, 2025

0.1.159

Feb 4, 2025

0.1.158

Feb 3, 2025

0.1.157

Feb 2, 2025

0.1.156

Feb 2, 2025

0.1.155

Feb 2, 2025

0.1.154

Feb 2, 2025

0.1.153

Feb 1, 2025

0.1.152

Feb 1, 2025

0.1.151

Jan 30, 2025

0.1.150

Jan 30, 2025

0.1.149

Jan 29, 2025

0.1.148

Jan 27, 2025

0.1.146

Jan 20, 2025

0.1.145

Jan 16, 2025

0.1.144

Jan 3, 2025

0.1.143

Jan 1, 2025

0.1.142

Dec 9, 2024

0.1.141

Dec 8, 2024

0.1.140

Nov 26, 2024

0.1.139

Nov 7, 2024

0.1.138

Oct 28, 2024

0.1.137

Oct 24, 2024

0.1.136

Oct 18, 2024

0.1.135

Oct 10, 2024

0.1.134

Sep 14, 2024

0.1.133

Sep 12, 2024

0.1.132

Sep 11, 2024

0.1.131

Sep 6, 2024

0.1.130

Sep 4, 2024

0.1.129

Aug 26, 2024

0.1.128

Aug 26, 2024

0.1.127

Aug 25, 2024

0.1.126

Aug 18, 2024

0.1.125

Aug 14, 2024

0.1.124

Aug 12, 2024

0.1.123

Aug 3, 2024

0.1.122

Jul 29, 2024

0.1.121

Jul 29, 2024

0.1.120

Jul 25, 2024

0.1.119

Jul 25, 2024

0.1.118

Jul 24, 2024

0.1.117

Jul 24, 2024

0.1.116

Jul 24, 2024

0.1.115

Jul 23, 2024

0.1.114

Jul 18, 2024

0.1.113

Jul 17, 2024

0.1.112

Jul 15, 2024

0.1.111

Jul 6, 2024

0.1.110

Jul 6, 2024

0.1.109

Jul 6, 2024

0.1.108

Jun 24, 2024

0.1.107

Jun 22, 2024

0.1.106

Jun 19, 2024

0.1.105

Jun 18, 2024

0.1.104

Jun 18, 2024

0.1.103

Jun 18, 2024

0.1.102

Jun 14, 2024

0.1.101

Jun 14, 2024

0.1.99

Jun 13, 2024

0.1.98

Jun 8, 2024

0.1.97

Jun 8, 2024

0.1.96

Jun 7, 2024

0.1.95

Jun 5, 2024

0.1.94

Jun 4, 2024

0.1.93

Jun 4, 2024

0.1.92

May 27, 2024

0.1.91

May 26, 2024

0.1.90

May 24, 2024

0.1.89

May 17, 2024

0.1.88

May 14, 2024

0.1.87

May 14, 2024

0.1.85

May 13, 2024

0.1.83

May 10, 2024

0.1.82

May 10, 2024

0.1.81

May 9, 2024

0.1.80

May 1, 2024

0.1.79

Apr 30, 2024

0.1.78

Apr 30, 2024

0.1.77

Apr 29, 2024

0.1.76

Apr 28, 2024

0.1.75

Apr 27, 2024

0.1.73

Apr 26, 2024

0.1.72

Apr 24, 2024

0.1.71

Apr 23, 2024

0.1.70

Apr 22, 2024

0.1.69

Apr 22, 2024

0.1.68

Apr 18, 2024

0.1.67

Apr 18, 2024

0.1.66

Apr 17, 2024

0.1.65

Apr 17, 2024

0.1.64

Apr 15, 2024

0.1.63

Apr 15, 2024

0.1.62

Apr 15, 2024

0.1.61

Apr 15, 2024

0.1.60

Apr 11, 2024

0.1.59

Apr 11, 2024

0.1.57

Apr 8, 2024

0.1.56

Apr 5, 2024

0.1.55

Mar 28, 2024

0.1.54

Mar 26, 2024

0.1.53

Mar 22, 2024

0.1.52

Mar 22, 2024

0.1.51

Mar 19, 2024

0.1.50

Mar 19, 2024

0.1.49

Mar 18, 2024

0.1.48

Mar 17, 2024

0.1.47

Mar 14, 2024

0.1.46

Mar 12, 2024

0.1.45

Mar 12, 2024

0.1.44

Mar 8, 2024

0.1.43

Mar 6, 2024

0.1.42

Mar 4, 2024

0.1.41

Mar 3, 2024

0.1.40

Feb 27, 2024

0.1.39

Jan 29, 2024

0.1.38

Jan 24, 2024

0.1.37

Jan 17, 2024

0.1.36

Jan 16, 2024

0.1.35

Jan 16, 2024

0.1.34

Jan 15, 2024

0.1.33

Jan 7, 2024

0.1.31

Jan 5, 2024

0.1.30

Jan 2, 2024

0.1.29

Dec 31, 2023

0.1.28

Dec 30, 2023

0.1.26

Dec 29, 2023

0.1.24

Dec 27, 2023

0.1.23

Dec 22, 2023

0.1.22

Dec 19, 2023

This version

0.1.21

Dec 19, 2023

0.1.20

Dec 19, 2023

0.1.19

Dec 14, 2023

0.1.18

Dec 14, 2023

0.1.17

Dec 12, 2023

0.1.16

Nov 20, 2023

0.1.15

Nov 17, 2023

0.1.14

Nov 8, 2023

0.1.13

Nov 3, 2023

0.1.12

Oct 31, 2023

0.1.11

Oct 18, 2023

0.1.10

Oct 16, 2023

0.1.9

Oct 14, 2023

0.1.8

Oct 14, 2023

0.1.7

Oct 12, 2023

0.1.6

Oct 12, 2023

0.1.5

Oct 10, 2023

0.1.4

Oct 4, 2023

0.1.3

Sep 26, 2023

0.1.2

Sep 19, 2023

0.1.1

Sep 2, 2023

0.0.6

Jun 24, 2023

0.0.4

May 15, 2023

0.0.3

Apr 26, 2023

0.0.2

Apr 24, 2023

0.0.1

Apr 20, 2023

Download files


Source Distribution

byzerllm-0.1.21.tar.gz (2.6 MB)

Uploaded Dec 19, 2023 Source

Built Distribution


byzerllm-0.1.21-py3-none-any.whl (3.0 MB)

Uploaded Dec 19, 2023 Python 3

File details

Details for the file byzerllm-0.1.21.tar.gz.

File metadata

Download URL: byzerllm-0.1.21.tar.gz
Upload date: Dec 19, 2023
Size: 2.6 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.11

File hashes

Hashes for byzerllm-0.1.21.tar.gz
Algorithm	Hash digest
SHA256	`fb763a8ce368866145565260a057a42a797b38bc8dae738ef4d983894f2f7c32`
MD5	`de39f1c37b09c3c40a48a5e807e0f0d7`
BLAKE2b-256	`e23650e50882568d2ec50075ecbfa1473ff1239a5820545e6958efca59d5684d`


File details

Details for the file byzerllm-0.1.21-py3-none-any.whl.

File metadata

Download URL: byzerllm-0.1.21-py3-none-any.whl
Upload date: Dec 19, 2023
Size: 3.0 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.11

File hashes

Hashes for byzerllm-0.1.21-py3-none-any.whl
Algorithm	Hash digest
SHA256	`627285ee4fac03110202ca52ccabe6d738deb8d02d910f2fcffab96ef52c1602`
MD5	`9b3bb25ff3bee401b397b70a6d4797ad`
BLAKE2b-256	`98af2fc839eadad36fab63bdd695da5f47b7d6e214860a8238770651e575889b`

