
Byzer-LLM

Byzer-LLM is a full-lifecycle LLM solution built on Ray, covering pretraining, fine-tuning, deployment, and serving.

Byzer-LLM differs from other LLM solutions in two ways. First, it supports Byzer-SQL, a SQL dialect for managing the LLM lifecycle, whereas most other solutions offer only a Python API. The available interfaces are:

  1. Python (alpha)
  2. Byzer-SQL (stable)
  3. Rest API (todo...)

Second, Byzer-LLM is built entirely on Ray, so you can deploy multiple LLM models on a single machine or across a cluster, which is essential for large-scale deployment. Byzer-LLM also supports vLLM, DeepSpeed, and Transformers transparently as inference backends.

Versions

  • 0.1.13: support shutdown cluster for byzer-retrieval
  • 0.1.12: Support Python API (alpha)
  • 0.1.5: Support python wrapper for byzer-retrieval

Installation

pip install -r requirements.txt
pip install -U byzerllm
ray start --head

Usage (Python)

import ray
from byzerllm.utils.client import ByzerLLM, LLMRequest, InferBackend

# Connect to the running Ray cluster
ray.init(address="auto", namespace="default", ignore_reinit_error=True)

llm = ByzerLLM()

# One worker with 4 GPUs, using the Transformers backend
llm.setup_gpus_per_worker(4).setup_num_workers(1)
llm.setup_infer_backend(InferBackend.transformers)

# Deploy the model as a UDF named "llama2_chat"
llm.deploy(model_path="/home/byzerllm/models/openbuddy-llama-13b-v5-fp16",
           pretrained_model_type="custom/llama2",
           udf_name="llama2_chat",
           infer_params={})

llm.chat("llama2_chat", LLMRequest(instruction="hello world"))[0].output

The code above deploys a llama2 model and then uses it to answer the input text. The Python API is simple to use and convenient for exploring LLM models.
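In practice a chat call can fail transiently (model still loading, worker restarting), so a small retry wrapper is often useful. A minimal, library-agnostic sketch; the chat_fn callable and the retry policy are assumptions here, not part of the Byzer-LLM API:

```python
import time

def chat_with_retry(chat_fn, prompt, retries=3, backoff=0.5):
    """Call chat_fn(prompt), retrying on failure with exponential backoff.

    chat_fn is any callable returning the model output as a string; with
    Byzer-LLM it could wrap llm.chat(...)[0].output (hypothetical usage).
    """
    last_err = None
    for attempt in range(retries):
        try:
            return chat_fn(prompt)
        except Exception as err:  # narrow the exception type in real code
            last_err = err
            time.sleep(backoff * (2 ** attempt))
    raise RuntimeError(f"chat failed after {retries} attempts") from last_err

# Usage with a stand-in chat function:
print(chat_with_retry(lambda p: p.upper(), "hello"))  # prints "HELLO"
```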

Usage (Byzer-SQL)

The following code has the same effect as the Python code above.

!byzerllm setup single;
!byzerllm setup "num_gpus=4";
!byzerllm setup "maxConcurrency=1";
!byzerllm setup "infer_backend=transformers";

run command as LLM.`` where 
action="infer"
and pretrainedModelType="custom/llama2"
and localModelDir="/home/byzerllm/models/openbuddy-llama-13b-v5-fp16"
and reconnect="false"
and udfName="llama2_chat"
and modelTable="command";

select 
llama2_chat(llm_param(map(
              "user_role","User",
              "assistant_role","Assistant",
              "system_msg",'You are a helpful assistant. Think it over and answer the user question correctly.',
              "instruction",llm_prompt('
Please remember my name: {0}              
',array("Zhu William"))
))) as q as q1;

Once you deploy the model with run command as LLM, you can use it as a SQL function. This is useful for data scientists who want to use LLMs in their data analysis, and for data engineers who want to use LLMs in their data pipelines.
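The llm_param/llm_prompt pair above assembles a role-tagged prompt with positional placeholder substitution. A hypothetical pure-Python equivalent of that assembly (the exact template Byzer-SQL produces is an assumption; this only illustrates the mechanics):

```python
def llm_prompt(template, args):
    """Substitute positional placeholders {0}, {1}, ... as llm_prompt does."""
    return template.format(*args).strip()

def build_prompt(user_role, assistant_role, system_msg, instruction):
    """Assemble a role-tagged chat prompt from llm_param-style fields."""
    return f"{system_msg}\n{user_role}: {instruction}\n{assistant_role}:"

instruction = llm_prompt("Please remember my name: {0}", ["Zhu William"])
print(build_prompt("User", "Assistant",
                   "You are a helpful assistant.", instruction))
```

The final string is what the deployed model actually sees as its input, which is why the role names must match what the model was trained with.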

Cooperate with Byzer-Retrieval

Byzer-LLM can cooperate with Byzer-Retrieval to build a RAG application. The following code shows how to use Byzer-LLM and Byzer-Retrieval together.

The first step is to connect to the Ray cluster:

# Make the byzer-retrieval jars and JDK 21 visible to the Ray workers
code_search_path = ["/home/byzerllm/softwares/byzer-retrieval-lib/"]
env_vars = {"JAVA_HOME": "/home/byzerllm/softwares/jdk-21",
            "PATH": "/home/byzerllm/softwares/jdk-21/bin:/home/byzerllm/.rvm/gems/ruby-3.2.2/bin:/home/byzerllm/.rvm/gems/ruby-3.2.2@global/bin:/home/byzerllm/.rvm/rubies/ruby-3.2.2/bin:/home/byzerllm/.rbenv/shims:/home/byzerllm/.rbenv/bin:/home/byzerllm/softwares/byzer-lang-all-in-one-linux-amd64-3.3.0-2.3.7/jdk8/bin:/usr/local/cuda/bin:/usr/local/cuda/bin:/home/byzerllm/.rbenv/shims:/home/byzerllm/.rbenv/bin:/home/byzerllm/miniconda3/envs/byzerllm-dev/bin:/home/byzerllm/miniconda3/condabin:/home/byzerllm/.local/bin:/home/byzerllm/bin:/usr/local/cuda/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/byzerllm/.rvm/bin:/home/byzerllm/.rvm/bin"}

import ray

ray.init(address="auto", namespace="default",
         job_config=ray.job_config.JobConfig(code_search_path=code_search_path,
                                             runtime_env={"env_vars": env_vars}))

The second step is to create ByzerLLM/ByzerRetrieval:

from byzerllm.utils.retrieval import ByzerRetrieval
from byzerllm.utils.client import ByzerLLM,LLMRequest,LLMResponse,LLMHistoryItem,LLMRequestExtra
from byzerllm.records import SearchQuery

retrieval = ByzerRetrieval()
retrieval.launch_gateway()  # start the retrieval gateway on the Ray cluster

llm = ByzerLLM()

Now you can combine llm.chat and retrieval.search to build a RAG application.
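The retrieve-then-generate flow is: search for relevant documents, concatenate the hits into a context block, and pass that block to the chat call. A minimal sketch of the pattern with stand-in search/chat callables (the real ByzerRetrieval and ByzerLLM signatures differ; this only illustrates the control flow):

```python
def rag_answer(question, search_fn, chat_fn, top_k=3):
    """Retrieve top_k documents, stuff them into the prompt, then generate."""
    docs = search_fn(question)[:top_k]
    context = "\n".join(f"- {d}" for d in docs)
    prompt = (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )
    return chat_fn(prompt)

# Stand-in backends for illustration only
corpus = ["Ray is a distributed compute framework.",
          "Byzer-LLM runs on top of Ray."]
search = lambda q: [d for d in corpus if any(w in d for w in q.split())]
chat = lambda p: p.splitlines()[-1]  # echoes the question line

print(rag_answer("What is Ray", search, chat))  # prints "Question: What is Ray"
```

In a real application, search_fn would wrap retrieval.search and chat_fn would wrap llm.chat against the deployed UDF.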
