Use llama2-wrapper as your local llama2 backend for Generative Agents / Apps

llama2-wrapper

Features

llama2-wrapper is the backend of llama2-webui and can run any Llama 2 model locally with a Gradio UI on GPU or CPU (Linux/Windows/Mac).

Install

pip install llama2-wrapper

Start OpenAI-Compatible API

python -m llama2_wrapper.server

By default, it will use llama.cpp as the backend and run the llama-2-7b-chat.ggmlv3.q4_0.bin model.

Start the FastAPI server with the gptq backend:

python -m llama2_wrapper.server --backend_type gptq

Navigate to http://localhost:8000/docs to see the OpenAPI documentation.
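
Once the server is running, any OpenAI-compatible client can query it. Below is a minimal sketch using the requests library against the /v1/completions endpoint; the request fields follow the OpenAI completion schema, and the exact set of parameters the server supports is an assumption here.

import requests

# Query the local OpenAI-compatible server started above (assumed to be
# listening on localhost:8000); fields follow the OpenAI /v1/completions schema.
response = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "prompt": "Hi do you know Pytorch?",
        "max_tokens": 128,
        "temperature": 0.9,
    },
)
print(response.json()["choices"][0]["text"])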

API Usage

__call__

__call__() generates text from a prompt.

For example, to run a GGML Llama 2 model on CPU:

from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

llama2_wrapper = LLAMA2_WRAPPER()
# Defaults to the llama.cpp backend.
# Automatically downloads the model to: ./models/llama-2-7b-chat.ggmlv3.q4_0.bin
prompt = "Do you know Pytorch"
# Calling llama2_wrapper() invokes __call__().
answer = llama2_wrapper(get_prompt(prompt), temperature=0.9)

To run a GPTQ Llama 2 model on an Nvidia GPU:

from llama2_wrapper import LLAMA2_WRAPPER

llama2_wrapper = LLAMA2_WRAPPER(backend_type="gptq")
# Automatically downloads the model to: ./models/Llama-2-7b-Chat-GPTQ

To run Llama 2 7B with bitsandbytes 8-bit quantization from a local model_path:

from llama2_wrapper import LLAMA2_WRAPPER

llama2_wrapper = LLAMA2_WRAPPER(
    model_path="./models/Llama-2-7b-chat-hf",
    backend_type="transformers",
    load_in_8bit=True,
)

completion

completion() generates text from a prompt; it backs the OpenAI-compatible API endpoint /v1/completions.

from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

llama2_wrapper = LLAMA2_WRAPPER()
prompt = get_prompt("Hi do you know Pytorch?")
print(llama2_wrapper.completion(prompt))

chat_completion

chat_completion() generates text from a dialog (chat history); it backs the OpenAI-compatible API endpoint /v1/chat/completions.

from llama2_wrapper import LLAMA2_WRAPPER

llama2_wrapper = LLAMA2_WRAPPER()
dialog = [
    {
        "role": "system",
        "content": "You are a helpful, respectful and honest assistant. ",
    },
    {
        "role": "user",
        "content": "Hi do you know Pytorch?",
    },
]
print(llama2_wrapper.chat_completion(dialog))

generate

generate() returns a generator that yields the response to a prompt.

This is useful when you want to stream the output, like a chatbot typing out its reply.

from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

llama2_wrapper = LLAMA2_WRAPPER()
prompt = get_prompt("Hi do you know Pytorch?")
for response in llama2_wrapper.generate(prompt):
    print(response)

The streamed responses will look like:

Yes, 
Yes, I'm 
Yes, I'm familiar 
Yes, I'm familiar with 
Yes, I'm familiar with PyTorch! 
...
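
As the sample output shows, each yielded response is the cumulative text so far rather than a delta. Here is a minimal sketch (assuming that cumulative behavior) that prints only the newly generated characters, chatbot-style:

from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

llama2_wrapper = LLAMA2_WRAPPER()
prompt = get_prompt("Hi do you know Pytorch?")

# Each yielded response contains the full text generated so far, so track
# how much has already been printed and emit only the new tail.
printed = 0
for response in llama2_wrapper.generate(prompt):
    print(response[printed:], end="", flush=True)
    printed = len(response)
print()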

run

run() is similar to generate(), but it can also accept chat_history and system_prompt from the user.

It processes the input message into the Llama 2 prompt template, together with chat_history and system_prompt, for a chatbot-like app.
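
For illustration, here is a minimal sketch of a chatbot turn built on run(). The description above only states that run() accepts chat_history and system_prompt; passing them as keyword arguments and representing chat_history as (user message, assistant reply) pairs are assumptions here.

from llama2_wrapper import LLAMA2_WRAPPER

llama2_wrapper = LLAMA2_WRAPPER()

# Assumed format: past turns as (user_message, assistant_reply) pairs.
chat_history = [("Hi do you know Pytorch?", "Yes, I'm familiar with PyTorch!")]
system_prompt = "You are a helpful, respectful and honest assistant."

# Like generate(), run() streams the response; the Llama 2 prompt template
# is built internally from the message, chat_history, and system_prompt.
for response in llama2_wrapper.run(
    "Can you show me a short example?",
    chat_history=chat_history,
    system_prompt=system_prompt,
):
    print(response)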

get_prompt

get_prompt() processes the input message into a Llama 2 prompt, together with chat_history and system_prompt, for a chatbot.

By default, chat_history and system_prompt are empty, and get_prompt() simply wraps your message in the Llama 2 prompt template:

prompt = get_prompt("Hi do you know Pytorch?")

prompt will be:

[INST] <<SYS>>

<</SYS>>

Hi do you know Pytorch? [/INST]

If you call get_prompt("Hi do you know Pytorch?", system_prompt="You are a helpful..."), prompt will be:

[INST] <<SYS>>
You are a helpful, respectful and honest assistant. 
<</SYS>>

Hi do you know Pytorch? [/INST]
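
get_prompt() can also take chat_history for multi-turn prompts. A brief sketch follows; representing chat_history as (user message, assistant reply) pairs, and the exact rendering of past turns, are assumptions based on the standard Llama 2 chat template.

from llama2_wrapper import get_prompt

# Assumed format: past turns as (user_message, assistant_reply) pairs.
chat_history = [("Hi do you know Pytorch?", "Yes, I'm familiar with PyTorch!")]
prompt = get_prompt(
    "Can you show me a short example?",
    chat_history=chat_history,
    system_prompt="You are a helpful, respectful and honest assistant.",
)
# In the standard Llama 2 chat template, each past turn is rendered as an
# [INST] ... [/INST] block followed by the assistant reply (assumption).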

get_prompt_for_dialog

get_prompt_for_dialog() processes a dialog (chat history) into a Llama 2 prompt for the OpenAI-compatible API /v1/chat/completions.

dialog = [
    {
        "role": "system",
        "content": "You are a helpful, respectful and honest assistant. ",
    },
    {
        "role": "user",
        "content": "Hi do you know Pytorch?",
    },
]
prompt = get_prompt_for_dialog(dialog)
# [INST] <<SYS>>
# You are a helpful, respectful and honest assistant. 
# <</SYS>>
# 
# Hi do you know Pytorch? [/INST]

