
Use llama2-wrapper as your local llama2 backend for Generative Agents / Apps

Project description

llama2-wrapper

Features

llama2-wrapper is the backend of llama2-webui, which can run any Llama 2 model locally with a Gradio UI on GPU or CPU (Linux/Windows/Mac).

Install

pip install llama2-wrapper

Start OpenAI Compatible API

python3 -m llama2_wrapper.server

By default, it uses llama.cpp as the backend and runs the llama-2-7b-chat.ggmlv3.q4_0.bin model.

Start the FastAPI server with the GPTQ backend:

python3 -m llama2_wrapper.server --backend_type gptq

Navigate to http://localhost:8000/docs to see the OpenAPI documentation.
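Once the server is running, you can call it from any HTTP client. Below is a minimal sketch using only Python's standard library; the payload fields follow the OpenAI completions convention and are assumptions here, so confirm the exact schema at http://localhost:8000/docs:

```python
import json
from urllib import request

# Hypothetical request payload, following the OpenAI /v1/completions
# convention; verify the field names against the server's OpenAPI docs.
payload = {
    "prompt": "[INST] Hi do you know Pytorch? [/INST]",
    "max_tokens": 256,
    "temperature": 0.9,
}
req = request.Request(
    "http://localhost:8000/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Requires the server started above to be running:
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```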

API Usage

__call__

__call__() is the function to generate text from a prompt.

For example, to run a GGML Llama 2 model on CPU (colab example):

from llama2_wrapper import LLAMA2_WRAPPER, get_prompt 
llama2_wrapper = LLAMA2_WRAPPER()
# Default running on backend llama.cpp.
# Automatically downloading model to: ./models/llama-2-7b-chat.ggmlv3.q4_0.bin
prompt = "Do you know Pytorch"
# llama2_wrapper() will run __call__()
answer = llama2_wrapper(get_prompt(prompt), temperature=0.9)

To run a GPTQ Llama 2 model on an Nvidia GPU (colab example):

from llama2_wrapper import LLAMA2_WRAPPER 
llama2_wrapper = LLAMA2_WRAPPER(backend_type="gptq")
# Automatically downloading model to: ./models/Llama-2-7b-Chat-GPTQ

To run Llama 2 7B with bitsandbytes 8-bit quantization from a local model_path:

from llama2_wrapper import LLAMA2_WRAPPER
llama2_wrapper = LLAMA2_WRAPPER(
    model_path="./models/Llama-2-7b-chat-hf",
    backend_type="transformers",
    load_in_8bit=True,
)

completion

completion() is the function to generate text from a prompt for OpenAI compatible API /v1/completions.

from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

llama2_wrapper = LLAMA2_WRAPPER()
prompt = get_prompt("Hi do you know Pytorch?")
print(llama2_wrapper.completion(prompt))

chat_completion

chat_completion() is the function to generate text from a dialog (chat history) for OpenAI compatible API /v1/chat/completions.

llama2_wrapper = LLAMA2_WRAPPER()
dialog = [
    {
        "role": "system",
        "content": "You are a helpful, respectful and honest assistant. ",
    },
    {
        "role": "user",
        "content": "Hi do you know Pytorch?",
    },
]
print(llama2_wrapper.chat_completion(dialog))

generate

generate() is the function to create a generator that streams the response to a prompt.

This is useful when you want to stream the output token by token, like typing in a chatbot.

from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

llama2_wrapper = LLAMA2_WRAPPER()
prompt = get_prompt("Hi do you know Pytorch?")
for response in llama2_wrapper.generate(prompt):
    print(response)

The response will be like:

Yes, 
Yes, I'm 
Yes, I'm familiar 
Yes, I'm familiar with 
Yes, I'm familiar with PyTorch! 
...

run

run() is similar to generate(), but it can also accept chat_history and system_prompt from the user.

It processes the input message into the Llama 2 prompt template, together with chat_history and system_prompt, for a chatbot-like app.
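To illustrate what that processing produces, here is a self-contained sketch of the Llama 2 chat template that run() and get_prompt() target. This is an illustrative reimplementation, not the library's actual code, and build_llama2_prompt is a hypothetical name:

```python
# Illustrative sketch of how chat_history and system_prompt fold into
# Llama 2's chat template; the real formatting lives inside llama2-wrapper.
def build_llama2_prompt(message, chat_history=(), system_prompt=""):
    texts = [f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"]
    # Each past (user, assistant) turn becomes a completed [INST] block.
    for user_msg, assistant_msg in chat_history:
        texts.append(f"{user_msg} [/INST] {assistant_msg} </s><s> [INST] ")
    # The new message is left open for the model to answer.
    texts.append(f"{message} [/INST]")
    return "".join(texts)

prompt = build_llama2_prompt(
    "And TensorFlow?",
    chat_history=[("Hi do you know Pytorch?", "Yes, I'm familiar with PyTorch!")],
    system_prompt="You are a helpful assistant.",
)
```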

get_prompt

get_prompt() processes the input message into a Llama 2 prompt, with optional chat_history and system_prompt, for a chatbot.

By default, chat_history and system_prompt are empty, and get_prompt() simply wraps your message in the Llama 2 prompt template:

prompt = get_prompt("Hi do you know Pytorch?")

prompt will be:

[INST] <<SYS>>

<</SYS>>

Hi do you know Pytorch? [/INST]

If you call get_prompt("Hi do you know Pytorch?", system_prompt="You are a helpful..."), the prompt will be:

[INST] <<SYS>>
You are a helpful, respectful and honest assistant. 
<</SYS>>

Hi do you know Pytorch? [/INST]

get_prompt_for_dialog

get_prompt_for_dialog() will process dialog (chat history) to llama2 prompt for OpenAI compatible API /v1/chat/completions.

dialog = [
    {
        "role": "system",
        "content": "You are a helpful, respectful and honest assistant. ",
    },
    {
        "role": "user",
        "content": "Hi do you know Pytorch?",
    },
]
prompt = get_prompt_for_dialog(dialog)
# [INST] <<SYS>>
# You are a helpful, respectful and honest assistant. 
# <</SYS>>
# 
# Hi do you know Pytorch? [/INST]
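For clarity, the dialog-to-prompt conversion can be sketched as plain string building. This is a hypothetical reimplementation (dialog_to_prompt is not part of the library's API); the library's actual get_prompt_for_dialog() may differ in details:

```python
# Hypothetical reimplementation of the dialog -> Llama 2 prompt conversion.
def dialog_to_prompt(dialog):
    msgs = list(dialog)
    system = ""
    # An optional leading system message fills the <<SYS>> block.
    if msgs and msgs[0]["role"] == "system":
        system = msgs[0]["content"]
        msgs = msgs[1:]
    texts = [f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"]
    # Earlier user/assistant pairs become completed turns.
    for i in range(0, len(msgs) - 1, 2):
        texts.append(
            f"{msgs[i]['content']} [/INST] {msgs[i + 1]['content']} </s><s> [INST] "
        )
    # The final user message is the new query.
    texts.append(f"{msgs[-1]['content']} [/INST]")
    return "".join(texts)

dialog = [
    {"role": "system", "content": "You are a helpful, respectful and honest assistant. "},
    {"role": "user", "content": "Hi do you know Pytorch?"},
]
prompt = dialog_to_prompt(dialog)
```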

Download files

Download the file for your platform.

Source Distribution

llama2_wrapper-0.1.10.tar.gz (14.3 kB)

Uploaded Source

Built Distribution


llama2_wrapper-0.1.10-py3-none-any.whl (14.8 kB)

Uploaded Python 3

File details

Details for the file llama2_wrapper-0.1.10.tar.gz.

File metadata

  • Download URL: llama2_wrapper-0.1.10.tar.gz
  • Upload date:
  • Size: 14.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.10.12 Linux/5.15.0-1041-azure

File hashes

Hashes for llama2_wrapper-0.1.10.tar.gz
Algorithm Hash digest
SHA256 7689c841e1b1a280422cb0e97b8d974cb5d7849319db65556f8bde8ab7e193b6
MD5 cf2bcd5192997d03546bbeab40482c65
BLAKE2b-256 930daaf4a63932af5ec155d274553f6817a6d2119c1b301b84a87e3724dc694b


File details

Details for the file llama2_wrapper-0.1.10-py3-none-any.whl.

File metadata

  • Download URL: llama2_wrapper-0.1.10-py3-none-any.whl
  • Upload date:
  • Size: 14.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.10.12 Linux/5.15.0-1041-azure

File hashes

Hashes for llama2_wrapper-0.1.10-py3-none-any.whl
Algorithm Hash digest
SHA256 ef2b5fa3afb3dc3dd6a2c7b7adf393fead87145adc194cf105caf58986b76ea8
MD5 41b220260128ad540733eca3281faf1c
BLAKE2b-256 25d8857616231eb20d95223540df64907fd597fbd1b4343a5964759768dd2852

