Use llama2-wrapper as your local llama2 backend for Generative Agents / Apps

Project description

llama2-wrapper

Features

llama2-wrapper is the backend of llama2-webui, which can run any Llama 2 model locally with a gradio UI, on GPU or CPU, on Linux, Windows, or Mac.

Install

pip install llama2-wrapper

Start OpenAI Compatible API

python -m llama2_wrapper.server

By default it uses llama.cpp as the backend and runs the llama-2-7b-chat.ggmlv3.q4_0.bin model.

Start the FastAPI server with the gptq backend:

python -m llama2_wrapper.server --backend_type gptq

Navigate to http://localhost:8000/docs to see the OpenAPI documentation.
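
Once the server is running, any OpenAI-style client can talk to it. A minimal sketch using requests, assuming the standard OpenAI /v1/completions request body (prompt, max_tokens):

import requests

# POST to the OpenAI compatible completions endpoint started above.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={"prompt": "Hi do you know Pytorch?", "max_tokens": 128},
)
print(resp.json())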

API Usage

__call__

__call__() is the function to generate text from a prompt.

For example, run a ggml llama2 model on CPU (Colab example):

from llama2_wrapper import LLAMA2_WRAPPER, get_prompt 
llama2_wrapper = LLAMA2_WRAPPER()
# Default running on backend llama.cpp.
# Automatically downloading model to: ./models/llama-2-7b-chat.ggmlv3.q4_0.bin
prompt = "Do you know Pytorch"
# llama2_wrapper() will run __call__()
answer = llama2_wrapper(get_prompt(prompt), temperature=0.9)

Run a gptq llama2 model on an Nvidia GPU (Colab example):

from llama2_wrapper import LLAMA2_WRAPPER 
llama2_wrapper = LLAMA2_WRAPPER(backend_type="gptq")
# Automatically downloading model to: ./models/Llama-2-7b-Chat-GPTQ

Run llama2 7b in 8-bit via bitsandbytes, specifying a model_path:

from llama2_wrapper import LLAMA2_WRAPPER

llama2_wrapper = LLAMA2_WRAPPER(
    model_path="./models/Llama-2-7b-chat-hf",
    backend_type="transformers",
    load_in_8bit=True,
)
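
Whichever backend is selected, generation then goes through the same __call__ interface as in the first example (a minimal sketch, reusing get_prompt):

from llama2_wrapper import get_prompt

# Generate an answer with the wrapper constructed above.
answer = llama2_wrapper(get_prompt("Do you know Pytorch"), temperature=0.9)
print(answer)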

completion

completion() is the function to generate text from a prompt for the OpenAI-compatible API /v1/completions.

from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

llama2_wrapper = LLAMA2_WRAPPER()
prompt = get_prompt("Hi do you know Pytorch?")
print(llama2_wrapper.completion(prompt))
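
The return value is modeled on the OpenAI completion object; a hedged sketch of pulling out the generated text, assuming an OpenAI-style dict with a choices list:

result = llama2_wrapper.completion(prompt)
# Assumed OpenAI-style response shape; adjust if your version differs.
print(result["choices"][0]["text"])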

chat_completion

chat_completion() is the function to generate text from a dialog (chat history) for the OpenAI-compatible API /v1/chat/completions.

from llama2_wrapper import LLAMA2_WRAPPER

llama2_wrapper = LLAMA2_WRAPPER()
dialog = [
    {
        "role": "system",
        "content": "You are a helpful, respectful and honest assistant. ",
    },
    {
        "role": "user",
        "content": "Hi do you know Pytorch?",
    },
]
print(llama2_wrapper.chat_completion(dialog))
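
As with completion(), the result is assumed to follow the OpenAI chat format, with the reply under choices[0].message:

result = llama2_wrapper.chat_completion(dialog)
# Assumed OpenAI-style chat response shape; adjust if your version differs.
print(result["choices"][0]["message"]["content"])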

generate

generate() is the function to create a generator of responses from a prompt.

This is useful when you want to stream the output like typing in the chatbot.

from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

llama2_wrapper = LLAMA2_WRAPPER()
prompt = get_prompt("Hi do you know Pytorch?")
for response in llama2_wrapper.generate(prompt):
    print(response)

The response will look like:

Yes, 
Yes, I'm 
Yes, I'm familiar 
Yes, I'm familiar with 
Yes, I'm familiar with PyTorch! 
...
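
Since each yielded response is the cumulative text generated so far, a streaming UI only needs the newly added characters. A minimal sketch, assuming this cumulative behavior:

previous = ""
for response in llama2_wrapper.generate(prompt):
    # Print only the new suffix of the cumulative response for a typing effect.
    print(response[len(previous):], end="", flush=True)
    previous = response
print()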

run

run() is similar to generate(), but run() can also accept chat_history and system_prompt from the user.

It will process the input message into the llama2 prompt template with chat_history and system_prompt for a chatbot-like app.
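
A hedged sketch of a chatbot turn with run(), assuming chat_history is a list of (user_message, model_response) pairs as in the gradio UI; the exact signature may differ between versions:

from llama2_wrapper import LLAMA2_WRAPPER

llama2_wrapper = LLAMA2_WRAPPER()
# Assumed chat_history format: list of (user, assistant) tuples.
history = [("Hi do you know Pytorch?", "Yes, I'm familiar with PyTorch!")]
for response in llama2_wrapper.run(
    "Can you show a minimal example?",
    chat_history=history,
    system_prompt="You are a helpful, respectful and honest assistant.",
):
    print(response)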

get_prompt

get_prompt() will process the input message into a llama2 prompt with chat_history and system_prompt for a chatbot.

By default, chat_history and system_prompt are empty, and get_prompt() will wrap your message in the llama2 prompt template:

prompt = get_prompt("Hi do you know Pytorch?")

prompt will be:

[INST] <<SYS>>

<</SYS>>

Hi do you know Pytorch? [/INST]

If you use get_prompt("Hi do you know Pytorch?", system_prompt="You are a helpful..."), the prompt will be:

[INST] <<SYS>>
You are a helpful, respectful and honest assistant. 
<</SYS>>

Hi do you know Pytorch? [/INST]
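
get_prompt() also accepts chat_history; a hedged sketch, assuming it is a list of (user_message, model_response) tuples so that earlier turns are folded into the template:

# Assumed chat_history format: list of (user, assistant) tuples.
history = [("Hi do you know Pytorch?", "Yes, I'm familiar with PyTorch!")]
prompt = get_prompt("Show me a minimal example.", chat_history=history)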

get_prompt_for_dialog

get_prompt_for_dialog() will process a dialog (chat history) into a llama2 prompt for the OpenAI-compatible API /v1/chat/completions.

from llama2_wrapper import get_prompt_for_dialog

dialog = [
    {
        "role": "system",
        "content": "You are a helpful, respectful and honest assistant. ",
    },
    {
        "role": "user",
        "content": "Hi do you know Pytorch?",
    },
]
prompt = get_prompt_for_dialog(dialog)
# [INST] <<SYS>>
# You are a helpful, respectful and honest assistant. 
# <</SYS>>
# 
# Hi do you know Pytorch? [/INST]

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llama2_wrapper-0.1.14.tar.gz (14.8 kB)

Uploaded Source

Built Distribution

llama2_wrapper-0.1.14-py3-none-any.whl (15.7 kB)

Uploaded Python 3

File details

Details for the file llama2_wrapper-0.1.14.tar.gz.

File metadata

  • Download URL: llama2_wrapper-0.1.14.tar.gz
  • Upload date:
  • Size: 14.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.10.13 Linux/6.2.0-1012-azure

File hashes

Hashes for llama2_wrapper-0.1.14.tar.gz

  • SHA256: f63eb82d152df1451a12457c69759bcff89c5346c8a28aeb9d0f3293918c1e9f
  • MD5: fb41cb14dafe76db8f626e2d699ec68a
  • BLAKE2b-256: d08301654cb606db3c8c2635b9416a5074d4324cc10b375bcee4bc38d3b623d4

File details

Details for the file llama2_wrapper-0.1.14-py3-none-any.whl.

File metadata

  • Download URL: llama2_wrapper-0.1.14-py3-none-any.whl
  • Upload date:
  • Size: 15.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.10.13 Linux/6.2.0-1012-azure

File hashes

Hashes for llama2_wrapper-0.1.14-py3-none-any.whl

  • SHA256: 4774b4d95a8f32628ae7bf69036fbec4a48b6b9d7cb148b431d2cfd250eceb66
  • MD5: 83f0e67339360fccd8f651ee451835c6
  • BLAKE2b-256: 93108579768cffa26aa817256f9a30e5ac2d945b6f1aa966bac52b0f3adff382
