Use llama2-wrapper as your local llama2 backend for Generative Agents / Apps

llama2-wrapper

Use llama2-wrapper as your local llama2 backend for Generative Agents/Apps (a colab example is available).
Run OpenAI Compatible API on Llama2 models.

Features

Supported models: Llama-2-7b/13b/70b, Llama-2-GPTQ, Llama-2-GGML, CodeLlama...
Supported model backends: transformers, bitsandbytes (8-bit inference), AutoGPTQ (4-bit inference), llama.cpp
Demos: Run Llama2 on MacBook Air; Run Llama2 on Colab T4 GPU
News, Benchmark, Issue Solutions

llama2-wrapper is the backend of llama2-webui, which runs any Llama 2 model locally with a gradio UI, on GPU or CPU, on Linux, Windows, or Mac.

Install

pip install llama2-wrapper

Start OpenAI Compatible API

python -m llama2_wrapper.server

By default, it uses llama.cpp as the backend and runs the llama-2-7b-chat.ggmlv3.q4_0.bin model.

Start the FastAPI server with the gptq backend:

python -m llama2_wrapper.server --backend_type gptq

Navigate to http://localhost:8000/docs to see the OpenAPI documentation.
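Once the server is running, any HTTP client can talk to it. The sketch below uses only the Python standard library to build a request for the /v1/completions route mentioned in this document. The request body field `prompt` follows the OpenAI convention and is an assumption here, as are the helper names `build_completion_request` and `send`; check the /docs page for the exact schema the server accepts.

```python
import json
import urllib.request

# Assumption: the server started above listens on localhost:8000.
BASE_URL = "http://localhost:8000"


def build_completion_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /v1/completions with an OpenAI-style JSON body."""
    # "prompt" is the OpenAI field name; assumed to be accepted by this server.
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires the server to be running):
# result = send(build_completion_request("Do you know Pytorch"))
# print(result)
```

The request is built separately from the network call so the payload can be inspected or logged before anything is sent.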

API Usage

__call__

__call__() is the function to generate text from a prompt.

For example, to run a ggml llama2 model on CPU (colab example):

from llama2_wrapper import LLAMA2_WRAPPER, get_prompt 
llama2_wrapper = LLAMA2_WRAPPER()
# Default running on backend llama.cpp.
# Automatically downloading model to: ./models/llama-2-7b-chat.ggmlv3.q4_0.bin
prompt = "Do you know Pytorch"
# llama2_wrapper() will run __call__()
answer = llama2_wrapper(get_prompt(prompt), temperature=0.9)

To run a gptq llama2 model on an Nvidia GPU (colab example):

from llama2_wrapper import LLAMA2_WRAPPER 
llama2_wrapper = LLAMA2_WRAPPER(backend_type="gptq")
# Automatically downloading model to: ./models/Llama-2-7b-Chat-GPTQ

To run llama2 7b with bitsandbytes 8-bit inference from a local model_path:

from llama2_wrapper import LLAMA2_WRAPPER 
llama2_wrapper = LLAMA2_WRAPPER(
    model_path="./models/Llama-2-7b-chat-hf",
    backend_type="transformers",
    load_in_8bit=True,
)

completion

completion() generates text from a prompt for the OpenAI compatible API /v1/completions.

from llama2_wrapper import LLAMA2_WRAPPER, get_prompt
llama2_wrapper = LLAMA2_WRAPPER()
prompt = get_prompt("Hi do you know Pytorch?")
print(llama2_wrapper.completion(prompt))

chat_completion

chat_completion() generates text from a dialog (chat history) for the OpenAI compatible API /v1/chat/completions.

llama2_wrapper = LLAMA2_WRAPPER()
dialog = [
    {
        "role":"system",
        "content":"You are a helpful, respectful and honest assistant. "
    },{
        "role":"user",
        "content":"Hi do you know Pytorch?",
    },
]
print(llama2_wrapper.chat_completion(dialog))

generate

generate() creates a generator of responses from a prompt.

This is useful when you want to stream the output, like typing in a chatbot.

from llama2_wrapper import LLAMA2_WRAPPER, get_prompt
llama2_wrapper = LLAMA2_WRAPPER()
prompt = get_prompt("Hi do you know Pytorch?")
for response in llama2_wrapper.generate(prompt):
    print(response)

Each response contains the full text generated so far:

Yes, 
Yes, I'm 
Yes, I'm familiar 
Yes, I'm familiar with 
Yes, I'm familiar with PyTorch! 
...
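Because each response repeats everything generated so far, a chat UI that appends text usually wants only the newly added part. A minimal sketch of that, assuming each response extends the previous one as in the sample output above; `iter_new_text` is a hypothetical helper, not part of llama2-wrapper, and a plain list stands in for the real generator so the snippet runs without a model:

```python
from typing import Iterable, Iterator


def iter_new_text(cumulative: Iterable[str]) -> Iterator[str]:
    """Yield only the text added since the previous cumulative response."""
    seen = 0  # number of characters already emitted
    for text in cumulative:
        yield text[seen:]
        seen = len(text)


# Stand-in for llama2_wrapper.generate(prompt), using the sample output above.
fake_stream = ["Yes, ", "Yes, I'm ", "Yes, I'm familiar "]
chunks = list(iter_new_text(fake_stream))
print("".join(chunks))
```

With a real model, the same helper could wrap the generator directly: `for piece in iter_new_text(llama2_wrapper.generate(prompt)): print(piece, end="", flush=True)`.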

run

run() is similar to generate(), but run() can also accept chat_history and system_prompt from the user.

It formats the input message with the llama2 prompt template, including chat_history and system_prompt, for a chatbot-like app.

get_prompt

get_prompt() formats the input message as a llama2 prompt, including chat_history and system_prompt, for a chatbot.

By default, chat_history and system_prompt are empty, and get_prompt() wraps your message in the llama2 prompt template:

prompt = get_prompt("Hi do you know Pytorch?")

prompt will be:

[INST] <<SYS>>

<</SYS>>

Hi do you know Pytorch? [/INST]

If you call get_prompt("Hi do you know Pytorch?", system_prompt="You are a helpful..."), prompt will be:

[INST] <<SYS>>
You are a helpful, respectful and honest assistant. 
<</SYS>>

Hi do you know Pytorch? [/INST]
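To make the template concrete, here is a minimal sketch that reproduces the two outputs shown above. `build_llama2_prompt` is a hypothetical stand-in written for illustration, not the library's get_prompt(). The treatment of chat_history as (user, assistant) pairs joined with `</s><s>` follows the common Llama 2 chat convention and is an assumption, since this document does not show a prompt with history.

```python
from typing import Sequence, Tuple


def build_llama2_prompt(
    message: str,
    chat_history: Sequence[Tuple[str, str]] = (),
    system_prompt: str = "",
) -> str:
    """Wrap a message in the llama2 chat template (illustrative sketch)."""
    # The system block is always emitted, even when system_prompt is empty.
    parts = [f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"]
    # Assumed history format: each earlier turn closes its [INST] block and
    # opens a new one for the next user message.
    for user_text, assistant_text in chat_history:
        parts.append(f"{user_text.strip()} [/INST] {assistant_text.strip()} </s><s> [INST] ")
    parts.append(f"{message.strip()} [/INST]")
    return "".join(parts)


print(build_llama2_prompt("Hi do you know Pytorch?"))
```

Passing `system_prompt="You are a helpful, respectful and honest assistant. "` places that text between the `<<SYS>>` markers, matching the second output above.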

get_prompt_for_dialog

get_prompt_for_dialog() converts a dialog (chat history) into a llama2 prompt for the OpenAI compatible API /v1/chat/completions.

dialog = [
    {
        "role":"system",
        "content":"You are a helpful, respectful and honest assistant. "
    },{
        "role":"user",
        "content":"Hi do you know Pytorch?",
    },
]
prompt = get_prompt_for_dialog(dialog)
# [INST] <<SYS>>
# You are a helpful, respectful and honest assistant. 
# <</SYS>>
# 
# Hi do you know Pytorch? [/INST]
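One way to think about this conversion is as a split of the OpenAI-style message list into the three inputs the prompt template needs: a system prompt, earlier turns, and the latest user message. The sketch below shows that split under the assumption that roles alternate user/assistant after an optional leading system message. `split_dialog` is a hypothetical helper for illustration and is not how the library is confirmed to implement get_prompt_for_dialog().

```python
from typing import Dict, List, Tuple


def split_dialog(dialog: List[Dict[str, str]]) -> Tuple[str, List[Tuple[str, str]], str]:
    """Split a dialog into (system_prompt, chat_history, last_user_message)."""
    system_prompt = ""
    turns = list(dialog)
    # An optional leading system message becomes the system prompt.
    if turns and turns[0]["role"] == "system":
        system_prompt = turns[0]["content"]
        turns = turns[1:]
    # What remains must be user, assistant, ..., user (odd length).
    if len(turns) % 2 == 0 or turns[-1]["role"] != "user":
        raise ValueError("dialog must end with a user message")
    history = []
    for user_msg, assistant_msg in zip(turns[:-1:2], turns[1:-1:2]):
        if user_msg["role"] != "user" or assistant_msg["role"] != "assistant":
            raise ValueError("roles must alternate user/assistant")
        history.append((user_msg["content"], assistant_msg["content"]))
    return system_prompt, history, turns[-1]["content"]


example_dialog = [
    {"role": "system", "content": "You are a helpful, respectful and honest assistant. "},
    {"role": "user", "content": "Hi do you know Pytorch?"},
]
print(split_dialog(example_dialog))
```

Validating the role order up front keeps malformed dialogs from silently producing a prompt with misattributed turns.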
