llama2-wrapper
- Use llama2-wrapper as your local llama2 backend for Generative Agents/Apps; colab example.
- Run OpenAI Compatible API on Llama2 models.
Features
- Supported models: Llama-2-7b/13b/70b, Llama-2-GPTQ, Llama-2-GGML, CodeLlama...
- Supported model backends: transformers, bitsandbytes (8-bit inference), AutoGPTQ (4-bit inference), llama.cpp
- Demos: Run Llama2 on MacBook Air; Run Llama2 on Colab T4 GPU
- Use llama2-wrapper as your local llama2 backend for Generative Agents/Apps; colab example.
- Run OpenAI Compatible API on Llama2 models.
- News, Benchmark, Issue Solutions
llama2-wrapper is the backend of llama2-webui, which can run any Llama 2 model locally with a Gradio UI on GPU or CPU (Linux/Windows/Mac).
Install
pip install llama2-wrapper
Start OpenAI Compatible API
python -m llama2_wrapper.server
By default it uses llama.cpp as the backend and runs the llama-2-7b-chat.ggmlv3.q4_0.bin model.
Start the FastAPI server with the gptq backend:
python -m llama2_wrapper.server --backend_type gptq
Navigate to http://localhost:8000/docs to see the OpenAPI documentation.
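You can then call the server with any OpenAI-style client. Below is a minimal sketch using plain requests; the request/response fields follow the standard OpenAI completions shape and are an assumption here, not verified against this server:
import requests

# Assumes the server started above is listening on localhost:8000.
response = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "prompt": "[INST] Hi do you know Pytorch? [/INST]",
        "max_tokens": 128,
        "temperature": 0.9,
    },
)
# OpenAI-style completions return generated text under choices[0]["text"].
print(response.json()["choices"][0]["text"])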
API Usage
__call__
__call__() is the function to generate text from a prompt.
For example, run a ggml llama2 model on CPU (colab example):
from llama2_wrapper import LLAMA2_WRAPPER, get_prompt
llama2_wrapper = LLAMA2_WRAPPER()
# Default running on backend llama.cpp.
# Automatically downloading model to: ./models/llama-2-7b-chat.ggmlv3.q4_0.bin
prompt = "Do you know Pytorch"
# llama2_wrapper() will run __call__()
answer = llama2_wrapper(get_prompt(prompt), temperature=0.9)
Run a gptq llama2 model on an Nvidia GPU (colab example):
from llama2_wrapper import LLAMA2_WRAPPER
llama2_wrapper = LLAMA2_WRAPPER(backend_type="gptq")
# Automatically downloading model to: ./models/Llama-2-7b-Chat-GPTQ
Run llama2 7b with bitsandbytes 8-bit inference, specifying a model_path:
from llama2_wrapper import LLAMA2_WRAPPER
llama2_wrapper = LLAMA2_WRAPPER(
    model_path="./models/Llama-2-7b-chat-hf",
    backend_type="transformers",
    load_in_8bit=True,
)
completion
completion() is the function to generate text from a prompt for the OpenAI Compatible API /v1/completions.
from llama2_wrapper import LLAMA2_WRAPPER, get_prompt
llama2_wrapper = LLAMA2_WRAPPER()
prompt = get_prompt("Hi do you know Pytorch?")
print(llama2_wrapper.completion(prompt))
chat_completion
chat_completion() is the function to generate text from a dialog (chat history) for the OpenAI Compatible API /v1/chat/completions.
from llama2_wrapper import LLAMA2_WRAPPER
llama2_wrapper = LLAMA2_WRAPPER()
dialog = [
    {
        "role": "system",
        "content": "You are a helpful, respectful and honest assistant. ",
    },
    {
        "role": "user",
        "content": "Hi do you know Pytorch?",
    },
]
print(llama2_wrapper.chat_completion(dialog))
generate
generate() is the function to create a generator of responses from a prompt.
This is useful when you want to stream the output, like typing in a chatbot.
from llama2_wrapper import LLAMA2_WRAPPER, get_prompt
llama2_wrapper = LLAMA2_WRAPPER()
prompt = get_prompt("Hi do you know Pytorch?")
for response in llama2_wrapper.generate(prompt):
    print(response)
The response will be like:
Yes,
Yes, I'm
Yes, I'm familiar
Yes, I'm familiar with
Yes, I'm familiar with PyTorch!
...
run
run() is similar to generate(), but run() can also accept chat_history and system_prompt from the user.
It processes the input message into the llama2 prompt template with chat_history and system_prompt for a chatbot-like app.
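For illustration, here is a minimal sketch of a streaming chatbot turn with run(). The chat_history format (a list of (user, assistant) pairs) and the keyword argument names are assumptions inferred from the description above, not verified against the API:
from llama2_wrapper import LLAMA2_WRAPPER

llama2_wrapper = LLAMA2_WRAPPER()
# Assumed format: one (user_message, assistant_reply) pair per past turn.
chat_history = [("Hi do you know Pytorch?", "Yes, I'm familiar with PyTorch!")]
for response in llama2_wrapper.run(
    "Can it run without a GPU?",
    chat_history=chat_history,
    system_prompt="You are a helpful, respectful and honest assistant.",
):
    print(response)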
get_prompt
get_prompt() processes the input message into the llama2 prompt template with chat_history and system_prompt for a chatbot.
By default, chat_history and system_prompt are empty, and get_prompt() simply wraps your message in the llama2 prompt template:
prompt = get_prompt("Hi do you know Pytorch?")
prompt will be:
[INST] <<SYS>>
<</SYS>>
Hi do you know Pytorch? [/INST]
If you use get_prompt("Hi do you know Pytorch?", system_prompt="You are a helpful..."), the prompt will be:
[INST] <<SYS>>
You are a helpful, respectful and honest assistant.
<</SYS>>
Hi do you know Pytorch? [/INST]
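chat_history can fold earlier turns into the same template. As in the run() sketch above, the (user, assistant) pair format is an assumption for illustration:
prompt = get_prompt(
    "Can it run without a GPU?",
    chat_history=[("Hi do you know Pytorch?", "Yes, I'm familiar with PyTorch!")],
    system_prompt="You are a helpful, respectful and honest assistant.",
)
# Earlier turns are folded into the prompt ahead of the new message.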
get_prompt_for_dialog
get_prompt_for_dialog() processes a dialog (chat history) into a llama2 prompt for the OpenAI Compatible API /v1/chat/completions.
from llama2_wrapper import get_prompt_for_dialog
dialog = [
    {
        "role": "system",
        "content": "You are a helpful, respectful and honest assistant. ",
    },
    {
        "role": "user",
        "content": "Hi do you know Pytorch?",
    },
]
prompt = get_prompt_for_dialog(dialog)
# [INST] <<SYS>>
# You are a helpful, respectful and honest assistant.
# <</SYS>>
#
# Hi do you know Pytorch? [/INST]