Use llama2-wrapper as your local llama2 backend for Generative Agents / Apps
llama2-wrapper
- Use llama2-wrapper as your local llama2 backend for Generative Agents/Apps, colab example.
- Run OpenAI Compatible API on Llama2 models.
Features
- Supported models: Llama-2-7b/13b/70b, Llama-2-GPTQ, Llama-2-GGML, CodeLlama...
- Supported model backends: transformers, bitsandbytes (8-bit inference), AutoGPTQ (4-bit inference), llama.cpp
- Demos: Run Llama2 on MacBook Air; Run Llama2 on Colab T4 GPU
- Use llama2-wrapper as your local llama2 backend for Generative Agents/Apps; colab example.
- Run OpenAI Compatible API on Llama2 models.
- News, Benchmark, Issue Solutions
llama2-wrapper is the backend and part of llama2-webui, which can run any Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac).
Install
pip install llama2-wrapper
Start OpenAI Compatible API
python -m llama2_wrapper.server
It will use llama.cpp as the backend by default to run the llama-2-7b-chat.ggmlv3.q4_0.bin model.
Start the FastAPI server for the gptq backend:
python -m llama2_wrapper.server --backend_type gptq
Navigate to http://localhost:8000/docs to see the OpenAPI documentation.
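Once the server is running, you can call it with any OpenAI compatible client. Below is a minimal sketch using the requests library; the request and response fields assume the standard OpenAI /v1/completions schema, so adjust them if the server expects different parameters.
import requests

# Assumes the server started above is listening on localhost:8000 and follows
# the OpenAI /v1/completions request/response schema.
response = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "prompt": "[INST] Hi do you know Pytorch? [/INST]",
        "max_tokens": 128,
        "temperature": 0.9,
    },
)
print(response.json()["choices"][0]["text"])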
API Usage
__call__
__call__() is the function to generate text from a prompt.
For example, run ggml llama2 model on CPU, colab example:
from llama2_wrapper import LLAMA2_WRAPPER, get_prompt
llama2_wrapper = LLAMA2_WRAPPER()
# Default running on backend llama.cpp.
# Automatically downloading model to: ./models/llama-2-7b-chat.ggmlv3.q4_0.bin
prompt = "Do you know Pytorch"
# llama2_wrapper() will run __call__()
answer = llama2_wrapper(get_prompt(prompt), temperature=0.9)
Run gptq llama2 model on Nvidia GPU, colab example:
from llama2_wrapper import LLAMA2_WRAPPER
llama2_wrapper = LLAMA2_WRAPPER(backend_type="gptq")
# Automatically downloading model to: ./models/Llama-2-7b-Chat-GPTQ
Run llama2 7b with bitsandbytes 8-bit with a model_path:
from llama2_wrapper import LLAMA2_WRAPPER

llama2_wrapper = LLAMA2_WRAPPER(
    model_path="./models/Llama-2-7b-chat-hf",
    backend_type="transformers",
    load_in_8bit=True,
)
completion
completion() is the function to generate text from a prompt for the OpenAI compatible API /v1/completions.
from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

llama2_wrapper = LLAMA2_WRAPPER()
prompt = get_prompt("Hi do you know Pytorch?")
print(llama2_wrapper.completion(prompt))
chat_completion
chat_completion() is the function to generate text from a dialog (chat history) for the OpenAI compatible API /v1/chat/completions.
from llama2_wrapper import LLAMA2_WRAPPER

llama2_wrapper = LLAMA2_WRAPPER()
dialog = [
    {
        "role": "system",
        "content": "You are a helpful, respectful and honest assistant. ",
    },
    {
        "role": "user",
        "content": "Hi do you know Pytorch?",
    },
]
print(llama2_wrapper.chat_completion(dialog))
generate
generate() is the function to create a generator of responses from a prompt.
This is useful when you want to stream the output, like typing in a chatbot.
from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

llama2_wrapper = LLAMA2_WRAPPER()
prompt = get_prompt("Hi do you know Pytorch?")
for response in llama2_wrapper.generate(prompt):
    print(response)
The response will be like:
Yes,
Yes, I'm
Yes, I'm familiar
Yes, I'm familiar with
Yes, I'm familiar with PyTorch!
...
run
run() is similar to generate(), but run() can also accept chat_history and system_prompt from the user.
It will process the input message into the llama2 prompt template with chat_history and system_prompt for a chatbot-like app.
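A minimal sketch of run() in a chatbot-like loop. The keyword names chat_history and system_prompt follow the description above; the shape of chat_history (a list of (user, assistant) pairs) is an assumption, not the confirmed signature.
from llama2_wrapper import LLAMA2_WRAPPER

llama2_wrapper = LLAMA2_WRAPPER()

# Assumed shape: previous turns as (user_message, assistant_reply) pairs.
chat_history = [("Hi do you know Pytorch?", "Yes, I'm familiar with PyTorch!")]
system_prompt = "You are a helpful, respectful and honest assistant."

# Like generate(), run() yields partial responses that can be streamed to a UI.
for response in llama2_wrapper.run(
    "Can you show a minimal PyTorch training loop?",
    chat_history=chat_history,
    system_prompt=system_prompt,
):
    print(response)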
get_prompt
get_prompt() will process the input message into a llama2 prompt with chat_history and system_prompt for a chatbot.
By default, chat_history and system_prompt are empty, and get_prompt() will add the llama2 prompt template to your message:
prompt = get_prompt("Hi do you know Pytorch?")
prompt will be:
[INST] <<SYS>>
<</SYS>>
Hi do you know Pytorch? [/INST]
If you use get_prompt("Hi do you know Pytorch?", system_prompt="You are a helpful..."), the prompt will be:
[INST] <<SYS>>
You are a helpful, respectful and honest assistant.
<</SYS>>
Hi do you know Pytorch? [/INST]
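get_prompt() can also take chat_history for multi-turn conversations. A minimal sketch, assuming chat_history is a list of (user, assistant) pairs; the exact structure and the resulting multi-turn template may differ:
from llama2_wrapper import get_prompt

# Assumed shape for chat_history: a list of (user, assistant) pairs.
chat_history = [("Hi do you know Pytorch?", "Yes, I'm familiar with PyTorch!")]
prompt = get_prompt(
    "Can you show a minimal example?",
    chat_history=chat_history,
    system_prompt="You are a helpful, respectful and honest assistant.",
)
# The returned prompt wraps the earlier turns and the new message in the
# llama2 [INST] ... [/INST] template.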
get_prompt_for_dialog
get_prompt_for_dialog() will process a dialog (chat history) into a llama2 prompt for the OpenAI compatible API /v1/chat/completions.
from llama2_wrapper import get_prompt_for_dialog

dialog = [
    {
        "role": "system",
        "content": "You are a helpful, respectful and honest assistant. ",
    },
    {
        "role": "user",
        "content": "Hi do you know Pytorch?",
    },
]
prompt = get_prompt_for_dialog(dialog)
# [INST] <<SYS>>
# You are a helpful, respectful and honest assistant.
# <</SYS>>
#
# Hi do you know Pytorch? [/INST]