llama2-wrapper
Use llama2-wrapper as your local Llama 2 backend for Generative Agents/Apps (colab example).
Features
- Supported models: Llama-2-7b/13b/70b, all Llama-2-GPTQ, all Llama-2-GGML ...
- Supported model backends: transformers, bitsandbytes (8-bit inference), AutoGPTQ (4-bit inference), llama.cpp
- Demos: Run Llama 2 on a MacBook Air; run Llama 2 on a Colab T4 GPU
- News, benchmarks, issue solutions
llama2-wrapper is the backend of llama2-webui, which can run any Llama 2 model locally with a Gradio UI on GPU or CPU (Linux/Windows/Mac).
Install
pip install llama2-wrapper
Usage
__call__
__call__() generates text from a prompt.
For example, to run a GGML Llama 2 model on CPU (colab example):
from llama2_wrapper import LLAMA2_WRAPPER, get_prompt
llama2_wrapper = LLAMA2_WRAPPER()
# Default running on backend llama.cpp.
# Automatically downloading model to: ./models/llama-2-7b-chat.ggmlv3.q4_0.bin
prompt = "Do you know Pytorch"
# llama2_wrapper() will run __call__()
answer = llama2_wrapper(get_prompt(prompt), temperature=0.9)
To run a GPTQ Llama 2 model on an Nvidia GPU (colab example):
from llama2_wrapper import LLAMA2_WRAPPER
llama2_wrapper = LLAMA2_WRAPPER(backend_type="gptq")
# Automatically downloading model to: ./models/Llama-2-7b-Chat-GPTQ
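Once initialized, the GPTQ-backed wrapper is used the same way as above; a brief sketch, assuming __call__() and get_prompt() behave identically across backends:
from llama2_wrapper import get_prompt

prompt = get_prompt("Do you know Pytorch")
# Same __call__ interface as the llama.cpp example above
answer = llama2_wrapper(prompt, temperature=0.9)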
Run Llama 2 7B with bitsandbytes 8-bit inference by specifying a model_path:
from llama2_wrapper import LLAMA2_WRAPPER
llama2_wrapper = LLAMA2_WRAPPER(
    model_path="./models/Llama-2-7b-chat-hf",
    backend_type="transformers",
    load_in_8bit=True,
)
generate
generate() returns a generator that yields the response to a prompt incrementally.
This is useful when you want to stream the output, like typing in a chatbot.
from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

llama2_wrapper = LLAMA2_WRAPPER()
prompt = get_prompt("Hi do you know Pytorch?")
for response in llama2_wrapper.generate(prompt):
    print(response)
The streamed responses will look like:
Yes,
Yes, I'm
Yes, I'm familiar
Yes, I'm familiar with
Yes, I'm familiar with PyTorch!
...
run
run() is similar to generate(), but run() can also accept chat_history and system_prompt from the user.
It formats the input message into the llama2 prompt template together with chat_history and system_prompt for a chatbot-like app.
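For illustration, a minimal chatbot-style sketch; the argument names and order (message, chat_history as a list of (user, assistant) tuples, system_prompt) are assumptions based on the description above, and the streamed output is assumed to behave like generate():
from llama2_wrapper import LLAMA2_WRAPPER

llama2_wrapper = LLAMA2_WRAPPER()
chat_history = [("Hi do you know Pytorch?", "Yes, I'm familiar with PyTorch!")]
# run() is assumed to build the llama2 prompt from chat_history and system_prompt internally
for response in llama2_wrapper.run(
    "Can it run on a CPU?",
    chat_history,
    system_prompt="You are a helpful assistant.",
):
    print(response)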
get_prompt
get_prompt() formats the input message into the llama2 prompt template together with chat_history and system_prompt for a chatbot.
By default, chat_history and system_prompt are empty, and get_prompt() simply wraps your message in the llama2 prompt template:
prompt = get_prompt("Hi do you know Pytorch?")
prompt will be:
[INST] <<SYS>>
<</SYS>>
Hi do you know Pytorch? [/INST]
If you use get_prompt("Hi do you know Pytorch?", system_prompt="You are a helpful..."), the prompt will be:
[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
Hi do you know Pytorch? [/INST]
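chat_history can be passed in the same way; a sketch, assuming it is a list of (user_message, assistant_response) tuples that get_prompt() folds into the template before the new turn:
chat_history = [("Hi do you know Pytorch?", "Yes, I'm familiar with PyTorch!")]
# Hypothetical follow-up turn; earlier turns are assumed to be inserted before the new [INST] block
prompt = get_prompt(
    "Can it run on a CPU?",
    chat_history=chat_history,
    system_prompt="You are a helpful assistant.",
)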