A tool for running on-premise large language models on non-public data
OnPrem
OnPrem is a simple Python package that makes it easier to run large language models (LLMs) on non-public or sensitive data and on machines with no internet connectivity (e.g., behind corporate firewalls). Inspired by the privateGPT and localGPT GitHub repos, OnPrem is intended to make it easier to integrate local LLMs in practical applications.
Install
pip install onprem
For GPU support, see additional instructions below.
How to use
Setup
import os.path
from onprem import LLM
url = 'https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin'
llm = LLM(model_name=os.path.basename(url))
llm.download_model(url, ssl_verify=True)  # set ssl_verify=False if a corporate firewall gives you problems
There is already a file Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin in /home/amaiya/onprem_data. Do you want to still download it? (Y/n) Y
[██████████████████████████████████████████████████]
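The `model_name` passed to `LLM` above is simply the last path component of the download URL. A quick sketch of that step in isolation, using the same URL from the setup code:

```python
import os.path

# The model file name is just the final component of the download URL.
url = 'https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin'
print(os.path.basename(url))  # Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin
```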
Send Prompts to the LLM
prompt = """Extract the names of people in the supplied sentences. Here is an example:
Sentence: James Gandolfini and Paul Newman were great actors.
People:
James Gandolfini, Paul Newman
Sentence:
I like Cillian Murphy's acting. Florence Pugh is great, too.
People:"""
saved_output = llm.prompt(prompt)
Cillian Murphy, Florence Pugh
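As the `saved_output` variable suggests, the response text is returned to your program, so you can post-process it in plain Python. A minimal sketch of parsing the comma-separated answer into a list of names (the response string is hardcoded here, copied from the output shown above):

```python
# Hardcoded stand-in for the string returned by llm.prompt(prompt) above.
saved_output = 'Cillian Murphy, Florence Pugh'

# Split the comma-separated answer into a clean list of names.
people = [name.strip() for name in saved_output.split(',') if name.strip()]
print(people)  # ['Cillian Murphy', 'Florence Pugh']
```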
How to Speed Up Inference Using a GPU
The above example used a CPU. If you have a GPU (even an older one with less VRAM), you can speed up responses.
Step 1: Install llama-cpp-python with cuBLAS support
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python==0.1.69 --no-cache-dir
It is important to use the specific version shown above due to library incompatibilities.
Step 2: Use the n_gpu_layers argument with LLM
llm = LLM(model_name=os.path.basename(url), n_gpu_layers=128)
With the steps above, calls to methods like llm.prompt will offload computation to your GPU and speed up responses from the LLM.