A tool for running on-premise large language models on non-public data
OnPrem
OnPrem is a simple Python package that makes it easier to run large language models (LLMs) on non-public or sensitive data and on machines with no internet connectivity (e.g., behind corporate firewalls). Inspired by the privateGPT and localGPT GitHub repos, OnPrem is intended to make it easier to integrate local LLMs in practical applications.
Install
pip install onprem
For GPU support, see additional instructions below.
How to use
Setup
import os.path
from onprem import LLM
url = 'https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin'
llm = LLM(model_name=os.path.basename(url))
llm.download_model(url, ssl_verify=True)  # set ssl_verify=False if a corporate firewall gives you problems
There is already a file Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin in /home/amaiya/onprem_data. Do you want to still download it? (Y/n) Y
[██████████████████████████████████████████████████]
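The `model_name` passed to `LLM` above is simply the last path component of the download URL. A quick sketch of that step in isolation, using the same URL from the setup code:

```python
import os.path

# The model file name is just the final component of the download URL.
url = 'https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/resolve/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin'
print(os.path.basename(url))  # Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin
```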
Send Prompts to the LLM
prompt = """Extract the names of people in the supplied sentences. Here is an example:
Sentence: James Gandolfini and Paul Newman were great actors.
People:
James Gandolfini, Paul Newman
Sentence:
I like Cillian Murphy's acting. Florence Pugh is great, too.
People:"""
saved_output = llm.prompt(prompt)
Cillian Murphy, Florence Pugh
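As the `saved_output` variable suggests, the response text is returned to your program, so you can post-process it in plain Python. A minimal sketch of parsing the comma-separated answer into a list of names (the response string is hardcoded here, copied from the output shown above):

```python
# Hardcoded stand-in for the string returned by llm.prompt(prompt) above.
saved_output = 'Cillian Murphy, Florence Pugh'

# Split the comma-separated answer into a clean list of names.
people = [name.strip() for name in saved_output.split(',') if name.strip()]
print(people)  # ['Cillian Murphy', 'Florence Pugh']
```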
How to Speed Up Inference Using a GPU
The above example used a CPU. If you have a GPU (even an older one with less VRAM), you can speed up responses.
Step 1: Install llama-cpp-python with cuBLAS support
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python==0.1.69 --no-cache-dir
It is important to use the specific version shown above due to library incompatibilities.
Step 2: Use the n_gpu_layers argument with LLM
llm = LLM(model_name=os.path.basename(url), n_gpu_layers=128)
With the steps above, calls to methods like llm.prompt will offload computation to your GPU and speed up responses from the LLM.