picoLLM Inference Engine Python Binding
Made in Vancouver, Canada by Picovoice
picoLLM Inference Engine
picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language models. picoLLM Inference Engine is:
- Accurate; picoLLM Compression improves GPTQ by up to 98%.
- Private; LLM inference runs 100% locally.
- Cross-Platform
- Runs on CPU and GPU
- Free for open-weight models
Compatibility
- Python 3.8+
- Runs on Linux (x86_64), macOS (arm64, x86_64), Windows (x86_64), and Raspberry Pi (5, 4, and 3).
Installation
pip3 install picollm
Models
picoLLM Inference Engine supports the following open-weight models. The models are available for download from Picovoice Console.
- Gemma
gemma-2b
gemma-2b-it
gemma-7b
gemma-7b-it
- Llama-2
llama-2-7b
llama-2-7b-chat
llama-2-13b
llama-2-13b-chat
llama-2-70b
llama-2-70b-chat
- Llama-3
llama-3-8b
llama-3-8b-instruct
llama-3-70b
llama-3-70b-instruct
- Mistral
mistral-7b-v0.1
mistral-7b-instruct-v0.1
mistral-7b-instruct-v0.2
- Mixtral
mixtral-8x7b-v0.1
mixtral-8x7b-instruct-v0.1
- Phi-2
phi2
AccessKey
AccessKey is your authentication and authorization token for deploying Picovoice SDKs, including picoLLM. Anyone using Picovoice needs a valid AccessKey, and you must keep it secret. Even though LLM inference runs 100% offline and is completely free for open-weight models, internet connectivity is required to validate your AccessKey with Picovoice license servers. Everyone who signs up for Picovoice Console receives a unique AccessKey.
Usage
Create an instance of the engine and generate a prompt completion:
import picollm
pllm = picollm.create(
    access_key='${ACCESS_KEY}',
    model_path='${MODEL_PATH}')
res = pllm.generate(prompt='${PROMPT}')
print(res.completion)
Replace ${ACCESS_KEY} with yours obtained from Picovoice Console, ${MODEL_PATH} with the path to a model file downloaded from Picovoice Console, and ${PROMPT} with a prompt string.
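generate also accepts optional keyword arguments for limiting and shaping the completion. The sketch below assumes the completion_token_limit, temperature, and stream_callback parameters described in the picoLLM Python API reference; if your installed version differs, check help(pllm.generate):

import picollm

pllm = picollm.create(
    access_key='${ACCESS_KEY}',
    model_path='${MODEL_PATH}')

# Stream tokens to stdout as they are produced and cap the completion length.
res = pllm.generate(
    prompt='${PROMPT}',
    completion_token_limit=128,  # assumed: maximum number of generated tokens
    temperature=0.7,             # assumed: values > 0 enable sampling
    stream_callback=lambda token: print(token, end='', flush=True))
print()

pllm.release()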
Instruction-tuned models (e.g., llama-3-8b-instruct, llama-2-7b-chat, and gemma-2b-it) have a specific chat template. You can either format the prompt yourself or use a dialog helper:
dialog = pllm.get_dialog()
dialog.add_human_request(prompt)
res = pllm.generate(prompt=dialog.prompt())
dialog.add_llm_response(res.completion)
print(res.completion)
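For multi-turn chat, the same dialog object can carry the whole conversation: append each user turn and each model response, then re-render the prompt every turn. A minimal sketch using only the calls shown above (the loop itself is illustrative, not part of the SDK):

dialog = pllm.get_dialog()

while True:
    prompt = input('> ')
    if not prompt:
        break

    # The dialog renders the accumulated conversation in the model's chat template.
    dialog.add_human_request(prompt)
    res = pllm.generate(prompt=dialog.prompt())
    dialog.add_llm_response(res.completion)
    print(res.completion)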
Finally, when done, be sure to release the resources explicitly:
pllm.release()
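To make sure the native resources are freed even when generation raises, wrapping the calls in try/finally is a safe pattern (a sketch; the SDK calls are used exactly as above):

import picollm

pllm = picollm.create(
    access_key='${ACCESS_KEY}',
    model_path='${MODEL_PATH}')
try:
    res = pllm.generate(prompt='${PROMPT}')
    print(res.completion)
finally:
    # Always release, even if generate() fails.
    pllm.release()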
Demos
picollmdemo provides command-line utilities for LLM completion and chat using picoLLM.
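The demo package installs separately from PyPI. The command names and flags below are assumptions based on Picovoice's usual demo conventions, not confirmed here; see the picollmdemo documentation for the exact invocation:

pip3 install picollmdemo

# Assumed entry points and flags; verify against the picollmdemo docs.
picollm_demo_completion --access_key ${ACCESS_KEY} --model_path ${MODEL_PATH} --prompt "${PROMPT}"
picollm_demo_chat --access_key ${ACCESS_KEY} --model_path ${MODEL_PATH}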
Download files
- Source Distribution: picollm-1.0.0.tar.gz
- Built Distribution: picollm-1.0.0-py3-none-any.whl
File details
Details for the file picollm-1.0.0.tar.gz.
File metadata
- Download URL: picollm-1.0.0.tar.gz
- Upload date:
- Size: 9.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | d14789f8640a933d6aa0452879c3cb2ba5e3cd0bca1561b6ffa0cfef256e380c
MD5 | 82d2045ced88ae0f40d8cd5f73279fdc
BLAKE2b-256 | d1d900a2f83f022ad391e041957721b34ec20e1b12085ba15cdaed98af4be93e
File details
Details for the file picollm-1.0.0-py3-none-any.whl.
File metadata
- Download URL: picollm-1.0.0-py3-none-any.whl
- Upload date:
- Size: 9.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | a24ce487a20aca7af8c3292d6bfba604e68df91e0e9bef03caf0802d9f337b75
MD5 | cc0b3788e2cd5224a4d107176279219c
BLAKE2b-256 | 2b4267ee8530ca1f7fecfe7fa3a43eea3cc6b6f78426ed80c629efeb5ff730b3