picoLLM Inference Engine Python Binding

Made in Vancouver, Canada by Picovoice

picoLLM Inference Engine

picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language models. picoLLM Inference Engine is:

  • Accurate; picoLLM Compression improves GPTQ by up to 98%.
  • Private; LLM inference runs 100% locally.
  • Cross-Platform
  • Runs on CPU and GPU
  • Free for open-weight models

Compatibility

  • Python 3.8+
  • Runs on Linux (x86_64), macOS (arm64, x86_64), Windows (x86_64), and Raspberry Pi (5, 4, and 3).
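
The desktop targets above can be checked at startup before loading a model. The following is a minimal sketch under stated assumptions: the helper name `is_supported_desktop` and the lookup table are hypothetical and not part of picoLLM, and Raspberry Pi is deliberately left out because it also reports `Linux` and would need a board-level check.

```python
import platform
import sys

# Desktop targets from the list above, keyed by platform.system() and a
# normalized platform.machine(). Raspberry Pi is intentionally not covered.
SUPPORTED_DESKTOP = {
    ("Linux", "x86_64"),
    ("Darwin", "arm64"),
    ("Darwin", "x86_64"),
    ("Windows", "x86_64"),
}


def is_supported_desktop(system, machine, version):
    """Return True if the (OS, CPU, Python version) triple matches the list."""
    # Windows reports 64-bit Intel/AMD CPUs as "AMD64".
    normalized = "x86_64" if machine.lower() == "amd64" else machine.lower()
    return tuple(version[:2]) >= (3, 8) and (system, normalized) in SUPPORTED_DESKTOP


if __name__ == "__main__":
    print(is_supported_desktop(platform.system(), platform.machine(), sys.version_info))
```

A `False` result only means the machine is not one of the listed desktop targets; it says nothing about Raspberry Pi boards.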

Installation

pip3 install picollm

Models

picoLLM Inference Engine supports the following open-weight models. Model files can be downloaded from Picovoice Console.

  • Gemma
    • gemma-2b
    • gemma-2b-it
    • gemma-7b
    • gemma-7b-it
  • Llama-2
    • llama-2-7b
    • llama-2-7b-chat
    • llama-2-13b
    • llama-2-13b-chat
    • llama-2-70b
    • llama-2-70b-chat
  • Llama-3
    • llama-3-8b
    • llama-3-8b-instruct
    • llama-3-70b
    • llama-3-70b-instruct
  • Mistral
    • mistral-7b-v0.1
    • mistral-7b-instruct-v0.1
    • mistral-7b-instruct-v0.2
  • Mixtral
    • mixtral-8x7b-v0.1
    • mixtral-8x7b-instruct-v0.1
  • Phi-2
    • phi2

AccessKey

AccessKey is your authentication and authorization token for deploying Picovoice SDKs, including picoLLM. Anyone using Picovoice needs a valid AccessKey, and you must keep it secret. Internet connectivity is required to validate your AccessKey with Picovoice license servers, even though LLM inference runs 100% offline and is completely free for open-weight models. Everyone who signs up for Picovoice Console receives a unique AccessKey.
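
Since the AccessKey must stay secret, one common approach is to keep it out of source code and read it from the environment. This is a minimal sketch, not a Picovoice convention: the variable name `PICOVOICE_ACCESS_KEY` and the helper `load_access_key` are assumptions made for illustration.

```python
import os


def load_access_key(env=None, name="PICOVOICE_ACCESS_KEY"):
    """Return the AccessKey stored in an environment variable.

    The variable name is a hypothetical choice for this example.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set {name} to your AccessKey from Picovoice Console")
    return key
```

The returned string can then be passed as the `access_key` argument when creating the engine.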

Usage

Create an instance of the engine and generate a prompt completion:

import picollm

pllm = picollm.create(
    access_key='${ACCESS_KEY}',
    model_path='${MODEL_PATH}')

res = pllm.generate(prompt='${PROMPT}')
print(res.completion)

Replace ${ACCESS_KEY} with yours obtained from Picovoice Console, ${MODEL_PATH} with the path to a model file downloaded from Picovoice Console, and ${PROMPT} with a prompt string.
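
Rather than editing the placeholders in place, the three values can be supplied on the command line. The sketch below assumes nothing about picoLLM beyond the `create`, `generate`, and `release` calls shown in this section; the flag names and the functions `build_parser` and `run` are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for the three values the snippet above needs."""
    parser = argparse.ArgumentParser(description="Complete a prompt with picoLLM")
    parser.add_argument("--access_key", required=True, help="AccessKey from Picovoice Console")
    parser.add_argument("--model_path", required=True, help="Path to a downloaded model file")
    parser.add_argument("--prompt", required=True, help="Prompt string to complete")
    return parser


def run(argv):
    """Parse argv, generate one completion, and release the engine."""
    args = build_parser().parse_args(argv)

    # Imported here so that the parser can be used without the package installed.
    import picollm

    pllm = picollm.create(access_key=args.access_key, model_path=args.model_path)
    try:
        res = pllm.generate(prompt=args.prompt)
        print(res.completion)
    finally:
        # Free the engine's resources even if generation fails.
        pllm.release()
```

Calling `run(sys.argv[1:])` from a script's entry point would wire this to the actual command line.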

Instruction-tuned models (e.g., llama-3-8b-instruct, llama-2-7b-chat, and gemma-2b-it) have a specific chat template. You can either directly format the prompt or use a dialog helper:

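# The dialog helper formats the prompt with the model's chat template.
dialog = pllm.get_dialog()
dialog.add_human_request(prompt)

res = pllm.generate(prompt=dialog.prompt())
# Record the reply so that the next turn includes the conversation so far.
dialog.add_llm_response(res.completion)
print(res.completion)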
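
For a multi-turn conversation, the same three dialog calls repeat on every turn. A minimal sketch of wrapping them in one function follows; `chat_turn` is a hypothetical helper, and it relies only on the `add_human_request`, `prompt`, `add_llm_response`, and `generate` calls shown above.

```python
def chat_turn(pllm, dialog, user_text):
    """Run one request/response round and record both sides in the dialog."""
    dialog.add_human_request(user_text)
    res = pllm.generate(prompt=dialog.prompt())
    dialog.add_llm_response(res.completion)
    return res.completion


# Example use with an engine created as shown earlier:
#
#     dialog = pllm.get_dialog()
#     for text in ["Hello!", "Can you summarize that?"]:
#         print(chat_turn(pllm, dialog, text))
```

Keeping a single `dialog` object across calls is what lets each new prompt carry the earlier turns.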

Finally, when done, be sure to release the resources explicitly:

pllm.release()
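
To make sure `release()` runs even when generation raises an exception, the call can be tied to a `with` block. This is a sketch of one way to do that; `releasing` is a hypothetical helper defined here, not something the picollm package is known to provide.

```python
from contextlib import contextmanager


@contextmanager
def releasing(engine):
    """Yield the engine and call its release() on exit, even after an error."""
    try:
        yield engine
    finally:
        engine.release()


# Example use with the calls shown earlier:
#
#     with releasing(picollm.create(access_key=..., model_path=...)) as pllm:
#         print(pllm.generate(prompt="...").completion)
```

A plain `try`/`finally` around the generation code achieves the same result without the helper.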

Demos

picollmdemo provides command-line utilities for LLM completion and chat using picoLLM.
