picoLLM Inference Engine Python Binding

Made in Vancouver, Canada by Picovoice

picoLLM Inference Engine

picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language models. picoLLM Inference Engine is:

  • Accurate; picoLLM Compression improves on GPTQ by significant margins
  • Private; LLM inference runs 100% locally
  • Cross-Platform
  • Runs on CPU and GPU
  • Free for open-weight models

Compatibility

  • Python 3.9+
  • Runs on Linux (x86_64), macOS (arm64, x86_64), Windows (x86_64, arm64), and Raspberry Pi (3, 4, 5).

Installation

pip3 install picollm

Models

picoLLM Inference Engine supports the following open-weight models. The model files are available for download from Picovoice Console.

  • DeepSeek-OCR-2
    • deepseek-ocr-2
  • EmbeddingGemma
    • embeddinggemma-300m
  • Gemma
    • gemma-2b
    • gemma-2b-it
    • gemma-7b
    • gemma-7b-it
  • Gemma3
    • gemma-3-270m
    • gemma-3-270m-it
  • Llama-2
    • llama-2-7b
    • llama-2-7b-chat
    • llama-2-13b
    • llama-2-13b-chat
    • llama-2-70b
    • llama-2-70b-chat
  • Llama-3
    • llama-3-8b
    • llama-3-8b-instruct
    • llama-3-70b
    • llama-3-70b-instruct
  • Llama-3.2
    • llama3.2-1b-instruct
    • llama3.2-3b-instruct
  • Mistral
    • mistral-7b-v0.1
    • mistral-7b-instruct-v0.1
    • mistral-7b-instruct-v0.2
  • Mixtral
    • mixtral-8x7b-v0.1
    • mixtral-8x7b-instruct-v0.1
  • Phi-2
    • phi2
  • Phi-3
    • phi3
  • Phi-3.5
    • phi3.5
  • Qwen3-VL
    • qwen3-vl-2b-it

AccessKey

AccessKey is your authentication and authorization token for deploying Picovoice SDKs, including picoLLM. Anyone using Picovoice needs a valid AccessKey, and you must keep it secret. Internet connectivity is required to validate your AccessKey with Picovoice license servers, even though LLM inference runs 100% offline and is completely free for open-weight models. Everyone who signs up for Picovoice Console receives a unique AccessKey.
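Since the AccessKey must stay secret, one option is to keep it out of source code and read it from the environment at startup. The following is a minimal sketch under that assumption; the helper `load_access_key` and the variable name `PICOVOICE_ACCESS_KEY` are hypothetical and not part of picoLLM.

```python
import os


def load_access_key(env_var="PICOVOICE_ACCESS_KEY"):
    """Return the AccessKey stored in an environment variable.

    The variable name is an illustrative choice, not a picoLLM convention.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to your Picovoice AccessKey")
    return key
```

The returned string can then be passed as the `access_key` argument when creating the engine.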

Usage

Text models

Create an instance of the engine and generate a prompt completion:

import picollm

pllm = picollm.create(
    access_key='${ACCESS_KEY}',
    model_path='${MODEL_PATH}')

res = pllm.generate(prompt='${PROMPT}')
print(res.completion)

Replace ${ACCESS_KEY} with yours obtained from Picovoice Console, ${MODEL_PATH} with the path to a model file downloaded from Picovoice Console, and ${PROMPT} with a prompt string.

Instruction-tuned models (e.g., llama-3-8b-instruct, llama-2-7b-chat, and gemma-2b-it) have a specific chat template. You can either directly format the prompt or use a dialog helper:

dialog = pllm.get_dialog()
dialog.add_human_request(prompt)

res = pllm.generate(prompt=dialog.prompt())
dialog.add_llm_response(res.completion)
print(res.completion)
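For a multi-turn conversation, the same three calls repeat on every turn. A small sketch of how that could be wrapped, using only the dialog methods shown above; the function name `chat_turn` is hypothetical:

```python
def chat_turn(pllm, dialog, user_text):
    """Run one chat turn and record both sides of it in the dialog."""
    # Append the user's message so the chat template includes it.
    dialog.add_human_request(user_text)
    # Generate from the fully formatted conversation so far.
    res = pllm.generate(prompt=dialog.prompt())
    # Store the reply so the next turn has the full history.
    dialog.add_llm_response(res.completion)
    return res.completion
```

Create the dialog once with `pllm.get_dialog()` and pass the same object to every call so the history accumulates across turns.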

To interrupt completion generation before it has finished:

pllm.interrupt()
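One way to use this is to cap how long a generation may run. The sketch below assumes that `generate` blocks the calling thread, that `interrupt` may safely be called from another thread, and that an interrupted `generate` still returns a result; none of these is stated above, so treat them as assumptions to verify. The helper name `generate_with_deadline` is hypothetical.

```python
import threading


def generate_with_deadline(pllm, prompt, timeout_sec):
    """Call pllm.generate, interrupting it if it exceeds timeout_sec."""
    # The timer fires pllm.interrupt() from a background thread.
    timer = threading.Timer(timeout_sec, pllm.interrupt)
    timer.start()
    try:
        return pllm.generate(prompt=prompt)
    finally:
        # Cancel the timer so interrupt() does not fire after we are done.
        timer.cancel()
```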

Finally, when done, be sure to release the resources explicitly:

pllm.release()
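To make sure `release()` runs even when generation raises an exception, the call can be tied to a `with` block. This is a generic sketch built on the standard library; `releasing` is a hypothetical helper, not a picoLLM function, and it only assumes the object has a `release()` method as shown above.

```python
from contextlib import contextmanager


@contextmanager
def releasing(engine):
    """Yield engine and call engine.release() on exit, even on error."""
    try:
        yield engine
    finally:
        engine.release()


# Usage sketch:
# with releasing(picollm.create(access_key=..., model_path=...)) as pllm:
#     print(pllm.generate(prompt="...").completion)
```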

Vision models

To run a VLM such as qwen3-vl-2b-it:

res = pllm.generate_with_image(
    prompt='${PROMPT}',
    image_width=${IMAGE_NUM_PIXELS_WIDTH},
    image_height=${IMAGE_NUM_PIXELS_HEIGHT},
    image=${IMAGE_DATA})
print(res.completion)

Replace ${PROMPT} with a text prompt, ${IMAGE_NUM_PIXELS_WIDTH} and ${IMAGE_NUM_PIXELS_HEIGHT} with the image width and height in pixels, and ${IMAGE_DATA} with the raw pixel values of the image in 8-bit RGB format.
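To illustrate what raw 8-bit RGB data looks like, the sketch below builds a single-colour test image and checks that a buffer has the expected size. It assumes a row-major, interleaved R, G, B layout with one byte per channel and uses `bytes` as the container; the exact Python type expected for `image` is not stated here, so treat both as assumptions. The helper names are hypothetical. For real images, an image library is typically used to decode the file into this form.

```python
def solid_rgb_image(width, height, rgb):
    """Build raw 8-bit RGB pixel data for a single-colour image.

    Assumed layout: row-major, interleaved R, G, B, one byte per channel.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if len(rgb) != 3 or any(not 0 <= c <= 255 for c in rgb):
        raise ValueError("rgb must be three values in 0..255")
    # Repeat the 3-byte pixel once per pixel in the image.
    return bytes(rgb) * (width * height)


def check_rgb_size(width, height, data):
    """Return True if data holds exactly width * height RGB pixels."""
    return len(data) == width * height * 3
```

A size check like this catches a common mistake, such as passing RGBA data (4 bytes per pixel) or swapping width and height, before the buffer reaches the engine.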

OCR models

To run an OCR model such as deepseek-ocr-2:

res = pllm.generate_ocr(
    image_width=${IMAGE_NUM_PIXELS_WIDTH},
    image_height=${IMAGE_NUM_PIXELS_HEIGHT},
    image=${IMAGE_DATA})
print(res.completion)

Replace ${IMAGE_NUM_PIXELS_WIDTH} and ${IMAGE_NUM_PIXELS_HEIGHT} with the image width and height in pixels, and ${IMAGE_DATA} with the raw pixel values of the image in 8-bit RGB format.

Embedding models

To run an embedding model such as embeddinggemma-300m:

res = pllm.generate_embeddings(prompt='${PROMPT}')
# Iterate over the returned values themselves, not their indices.
for embedding in res:
    print(embedding)

Replace ${PROMPT} with a text prompt that you want to generate embeddings for.
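A common use of embeddings is comparing two texts by the cosine similarity of their vectors. The sketch below is generic and assumes each embedding is available as a flat sequence of floats; how the result of `generate_embeddings` maps onto such a sequence is not specified here. The function name `cosine_similarity` is our own.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Values close to 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate unrelated directions.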

Demos

picollmdemo provides command-line utilities for LLM completion and chat using picoLLM.

