Skip to main content

The fastest inference framework to run BitNet models on CPUs

Project description

Trillim

High-performance CPU inference engine for BitNet models. Runs ternary-quantized models ({-1, 0, 1} weights) using platform-specific SIMD optimizations (AVX2 on x86, NEON on ARM).

Quick Start

Prerequisites

  • Python 3.12+ with uv - can use pip or any package manager

Install and run

# Install trillim
uv add trillim

# Pull a pre-quantized model
uv run trillim pull Trillim/BitNet-TRNQ

# Chat
uv run trillim chat Trillim/BitNet-TRNQ

Quantize your own model

If you have a HuggingFace BitNet model with safetensors weights:

# Quantize model weights → qmodel.tensors + rope.cache
uv run trillim quantize <path-to-model> --model

# Optionally extract a PEFT LoRA adapter → qmodel.lora
uv run trillim quantize <path-to-model> --adapter <path-to-adapter>

API Server

Trillim includes an OpenAI-compatible API server:

# Start the server
uv run trillim serve <model-dir>

# With voice pipeline (speech-to-text + text-to-speech)
uv run trillim serve <model-dir> --voice

Endpoints:

  • POST /v1/chat/completions — chat completions (streaming supported)
  • POST /v1/completions — text completions
  • GET /v1/models — list loaded models
  • POST /v1/models/load — hot-swap models and LoRA adapters at runtime
  • POST /v1/audio/transcriptions — speech-to-text (with --voice)
  • POST /v1/audio/speech — text-to-speech (with --voice)
  • GET /v1/voices — list available TTS voices
  • POST /v1/voices — register a custom voice from audio (need to accept pocket-tts' terms on huggingface)

Python SDK

The server is built on a composable SDK. Each capability (LLM, Whisper, TTS) is a standalone component:

from trillim import Server, LLM, TTS, Whisper

# Inference only
Server(LLM("models/BitNet")).run()

# Inference + voice
Server(LLM("models/BitNet"), Whisper(), TTS()).run()

# TTS only
Server(TTS()).run()

LoRA Adapters

Trillim supports PEFT LoRA adapters as bf16 corrections on top of the ternary base model:

# Ensure qmodel.lora is in the directory 
# (uv run trillim quantize ... will do this)
uv run trillim chat Trillim/BitNet-TRNQ --lora

Supported Architectures

  • BitnetForCausalLM — BitNet with ternary weights and ReLU² activation
  • LlamaForCausalLM — Llama-style with SiLU activation

Platform Support

Platform Status
x86_64 (AVX2) Supported
ARM64 (NEON) Supported

Thread count is auto-detected as num_cores - 2. Override by passing a --threads N CLI argument.

License

The Trillim Python SDK source code is MIT-licensed. The C++ inference engine binaries (inference, trillim-quantize) bundled in the pip package are proprietary — you may use them as part of Trillim but may not reverse-engineer or redistribute them separately. See LICENSE for full terms.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

trillim-0.1.1-py3-none-win_arm64.whl (1.5 MB view details)

Uploaded Python 3Windows ARM64

trillim-0.1.1-py3-none-win_amd64.whl (1.6 MB view details)

Uploaded Python 3Windows x86-64

trillim-0.1.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (388.6 kB view details)

Uploaded Python 3manylinux: glibc 2.17+ x86-64

trillim-0.1.1-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (398.7 kB view details)

Uploaded Python 3manylinux: glibc 2.17+ ARM64

trillim-0.1.1-py3-none-macosx_11_0_x86_64.whl (1.2 MB view details)

Uploaded Python 3macOS 11.0+ x86-64

trillim-0.1.1-py3-none-macosx_11_0_arm64.whl (1.3 MB view details)

Uploaded Python 3macOS 11.0+ ARM64

File details

Details for the file trillim-0.1.1-py3-none-win_arm64.whl.

File metadata

  • Download URL: trillim-0.1.1-py3-none-win_arm64.whl
  • Upload date:
  • Size: 1.5 MB
  • Tags: Python 3, Windows ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.0

File hashes

Hashes for trillim-0.1.1-py3-none-win_arm64.whl
Algorithm Hash digest
SHA256 249e159b293ec76d7f36568c989df3d039c2b719f3b2eb34205c2bc755181312
MD5 84adf6c386905c17e71cd36517d9f304
BLAKE2b-256 2527950438538bcd4334d58148b1cea2ea11e2048c82fc993381b026379fc949

See more details on using hashes here.

File details

Details for the file trillim-0.1.1-py3-none-win_amd64.whl.

File metadata

  • Download URL: trillim-0.1.1-py3-none-win_amd64.whl
  • Upload date:
  • Size: 1.6 MB
  • Tags: Python 3, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.0

File hashes

Hashes for trillim-0.1.1-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 a3b323633ce767cdd433b132fd4c9ccffea21323f7d0821fe8b545c2bc86808b
MD5 0dc8c65602cb77b48925644048ccef88
BLAKE2b-256 7614e085ef1333013a116b241605b9d5a6dde36052228629d8d6794c9b19b800

See more details on using hashes here.

File details

Details for the file trillim-0.1.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for trillim-0.1.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 1508f552ad91c6ce81256fe9912b196ee8a2a3f98a249d39a72110b43f7b4566
MD5 0b140dd473096288a2cb2e2492dc63f8
BLAKE2b-256 a28d0c72da21b4666bb6ed6018525c0a1981e72db9126a419a71beecc89d0023

See more details on using hashes here.

File details

Details for the file trillim-0.1.1-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for trillim-0.1.1-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 da06a9947152c78633bbfdb8ebcedbd89eb22703adf67f4e068fee2aa45ceb36
MD5 59d9abc04263e6c40867781827b1ce9a
BLAKE2b-256 5c35abe7a9bbff689487642f4db2d905bbbbd95fcfd113f3b58607194aa2db04

See more details on using hashes here.

File details

Details for the file trillim-0.1.1-py3-none-macosx_11_0_x86_64.whl.

File metadata

File hashes

Hashes for trillim-0.1.1-py3-none-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 da5a0b2eb904e19fc588bb70c020ef83724fc43f60d8c8f61f5f61d41027ca97
MD5 2c45f4ccd6cfddc262daa48ecbb1c2b9
BLAKE2b-256 368232a8635517bbda7f7dc81f70f13f040ff40ea9fde459e3a6a77167ec522f

See more details on using hashes here.

File details

Details for the file trillim-0.1.1-py3-none-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for trillim-0.1.1-py3-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 68e336813c5bd5596cda32840779e8fffb7c0a5b726609546acfaa17d28ae3e5
MD5 5255f01e634e84a6152ee8fad6e48de0
BLAKE2b-256 b4534fbf8021a137dc8acfdc1c7cd9ee7dd5bc7658f791c53dc8c45289950ef5

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page