The fastest inference framework to run BitNet models on CPUs
Project description
Trillim
High-performance CPU inference engine for BitNet models. Runs ternary-quantized models ({-1, 0, 1} weights) using platform-specific SIMD optimizations (AVX2 on x86, NEON on ARM).
Quick Start
Prerequisites
- Python 3.12+ with
uv- can use pip or any package manager
Install and run
# Install trillim
uv add trillim
# Pull a pre-quantized model
uv run trillim pull Trillim/BitNet-TRNQ
# Chat
uv run trillim chat Trillim/BitNet-TRNQ
Quantize your own model
If you have a HuggingFace BitNet model with safetensors weights:
# Quantize model weights → qmodel.tensors + rope.cache
uv run trillim quantize <path-to-model> --model
# Optionally extract a PEFT LoRA adapter → qmodel.lora
uv run trillim quantize <path-to-model> --adapter <path-to-adapter>
API Server
Trillim includes an OpenAI-compatible API server:
# Start the server
uv run trillim serve <model-dir>
# With voice pipeline (speech-to-text + text-to-speech)
uv run trillim serve <model-dir> --voice
Endpoints:
POST /v1/chat/completions— chat completions (streaming supported)POST /v1/completions— text completionsGET /v1/models— list loaded modelsPOST /v1/models/load— hot-swap models and LoRA adapters at runtimePOST /v1/audio/transcriptions— speech-to-text (with--voice)POST /v1/audio/speech— text-to-speech (with--voice)GET /v1/voices— list available TTS voicesPOST /v1/voices— register a custom voice from audio (need to accept pocket-tts' terms on huggingface)
Python SDK
The server is built on a composable SDK. Each capability (LLM, Whisper, TTS) is a standalone component:
from trillim import Server, LLM, TTS, Whisper
# Inference only
Server(LLM("models/BitNet")).run()
# Inference + voice
Server(LLM("models/BitNet"), Whisper(), TTS()).run()
# TTS only
Server(TTS()).run()
LoRA Adapters
Trillim supports PEFT LoRA adapters as bf16 corrections on top of the ternary base model:
# Ensure qmodel.lora is in the directory
# (uv run trillim quantize ... will do this)
uv run trillim chat Trillim/BitNet-TRNQ --lora
Supported Architectures
BitnetForCausalLM— BitNet with ternary weights and ReLU² activationLlamaForCausalLM— Llama-style with SiLU activation
Platform Support
| Platform | Status |
|---|---|
| x86_64 (AVX2) | Supported |
| ARM64 (NEON) | Supported |
Thread count is auto-detected as num_cores - 2. Override by passing a --threads N CLI argument.
License
The Trillim Python SDK source code is MIT-licensed. The C++ inference engine binaries (inference, trillim-quantize) bundled in the pip package are proprietary — you may use them as part of Trillim but may not reverse-engineer or redistribute them separately. See LICENSE for full terms.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distributions
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file trillim-0.1.1-py3-none-win_arm64.whl.
File metadata
- Download URL: trillim-0.1.1-py3-none-win_arm64.whl
- Upload date:
- Size: 1.5 MB
- Tags: Python 3, Windows ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
249e159b293ec76d7f36568c989df3d039c2b719f3b2eb34205c2bc755181312
|
|
| MD5 |
84adf6c386905c17e71cd36517d9f304
|
|
| BLAKE2b-256 |
2527950438538bcd4334d58148b1cea2ea11e2048c82fc993381b026379fc949
|
File details
Details for the file trillim-0.1.1-py3-none-win_amd64.whl.
File metadata
- Download URL: trillim-0.1.1-py3-none-win_amd64.whl
- Upload date:
- Size: 1.6 MB
- Tags: Python 3, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a3b323633ce767cdd433b132fd4c9ccffea21323f7d0821fe8b545c2bc86808b
|
|
| MD5 |
0dc8c65602cb77b48925644048ccef88
|
|
| BLAKE2b-256 |
7614e085ef1333013a116b241605b9d5a6dde36052228629d8d6794c9b19b800
|
File details
Details for the file trillim-0.1.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: trillim-0.1.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 388.6 kB
- Tags: Python 3, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1508f552ad91c6ce81256fe9912b196ee8a2a3f98a249d39a72110b43f7b4566
|
|
| MD5 |
0b140dd473096288a2cb2e2492dc63f8
|
|
| BLAKE2b-256 |
a28d0c72da21b4666bb6ed6018525c0a1981e72db9126a419a71beecc89d0023
|
File details
Details for the file trillim-0.1.1-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: trillim-0.1.1-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 398.7 kB
- Tags: Python 3, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
da06a9947152c78633bbfdb8ebcedbd89eb22703adf67f4e068fee2aa45ceb36
|
|
| MD5 |
59d9abc04263e6c40867781827b1ce9a
|
|
| BLAKE2b-256 |
5c35abe7a9bbff689487642f4db2d905bbbbd95fcfd113f3b58607194aa2db04
|
File details
Details for the file trillim-0.1.1-py3-none-macosx_11_0_x86_64.whl.
File metadata
- Download URL: trillim-0.1.1-py3-none-macosx_11_0_x86_64.whl
- Upload date:
- Size: 1.2 MB
- Tags: Python 3, macOS 11.0+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
da5a0b2eb904e19fc588bb70c020ef83724fc43f60d8c8f61f5f61d41027ca97
|
|
| MD5 |
2c45f4ccd6cfddc262daa48ecbb1c2b9
|
|
| BLAKE2b-256 |
368232a8635517bbda7f7dc81f70f13f040ff40ea9fde459e3a6a77167ec522f
|
File details
Details for the file trillim-0.1.1-py3-none-macosx_11_0_arm64.whl.
File metadata
- Download URL: trillim-0.1.1-py3-none-macosx_11_0_arm64.whl
- Upload date:
- Size: 1.3 MB
- Tags: Python 3, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
68e336813c5bd5596cda32840779e8fffb7c0a5b726609546acfaa17d28ae3e5
|
|
| MD5 |
5255f01e634e84a6152ee8fad6e48de0
|
|
| BLAKE2b-256 |
b4534fbf8021a137dc8acfdc1c7cd9ee7dd5bc7658f791c53dc8c45289950ef5
|