Skip to main content

The fastest inference framework to run BitNet models on CPUs

Project description

Trillim

High-performance CPU inference engine for BitNet models. Runs ternary-quantized models ({-1, 0, 1} weights) using platform-specific SIMD optimizations (AVX2 on x86, NEON on ARM).

Quick Start

Prerequisites

  • Python 3.12+ with uv - can use pip or any package manager

Install and run

# Install trillim
uv add trillim

# Pull a pre-quantized model
uv run trillim pull Trillim/BitNet-TRNQ

# Chat
uv run trillim chat Trillim/BitNet-TRNQ

Quantize your own model

If you have a HuggingFace BitNet model with safetensors weights:

# Quantize model weights → qmodel.tensors + rope.cache
uv run trillim quantize <path-to-model> --model

# Optionally extract a PEFT LoRA adapter → qmodel.lora
uv run trillim quantize <path-to-model> --adapter <path-to-adapter>

API Server

Trillim includes an OpenAI-compatible API server:

# Start the server
uv run trillim serve <model-dir>

# With voice pipeline (speech-to-text + text-to-speech)
uv run trillim serve <model-dir> --voice

Endpoints:

  • POST /v1/chat/completions — chat completions (streaming supported)
  • POST /v1/completions — text completions
  • GET /v1/models — list loaded models
  • POST /v1/models/load — hot-swap models and LoRA adapters at runtime
  • POST /v1/audio/transcriptions — speech-to-text (with --voice)
  • POST /v1/audio/speech — text-to-speech (with --voice)
  • GET /v1/voices — list available TTS voices
  • POST /v1/voices — register a custom voice from audio (need to accept pocket-tts' terms on huggingface)

Python SDK

The server is built on a composable SDK. Each capability (LLM, Whisper, TTS) is a standalone component:

from trillim import Server, LLM, TTS, Whisper

# Inference only
Server(LLM("models/BitNet")).run()

# Inference + voice
Server(LLM("models/BitNet"), Whisper(), TTS()).run()

# TTS only
Server(TTS()).run()

LoRA Adapters

Trillim supports PEFT LoRA adapters as bf16 corrections on top of the ternary base model:

# Ensure qmodel.lora is in the directory 
# (uv run trillim quantize ... will do this)
uv run trillim chat Trillim/BitNet-TRNQ --lora

Supported Architectures

  • BitnetForCausalLM — BitNet with ternary weights and ReLU² activation
  • LlamaForCausalLM — Llama-style with SiLU activation

Platform Support

Platform Status
x86_64 (AVX2) Supported
ARM64 (NEON) Supported

Thread count is auto-detected as num_cores - 2. Override by passing a --threads N CLI argument.

License

The Trillim Python SDK source code is MIT-licensed. The C++ inference engine binaries (inference, trillim-quantize) bundled in the pip package are proprietary — you may use them as part of Trillim but may not reverse-engineer or redistribute them separately. See LICENSE for full terms.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

trillim-0.1.4-py3-none-win_arm64.whl (1.5 MB view details)

Uploaded Python 3Windows ARM64

trillim-0.1.4-py3-none-win_amd64.whl (1.6 MB view details)

Uploaded Python 3Windows x86-64

trillim-0.1.4-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (388.6 kB view details)

Uploaded Python 3manylinux: glibc 2.17+ x86-64

trillim-0.1.4-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (398.6 kB view details)

Uploaded Python 3manylinux: glibc 2.17+ ARM64

trillim-0.1.4-py3-none-macosx_11_0_x86_64.whl (1.2 MB view details)

Uploaded Python 3macOS 11.0+ x86-64

trillim-0.1.4-py3-none-macosx_11_0_arm64.whl (1.3 MB view details)

Uploaded Python 3macOS 11.0+ ARM64

File details

Details for the file trillim-0.1.4-py3-none-win_arm64.whl.

File metadata

  • Download URL: trillim-0.1.4-py3-none-win_arm64.whl
  • Upload date:
  • Size: 1.5 MB
  • Tags: Python 3, Windows ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.0

File hashes

Hashes for trillim-0.1.4-py3-none-win_arm64.whl
Algorithm Hash digest
SHA256 41dd97645149d2070f7ebd7b1193ad153aaa585a148f96d26818b51ffd3de335
MD5 1ff834ff883b16e061524649866c26cd
BLAKE2b-256 66ab596b7be725dec3c1ac0032d3d23d6ac34e475a6cceabac683bf2468f5954

See more details on using hashes here.

File details

Details for the file trillim-0.1.4-py3-none-win_amd64.whl.

File metadata

  • Download URL: trillim-0.1.4-py3-none-win_amd64.whl
  • Upload date:
  • Size: 1.6 MB
  • Tags: Python 3, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.0

File hashes

Hashes for trillim-0.1.4-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 e18b722597e096d62aebf8d9553e9907da58643cd54b1c40e02f40c5f5096af0
MD5 9f7e47fee311501c726f515bee03f494
BLAKE2b-256 d47a8cdca647328bd567054e3dc05048806daa5e80f96abb20639596410c5462

See more details on using hashes here.

File details

Details for the file trillim-0.1.4-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for trillim-0.1.4-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 024d12705e1b6391d5c70fb210814e961f2499ac57dd7b8f381402bfd439b373
MD5 0eff049a8dc85db9afa5a293cc88a2d5
BLAKE2b-256 5307f8098b5cfeef732a776dc11cf22eae43c057e4b2d974cc23d28ef0a9bf93

See more details on using hashes here.

File details

Details for the file trillim-0.1.4-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for trillim-0.1.4-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 aa2c5aea0960a8e1922f0f42d45481976c1c1005f0938953942548303a7fd49f
MD5 92ab56c874a76f6fb1291de474db7d75
BLAKE2b-256 bd7e301418414806c0ffd025ae46bb9376f4b1e743511854e47f63eaf97aa6b2

See more details on using hashes here.

File details

Details for the file trillim-0.1.4-py3-none-macosx_11_0_x86_64.whl.

File metadata

File hashes

Hashes for trillim-0.1.4-py3-none-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 9fb1ca5ea46d29c1e58373cb9f0b2c0b32f23fa02e6927ff3597773aaba785ff
MD5 fbf84e24c0d56764b1b5113aca422fcc
BLAKE2b-256 3473a5959210ae20a16cfb93c4e1649ec587d50d50813677e55cb91cab1ed81f

See more details on using hashes here.

File details

Details for the file trillim-0.1.4-py3-none-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for trillim-0.1.4-py3-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 973447bb5c5f182e499b71b6c267c5177fce265c7e83248e8215615d94f57c94
MD5 55cff4db9f3b8a087e17ad5ca75cfc77
BLAKE2b-256 073094ec8a2247aff3e52e91cc8622601ea4fdcbc82511c113cc4ca0ed3a5cee

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page