Skip to main content

The fastest inference framework to run BitNet models on CPUs

Project description

Trillim

High-performance CPU inference engine for BitNet models. Runs ternary-quantized models ({-1, 0, 1} weights) using platform-specific SIMD optimizations (AVX2 on x86, NEON on ARM).

Quick Start

Prerequisites

  • Python 3.12+ with uv - can use pip or any package manager

Install and run

# Install trillim
uv add trillim

# Pull a pre-quantized model
uv run trillim pull Trillim/BitNet-TRNQ

# Chat
uv run trillim chat Trillim/BitNet-TRNQ

Quantize your own model

If you have a HuggingFace BitNet model with safetensors weights:

# Quantize model weights → qmodel.tensors + rope.cache
uv run trillim quantize <path-to-model> --model

# Optionally extract a PEFT LoRA adapter → qmodel.lora
uv run trillim quantize <path-to-model> --adapter <path-to-adapter>

API Server

Trillim includes an OpenAI-compatible API server:

# Start the server
uv run trillim serve <model-dir>

# With voice pipeline (speech-to-text + text-to-speech)
uv run trillim serve <model-dir> --voice

Endpoints:

  • POST /v1/chat/completions — chat completions (streaming supported)
  • POST /v1/completions — text completions
  • GET /v1/models — list loaded models
  • POST /v1/models/load — hot-swap models and LoRA adapters at runtime
  • POST /v1/audio/transcriptions — speech-to-text (with --voice)
  • POST /v1/audio/speech — text-to-speech (with --voice)
  • GET /v1/voices — list available TTS voices
  • POST /v1/voices — register a custom voice from audio (need to accept pocket-tts' terms on huggingface)

Python SDK

The server is built on a composable SDK. Each capability (LLM, Whisper, TTS) is a standalone component:

from trillim import Server, LLM, TTS, Whisper

# Inference only
Server(LLM("models/BitNet")).run()

# Inference + voice
Server(LLM("models/BitNet"), Whisper(), TTS()).run()

# TTS only
Server(TTS()).run()

LoRA Adapters

Trillim supports PEFT LoRA adapters as bf16 corrections on top of the ternary base model:

# Ensure qmodel.lora is in the directory 
# (uv run trillim quantize ... will do this)
uv run trillim chat Trillim/BitNet-TRNQ --lora

Supported Architectures

  • BitnetForCausalLM — BitNet with ternary weights and ReLU² activation
  • LlamaForCausalLM — Llama-style with SiLU activation

Platform Support

Platform Status
x86_64 (AVX2) Supported
ARM64 (NEON) Supported

Thread count is auto-detected as num_cores - 2. Override by passing a --threads N CLI argument.

License

The Trillim Python SDK source code is MIT-licensed. The C++ inference engine binaries (inference, trillim-quantize) bundled in the pip package are proprietary — you may use them as part of Trillim but may not reverse-engineer or redistribute them separately. See LICENSE for full terms.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

trillim-0.1.0-py3-none-win_arm64.whl (1.5 MB view details)

Uploaded Python 3Windows ARM64

trillim-0.1.0-py3-none-win_amd64.whl (1.6 MB view details)

Uploaded Python 3Windows x86-64

trillim-0.1.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (388.7 kB view details)

Uploaded Python 3manylinux: glibc 2.17+ x86-64

trillim-0.1.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (398.7 kB view details)

Uploaded Python 3manylinux: glibc 2.17+ ARM64

trillim-0.1.0-py3-none-macosx_11_0_x86_64.whl (1.2 MB view details)

Uploaded Python 3macOS 11.0+ x86-64

trillim-0.1.0-py3-none-macosx_11_0_arm64.whl (1.3 MB view details)

Uploaded Python 3macOS 11.0+ ARM64

File details

Details for the file trillim-0.1.0-py3-none-win_arm64.whl.

File metadata

  • Download URL: trillim-0.1.0-py3-none-win_arm64.whl
  • Upload date:
  • Size: 1.5 MB
  • Tags: Python 3, Windows ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.0

File hashes

Hashes for trillim-0.1.0-py3-none-win_arm64.whl
Algorithm Hash digest
SHA256 3e0e39eb5d9ddb116bca469a530a8d6a9a36641a8dc62a957f8874cb6c6b36e3
MD5 a47fbaf8153d9b5b85633b114aa9accc
BLAKE2b-256 e080548648b0723b5de60160cba25a7cc7a04953f13b322f7572a717c008757b

See more details on using hashes here.

File details

Details for the file trillim-0.1.0-py3-none-win_amd64.whl.

File metadata

  • Download URL: trillim-0.1.0-py3-none-win_amd64.whl
  • Upload date:
  • Size: 1.6 MB
  • Tags: Python 3, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.0

File hashes

Hashes for trillim-0.1.0-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 c86ac52e028c300687e2d1c85b1342dfa3bc6fc74b4b3fa92038d8c5601c3f66
MD5 f2bc8a0df8550a6cb15db4d4a4f76d4f
BLAKE2b-256 04f1b1e27c1dec44ad7171f331198b3e030b8d8a1ec0111de8bc98c718531a36

See more details on using hashes here.

File details

Details for the file trillim-0.1.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for trillim-0.1.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 2edb2c8d826f1d5ecfefdc7532ab13ad65fd16f928466541b92e5feac3c845fe
MD5 b332d1c5ffcab5b7db54c8b39c9b38f4
BLAKE2b-256 6e0c0a7b0579c4b8728d3372dcbd55613e088c21ae3e713a0144a13ab5f243bf

See more details on using hashes here.

File details

Details for the file trillim-0.1.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for trillim-0.1.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 c31853badab9975ce6408c11b78d9b05f875eda95a392a4125e01d058a21aa5a
MD5 d51a541486f275a3df6dbd05e62a1742
BLAKE2b-256 e36b52f9b425d102c20f24c01ab17bf662d0ddb7e1d836bd870fffa087ee8757

See more details on using hashes here.

File details

Details for the file trillim-0.1.0-py3-none-macosx_11_0_x86_64.whl.

File metadata

File hashes

Hashes for trillim-0.1.0-py3-none-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 d1db8a081ddbe3381fde9f5bf8d3f334afe114aa692e946b009e89d8cae4a85d
MD5 84d194625d7ab605f78b72c80421043c
BLAKE2b-256 7c7a5a9feec4f0d6d3f6e21f2ecbb5a9df0538bb8bfe21628747bcd0cd70da6f

See more details on using hashes here.

File details

Details for the file trillim-0.1.0-py3-none-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for trillim-0.1.0-py3-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 5d7bf6622ffac63526de553bcfa8947a10feb28a2a84b53f29473d5da26bac06
MD5 e6f7253bed7039035995a7236c899a7f
BLAKE2b-256 d701cfbe8fdeade77a88df56bbf90a6c3865aa0b27620f7c6386ba9105a3d46f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page