Skip to main content

The fastest inference framework to run BitNet models on CPUs

Project description

Trillim

High-performance CPU inference engine for BitNet models. Runs ternary-quantized models ({-1, 0, 1} weights) using platform-specific SIMD optimizations (AVX2 on x86, NEON on ARM).

Quick Start

Prerequisites

  • Python 3.12+ with uv - can use pip or any package manager

Install and run

# Install trillim
uv add trillim

# Pull a pre-quantized model
uv run trillim pull Trillim/BitNet-TRNQ

# Chat
uv run trillim chat Trillim/BitNet-TRNQ

Quantize your own model

If you have a HuggingFace BitNet model with safetensors weights:

# Quantize model weights → qmodel.tensors + rope.cache
uv run trillim quantize <path-to-model> --model

# Optionally extract a PEFT LoRA adapter → qmodel.lora
uv run trillim quantize <path-to-model> --adapter <path-to-adapter>

API Server

Trillim includes an OpenAI-compatible API server:

# Start the server
uv run trillim serve <model-dir>

# With voice pipeline (speech-to-text + text-to-speech)
uv run trillim serve <model-dir> --voice

Endpoints:

  • POST /v1/chat/completions — chat completions (streaming supported)
  • POST /v1/completions — text completions
  • GET /v1/models — list loaded models
  • POST /v1/models/load — hot-swap models and LoRA adapters at runtime
  • POST /v1/audio/transcriptions — speech-to-text (with --voice)
  • POST /v1/audio/speech — text-to-speech (with --voice)
  • GET /v1/voices — list available TTS voices
  • POST /v1/voices — register a custom voice from audio (need to accept pocket-tts' terms on huggingface)

Python SDK

The server is built on a composable SDK. Each capability (LLM, Whisper, TTS) is a standalone component:

from trillim import Server, LLM, TTS, Whisper

# Inference only
Server(LLM("models/BitNet")).run()

# Inference + voice
Server(LLM("models/BitNet"), Whisper(), TTS()).run()

# TTS only
Server(TTS()).run()

LoRA Adapters

Trillim supports PEFT LoRA adapters as bf16 corrections on top of the ternary base model:

# Ensure qmodel.lora is in the directory 
# (uv run trillim quantize ... will do this)
uv run trillim chat Trillim/BitNet-TRNQ --lora

Supported Architectures

  • BitnetForCausalLM — BitNet with ternary weights and ReLU² activation
  • LlamaForCausalLM — Llama-style with SiLU activation

Platform Support

Platform Status
x86_64 (AVX2) Supported
ARM64 (NEON) Supported

Thread count is auto-detected as num_cores - 2. Override by passing a --threads N CLI argument.

License

The Trillim Python SDK source code is MIT-licensed. The C++ inference engine binaries (inference, trillim-quantize) bundled in the pip package are proprietary — you may use them as part of Trillim but may not reverse-engineer or redistribute them separately. See LICENSE for full terms.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

trillim-0.1.5-py3-none-win_arm64.whl (1.5 MB view details)

Uploaded Python 3Windows ARM64

trillim-0.1.5-py3-none-win_amd64.whl (1.6 MB view details)

Uploaded Python 3Windows x86-64

trillim-0.1.5-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (389.0 kB view details)

Uploaded Python 3manylinux: glibc 2.17+ x86-64

trillim-0.1.5-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (399.0 kB view details)

Uploaded Python 3manylinux: glibc 2.17+ ARM64

trillim-0.1.5-py3-none-macosx_11_0_x86_64.whl (1.2 MB view details)

Uploaded Python 3macOS 11.0+ x86-64

trillim-0.1.5-py3-none-macosx_11_0_arm64.whl (1.3 MB view details)

Uploaded Python 3macOS 11.0+ ARM64

File details

Details for the file trillim-0.1.5-py3-none-win_arm64.whl.

File metadata

  • Download URL: trillim-0.1.5-py3-none-win_arm64.whl
  • Upload date:
  • Size: 1.5 MB
  • Tags: Python 3, Windows ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.0

File hashes

Hashes for trillim-0.1.5-py3-none-win_arm64.whl
Algorithm Hash digest
SHA256 9280b660dd9c7261e6395387808e0db8a10383979686786161f1e3ce3be0838b
MD5 f6980286b1db5a941b13e93d85dc2db5
BLAKE2b-256 75185a8c55d7f4282298c05e5729231e65929646ca77ecdb36d59e8a0d0bff7a

See more details on using hashes here.

File details

Details for the file trillim-0.1.5-py3-none-win_amd64.whl.

File metadata

  • Download URL: trillim-0.1.5-py3-none-win_amd64.whl
  • Upload date:
  • Size: 1.6 MB
  • Tags: Python 3, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.0

File hashes

Hashes for trillim-0.1.5-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 3fbf325ca3a514d884a046646defddc4a8128138564ca0f3836f0cc6a52ac578
MD5 60d58f012aac82c940dd695638a8c8af
BLAKE2b-256 b435576e5f331674c12e74e5ddcd92485998b92f8d4877eefd6e710d3c407979

See more details on using hashes here.

File details

Details for the file trillim-0.1.5-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for trillim-0.1.5-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 332e69f30763cc8a059ce0dadf723fe917dd170a7c164a1b84aee40c0b78dcc9
MD5 6cdb048dbebf543f4ac077d37ea8556f
BLAKE2b-256 108aeb4f49c13efd36ffbc695baca65d4b35f4720c88d0ec6792da0891ff648a

See more details on using hashes here.

File details

Details for the file trillim-0.1.5-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for trillim-0.1.5-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 df803dc95a7ebe9c7fc416b432335dc8f737d52eab612f3b9e288bd92bf77757
MD5 e37dd68728f25440f92ae6675358de0a
BLAKE2b-256 da75f8e37666e194a03a0fbd723779e638bd9fa08eadd8bebad8379b8f35f9da

See more details on using hashes here.

File details

Details for the file trillim-0.1.5-py3-none-macosx_11_0_x86_64.whl.

File metadata

File hashes

Hashes for trillim-0.1.5-py3-none-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 ed28a66e2d85470680e2b1f90da26a0b68a3a4c4de62577b4816b4a81037d9fc
MD5 7445c80ed513719bcc90289a6a71027e
BLAKE2b-256 5a7879ffbdbbff4df48ca72645f70ae8845d3ef0f6ed383ee37d95c534383c0d

See more details on using hashes here.

File details

Details for the file trillim-0.1.5-py3-none-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for trillim-0.1.5-py3-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 3af52707b51494790efa7523eb113c1aa904beba44be8cdf0d55a35c05911c05
MD5 87e5b77aa698e569ec98ac3eda343af5
BLAKE2b-256 46a250df3f4f2631af468359fd475fb219425c77db780f9061f26ac71fd3ba79

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page