
The fastest inference framework to run BitNet models on CPUs

Project description

Trillim

High-performance CPU inference engine for BitNet models. Runs ternary-quantized models ({-1, 0, 1} weights) using platform-specific SIMD optimizations (AVX2 on x86, NEON on ARM).
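As background, ternary quantization maps each weight to {-1, 0, 1} plus a per-tensor scale. A minimal sketch of the absmean scheme described in the BitNet b1.58 paper (not necessarily Trillim's exact quantizer):

```python
def ternary_quantize(weights: list[float]) -> tuple[list[int], float]:
    """Absmean ternary quantization: scale by the mean absolute
    weight, then round and clip each value to {-1, 0, 1}."""
    eps = 1e-8  # avoid division by zero on all-zero tensors
    scale = sum(abs(w) for w in weights) / max(len(weights), 1) + eps
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale
```

The dequantized approximation is `q[i] * scale`, which is what makes the SIMD kernels cheap: the inner product reduces to additions, subtractions, and skips.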

Quick Start

Prerequisites

  • Python 3.12+ with uv (pip or any other package manager also works)

Install and run

# Install trillim
uv add trillim

# Pull a pre-quantized model
uv run trillim pull Trillim/BitNet-TRNQ

# Chat
uv run trillim chat Trillim/BitNet-TRNQ

Quantize your own model

If you have a HuggingFace BitNet model with safetensors weights:

# Quantize model weights → qmodel.tensors + rope.cache
uv run trillim quantize <path-to-model> --model

# Optionally extract a PEFT LoRA adapter → qmodel.lora
uv run trillim quantize <path-to-model> --adapter <path-to-adapter>

API Server

Trillim includes an OpenAI-compatible API server:

# Start the server
uv run trillim serve <model-dir>

# With voice pipeline (speech-to-text + text-to-speech)
uv run trillim serve <model-dir> --voice

Endpoints:

  • POST /v1/chat/completions — chat completions (streaming supported)
  • POST /v1/completions — text completions
  • GET /v1/models — list loaded models
  • POST /v1/models/load — hot-swap models and LoRA adapters at runtime
  • POST /v1/audio/transcriptions — speech-to-text (with --voice)
  • POST /v1/audio/speech — text-to-speech (with --voice)
  • GET /v1/voices — list available TTS voices
  • POST /v1/voices — register a custom voice from audio (requires accepting the pocket-tts terms on Hugging Face)
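Since the server is OpenAI-compatible, any OpenAI-style client or plain HTTP works. A minimal sketch using only the standard library (the base URL `http://localhost:8000` is an assumption; check `trillim serve --help` for the actual bind address and port):

```python
import json
import urllib.request

def build_chat_request(model: str, user_message: str, stream: bool = False) -> dict:
    """Build an OpenAI-style /v1/chat/completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": stream,
    }

def chat(base_url: str, model: str, message: str) -> str:
    """POST to /v1/chat/completions and return the assistant reply."""
    payload = build_chat_request(model, message)
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Usage, with a server started via `trillim serve <model-dir>`:
#   reply = chat("http://localhost:8000", "Trillim/BitNet-TRNQ", "Hello!")
```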

Python SDK

The server is built on a composable SDK. Each capability (LLM, Whisper, TTS) is a standalone component:

from trillim import Server, LLM, TTS, Whisper

# Inference only
Server(LLM("models/BitNet")).run()

# Inference + voice
Server(LLM("models/BitNet"), Whisper(), TTS()).run()

# TTS only
Server(TTS()).run()

LoRA Adapters

Trillim supports PEFT LoRA adapters as bf16 corrections on top of the ternary base model:

# Ensure qmodel.lora is in the directory 
# (uv run trillim quantize ... will do this)
uv run trillim chat Trillim/BitNet-TRNQ --lora

Supported Architectures

  • BitnetForCausalLM — BitNet with ternary weights and ReLU² activation
  • LlamaForCausalLM — Llama-style with SiLU activation

Platform Support

  • x86_64 (AVX2): Supported
  • ARM64 (NEON): Supported

Thread count defaults to num_cores - 2. Override it with the --threads N CLI flag.
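The default above can be sketched in a few lines (the floor at 1 thread is an assumption for low-core machines; the docs only state num_cores - 2):

```python
import os

def default_thread_count() -> int:
    """Mirror the documented default: num_cores - 2, floored at 1."""
    cores = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, cores - 2)
```

Leaving two cores free keeps the OS and the serving loop responsive while the inference kernels saturate the rest.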

License

The Trillim Python SDK source code is MIT-licensed. The C++ inference engine binaries (inference, trillim-quantize) bundled in the pip package are proprietary — you may use them as part of Trillim but may not reverse-engineer or redistribute them separately. See LICENSE for full terms.

Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions


  • trillim-0.2.5-py3-none-win_arm64.whl (1.5 MB): Python 3, Windows ARM64
  • trillim-0.2.5-py3-none-win_amd64.whl (1.7 MB): Python 3, Windows x86-64
  • trillim-0.2.5-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (475.2 kB): Python 3, manylinux: glibc 2.17+ x86-64
  • trillim-0.2.5-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (493.8 kB): Python 3, manylinux: glibc 2.17+ ARM64
  • trillim-0.2.5-py3-none-macosx_11_0_x86_64.whl (1.3 MB): Python 3, macOS 11.0+ x86-64
  • trillim-0.2.5-py3-none-macosx_11_0_arm64.whl (1.4 MB): Python 3, macOS 11.0+ ARM64

File details

Details for the file trillim-0.2.5-py3-none-win_arm64.whl.

File metadata

  • Download URL: trillim-0.2.5-py3-none-win_arm64.whl
  • Upload date:
  • Size: 1.5 MB
  • Tags: Python 3, Windows ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.0

File hashes

Hashes for trillim-0.2.5-py3-none-win_arm64.whl:

  • SHA256: b225b3faa7a58ce6f70de36d134cf0b7856626849b68a25f70c3a285331dcfaf
  • MD5: 3148d7be02a45f4c7d1df04f2c52feb1
  • BLAKE2b-256: 31504058081d3a2453473da9d6007c426cd1014336f6719fc12d43e66210e35f

See more details on using hashes here.
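Checking a downloaded wheel against its published digest can be sketched with the standard library's `hashlib` (a generic verification helper, not a Trillim API):

```python
import hashlib

def sha256_of_file(path: str) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large wheels don't load into memory at once
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: str, expected_sha256: str) -> bool:
    """Compare a file's SHA-256 against the digest published on PyPI."""
    return sha256_of_file(path) == expected_sha256.lower()
```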

File details

Details for the file trillim-0.2.5-py3-none-win_amd64.whl.

File metadata

  • Download URL: trillim-0.2.5-py3-none-win_amd64.whl
  • Upload date:
  • Size: 1.7 MB
  • Tags: Python 3, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.0

File hashes

Hashes for trillim-0.2.5-py3-none-win_amd64.whl:

  • SHA256: 76fdfb711847d50c4d80c390df8f7b222bead6f14e57c2f386d64379ebb1fedc
  • MD5: 3a8c48ff6fc391ba67d55edf7fed1fa8
  • BLAKE2b-256: b574ed4ca467922eede3c576224011b9993b384dcdc01d6616d2b07ecf194a3d


File details

Details for the file trillim-0.2.5-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for trillim-0.2.5-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

  • SHA256: 208d7d596cddb13842e83fda209ca984b32541a868b17497ce0b5b312ea18463
  • MD5: e2af8f16a388630ffc52fe67f0d824b4
  • BLAKE2b-256: cd4c24cbe829188d9f8c5ada34aa6582ed4e549162a3e9150fa22bb626488331


File details

Details for the file trillim-0.2.5-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for trillim-0.2.5-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

  • SHA256: af0c2346c6fd07e8b699614c997c2957bea94b5b71d71a36bfa9265a1b1dae34
  • MD5: ceed3b9bb667dc406f42bbb88f759118
  • BLAKE2b-256: 1c31805880ee6c2a92855e15814b6c4bc6be681b387a6b9c193445dff67a45ae


File details

Details for the file trillim-0.2.5-py3-none-macosx_11_0_x86_64.whl.

File metadata

File hashes

Hashes for trillim-0.2.5-py3-none-macosx_11_0_x86_64.whl:

  • SHA256: 9cad7a09824eee12df81422e09725a50275f1a60cefca2f6ebc8a2547878ff94
  • MD5: 2a8c06d7e1262a5e380738f92ced4356
  • BLAKE2b-256: abe8546586ad6274f580dc73508082da54bda1a67189453d541bd8d3cc481390


File details

Details for the file trillim-0.2.5-py3-none-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for trillim-0.2.5-py3-none-macosx_11_0_arm64.whl:

  • SHA256: dc05c50ef104c1b357686c2c3aee978fd7720222f897b0d6e89432a2d39441ae
  • MD5: 56257d008737d7835a71a8dc4f7ef586
  • BLAKE2b-256: 099c4b0e823a36c553c0fec45a7290c99dcaaf1a976cd1a31f2e2befe6293697

