Skip to main content

The fastest inference framework to run BitNet models on CPUs

Project description

Trillim

High-performance CPU inference engine for BitNet models. Runs ternary-quantized models ({-1, 0, 1} weights) using platform-specific SIMD optimizations (AVX2 on x86, NEON on ARM).

Quick Start

Prerequisites

  • Python 3.12+ with uv - can use pip or any package manager

Install and run

# Install trillim
uv add trillim

# Pull a pre-quantized model
uv run trillim pull Trillim/BitNet-TRNQ

# Chat
uv run trillim chat Trillim/BitNet-TRNQ

Quantize your own model

If you have a HuggingFace BitNet model with safetensors weights:

# Quantize model weights → qmodel.tensors + rope.cache
uv run trillim quantize <path-to-model> --model

# Optionally extract a PEFT LoRA adapter → qmodel.lora
uv run trillim quantize <path-to-model> --adapter <path-to-adapter>

API Server

Trillim includes an OpenAI-compatible API server:

# Start the server
uv run trillim serve <model-dir>

# With voice pipeline (speech-to-text + text-to-speech)
uv run trillim serve <model-dir> --voice

Endpoints:

  • POST /v1/chat/completions — chat completions (streaming supported)
  • POST /v1/completions — text completions
  • GET /v1/models — list loaded models
  • POST /v1/models/load — hot-swap models and LoRA adapters at runtime
  • POST /v1/audio/transcriptions — speech-to-text (with --voice)
  • POST /v1/audio/speech — text-to-speech (with --voice)
  • GET /v1/voices — list available TTS voices
  • POST /v1/voices — register a custom voice from audio (need to accept pocket-tts' terms on huggingface)

Python SDK

The server is built on a composable SDK. Each capability (LLM, Whisper, TTS) is a standalone component:

from trillim import Server, LLM, TTS, Whisper

# Inference only
Server(LLM("models/BitNet")).run()

# Inference + voice
Server(LLM("models/BitNet"), Whisper(), TTS()).run()

# TTS only
Server(TTS()).run()

LoRA Adapters

Trillim supports PEFT LoRA adapters as bf16 corrections on top of the ternary base model:

# Ensure qmodel.lora is in the directory 
# (uv run trillim quantize ... will do this)
uv run trillim chat Trillim/BitNet-TRNQ --lora

Supported Architectures

  • BitnetForCausalLM — BitNet with ternary weights and ReLU² activation
  • LlamaForCausalLM — Llama-style with SiLU activation

Platform Support

Platform Status
x86_64 (AVX2) Supported
ARM64 (NEON) Supported

Thread count is auto-detected as num_cores - 2. Override by passing a --threads N CLI argument.

License

The Trillim Python SDK source code is MIT-licensed. The C++ inference engine binaries (inference, trillim-quantize) bundled in the pip package are proprietary — you may use them as part of Trillim but may not reverse-engineer or redistribute them separately. See LICENSE for full terms.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

trillim-0.1.2-py3-none-win_arm64.whl (1.5 MB view details)

Uploaded Python 3Windows ARM64

trillim-0.1.2-py3-none-win_amd64.whl (1.6 MB view details)

Uploaded Python 3Windows x86-64

trillim-0.1.2-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (388.7 kB view details)

Uploaded Python 3manylinux: glibc 2.17+ x86-64

trillim-0.1.2-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (398.7 kB view details)

Uploaded Python 3manylinux: glibc 2.17+ ARM64

trillim-0.1.2-py3-none-macosx_11_0_x86_64.whl (1.2 MB view details)

Uploaded Python 3macOS 11.0+ x86-64

trillim-0.1.2-py3-none-macosx_11_0_arm64.whl (1.3 MB view details)

Uploaded Python 3macOS 11.0+ ARM64

File details

Details for the file trillim-0.1.2-py3-none-win_arm64.whl.

File metadata

  • Download URL: trillim-0.1.2-py3-none-win_arm64.whl
  • Upload date:
  • Size: 1.5 MB
  • Tags: Python 3, Windows ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.0

File hashes

Hashes for trillim-0.1.2-py3-none-win_arm64.whl
Algorithm Hash digest
SHA256 c20c83a852e176227f7255cb078ac8ef3e2634ebf9088e6df0b9a5ad0f60cacc
MD5 99a45f7c8a66795b045edcef97ace92f
BLAKE2b-256 2bcd366cb7eb10a1490a2e53c24f8243bf2f0e4e76d95c6cd58ceb6bea30e155

See more details on using hashes here.

File details

Details for the file trillim-0.1.2-py3-none-win_amd64.whl.

File metadata

  • Download URL: trillim-0.1.2-py3-none-win_amd64.whl
  • Upload date:
  • Size: 1.6 MB
  • Tags: Python 3, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.0

File hashes

Hashes for trillim-0.1.2-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 1dfc3e968f053e1078c53f973edf717e2b96c61376220857694981738d755084
MD5 5762e73afc2725371bfc8471e7722982
BLAKE2b-256 101e8ba5db52e1c2748c54d0568abdc4f6edc2eac40429ba6adb6f6fda244742

See more details on using hashes here.

File details

Details for the file trillim-0.1.2-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for trillim-0.1.2-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 1fab3542566b152573883fca94e05b373d3273d2250f4399fce5e837f6504e6d
MD5 da2cbacd47814ba18c2a768054aa8039
BLAKE2b-256 9f1fd5609d7fc62b8c7c1fc16350f287745090bbbd99f76a3434ccebb4efe031

See more details on using hashes here.

File details

Details for the file trillim-0.1.2-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for trillim-0.1.2-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 b4649b2013f48e159dc602b76e370fb89bccf05bc8d67b55eadb61d6e98702c8
MD5 cc37edde048e9ae24524fb56b39df02a
BLAKE2b-256 e797383633c9b955af96747c53c37c4704410126fbd5b56d3f4e2aea77d58bbe

See more details on using hashes here.

File details

Details for the file trillim-0.1.2-py3-none-macosx_11_0_x86_64.whl.

File metadata

File hashes

Hashes for trillim-0.1.2-py3-none-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 998caad4ad35c8ba6db4e8f2cb689574a57e62ebb20c7b641d595b9da8e39b07
MD5 0c0d70a86d5d124040fb2d9690233233
BLAKE2b-256 0a93e1206caac33a631b3897ddb18c0c0be36b9c86b4ec5a00d01aceefa8eba5

See more details on using hashes here.

File details

Details for the file trillim-0.1.2-py3-none-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for trillim-0.1.2-py3-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 c8d6b5cd5a729aa0f34b454515897be1eb6bacb33af62fbed0c2596769208c81
MD5 82f4065777dbbd9714e4b31312f0f8fe
BLAKE2b-256 fff9aff252a6e78e823313c8d2848fbcec432a45c497595a79c55d84f11fc705

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page