
The fastest inference framework to run BitNet models on CPUs

Project description

Trillim

High-performance CPU inference engine for BitNet models. Runs ternary-quantized models ({-1, 0, 1} weights) using platform-specific SIMD optimizations (AVX2 on x86, NEON on ARM).
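Ternary quantization maps each full-precision weight to {-1, 0, 1} plus a per-tensor scale. A minimal numpy sketch of the absmean scheme used by BitNet-b1.58-style models (illustrative only, not Trillim's actual quantizer):

```python
import numpy as np

def ternary_quantize(w: np.ndarray):
    """Absmean ternary quantization: scale by mean |w|, round to {-1, 0, 1}."""
    scale = float(np.abs(w).mean())  # per-tensor scale
    q = np.clip(np.round(w / (scale + 1e-8)), -1, 1).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale

w = np.array([0.9, -0.05, -1.2, 0.4], dtype=np.float32)
q, s = ternary_quantize(w)  # q holds only values from {-1, 0, 1}
```

Because the quantized weights take only three values, the matmul inner loop reduces to additions and subtractions, which is what the AVX2/NEON kernels exploit.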

Quick Start

Prerequisites

  • Python 3.12+ with uv (pip or any other package manager also works)

Install and run

# Install trillim
uv add trillim

# Pull a pre-quantized model
uv run trillim pull Trillim/BitNet-TRNQ

# Chat
uv run trillim chat Trillim/BitNet-TRNQ

Quantize your own model

If you have a HuggingFace BitNet model with safetensors weights:

# Quantize model weights → qmodel.tensors + rope.cache
uv run trillim quantize <path-to-model> --model

# Optionally extract a PEFT LoRA adapter → qmodel.lora
uv run trillim quantize <path-to-model> --adapter <path-to-adapter>

API Server

Trillim includes an OpenAI-compatible API server:

# Start the server
uv run trillim serve <model-dir>

# With voice pipeline (speech-to-text + text-to-speech)
uv run trillim serve <model-dir> --voice

Endpoints:

  • POST /v1/chat/completions — chat completions (streaming supported)
  • POST /v1/completions — text completions
  • GET /v1/models — list loaded models
  • POST /v1/models/load — hot-swap models and LoRA adapters at runtime
  • POST /v1/audio/transcriptions — speech-to-text (with --voice)
  • POST /v1/audio/speech — text-to-speech (with --voice)
  • GET /v1/voices — list available TTS voices
  • POST /v1/voices — register a custom voice from audio (requires accepting pocket-tts' terms on Hugging Face)
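Since the server is OpenAI-compatible, any standard client can talk to it. A sketch using only the standard library, assuming the server is reachable at http://localhost:8000 (an assumption — check trillim serve's output for the actual host and port):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: adjust to your serve host/port

def build_chat_request(messages, model="Trillim/BitNet-TRNQ", stream=False):
    """Build the JSON body for POST /v1/chat/completions."""
    return {"model": model, "messages": messages, "stream": stream}

def chat(messages, **kwargs):
    """Send a chat completion request and return the assistant's reply text."""
    body = json.dumps(build_chat_request(messages, **kwargs)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Usage: chat([{"role": "user", "content": "Hello"}]) once the server is up.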

Python SDK

The server is built on a composable SDK. Each capability (LLM, Whisper, TTS) is a standalone component:

from trillim import Server, LLM, TTS, Whisper

# Inference only
Server(LLM("models/BitNet")).run()

# Inference + voice
Server(LLM("models/BitNet"), Whisper(), TTS()).run()

# TTS only
Server(TTS()).run()

LoRA Adapters

Trillim supports PEFT LoRA adapters as bf16 corrections on top of the ternary base model:

# Ensure qmodel.lora is in the directory 
# (uv run trillim quantize ... will do this)
uv run trillim chat Trillim/BitNet-TRNQ --lora
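Conceptually, the adapter adds a low-rank bf16 correction on top of each ternary matmul: y = scale · (Qx) + B(Ax), where Q is the ternary base weight and A, B are the LoRA factors. A numpy sketch of that composition (illustrative shapes and names, not Trillim's internal layout):

```python
import numpy as np

d_out, d_in, r = 4, 6, 2          # output dim, input dim, LoRA rank
rng = np.random.default_rng(0)

Q = rng.integers(-1, 2, size=(d_out, d_in)).astype(np.int8)  # ternary base weight
scale = 0.5                                                  # per-tensor scale
A = rng.standard_normal((r, d_in)).astype(np.float32)        # LoRA down-projection
B = rng.standard_normal((d_out, r)).astype(np.float32)       # LoRA up-projection
x = rng.standard_normal(d_in).astype(np.float32)

base = scale * (Q.astype(np.float32) @ x)  # ternary matmul, then rescale
corr = B @ (A @ x)                         # low-rank full-precision correction
y = base + corr
```

The correction costs only O(r · (d_in + d_out)) extra work per layer, so the adapter barely affects the ternary kernel's throughput.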

Supported Architectures

  • BitnetForCausalLM — BitNet with ternary weights and ReLU² activation
  • LlamaForCausalLM — Llama-style with SiLU activation

Platform Support

Platform Status
x86_64 (AVX2) Supported
ARM64 (NEON) Supported

Thread count is auto-detected as num_cores - 2. Override with the --threads N CLI flag.
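The default described above amounts to roughly the following (a sketch, not Trillim's actual code):

```python
import os

def default_threads() -> int:
    """Auto-detected thread count: all cores minus two, but at least one."""
    return max(1, (os.cpu_count() or 1) - 2)
```

Leaving two cores free keeps the OS and the tokenizer/serving threads responsive while the SIMD kernels saturate the rest.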

License

The Trillim Python SDK source code is MIT-licensed. The C++ inference engine binaries (inference, trillim-quantize) bundled in the pip package are proprietary — you may use them as part of Trillim but may not reverse-engineer or redistribute them separately. See LICENSE for full terms.



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release. See the tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

trillim-0.1.3-py3-none-win_arm64.whl (1.5 MB)

Uploaded: Python 3, Windows ARM64

trillim-0.1.3-py3-none-win_amd64.whl (1.6 MB)

Uploaded: Python 3, Windows x86-64

trillim-0.1.3-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (388.2 kB)

Uploaded: Python 3, manylinux: glibc 2.17+ x86-64

trillim-0.1.3-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (398.2 kB)

Uploaded: Python 3, manylinux: glibc 2.17+ ARM64

trillim-0.1.3-py3-none-macosx_11_0_x86_64.whl (1.2 MB)

Uploaded: Python 3, macOS 11.0+ x86-64

trillim-0.1.3-py3-none-macosx_11_0_arm64.whl (1.3 MB)

Uploaded: Python 3, macOS 11.0+ ARM64

File details

Details for the file trillim-0.1.3-py3-none-win_arm64.whl.

File metadata

  • Download URL: trillim-0.1.3-py3-none-win_arm64.whl
  • Upload date:
  • Size: 1.5 MB
  • Tags: Python 3, Windows ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.0

File hashes

Hashes for trillim-0.1.3-py3-none-win_arm64.whl
  • SHA256: cf16f3ff2d839a37ec952d505336b119ac848906811ce51fb0e252071b663bc0
  • MD5: b93a91e214a6bb4d47be8976906370dc
  • BLAKE2b-256: 08d409f1ceeda687f2bf21691c2b4565473fbca8b929381f329e87b10d2275d3

See more details on using hashes here.

File details

Details for the file trillim-0.1.3-py3-none-win_amd64.whl.

File metadata

  • Download URL: trillim-0.1.3-py3-none-win_amd64.whl
  • Upload date:
  • Size: 1.6 MB
  • Tags: Python 3, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.0

File hashes

Hashes for trillim-0.1.3-py3-none-win_amd64.whl
  • SHA256: 33e381ea2fb91497c8806a11e6bc8e5eab746fb82a8abee79e13321aa5e4ed23
  • MD5: b22a318c4e70ae668b0064519c41c356
  • BLAKE2b-256: 3f33bff1c7406765014cbe791715e5c66f98e4dd00caf430276b6e666e925e38


File details

Details for the file trillim-0.1.3-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.


File hashes

Hashes for trillim-0.1.3-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
  • SHA256: 7e294b6dafead8510348a76c0b3ee81e4951bc04123f4a1efb9b031c722793bf
  • MD5: b3035ff61e31a06cdff78a6c383182b8
  • BLAKE2b-256: 4e897101dacb973d7ebcdcc19f271607ecf704ebb7400f79938d92c52effdf49


File details

Details for the file trillim-0.1.3-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.


File hashes

Hashes for trillim-0.1.3-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
  • SHA256: 0f2be629f43ccd70a757d5889455166ce1012f4c02c8bf3f1209c625c0d50f5e
  • MD5: 65aebb3bb395c3a5f0fbd052a4bb662e
  • BLAKE2b-256: 40709be30af7a72d8cfc7696933bc990710569bf11a3ffbac72aa7ce0399f9fb


File details

Details for the file trillim-0.1.3-py3-none-macosx_11_0_x86_64.whl.

File metadata

File hashes

Hashes for trillim-0.1.3-py3-none-macosx_11_0_x86_64.whl
  • SHA256: 16e4469ab61b610230f47e1b74856b594962a06d21e8d18070da8d5d24419cb9
  • MD5: 5bed585ee25aed73100a4a7dc209d0b1
  • BLAKE2b-256: d1c73891d45f9128c6110d83ead3e98b0985e6729542ee413cdc5cacc21b88f4


File details

Details for the file trillim-0.1.3-py3-none-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for trillim-0.1.3-py3-none-macosx_11_0_arm64.whl
  • SHA256: 0b2265e040654590e13be2416514bc3bf60a0f2b91e35d2587e3c260b969e458
  • MD5: 15cbc2f1ce0834df48df0ab785b8477e
  • BLAKE2b-256: 7a89a3f4c0a5a750ceed6d420bd7b3ada03f0ff9c0346ea118ca3ff501a57839

