The fastest inference framework to run BitNet models on CPUs
Project description
Trillim
Trillim is the platform for everything local AI. DarkNet is the CPU inference engine powering Trillim.
Install
- Python 3.12+ required
- Linux also requires glibc 2.27+
- uv is the recommended installer
Platform guides:
If you installed with uv, prefix the CLI examples below with `uv run`.
Common Workflows
Pull a Model
```
trillim list
trillim pull Trillim/BitNet-TRNQ
```
Chat in the Terminal
```
trillim chat Trillim/BitNet-TRNQ
```
`trillim chat` keeps multi-turn history, preserves exact token continuity for prior turns, and reuses the KV cache whenever the next turn can safely append to that exact prompt state. Use `/new` to reset the conversation or `q` to quit.
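The reuse condition can be sketched as a token-prefix check. The snippet below is a minimal illustration of the idea, not Trillim's actual implementation; the function names and the list-of-ints token representation are assumptions:

```python
def shared_prefix_len(cached: list[int], new: list[int]) -> int:
    # Number of leading tokens the cached prompt and the new prompt agree on.
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n

def can_append_to_cache(cached: list[int], new: list[int]) -> bool:
    # The KV cache is reusable as-is only when the new prompt extends the
    # cached prompt token-for-token; any divergence means the differing
    # suffix must be recomputed (or the cache dropped).
    return len(new) >= len(cached) and shared_prefix_len(cached, new) == len(cached)
```

Editing an earlier turn breaks the prefix, so only strictly appending turns get the cheap path.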
Search-Augmented Chat
Use the search harness with a search-tuned model:

```
trillim chat Trillim/BitNet-Search-TRNQ --harness search
```

DuckDuckGo (`ddgs`) is the default provider. To use Brave:

```
export SEARCH_API_KEY=<your_api_key>
trillim chat Trillim/BitNet-Search-TRNQ --harness search --search-provider brave
```
Serve an OpenAI-Compatible API
Start the server:
```
trillim serve Trillim/BitNet-TRNQ
```
Main endpoints:
- `POST /v1/chat/completions`
- `POST /v1/completions`
- `GET /v1/models`
- `POST /v1/models/load`
Example with the OpenAI Python client:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
response = client.chat.completions.create(
    model="BitNet-TRNQ",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
To switch a running server to the search harness, call `POST /v1/models/load` with `"harness": "search"` and an optional `"search_provider"` of `"ddgs"` or `"brave"`.
Quantize a Model or Adapter
If you have a Hugging Face model with safetensors weights (currently only BitNet models are supported):
```
# Quantize model weights -> qmodel.tensors + rope.cache
trillim quantize <path-to-model> --model

# Extract a PEFT LoRA adapter -> qmodel.lora
trillim quantize <path-to-model> --adapter <path-to-adapter>
```
Use a LoRA Adapter
```
# Quantize a PEFT adapter into Trillim's format
trillim quantize <path-to-base-model> --adapter <path-to-adapter>

# Run the base model with the adapter
trillim chat Trillim/BitNet-TRNQ --lora <adapter-dir>

# Or pull a pre-quantized adapter and use it by ID
trillim pull Trillim/BitNet-GenZ-LoRA-TRNQ
trillim chat Trillim/BitNet-TRNQ --lora Trillim/BitNet-GenZ-LoRA-TRNQ
```
The same adapter settings can be changed at runtime through `POST /v1/models/load`.
Runtime Quantization
Runtime quantization reduces memory use for selected layers during inference:
- `--lora-quant <type>` for LoRA layers: `none`, `bf16`, `int8`, `q4_0`, `q5_0`, `q6_k`, `q8_0`
- `--unembed-quant <type>` for the unembedding layer: `int8`, `q4_0`, `q5_0`, `q6_k`, `q8_0`
```
trillim chat Trillim/BitNet-TRNQ --lora <adapter-dir> --lora-quant int8
trillim chat Trillim/BitNet-TRNQ --unembed-quant q4_0
trillim serve Trillim/BitNet-TRNQ --lora-quant q8_0 --unembed-quant q4_0
```
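For intuition about what these modes trade off, here is a toy symmetric int8 round-trip. This is an illustration of the general idea only; it is not the codec Trillim or the `q4_0`/`q5_0`/`q6_k` formats actually use:

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    # Symmetric per-tensor int8: one scale maps the largest |w| to 127.
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q: list[int], scale: float) -> list[float]:
    # Recover approximate weights; each value now costs 1 byte instead of 2-4.
    return [v * scale for v in q]
```

Lower-bit formats like `q4_0` shrink memory further at the cost of larger rounding error, which is why they are offered per layer rather than globally.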
Voice Support
Install the optional voice extra before using speech endpoints:

```
uv add "trillim[voice]"
```

Or with pip:

```
pip install "trillim[voice]"
```

Then start the server with:

```
trillim serve Trillim/BitNet-TRNQ --voice
```
Voice endpoints:
- `POST /v1/audio/transcriptions`
- `POST /v1/audio/speech`
- `GET /v1/voices`
- `POST /v1/voices`
Predefined voices are `alba`, `marius`, `javert`, `jean`, `fantine`, `cosette`, `eponine`, and `azelma`.
For custom voice registration through `POST /v1/voices`, accept the terms for `kyutai/pocket-tts`, create a Hugging Face token with Read access, and run:

```
hf auth login
```
Custom voice uploads through POST /v1/voices are limited to 8 MB per file.
That setup is only required once. Predefined voices work without it.
Performance Highlights
Benchmark takeaways for DarkNet on consumer CPUs:
- Prefill throughput improvements are most visible when `num_threads >= 4`.
- Decode throughput is broadly comparable to bitnet.cpp on average, while DarkNet reaches higher peaks.
- Results are directional and depend on thermal behavior, boost policy, and memory bandwidth.
Supported Architectures
- `BitnetForCausalLM` for ternary BitNet models with ReLU² activation
- `LlamaForCausalLM` for Llama-style models with SiLU activation
Platform Support
| Platform | Status |
|---|---|
| x86_64 (AVX2) | Supported |
| ARM64 (NEON) | Supported |
Thread count defaults to `num_cores - 2`. Override it with `--threads N`.
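The default can be reproduced in a few lines; flooring at one thread on very small machines is an assumption here, not documented behavior:

```python
import os

def default_threads() -> int:
    # num_cores - 2, never below 1 (the floor is an assumption for 1-2 core hosts)
    return max(1, (os.cpu_count() or 1) - 2)
```

Leaving two cores free keeps the OS and other processes responsive while the engine saturates the rest.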
Documentation
- What Is Trillim?
- Install: macOS, Linux, Windows
- CLI Reference
- Interactive Chat
- Python Components
- API Server
- Benchmarks
License
The Trillim Python SDK source code is MIT-licensed. The C++ inference engine binaries (`inference`, `trillim-quantize`) bundled in the pip package are proprietary. You may use them as part of Trillim, but may not reverse-engineer or redistribute them separately. See LICENSE for the full terms.
Project details
Download files
Download the file for your platform.
Source Distributions
Built Distributions
File details
Details for the file trillim-0.6.0-py3-none-win_arm64.whl.
File metadata
- Download URL: trillim-0.6.0-py3-none-win_arm64.whl
- Upload date:
- Size: 2.4 MB
- Tags: Python 3, Windows ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `875422246a94bb4401799eb0072432efb87147f5bb05533eec5400f8db330a0b` |
| MD5 | `88ca70269e9f60eaa5353319b830fad3` |
| BLAKE2b-256 | `75a7c66879789260d9506e9b15e727cebcf2a13597ba02880f7f4335f594a3a3` |
File details
Details for the file trillim-0.6.0-py3-none-win_amd64.whl.
File metadata
- Download URL: trillim-0.6.0-py3-none-win_amd64.whl
- Upload date:
- Size: 2.5 MB
- Tags: Python 3, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `4b3e956ee6f1c935a87e76d4c62abc6b70ffcedb1d11252bba34a13c4e8169cc` |
| MD5 | `34765d21c991de16c4a5f319a1fd08da` |
| BLAKE2b-256 | `c888ae5c46c4b1807fc2578dcac62511f154537dd90c237ebf6c656c13aca1b8` |
File details
Details for the file trillim-0.6.0-py3-none-manylinux_2_27_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: trillim-0.6.0-py3-none-manylinux_2_27_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 12.5 MB
- Tags: Python 3, manylinux: glibc 2.27+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `2eff2fd04fd004699d793278dcdeb6171927bb48047fdf7a7c163bce5557a28f` |
| MD5 | `2f9a34cffa4feac49b7aa5e0607338b0` |
| BLAKE2b-256 | `7d842499986f0f090cb0a418a192b1e2e8f348fe411cb2618db0611a4ba3750a` |
File details
Details for the file trillim-0.6.0-py3-none-manylinux_2_27_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: trillim-0.6.0-py3-none-manylinux_2_27_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 12.8 MB
- Tags: Python 3, manylinux: glibc 2.27+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `ddca33a97d2eee1e54847cce906d047ab2717edd8b2c86e2738b985ddaf8162d` |
| MD5 | `3de5c4d2b41e018551085a536c736a07` |
| BLAKE2b-256 | `cb0a0d96cfcf6363ef90475c51eb70a63a58ce342903509a786337ef604ccdcb` |
File details
Details for the file trillim-0.6.0-py3-none-macosx_11_0_x86_64.whl.
File metadata
- Download URL: trillim-0.6.0-py3-none-macosx_11_0_x86_64.whl
- Upload date:
- Size: 2.2 MB
- Tags: Python 3, macOS 11.0+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `b6064a381aa7b627d72c606c0034bec0c3f083a35af512d61c8b8bb0dcfd0e24` |
| MD5 | `c75064cba73d25e7d6ffef5ec6e597c1` |
| BLAKE2b-256 | `396f8b9346b71eb37191703b733ac9684e38b52f2781e27298efe5f96340a91d` |
File details
Details for the file trillim-0.6.0-py3-none-macosx_11_0_arm64.whl.
File metadata
- Download URL: trillim-0.6.0-py3-none-macosx_11_0_arm64.whl
- Upload date:
- Size: 2.3 MB
- Tags: Python 3, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `2ba505307a9f08d5f0b5810ade9373f2050e4a860595b3a7470eeb83f1be9989` |
| MD5 | `c8c4ac35f8f6fe4a89818f0ffb118300` |
| BLAKE2b-256 | `0c47ad4b97797f5b30376b822b62c9f72d40779ff8d72822fa488bad99e0c704` |