The fastest inference framework to run BitNet models on CPUs

Trillim

Trillim is the platform for everything local AI. DarkNet is the CPU inference engine powering Trillim.

Install

  • Python 3.12+ required
  • Linux also requires glibc 2.27+
  • uv is the recommended installer

If you installed with uv, prefix the CLI examples below with uv run.

Common Workflows

Pull a Model

trillim list
trillim pull Trillim/BitNet-TRNQ

Chat in the Terminal

trillim chat Trillim/BitNet-TRNQ

trillim chat keeps multi-turn history and reuses the KV cache whenever the next turn simply appends to the exact token sequence of the cached prompt; any edit to earlier turns invalidates the cache from that point. Use /new to reset the conversation or q to quit.
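The cache-reuse rule can be pictured as a prefix check (an illustrative sketch, not Trillim's actual implementation): cached KV entries stay valid only up to the longest exact token prefix shared with the new prompt.

```python
def reusable_prefix_len(cached_tokens, new_tokens):
    """Return how many cached KV entries remain valid: the length of the
    longest exact prefix shared by the cached tokens and the new prompt."""
    n = 0
    for c, t in zip(cached_tokens, new_tokens):
        if c != t:
            break
        n += 1
    return n

# A new turn that appends to the old prompt reuses the whole cache:
assert reusable_prefix_len([1, 5, 9, 2], [1, 5, 9, 2, 7, 8]) == 4
# Editing an earlier turn invalidates the cache from the first mismatch:
assert reusable_prefix_len([1, 5, 9, 2], [1, 5, 3, 2, 7]) == 2
```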

Search-Augmented Chat

Use the search harness with a search-tuned model:

trillim chat Trillim/BitNet-Search-TRNQ --harness search

DuckDuckGo (ddgs) is the default provider. To use Brave:

export SEARCH_API_KEY=<your_api_key>
trillim chat Trillim/BitNet-Search-TRNQ --harness search --search-provider brave
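The provider rules above can be sketched as a small resolver (illustrative only; resolve_provider is not part of the Trillim CLI): ddgs needs no key, while brave requires SEARCH_API_KEY.

```python
import os

def resolve_provider(requested="ddgs"):
    """Resolve the search provider: ddgs is the default and needs no key;
    brave requires SEARCH_API_KEY to be set in the environment."""
    key = os.environ.get("SEARCH_API_KEY")
    if requested == "brave":
        if not key:
            raise RuntimeError("brave search requires SEARCH_API_KEY")
        return "brave", key
    return "ddgs", None

os.environ["SEARCH_API_KEY"] = "demo-key"  # placeholder key for illustration
assert resolve_provider() == ("ddgs", None)
assert resolve_provider("brave") == ("brave", "demo-key")
```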

Serve an OpenAI-Compatible API

Start the server:

trillim serve Trillim/BitNet-TRNQ

Main endpoints:

  • POST /v1/chat/completions
  • POST /v1/completions
  • GET /v1/models
  • POST /v1/models/load

Example with the OpenAI Python client:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
response = client.chat.completions.create(
    model="BitNet-TRNQ",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

To switch a running server to the search harness, call POST /v1/models/load with "harness": "search" and optional "search_provider": "ddgs" | "brave".
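That harness switch can be composed with the standard library alone. A sketch, assuming the default localhost:8000 server address; the request is built but not sent here:

```python
import json
import urllib.request

# Build a POST /v1/models/load request that switches to the search harness.
payload = {
    "model": "Trillim/BitNet-Search-TRNQ",
    "harness": "search",
    "search_provider": "brave",  # or "ddgs"
}
req = urllib.request.Request(
    "http://localhost:8000/v1/models/load",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would send it to a running server.
assert req.get_method() == "POST"
assert json.loads(req.data)["harness"] == "search"
```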

Quantize a Model or Adapter

If you have a HuggingFace model with safetensors weights (currently only BitNet models are supported):

# Quantize model weights -> qmodel.tensors + rope.cache
trillim quantize <path-to-model> --model

# Extract a PEFT LoRA adapter -> qmodel.lora
trillim quantize <path-to-model> --adapter <path-to-adapter>
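For intuition, BitNet-style ternary quantization maps each weight to {-1, 0, +1} with a per-tensor scale. The absmean scheme below is a generic illustration of that idea, not necessarily the exact TRNQ on-disk format:

```python
def ternary_quantize(weights, eps=1e-8):
    """Quantize weights to {-1, 0, +1} using an absmean scale, in the style
    of BitNet b1.58 ternarization. Returns (scale, quantized_values)."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return scale, q

# Large-magnitude weights saturate to +/-1; small ones collapse to 0.
scale, q = ternary_quantize([0.9, -0.05, 0.4, -1.1])
assert q == [1, 0, 1, -1]
```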

Use a LoRA Adapter

# Quantize a PEFT adapter into Trillim's format
trillim quantize <path-to-base-model> --adapter <path-to-adapter>

# Run the base model with the adapter
trillim chat Trillim/BitNet-TRNQ --lora <adapter-dir>

# Or pull a pre-quantized adapter and use it by ID
trillim pull Trillim/BitNet-GenZ-LoRA-TRNQ
trillim chat Trillim/BitNet-TRNQ --lora Trillim/BitNet-GenZ-LoRA-TRNQ

The same adapter settings can be changed at runtime through POST /v1/models/load.

Runtime Quantization

Runtime quantization reduces memory use for selected layers during inference:

  • --lora-quant <type> for LoRA layers: none, bf16, int8, q4_0, q5_0, q6_k, q8_0
  • --unembed-quant <type> for the unembedding layer: int8, q4_0, q5_0, q6_k, q8_0

trillim chat Trillim/BitNet-TRNQ --lora <adapter-dir> --lora-quant int8
trillim chat Trillim/BitNet-TRNQ --unembed-quant q4_0
trillim serve Trillim/BitNet-TRNQ --lora-quant q8_0 --unembed-quant q4_0
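To see what runtime quantization trades off, here is a generic symmetric int8 quantize/dequantize round trip. This illustrates the idea only; it is not Trillim's kernel code:

```python
def quantize_int8(values):
    """Symmetric int8 quantization: one scale per tensor, codes in [-127, 127]."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    return scale, [round(v / scale) for v in values]

def dequantize_int8(scale, q):
    """Recover approximate float values from int8 codes and the scale."""
    return [scale * v for v in q]

scale, q = quantize_int8([0.6, -1.0, 0.25])
restored = dequantize_int8(scale, q)
# All codes fit in int8, and reconstruction error stays within one step.
assert all(-127 <= v <= 127 for v in q)
assert all(abs(a - b) <= scale for a, b in zip([0.6, -1.0, 0.25], restored))
```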

Voice Support

Install the optional voice extra before using speech endpoints:

uv add "trillim[voice]"

Or with pip:

pip install "trillim[voice]"

Then start the server with:

trillim serve Trillim/BitNet-TRNQ --voice

Voice endpoints:

  • POST /v1/audio/transcriptions
  • POST /v1/audio/speech
  • GET /v1/voices
  • POST /v1/voices

Predefined voices are alba, marius, javert, jean, fantine, cosette, eponine, and azelma.

For custom voice registration through POST /v1/voices, accept the terms for kyutai/pocket-tts, create a HuggingFace token with Read access, and run:

hf auth login

Custom voice uploads through POST /v1/voices are limited to 8 MB per file.

That setup is only required once. Predefined voices work without it.
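Since POST /v1/voices rejects uploads over 8 MB, a client can pre-check file sizes locally. An illustrative helper, not part of the SDK:

```python
import os
import tempfile

MAX_VOICE_BYTES = 8 * 1024 * 1024  # 8 MB upload limit for POST /v1/voices

def voice_file_ok(path):
    """Return True if the file fits under the 8 MB voice-upload limit."""
    return os.path.getsize(path) <= MAX_VOICE_BYTES

# A small dummy audio file passes the check.
with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as f:
    f.write(b"\x00" * 1024)
    small = f.name
assert voice_file_ok(small)
os.remove(small)
```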

Performance Highlights

Benchmark takeaways for DarkNet on consumer CPUs:

  • Prefill throughput improvements are most visible when num_threads >= 4.
  • Decode throughput is broadly comparable to bitnet.cpp on average, while DarkNet reaches higher peaks.
  • Results are directional and depend on thermal behavior, boost policy, and memory bandwidth.
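Throughput figures like these reduce to tokens divided by wall-clock time. A minimal, generic helper (not the project's benchmark harness); in practice the elapsed time would come from time.perf_counter():

```python
def tokens_per_second(n_tokens, elapsed_seconds):
    """Throughput: generated (or prefilled) tokens per wall-clock second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return n_tokens / elapsed_seconds

# 128 tokens decoded in 2 seconds -> 64 tok/s
assert tokens_per_second(128, 2.0) == 64.0
```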

Prefill example:

[Prefill benchmark chart]

Decode example:

[Decode benchmark chart]

Supported Architectures

  • BitnetForCausalLM for ternary BitNet models with ReLU² activation
  • LlamaForCausalLM for Llama-style models with SiLU activation

Platform Support

Platform       Status
x86_64 (AVX2)  Supported
ARM64 (NEON)   Supported

Thread count defaults to num_cores - 2. Override it with --threads N.
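The documented default can be sketched as follows (illustrative; clamping to a minimum of 1 thread is an assumption, and the actual CLI logic may differ):

```python
import os

def default_threads(cores=None):
    """Default worker threads: num_cores - 2, never fewer than 1.
    Overridable, as with the CLI's --threads N flag."""
    cores = cores if cores is not None else (os.cpu_count() or 1)
    return max(1, cores - 2)

assert default_threads(8) == 6
assert default_threads(2) == 1  # clamped on small machines
```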

License

The Trillim Python SDK source code is MIT-licensed. The C++ inference engine binaries (inference, trillim-quantize) bundled in the pip package are proprietary. You may use them as part of Trillim, but may not reverse-engineer or redistribute them separately. See LICENSE for the full terms.
