Fast local LLM inference with TTT (Test-Time Training) and LoRA — the model that learns while it runs

Project description

🧠 Bit-TTT-Engine


Fast local LLM inference that learns while it runs.

  • 🏎️ 47+ tok/s on RTX 4060 Ti (7B Q4_K_M)
  • 🧠 TTT (Test-Time Training) — adapts during inference
  • 🎨 LoRA — fine-tune with one flag
  • 📦 5 models — Llama-2/3, Gemma-2, Qwen2.5, Mistral
  • 🔌 OpenAI-compatible API — drop-in replacement

🚀 Quick Start

pip install bit-ttt-engine

import cortex_rust

# Load any GGUF model (auto-downloads from HuggingFace!)
model = cortex_rust.load("user/model-GGUF")

# Chat
response = model.chat([
    {"role": "user", "content": "Hello!"}
])
print(response)

# Stream
for token in model.chat_stream([
    {"role": "user", "content": "Tell me a story"}
]):
    print(token, end="", flush=True)

🖥️ CLI

# Interactive chat
bit-ttt chat model.gguf

# Generate text
bit-ttt generate model.gguf -p "Once upon a time"

# OpenAI-compatible API server
bit-ttt serve model.gguf --port 8000

# With LoRA + Q8 KV cache
bit-ttt chat model.gguf --lora adapter.bin --q8-cache
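
If you drive the engine from Python instead of the CLI, it is natural to expect matching loader options. The sketch below is an assumption, not documented API: the keyword names simply mirror the CLI flags and may differ in the real package.

# HYPOTHETICAL: keyword names assumed from the CLI flags --lora / --q8-cache;
# check the package documentation for the actual Python API.
import cortex_rust

model = cortex_rust.load("model.gguf", lora="adapter.bin", q8_cache=True)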

🧠 TTT — Test-Time Training

The model learns while it generates: instead of keeping all weights frozen, TTT updates part of the network during inference, so it adapts to the conversation as it runs.

model = cortex_rust.load("model.gguf")
model.enable_ttt(True)

# Each conversation makes the model smarter
response = model.chat([{"role": "user", "content": "My name is Alice"}])
# Next time, it remembers context better!
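
Conceptually, a TTT layer keeps small "fast weights" that take a self-supervised gradient step on each incoming token, so the layer adapts to the sequence as it is processed. The minimal PyTorch sketch below illustrates that mechanism only; it is not Bit-TTT-Engine's actual implementation.

import torch

dim = 16
lr = 0.1
w = torch.zeros(dim, dim, requires_grad=True)  # per-sequence fast weights

def ttt_step(x, w):
    # Inner-loop self-supervised objective: reconstruct x through the fast weights.
    pred = x @ w
    loss = ((pred - x) ** 2).mean()
    (grad,) = torch.autograd.grad(loss, w)
    # One gradient step: the layer "learns" from this token at test time.
    w = (w - lr * grad).detach().requires_grad_(True)
    return x @ w, w

tokens = torch.randn(8, dim)  # stand-in for a stream of token embeddings
for t in tokens:
    y, w = ttt_step(t.unsqueeze(0), w)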

⚡ Performance

Model                  Speed       VRAM
Llama-2 7B (Q4_K_M)    47.8 tok/s  ~5 GB
Llama-3 8B (Q4_K_M)    36.8 tok/s  ~6 GB
Mistral 7B (Q4_K_M)    40.8 tok/s  ~5 GB
Qwen2.5 1.5B (Q4_K_M)  70.4 tok/s  ~2 GB

With --q8-cache: 82% VRAM reduction for KV cache.

🔌 OpenAI-Compatible API

bit-ttt serve model.gguf --port 8000

from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
response = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Hi!"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
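
Any plain HTTP client works as well, since the wire format follows the OpenAI spec. A minimal sketch with requests; the response shape assumes full compatibility with the OpenAI chat completions schema.

import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # standard OpenAI route
    json={
        "model": "default",
        "messages": [{"role": "user", "content": "Hi!"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])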


💖 License

MIT License



Download files

Download the file for your platform.

Source Distribution

bit_ttt_engine-0.7.0.tar.gz (414.9 kB)

Uploaded: Source

Built Distribution


bit_ttt_engine-0.7.0-cp310-cp310-win_amd64.whl (5.4 MB)

Uploaded: CPython 3.10, Windows x86-64

File details

Details for the file bit_ttt_engine-0.7.0.tar.gz.

File metadata

  • Download URL: bit_ttt_engine-0.7.0.tar.gz
  • Upload date:
  • Size: 414.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.11.3

File hashes

Hashes for bit_ttt_engine-0.7.0.tar.gz:

  • SHA256: 3a9d49dab0b32130ad39fd7e0b9ad1ae8567e356a277143521e14f105a32c2f0
  • MD5: 50953c7d0f21198adf8f3a2d9552673f
  • BLAKE2b-256: d1e3002078d4a4229205893cbd3accb858a759ae73db567e1aa7eed8dd7291b7

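To check a download against these digests, here is a sketch using only the standard library (file name and expected hash taken from above):

import hashlib

def sha256sum(path, chunk_size=1 << 20):
    """Stream the file so large archives are not loaded into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

expected = "3a9d49dab0b32130ad39fd7e0b9ad1ae8567e356a277143521e14f105a32c2f0"
assert sha256sum("bit_ttt_engine-0.7.0.tar.gz") == expected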

File details

Details for the file bit_ttt_engine-0.7.0-cp310-cp310-win_amd64.whl.

File hashes

Hashes for bit_ttt_engine-0.7.0-cp310-cp310-win_amd64.whl:

  • SHA256: 0a1724d04e58774427fb07df7573e1ecaff119c257349543128f1112d10c2795
  • MD5: 02c2d5ff1f6bfaa9a253260289668b38
  • BLAKE2b-256: 66b29733d670f2660713cecee8a6e3cad88b9041a67b417e5da811c24aafead0

