# 🧠 Bit-TTT-Engine
Fast local LLM inference that learns while it runs.
- 🏎️ 47+ tok/s on RTX 4060 Ti (7B Q4_K_M)
- 🧠 TTT (Test-Time Training) — the model adapts while it runs inference
- 🎨 LoRA — fine-tune with one flag
- 📦 5 models — Llama-2/3, Gemma-2, Qwen2.5, Mistral
- 🔌 OpenAI-compatible API — drop-in replacement
## 🚀 Quick Start

```bash
pip install bit-ttt-engine
```

```python
import cortex_rust

# Load any GGUF model (auto-downloads from HuggingFace!)
model = cortex_rust.load("user/model-GGUF")

# Chat
response = model.chat([
    {"role": "user", "content": "Hello!"}
])
print(response)

# Stream
for token in model.chat_stream([
    {"role": "user", "content": "Tell me a story"}
]):
    print(token, end="", flush=True)
```
## 🖥️ CLI

```bash
# Interactive chat
bit-ttt chat model.gguf

# Generate text
bit-ttt generate model.gguf -p "Once upon a time"

# OpenAI-compatible API server
bit-ttt serve model.gguf --port 8000

# With LoRA + Q8 KV cache
bit-ttt chat model.gguf --lora adapter.bin --q8-cache
```
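What `--lora adapter.bin` does conceptually: a LoRA adapter ships two small low-rank matrices per target layer, and the effective weight becomes `W + (alpha / r) * B @ A`. Here is a minimal NumPy sketch of that merge; the shapes and names are illustrative, not the engine's internal API or the actual layout of `adapter.bin`:

```python
import numpy as np

# LoRA merge sketch: W' = W + (alpha / r) * B @ A.
# d_out, d_in, r, alpha are illustrative values, not engine defaults.
d_out, d_in, r, alpha = 4096, 4096, 16, 32

W = np.random.randn(d_out, d_in).astype(np.float32) / np.sqrt(d_in)  # frozen base weight
A = np.random.randn(r, d_in).astype(np.float32)                      # low-rank "down" matrix
B = np.zeros((d_out, r), dtype=np.float32)                           # low-rank "up" matrix (zero-init)

W_merged = W + (alpha / r) * (B @ A)  # the weight the model actually uses
```

Because `r` is tiny relative to the weight dimensions, an adapter adds very little VRAM or load time, which is why it can be toggled with a single flag.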
## 🧠 TTT — Test-Time Training

The model learns while it generates: instead of staying frozen after training, it updates lightweight internal state as tokens stream in — something mainstream local inference engines don't offer.

```python
model = cortex_rust.load("model.gguf")
model.enable_ttt(True)

# Each conversation makes the model smarter
response = model.chat([{"role": "user", "content": "My name is Alice"}])
# Next time, it remembers context better!
```
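For intuition, test-time training can be pictured as a small "fast weight" that takes gradient steps on a self-supervised loss while the frozen base model generates. A toy sketch of that idea follows; it is illustrative only, not bit-ttt-engine's actual update rule:

```python
import numpy as np

# Toy TTT loop: a linear fast weight W learns online to predict the next
# hidden state from the current one, so recent context keeps reshaping
# the model's behavior during inference.
rng = np.random.default_rng(0)
d, lr = 64, 1e-2
W = np.zeros((d, d), dtype=np.float32)  # fast weight, starts neutral

hidden_states = rng.standard_normal((128, d)).astype(np.float32)  # stand-in for token states

for h_t, h_next in zip(hidden_states, hidden_states[1:]):
    err = W @ h_t - h_next        # self-supervised error: predict the next state
    W -= lr * np.outer(err, h_t)  # one SGD step per generated token
```

The base weights stay frozen; only the small fast state changes, which keeps per-token adaptation cheap.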
## ⚡ Performance

Measured on an RTX 4060 Ti:

| Model | Speed | VRAM |
|---|---|---|
| Llama-2 7B (Q4_K_M) | 47.8 tok/s | ~5 GB |
| Llama-3 8B (Q4_K_M) | 36.8 tok/s | ~6 GB |
| Mistral 7B (Q4_K_M) | 40.8 tok/s | ~5 GB |
| Qwen2.5 1.5B (Q4_K_M) | 70.4 tok/s | ~2 GB |
With `--q8-cache`: 82% VRAM reduction for the KV cache.
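For scale, here is a back-of-the-envelope KV-cache calculation. It assumes a Llama-2-7B-like geometry (32 layers, 4096 hidden size, 4k context) and one fp32 scale per 64-element block; these numbers are assumptions for illustration, and the 82% figure above is the project's own measurement against its default cache layout:

```python
# KV-cache sizing sketch (illustrative geometry, not measured from bit-ttt-engine).
n_layers, hidden, ctx = 32, 4096, 4096  # Llama-2-7B-like model, 4k context

def kv_bytes(bytes_per_elem: float) -> float:
    # K and V each store one hidden-size vector per layer per cached token.
    return 2 * n_layers * hidden * ctx * bytes_per_elem

fp16 = kv_bytes(2.0)             # 16-bit cache
q8 = kv_bytes(1.0 + 4.0 / 64.0)  # 8-bit values + one fp32 scale per 64 elements (assumed)

print(f"fp16: {fp16 / 2**30:.2f} GiB, q8: {q8 / 2**30:.2f} GiB "
      f"({1 - q8 / fp16:.0%} smaller)")  # ~2.00 GiB vs ~1.06 GiB at this geometry
```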
## 🔌 OpenAI-Compatible API

```bash
bit-ttt serve model.gguf --port 8000
```

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
response = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Hi!"}],
    stream=True,
)
# Consume the stream as tokens arrive.
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```
## 💖 License
MIT License
Download files
Download the file for your platform.
Source Distribution
bit_ttt_engine-0.7.0.tar.gz (414.9 kB)
Built Distribution
bit_ttt_engine-0.7.0-cp310-cp310-win_amd64.whl (5.4 MB)
File details
Details for the file bit_ttt_engine-0.7.0.tar.gz.
File metadata
- Download URL: bit_ttt_engine-0.7.0.tar.gz
- Upload date:
- Size: 414.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.11.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3a9d49dab0b32130ad39fd7e0b9ad1ae8567e356a277143521e14f105a32c2f0 |
| MD5 | 50953c7d0f21198adf8f3a2d9552673f |
| BLAKE2b-256 | d1e3002078d4a4229205893cbd3accb858a759ae73db567e1aa7eed8dd7291b7 |
File details
Details for the file bit_ttt_engine-0.7.0-cp310-cp310-win_amd64.whl.
File metadata
- Download URL: bit_ttt_engine-0.7.0-cp310-cp310-win_amd64.whl
- Upload date:
- Size: 5.4 MB
- Tags: CPython 3.10, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.11.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0a1724d04e58774427fb07df7573e1ecaff119c257349543128f1112d10c2795 |
| MD5 | 02c2d5ff1f6bfaa9a253260289668b38 |
| BLAKE2b-256 | 66b29733d670f2660713cecee8a6e3cad88b9041a67b417e5da811c24aafead0 |