TurboQuant-powered agentic AI framework for long-context LLMs on consumer hardware
TurboAgent
TurboAgent is a pip-installable Python package that brings Google Research's TurboQuant KV-cache compression to open-source LLMs for local agentic AI on consumer hardware. It delivers 6x+ KV-cache memory reduction and up to 8x attention speedup with no measurable accuracy loss.
Features
- One-line agent creation with 6x+ KV compression -- 32k-1M+ effective context on a single RTX 4090
- Hardware-aware auto-tuning -- detects CUDA/ROCm/Metal/CPU and selects optimal configuration
- Agentic-first primitives -- persistent multi-turn memory, RAG with vector-search, multi-agent swarms
- Multiple backends -- llama.cpp (consumer GPUs), vLLM (server throughput), PyTorch (research)
- Zero-calibration, training-free -- as described in the TurboQuant paper
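The hardware detection mentioned in the feature list can be approximated with PyTorch's own capability checks. The sketch below is not TurboAgent's implementation: `detect_backend` is a hypothetical helper, and the labels it returns are assumptions chosen to mirror the CUDA/ROCm/Metal/CPU split named above.

```python
def detect_backend() -> str:
    """Return a coarse accelerator label: 'cuda', 'rocm', 'metal', or 'cpu'.

    Hypothetical helper for illustration; not part of the turboagent API.
    """
    try:
        import torch  # listed under Requirements
    except ImportError:
        return "cpu"  # no PyTorch available: fall back to CPU

    if torch.cuda.is_available():
        # ROCm builds of PyTorch expose devices through the CUDA API
        # and set torch.version.hip to a version string.
        return "rocm" if getattr(torch.version, "hip", None) else "cuda"

    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "metal"  # Apple Silicon GPU via Metal Performance Shaders

    return "cpu"


print(detect_backend())
```

For the configuration the package actually selects, `turboagent info` remains the authoritative source.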
Quick Start
pip install "turboagent-ai[llama]"
from turboagent import TurboAgent
agent = TurboAgent(
    "meta-llama/Llama-3.1-70B-Instruct",
    kv_mode="turbo3",
    context=131072,
)
response = agent.run("Analyze my 50k-token research doc and suggest experiments...")
print(response)  # KV usage <4 GB total
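The "<4 GB" figure in the comment above can be sanity-checked with a back-of-the-envelope estimate. The sketch below assumes the published Llama 3.1 70B shape (80 layers, 8 KV heads, head dimension 128) and that the 3.25 bits per value of `turbo3` already includes any per-block metadata. `kv_cache_bytes` is a hypothetical helper, not a TurboAgent function.

```python
def kv_cache_bytes(tokens, bits_per_value, layers=80, kv_heads=8, head_dim=128):
    """Estimate KV-cache size in bytes for a grouped-query-attention model.

    Defaults are the assumed Llama 3.1 70B shape; adjust for other models.
    """
    values_per_token = 2 * layers * kv_heads * head_dim  # keys + values
    return tokens * values_per_token * bits_per_value / 8


GB = 1e9
for label, bpv in [("16-bit", 16.0), ("turbo3", 3.25), ("turbo4", 4.25)]:
    size_gb = kv_cache_bytes(50_000, bpv) / GB
    print(f"{label:>7}: {size_gb:5.2f} GB for a 50k-token prompt")
```

Under these assumptions a 50k-token prompt needs roughly 16.4 GB of cache at 16 bits and roughly 3.3 GB with `turbo3`. Filling the whole 131072-token window would scale the `turbo3` figure to about 8.7 GB.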
Installation
# Core + llama.cpp backend (recommended for consumer GPUs)
pip install "turboagent-ai[llama]"
# With vLLM for server-style throughput
pip install "turboagent-ai[vllm]"
# With HuggingFace Transformers for research
pip install "turboagent-ai[torch]"
# With native TurboQuant C++/CUDA kernels (recommended for best performance)
pip install "turboagent-ai[native]"
# Development
pip install "turboagent-ai[dev]"
CLI
# Scaffold a new agent project
turboagent init my_agent
# Detect hardware and show optimal configuration
turboagent info
# Run benchmarks
turboagent benchmark --model-size 70
Multi-Agent Swarms
from turboagent.agents.swarm import TurboSwarm, SwarmAgent
swarm = TurboSwarm(
    "meta-llama/Llama-3.1-70B-Instruct",
    agents=[
        SwarmAgent(name="researcher", role="deep research"),
        SwarmAgent(name="critic", role="critical review"),
        SwarmAgent(name="writer", role="clear writing"),
    ],
)
results = swarm.run("Analyze the latest advances in KV cache compression.")
RAG with TurboVectorStore
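from turboagent.agents.rag import TurboVectorStore
# chunks: your text passages; embeddings: one 768-dim vector per passage,
# produced by an embedding model of your choice (neither is defined here)
store = TurboVectorStore(embedding_dim=768)
store.add_documents(texts=chunks, embeddings=embeddings)
# query_embedding: a 768-dim vector for the query, from the same embedding model
results = store.query(query_embedding, top_k=5)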
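To make the `top_k` query above concrete, here is a minimal brute-force search in plain Python. It assumes the store ranks documents by cosine similarity, which this page does not state, and it makes no attempt to model how `TurboVectorStore` stores or compresses vectors. `cosine` and `top_k` are hypothetical helpers.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, embeddings, k=5):
    """Return (index, score) pairs for the k most similar embeddings."""
    scored = [(i, cosine(query, e)) for i, e in enumerate(embeddings)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors stand in for real 768-dimensional embeddings.
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
print(top_k([1.0, 0.1, 0.0], docs, k=2))
```

With these toy vectors the query is closest to document 0, then document 2.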
Architecture
turboagent/
├── quant/ # TurboQuantKVCache (PolarQuant + QJL)
├── backends/ # llama.cpp, vLLM, PyTorch engines
├── agents/ # TurboAgent, TurboVectorStore, TurboSwarm
├── hardware/ # Auto-detection and optimal config
├── cli.py # Project scaffolding and benchmarks
└── utils.py # Shared helpers
TurboQuant Compression Modes
| Mode | Bits per Value (bpv) | Compression vs. 16-bit | Best For |
|---|---|---|---|
| turbo3 | 3.25 | 4.9x | Maximum context on limited VRAM |
| turbo4 | 4.25 | 3.8x | Higher quality, ample memory |
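The compression column is consistent with dividing a 16-bit baseline by the bits per value. The baseline is an inference from the numbers, not something the table states, and `compression_ratio` is a hypothetical helper.

```python
BASELINE_BITS = 16  # assumption: uncompressed cache stored as FP16/BF16


def compression_ratio(bits_per_value, baseline_bits=BASELINE_BITS):
    """Ratio of uncompressed to compressed KV-cache size."""
    return baseline_bits / bits_per_value


modes = {"turbo3": 3.25, "turbo4": 4.25}  # bits per value, from the table
for name, bpv in modes.items():
    print(f"{name}: {compression_ratio(bpv):.1f}x")
# turbo3: 4.9x
# turbo4: 3.8x
```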
Requirements
- Python >= 3.10
- PyTorch >= 2.5.0
- One of: llama-cpp-python, vLLM, or HuggingFace Transformers
Development
git clone https://github.com/TurboAgentAI/turboagent.git
cd turboagent
pip install -e ".[dev]"
pytest tests/ -v -m "not integration"
Enterprise
The open-source core is free forever under the MIT license.
TurboAgent Enterprise adds commercial extensions for teams and organizations:
- SSO / SAML authentication
- Audit logging and compliance exports (SOC-2, GDPR)
- Air-gapped on-premise licensing
- SecureMultiAgentSwarm with governance policies and RBAC
- Multi-node KV cache sharing
- Priority kernels and dedicated support SLAs
# Enterprise features activate with a license key
# export TURBOAGENT_LICENSE_KEY="TA-ENT-your-key-here"
from turboagent.enterprise.swarm import SecureMultiAgentSwarm
from turboagent.enterprise.audit import AuditLogger
Learn more: turboagent.to/enterprise | Contact: enterprise@turboagent.to
License
MIT — the open-source core is free for commercial and personal use. Commercial extensions are available under a separate license. See Enterprise.
Acknowledgments
Built on community TurboQuant implementations:
- tonbistudio/turboquant-pytorch (PyTorch reference)
- TheTom/llama-cpp-turboquant (llama.cpp fork)
- 0xSero/turboquant (vLLM Triton kernels)
- turboquant-kv (C++/CUDA bindings)
File details
Details for the file turboagent_ai-1.1.0.tar.gz.
File metadata
- Download URL: turboagent_ai-1.1.0.tar.gz
- Upload date:
- Size: 84.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f2e15eff01c1b576f233c15ff10e10b29f15bbf7cda64fe7456a6f4089c8be38 |
| MD5 | 483f5b5f7b177042895b7ea80c545bfb |
| BLAKE2b-256 | f4284e30fdfd7a46da20d735f487c0f46222fe1ecc982f27dd0c71163748ff0e |
File details
Details for the file turboagent_ai-1.1.0-py3-none-any.whl.
File metadata
- Download URL: turboagent_ai-1.1.0-py3-none-any.whl
- Upload date:
- Size: 69.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0c771cdac07f35f4d56201462077595ac343996042346b6de348aaf00ef1d7c6 |
| MD5 | db88c8eae8ebfc2158e0bc058f861be1 |
| BLAKE2b-256 | e668031d2899a9b5601d50ebdac5ebb4686b3dc596800ef767fef6cc18a03629 |