TurboQuant-powered agentic AI framework for long-context LLMs on consumer hardware

TurboAgent

TurboAgent is a pip-installable Python package that brings Google Research's TurboQuant KV-cache compression to open-source LLMs, so agentic workloads can run locally on consumer hardware. It advertises 6x+ KV-cache memory reduction and up to 8x attention speedup with no measurable accuracy loss; per-mode compression ratios are listed under TurboQuant Compression Modes.

Features

One-line agent creation with 6x+ KV compression -- 32k-1M+ effective context on a single RTX 4090
Hardware-aware auto-tuning -- detects CUDA/ROCm/Metal/CPU and selects optimal configuration
Agentic-first primitives -- persistent multi-turn memory, RAG with vector search, multi-agent swarms
Multiple backends -- llama.cpp (consumer GPUs), vLLM (server throughput), PyTorch (research)
Zero-calibration, training-free -- as described in the TurboQuant paper
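
The hardware-aware auto-tuning bullet can be pictured with a minimal sketch. This is not TurboAgent's detection code: `detect_device` is a hypothetical helper, and it assumes PyTorch is used as the probe, falling back to "cpu" when PyTorch is not installed.

```python
def detect_device() -> str:
    """Return "cuda", "rocm", "metal", or "cpu" (hypothetical helper)."""
    try:
        import torch  # optional; without it we can only report "cpu"
    except ImportError:
        return "cpu"

    if torch.cuda.is_available():
        # ROCm builds of PyTorch also answer through torch.cuda;
        # torch.version.hip is set only on those builds.
        return "rocm" if getattr(torch.version, "hip", None) else "cuda"

    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "metal"  # Apple GPUs are exposed through the MPS backend

    return "cpu"


if __name__ == "__main__":
    print(f"Detected device: {detect_device()}")
```

A real tuner would presumably also read available VRAM before choosing a compression mode and context length; that step is left out here because the source does not describe it.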

Quick Start

pip install "turboagent-ai[llama]"

from turboagent import TurboAgent

agent = TurboAgent(
    "meta-llama/Llama-3.1-70B-Instruct",
    kv_mode="turbo3",
    context=131072,
)

response = agent.run("Analyze my 50k-token research doc and suggest experiments...")
print(response)  # KV usage <4 GB total

Installation

# Core + llama.cpp backend (recommended for consumer GPUs)
# Quoting the extras keeps shells such as zsh from expanding the brackets.
pip install "turboagent-ai[llama]"

# With vLLM for server-style throughput
pip install "turboagent-ai[vllm]"

# With HuggingFace Transformers for research
pip install "turboagent-ai[torch]"

# With native TurboQuant C++/CUDA kernels (recommended for best performance)
pip install "turboagent-ai[native]"

# Development
pip install "turboagent-ai[dev]"

CLI

# Scaffold a new agent project
turboagent init my_agent

# Detect hardware and show optimal configuration
turboagent info

# Run benchmarks
turboagent benchmark --model-size 70

Multi-Agent Swarms

from turboagent.agents.swarm import TurboSwarm, SwarmAgent

swarm = TurboSwarm(
    "meta-llama/Llama-3.1-70B-Instruct",
    agents=[
        SwarmAgent(name="researcher", role="deep research"),
        SwarmAgent(name="critic", role="critical review"),
        SwarmAgent(name="writer", role="clear writing"),
    ],
)

results = swarm.run("Analyze the latest advances in KV cache compression.")

RAG with TurboVectorStore

from turboagent.agents.rag import TurboVectorStore

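# `chunks` is a list of text passages and `embeddings` the matching 768-dimensional
# vectors from your embedding model; `query_embedding` is the embedded query.
store = TurboVectorStore(embedding_dim=768)
store.add_documents(texts=chunks, embeddings=embeddings)
results = store.query(query_embedding, top_k=5)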
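
To make the top-k query concrete, here is a minimal pure-Python sketch of a nearest-neighbour lookup by cosine similarity. It illustrates the general technique only; how `TurboVectorStore` scores and stores vectors internally is not stated in this description, and `cosine` and `top_k` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float],
    texts: Sequence[str],
    embeddings: Sequence[Sequence[float]],
    k: int = 5,
) -> List[Tuple[float, str]]:
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine(query, emb), text) for text, emb in zip(texts, embeddings)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 2-dimensional data; real embeddings would have e.g. 768 dimensions.
docs = ["kv cache", "vector search", "license"]
vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
best = top_k([0.0, 1.0], docs, vecs, k=2)
print([text for _, text in best])  # ['vector search', 'license']
```

A brute-force scan like this is linear in the number of stored vectors; larger stores typically switch to an approximate index.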

Architecture

turboagent/
├── quant/          # TurboQuantKVCache (PolarQuant + QJL)
├── backends/       # llama.cpp, vLLM, PyTorch engines
├── agents/         # TurboAgent, TurboVectorStore, TurboSwarm
├── hardware/       # Auto-detection and optimal config
├── cli.py          # Project scaffolding and benchmarks
└── utils.py        # Shared helpers

TurboQuant Compression Modes

Mode	Bits per Value	Compression (vs. 16-bit)	Best For
turbo3	3.25 bpv	4.9x	Maximum context on limited VRAM
turbo4	4.25 bpv	3.8x	Higher quality, ample memory
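
The ratios in the table follow from dividing a 16-bit baseline by the bits per value, and the same arithmetic gives a rough KV-cache size. The sketch below assumes a Llama-3.1-70B-shaped model (80 layers, 8 KV heads, head dimension 128); those shape values are assumptions brought in for illustration, and `kv_cache_gib` is a hypothetical helper, not part of the package.

```python
def kv_cache_gib(
    tokens: int,
    bits_per_value: float,
    layers: int = 80,      # assumed: Llama-3.1-70B layer count
    kv_heads: int = 8,     # assumed: grouped-query attention KV heads
    head_dim: int = 128,   # assumed: per-head dimension
) -> float:
    """Estimate KV-cache size in GiB for a given context length and precision."""
    # Factor 2: every token stores one key vector and one value vector per layer.
    values = 2 * layers * kv_heads * head_dim * tokens
    return values * bits_per_value / 8 / 2**30


for mode, bpv in {"fp16": 16.0, "turbo4": 4.25, "turbo3": 3.25}.items():
    size = kv_cache_gib(131072, bpv)
    print(f"{mode}: {size:.3f} GiB at 131072 tokens, {16.0 / bpv:.1f}x vs fp16")
# fp16: 40.000 GiB at 131072 tokens, 1.0x vs fp16
# turbo4: 10.625 GiB at 131072 tokens, 3.8x vs fp16
# turbo3: 8.125 GiB at 131072 tokens, 4.9x vs fp16
```

Under the same assumptions, a 50,000-token prompt at 3.25 bpv comes to roughly 3.1 GiB, which is in line with the Quick Start comment about staying under 4 GB. The estimate ignores model weights and any runtime buffers.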

Requirements

Python >= 3.10
PyTorch >= 2.5.0
One of: llama-cpp-python, vLLM, or HuggingFace Transformers

Development

git clone https://github.com/TurboAgentAI/turboagent.git
cd turboagent
pip install -e ".[dev]"
pytest tests/ -v -m "not integration"

Enterprise

The open-source core is free forever under the MIT license.

TurboAgent Enterprise adds commercial extensions for teams and organizations:

SSO / SAML authentication
Audit logging and compliance exports (SOC-2, GDPR)
Air-gapped on-premise licensing
SecureMultiAgentSwarm with governance policies and RBAC
Multi-node KV cache sharing
Priority kernels and dedicated support SLAs

# Enterprise features activate with a license key
# export TURBOAGENT_LICENSE_KEY="TA-ENT-your-key-here"

from turboagent.enterprise.swarm import SecureMultiAgentSwarm
from turboagent.enterprise.audit import AuditLogger
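
Because the enterprise modules depend on a license key in the environment, a script may want to check for it before attempting those imports. The sketch below only tests that `TURBOAGENT_LICENSE_KEY` is set and non-empty; `enterprise_key_present` is a hypothetical helper, and how TurboAgent itself validates the key is not described here.

```python
import os
from typing import Mapping


def enterprise_key_present(env: Mapping[str, str] = os.environ) -> bool:
    """True if TURBOAGENT_LICENSE_KEY is set to a non-blank value."""
    return bool(env.get("TURBOAGENT_LICENSE_KEY", "").strip())


if enterprise_key_present():
    print("License key found; enterprise imports can be attempted.")
else:
    print("No license key set; staying on the open-source core.")
```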

Learn more: turboagent.to/enterprise | Contact: enterprise@turboagent.to

License

MIT: the open-source core is free for commercial and personal use. Commercial extensions are available under a separate license; see Enterprise.

Acknowledgments

Built on community TurboQuant implementations:

tonbistudio/turboquant-pytorch (PyTorch reference)
TheTom/llama-cpp-turboquant (llama.cpp fork)
0xSero/turboquant (vLLM Triton kernels)
turboquant-kv (C++/CUDA bindings)

Project details

Release history

1.1.0 (this version): Apr 20, 2026
0.1.0: Apr 2, 2026

Download files

Source Distribution

turboagent_ai-1.1.0.tar.gz (84.1 kB)

Uploaded Apr 20, 2026 Source

Built Distribution

turboagent_ai-1.1.0-py3-none-any.whl (69.3 kB)

Uploaded Apr 20, 2026 Python 3

File details

Details for the file turboagent_ai-1.1.0.tar.gz.

File metadata

Download URL: turboagent_ai-1.1.0.tar.gz
Upload date: Apr 20, 2026
Size: 84.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for turboagent_ai-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`f2e15eff01c1b576f233c15ff10e10b29f15bbf7cda64fe7456a6f4089c8be38`
MD5	`483f5b5f7b177042895b7ea80c545bfb`
BLAKE2b-256	`f4284e30fdfd7a46da20d735f487c0f46222fe1ecc982f27dd0c71163748ff0e`
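
A downloaded file can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch; `sha256_matches` is a hypothetical helper name, and the commented example assumes the sdist sits in the current directory.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_matches(path, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Stream the file through SHA-256 and compare with the expected hex digest."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read in chunks so large archives are not loaded into memory at once.
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return hmac.compare_digest(digest.hexdigest(), expected_hex.strip().lower())


# Example, assuming the file has been downloaded already:
# sha256_matches(
#     "turboagent_ai-1.1.0.tar.gz",
#     "f2e15eff01c1b576f233c15ff10e10b29f15bbf7cda64fe7456a6f4089c8be38",
# )
```

pip can enforce the same check at install time when a requirements file pins hashes and is installed with `--require-hashes`.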

File details

Details for the file turboagent_ai-1.1.0-py3-none-any.whl.

File metadata

Download URL: turboagent_ai-1.1.0-py3-none-any.whl
Upload date: Apr 20, 2026
Size: 69.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for turboagent_ai-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0c771cdac07f35f4d56201462077595ac343996042346b6de348aaf00ef1d7c6`
MD5	`db88c8eae8ebfc2158e0bc058f861be1`
BLAKE2b-256	`e668031d2899a9b5601d50ebdac5ebb4686b3dc596800ef767fef6cc18a03629`
