TurboAgent

TurboQuant-powered agentic AI framework for long-context LLMs on consumer hardware.

TurboAgent is a pip-installable Python package that brings Google Research's TurboQuant KV-cache compression to open-source LLMs for agentic AI on local consumer hardware. The project claims 6x+ memory reduction and up to 8x attention speedup with no measurable accuracy loss; per-mode compression ratios are listed under TurboQuant Compression Modes.

Features

  • One-line agent creation with 6x+ KV compression -- 32k-1M+ effective context on a single RTX 4090
  • Hardware-aware auto-tuning -- detects CUDA/ROCm/Metal/CPU and selects optimal configuration
  • Agentic-first primitives -- persistent multi-turn memory, RAG with vector-search, multi-agent swarms
  • Multiple backends -- llama.cpp (consumer GPUs), vLLM (server throughput), PyTorch (research)
  • Zero-calibration, training-free -- just like the paper guarantees

Quick Start

pip install "turboagent-ai[llama]"

from turboagent import TurboAgent

agent = TurboAgent(
    "meta-llama/Llama-3.1-70B-Instruct",
    kv_mode="turbo3",
    context=131072,
)

response = agent.run("Analyze my 50k-token research doc and suggest experiments...")
print(response)  # KV usage <4 GB total
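
The "<4 GB" figure in the comment above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is not part of TurboAgent: it assumes the publicly documented Llama 3.1 70B attention geometry (80 layers, 8 KV heads, head dimension 128), an FP16 baseline, and the 3.25 bits per value listed for turbo3 in the compression modes table. The helper names `kv_cache_bytes` and `gb` are hypothetical, and model weights are not counted.

```python
# Assumed Llama 3.1 70B attention geometry (taken from the public model
# config, not from TurboAgent): 80 layers, 8 KV heads (GQA), head dim 128.
N_LAYERS = 80
N_KV_HEADS = 8
HEAD_DIM = 128


def kv_cache_bytes(n_tokens: int, bits_per_value: float) -> float:
    """Bytes needed to store keys and values for n_tokens."""
    values_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM  # 2 = keys + values
    return n_tokens * values_per_token * bits_per_value / 8


def gb(n_bytes: float) -> float:
    """Convert bytes to decimal gigabytes."""
    return n_bytes / 1e9


for label, tokens in [("50k-token prompt", 50_000), ("full 131072 window", 131_072)]:
    fp16 = gb(kv_cache_bytes(tokens, 16))
    turbo3 = gb(kv_cache_bytes(tokens, 3.25))
    print(f"{label}: FP16 {fp16:.1f} GB, turbo3 {turbo3:.1f} GB")
```

Under these assumptions the 50k-token prompt needs about 3.3 GB of KV cache at 3.25 bits per value, which is consistent with the comment. Filling the whole 131072-token window would need roughly 8.7 GB by the same arithmetic, so the "<4 GB" figure is best read as applying to the prompt in this example rather than to the full context.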

Installation

# Core + llama.cpp backend (recommended for consumer GPUs)
pip install "turboagent-ai[llama]"

# With vLLM for server-style throughput
pip install "turboagent-ai[vllm]"

# With HuggingFace Transformers for research
pip install "turboagent-ai[torch]"

# With native TurboQuant C++/CUDA kernels (recommended for best performance)
pip install "turboagent-ai[native]"

# Development
pip install "turboagent-ai[dev]"

CLI

# Scaffold a new agent project
turboagent init my_agent

# Detect hardware and show optimal configuration
turboagent info

# Run benchmarks
turboagent benchmark --model-size 70
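
For readers curious what hardware detection of the kind `turboagent info` performs might involve, here is a minimal standard-library sketch. It is an illustration only, not TurboAgent's actual logic: the function name `detect_backend` is hypothetical, and the heuristics (looking for the `nvidia-smi` and `rocm-smi` tools on the PATH, and treating an arm64 Mac as Metal-capable) are assumptions.

```python
import platform
import shutil


def detect_backend(system=None, machine=None, which=shutil.which):
    """Return 'cuda', 'rocm', 'metal', or 'cpu' using simple heuristics.

    `system`, `machine`, and `which` can be overridden for testing; by
    default they reflect the machine the code is running on.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    if which("nvidia-smi"):  # NVIDIA driver tools are installed
        return "cuda"
    if which("rocm-smi"):  # AMD ROCm tools are installed
        return "rocm"
    if system == "Darwin" and machine == "arm64":  # Apple Silicon Mac
        return "metal"
    return "cpu"


print(detect_backend())
```

A real implementation would also need to query available VRAM before choosing a compression mode and context length; that step is omitted here.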

Multi-Agent Swarms

from turboagent.agents.swarm import TurboSwarm, SwarmAgent

swarm = TurboSwarm(
    "meta-llama/Llama-3.1-70B-Instruct",
    agents=[
        SwarmAgent(name="researcher", role="deep research"),
        SwarmAgent(name="critic", role="critical review"),
        SwarmAgent(name="writer", role="clear writing"),
    ],
)

results = swarm.run("Analyze the latest advances in KV cache compression.")

RAG with TurboVectorStore

from turboagent.agents.rag import TurboVectorStore

store = TurboVectorStore(embedding_dim=768)
store.add_documents(texts=chunks, embeddings=embeddings)
results = store.query(query_embedding, top_k=5)
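
The snippet above leaves `chunks`, `embeddings`, and `query_embedding` undefined. The sketch below shows one way the text chunks might be prepared and what a top-k similarity query computes conceptually. It is a plain-Python reference, not TurboVectorStore's implementation: `chunk_words`, `cosine`, and `top_k` are hypothetical helpers, and the toy 2-dimensional vectors stand in for the 768-dimensional output of a real embedding model.

```python
import math


def chunk_words(text, size=200, overlap=40):
    """Split text into overlapping chunks of at most `size` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, embeddings, k=5):
    """Return (index, score) pairs for the k embeddings most similar to query."""
    scored = [(i, cosine(query, e)) for i, e in enumerate(embeddings)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


chunks = chunk_words("word " * 500)             # 500 words, 3 overlapping chunks
toy_embeddings = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
print(len(chunks), top_k([1.0, 0.0], toy_embeddings, k=2))
```

In practice the embeddings would come from an embedding model whose output dimension matches `embedding_dim`, and the brute-force scan would be replaced by the store's own index.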

Architecture

turboagent/
├── quant/          # TurboQuantKVCache (PolarQuant + QJL)
├── backends/       # llama.cpp, vLLM, PyTorch engines
├── agents/         # TurboAgent, TurboVectorStore, TurboSwarm
├── hardware/       # Auto-detection and optimal config
├── cli.py          # Project scaffolding and benchmarks
└── utils.py        # Shared helpers

TurboQuant Compression Modes

Mode     Bits per Value   Compression   Best For
turbo3   3.25 bpv         4.9x          Maximum context on limited VRAM
turbo4   4.25 bpv         3.8x          Higher quality, ample memory
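
The compression column follows directly from the bits-per-value column if the uncompressed baseline is taken to be 16-bit floats. That baseline is an assumption, as the table does not state it, and `compression_ratio` is a hypothetical helper used only for this check.

```python
FP16_BITS = 16  # assumed baseline: keys and values stored as 16-bit floats


def compression_ratio(bits_per_value, baseline_bits=FP16_BITS):
    """Ratio of baseline storage to compressed storage per cached value."""
    return baseline_bits / bits_per_value


for mode, bpv in {"turbo3": 3.25, "turbo4": 4.25}.items():
    print(f"{mode}: {compression_ratio(bpv):.1f}x")
```

This prints 4.9x for turbo3 and 3.8x for turbo4, matching the table.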

Requirements

  • Python >= 3.10
  • PyTorch >= 2.5.0
  • One of: llama-cpp-python, vLLM, or HuggingFace Transformers

Development

git clone https://github.com/TurboAgentAI/turboagent.git
cd turboagent
pip install -e ".[dev]"
pytest tests/ -v -m "not integration"

Enterprise

The open-source core is free forever under the MIT license.

TurboAgent Enterprise adds commercial extensions for teams and organizations:

  • SSO / SAML authentication
  • Audit logging and compliance exports (SOC-2, GDPR)
  • Air-gapped on-premise licensing
  • SecureMultiAgentSwarm with governance policies and RBAC
  • Multi-node KV cache sharing
  • Priority kernels and dedicated support SLAs

# Enterprise features activate with a license key
# export TURBOAGENT_LICENSE_KEY="TA-ENT-your-key-here"

from turboagent.enterprise.swarm import SecureMultiAgentSwarm
from turboagent.enterprise.audit import AuditLogger

Learn more: turboagent.to/enterprise | Contact: enterprise@turboagent.to

License

MIT — the open-source core is free for commercial and personal use. Commercial extensions are available under a separate license. See Enterprise.

Acknowledgments

Built on community TurboQuant implementations:
