TurboAgent

TurboQuant-powered agentic AI framework for long-context LLMs on consumer hardware.

TurboAgent is a pip-installable Python package that brings Google Research's TurboQuant KV-cache compression to open-source LLMs for local, consumer-hardware agentic AI. It delivers 6x+ memory reduction and up to 8x attention speedup with zero measurable accuracy loss.

Features

  • One-line agent creation with 6x+ KV compression -- 32k-1M+ effective context on a single RTX 4090
  • Hardware-aware auto-tuning -- detects CUDA/ROCm/Metal/CPU and selects optimal configuration
  • Agentic-first primitives -- persistent multi-turn memory, RAG with vector search, multi-agent swarms
  • Multiple backends -- llama.cpp (consumer GPUs), vLLM (server throughput), PyTorch (research)
  • Zero-calibration, training-free -- just like the paper guarantees
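
As an illustration of the hardware-aware auto-tuning item, here is a minimal sketch of backend detection. It is not TurboAgent's own detection code: `detect_backend` is a hypothetical helper that relies only on public PyTorch attributes, and it assumes a CPU fallback when PyTorch is not installed.

```python
def detect_backend() -> str:
    """Return 'cuda', 'rocm', 'metal', or 'cpu' (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        return "cpu"  # assumption: without PyTorch, treat the machine as CPU-only

    if torch.cuda.is_available():
        # ROCm builds of PyTorch expose AMD GPUs through the CUDA API and
        # set torch.version.hip to a version string (None on CUDA builds).
        return "rocm" if getattr(torch.version, "hip", None) else "cuda"

    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "metal"  # Apple Silicon GPU via the MPS backend

    return "cpu"


print(detect_backend())
```

A real auto-tuner would presumably also look at available VRAM before choosing a compression mode and context length; that step is left out here.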

Quick Start

pip install "turboagent-ai[llama]"

from turboagent import TurboAgent

agent = TurboAgent(
    "meta-llama/Llama-3.1-70B-Instruct",
    kv_mode="turbo3",
    context=131072,
)

response = agent.run("Analyze my 50k-token research doc and suggest experiments...")
print(response)  # KV usage <4 GB total
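
The "<4 GB" comment can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is an estimate, not a measurement: it assumes the published Llama 3.1 70B shape (80 layers, 8 KV heads, head dimension 128), the 3.25 bits per value listed for turbo3, and no overhead beyond those bits. `kv_cache_bytes` is a hypothetical helper, not part of the package.

```python
def kv_cache_bytes(tokens, layers=80, kv_heads=8, head_dim=128, bits_per_value=3.25):
    """Estimate KV-cache size in bytes for a given number of cached tokens."""
    values_per_token = 2 * layers * kv_heads * head_dim  # keys + values
    return tokens * values_per_token * bits_per_value / 8


GIB = 1024 ** 3
for tokens in (50_000, 131_072):
    turbo3 = kv_cache_bytes(tokens) / GIB
    fp16 = kv_cache_bytes(tokens, bits_per_value=16) / GIB
    print(f"{tokens:>7} tokens: turbo3 ~ {turbo3:.1f} GiB, FP16 ~ {fp16:.1f} GiB")
```

Under these assumptions a 50k-token prompt needs roughly 3.1 GiB of KV cache, which is consistent with the comment above, while filling the whole 131,072-token window would need about 8.1 GiB.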

Installation

# Core + llama.cpp backend (recommended for consumer GPUs)
pip install "turboagent-ai[llama]"

# With vLLM for server-style throughput
pip install "turboagent-ai[vllm]"

# With HuggingFace Transformers for research
pip install "turboagent-ai[torch]"

# With native TurboQuant C++/CUDA kernels (recommended for best performance)
pip install "turboagent-ai[native]"

# Development
pip install "turboagent-ai[dev]"

CLI

# Scaffold a new agent project
turboagent init my_agent

# Detect hardware and show optimal configuration
turboagent info

# Run benchmarks
turboagent benchmark --model-size 70

Multi-Agent Swarms

from turboagent.agents.swarm import TurboSwarm, SwarmAgent

swarm = TurboSwarm(
    "meta-llama/Llama-3.1-70B-Instruct",
    agents=[
        SwarmAgent(name="researcher", role="deep research"),
        SwarmAgent(name="critic", role="critical review"),
        SwarmAgent(name="writer", role="clear writing"),
    ],
)

results = swarm.run("Analyze the latest advances in KV cache compression.")
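
How TurboSwarm routes messages between agents is not described here. As one plausible mental model only, the sketch below runs roles sequentially and lets each role see the outputs of the roles before it. `Role`, `run_pipeline`, and the `echo` stub are all hypothetical names introduced for illustration; none of them is part of the turboagent API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Role:
    name: str
    role: str


def run_pipeline(task: str, roles: List[Role],
                 generate: Callable[[str], str]) -> Dict[str, str]:
    """Run each role in order; every role sees the task plus earlier outputs."""
    outputs: Dict[str, str] = {}
    for r in roles:
        history = "\n".join(f"[{name}] {text}" for name, text in outputs.items())
        prompt = f"You are the {r.name} ({r.role}).\nTask: {task}\n{history}"
        outputs[r.name] = generate(prompt)
    return outputs


def echo(prompt: str) -> str:
    """Stub generator so the sketch runs without a model."""
    return f"{len(prompt)} chars of prompt seen"


roles = [Role("researcher", "deep research"), Role("critic", "critical review")]
print(run_pipeline("Summarize KV cache compression.", roles, echo))
```

In a real setup `generate` would call the model. Because later roles re-read earlier outputs, prompts grow with every step, which is where a compressed KV cache would matter.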

RAG with TurboVectorStore

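from turboagent.agents.rag import TurboVectorStore

# `chunks`, `embeddings`, and `query_embedding` are placeholders: supply your
# own text chunks and matching 768-dimensional embeddings.
store = TurboVectorStore(embedding_dim=768)
store.add_documents(texts=chunks, embeddings=embeddings)
results = store.query(query_embedding, top_k=5)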
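
To make the `top_k` query concrete, here is a minimal, uncompressed sketch of what a top-k vector search computes. It assumes cosine similarity as the ranking metric, which the README does not state, and `cosine` and `top_k` are hypothetical helpers rather than TurboVectorStore internals.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query: Sequence[float], texts: List[str],
          embeddings: List[Sequence[float]], k: int = 5) -> List[Tuple[float, str]]:
    """Return the k (score, text) pairs most similar to the query."""
    scored = [(cosine(query, e), t) for t, e in zip(texts, embeddings)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


texts = ["a", "b", "c"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for score, text in top_k([1.0, 0.1], texts, embeddings, k=2):
    print(f"{text}: {score:.3f}")
```

This brute-force scan is linear in the number of documents; a production store would typically add an index and, in this project's case, compressed embeddings.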

Architecture

turboagent/
├── quant/          # TurboQuantKVCache (PolarQuant + QJL)
├── backends/       # llama.cpp, vLLM, PyTorch engines
├── agents/         # TurboAgent, TurboVectorStore, TurboSwarm
├── hardware/       # Auto-detection and optimal config
├── cli.py          # Project scaffolding and benchmarks
└── utils.py        # Shared helpers
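
The quant/ module is described as PolarQuant + QJL. As background, the sketch below shows the sign-bit idea behind a quantized Johnson-Lindenstrauss (QJL) style estimator: each key is reduced to one sign bit per random projection plus its norm, and inner products with an unquantized query are estimated from those bits. This is a simplified pure-Python illustration under my own assumptions (dense Gaussian projections, hypothetical function names); it is not the package's kernels, and PolarQuant is not shown.

```python
import math
import random


def qjl_encode(key, proj):
    """Keep one sign bit per projection row, plus the key's Euclidean norm."""
    signs = [1 if sum(p * x for p, x in zip(row, key)) >= 0 else -1 for row in proj]
    norm = math.sqrt(sum(x * x for x in key))
    return signs, norm


def qjl_inner_product(query, signs, norm, proj):
    """Estimate <query, key> from the sign bits; the query stays unquantized."""
    m = len(proj)
    acc = sum(s * sum(p * x for p, x in zip(row, query))
              for row, s in zip(proj, signs))
    # For Gaussian g: E[<g, q> * sign(<g, k>)] = sqrt(2/pi) * <q, k> / ||k||,
    # so rescaling by sqrt(pi/2) * ||k|| gives an unbiased estimate.
    return math.sqrt(math.pi / 2) * norm * acc / m


rng = random.Random(0)
d, m = 16, 4000  # vector dimension and number of projections (illustrative)
proj = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(m)]
key = [rng.gauss(0, 1) for _ in range(d)]
query = [rng.gauss(0, 1) for _ in range(d)]

signs, norm = qjl_encode(key, proj)
exact = sum(q * k for q, k in zip(query, key))
estimate = qjl_inner_product(query, signs, norm, proj)
print(f"exact={exact:.3f} estimate={estimate:.3f}")
```

The error of the estimate shrinks roughly as one over the square root of the number of projections, so `m` trades accuracy against stored bits.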

TurboQuant Compression Modes

Mode     Bits per Value   Compression   Best For
turbo3   3.25 bpv         4.9x          Maximum context on limited VRAM
turbo4   4.25 bpv         3.8x          Higher quality, ample memory
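
The compression column follows from the bits-per-value column if the baseline is a 16-bit (FP16/BF16) KV cache. That baseline is an assumption, since the table does not name it, but it reproduces both listed ratios:

```python
BASELINE_BITS = 16  # assumption: ratios are relative to a 16-bit KV cache

modes = {"turbo3": 3.25, "turbo4": 4.25}  # bits per value, from the table
ratios = {name: BASELINE_BITS / bpv for name, bpv in modes.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
# turbo3: 4.9x
# turbo4: 3.8x
```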

Requirements

  • Python >= 3.10
  • PyTorch >= 2.5.0
  • One of: llama-cpp-python, vLLM, or HuggingFace Transformers

Development

git clone https://github.com/TurboAgentAI/turboagent.git
cd turboagent
pip install -e ".[dev]"
pytest tests/ -v -m "not integration"

Enterprise

The open-source core is free forever under the MIT license.

TurboAgent Enterprise adds commercial extensions for teams and organizations:

  • SSO / SAML authentication
  • Audit logging and compliance exports (SOC-2, GDPR)
  • Air-gapped on-premise licensing
  • SecureMultiAgentSwarm with governance policies and RBAC
  • Multi-node KV cache sharing
  • Priority kernels and dedicated support SLAs

# Enterprise features activate with a license key
# export TURBOAGENT_LICENSE_KEY="TA-ENT-your-key-here"

from turboagent.enterprise.swarm import SecureMultiAgentSwarm
from turboagent.enterprise.audit import AuditLogger

Learn more: turboagent.to/enterprise | Contact: enterprise@turboagent.to

License

MIT. The open-source core is free for commercial and personal use. Commercial extensions are available under a separate license; see the Enterprise section.

Acknowledgments

Built on community TurboQuant implementations:
