Independent TurboQuant-style compression infrastructure for KV cache and RAG.

turboagents

Turbocharge AI Agents with TurboQuant

turboagents is a single Python package for TurboQuant-style KV-cache and vector compression. It is being built as independent compression infrastructure that can be used standalone or integrated into SuperOptix.

Repository: https://github.com/SuperagenticAI/turboagents

What It Is

turboagents is not an agent framework. It is the compression layer you put under existing AI agents, inference engines, and RAG stacks so they can:

  • hold longer contexts
  • use less KV-cache memory
  • store more embeddings at lower cost
  • benchmark quality and memory tradeoffs explicitly
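To make the KV-cache tradeoff concrete, here is a back-of-envelope sizing sketch. The model dimensions and the ~3.5-bit rate are illustrative assumptions, not measured turboagents numbers:

```python
# Illustrative fp16 KV-cache size for a hypothetical 32-layer model.
layers, heads, head_dim = 32, 32, 128
seq_len = 32_768
bytes_fp16 = 2

# Keys and values: 2 tensors per layer, each [heads, seq_len, head_dim].
kv_fp16 = 2 * layers * heads * head_dim * seq_len * bytes_fp16
kv_3p5bit = kv_fp16 * 3.5 / 16  # same tensors at ~3.5 bits per element

print(f"fp16:   {kv_fp16 / 2**30:.1f} GiB")  # 16.0 GiB
print(f"3.5bit: {kv_3p5bit / 2**30:.1f} GiB")  # 3.5 GiB
```

At these (hypothetical) shapes, moving from 16-bit to ~3.5-bit storage shrinks the cache from 16 GiB to 3.5 GiB, which is the kind of headroom that lets the same hardware hold a much longer context.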

Think of it as:

  • TurboQuant for real systems
  • TurboRAG for vector retrieval stacks
  • adapters and tooling around existing engines instead of a replacement for them

Who It Is For

turboagents is for teams and developers who already have:

  • AI agents that hit memory limits on long prompts
  • RAG systems with large embedding stores
  • inference stacks built on MLX, llama.cpp, vLLM, FAISS, LanceDB, SurrealDB, or pgvector
  • agent frameworks that need compression infrastructure, not another framework

How To Use It

Most users should think about turboagents in three ways.

1. Add It Under An Existing Agent Runtime

If you already have an agent system, keep the agent layer and use turboagents to improve the inference or memory layer under it.

Examples:

  • use turboagents.engines.mlx for MLX-based local agents
  • use turboagents.engines.llamacpp to build llama.cpp runtime commands
  • use turboagents.engines.vllm as an experimental runtime wrapper

2. Add It Under An Existing RAG Stack

If you already have retrieval, keep your current application logic and add TurboRAG where vectors are stored or searched.

Examples:

  • use TurboFAISS when you want a local FAISS-backed retrieval path
  • use TurboLanceDB or TurboSurrealDB when you want a sidecar/rerank integration
  • use TurboPgvector when your application already depends on PostgreSQL

3. Use It As A Benchmark And Compression Tool

If you are still evaluating whether TurboQuant-style compression makes sense for your stack, use the CLI first:

  • turboagents doctor
  • turboagents bench kv
  • turboagents bench rag
  • turboagents compress

That gives you a way to validate fit before deeper integration work.

Usage Patterns

Existing Agent + MLX

Use TurboAgents to build or validate the MLX runtime path, then keep your existing agent code on top.

turboagents serve --backend mlx --model mlx-community/Qwen3-0.6B-4bit --dry-run

Existing RAG + FAISS

Use TurboRAG as the retrieval layer while keeping your current ingestion and agent orchestration logic.

import numpy as np
from turboagents.rag import TurboFAISS

vectors = np.random.rand(1000, 128).astype("float32")  # your embedding matrix
query = np.random.rand(128).astype("float32")          # your query embedding

index = TurboFAISS(dim=128, bits=3.5, seed=0)
index.add(vectors)
results = index.search(query, k=5, rerank_top=16)

Existing Vector Database

If you already use a database-backed vector layer, run TurboAgents beside that store as a sidecar first, and integrate more deeply only once it proves useful.

Examples:

  • TurboLanceDB for LanceDB candidate search + TurboAgents rerank
  • TurboSurrealDB for SurrealDB candidate search + TurboAgents rerank
  • TurboPgvector for PostgreSQL-backed storage and retrieval
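The sidecar/rerank pattern these adapters share can be sketched generically: a coarse pass over a compressed representation selects candidates, and exact scoring runs only on that short list. This is an independent numpy illustration; the 1-bit sign proxy below is a stand-in for the compressed index, not the package's actual quantizer:

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.standard_normal((1000, 128)).astype(np.float32)
query = rng.standard_normal(128).astype(np.float32)

# Coarse stage: score against a crude 1-bit compressed view of the store.
coarse_scores = np.sign(db) @ np.sign(query)
candidates = np.argsort(-coarse_scores)[:16]   # e.g. rerank_top=16

# Rerank stage: exact inner products over the 16 candidates only.
exact_scores = db[candidates] @ query
top_k = candidates[np.argsort(-exact_scores)[:5]]  # final k=5
```

The design point is that the expensive exact math touches 16 rows instead of 1000; the database adapter supplies the candidate set, and the rerank stage recovers accuracy.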

Current Status

  • structured quantization payloads with binary serialization
  • Fast Walsh-Hadamard rotation with cached sign patterns
  • PolarQuant-style spherical angle/radius stage
  • seeded QJL-style residual sketch
  • synthetic benchmark CLI with KV, RAG, and paper-style reports
  • real adapter surfaces for:
    • FAISS
    • LanceDB
    • SurrealDB
    • pgvector client adapter
    • MLX runtime/server wrapper
    • llama.cpp runtime wrapper
    • experimental vLLM runtime wrapper
  • proxy/server baseline
  • lightweight examples and docs
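As a self-contained illustration of the rotation stage listed above, here is a minimal normalized Fast Walsh-Hadamard transform combined with a seeded sign pattern. This is an independent numpy sketch of the technique, not the package's internal implementation:

```python
import numpy as np

def fwht(x):
    """Normalized Fast Walsh-Hadamard transform; length must be a power of two."""
    x = np.array(x, dtype=float)
    n = x.size
    h = 1
    while h < n:
        # Butterfly pass: combine elements h apart in sum/difference pairs.
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x / np.sqrt(n)

rng = np.random.default_rng(0)      # seeded, so the sign pattern is cacheable
signs = rng.choice([-1.0, 1.0], 8)
v = rng.standard_normal(8)
rotated = fwht(signs * v)           # randomized rotation: spreads energy across dims
```

Because the composite transform is orthonormal, the L2 norm of `rotated` matches that of `v`, which is what lets the later quantization stages approximately preserve distances.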

Still not finished:

  • full paper-faithful production math
  • native engine kernels for llama.cpp / MLX / vLLM
  • live Postgres validation for pgvector on this machine
  • large benchmark datasets and long-context benchmark matrix

Install

pip install -e .

Optional extras:

pip install -e ".[bench]"
pip install -e ".[docs]"
pip install -e ".[mlx]"
pip install -e ".[vllm]"
pip install -e ".[rag]"
pip install -e ".[all]"

CLI

turboagents doctor
turboagents bench kv --format json
turboagents bench rag --format markdown
turboagents bench paper
turboagents compress --input vectors.npy --output vectors.npz --head-dim 128
turboagents serve --backend mlx --model mlx-community/Qwen3-0.6B-4bit --dry-run
turboagents serve --backend vllm --model meta-llama/Llama-3.1-8B-Instruct --dry-run

Examples

python3 examples/quickstart.py
python3 examples/bench_profiles.py
python3 examples/faiss_turborag.py
python3 examples/mlx_server_dry_run.py

Current Local Validation

  • cached MLX 3B smoke test succeeded on mlx-community/Llama-3.2-3B-Instruct-4bit
  • PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 python3 -m pytest -q passes

Attribution

See ATTRIBUTION.md. This repository is not affiliated with Google Research.
