Independent TurboQuant-style compression infrastructure for KV cache and RAG.
turboagents
Turbocharge AI Agents with TurboQuant
turboagents is a single Python package for TurboQuant-style KV-cache and vector
compression. It is being built as independent compression infrastructure that
can be used standalone and integrated into SuperOptiX.
Repository: https://github.com/SuperagenticAI/turboagents
Why It Exists
Most AI stacks do not need another agent framework. They need the memory and retrieval layer underneath their existing agents to stop getting in the way.
turboagents is aimed at that layer:
- compress KV-cache payloads so local and server-side inference can hold more context
- compress vector payloads so retrieval systems can store and rerank more cheaply
- benchmark the quality, latency, and recall tradeoffs explicitly instead of hiding them
- integrate with runtimes and vector backends teams already use
At A Glance
| Area | Current State |
|---|---|
| Quant core | Fast Walsh-Hadamard rotation, PolarQuant-style angle/radius stage, seeded QJL-style residual sketch, binary payloads |
| Engines | MLX wrapper, llama.cpp wrapper, experimental vLLM wrapper/plugin scaffold |
| Retrieval | Chroma, FAISS, LanceDB, SurrealDB, and pgvector client adapters |
| Benchmarks | Synthetic CLI, benchmark matrix, MLX sweep, adapter matrix, minimal Needle harness |
| Packaging | uv-first local workflow, docs, CI, release workflow, PyPI package |
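For orientation, the rotation stage in the quant core is a fast Walsh-Hadamard transform (FWHT), which spreads energy evenly across vector coordinates in O(n log n) before quantization. The sketch below is an illustrative NumPy version of a seeded, randomized Hadamard rotation, not the package's internal kernel; the seeded sign pattern mirrors the "cached sign patterns" idea.

import numpy as np

def fwht(x: np.ndarray) -> np.ndarray:
    # Orthonormal fast Walsh-Hadamard transform; length must be a power of two.
    n = x.size
    assert n & (n - 1) == 0, "length must be a power of two"
    y = x.astype(np.float64)
    h = 1
    while h < n:
        y = y.reshape(-1, 2, h)
        a, b = y[:, 0, :].copy(), y[:, 1, :].copy()
        y[:, 0, :], y[:, 1, :] = a + b, a - b
        y = y.reshape(n)
        h *= 2
    return y / np.sqrt(n)

# Seeded sign flips followed by the transform give a cheap random rotation.
rng = np.random.default_rng(0)
signs = rng.choice([-1.0, 1.0], size=128)
rotated = fwht(signs * rng.standard_normal(128))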
Quick Proof Points
The repository now has benchmark evidence, not just API surface:
- Chroma reached recall@10 = 1.0 across the tested bit-width sweep in the local adapter benchmark
- FAISS reached recall@10 = 1.0 across the tested bit-width sweep on medium-rag
- pgvector reached recall@10 = 0.896875 at 4.0 bits on medium-rag
- the MLX 3B sweep showed 3.5 bits as the best quality/performance operating point in that run
- the minimal Needle long-context run showed early-position retrieval, but not robust mid/late-position retrieval
That matters because the repo can now make narrower, defensible claims:
- it already works as compression infrastructure and benchmark tooling
- it does not yet justify broad long-context quality claims
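For reference, recall@k in these results is the conventional metric: the fraction of the exact top-k neighbors that the compressed search recovers. A minimal sketch of the standard definition (not necessarily the exact harness code):

def recall_at_k(approx_ids, exact_ids, k: int = 10) -> float:
    # Overlap between the approximate and exact top-k result sets.
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

# recall@10 = 1.0 means the compressed index returned the same
# top-10 neighbors as exact search, averaged over the query set.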
Fast Start
Install the package:
uv add turboagents
Install with useful extras:
uv add "turboagents[mlx]"
uv add "turboagents[rag]"
uv add "turboagents[all]"
Try the CLI first:
turboagents doctor
turboagents bench kv --format json
turboagents bench rag --format markdown
turboagents serve --backend mlx --model mlx-community/Qwen3-0.6B-4bit --dry-run
Reference Integration
TurboAgents stays framework-agnostic, but the first full reference integration is now in SuperOptiX.
This matters because the validated story is no longer only package-level unit tests: it now includes real SuperOptiX retrieval paths running TurboAgents under framework runtimes.
Current reference-integration status:
- turboagents-chroma is wired into SuperOptiX and covered by focused runtime tests
- turboagents-lancedb is validated through the real rag_lancedb_demo flow
- turboagents-surrealdb is validated through the real SuperOptiX OpenAI Agents and Pydantic AI demo flows
- the DSPy SurrealDB path is still blocked by a local LiteLLM/Ollama compatibility issue, not by the TurboAgents retrieval layer itself
If you want the end-to-end integration story, start here after installing TurboAgents:
- SuperOptiX integration guide: https://superagenticai.github.io/superoptix/guides/turboagents-integration/
- SuperOptiX LanceDB demo: https://superagenticai.github.io/superoptix/examples/agents/rag-lancedb-demo/
- SuperOptiX SurrealDB frameworks guide: https://superagenticai.github.io/superoptix/examples/agents/surrealdb-frameworks-demo/
What It Is
turboagents is not an agent framework. It is the compression layer you put
under existing AI agents, inference engines, and RAG stacks so they can:
- hold longer contexts
- use less KV-cache memory
- store more embeddings at lower cost
- benchmark quality and memory tradeoffs explicitly
Think of it as:
- TurboQuant for real systems
- TurboRAG for vector retrieval stacks
- adapters and tooling around existing engines instead of a replacement for them
Who It Is For
turboagents is for teams and developers who already have:
- AI agents that hit memory limits on long prompts
- RAG systems with large embedding stores
- inference stacks built on MLX, llama.cpp, vLLM, Chroma, FAISS, LanceDB, SurrealDB, or pgvector
- agent frameworks that need compression infrastructure, not another framework
How To Use It
Most users should think about turboagents in three ways.
1. Add It Under An Existing Agent Runtime
If you already have an agent system, keep the agent layer and use turboagents
to improve the inference or memory layer under it.
Examples:
- use turboagents.engines.mlx for MLX-based local agents
- use turboagents.engines.llamacpp to build llama.cpp runtime commands
- use turboagents.engines.vllm as an experimental runtime wrapper
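These module paths are named surfaces, but their Python signatures are version-specific, so a safe first step is to drive the documented CLI from your own runtime code. A minimal sketch using only a command shown verbatim in this README:

import subprocess

# Dry-run the MLX serve path without committing to the Python wrapper API.
subprocess.run(
    ["turboagents", "serve",
     "--backend", "mlx",
     "--model", "mlx-community/Qwen3-0.6B-4bit",
     "--dry-run"],
    check=True,
)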
2. Add It Under An Existing RAG Stack
If you already have retrieval, keep your current application logic and add TurboRAG where vectors are stored or searched.
Examples:
- use TurboFAISS when you want a local FAISS-backed retrieval path
- use TurboChroma when you want Chroma candidate search plus TurboAgents rerank
- use TurboLanceDB or TurboSurrealDB when you want a sidecar/rerank integration
- use TurboPgvector when your application already depends on PostgreSQL
3. Use It As A Benchmark And Compression Tool
If you are still evaluating whether TurboQuant-style compression makes sense for your stack, use the CLI first:
- turboagents doctor
- turboagents bench kv
- turboagents bench rag
- turboagents compress
That gives you a way to validate fit before deeper integration work.
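For example, the KV benchmark already emits JSON, so it folds easily into your own evaluation scripts. A minimal sketch (the report schema is whatever your installed version produces, so this just pretty-prints it):

import json
import subprocess

out = subprocess.run(
    ["turboagents", "bench", "kv", "--format", "json"],
    check=True, capture_output=True, text=True,
).stdout
print(json.dumps(json.loads(out), indent=2))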
Usage Patterns
Existing Agent + MLX
Use TurboAgents to build or validate the MLX runtime path, then keep your existing agent code on top.
turboagents serve --backend mlx --model mlx-community/Qwen3-0.6B-4bit --dry-run
Existing RAG + FAISS
Use TurboRAG as the retrieval layer while keeping your current ingestion and agent orchestration logic.
import numpy as np
from turboagents.rag import TurboFAISS

vectors = np.random.rand(1000, 128).astype("float32")  # stand-in for your embeddings
query = np.random.rand(128).astype("float32")
index = TurboFAISS(dim=128, bits=3.5, seed=0)
index.add(vectors)
results = index.search(query, k=5, rerank_top=16)
Existing Vector Database
If you already use a database-backed vector layer, TurboAgents should sit beside that store first, then move deeper only if it proves useful.
Examples:
- TurboChroma for Chroma candidate search + TurboAgents rerank
- TurboLanceDB for LanceDB candidate search + TurboAgents rerank
- TurboSurrealDB for SurrealDB candidate search + TurboAgents rerank
- TurboPgvector for PostgreSQL-backed storage and retrieval
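As an illustration of the sidecar pattern, the sketch below assumes TurboChroma mirrors the TurboFAISS interface shown above; the import path and constructor arguments here are assumptions, so check docs/adapters.md for the real surface:

import numpy as np
from turboagents.rag import TurboChroma  # import path assumed to mirror TurboFAISS

vectors = np.random.rand(1000, 128).astype("float32")  # stand-in embeddings
index = TurboChroma(dim=128, bits=3.5, seed=0)  # hypothetical constructor arguments
index.add(vectors)
# Chroma supplies the candidate set; TurboAgents reranks the working set.
results = index.search(np.random.rand(128).astype("float32"), k=5, rerank_top=16)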
Chroma and Context-1
TurboAgents now includes a Chroma adapter aligned to chromadb 1.5.5.
The right integration model is:
- Context-1 handles search policy and context management
- TurboAgents handles compressed retrieval and rerank
- Chroma retrieves candidates while TurboAgents reranks or compresses the working set under that loop
Benchmarks Snapshot
Latest validated benchmark work:
| Surface | Result |
|---|---|
| Chroma | recall@1 = 1.0, recall@10 = 1.0 across the tested sweep in the local adapter benchmark |
| MLX sweep | 3.5 bits was the best current quality/performance tradeoff on mlx-community/Llama-3.2-3B-Instruct-4bit |
| FAISS | recall@1 = 1.0, recall@10 = 1.0 across the tested sweep |
| LanceDB | recall@10 landed in the 0.70 to 0.75 range on medium-rag |
| pgvector | recall@10 improved monotonically up to 0.896875 at 4.0 bits |
| Needle | exact match held for insertion fraction 0.1, but failed at 0.5 and 0.9 |
For the full numbers, see docs/benchmarks.md.
Docs Map
For the shortest path through this repo:
- docs/getting-started.md for install and first commands
- docs/adapters.md for backend-specific retrieval surfaces
- docs/examples.md for runnable local examples
- docs/benchmarks.md for validated benchmark numbers
- docs/status.md for what is implemented versus still incomplete
Current Status
- structured quantization payloads with binary serialization
- Fast Walsh-Hadamard rotation with cached sign patterns
- PolarQuant-style spherical angle/radius stage
- seeded QJL-style residual sketch
- synthetic benchmark CLI with KV, RAG, and paper-style reports
- real adapter surfaces for:
- Chroma
- FAISS
- LanceDB
- SurrealDB
- pgvector client adapter
- MLX runtime/server wrapper
- llama.cpp runtime wrapper
- experimental vLLM runtime wrapper
- proxy/server baseline
- lightweight examples and docs
Still not finished:
- full paper-faithful production math
- native engine kernels for llama.cpp / MLX / vLLM
- larger benchmark datasets and stronger long-context evaluation
- production-strength long-context quality claims
Install
uv add turboagents
Optional extras:
uv add "turboagents[mlx]"
uv add "turboagents[vllm]"
uv add "turboagents[rag]"
uv add "turboagents[all]"
For local development in this repository:
uv sync
uv sync --extra rag
CLI
turboagents doctor
turboagents bench kv --format json
turboagents bench rag --format markdown
turboagents bench paper
turboagents compress --input vectors.npy --output vectors.npz --head-dim 128
turboagents serve --backend mlx --model mlx-community/Qwen3-0.6B-4bit --dry-run
turboagents serve --backend vllm --model meta-llama/Llama-3.1-8B-Instruct --dry-run
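The compress command reads a NumPy array from disk; a minimal sketch to produce a compatible input (the (num_vectors, head_dim) shape is an assumption chosen to match --head-dim 128):

import numpy as np

# Write a float32 array for: turboagents compress --input vectors.npy
np.save("vectors.npy", np.random.rand(1024, 128).astype("float32"))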
Examples
python3 examples/quickstart.py
python3 examples/bench_profiles.py
python3 examples/faiss_turborag.py
python3 examples/chroma_turborag.py
python3 examples/mlx_server_dry_run.py
Current Local Validation
- cached MLX 3B smoke test succeeded on mlx-community/Llama-3.2-3B-Instruct-4bit
- Chroma adapter smoke run succeeded on chromadb 1.5.5
- PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 uv run python -m pytest -q passes
Development
Common local commands:
PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 uv run python -m pytest -q
uv run mkdocs serve
uv build
Benchmark harness commands:
uv sync --extra rag --extra mlx
uv run python scripts/run_benchmark_matrix.py --output-dir benchmark-results/$(date +%Y%m%d-%H%M%S)
uv run python scripts/benchmark_needle.py --model mlx-community/Llama-3.2-3B-Instruct-4bit --context-tokens 2048 4096 8192 --output benchmark-results/needle-$(date +%Y%m%d-%H%M%S).json
Community and project health files are included in the repository.
Attribution
See ATTRIBUTION.md. This repository is not affiliated with Google Research.