Independent TurboQuant-style compression infrastructure for KV cache and RAG.
turboagents
Turbocharge AI Agents with TurboQuant
turboagents is a single Python package for TurboQuant-style KV-cache and vector
compression. It is being built as independent compression infrastructure that
can be used standalone or integrated into SuperOptix.
Repository: https://github.com/SuperagenticAI/turboagents
What It Is
turboagents is not an agent framework. It is the compression layer you put
under existing AI agents, inference engines, and RAG stacks so they can:
- hold longer contexts
- use less KV-cache memory
- store more embeddings at lower cost
- benchmark quality and memory tradeoffs explicitly
Think of it as:
- TurboQuant for real systems
- TurboRAG for vector retrieval stacks
- adapters and tooling around existing engines instead of a replacement for them
Who It Is For
turboagents is for teams and developers who already have:
- AI agents that hit memory limits on long prompts
- RAG systems with large embedding stores
- inference stacks built on MLX, llama.cpp, vLLM, FAISS, LanceDB, SurrealDB, or pgvector
- agent frameworks that need compression infrastructure, not another framework
How To Use It
Most users should think about turboagents in three ways.
1. Add It Under An Existing Agent Runtime
If you already have an agent system, keep the agent layer and use turboagents
to improve the inference or memory layer under it.
Examples:
- use turboagents.engines.mlx for MLX-based local agents
- use turboagents.engines.llamacpp to build llama.cpp runtime commands
- use turboagents.engines.vllm as an experimental runtime wrapper
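To see why a compression layer under the runtime matters, here is a back-of-the-envelope sketch of KV-cache size at fp16 versus ~3.5-bit coding. The layer/head counts are illustrative round numbers, not any specific model, and `kv_cache_bytes` is a hypothetical helper, not part of the turboagents API:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value):
    # K and V each store layers * kv_heads * head_dim values per token
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

# Illustrative 8B-class config at a 32k context
fp16 = kv_cache_bytes(32, 8, 128, 32_768, 2)        # 2 bytes per fp16 value
q35 = kv_cache_bytes(32, 8, 128, 32_768, 3.5 / 8)   # 3.5 bits per value

print(f"fp16 KV cache:    {fp16 / 2**30:.3f} GiB")  # 4.000 GiB
print(f"3.5-bit KV cache: {q35 / 2**30:.3f} GiB")   # 0.875 GiB
```

Under these assumptions the cache shrinks by a bit more than 4x, which is the headroom that longer contexts come from.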
2. Add It Under An Existing RAG Stack
If you already have retrieval, keep your current application logic and add TurboRAG where vectors are stored or searched.
Examples:
- use TurboFAISS when you want a local FAISS-backed retrieval path
- use TurboLanceDB or TurboSurrealDB when you want a sidecar/rerank integration
- use TurboPgvector when your application already depends on PostgreSQL
3. Use It As A Benchmark And Compression Tool
If you are still evaluating whether TurboQuant-style compression makes sense for your stack, use the CLI first:
turboagents doctor
turboagents bench kv
turboagents bench rag
turboagents compress
That gives you a way to validate fit before deeper integration work.
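The kind of quality check such benchmarks report can also be sketched by hand: quantize a toy corpus and measure how much of the exact top-k survives. The uniform scalar quantizer below is a crude stand-in for TurboQuant-style coding, not the package's actual codec:

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def quantize(vec, levels=8):
    # Per-vector uniform scalar quantizer (3 bits at levels=8);
    # a deliberately crude stand-in for a real codec.
    lo, hi = min(vec), max(vec)
    step = (hi - lo) / (levels - 1) or 1.0
    return [lo + round((x - lo) / step) * step for x in vec]

def recall_at_k(db, query, k):
    # Fraction of the exact top-k neighbours recovered after quantization.
    exact = sorted(range(len(db)), key=lambda i: -dot(query, db[i]))[:k]
    approx = sorted(range(len(db)), key=lambda i: -dot(query, quantize(db[i])))[:k]
    return len(set(exact) & set(approx)) / k

rng = random.Random(0)
db = [[rng.gauss(0, 1) for _ in range(64)] for _ in range(200)]
query = [rng.gauss(0, 1) for _ in range(64)]
print(recall_at_k(db, query, k=10))
```

If recall at your target bit width stays acceptable on data shaped like yours, deeper integration is worth the effort; if not, you find out before touching your stack.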
Usage Patterns
Existing Agent + MLX
Use TurboAgents to build or validate the MLX runtime path, then keep your existing agent code on top.
turboagents serve --backend mlx --model mlx-community/Qwen3-0.6B-4bit --dry-run
Existing RAG + FAISS
Use TurboRAG as the retrieval layer while keeping your current ingestion and agent orchestration logic.
from turboagents.rag import TurboFAISS
index = TurboFAISS(dim=128, bits=3.5, seed=0)
index.add(vectors)
results = index.search(query, k=5, rerank_top=16)
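The rerank_top parameter suggests a two-stage search: coarse scoring over compressed codes, then an exact rerank of a shortlist. Here is a minimal pure-Python sketch of that pattern, using 1-bit sign codes as a stand-in for the real compressed representation; the actual TurboFAISS internals may differ:

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def coarse_code(vec):
    # 1-bit sign code: the cheapest possible compressed representation.
    return [1.0 if x >= 0 else -1.0 for x in vec]

def search(db, query, k, rerank_top):
    # Stage 1: shortlist rerank_top candidates by score against their codes.
    shortlist = sorted(range(len(db)),
                       key=lambda i: -dot(query, coarse_code(db[i])))[:rerank_top]
    # Stage 2: rerank the shortlist with the full-precision vectors.
    return sorted(shortlist, key=lambda i: -dot(query, db[i]))[:k]

rng = random.Random(0)
db = [[rng.gauss(0, 1) for _ in range(32)] for _ in range(100)]
query = [rng.gauss(0, 1) for _ in range(32)]
print(search(db, query, k=5, rerank_top=16))
```

The design tradeoff is the usual one: a larger rerank_top recovers more of the exact top-k at the cost of more full-precision score evaluations.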
Existing Vector Database
If you already use a database-backed vector layer, TurboAgents should sit beside that store first, then move deeper only if it proves useful.
Examples:
- TurboLanceDB for LanceDB candidate search + TurboAgents rerank
- TurboSurrealDB for SurrealDB candidate search + TurboAgents rerank
- TurboPgvector for PostgreSQL-backed storage and retrieval
Current Status
- structured quantization payloads with binary serialization
- Fast Walsh-Hadamard rotation with cached sign patterns
- PolarQuant-style spherical angle/radius stage
- seeded QJL-style residual sketch
- synthetic benchmark CLI with KV, RAG, and paper-style reports
- real adapter surfaces for:
- FAISS
- LanceDB
- SurrealDB
- pgvector client adapter
- MLX runtime/server wrapper
- llama.cpp runtime wrapper
- experimental vLLM runtime wrapper
- proxy/server baseline
- lightweight examples and docs
Still not finished:
- full paper-faithful production math
- native engine kernels for llama.cpp / MLX / vLLM
- live Postgres validation for pgvector on this machine
- large benchmark datasets and long-context benchmark matrix
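The "Fast Walsh-Hadamard rotation with cached sign patterns" item above can be illustrated with a short pure-Python sketch (not the package's implementation): a seeded ±1 sign flip followed by an orthonormally scaled FWHT, which spreads energy across coordinates while preserving vector norms.

```python
import random

def fwht(vec):
    # In-place Fast Walsh-Hadamard transform; length must be a power of two.
    n = len(vec)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                x, y = vec[j], vec[j + h]
                vec[j], vec[j + h] = x + y, x - y
        h *= 2
    return vec

def make_signs(dim, seed):
    # One random ±1 sign pattern per (dim, seed), cacheable and reproducible.
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(dim)]

def rotate(vec, signs):
    # Randomized rotation: sign flip, FWHT, then 1/sqrt(n) scaling so the
    # overall transform is orthonormal (norm-preserving).
    n = len(vec)
    out = fwht([s * v for s, v in zip(signs, vec)])
    scale = n ** -0.5
    return [scale * v for v in out]
```

Because the rotation is norm-preserving and fully determined by the seed, it can be applied before quantization and undone exactly at decode time.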
Install
pip install -e .
Optional extras:
pip install -e ".[bench]"
pip install -e ".[docs]"
pip install -e ".[mlx]"
pip install -e ".[vllm]"
pip install -e ".[rag]"
pip install -e ".[all]"
CLI
turboagents doctor
turboagents bench kv --format json
turboagents bench rag --format markdown
turboagents bench paper
turboagents compress --input vectors.npy --output vectors.npz --head-dim 128
turboagents serve --backend mlx --model mlx-community/Qwen3-0.6B-4bit --dry-run
turboagents serve --backend vllm --model meta-llama/Llama-3.1-8B-Instruct --dry-run
Examples
python3 examples/quickstart.py
python3 examples/bench_profiles.py
python3 examples/faiss_turborag.py
python3 examples/mlx_server_dry_run.py
Current Local Validation
- cached MLX 3B smoke test succeeded on mlx-community/Llama-3.2-3B-Instruct-4bit
- PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 python3 -m pytest -q passes
Attribution
See ATTRIBUTION.md. This repository is not affiliated with Google Research.