Independent TurboQuant-style compression infrastructure for KV cache and RAG.
turboagents
Turbocharge AI Agents with TurboQuant
turboagents is a single Python package for TurboQuant-style KV-cache and vector
compression. It is being built as independent compression infrastructure that
can be used standalone or integrated into SuperOptix.
Repository: https://github.com/SuperagenticAI/turboagents
What It Is
turboagents is not an agent framework. It is the compression layer you put
under existing AI agents, inference engines, and RAG stacks so they can:
- hold longer contexts
- use less KV-cache memory
- store more embeddings at lower cost
- benchmark quality and memory tradeoffs explicitly
Think of it as:
- TurboQuant for real systems
- TurboRAG for vector retrieval stacks
- adapters and tooling around existing engines instead of a replacement for them
Who It Is For
turboagents is for teams and developers who already have:
- AI agents that hit memory limits on long prompts
- RAG systems with large embedding stores
- inference stacks built on MLX, llama.cpp, vLLM, FAISS, LanceDB, SurrealDB, or pgvector
- agent frameworks that need compression infrastructure, not another framework
How To Use It
Most users should think about turboagents in three ways.
1. Add It Under An Existing Agent Runtime
If you already have an agent system, keep the agent layer and use turboagents
to improve the inference or memory layer under it.
Examples:
- use turboagents.engines.mlx for MLX-based local agents
- use turboagents.engines.llamacpp to build llama.cpp runtime commands
- use turboagents.engines.vllm as an experimental runtime wrapper
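To see why a compression layer under the runtime matters, here is a back-of-the-envelope sketch of KV-cache size at fp16 versus ~3.5-bit coding. The layer/head counts are illustrative round numbers, not any specific model, and `kv_cache_bytes` is a hypothetical helper, not part of the turboagents API:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value):
    # K and V each store layers * kv_heads * head_dim values per token
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

# Illustrative 8B-class config at a 32k context
fp16 = kv_cache_bytes(32, 8, 128, 32_768, 2)        # 2 bytes per fp16 value
q35 = kv_cache_bytes(32, 8, 128, 32_768, 3.5 / 8)   # 3.5 bits per value

print(f"fp16 KV cache:    {fp16 / 2**30:.3f} GiB")  # 4.000 GiB
print(f"3.5-bit KV cache: {q35 / 2**30:.3f} GiB")   # 0.875 GiB
```

Under these assumptions the cache shrinks by a bit more than 4x, which is the headroom that longer contexts come from.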
2. Add It Under An Existing RAG Stack
If you already have retrieval, keep your current application logic and add TurboRAG where vectors are stored or searched.
Examples:
- use TurboFAISS when you want a local FAISS-backed retrieval path
- use TurboLanceDB or TurboSurrealDB when you want a sidecar/rerank integration
- use TurboPgvector when your application already depends on PostgreSQL
3. Use It As A Benchmark And Compression Tool
If you are still evaluating whether TurboQuant-style compression makes sense for your stack, use the CLI first:
turboagents doctor
turboagents bench kv
turboagents bench rag
turboagents compress
That gives you a way to validate fit before deeper integration work.
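The kind of quality check such benchmarks report can also be sketched by hand: quantize a toy corpus and measure how much of the exact top-k survives. The uniform scalar quantizer below is a crude stand-in for TurboQuant-style coding, not the package's actual codec:

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def quantize(vec, levels=8):
    # Per-vector uniform scalar quantizer (3 bits at levels=8);
    # a deliberately crude stand-in for a real codec.
    lo, hi = min(vec), max(vec)
    step = (hi - lo) / (levels - 1) or 1.0
    return [lo + round((x - lo) / step) * step for x in vec]

def recall_at_k(db, query, k):
    # Fraction of the exact top-k neighbours recovered after quantization.
    exact = sorted(range(len(db)), key=lambda i: -dot(query, db[i]))[:k]
    approx = sorted(range(len(db)), key=lambda i: -dot(query, quantize(db[i])))[:k]
    return len(set(exact) & set(approx)) / k

rng = random.Random(0)
db = [[rng.gauss(0, 1) for _ in range(64)] for _ in range(200)]
query = [rng.gauss(0, 1) for _ in range(64)]
print(recall_at_k(db, query, k=10))
```

If recall at your target bit width stays acceptable on data shaped like yours, deeper integration is worth the effort; if not, you find out before touching your stack.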
Usage Patterns
Existing Agent + MLX
Use TurboAgents to build or validate the MLX runtime path, then keep your existing agent code on top.
turboagents serve --backend mlx --model mlx-community/Qwen3-0.6B-4bit --dry-run
Existing RAG + FAISS
Use TurboRAG as the retrieval layer while keeping your current ingestion and agent orchestration logic.
from turboagents.rag import TurboFAISS
index = TurboFAISS(dim=128, bits=3.5, seed=0)
index.add(vectors)
results = index.search(query, k=5, rerank_top=16)
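The rerank_top parameter suggests a two-stage search: coarse scoring over compressed codes, then an exact rerank of a shortlist. Here is a minimal pure-Python sketch of that pattern, using 1-bit sign codes as a stand-in for the real compressed representation; the actual TurboFAISS internals may differ:

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def coarse_code(vec):
    # 1-bit sign code: the cheapest possible compressed representation.
    return [1.0 if x >= 0 else -1.0 for x in vec]

def search(db, query, k, rerank_top):
    # Stage 1: shortlist rerank_top candidates by score against their codes.
    shortlist = sorted(range(len(db)),
                       key=lambda i: -dot(query, coarse_code(db[i])))[:rerank_top]
    # Stage 2: rerank the shortlist with the full-precision vectors.
    return sorted(shortlist, key=lambda i: -dot(query, db[i]))[:k]

rng = random.Random(0)
db = [[rng.gauss(0, 1) for _ in range(32)] for _ in range(100)]
query = [rng.gauss(0, 1) for _ in range(32)]
print(search(db, query, k=5, rerank_top=16))
```

The design tradeoff is the usual one: a larger rerank_top recovers more of the exact top-k at the cost of more full-precision score evaluations.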
Existing Vector Database
If you already use a database-backed vector layer, TurboAgents should sit beside that store first, then move deeper only if it proves useful.
Examples:
- TurboLanceDB for LanceDB candidate search + TurboAgents rerank
- TurboSurrealDB for SurrealDB candidate search + TurboAgents rerank
- TurboPgvector for PostgreSQL-backed storage and retrieval
Current Status
- structured quantization payloads with binary serialization
- Fast Walsh-Hadamard rotation with cached sign patterns
- PolarQuant-style spherical angle/radius stage
- seeded QJL-style residual sketch
- synthetic benchmark CLI with KV, RAG, and paper-style reports
- real adapter surfaces for:
- FAISS
- LanceDB
- SurrealDB
- pgvector client adapter
- MLX runtime/server wrapper
- llama.cpp runtime wrapper
- experimental vLLM runtime wrapper
- proxy/server baseline
- lightweight examples and docs
Still not finished:
- full paper-faithful production math
- native engine kernels for llama.cpp / MLX / vLLM
- live Postgres validation for pgvector on this machine
- large benchmark datasets and long-context benchmark matrix
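The "Fast Walsh-Hadamard rotation with cached sign patterns" item above can be illustrated with a short pure-Python sketch (not the package's implementation): a seeded ±1 sign flip followed by an orthonormally scaled FWHT, which spreads energy across coordinates while preserving vector norms.

```python
import random

def fwht(vec):
    # In-place Fast Walsh-Hadamard transform; length must be a power of two.
    n = len(vec)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                x, y = vec[j], vec[j + h]
                vec[j], vec[j + h] = x + y, x - y
        h *= 2
    return vec

def make_signs(dim, seed):
    # One random ±1 sign pattern per (dim, seed), cacheable and reproducible.
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(dim)]

def rotate(vec, signs):
    # Randomized rotation: sign flip, FWHT, then 1/sqrt(n) scaling so the
    # overall transform is orthonormal (norm-preserving).
    n = len(vec)
    out = fwht([s * v for s, v in zip(signs, vec)])
    scale = n ** -0.5
    return [scale * v for v in out]
```

Because the rotation is norm-preserving and fully determined by the seed, it can be applied before quantization and undone exactly at decode time.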
Install
pip install -e .
Optional extras:
pip install -e ".[bench]"
pip install -e ".[docs]"
pip install -e ".[mlx]"
pip install -e ".[vllm]"
pip install -e ".[rag]"
pip install -e ".[all]"
CLI
turboagents doctor
turboagents bench kv --format json
turboagents bench rag --format markdown
turboagents bench paper
turboagents compress --input vectors.npy --output vectors.npz --head-dim 128
turboagents serve --backend mlx --model mlx-community/Qwen3-0.6B-4bit --dry-run
turboagents serve --backend vllm --model meta-llama/Llama-3.1-8B-Instruct --dry-run
Examples
python3 examples/quickstart.py
python3 examples/bench_profiles.py
python3 examples/faiss_turborag.py
python3 examples/mlx_server_dry_run.py
Current Local Validation
- cached MLX 3B smoke test succeeded on mlx-community/Llama-3.2-3B-Instruct-4bit
- PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 python3 -m pytest -q passes
Attribution
See ATTRIBUTION.md. This repository is not affiliated with Google Research.