
Near-optimal online vector quantization for OpenClaw context compression


openclaw-turboquant


Near-optimal online vector quantization for OpenClaw context compression, based on the TurboQuant algorithm from Google Research (arXiv:2504.19874).

Status

| Component | Status | Description |
|---|---|---|
| Library API | ✅ Ready | Core quantization algorithms fully implemented |
| CLI | ✅ Ready | `benchmark`, `compress`, and `retrieve` commands available |
| Agent Skill | ✅ Ready | CLI commands can be used independently by agents |
| Context Engine Plugin | 🚧 WIP | Interface defined; core integration logic not yet implemented |

Overview

TurboQuant achieves near-optimal distortion (within ~2.7× of the information-theoretic lower bound) using a simple two-stage pipeline:

  1. Random Rotation: Apply a random orthogonal matrix, drawn from the Haar measure via QR decomposition of a Gaussian matrix, to spread information uniformly across coordinates.
  2. Scalar Quantization: Quantize each rotated coordinate independently using a Lloyd-Max codebook optimized for the Beta distribution that coordinates of a rotated unit vector follow.
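The two stages can be sketched with NumPy alone. This is an illustrative sketch, not the library's code: `haar_rotation`, `quantize`, `dequantize`, and `GAUSS_2BIT_LEVELS` are hypothetical names, and the codebook is the textbook 2-bit Lloyd-Max quantizer for a standard Gaussian scaled by $1/\sqrt{d}$, used here as a stand-in for the exact Beta-fitted codebook TurboQuant describes.

```python
import numpy as np

# Textbook 2-bit Lloyd-Max levels for a standard Gaussian (an approximation:
# TurboQuant fits its codebook to the exact coordinate distribution instead).
# Rotated coordinates of a unit vector have variance 1/d, so we scale by 1/sqrt(d).
GAUSS_2BIT_LEVELS = np.array([-1.5104, -0.4528, 0.4528, 1.5104])


def haar_rotation(d: int, rng: np.random.Generator) -> np.ndarray:
    """Sample a Haar-distributed orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Multiply column j by sign(r_jj); without this fix QR output is not exactly Haar.
    return q * np.sign(np.diag(r))


def quantize(x: np.ndarray, rot: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    y = rot @ x                                    # stage 1: random rotation
    dist = np.abs(y[:, None] - codebook[None, :])  # stage 2: per-coordinate distances
    return dist.argmin(axis=1).astype(np.uint8)    # nearest-centroid index per coordinate


def dequantize(idx: np.ndarray, rot: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    return rot.T @ codebook[idx]                   # look up centroids, then undo rotation


rng = np.random.default_rng(0)
d = 128
rot = haar_rotation(d, rng)
codebook = GAUSS_2BIT_LEVELS / np.sqrt(d)
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                             # unit-norm input vector
x_hat = dequantize(quantize(x, rot, codebook), rot, codebook)
squared_error = float(np.sum((x - x_hat) ** 2))
```

Because the rotation is orthogonal, the squared error in the rotated space equals the error in the original space, which is why a per-coordinate codebook is enough.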

Two quantization modes are provided:

| Mode | Use case | Description |
|---|---|---|
| MSE | Reconstruction | Minimizes mean squared error via Lloyd-Max scalar quantization at b bits per coordinate |
| Product | Inner-product estimation | Uses MSE at (b−1) bits plus 1-bit QJL (Quantized Johnson-Lindenstrauss) on the residual for unbiased inner-product estimation |
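Both modes spend b bits per coordinate; they differ only in how those bits are split. The arithmetic below is a rough storage estimate under a hypothetical layout: it assumes each stored norm (the vector norm, plus the residual norm in product mode) costs one float32, which this README does not specify. `bits_per_vector` and `compression_ratio` are illustrative names, not library functions.

```python
FLOAT32_BITS = 32


def bits_per_vector(dim: int, bit_width: int, mode: str,
                    norm_bits: int = FLOAT32_BITS) -> int:
    """Bits to store one compressed vector under an assumed layout."""
    if mode == "mse":
        # b-bit codebook index per coordinate, plus one stored vector norm (assumed).
        return dim * bit_width + norm_bits
    if mode == "prod":
        # (b-1)-bit MSE index + 1 QJL sign bit per coordinate,
        # plus the vector norm and the residual norm (both assumed float32).
        return dim * (bit_width - 1) + dim + 2 * norm_bits
    raise ValueError(f"unknown mode: {mode!r}")


def compression_ratio(dim: int, bit_width: int, mode: str) -> float:
    """Size of the raw float32 vector divided by the compressed size."""
    return dim * FLOAT32_BITS / bits_per_vector(dim, bit_width, mode)
```

With `dim=128` and `bit_width=4`, this gives 544 bits (68 bytes) for MSE mode versus 4096 bits (512 bytes) for the raw float32 vector, about 7.5× smaller, and about 7.1× for product mode, under the stated norm assumption.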

Installation

Requires Python ≥ 3.13 and uv.

```bash
# Clone the repository
git clone https://github.com/openclaw/openclaw-turboquant.git
cd openclaw-turboquant

# Install with uv
uv sync
```

Quick Start

Library API

```python
import numpy as np
from openclaw_turboquant import TurboQuantMSE, TurboQuantProd

# MSE quantization (for reconstruction)
mse_q = TurboQuantMSE(dim=128, bit_width=4, seed=42)
x = np.random.randn(128)
compressed = mse_q.quantize(x)
reconstructed = mse_q.dequantize(compressed)

# Inner-product quantization
prod_q = TurboQuantProd(dim=128, bit_width=4, seed=42)
x, y = np.random.randn(128), np.random.randn(128)
cx, cy = prod_q.quantize(x), prod_q.quantize(y)
ip_estimate = prod_q.estimate_inner_product(cx, cy)
```

Context Store (OpenClaw Integration)

```python
import numpy as np
from openclaw_turboquant.context_engine import ContextStore

store = ContextStore(dim=128, bit_width=4, seed=42)

# Placeholder 128-dim vectors; in practice these come from an embedding model
embedding = np.random.randn(128)
query_embedding = np.random.randn(128)

store.ingest("key1", embedding, "Some text content", metadata={"source": "doc.md"})

# Retrieve top-k similar entries
results = store.retrieve_top_k(query_embedding, k=5)

# Assemble context within token budget
context = store.assemble_context(query_embedding, max_tokens=4096)

# Compact the store (keep 50% most relevant entries)
store.compact(keep_ratio=0.5, query_embedding=query_embedding)
```
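The README does not describe how `assemble_context` fills the `max_tokens` budget. One common approach is greedy packing: walk entries in descending relevance and add each one whose text still fits. The sketch below illustrates that idea only; `Entry`, `approx_tokens`, and `assemble_within_budget` are hypothetical names, and the whitespace word count is an assumed stand-in for a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    key: str
    text: str
    score: float  # relevance to the query, e.g. an estimated inner product


def approx_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: one token per whitespace word.
    return len(text.split())


def assemble_within_budget(entries: list[Entry], max_tokens: int) -> tuple[list[Entry], int]:
    """Greedily take the highest-scoring entries whose texts fit in max_tokens."""
    chosen: list[Entry] = []
    used = 0
    for entry in sorted(entries, key=lambda e: e.score, reverse=True):
        cost = approx_tokens(entry.text)
        if used + cost > max_tokens:
            continue  # too large for the remaining budget; a shorter entry may still fit
        chosen.append(entry)
        used += cost
    return chosen, used


entries = [
    Entry("a", "one two three four five", 0.9),
    Entry("b", " ".join(["word"] * 10), 0.8),
    Entry("c", "x y z", 0.7),
]
chosen, used = assemble_within_budget(entries, max_tokens=9)
context = "\n\n".join(e.text for e in chosen)  # separators are not counted here
```

Skipping an oversized entry rather than stopping lets a smaller, slightly less relevant entry ("c" above) use the leftover budget.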

CLI

```bash
# Run benchmarks
openclaw-turboquant benchmark --dim 128 --bits 4

# Compress vectors from a .npy file
openclaw-turboquant compress --input vectors.npy --output compressed.npz --bits 4

# Retrieve similar vectors
openclaw-turboquant retrieve --store compressed.npz --query query.npy --top-k 5
```
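To try the `compress` and `retrieve` commands without real embeddings, you can generate the `.npy` inputs with NumPy. The README does not state which array shapes the CLI expects; this sketch assumes a 2-D float32 array with one vector per row for `--input` and a single 1-D vector of the same dimension for `--query`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed layout: one vector per row, shape (n_vectors, dim), float32.
vectors = rng.standard_normal((1000, 128)).astype(np.float32)
# Assumed layout for the query: a single 1-D vector of the same dimension.
query = rng.standard_normal(128).astype(np.float32)

np.save("vectors.npy", vectors)
np.save("query.npy", query)
```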

OpenClaw Integration

Context Engine Plugin (WIP)

Note: The plugin interface is defined but the core integration logic (embedding API calls, Python CLI bridge) is not yet implemented. Contributions welcome!

The plugin/ directory contains a Context Engine plugin intended to compress embeddings across the ingest → assemble → compact → afterTurn lifecycle:

  • plugin/openclaw.plugin.json — Plugin manifest (kind: context-engine)
  • plugin/index.ts — TypeScript entry point registering the turboquant-engine

Configuration options (via plugin settings):

| Parameter | Default | Description |
|---|---|---|
| `bitWidth` | 4 | Bits per coordinate (1–8) |
| `embeddingDim` | 128 | Vector dimension |
| `topK` | 10 | Number of results for retrieval |
| `compactKeepRatio` | 0.5 | Fraction of entries kept during compaction |
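As an illustration of what `compactKeepRatio` controls, the sketch below keeps the given fraction of entries that score highest against a query, in the spirit of `store.compact(keep_ratio=0.5, query_embedding=...)`. It is hypothetical: the plugin's compaction logic is not yet implemented, and the use of cosine similarity, rounding up with `math.ceil`, and the `compact` function itself are assumptions made for this example.

```python
import math

import numpy as np


def compact(keys: list[str], embeddings: np.ndarray, query: np.ndarray,
            keep_ratio: float = 0.5) -> tuple[list[str], np.ndarray]:
    """Keep the ceil(keep_ratio * n) entries most cosine-similar to query.

    Hypothetical sketch of a compaction pass driven by compactKeepRatio.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = math.ceil(keep_ratio * len(keys))
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = unit @ (query / np.linalg.norm(query))
    # Indices of the top-scoring entries, restored to their original insertion order.
    top = np.sort(np.argsort(-scores, kind="stable")[:n_keep])
    return [keys[i] for i in top], embeddings[top]


keys = ["a", "b", "c", "d"]
embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
kept_keys, kept = compact(keys, embeddings, np.array([1.0, 0.0]), keep_ratio=0.5)
```

Here the default ratio of 0.5 keeps the two entries pointing closest to the query ("a" and "c") and drops the orthogonal and opposite ones.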

AgentSkills

The skills/turboquant/SKILL.md skill provides AI agents with instructions for using the TurboQuant CLI and library API.

Algorithm Details

Lloyd-Max Codebook

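After random rotation, each coordinate of a unit-norm vector in $\mathbb{R}^d$ follows a scaled Beta distribution (equivalently, $(x+1)/2 \sim \mathrm{Beta}\big(\tfrac{d-1}{2}, \tfrac{d-1}{2}\big)$) with mean 0 and variance $1/d$: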

$$f(x; d) = \frac{\Gamma(d/2)}{\Gamma(1/2)\,\Gamma((d-1)/2)} \cdot (1 - x^2)^{(d-3)/2}, \quad x \in [-1, 1]$$

The Lloyd-Max algorithm iteratively optimizes codebook centroids and decision boundaries to minimize expected distortion under this distribution.
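A minimal numerical sketch of this procedure: discretize $f(x; d)$ on a fine grid, then alternate between setting decision boundaries at midpoints of neighbouring centroids and moving each centroid to the conditional mean of its cell. The grid-based approach, the quantile initialization, and the names `coord_density` and `lloyd_max_codebook` are choices made for this example, not necessarily how the library computes its codebooks.

```python
import numpy as np
from math import lgamma


def coord_density(x: np.ndarray, d: int) -> np.ndarray:
    """Density f(x; d) of one coordinate of a uniformly random unit vector in R^d."""
    if d < 3:
        raise ValueError("this form of the density needs d >= 3")
    log_norm = lgamma(d / 2) - lgamma(0.5) - lgamma((d - 1) / 2)
    return np.exp(log_norm) * (1.0 - x**2) ** ((d - 3) / 2)


def lloyd_max_codebook(d: int, bits: int, grid_size: int = 20001,
                       max_iter: int = 500, tol: float = 1e-10):
    """Lloyd-Max centroids and boundaries for f(x; d), computed on a discrete grid."""
    x = np.linspace(-1.0, 1.0, grid_size)
    w = coord_density(x, d)
    w = w / w.sum()                       # probability mass of each grid point
    k = 2**bits
    # Initialise centroids at the middle quantiles of k equal-probability cells.
    cdf = np.cumsum(w)
    centroids = x[np.searchsorted(cdf, (np.arange(k) + 0.5) / k)]
    for _ in range(max_iter):
        # Step 1: decision boundaries are midpoints between neighbouring centroids.
        bounds = (centroids[:-1] + centroids[1:]) / 2
        cell = np.searchsorted(bounds, x)
        # Step 2: each centroid moves to the conditional mean of its cell.
        mass = np.bincount(cell, weights=w, minlength=k)
        moment = np.bincount(cell, weights=w * x, minlength=k)
        new = np.where(mass > 0, moment / np.maximum(mass, 1e-300), centroids)
        converged = np.max(np.abs(new - centroids)) < tol
        centroids = new
        if converged:
            break
    bounds = (centroids[:-1] + centroids[1:]) / 2
    return centroids, bounds


d = 128
grid = np.linspace(-1.0, 1.0, 20001)
total_mass = float(coord_density(grid, d).sum() * (grid[1] - grid[0]))  # Riemann sum
c2, b2 = lloyd_max_codebook(d, bits=2)
c1_uniform, _ = lloyd_max_codebook(3, bits=1)  # d = 3: f is uniform on [-1, 1]
```

Two sanity checks follow from the formula: for $d = 3$ the density is uniform on $[-1, 1]$, so the 1-bit centroids should sit near $\pm 0.5$; for large $d$ the density approaches $\mathcal{N}(0, 1/d)$, so `c2 * np.sqrt(d)` should land close to the Gaussian 2-bit levels $\pm 0.4528$ and $\pm 1.510$.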

QJL Transform

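For inner-product estimation, TurboQuant applies a 1-bit Quantized Johnson-Lindenstrauss (QJL) projection to the residual $r = x - \tilde{x}_{\text{mse}}$ left by the $(b-1)$-bit MSE stage:

$$\hat{z} = \operatorname{sign}(S\, r)$$

where $S \in \mathbb{R}^{d \times d}$ is a random matrix with i.i.d. $\mathcal{N}(0, 1)$ entries. The dequantized vector $\tilde{x} = \tilde{x}_{\text{mse}} + \frac{\sqrt{\pi/2}}{d}\,\lVert r \rVert_2\, S^{\top} \hat{z}$ gives an unbiased estimate of the inner product with a full-precision vector $y$: $\mathbb{E}[\langle y, \tilde{x} \rangle] = \langle y, x \rangle$.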
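The unbiasedness of the QJL step can be checked numerically: for a Gaussian matrix $S$ with $m$ rows, $\frac{\sqrt{\pi/2}}{m}\,\lVert x \rVert\,\langle S y, \operatorname{sign}(S x) \rangle$ has expectation $\langle y, x \rangle$. This Monte Carlo sketch averages the estimate over many independent draws of $S$; `qjl_encode` and `qjl_inner_product` are illustrative names, and in the product mode the encoded vector would be the residual $r$ rather than a raw input.

```python
import numpy as np


def qjl_encode(x: np.ndarray, S: np.ndarray) -> tuple[np.ndarray, float]:
    """1-bit QJL: keep only sign(S x) plus the norm of x."""
    return np.sign(S @ x), float(np.linalg.norm(x))


def qjl_inner_product(y: np.ndarray, signs: np.ndarray, x_norm: float,
                      S: np.ndarray) -> float:
    """Estimate <y, x> as sqrt(pi/2)/m * ||x|| * <S y, sign(S x)>."""
    m = S.shape[0]
    return float(np.sqrt(np.pi / 2) / m * x_norm * ((S @ y) @ signs))


rng = np.random.default_rng(0)
d = 32
x = rng.standard_normal(d)
x /= np.linalg.norm(x)
y = rng.standard_normal(d)
y /= np.linalg.norm(y)

# Average the estimate over many independent Gaussian projections S (m = d rows).
estimates = []
for _ in range(4000):
    S = rng.standard_normal((d, d))
    signs, x_norm = qjl_encode(x, S)
    estimates.append(qjl_inner_product(y, signs, x_norm, S))

true_ip = float(x @ y)
mean_estimate = float(np.mean(estimates))
```

Any single estimate is noisy, but the average converges to `true_ip`; only $y$ enters at full precision, matching the one-sided form of the unbiasedness statement above.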

Benchmarks

Measured with `uv run pytest benchmarks/ --benchmark-only` (mean time per call; absolute numbers depend on the hardware):

| Operation | Dimension | Mean time |
|---|---|---|
| MSE quantize | 64 | ~4.6 µs |
| MSE dequantize | 64 | ~1.2 µs |
| MSE batch (100 vectors) | 64 | ~473 µs |
| MSE quantize | 256 | ~9.4 µs |
| Product quantize | 64 | ~11 µs |
| Product dequantize | 64 | ~3.6 µs |
| Product inner product | 64 | ~4.1 µs |
| QJL quantize | 64 | ~2.4 µs |
| QJL dequantize | 64 | ~1.2 µs |
| Context Store ingest | 64 | ~12 µs |
| Context Store retrieve (100 entries) | 64 | ~406 µs |

Development

```bash
# Run tests
uv run pytest

# Run benchmarks
uv run pytest benchmarks/ --benchmark-only -v

# Lint & format
uv run ruff check src/ tests/ benchmarks/
uv run ruff format src/ tests/ benchmarks/

# Type check
uv run mypy src/
```

License

MIT

Download files

| File | Type | Size |
|---|---|---|
| openclaw_turboquant-0.1.0.tar.gz | Source distribution | 15.2 kB |
| openclaw_turboquant-0.1.0-py3-none-any.whl | Built distribution (Python 3) | 19.4 kB |

Both files were uploaded from CI (Ubuntu 24.04) with uv 0.11.2 using Trusted Publishing.

Hashes for openclaw_turboquant-0.1.0.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | 511c424b93341080885500d69f48aa4f9149b2d81b5352a9e76c33ea46f59784 |
| MD5 | e9422a10ee09481ae3f4b06c8fc4ca53 |
| BLAKE2b-256 | ed2e9154c8ce2c1433d91652bc18930e2a65ceb559e4fe7a6ebcfccb04cddbcb |

Hashes for openclaw_turboquant-0.1.0-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 9c9214ed1d196ebbcb6cbeaccf3f26afe398df175de429a723a67b144e464717 |
| MD5 | 966526c93b78906ff19575247a2ddcd2 |
| BLAKE2b-256 | 3937cf72a9e45138f586ac0f1f7f9e04d5f30f4493ddc2f7ad6001a9a97bcd55 |
