ThriftLM
Semantic caching layer for LLM applications. Stop paying for the same call twice: Redis-fast exact hits, numpy-powered near-miss matching, PII-scrubbed by default.
pip install thriftlm # library only
pip install thriftlm[api] # + dashboard server + Supabase backend
Overview
Every repeated or semantically similar LLM query burns tokens and adds latency. ThriftLM intercepts these calls with a three-tier cache — exact hash match in Redis, cosine similarity search in a local numpy index, and HNSW vector search in Supabase — before any request reaches your LLM provider.
ThriftLM achieves a 73.5% hit rate at threshold=0.82 on the Quora Question Pairs benchmark, and the median semantic cache hit returns in ~1ms versus 2–12 seconds for a live LLM call.
How It Works
query
│
▼
┌─────────────────┐ HIT → return instantly (~0.5ms)
│ Redis │
│ (exact hash) │
└────────┬────────┘
│ MISS
▼
┌─────────────────┐ HIT → Supabase PK fetch → return (~50ms)
│ Local Numpy │
│ Index (cosine) │
└────────┬────────┘
│ MISS
▼
┌─────────────────┐
│ LLM Call │ Your llm_fn() called here
│ (your function)│
└────────┬────────┘
│
▼
┌─────────────────┐
│ PII Scrubbing │ Presidio strips names, emails, phone numbers
│ (response only)│
└────────┬────────┘
│
▼
Store in Supabase + LocalIndex + Redis
Cache hit order:
- Redis — exact embedding hash, microseconds, no DB call
- Local numpy index — cosine similarity matmul, ~1ms, Supabase PK fetch for response
- LLM — cache miss only, full latency, stored after Presidio scrub
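The hit order above can be sketched as follows. This is a minimal illustration, not ThriftLM's internal API: the function names are stand-ins, and the backends are passed in as plain callables.

```python
import hashlib

def tiered_lookup(query, redis_get, local_search, supabase_fetch, call_llm, store_all):
    # Tier 1: exact hash match in Redis (microseconds, no DB call)
    key = hashlib.sha256(query.encode()).hexdigest()
    hit = redis_get(key)
    if hit is not None:
        return hit

    # Tier 2: cosine similarity in the local numpy index (~1ms);
    # a hit yields a Supabase primary key, fetched in one round trip
    pk = local_search(query)
    if pk is not None:
        return supabase_fetch(pk)

    # Tier 3: true miss: pay full LLM latency, then populate every tier
    # (ThriftLM scrubs PII from the response before storing)
    response = call_llm(query)
    store_all(key, query, response)
    return response
```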
Quickstart
Prerequisites
1. Install
pip install thriftlm # library only
pip install thriftlm[api] # also enables thriftlm serve + self-hosted backend
2. Set up Supabase
Run supabase/setup.sql in your Supabase SQL editor. It creates:
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE cache_entries (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
api_key TEXT NOT NULL,
query TEXT NOT NULL,
response TEXT NOT NULL,
embedding VECTOR(384) NOT NULL,
created_at TIMESTAMPTZ DEFAULT now(),
last_hit_at TIMESTAMPTZ,
hit_count INTEGER DEFAULT 0
);
CREATE INDEX cache_entries_embedding_idx
ON cache_entries
USING hnsw (embedding vector_cosine_ops);
Plus two RPC functions (match_cache_entries, increment_api_key_counters) — see the full file for those.
3. Configure environment
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-anon-key
REDIS_URL=redis://localhost:6379
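A quick preflight check before integrating can confirm all three variables are set. This is a stdlib-only helper for your own scripts, not part of ThriftLM; the variable names come from the step above.

```python
import os

REQUIRED = ("SUPABASE_URL", "SUPABASE_KEY", "REDIS_URL")

def missing_env() -> list[str]:
    """Return the names of any required variables that are unset or empty."""
    return [name for name in REQUIRED if not os.environ.get(name)]
```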
4. Run Redis
docker compose up -d
5. Integrate
from thriftlm import SemanticCache
import openai

# Initialize once per process
cache = SemanticCache(threshold=0.85, api_key="your-key")

def call_llm(query: str) -> str:
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content

# Drop-in wrapper — handles cache check + LLM fallback automatically
response = cache.get_or_call("Explain semantic caching", call_llm)

# Near-duplicate → instant cache hit, no LLM called
response2 = cache.get_or_call("What is semantic caching?", call_llm)
6. View your metrics
thriftlm serve --api-key your-key
# → ThriftLM dashboard → http://localhost:8000
# → Opens browser automatically
Requires pip install thriftlm[api]. See Local Dashboard below.
Local Dashboard (thriftlm serve)
thriftlm serve starts a local FastAPI server at localhost:8000 that serves a live metrics dashboard and reads directly from your own Supabase — no hosted service, no external dependency.
┌──────────────────────────────┐
your browser → │ thriftlm serve (localhost) │
│ GET / → dashboard.html │
│ GET /metrics → Supabase query│
└──────────────┬───────────────┘
│ direct SQL
▼
your Supabase
(api_keys table +
cache_entries table)
Usage:
# Start dashboard, auto-opens http://localhost:8000
thriftlm serve --api-key sc_xxx
# Custom port
thriftlm serve --api-key sc_xxx --port 9000
# Bind to all interfaces (LAN access)
thriftlm serve --api-key sc_xxx --host 0.0.0.0 --port 8080
# Skip auto-open
thriftlm serve --api-key sc_xxx --no-browser
What it shows — updates every 30 seconds:
- Hit rate (%) and total queries
- Tokens saved and estimated cost saved ($0.002/1K tokens blended)
- Top 5 most-hit cached queries with timestamps
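At the blended rate above, the cost estimate is simple arithmetic. A quick sanity check; note the $0.002/1K figure is the dashboard's assumption, not your provider's actual pricing:

```python
def estimated_cost_saved(tokens_saved: int, rate_per_1k: float = 0.002) -> float:
    """Blended-rate estimate matching the dashboard's cost display."""
    return tokens_saved / 1000 * rate_per_1k

print(estimated_cost_saved(1_500_000))  # 1.5M tokens saved -> 3.0 (dollars)
```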
How the key works: The key you pass to --api-key is the same api_key you used in SemanticCache(api_key="..."). It namespaces your cache in Supabase and authenticates the /metrics endpoint — no separate key management needed.
Self-hosted API Backend (api/)
The api/ directory is a multi-tenant FastAPI backend for teams that want to centralize caching across multiple services. Clients call /lookup and /store instead of connecting to Supabase directly.
client app → POST /lookup → api/ backend → Supabase
→ Redis
Run locally
pip install thriftlm[api]
uvicorn api.main:app --reload
Endpoints
POST /lookup { "embedding": [...], "api_key": "..." }
→ { "response": "..." } or null
POST /store { "embedding": [...], "query": "...", "response": "...", "api_key": "..." }
→ 200 OK
GET /metrics header: X-API-Key
→ { "hit_rate", "tokens_saved", "cost_saved", "total_queries" }
POST /keys { "email": "..." }
→ { "api_key": "sc_..." }
GET /health → { "status": "ok" }
GET / → landing page
Difference from thriftlm serve
| | thriftlm serve | api/ backend |
|---|---|---|
| Purpose | Personal metrics dashboard | Centralized cache for your apps |
| Who runs it | Developer, locally | DevOps, on a server |
| Client | Your browser | Your application code |
| Supabase access | Direct from server | Direct from server |
| Auth | CLI --api-key arg | api_keys table in Supabase |
Configuration
| Parameter | Default | Description |
|---|---|---|
| threshold | 0.85 | Cosine similarity cutoff. Lower = more aggressive matching. |
| api_key | required | Namespaces cache per tenant. Each key has its own LocalIndex. |
Threshold guide:
| Threshold | Hit Rate (QQP) | Use case |
|---|---|---|
| 0.70 | 92.5% | Aggressive — high savings, some false positives |
| 0.82 | 73.5% | Balanced — recommended for most apps |
| 0.85 | 62.5% | Default — conservative |
| 0.90 | 40.0% | Near-exact only |
Architecture
Embedding: all-MiniLM-L6-v2 (384-dim). Runs locally, no API cost.
Local numpy index: On SemanticCache() init, all stored embeddings are bulk-fetched into a (N, 384) float32 matrix. Cosine similarity is a single matrix @ query_vec matmul — ~1ms regardless of cache size. New entries append via np.vstack.
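The matmul lookup described above, in isolation. This is a simplified sketch, not ThriftLM's actual LocalIndex class; it keeps a parallel list of Supabase primary keys so a hit can be resolved with a single PK fetch.

```python
import numpy as np

class CosineIndex:
    """In-process cosine index over L2-normalized embeddings."""

    def __init__(self, dim: int = 384):
        self.matrix = np.empty((0, dim), dtype=np.float32)  # (N, dim)
        self.ids: list[str] = []                            # parallel Supabase PKs

    def add(self, pk: str, embedding: np.ndarray) -> None:
        # Normalize on insert so search is a pure dot product
        vec = embedding / np.linalg.norm(embedding)
        self.matrix = np.vstack([self.matrix, vec.astype(np.float32)])
        self.ids.append(pk)

    def search(self, query_vec: np.ndarray, threshold: float = 0.85):
        if not self.ids:
            return None
        q = query_vec / np.linalg.norm(query_vec)
        sims = self.matrix @ q          # one matmul over every stored embedding
        best = int(np.argmax(sims))
        return self.ids[best] if sims[best] >= threshold else None
```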
Supabase HNSW: pgvector with HNSW index for accurate ANN at scale. Used for cold-start loading and as fallback.
PII scrubbing: Presidio + spaCy en_core_web_lg. Applied to LLM responses only before storage. Queries are not scrubbed — scrubbing before embedding causes embedding drift and kills recall.
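Response-only scrubbing can be illustrated with a simplified stand-in. This regex-based version is not what ThriftLM ships; Presidio's NER-backed analyzers also catch person names and many formats that patterns alone miss.

```python
import re

# Crude pattern-based scrubber, standing in for Presidio's analyzers
PATTERNS = {
    "<EMAIL>": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "<PHONE>": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def scrub_response(response: str) -> str:
    """Applied to LLM responses only, after the call and before storage."""
    for placeholder, pattern in PATTERNS.items():
        response = pattern.sub(placeholder, response)
    return response

print(scrub_response("Contact jane@example.com or +1 555 123 4567."))
# -> Contact <EMAIL> or <PHONE>.
```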
Benchmark
200 duplicate question pairs from Quora Question Pairs.
Threshold | Hit Rate | Hits / 200
----------|----------|------------
0.70 | 92.5% | 185
0.75 | 86.0% | 172
0.80 | 78.0% | 156
0.82 ← | 73.5% | 147 (recommended)
0.85 | 62.5% | 125 (default)
0.90 | 40.0% | 80
Model: all-MiniLM-L6-v2 · Index: HNSW (Supabase pgvector)
Dataset: mean sim=0.859, min=0.550, max=0.999
Project Structure
ThriftLM/
├── thriftlm/ # pip package
│ ├── __init__.py # Public API: SemanticCache
│ ├── cache.py # Core lookup/store logic
│ ├── cli.py # thriftlm serve CLI entry point
│ ├── _server.py # FastAPI app for thriftlm serve (localhost)
│ ├── config.py # Env config
│ ├── embedder.py # SBERT wrapper
│ ├── privacy.py # Presidio PII scrubbing
│ ├── static/
│ │ └── dashboard.html # Metrics dashboard (pip-bundled)
│ └── backends/
│ ├── local_index.py # Numpy cosine index
│ ├── redis_backend.py # Exact hash cache
│ └── supabase_backend.py # Vector storage + PK fetch
├── api/ # Self-hosted multi-tenant backend
│ ├── main.py # FastAPI app
│ ├── auth.py # API key auth
│ └── routes/
│ ├── cache.py # /lookup, /store
│ ├── metrics.py # /metrics
│ └── keys.py # /keys
├── docs/
│ └── index.html # Landing page (GitHub Pages + api/ GET /)
├── tests/ # 69 passing tests
├── scratch/
│ ├── smoke_test.py
│ ├── openai_test.py
│ ├── populate_test.py
│ └── qqp_benchmark.py
├── supabase/setup.sql
├── docker-compose.yml
└── pyproject.toml
Development
git clone https://github.com/samujure/ThriftLM
cd ThriftLM
pip install -e ".[dev,api]"
cp .env.example .env
docker compose up -d
pytest tests/ -v
python scratch/smoke_test.py
python scratch/qqp_benchmark.py
Roadmap
V1 — Shipped ✓
- Three-tier cache: Redis → LocalIndex → HNSW
- Presidio PII scrubbing on responses
- Multi-tenant api/ FastAPI backend with API key auth
- thriftlm serve: bundled local dashboard CLI
- pip install thriftlm
V2 — coming soon
- Context caching
License
MIT
File details
Details for the file thriftlm-0.1.6.tar.gz (source distribution).
- Size: 59.9 kB
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes:
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0a3fe646770f561a37c2217b57a94c798936fdbe9d6a17359dc42a539a251256 |
| MD5 | 863c155b805040f743a5417082f63d33 |
| BLAKE2b-256 | bbc17b04a8fb773c9cdeff4bc501e06dead3c4edd3687febf4af9482ed916631 |
File details
Details for the file thriftlm-0.1.6-py3-none-any.whl (built distribution, Python 3).
- Size: 32.8 kB
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes:
| Algorithm | Hash digest |
|---|---|
| SHA256 | cf854e4d9c4f224fc27cf8f6375a37fd6cc07a8914dc53bc756d37e9261a821b |
| MD5 | 5f7c0fef73f0476d82a2aef08842ebf8 |
| BLAKE2b-256 | ee642bca843485cebd5f261ffca6da7c15dbd1b02f19eb295d2c3fe8fe2f5dfb |