
Tensor-native semantic cache and distributed data plane — PrismLib


PrismLib

Python 3.11+ · License: Apache 2.0

Tensor-native semantic cache and distributed data plane.

Two products, one mathematical core:

| Product | Component | Deployed on | Install |
|---|---|---|---|
| PrismCache | In-process LLM cache | App node | pip install "prismlib[cache]" |
| PrismDriver | Server Wrapper (daemon) | DB node | pip install "prismlib[wrapper]" |
| PrismDriver | DLL Driver (in-process) | App node | pip install "prismlib[fabric]" |

PrismDriver is a two-node system: the Server Wrapper runs as an OS daemon on the same machine as your database, intercepts WAL/binlog changes, vectorizes rows, and streams them over CHORUS Fabric to the DLL Driver on your app server. The driver keeps a local PrismResonance index warm so reads never leave the process.

Built on two open-source InsightIts libraries:

  • PrismResonance — the wave-memory similarity engine powering every cache lookup and local vector index
  • CHORUS Fabric — the gRPC binary streaming protocol that carries encrypted float32 tensor frames from the Server Wrapper to the DLL Driver

Installation

# Semantic LLM cache only
pip install "prismlib[cache]"

# With OpenAI embeddings
pip install "prismlib[cache,cache-openai]"

# With Anthropic/Voyage embeddings
pip install "prismlib[cache,cache-anthropic]"

# With Ollama (local models)
pip install "prismlib[cache,cache-ollama]"

# DB driver (app node)
pip install "prismlib[fabric]"

# Server Wrapper daemon (DB node — Linux/macOS)
pip install "prismlib[wrapper]"
prism-wrapper --config /etc/prism/wrapper.toml

# Everything
pip install "prismlib[all]"

Use Cases

PrismCache

Drop-in LLM response cache

Save 60–80% of LLM API calls by serving semantically equivalent queries from cache. Paraphrases hit the cache: "How do I reset my password?" and "I forgot my password, help" return the same answer without a second LLM call.

from openai import OpenAI
from prism.cache import PrismCache

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
cache = PrismCache.build(tenant_id="my-app", llm_model="gpt-4o")

def ask(question: str) -> str:
    return cache.get_or_call(
        query=question,
        call_fn=lambda: openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": question}],
        ).choices[0].message.content,
    )
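PrismCache's internals are not shown on this page. To illustrate the idea behind "paraphrases hit the cache", here is a minimal sketch of a semantic cache in plain Python: embed the query, compare it with stored query embeddings by cosine similarity, and reuse the stored answer when the best score clears a threshold. `SemanticCache`, `toy_embed`, `fake_llm`, and the 0.9 threshold are hypothetical. `toy_embed` is a hashed bag of words, so it only matches near-identical wording; a real embedding model is what lets genuine paraphrases match.

```python
import math
import re
from typing import Callable


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hypothetical stand-in embedder: hashed bag of words (not semantic)."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z]+", text.lower()):
        vec[sum(map(ord, token)) % dim] += 1.0
    return vec


class SemanticCache:
    """Reuse a stored answer when a new query's embedding is close to a past one."""

    def __init__(self, embed_fn: Callable[[str], list[float]], threshold: float = 0.9):
        self.embed_fn = embed_fn
        self.threshold = threshold
        self.entries: list[tuple[list[float], str]] = []  # (unit embedding, answer)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _unit(vec: list[float]) -> list[float]:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    def get_or_call(self, query: str, call_fn: Callable[[], str]) -> str:
        q = self._unit(self.embed_fn(query))
        best_score, best_answer = -1.0, None
        for vec, answer in self.entries:
            score = sum(a * b for a, b in zip(q, vec))  # cosine, since both are unit length
            if score > best_score:
                best_score, best_answer = score, answer
        if best_answer is not None and best_score >= self.threshold:
            self.hits += 1
            return best_answer  # cache hit: no LLM call
        self.misses += 1
        answer = call_fn()  # cache miss: call the LLM once and remember the result
        self.entries.append((q, answer))
        return answer


answers = []


def fake_llm() -> str:
    answers.append("llm")  # count real LLM calls
    return "Use the reset link on the sign-in page."


cache = SemanticCache(toy_embed, threshold=0.9)
cache.get_or_call("How do I reset my password?", fake_llm)  # miss: calls the LLM
cache.get_or_call("how do I reset my password", fake_llm)   # hit: reuses the answer
print(cache.hits, cache.misses, len(answers))
```

A production cache would replace the linear scan with a vector index and bound the number of entries; in PrismLib that lookup is what PrismResonance provides.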

Multi-tenant SaaS — isolated caches per customer

Each tenant gets a mathematically isolated cache space (a Johnson-Lindenstrauss projection seeded by the tenant ID). One customer's cached answers never bleed into another's.

from functools import lru_cache
from prism.cache import PrismCache

@lru_cache(maxsize=None)  # reuse one PrismCache instance per tenant
def get_cache(tenant_id: str) -> PrismCache:
    return PrismCache.build(tenant_id=tenant_id, llm_model="gpt-4o-mini")

# Tenant A and tenant B share no cache state
cache_a = get_cache("acme-corp")
cache_b = get_cache("globex-inc")

# llm_call: your own zero-argument function that calls the LLM
answer = cache_a.get_or_call(query="What is my plan limit?", call_fn=llm_call)

FastAPI / Django middleware — transparent caching

Wrap your existing LLM endpoint without changing any business logic.

# FastAPI
from fastapi import FastAPI, Request
from prism.cache import PrismCache

app = FastAPI()
cache = PrismCache.build(tenant_id="api", llm_model="gpt-4o")

@app.post("/chat")
async def chat(request: Request):
    body = await request.json()
    question = body["message"]
    answer = await cache.aget_or_call(
        query=question,
        call_fn=lambda: llm_client.ask(question),  # llm_client: your existing LLM client
    )
    return {"answer": answer}

# Django: add PrismCacheMiddleware to MIDDLEWARE in settings.py
# prism/middleware.py
from prism.cache import PrismCache

_cache = PrismCache.build(tenant_id="django-app", llm_model="gpt-4o")

class PrismCacheMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Django never calls process_llm_query itself, so expose it to views,
        # which can then call request.prism_query(question, call_fn)
        request.prism_query = self.process_llm_query
        return self.get_response(request)

    def process_llm_query(self, question: str, call_fn) -> str:
        return _cache.get_or_call(query=question, call_fn=call_fn)

Async batch queries

import asyncio
from prism.cache import PrismCache

cache = PrismCache.build(tenant_id="batch", llm_model="gpt-4o-mini")

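async def process_batch(questions: list[str]) -> list[str]:
    # llm_call(q): your own function that sends q to the LLM.
    # q=q binds each question now, avoiding Python's late-binding closure pitfall.
    tasks = [
        cache.aget_or_call(query=q, call_fn=lambda q=q: llm_call(q))
        for q in questions
    ]
    return await asyncio.gather(*tasks)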

Cost estimation

from prism.cache import PrismCache

cache = PrismCache.build(tenant_id="finance", llm_model="gpt-4o")

# After processing queries...
metrics = cache.metrics()
print(f"Hit rate:          {metrics.hit_rate:.0%}")
print(f"Tokens saved:      {metrics.tokens_saved:,}")
print(f"Cost saved today:  ${metrics.cost_saved_usd:.2f}")
# x30 assumes the metrics above cover exactly one day of traffic
print(f"Projected monthly: ${metrics.cost_saved_usd * 30:.0f}")
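Multiplying `metrics.cost_saved_usd` by 30 gives a monthly figure only if the metrics cover exactly one day of traffic. A small sketch that makes the measurement window explicit. `savings_usd` and `project_monthly` are hypothetical helpers, not PrismLib functions, and the price per million tokens is whatever your provider charges (no price is assumed here).

```python
SECONDS_PER_DAY = 86_400


def savings_usd(tokens_saved: int, usd_per_million_tokens: float) -> float:
    """Dollar value of tokens that never reached the LLM (you supply the price)."""
    return tokens_saved * usd_per_million_tokens / 1_000_000


def project_monthly(cost_saved_usd: float, window_seconds: float,
                    days_per_month: float = 30.0) -> float:
    """Scale savings observed over `window_seconds` to a month at the same traffic rate."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return cost_saved_usd * days_per_month * SECONDS_PER_DAY / window_seconds
```

For example, $1.00 saved during a one-hour window projects to `project_monthly(1.0, 3600)`, i.e. $720 per 30-day month, whereas multiplying by 30 would give $30.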

PrismDriver

PrismDriver has two components that work together. Install each on the right machine.

On the DB node — Server Wrapper

The Server Wrapper is an OS daemon that sits next to your database. It reads WAL/binlog changes, vectorizes rows using RowVectorizer, encrypts them with TensorCipher (via CHORUS Fabric), and streams float32 frames to every connected DLL Driver.

# Install on the DB node (Linux or macOS)
pip install "prismlib[wrapper]"

# Configure and start
prism-wrapper --config /etc/prism/wrapper.toml

# /etc/prism/wrapper.toml
[database]
flavor = "postgresql"
dsn = "postgresql://user:pass@localhost/mydb"

[chorus]
listen_port = 50051
tenant_id = "products-service"

Supported databases: PostgreSQL (WAL / wal2json), MySQL (binlog), CockroachDB (EXPERIMENTAL CHANGEFEED), TiDB (push model).
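This page does not describe how RowVectorizer maps a row to a vector. As an illustration only, the hashing trick (signed feature hashing) is one common way to turn a row of mixed column types into a fixed-width float32 vector: each `column=value` token is hashed into one of `dim` buckets. `hash_row` is hypothetical and is not RowVectorizer's actual scheme; it matches exact values only, so it captures no semantic similarity.

```python
import hashlib

import numpy as np


def hash_row(row: dict, dim: int = 256) -> np.ndarray:
    """Hypothetical row vectorizer: signed feature hashing of column=value tokens."""
    vec = np.zeros(dim, dtype=np.float32)
    for column, value in row.items():
        token = f"{column}={value}".encode("utf-8")
        h = int.from_bytes(hashlib.blake2b(token, digest_size=8).digest(), "little")
        index = h % dim                        # bucket this token lands in
        sign = -1.0 if (h >> 63) & 1 else 1.0  # random sign reduces collision bias
        vec[index] += sign
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec  # unit length, ready for cosine scoring


row = {"id": 42, "name": "Widget Pro", "price": 29.99, "stock": 150}
v = hash_row(row)
print(v.shape, v.dtype)
```

Hashing needs no vocabulary or training, and the output width stays fixed as columns are added, which suits a daemon that sees arbitrary schemas; the cost is occasional bucket collisions.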

On the app node — DLL Driver

The DLL Driver is an in-process library that replaces your DB connection string. On startup it connects to the Server Wrapper, subscribes to the CHORUS Fabric stream, and keeps a local PrismResonance index warm. All reads hit the in-process index — no network round-trip, sub-millisecond latency.

# Install on the app node
pip install "prismlib[fabric]"

Replace your DB connection string

# Before
import psycopg2
conn = psycopg2.connect("postgresql://user:secret@db-host:5432/mydb")

# After: no DB password or DB hostname in app config, only the wrapper's address
from prism.ffi import PrismDriver, DriverConfig

async def find_similar(my_embedding_vector):
    async with PrismDriver(DriverConfig(wrapper_host="db-proxy-1")) as driver:
        return await driver.query(
            embedding=my_embedding_vector,  # your float32 query embedding
            top_k=5,
            threshold=0.85,
        )

Sub-millisecond row lookups via local cache

The driver keeps a local PrismResonance cache warm via a background WAL subscription. Reads never touch the DB — they hit the in-process float32 index.

import asyncio

import numpy as np
from prism.ffi import PrismDriver, DriverConfig

config = DriverConfig(
    wrapper_host="10.0.1.50",
    wrapper_port=50051,
    tenant_id="products-service",
)

async def main():
    async with PrismDriver(config) as driver:
        # Typical hit: < 1ms, no network round-trip
        query_vec = np.array([...], dtype=np.float32)  # fill in your query embedding
        matches = await driver.query(embedding=query_vec, top_k=10)
        for m in matches:
            print(f"{m.row_id}  score={m.score:.3f}  {m.text_repr}")

asyncio.run(main())

Write through to DB

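import asyncio

async def write_product():
    # Reuses PrismDriver and `config` from the previous example
    async with PrismDriver(config) as driver:
        ack = await driver.write(
            row_id="product-42",
            data={"name": "Widget Pro", "price": 29.99, "stock": 150},
        )
        print(f"Written: event_id={ack.event_id}")

asyncio.run(write_product())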

Go, C#, PHP, Java — same DLL, native bindings

// Go
import prism "github.com/insightitsGit/prismlib/go"

driver, _ := prism.Connect("db-proxy-1:50051", "my-tenant")
defer driver.Close()
results, _ := driver.Query(embedding, prism.QueryOpts{TopK: 5, Threshold: 0.85})

// C#
using InsightIts.Prism;

await using var driver = new PrismDriver("db-proxy-1:50051", tenantId: "my-tenant");
await driver.ConnectAsync();
var results = await driver.QueryAsync(embedding, topK: 5, threshold: 0.85f);

// PHP 8.0+
$driver = new PrismDriver('db-proxy-1', 50051, 'my-tenant');
$driver->connect();
$results = $driver->query($embedding, topK: 5, threshold: 0.85);

Architecture

┌─ DB Node ──────────────────────────────────────────────────────┐
│  PostgreSQL / MySQL / CockroachDB / TiDB                       │
│       │ WAL / binlog / changefeed                              │
│  ┌────▼────────────────────────────────────────────────────┐   │
│  │  prism-wrapper  (pip install "prismlib[wrapper]")       │   │
│  │  RowVectorizer → TensorCipher (V_enc = V @ K)           │   │
│  │  → HMAC-SHA256 watermark → CHORUSPublisher              │   │
│  └────────────────────────┬────────────────────────────────┘   │
└───────────────────────────┼────────────────────────────────────┘
                            │  CHORUS Fabric  (gRPC, port 50051)
                            │  encrypted float32 frames
┌─ App Node ────────────────┼────────────────────────────────────┐
│  ┌────────────────────────▼───────────────────────────────┐    │
│  │  PrismDriver DLL  (pip install "prismlib[fabric]")     │    │
│  │  Subscribe loop → decrypt → PrismResonance index       │    │
│  └──────────────────────────────────────────┬─────────────┘    │
│                                             │ sub-ms query     │
│  ┌──────────────────────────────────────────▼─────────────┐    │
│  │  Your Application                                       │    │
│  │  ┌──────────────────┐   ┌───────────────────────────┐  │    │
│  │  │  PrismCache      │   │  PrismDriver              │  │    │
│  │  │  LLM cache       │   │  local PrismResonance     │  │    │
│  │  │  pip install     │   │  (no DB round-trip)       │  │    │
│  │  │  prismlib[cache] │   │                           │  │    │
│  │  └──────────────────┘   └───────────────────────────┘  │    │
│  └─────────────────────────────────────────────────────────┘    │
└────────────────────────────────────────────────────────────────┘

Benchmark

PrismCache — semantic LLM cache

Live results from Azure Container App (westus2, 1 vCPU / 2 GiB, mock LLM baseline):

| Scenario | Users | Duration | Hit rate | Queries | Tokens saved | Monthly est. |
|---|---|---|---|---|---|---|
| Light | 20 | 60 s | 91.0% | 5,936 | 1,374,464 | $594 |
| Mixed | 50 | 300 s | 95.9% | 6,973 | 1,673,216 | $723 |

These numbers use a mock LLM (80 ms sleep). With real GPT-4o calls (1–3 s each), the projected latency speedup is 4–13×; token savings do not depend on LLM latency, so they would be the same.

PrismDriver — two-node baseline vs local index

Live two-node benchmark (Azure Container Apps westus2, 30 users × 60s per phase):

| Phase | Path | Avg latency | Queries |
|---|---|---|---|
| Baseline (no driver) | App → DB node, network | 142.8 ms | 3,864 |
| Driver (local index) | App → in-process PrismResonance | 2.0 ms | 1,479 |

70.7× faster · 98.6% latency reduction

The 98.6% reduction comes from moving reads in-process, which CHORUS Fabric makes possible. Before the load test began, the subscription loop had streamed 11,000 rows from the DB node into the local PrismResonance index at 26,000 rows/s, under half a second of transfer. By the time the first /driver/query request arrived, answering it required no network hop to the DB node: the data was already in-process. This is what CHORUS Fabric was designed for: getting tensor data to where the query is, before the query arrives.

# Two-node benchmark (requires both container apps running)
python benchmark/load/run_driver_benchmark.py \
  --app-url https://prism-benchmark.nicestone-720c6a9b.westus2.azurecontainerapps.io \
  --db-url  https://prism-wrapper-sim.nicestone-720c6a9b.westus2.azurecontainerapps.io \
  --users 30 --duration 60

# PrismCache load test
python benchmark/load/run_benchmark.py \
  --host https://prism-benchmark.nicestone-720c6a9b.westus2.azurecontainerapps.io \
  --scenario mixed

See benchmark/ for full results JSON, Locust CSV files, and the Azure deploy script.


Core libraries

PrismLib is built on two InsightIts open-source libraries. You can use them directly if you need lower-level access.

PrismResonance

github.com/insightitsGit/prismresonance · pip install prismresonance

The wave-memory similarity engine. Every cache lookup and local vector index in PrismLib goes through PrismResonance.

How it works:

  • Receives a float32 embedding vector
  • A Johnson-Lindenstrauss projection reduces it to 64 dimensions using a matrix seeded by SHA-256(tenant_id); this is what gives each tenant a mathematically isolated address space
  • Computes similarity as wave interference (cosine in projected space) in three lock-free phases: snapshot → ONNX MatMul → rank
  • Returns ranked candidates in sub-millisecond time entirely in-process

PrismCache wraps this for LLM response caching. PrismDriver's local replica is a PrismResonance index kept warm by WAL streaming.

from prismresonance import PrismProjector, WaveIndex

projector = PrismProjector(dim=64, tenant_id="my-tenant")
index = WaveIndex(projector)

index.add(vector=my_embedding, payload={"row_id": "product-1", "text": "Widget"})
results = index.query(vector=query_embedding, top_k=5, threshold=0.85)
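The steps listed under "How it works" can be approximated with NumPy alone. This sketch makes three substitutions that are assumptions, not PrismResonance behavior: plain cosine similarity stands in for wave-interference scoring, a NumPy matrix product stands in for the ONNX MatMul, and the seed is taken from the first 8 bytes of the SHA-256 digest. `tenant_projection`, `project`, and `rank` are hypothetical names, not the prismresonance API.

```python
import hashlib

import numpy as np


def tenant_projection(tenant_id: str, in_dim: int, out_dim: int = 64) -> np.ndarray:
    """Gaussian JL matrix seeded from SHA-256(tenant_id): same tenant, same matrix."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
    # Dividing by sqrt(out_dim) approximately preserves lengths and distances
    return (rng.standard_normal((in_dim, out_dim)) / np.sqrt(out_dim)).astype(np.float32)


def project(vectors: np.ndarray, P: np.ndarray) -> np.ndarray:
    """Map into the tenant's 64-d space and L2-normalize, so dot product == cosine."""
    z = np.asarray(vectors, dtype=np.float32) @ P
    return z / np.maximum(np.linalg.norm(z, axis=-1, keepdims=True), 1e-12)


def rank(query: np.ndarray, index: np.ndarray, top_k: int = 5, threshold: float = 0.85):
    snapshot = index.copy()              # 1. snapshot: score a frozen copy of the rows
    scores = snapshot @ query            # 2. one matrix multiply gives every cosine score
    order = np.argsort(-scores)[:top_k]  # 3. rank: best first, keep top_k
    return [(int(i), float(scores[i])) for i in order if scores[i] >= threshold]


rng = np.random.default_rng(0)
docs = rng.standard_normal((100, 384)).astype(np.float32)  # stand-in embeddings
P = tenant_projection("my-tenant", in_dim=384)
index = project(docs, P)
hits = rank(project(docs[7], P), index, top_k=3)
print(hits)
```

Because the matrix is derived from the tenant ID, two tenants embedding the same text land at different coordinates, which is the isolation property described above.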

CHORUS Fabric

github.com/insightitsGit/chorus_fabric · pip install chorus-fabric

The secure gRPC binary streaming protocol for machine-to-machine tensor communication. PrismDriver uses CHORUS Fabric as its transport layer between the server wrapper on the DB node and the DLL driver on the app node.

How it works:

  • prism-wrapper (DB node) vectorizes WAL row events via RowVectorizer, encrypts them with TensorCipher (V_enc = V @ K), appends an HMAC-SHA256 watermark, and publishes batches of raw float32 frames
  • PrismDriver (app node) opens a persistent WrapperService.Subscribe() gRPC stream, receives encrypted frames, decrypts, and feeds them into the local PrismResonance index
  • Transport is pure binary float32 over gRPC server-streaming — no JSON serialization, no REST overhead
  • The WrapperService proto also exposes Query, Write, Health, and Hello RPCs for direct interaction

from chorus_fabric import CHORUSPublisher, DriverEndpoint

async def publish(config, event_queue):
    # config: your publisher settings; event_queue: the WAL row events to send
    publisher = CHORUSPublisher(config)
    publisher.add_driver(DriverEndpoint(host="10.0.1.50", port=50051, tenant_id="prod"))
    await publisher.run(event_queue)  # streams WAL events to all connected drivers
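The page gives V_enc = V @ K and an HMAC-SHA256 watermark, but not how K is chosen or how a frame is laid out. Assuming K is a random orthogonal matrix shared by both nodes (so decryption is multiplication by K^T) and a frame is raw little-endian float32 bytes followed by a 32-byte tag, one encode/decode round trip looks like this. `make_key`, `encode_frame`, and `decode_frame` are hypothetical, not CHORUS Fabric or TensorCipher APIs.

```python
import hashlib
import hmac

import numpy as np

TAG_BYTES = 32  # HMAC-SHA256 output size


def make_key(dim: int, seed: int) -> np.ndarray:
    """Random orthogonal K (QR of a Gaussian matrix), so K^-1 == K.T."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return (q * np.sign(np.diag(r))).astype(np.float32)


def encode_frame(V: np.ndarray, K: np.ndarray, mac_key: bytes) -> bytes:
    v_enc = (np.asarray(V, dtype=np.float32) @ K).astype("<f4")  # V_enc = V @ K
    payload = v_enc.tobytes()                                     # raw little-endian float32
    tag = hmac.new(mac_key, payload, hashlib.sha256).digest()     # watermark over the payload
    return payload + tag


def decode_frame(frame: bytes, K: np.ndarray, mac_key: bytes, dim: int) -> np.ndarray:
    payload, tag = frame[:-TAG_BYTES], frame[-TAG_BYTES:]
    expected = hmac.new(mac_key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("HMAC-SHA256 watermark mismatch: frame rejected")
    v_enc = np.frombuffer(payload, dtype="<f4").reshape(-1, dim)
    return v_enc @ K.T  # orthogonal K: multiplying by K.T undoes V @ K


K = make_key(dim=64, seed=7)
rows = np.random.default_rng(1).standard_normal((3, 64)).astype(np.float32)
frame = encode_frame(rows, K, mac_key=b"shared-secret")
restored = decode_frame(frame, K, mac_key=b"shared-secret", dim=64)
print(len(frame), np.allclose(restored, rows, atol=1e-4))
```

An orthogonal K also preserves dot products, since (VK)(WK)^T = V K K^T W^T = V W^T, so cosine scores are the same before and after mixing; whether TensorCipher relies on this property is not stated here.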

CHORUS Fabric is the same protocol used in the CHORUS M2M system, InsightIts' 4-container gRPC topology for tensor communication between AI agents. In the PrismDriver benchmark it streamed 11,000 rows at 26,000 rows/s across Azure inter-container networking, after which those rows were served locally at about 2 ms average latency (the 98.6% reduction above).


PrismLib Micro — Cluster & RAG Layer (v0.4.0)

PrismLib Micro is the cluster layer built into prismlib[fabric]. It adds four capabilities on top of the single-node stack, with no extra install and no extra infrastructure.

What's included

| Component | What it does |
|---|---|
| ClusterCache | Shares LLM answers across all nodes via CHORUS TOKEN_SYNC frames. Once any node answers a query, every other node serves it for 0 tokens. |
| AlertManager | Broadcasts health alerts as SIGNAL frames and emails an admin the moment CPU/RAM/disk/latency thresholds are crossed. No Prometheus. No Datadog. |
| Blue/Green/Orange failover | Three-tier hot standby: GREEN (active), BLUE (warm standby, auto-promotes in ~3 s), ORANGE (syncing reserve). No Raft dependency. No K8s operator. |
| ContextCompressor | Ranks RAG context chunks by cosine similarity and keeps the top K. Saves 58–64% of context tokens before every LLM call. In-process, no extra model. |

Cluster benchmark results (3-node, live run)

| Metric | Result |
|---|---|
| Token savings (cluster avg) | 76.1% |
| BLUE node (cluster cache hit) | 100% (0 LLM calls) |
| ORANGE node (cross-network cache hit) | 100% (0 LLM calls) |
| Context compression | 58–64% per query |
| Health alert propagation | <1 s (709–711 ms measured) |
| Failover: BLUE promoted to GREEN | ~3–4 s, no human step |

See benchmark/cluster/ for the full benchmark code and benchmark/cluster/cluster_benchmark_results.json for raw results.

ClusterCache — 5-line RAG integration

from prism.cluster.cache import ClusterCache

# chorus_fabric, embed, llm, retrieved_docs and doc_embeddings come from your own stack
cache = ClusterCache(node_id="node-1", fabric=chorus_fabric)

async def answer(user_question: str) -> str:
    return await cache.get_or_call(
        query=user_question,
        query_vector=embed(user_question),
        call_fn=lambda: llm.complete(user_question),
        context_chunks=retrieved_docs,  # your RAG chunks
        chunk_vectors=doc_embeddings,   # their vectors
    )

Drop this in front of your existing retrieve → generate step. No changes to retrieval logic, no changes to your LLM client.
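ContextCompressor is described as ranking RAG chunks by cosine similarity and keeping the top K, which is what the `context_chunks` and `chunk_vectors` arguments above feed into. A NumPy sketch of that step: `compress_context` is a hypothetical name, and returning the kept chunks in their original order (rather than score order) is a design choice of this sketch.

```python
import numpy as np


def compress_context(query_vector, chunks, chunk_vectors, top_k: int = 3) -> list:
    """Keep the top_k chunks most similar (cosine) to the query."""
    q = np.asarray(query_vector, dtype=np.float32)
    C = np.asarray(chunk_vectors, dtype=np.float32)
    q = q / max(float(np.linalg.norm(q)), 1e-12)
    C = C / np.maximum(np.linalg.norm(C, axis=1, keepdims=True), 1e-12)
    scores = C @ q                                # cosine similarity of each chunk to the query
    keep = np.sort(np.argsort(-scores)[:top_k])   # best top_k, restored to original order
    return [chunks[i] for i in keep]


chunks = ["refund policy", "shipping times", "refund exceptions", "careers page"]
vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.1, 0.0], [-1.0, 0.0, 0.0]]
kept = compress_context([1.0, 0.0, 0.0], chunks, vectors, top_k=2)
print(kept)
```

Chunks beyond `top_k` are dropped before the prompt is built, which is where context-token savings come from; how large they are depends on how many chunks retrieval returns relative to `top_k`.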

AlertManager — email + SIGNAL frame on health threshold

import os

from prism.cluster.alerts import AlertManager, SMTPConfig

alerts = AlertManager(
    fabric=chorus_fabric,  # your CHORUS Fabric instance
    mail_config=SMTPConfig(
        host="smtp.gmail.com", port=587,
        username="you@gmail.com",
        password=os.getenv("GMAIL_APP_PASS"),
        recipients=["admin@yourcompany.com"],
    ),
)

async def check_health(health_snapshot):
    await alerts.evaluate_health(health_snapshot)
    # Fires email + SIGNAL frame to all nodes if any of the 12 default rules trigger
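The 12 default rules are not listed on this page. A sketch of threshold evaluation over a health snapshot, with metric names and thresholds that are illustrative assumptions rather than PrismLib's defaults: each rule whose metric exceeds its threshold yields one alert message. `Rule`, `RULES`, and `evaluate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    metric: str
    threshold: float
    severity: str = "warning"


# Illustrative thresholds only; PrismLib's default rules are not listed here
RULES = (
    Rule("cpu_percent", 85.0),
    Rule("ram_percent", 90.0),
    Rule("disk_percent", 90.0, "critical"),
    Rule("latency_ms", 500.0),
)


def evaluate(snapshot: dict[str, float], rules=RULES) -> list[str]:
    """Return one message per rule whose metric is above its threshold."""
    alerts = []
    for rule in rules:
        value = snapshot.get(rule.metric)
        if value is not None and value > rule.threshold:
            alerts.append(f"[{rule.severity}] {rule.metric}={value} > {rule.threshold}")
    return alerts


print(evaluate({"cpu_percent": 92.0, "ram_percent": 40.0, "disk_percent": 95.5}))
```

In the flow described above, a non-empty result is the point at which AlertManager would send the email and broadcast a SIGNAL frame.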

Competitive position

| Capability | PrismLib Micro | Prometheus + Alertmanager | Redis cluster | Raft / etcd |
|---|---|---|---|---|
| Cross-node token cache | Yes, built-in | No | Manual (exact match) | No |
| Alert propagation | <1 s, no infra | 30–60 s, stack needed | No | No |
| Auto failover | ~3–4 s, built-in | No | Sentinel, 2–30 s | 150–500 ms |
| Context compression | 58–64%, free | No | No | No |
| Extra infrastructure | None | Prometheus stack | Redis cluster | etcd cluster |

Pricing

| Tier | Nodes | Price | Includes |
|---|---|---|---|
| Open source | Unlimited | Free forever | All cluster code, Apache 2.0 |
| ChorusMesh Developer (coming soon) | Up to 3 | $29/mo after 30-day trial | ClusterCache + failover + AlertManager |
| ChorusMesh Team | Up to 10 | $149/mo | + Raft consensus, message broker adapters |
| ChorusMesh Business | Up to 50 | $499/mo | + multi-region routing, SLA 99.9% |
| Enterprise | Unlimited | Contact us | + air-gap, compliance, dedicated Slack |

For enterprise agreements: insightits.info@gmail.com


Enterprise

PrismLib is open source (Apache 2.0) and free to use. If your team needs any of the following, contact us for enterprise pricing:

  • On-premises deployment support — air-gapped installs, hardened Docker images, SOC 2 documentation
  • SLA-backed support — guaranteed response times, incident escalation, dedicated Slack channel
  • Custom embedding model integration — fine-tuned domain-specific embedders for higher hit rates in specialized domains (legal, medical, finance, code)
  • Multi-region CHORUS Fabric topology — active-active DB node clusters, cross-region WAL fan-out, geo-aware driver routing
  • Audit logging and compliance exports — per-query access logs, tenant isolation attestation reports, GDPR data lineage
  • Professional services — architecture review, migration from Redis/GPTCache, custom RowVectorizer schemas

Contact: insightits.info@gmail.com · GitHub: github.com/insightitsGit/prismlib


Sponsors

PrismLib is free and will stay free. If it saved your team money on OpenAI bills or database infrastructure, consider sponsoring — it covers benchmark compute, maintenance time, and keeps development moving.

Sponsor on GitHub

Your name or logo here — become a sponsor


Publishing to PyPI

It is one package, prismlib, published once. The wrapper, driver, and cache are all extras of the same package. Users install what they need:

pip install "prismlib[cache]"           # PrismCache only
pip install "prismlib[wrapper]"         # Server Wrapper (DB node)
pip install "prismlib[fabric]"          # DLL Driver (App node)
pip install "prismlib[all]"             # Everything

To publish a new version:

# 1. Bump version in pyproject.toml (currently 0.4.0)
# 2. Build the distribution
pip install build twine
python -m build

# 3. Upload to PyPI (use your token from pypi.org/manage/account/token/)
python -m twine upload dist/* --username __token__ --password pypi-YOUR_TOKEN

That's it. One upload covers every install variant; pip resolves the extras from the package metadata at install time.


License

Apache 2.0 — InsightIts © 2026



Download files


Source Distribution

prismlib-0.4.0.tar.gz (106.6 kB)


Built Distribution


prismlib-0.4.0-py3-none-any.whl (93.9 kB)


File details

Details for the file prismlib-0.4.0.tar.gz.

File metadata

  • Download URL: prismlib-0.4.0.tar.gz
  • Size: 106.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for prismlib-0.4.0.tar.gz
Algorithm Hash digest
SHA256 2de19c39890a526c1c57d4da6629cb0af64dc24fdfbe7d45ae4824a118f5bb86
MD5 2c07d31c6d93e3a8c08b95bd848509f0
BLAKE2b-256 0295ea29f3f1c3de69c55eb51b68d5f9dbc6472b7b554d5f4943edc5215907ee


File details

Details for the file prismlib-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: prismlib-0.4.0-py3-none-any.whl
  • Size: 93.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for prismlib-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 23937922a03c05544ba2742b6551a3e77bc33b48f1ce56a456e16aae6cfca05b
MD5 79ac2df33970062897589649393a5fd2
BLAKE2b-256 8b912a9c4a652985b1c4a2554f23b2337ffc98807ab50a641fb75bf174c276d1

