
◈ Sulci

Semantic caching for LLM apps — stop paying for the same answer twice.

PyPI version · Python 3.9+ · License: MIT · Tests

"How do I cancel?" and "cancellation process?" are the same question. Sulci finds meaning-matches, not just string-matches — eliminating redundant LLM calls and cutting API costs by 40–85%.
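A meaning-match boils down to a similarity score between query vectors compared against a threshold. A toy sketch of that idea, with bag-of-words vectors standing in for a real sentence encoder (note the limitation this hedge implies: bag-of-words only catches shared words, while a real embedding model also matches "cancel" to "cancellation"):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector; Sulci uses a real sentence encoder instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

q1 = "how do i cancel my subscription"
q2 = "how can i cancel my subscription"   # paraphrase: scores high
q3 = "what is the weather like today"     # unrelated: scores low

print(round(cosine(embed(q1), embed(q2)), 2))
print(round(cosine(embed(q1), embed(q3)), 2))
```

A score above the configured threshold (0.85 by default) counts as a cache hit; anything below falls through to the LLM.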


Install

pip install "sulci[chroma]"    # ChromaDB  — recommended for getting started
pip install "sulci[sqlite]"    # SQLite    — zero infrastructure
pip install "sulci[qdrant]"    # Qdrant    — best production performance
pip install "sulci[faiss]"     # FAISS     — fastest local search
pip install "sulci[redis]"     # Redis     — sub-millisecond latency
pip install "sulci[milvus]"    # Milvus    — enterprise scale
pip install "sulci[all]"       # all backends

Quickstart

from sulci import Cache
import anthropic

cache  = Cache(backend="chroma", threshold=0.85)
client = anthropic.Anthropic()

def call_claude(query: str) -> str:
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": query}],
    )
    return msg.content[0].text

# First call — hits the Claude API (~1.8s)
r1 = cache.cached_call("What is semantic caching?", call_claude)
print(f"[{r1['source']}] {r1['latency_ms']:.0f}ms")     # [llm] 1843ms

# Paraphrase — served from cache (~0.5ms, no API call)
r2 = cache.cached_call("Explain how semantic caches work", call_claude)
print(f"[{r2['source']}] {r2['similarity']:.0%} match")  # [cache] 91% match

print(cache.stats())
# {'hits': 1, 'misses': 1, 'hit_rate': 0.5, 'saved_cost': 0.005, 'total_queries': 2}

No API key needed to try it — use the SQLite backend with a mock function:

pip install "sulci[sqlite]"
python examples/basic_usage.py

Benchmark results

10,000-query benchmark across 5 domains (5,000 warmup + 5,000 measured):

Domain              Hit Rate   p50 Latency
Customer Support    85.2%      0.55ms
Developer Q&A       88.2%      0.55ms
Product FAQ         85.0%      0.55ms
Medical Info        81.5%      0.55ms
General Knowledge   84.4%      0.55ms
Overall             84.9%      0.55ms

Cache hits return in 0.55ms at p50 versus ~1,800ms for a Claude API call, a roughly 3,000x speedup on hits. At the default $0.005 cost per call, the 84.9% overall hit rate saves an estimated $21 across the 5,000 measured queries.
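The savings estimate follows directly from the hit rate and Sulci's default cost_per_call of $0.005, counted over the measured half of the run (an assumption; real per-query cost depends on model and token counts):

```python
measured_queries = 5_000    # the measured half of the 10,000-query benchmark
hit_rate = 0.849            # overall hit rate from the table above
cost_per_call = 0.005       # Sulci's default cost_per_call for savings tracking

saved = measured_queries * hit_rate * cost_per_call
print(f"${saved:.2f} saved")   # ≈ $21
```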


Backends

Backend    Extra           Latency   Best for
ChromaDB   sulci[chroma]   ~4ms      Getting started, local dev
SQLite     sulci[sqlite]   5–50ms    Zero infra, edge, prototyping
FAISS      sulci[faiss]    <2ms      Fastest local, 100k+ entries
Qdrant     sulci[qdrant]   <5ms      Production scale
Redis      sulci[redis]    <1ms      Sub-millisecond, existing Redis
Milvus     sulci[milvus]   5–20ms    Enterprise, Zilliz Cloud

All backends share the same API — swap backend="chroma" for backend="sqlite" and nothing else changes.
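One way to picture that interchangeability: every backend satisfies one small contract. The sketch below is illustrative, not Sulci's actual internal interface; the class and method names are hypothetical:

```python
from typing import List, Optional, Protocol, Tuple

class VectorBackend(Protocol):
    # Hypothetical shared contract a swappable backend would satisfy.
    def add(self, embedding: List[float], response: str) -> None: ...
    def search(self, embedding: List[float]) -> Tuple[Optional[str], float]: ...

class InMemoryBackend:
    """Minimal stand-in; Chroma, Qdrant, etc. would fill the same shape."""
    def __init__(self) -> None:
        self._rows: List[Tuple[List[float], str]] = []

    def add(self, embedding: List[float], response: str) -> None:
        self._rows.append((embedding, response))

    def search(self, embedding: List[float]) -> Tuple[Optional[str], float]:
        best: Tuple[Optional[str], float] = (None, 0.0)
        for emb, resp in self._rows:
            sim = sum(a * b for a, b in zip(emb, embedding))  # unit vectors assumed
            if sim > best[1]:
                best = (resp, sim)
        return best

backend: VectorBackend = InMemoryBackend()
backend.add([1.0, 0.0], "answer A")
backend.add([0.0, 1.0], "answer B")
print(backend.search([0.9, 0.1]))   # nearest stored entry wins
```

Because the caller only depends on the contract, swapping implementations changes nothing above this line.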


Embedding models

Model                           Key        Dim    Notes
all-MiniLM-L6-v2                "minilm"   384    Default. Fast, free, no API key
all-mpnet-base-v2               "mpnet"    768    Better quality, still free
BAAI/bge-base-en-v1.5           "bge"      768    Best open-source quality
OpenAI text-embedding-3-small   "openai"   1536   Highest quality, requires API key
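Dimensionality is the main storage cost, since each cached entry stores one vector. A back-of-envelope estimate, assuming float32 storage at 4 bytes per dimension (actual backend overhead will vary):

```python
entries = 100_000
for name, dim in [("minilm", 384), ("mpnet", 768), ("bge", 768), ("openai", 1536)]:
    mb = entries * dim * 4 / 1_000_000   # float32 = 4 bytes per dimension
    print(f"{name:7s} {dim:4d}-dim  ~{mb:.0f} MB per {entries:,} entries")
```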

API reference

from sulci import Cache

cache = Cache(
    backend         = "chroma",     # "chroma" | "sqlite" | "faiss" | "qdrant" | "redis" | "milvus"
    threshold       = 0.85,         # cosine similarity threshold (0.0–1.0)
    embedding_model = "minilm",     # "minilm" | "mpnet" | "bge" | "openai"
    ttl_seconds     = 86400,        # entry TTL in seconds. None = no expiry
    personalized    = False,        # True = scope cache entries per user_id
    db_path         = "./sulci_db", # local storage path (Chroma, SQLite, FAISS)
)
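What personalized=True implies, sketched under one assumption: entries are scoped per user_id, so one user's cached answers are never served to another. Exact-match lookup keeps the sketch short (Sulci itself matches by similarity), and set_entry/get_entry are hypothetical names, not Sulci's API:

```python
from collections import defaultdict

store = defaultdict(list)          # user_id -> [(query, response), ...]

def set_entry(user_id, query, response):
    store[user_id].append((query, response))

def get_entry(user_id, query):
    for q, r in store[user_id]:    # only this user's entries are searched
        if q == query:             # stand-in for embedding-similarity matching
            return r
    return None

set_entry("alice", "what plan am I on?", "You're on the Pro plan.")
print(get_entry("alice", "what plan am I on?"))   # hit for alice
print(get_entry("bob", "what plan am I on?"))     # None: bob's scope is empty
```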

# ── Drop-in LLM wrapper ───────────────────────────────────────
result = cache.cached_call(
    query,
    llm_fn,                 # any callable: (query, **kwargs) → str
    user_id      = None,    # optional: for personalized caching
    cost_per_call= 0.005,   # for savings tracking
    **llm_kwargs,           # forwarded to llm_fn on cache miss
)
# returns: {"response", "source", "similarity", "latency_ms", "cache_hit"}

# ── Manual control ────────────────────────────────────────────
response, similarity = cache.get(query, user_id=None)
cache.set(query, response, user_id=None, metadata=None)

# ── Session stats ─────────────────────────────────────────────
cache.stats()   # {"hits", "misses", "hit_rate", "saved_cost", "total_queries"}
cache.clear()   # remove all entries and reset stats
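The manual get/set pair supports the classic cache-aside loop: look up, fall through to the LLM on a miss, store the result. A self-contained sketch of that loop (TinyCache and the word-overlap similarity are illustrative stand-ins, not Sulci's Cache or its embedding-based matching):

```python
def similarity(a: str, b: str) -> float:
    # Jaccard word overlap as a cheap stand-in for cosine over embeddings.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

class TinyCache:
    def __init__(self, threshold: float = 0.7) -> None:
        self.threshold = threshold
        self.entries: list[tuple[str, str]] = []
        self.hits = self.misses = 0

    def get(self, query: str):
        best_resp, best_sim = None, 0.0
        for q, resp in self.entries:
            sim = similarity(query, q)
            if sim > best_sim:
                best_resp, best_sim = resp, sim
        if best_resp is not None and best_sim >= self.threshold:
            self.hits += 1
            return best_resp, best_sim
        self.misses += 1
        return None, best_sim

    def set(self, query: str, response: str) -> None:
        self.entries.append((query, response))

cache = TinyCache()
query = "how do I reset my password"
response, sim = cache.get(query)
if response is None:                          # miss: call the LLM...
    response = "Use the 'Forgot password' link."
    cache.set(query, response)                # ...then store for next time

response2, sim2 = cache.get("how do I reset my password now")
print(response2, round(sim2, 2))              # paraphrase served from cache
```

cached_call packages this same miss-then-store flow (plus latency and cost tracking) behind one call.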

Run tests

pip install "sulci[sqlite]" pytest
pytest tests/ -v

# Run only core tests (fastest)
pytest tests/test_core.py -v

# Run backend-specific tests (skips backends whose deps aren't installed)
pytest tests/test_backends.py -v

Project structure

sulci/
├── sulci/
│   ├── __init__.py             ← exports Cache
│   ├── core.py                 ← Cache engine
│   ├── backends/
│   │   ├── chroma.py
│   │   ├── qdrant.py
│   │   ├── faiss.py
│   │   ├── redis.py
│   │   ├── sqlite.py
│   │   └── milvus.py
│   └── embeddings/
│       ├── minilm.py           ← local (free, default)
│       └── openai.py           ← OpenAI API
├── tests/
│   ├── test_core.py            ← 20 tests: ops, stats, threshold, personalization
│   └── test_backends.py        ← per-backend contract tests
├── examples/
│   ├── basic_usage.py          ← runs with no API key
│   └── anthropic_example.py    ← full Claude integration
├── .github/workflows/
│   ├── tests.yml               ← CI on every push/PR
│   └── publish.yml             ← auto-publish to PyPI on git tag
├── pyproject.toml
├── setup.py
├── CHANGELOG.md
├── CONTRIBUTING.md
└── LICENSE

Contributing

See CONTRIBUTING.md for dev setup, how to add a new backend, and the release process.


License

MIT License © 2025 Sulci

Download files

Source distribution
  sulci-0.1.1.tar.gz (20.9 kB), uploaded via twine/6.2.0 on CPython 3.12.2
  SHA256    7473b0982e6cb5295c19c859348647f47ad2b84cbf9bf5682141f10fc50d648f
  MD5       7b401231a561a6c179a22226dc593501
  BLAKE2b   4880068fadfa2b43951aec50e3033ee3806fb896141c91d9b9034c05bd6b36d0

Built distribution
  sulci-0.1.1-py3-none-any.whl (19.1 kB), uploaded via twine/6.2.0 on CPython 3.12.2
  SHA256    5cad1693244ae48f7d330433ab8aefba92380f51a534b691aec39c04f5aeca0f
  MD5       fdc5a9916cc24d4f88bd187d5e0a2368
  BLAKE2b   7d26ccbfc13b9749b7b0d16d8ddce0ef017b9a7370f556bc4e5e62c8233c3796
