Semantic caching for LLM apps — stop paying for the same answer twice

◈ Sulci

"How do I cancel?" and "cancellation process?" are the same question. Sulci finds meaning-matches, not just string-matches — eliminating redundant LLM calls and cutting API costs by 40–85%.

Install

pip install "sulci[chroma]"    # ChromaDB  — recommended for getting started
pip install "sulci[sqlite]"    # SQLite    — zero infrastructure
pip install "sulci[qdrant]"    # Qdrant    — best production performance
pip install "sulci[faiss]"     # FAISS     — fastest local search
pip install "sulci[redis]"     # Redis     — sub-millisecond latency
pip install "sulci[milvus]"    # Milvus    — enterprise scale
pip install "sulci[all]"       # all backends

Quickstart

from sulci import Cache
import anthropic

cache  = Cache(backend="chroma", threshold=0.85)
client = anthropic.Anthropic()

def call_claude(query: str) -> str:
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": query}],
    )
    return msg.content[0].text

# First call — hits the Claude API (~1.8s)
r1 = cache.cached_call("What is semantic caching?", call_claude)
print(f"[{r1['source']}] {r1['latency_ms']:.0f}ms")     # [llm] 1843ms

# Paraphrase — served from cache (~0.5ms, no API call)
r2 = cache.cached_call("Explain how semantic caches work", call_claude)
print(f"[{r2['source']}] {r2['similarity']:.0%} match")  # [cache] 91% match

print(cache.stats())
# {'hits': 1, 'misses': 1, 'hit_rate': 0.5, 'saved_cost': 0.005, 'total_queries': 2}
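The stats dictionary above is consistent with plain hit and miss counters. As a rough sketch of how such figures could be derived (the `SessionStats` class is hypothetical and not part of Sulci; it assumes `saved_cost` is the number of hits multiplied by `cost_per_call`):

```python
from dataclasses import dataclass


@dataclass
class SessionStats:
    """Hypothetical counter object; not Sulci's actual implementation."""

    hits: int = 0
    misses: int = 0
    saved_cost: float = 0.0

    def record(self, cache_hit: bool, cost_per_call: float = 0.005) -> None:
        # Assumption: every hit saves exactly one LLM call at cost_per_call.
        if cache_hit:
            self.hits += 1
            self.saved_cost += cost_per_call
        else:
            self.misses += 1

    def as_dict(self) -> dict:
        total = self.hits + self.misses
        return {
            "hits": self.hits,
            "misses": self.misses,
            "hit_rate": self.hits / total if total else 0.0,
            "saved_cost": self.saved_cost,
            "total_queries": total,
        }


stats = SessionStats()
stats.record(cache_hit=False)  # first call goes to the LLM
stats.record(cache_hit=True)   # paraphrase is served from the cache
print(stats.as_dict())
```

Under these assumptions the printed dictionary has the same fields and values as the Quickstart output.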

No API key needed to try it — use the SQLite backend with a mock function:

pip install "sulci[sqlite]"
python examples/basic_usage.py
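If the repository's examples are not at hand, a stand-in for the LLM could look like the sketch below. `mock_llm` and `demo` are hypothetical names and are not taken from `examples/basic_usage.py`; the `Cache` calls mirror the Quickstart.

```python
import time


def mock_llm(query: str, **kwargs) -> str:
    """Hypothetical stand-in for a real LLM call; needs no API key."""
    time.sleep(0.05)  # simulate a little network latency
    return f"Mock answer to: {query}"


def demo() -> None:
    # Imported lazily so this file can be loaded without Sulci installed.
    from sulci import Cache  # requires: pip install "sulci[sqlite]"

    cache = Cache(backend="sqlite", threshold=0.85)
    for query in ("What is semantic caching?", "Explain how semantic caches work"):
        result = cache.cached_call(query, mock_llm)
        print(f"[{result['source']}] {result['latency_ms']:.1f}ms")
    print(cache.stats())
```

Call `demo()` once the SQLite extra is installed. Whether the second query counts as a hit depends on the embedding model and the threshold.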

Benchmark results

10,000-query benchmark across 5 domains (5,000 warmup + 5,000 measured):

Domain	Hit Rate	p50 Latency
Customer Support	85.2%	0.55ms
Developer Q&A	88.2%	0.55ms
Product FAQ	85.0%	0.55ms
Medical Info	81.5%	0.55ms
General Knowledge	84.4%	0.55ms
Overall	84.9%	0.55ms

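Cache hit latency is 0.55ms at p50, against roughly 1,800ms for a Claude API call, which is about a 3,000x speedup on hits. Estimated cost saving: $21 per 10,000 queries at standard API pricing. That figure appears consistent with the 84.9% hit rate applied to the 5,000 measured queries at $0.005 per call.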
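The headline figures can be reproduced with back-of-the-envelope arithmetic. This sketch assumes the $21 estimate refers to hits among the 5,000 measured queries priced at $0.005 per call (the `cost_per_call` value shown in the API reference); the source does not state the pricing explicitly.

```python
HIT_RATE = 0.849           # overall hit rate from the table
HIT_LATENCY_MS = 0.55      # p50 cache-hit latency
LLM_LATENCY_MS = 1800.0    # approximate Claude API latency
MEASURED_QUERIES = 5_000   # measured queries, after warmup
COST_PER_CALL = 0.005      # assumption: price of one LLM call in dollars

# Speedup of a cache hit relative to a full LLM round trip.
speedup = LLM_LATENCY_MS / HIT_LATENCY_MS

# Each hit is assumed to save exactly one LLM call.
hits = HIT_RATE * MEASURED_QUERIES
saved = hits * COST_PER_CALL

# Blended latency: assumes a miss costs one LLM call and ignores
# the lookup overhead paid on misses.
blended_ms = HIT_RATE * HIT_LATENCY_MS + (1 - HIT_RATE) * LLM_LATENCY_MS

print(f"speedup on hits:  {speedup:,.0f}x")
print(f"estimated saving: ${saved:.2f}")
print(f"blended latency:  {blended_ms:.0f}ms per query")
```

The blended figure is a reminder that misses still dominate average latency even at a high hit rate.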

Backends

Backend	Extra	Latency	Best for
ChromaDB	`sulci[chroma]`	~4ms	Getting started, local dev
SQLite	`sulci[sqlite]`	5–50ms	Zero infra, edge, prototyping
FAISS	`sulci[faiss]`	<2ms	Fastest local, 100k+ entries
Qdrant	`sulci[qdrant]`	<5ms	Production scale
Redis	`sulci[redis]`	<1ms	Sub-millisecond, existing Redis
Milvus	`sulci[milvus]`	5–20ms	Enterprise, Zilliz Cloud

All backends share the same API — swap backend="chroma" for backend="sqlite" and nothing else changes.
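Because only the `backend` argument changes, the choice can live in configuration. A minimal sketch, assuming an environment variable named `SULCI_BACKEND` (a hypothetical name, not something Sulci reads itself) and hypothetical helpers `pick_backend` and `make_cache`:

```python
import os

# Backend names as listed in the table above.
SUPPORTED_BACKENDS = ("chroma", "sqlite", "faiss", "qdrant", "redis", "milvus")


def pick_backend(default: str = "chroma") -> str:
    """Read the backend name from SULCI_BACKEND (a hypothetical variable)."""
    name = os.environ.get("SULCI_BACKEND", default).strip().lower()
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"unknown backend {name!r}; expected one of {SUPPORTED_BACKENDS}"
        )
    return name


def make_cache(threshold: float = 0.85):
    # Imported lazily so pick_backend() works without Sulci installed.
    from sulci import Cache

    return Cache(backend=pick_backend(), threshold=threshold)
```

With this in place, local development could run with `SULCI_BACKEND=sqlite` and production with `SULCI_BACKEND=qdrant`, leaving the calling code unchanged.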

Embedding models

Model	Key	Dim	Notes
all-MiniLM-L6-v2	`"minilm"`	384	Default. Fast, free, no API key
all-mpnet-base-v2	`"mpnet"`	768	Better quality, still free
BAAI/bge-base-en-v1.5	`"bge"`	768	Best open-source quality
OpenAI text-embedding-3-small	`"openai"`	1536	Highest quality, requires API key
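Whichever model is chosen, each query becomes a vector of the listed dimension, and the `threshold` setting is described in the API reference as a cosine similarity cutoff. A minimal sketch of that comparison, using made-up 3-dimensional vectors in place of real embeddings (`best_match` is a hypothetical helper, not Sulci's API):

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(
    query_vec: Sequence[float],
    entries: List[Tuple[Sequence[float], str]],
    threshold: float = 0.85,
) -> Tuple[Optional[str], float]:
    """Return (response, similarity), with response=None below the threshold."""
    best_response, best_sim = None, 0.0
    for vec, response in entries:
        sim = cosine_similarity(query_vec, vec)
        if sim > best_sim:
            best_response, best_sim = response, sim
    if best_sim >= threshold:
        return best_response, best_sim
    return None, best_sim


# Toy "embeddings": made-up numbers, not real model output.
entries = [
    ([0.9, 0.1, 0.0], "Go to Settings > Billing > Cancel."),
    ([0.0, 0.2, 0.9], "We ship within 3 business days."),
]
print(best_match([0.8, 0.2, 0.1], entries))  # close to the first entry
print(best_match([0.5, 0.5, 0.5], entries))  # not close enough to either
```

A real backend would replace the linear scan with an index, but the threshold check works the same way: raising it trades hit rate for fewer wrong matches.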

API reference

from sulci import Cache

cache = Cache(
    backend         = "chroma",     # "chroma" | "sqlite" | "faiss" | "qdrant" | "redis" | "milvus"
    threshold       = 0.85,         # cosine similarity threshold (0.0–1.0)
    embedding_model = "minilm",     # "minilm" | "mpnet" | "bge" | "openai"
    ttl_seconds     = 86400,        # entry TTL in seconds. None = no expiry
    personalized    = False,        # True = scope cache entries per user_id
    db_path         = "./sulci_db", # local storage path (Chroma, SQLite, FAISS)
)

# ── Drop-in LLM wrapper ───────────────────────────────────────
result = cache.cached_call(
    query,
    llm_fn,                 # any callable: (query, **kwargs) → str
    user_id      = None,    # optional: for personalized caching
    cost_per_call = 0.005,  # for savings tracking
    **llm_kwargs,           # forwarded to llm_fn on cache miss
)
# returns a dict with keys: "response", "source", "similarity", "latency_ms", "cache_hit"

# ── Manual control ────────────────────────────────────────────
response, similarity = cache.get(query, user_id=None)
cache.set(query, response, user_id=None, metadata=None)

# ── Session stats ─────────────────────────────────────────────
cache.stats()   # dict with keys: "hits", "misses", "hit_rate", "saved_cost", "total_queries"
cache.clear()   # remove all entries and reset stats
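The drop-in wrapper and the manual calls are two views of one pattern: look up, call the LLM on a miss, store the result. The sketch below illustrates that flow under stated assumptions. `cached_call_sketch` and `ExactMatchCache` are hypothetical, and it is assumed that `get` returns `(None, score)` on a miss, which the reference above does not specify.

```python
import time
from typing import Any, Dict, Optional, Tuple


class ExactMatchCache:
    """Hypothetical in-memory stand-in exposing get/set like the manual API."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[Optional[str], str], str] = {}

    def get(self, query: str, user_id: Optional[str] = None) -> Tuple[Optional[str], float]:
        response = self._store.get((user_id, query))
        return (response, 1.0) if response is not None else (None, 0.0)

    def set(self, query: str, response: str, user_id: Optional[str] = None, metadata=None) -> None:
        # Keying on user_id mimics per-user scoping; metadata is ignored here.
        self._store[(user_id, query)] = response


def cached_call_sketch(cache, query, llm_fn, user_id=None, **llm_kwargs) -> Dict[str, Any]:
    start = time.perf_counter()
    response, similarity = cache.get(query, user_id=user_id)
    cache_hit = response is not None
    if not cache_hit:
        response = llm_fn(query, **llm_kwargs)  # only runs on a miss
        cache.set(query, response, user_id=user_id)
    return {
        "response": response,
        "source": "cache" if cache_hit else "llm",
        "similarity": similarity,
        "latency_ms": (time.perf_counter() - start) * 1000,
        "cache_hit": cache_hit,
    }


calls = []


def fake_llm(query: str, **kwargs) -> str:
    calls.append(query)  # record every real "LLM" invocation
    return query.upper()


demo_cache = ExactMatchCache()
first = cached_call_sketch(demo_cache, "hello", fake_llm)
second = cached_call_sketch(demo_cache, "hello", fake_llm)
print(first["source"], second["source"], len(calls))
```

The stand-in matches on exact strings only, so it shows the control flow and not the semantic matching that a real backend provides.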

Run tests

pip install "sulci[sqlite]" pytest
pytest tests/ -v

# Run only core tests (fastest)
pytest tests/test_core.py -v

# Run backend-specific tests (skips backends whose deps aren't installed)
pytest tests/test_backends.py -v

Project structure

sulci/
├── sulci/
│   ├── __init__.py             ← exports Cache
│   ├── core.py                 ← Cache engine
│   ├── backends/
│   │   ├── chroma.py
│   │   ├── qdrant.py
│   │   ├── faiss.py
│   │   ├── redis.py
│   │   ├── sqlite.py
│   │   └── milvus.py
│   └── embeddings/
│       ├── minilm.py           ← local (free, default)
│       └── openai.py           ← OpenAI API
├── tests/
│   ├── test_core.py            ← 20 tests: ops, stats, threshold, personalization
│   └── test_backends.py        ← per-backend contract tests
├── examples/
│   ├── basic_usage.py          ← runs with no API key
│   └── anthropic_example.py    ← full Claude integration
├── .github/workflows/
│   ├── tests.yml               ← CI on every push/PR
│   └── publish.yml             ← auto-publish to PyPI on git tag
├── pyproject.toml
├── setup.py
├── CHANGELOG.md
├── CONTRIBUTING.md
└── LICENSE

Contributing

See CONTRIBUTING.md for dev setup, how to add a new backend, and the release process.

This version: 0.1.1, released Mar 9, 2026
