
Sulci Cache

The AI-native, context-aware semantic cache for LLM apps: stop paying for the same answer twice


Sulci Cache is a drop-in Python library that caches LLM responses by semantic meaning, not exact string match. When a user asks "How do I deploy to AWS?" and someone else later asks "What's the process for deploying on AWS?", Sulci Cache returns the cached answer instead of calling the LLM again — saving cost and latency.


Why Sulci Cache

| Without Sulci Cache | With Sulci Cache |
|---|---|
| Every query hits the LLM API | Semantically similar queries return instantly from cache |
| $0.005 per call, every time | Cache hits cost ~$0.0001 (embedding only) |
| 1–3 second response time | Cache hits return in <10ms |
| No memory across sessions | Context-aware: understands conversation history |

Benchmark results (v0.5.0, 5,000 queries):

  • Overall hit rate: 85.9%
  • Hit latency p50: 0.74ms (vs ~1,840ms for a live LLM call)
  • Cost saved per 10k queries: $21.47
  • Context-aware mode: +20.8pp resolution accuracy over stateless

Install

Step 1 — Install Sulci Cache with a backend:

pip install "sulci[sqlite]"    # SQLite — zero infra, local dev (start here)
pip install "sulci[chroma]"    # ChromaDB
pip install "sulci[faiss]"     # FAISS
pip install "sulci[qdrant]"    # Qdrant
pip install "sulci[redis]"     # Redis + RedisVL
pip install "sulci[milvus]"    # Milvus Lite
pip install "sulci[cloud]"     # Sulci Cloud managed backend

LangChain integration:

pip install "sulci[sqlite,langchain]"   # + LangChain integration

LlamaIndex integration:

pip install "sulci[sqlite,llamaindex]"  # + LlamaIndex native integration

AsyncCache (non-blocking async wrapper):

pip install "sulci[sqlite]"   # AsyncCache is included — no extra install needed

Step 2 — Install your LLM SDK (required for cached_call with a live model):

pip install anthropic           # for Anthropic / Claude
pip install openai              # for OpenAI

zsh users: always wrap extras in quotes — "sulci[sqlite]" not sulci[sqlite].


LangChain Integration

Sulci Cache is the only LangChain cache that implements context-aware lookup: it blends prior conversation turns into the similarity lookup vector instead of matching the current prompt in isolation.

from langchain_core.globals import set_llm_cache
from sulci.integrations.langchain import SulciCache

# Stateless semantic — drop-in for GPTCache
set_llm_cache(SulciCache(backend="sqlite"))

# Context-aware — chatbot / agent (+56pp hit rate in customer support)
set_llm_cache(SulciCache(backend="sqlite", context_window=4, threshold=0.75))

# Managed Sulci Cloud
set_llm_cache(SulciCache(backend="sulci", api_key="sk-sulci-..."))

Install: pip install "sulci[sqlite,langchain]"


LlamaIndex Integration

SulciCacheLLM is a native LLM-level semantic cache for LlamaIndex with no LangChain dependency. It wraps any LlamaIndex-compatible LLM (OpenAI, Anthropic, Ollama, HuggingFaceLLM, etc.): complete() and chat() are cached, streaming passes through uncached, and async methods use run_in_executor.

from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from sulci.integrations.llamaindex import SulciCacheLLM

# Stateless — drop-in for any LlamaIndex LLM
Settings.llm = SulciCacheLLM(
    llm       = OpenAI(model="gpt-4o"),
    backend   = "sqlite",
    threshold = 0.85,
)

# Context-aware — RAG chatbot / agent (+56pp hit rate in customer support)
Settings.llm = SulciCacheLLM(
    llm            = OpenAI(model="gpt-4o"),
    backend        = "sqlite",
    threshold      = 0.75,
    context_window = 4,
)

# Managed Sulci Cloud
Settings.llm = SulciCacheLLM(
    llm     = OpenAI(model="gpt-4o"),
    backend = "sulci",
    api_key = "sk-sulci-...",
)

Install: pip install "sulci[sqlite,llamaindex]"

Via LangChain (alternative route):

from langchain_core.globals import set_llm_cache
from sulci.integrations.langchain import SulciCache
from llama_index.llms.langchain import LangChainLLM
from langchain_openai import ChatOpenAI

set_llm_cache(SulciCache(backend="sqlite", context_window=4))

from llama_index.core import Settings
Settings.llm = LangChainLLM(llm=ChatOpenAI(model="gpt-4o"))

Install: pip install "sulci[sqlite,langchain]" llama-index-llms-langchain langchain-openai


AsyncCache — non-blocking async wrapper

AsyncCache wraps sulci.Cache with asyncio.to_thread() so every cache operation yields the event loop. This is the recommended pattern for FastAPI, LangChain async chains, LlamaIndex async agents, and any other asyncio-based application.

from sulci import AsyncCache

cache = AsyncCache(backend="sqlite", context_window=4)

# FastAPI endpoint — event loop never blocked
from fastapi import FastAPI
app = FastAPI()

@app.post("/chat")
async def chat(query: str, session_id: str):
    response, sim, depth = await cache.aget(query, session_id=session_id)
    if response:
        return {"response": response, "source": "cache", "sim": sim}
    response = await call_llm(query)   # your async LLM call
    await cache.aset(query, response, session_id=session_id)
    return {"response": response, "source": "llm"}

# All Cache parameters work identically
cache = AsyncCache(
    backend        = "sqlite",
    threshold      = 0.85,
    context_window = 4,
    query_weight   = 0.70,
    api_key        = "sk-sulci-...",   # for Sulci Cloud
)

Async methods: aget(), aset(), acached_call(), aget_context(), aclear_context(), acontext_summary(), astats(), aclear()

Sync passthrough: All sync methods (get, set, stats, clear) also available — AsyncCache works in mixed sync/async codebases without switching types.
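The wrapping pattern itself is plain asyncio: a blocking call is handed to a worker thread so the event loop keeps serving other requests. A stripped-down sketch, where blocking_get is a stand-in for the synchronous Cache.get (not the library's actual code):

```python
import asyncio
import time

def blocking_get(query: str):
    """Stand-in for a synchronous Cache.get() that does disk/CPU work."""
    time.sleep(0.05)
    return f"cached:{query}", 0.91, 0

async def aget(query: str):
    # asyncio.to_thread runs the sync function in a worker thread,
    # yielding the event loop while it waits.
    return await asyncio.to_thread(blocking_get, query)

async def main():
    # Two lookups overlap instead of serializing on the event loop.
    a, b = await asyncio.gather(aget("q1"), aget("q2"))
    print(a[0], b[0])  # cached:q1 cached:q2

asyncio.run(main())
```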


Sulci Cloud — zero infrastructure option

Get a free API key at sulci.io/signup and switch to the managed backend with a single parameter change. Everything else stays identical.

# Before — self-hosted (works today)
cache = Cache(backend="sqlite", threshold=0.85)

# After — managed cloud (zero other code changes)
cache = Cache(backend="sulci", api_key="sk-sulci-...", threshold=0.85)

# Or via environment variable — zero code changes at all
# export SULCI_API_KEY=sk-sulci-...
cache = Cache(backend="sulci", threshold=0.85)

Free tier: 50,000 requests/month. No credit card required.

sulci.connect()

For apps that want to set the key once at startup and enable optional telemetry:

import sulci

sulci.connect(
    api_key   = "sk-sulci-...",   # or set SULCI_API_KEY env var
    telemetry = True,             # default True — set False to disable reporting
)

cache = Cache(backend="sulci")    # picks up key from connect() automatically

Telemetry is strictly opt-in: nothing is sent unless sulci.connect() is called, and the internal _telemetry_enabled flag stays False until you explicitly connect. Disable per-instance with Cache(backend="sulci", telemetry=False).

Key resolution order (first match wins):

1. Explicit api_key= argument to sulci.connect() or Cache()
2. SULCI_API_KEY environment variable
3. ~/.sulci/config (persisted by a prior successful sulci.connect() call)
4. Browser-based OSS-Connect device-code flow — only if prompt=True
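The first-match-wins order can be sketched as a simple resolution chain. The config file format and parsing below are invented for illustration (the real format is internal to sulci.config); only the steps and their priority mirror the list above:

```python
import os
from pathlib import Path

def resolve_api_key(explicit=None, config_path=Path.home() / ".sulci" / "config"):
    # 1. explicit api_key= argument wins
    if explicit:
        return explicit, "arg"
    # 2. SULCI_API_KEY environment variable
    env = os.environ.get("SULCI_API_KEY")
    if env:
        return env, "env"
    # 3. key persisted by a prior sulci.connect() (illustrative key=value parse)
    if config_path.exists():
        for line in config_path.read_text().splitlines():
            if line.startswith("api_key="):
                return line.split("=", 1)[1], "config"
    # 4. device-code flow would go here (only when prompt=True)
    return None, "none"

print(resolve_api_key(explicit="sk-sulci-demo"))  # ('sk-sulci-demo', 'arg')
```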

Step 3 (config persistence) ships in v0.5.3. After your first successful sulci.connect(api_key="sk-sulci-..."), the key is persisted to ~/.sulci/config (mode 0600) and subsequent sulci.connect() calls with no arguments will pick it up automatically.

Step 4 (device-code flow) ships latent in v0.5.3. The SDK code is in place, but the gateway endpoints and dashboard page need to deploy end-to-end before it's usable. The prompt parameter defaults to False in v0.5.3:

# v0.5.3 default — safe everywhere:
sulci.connect()
# - Step 1-3 work normally
# - Step 4 is skipped (prompt=False default)
# - If no key found, connect() returns silently (no telemetry enabled)

# Once your environment has OSS-Connect end-to-end deployed
# (gateway + dashboard), opt in:
sulci.connect(prompt=True)
# - First-run: prints "Visit https://app.sulci.io/oss-connect and enter code: WXYZ-2345"
# - User authorizes via browser → SDK gets api_key and persists to ~/.sulci/config
# - Subsequent runs: step 3 short-circuits, no browser needed

v0.6.0 will flip the prompt default to True once the full chain is shipped. Setting prompt=True against an environment that hasn't announced OSS-Connect availability is user error — wait for the release announcement.


Quickstart

Stateless (v0.1 style)

from sulci import Cache

cache = Cache(backend="sqlite", threshold=0.85)

# store a response
cache.set("How do I deploy to AWS?", "Use the AWS CLI with 'aws deploy'...")

# exact or semantic hit — returns 3-tuple
response, similarity, context_depth = cache.get("What's the process for deploying on AWS?")

if response:
    print(f"Cache hit (sim={similarity:.2f}): {response}")
else:
    # call your LLM here
    pass

Context-aware (v0.2 style)

from sulci import Cache

cache = Cache(
    backend        = "sqlite",
    threshold      = 0.85,
    context_window = 4,     # remember last 4 turns
    query_weight   = 0.70,  # α — weight of current query vs context
    context_decay  = 0.50,  # halve weight per older turn
)

# turn 1
cache.set("What is Python?", "Python is a high-level programming language.", session_id="s1")

# turn 2 — context from turn 1 blended into the lookup vector
response, sim, depth = cache.get("Tell me more about it", session_id="s1")

Drop-in with cached_call

Requires: pip install "sulci[sqlite]" anthropic, plus an API key in your shell:

export ANTHROPIC_API_KEY=sk-ant-...

import anthropic
from sulci import Cache

cache = Cache(backend="sqlite", threshold=0.85)
client = anthropic.Anthropic()

def call_llm(prompt: str) -> str:
    msg = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )
    return msg.content[0].text

result = cache.cached_call(
    query  = "How do I deploy to AWS?",
    llm_fn = call_llm,
)

print(result["response"])
print(f"Source:  {result['source']}")       # "cache" or "llm"
print(f"Latency: {result['latency_ms']:.1f}ms")

Run it a second time with the same (or similar) question — source switches to "cache" and latency drops from ~2,000ms to under 10ms.


API Reference

Constructor

cache = Cache(
    backend         = "sqlite",   # sqlite | chroma | faiss | qdrant | redis | milvus | sulci
    threshold       = 0.85,       # cosine similarity cutoff (0–1)
    embedding_model = "minilm",   # minilm | openai
    ttl_seconds     = None,       # None = no expiry
    personalized    = False,      # partition cache per user_id
    db_path         = "./sulci",  # on-disk path for sqlite / faiss
    context_window  = 0,          # turns to remember; 0 = stateless
    query_weight    = 0.70,       # α in blending formula
    context_decay   = 0.50,       # per-turn decay weight
    session_ttl     = 3600,       # session expiry in seconds
    api_key         = None,       # required when backend="sulci"
    telemetry       = True,       # set False to disable per-instance
)

Methods

| Method | Returns | Description |
|---|---|---|
| cache.get(query, *, tenant_id=None, user_id=None, session_id=None) | (str \| None, float, int) | response, similarity, context_depth (tenant_id added in v0.4.0) |
| cache.set(query, response, *, tenant_id=None, user_id=None, session_id=None, metadata=None) | None | Store entry, advance context window |
| cache.cached_call(query, llm_fn, *, tenant_id=None, user_id=None, session_id=None, cost_per_call=0.005) | dict | response, source, similarity, latency_ms, cache_hit, context_depth |
| cache.get_context(session_id) | ContextWindow | Return the session's context window |
| cache.clear_context(session_id) | None | Reset session history |
| cache.context_summary(session_id=None) | dict | Snapshot of one or all sessions |
| cache.stats() | dict | hits, misses, hit_rate, saved_cost, total_queries, active_sessions |
| cache.clear() | None | Evict all entries, reset stats and sessions |

Important: cache.get() returns a 3-tuple (response, similarity, context_depth) — not a 2-tuple like v0.1. Always unpack all three values.

v0.5.0 additions

Two additive constructor kwargs for advanced deployments:

from sulci import Cache, RedisSessionStore, TelemetrySink

cache = Cache(
    backend        = "sqlite",
    context_window = 4,
    session_store  = RedisSessionStore(redis_client),         # horizontal-scale sessions
    event_sink     = TelemetrySink("https://your.endpoint"),  # privacy-firewalled events
)
  • session_store= accepts any sulci.sessions.SessionStore impl. Default None uses the legacy in-process manager (unchanged from v0.4.x).
  • event_sink= accepts any sulci.sinks.EventSink impl. Default None uses NullSink() (no-op). Shipped sinks (TelemetrySink, RedisStreamSink) enforce a strict field allowlist — query text, response text, and embeddings never leave the process.
  • SyncCache is now exported as a naming-symmetric alias for Cache (parallel to AsyncCache).

v0.5.2 additions

Connected-OSS telemetry — opt-in, anonymous, never enabled by default. Pairs with the Sulci dashboard at sulci.io to give you persistent stats, multi-machine fingerprinting, and savings reports without sending any query content.

import sulci

# Opt in — sends aggregate counts + a stable per-deployment fingerprint
sulci.connect(api_key="sk-sulci-...")

cache = sulci.Cache(backend="sqlite", threshold=0.85)
# cache.get() / cache.set() now buffer aggregated metrics for /v1/telemetry

What's new at the wire level:

  • Per-deployment fingerprint: blake2b(machine_id || backend || embedding_model || threshold || context_window), truncated to 24 hex chars. The machine_id is a fresh uuid4 generated once and persisted at ~/.sulci/config (mode 0600). No PII, no MAC, no hostname. Switching backends produces a new fingerprint, which the dashboard treats as a new deployment.
  • cache.set events are now buffered and POSTed alongside cache.get events — gives the dashboard a write-vs-read picture of your cache.
  • Privacy firewall is unchanged. The wire payload is locked to eight allowlisted fields (event, backend, hits, misses, avg_latency_ms, sdk_version, python_version, fingerprint). The gateway uses extra='forbid' server-side as a hard rejection.
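For intuition, both the fingerprint derivation and the allowlist scrub can be sketched with the stdlib. The exact byte layout of the hash input is an assumption (only the ingredient list and the 24-hex-char truncation come from the description above), and the field names follow the allowlist as published:

```python
import hashlib

# Fingerprint: blake2b over deployment-identifying config, truncated to 24 hex
# chars. The "||" join separator here is an illustrative choice.
def fingerprint(machine_id, backend, embedding_model, threshold, context_window):
    raw = "||".join([machine_id, backend, embedding_model,
                     str(threshold), str(context_window)])
    return hashlib.blake2b(raw.encode()).hexdigest()[:24]

fp = fingerprint("example-machine-uuid", "sqlite", "minilm", 0.85, 4)
print(len(fp))  # 24

# Allowlist scrub: only whitelisted keys survive; query/response text never does.
ALLOWED = {"event", "backend", "hits", "misses", "avg_latency_ms",
           "sdk_version", "python_version", "fingerprint"}

def scrub(payload):
    return {k: v for k, v in payload.items() if k in ALLOWED}

print(scrub({"event": "get", "hits": 3, "query": "secret text"}))
# query text is dropped: {'event': 'get', 'hits': 3}
```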

New top-level modules:

  • sulci.config — small persistent SDK config at ~/.sulci/config. load(), save(), update(), get_machine_id(). Atomic write, silent fallback on corruption.
  • sulci.telemetry — fingerprint helper + wire-field allowlist for the connect() emit pipe. Distinct from sulci.sinks.telemetry (the per-event EventSink implementation from v0.5.0).
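The atomic-write-with-0600 behavior that sulci.config describes can be sketched as a generic stdlib pattern (temp file in the same directory, then os.replace). This is not the library's actual code, and the JSON format is illustrative:

```python
import json
import os
import tempfile
from pathlib import Path

def save_config(data: dict, path: Path) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory so os.replace stays atomic
    # (a rename across filesystems would not be).
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.chmod(tmp, 0o600)   # owner read/write only
        os.replace(tmp, path)  # atomic swap into place
    except BaseException:
        os.unlink(tmp)
        raise

def load_config(path: Path) -> dict:
    try:
        return json.loads(path.read_text())
    except (OSError, ValueError):
        return {}              # silent fallback on missing or corrupt file
```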

Passive nudge:

After 100 cached queries on a Cache instance that hasn't been connected, cache.stats() prints a one-line nudge to stderr suggesting sulci.connect(). One-shot per process. Silence with SULCI_QUIET=1 or by calling sulci.connect().

export SULCI_QUIET=1   # silences the nudge globally

v0.5.3 additions

OSS-Connect device-code SDK client (D12). The flow ships latent — SDK code is in place, but prompt defaults to False because the gateway endpoints and dashboard page need to deploy end-to-end before the flow is usable. v0.6.0 will flip prompt to True once the full chain ships. Setting prompt=True against an environment that hasn't announced OSS-Connect availability is user error.

import sulci

# v0.5.3 default — completely safe:
sulci.connect(api_key="sk-sulci-...")     # the v0.5.x flow, unchanged
sulci.connect()                            # falls through args/env/config; no browser

# After the Sulci team announces OSS-Connect availability (v0.6.0):
sulci.connect(prompt=True)                 # browser-based onboarding

What's new at the SDK level:

  • sulci.oss_connect — RFC 8628 device-code flow client. Lazy-imported only on the no-key-found path so import sulci cost is unchanged for users who never trigger it.
  • Four-step sulci.connect() resolution: arg → env → ~/.sulci/config → device-code flow. The third step (config-persisted key) is new in v0.5.3 — your first successful sulci.connect(api_key=...) persists the key, and subsequent sulci.connect() calls with no arguments pick it up automatically.
  • prompt: bool = False — keyword parameter. Default flips to True in v0.6.0.
  • SULCI_GATEWAY env var — overrides the gateway base URL (default https://api.sulci.io). Used for staging / local-dev. Same value drives both telemetry and the new device-code client.

Context-Aware Blending

When context_window > 0, Sulci Cache blends the current query vector with recent conversation history before performing the similarity lookup:

lookup_vec = α · embed(query) + (1−α) · Σ(decay^i · turn_i)
  • α = query_weight (default 0.70) — how much the current query dominates
  • decay = context_decay (default 0.50) — halves weight per older turn
  • Only user query vectors are stored in context (not LLM responses)
  • Raw un-blended vectors stored in cache; blending happens at lookup time only
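As a concrete illustration, the formula above can be sketched in plain Python. Toy two-dimensional vectors stand in for the real 384-dim minilm embeddings, and any internal normalization the library performs is omitted:

```python
# Context-aware blending:
#   lookup_vec = alpha * embed(query) + (1 - alpha) * sum(decay**i * turn_i)

def blend(query_vec, context_vecs, alpha=0.70, decay=0.50):
    """context_vecs[0] is the most recent prior turn."""
    dim = len(query_vec)
    context_sum = [0.0] * dim
    for i, turn in enumerate(context_vecs):
        w = decay ** i  # most recent turn gets weight decay**0 = 1
        for d in range(dim):
            context_sum[d] += w * turn[d]
    return [alpha * query_vec[d] + (1 - alpha) * context_sum[d]
            for d in range(dim)]

q = [1.0, 0.0]                     # current query direction
turns = [[0.0, 1.0], [0.0, 1.0]]   # two prior turns on the other axis
print(blend(q, turns))  # ≈ [0.7, 0.45]: context pulls the lookup off-axis
```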

Context-aware benchmark results (800 conversation pairs, context_window=4):

| Domain | Stateless | Context-aware | Δ |
|---|---|---|---|
| customer_support | 32% | 88% | +56pp |
| developer_qa | 80% | 96% | +16pp |
| medical_information | 40% | 60% | +20pp |
| overall | 64.0% | 81.6% | +17.6pp |

Backends

| Backend | ID | Hit latency | Best for |
|---|---|---|---|
| SQLite | sqlite | <8ms | Local dev, edge, serverless, zero infra |
| ChromaDB | chroma | <10ms | Fastest path to working, Python-native |
| FAISS | faiss | <3ms | GPU acceleration, massive scale |
| Qdrant | qdrant | <5ms | Production, metadata filtering |
| Redis + RedisVL | redis | <1ms | Existing Redis infra, lowest latency |
| Milvus Lite | milvus | <7ms | Dev-to-prod without code changes |
| Sulci Cloud | sulci | <8ms | Zero infra — managed service |

Every non-cloud backend either has a free tier or can be self-hosted at zero cost.


Embedding Models

| ID | Model | Dims | Latency | Notes |
|---|---|---|---|---|
| minilm | all-MiniLM-L6-v2 | 384 | 14ms | Default — free, local, excellent quality |
| openai | text-embedding-3-small | 1536 | ~100ms | Requires OPENAI_API_KEY |

The default minilm model runs entirely locally via sentence-transformers. No network calls are made unless you explicitly configure embedding_model="openai".
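For intuition, the threshold gate on top of these embeddings is just a cosine comparison. A hand-rolled sketch with tiny vectors (real lookups of course go through the backend's vector index, not a pairwise loop):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def is_hit(query_vec, cached_vec, threshold=0.85):
    # A cached entry counts as a semantic hit only if its cosine
    # similarity to the query clears the configured threshold.
    return cosine(query_vec, cached_vec) >= threshold

print(is_hit([1.0, 0.1], [1.0, 0.2]))  # near-identical direction -> True
print(is_hit([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> False
```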


Project Structure

.
├── CHANGELOG.md
├── CONTRIBUTING.md
├── LICENSE
├── LOCAL_SETUP.md
├── Makefile                    ← make smoke, make test, make test-all, make verify
├── NOTICE
├── README.md
├── benchmark
│   ├── README.md               ← benchmark methodology and results
│   └── run.py                  ← benchmark CLI (--context for context-aware pass)
├── examples
│   ├── anthropic_example.py    ← Anthropic Claude, context-aware, requires ANTHROPIC_API_KEY
│   ├── basic_usage.py          ← stateless cache demo, no API key needed
│   ├── context_aware.py        ← 4-demo walkthrough, fully offline
│   ├── context_aware_example.py ← additional context-aware patterns
│   ├── langchain_example.py    ← LangChain integration, OpenAI/Anthropic/mock
│   ├── llamaindex_example.py   ← LlamaIndex integration, OpenAI/Anthropic/mock
│   └── async_example.py        ← AsyncCache demo, OpenAI/Anthropic/mock    (v0.3.7)
├── pyproject.toml              ← name="sulci", version="0.5.2"
├── setup.py
├── setup.sh                    ← one-shot setup: venv + install + smoke tests
├── smoke_test.py               ← core smoke test
├── smoke_test_langchain.py     ← LangChain integration smoke test
├── smoke_test_llamaindex.py    ← LlamaIndex integration smoke test
├── smoke_test_async.py         ← AsyncCache smoke test                     (v0.3.7)
├── sulci
│   ├── __init__.py             ← exports Cache, SyncCache, AsyncCache, ContextWindow,
│   │                              SessionStore (legacy), InMemorySessionStore,
│   │                              RedisSessionStore, EventSink, NullSink,
│   │                              TelemetrySink, RedisStreamSink, CacheEvent, connect()
│   │                              _SDK_VERSION = __version__   # derived from pyproject.toml
│   ├── backends
│   │   ├── __init__.py         ← empty — core.py loads backends via importlib
│   │   ├── chroma.py
│   │   ├── cloud.py            ← SulciCloudBackend (backend="sulci")
│   │   ├── faiss.py
│   │   ├── milvus.py
│   │   ├── qdrant.py
│   │   ├── redis.py
│   │   └── sqlite.py
│   ├── async_cache.py          ← AsyncCache non-blocking wrapper         (v0.3.7)
│   ├── context.py              ← ContextWindow + legacy SessionStore manager
│   ├── core.py                 ← Cache engine + B1 adapter (v0.5.0)
│   │                              telemetry= param, api_key= param,
│   │                              session_store= + event_sink= kwargs (v0.5.0)
│   ├── embeddings
│   │   ├── __init__.py
│   │   ├── minilm.py           ← default: all-MiniLM-L6-v2 (free, local)
│   │   └── openai.py           ← requires OPENAI_API_KEY
│   ├── sessions                ← v0.5.0 — SessionStore protocol package
│   │   ├── __init__.py
│   │   ├── protocol.py         ← public stable SessionStore protocol
│   │   ├── memory.py           ← InMemorySessionStore (default)
│   │   └── redis.py            ← RedisSessionStore (multi-replica)
│   ├── sinks                   ← v0.5.0 — EventSink protocol package
│   │   ├── __init__.py
│   │   ├── protocol.py         ← public stable EventSink + CacheEvent
│   │   ├── null.py             ← NullSink (default no-op)
│   │   ├── telemetry.py        ← TelemetrySink (HTTPS POST, allowlist-scrubbed)
│   │   └── redis_stream.py     ← RedisStreamSink (XADD, allowlist-scrubbed)
│   └── integrations
│       ├── __init__.py
│       ├── langchain.py        ← SulciCache(BaseCache) for LangChain  (v0.3.3)
│       └── llamaindex.py       ← SulciCacheLLM(LLM) for LlamaIndex    (v0.3.5)
└── tests
    ├── test_backends.py                —   9 tests: per-backend contract + persistence
    ├── test_cloud_backend.py           —  28 tests: SulciCloudBackend + Cache wiring
    ├── test_connect.py                 —  32 tests: sulci.connect(), _emit(), _flush()
    ├── test_context.py                 —  35 tests: ContextWindow, legacy SessionStore
    ├── test_core.py                    —  31 tests: cache.get/set, TTL, stats, personalization, tenant_id
    ├── test_integrations_langchain.py  —  27 tests: SulciCache LangChain adapter
    ├── test_integrations_llamaindex.py —  29 tests: SulciCacheLLM LlamaIndex wrapper
    ├── test_async_cache.py             —  25 tests: AsyncCache non-blocking wrapper       (v0.3.7)
    ├── test_qdrant_tenant_isolation.py —  11 tests: tenant_id partition isolation         (v0.4.0)
    ├── test_sessions.py                —  24 tests: SessionStore protocol + tenant isol.  (v0.5.0)
    ├── test_sinks.py                   —  15 tests: EventSink protocol + privacy allowlist (v0.5.0)
    ├── test_session_store_injection.py —  12 tests: Cache(session_store=, event_sink=)    (v0.5.0)
    ├── test_config.py                  —  20 tests: ~/.sulci/config — load/save/0600 perms (v0.5.2)
    ├── test_telemetry.py               —  24 tests: fingerprint helper + flush wire shape  (v0.5.2)
    ├── test_nudge.py                   —  13 tests: 100-query nudge in Cache.stats()       (v0.5.2)
    └── compat/                         —  Backend + Embedder conformance suites

Plus: sulci/tests/compat/ — SessionStore + EventSink conformance suites (v0.5.0)

Running Tests

# full suite — 212 tests total (7 skipped if optional backend deps not installed)
python -m pytest tests/ -v

# by file
python -m pytest tests/test_core.py -v                       # 27 tests
python -m pytest tests/test_context.py -v                    # 35 tests
python -m pytest tests/test_backends.py -v                   #  9 tests (skipped if dep missing)
python -m pytest tests/test_connect.py -v                    # 32 tests — sulci.connect() + telemetry
python -m pytest tests/test_cloud_backend.py -v              # 28 tests — SulciCloudBackend
python -m pytest tests/test_integrations_langchain.py -v     # 27 tests — LangChain integration
python -m pytest tests/test_integrations_llamaindex.py -v    # 29 tests — LlamaIndex integration
python -m pytest tests/test_async_cache.py -v                # 25 tests — AsyncCache wrapper

# single backend only
python -m pytest tests/test_backends.py -v -k sqlite
python -m pytest tests/test_backends.py -v -k chroma

# with coverage
python -m pytest tests/ -v --cov=sulci --cov-report=term-missing

Make targets

make smoke              # all smoke tests (core + LangChain + LlamaIndex)
make smoke-core         # core smoke test only
make smoke-langchain    # LangChain smoke test only
make smoke-llamaindex   # LlamaIndex smoke test only
make smoke-async        # AsyncCache smoke test only
make test               # core pytest suite
make test-integrations  # LangChain + LlamaIndex integration tests
make test-async         # AsyncCache tests only
make test-all           # full suite (212 tests)
make test-cov           # full suite with coverage
make verify             # smoke + test-all (run before committing)

test_connect.py (32 tests) — sulci.connect(), _emit(), _flush(), Cache(telemetry=). Requires httpx.

test_cloud_backend.py (28 tests) — SulciCloudBackend construction, search(), upsert(), delete_user(), clear(), and Cache(backend='sulci') wiring. Requires httpx.

test_integrations_langchain.py (27 tests) — SulciCache(BaseCache) LangChain adapter. Requires langchain-core.

test_integrations_llamaindex.py (29 tests) — SulciCacheLLM(LLM) LlamaIndex wrapper. Requires llama-index-core.

Backend tests are skipped — not failed when their dependency isn't installed. Install the backend extra to run its tests: pip install -e ".[chroma]".

See LOCAL_SETUP.md for the full local development guide including venv setup, backend installation, smoke testing, and troubleshooting.


Examples

python examples/basic_usage.py          # stateless cache — no API key needed
python examples/context_aware.py        # context-aware — no API key needed
python examples/anthropic_example.py    # requires ANTHROPIC_API_KEY
python examples/langchain_example.py    # OpenAI or Anthropic or mock fallback
python examples/llamaindex_example.py   # OpenAI or Anthropic or mock fallback
python examples/async_example.py        # AsyncCache demo, OpenAI/Anthropic/mock

Benchmark

# fast run (~30 seconds)
python benchmark/run.py --no-sweep --queries 1000

# with context-aware pass
python benchmark/run.py --no-sweep --queries 1000 --context

# full benchmark
python benchmark/run.py --context

See benchmark/README.md for full methodology and results.


Troubleshooting

ImportError: cannot import name 'HfFolder' from 'huggingface_hub'

Conda environments often have a stale huggingface_hub that conflicts with sentence-transformers. Fix by upgrading all three together:

pip install --upgrade huggingface_hub datasets sentence-transformers

Or use a clean venv (avoids conda transitive dependency conflicts entirely):

python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate
pip install "sulci[sqlite]" anthropic
python your_script.py

huggingface/tokenizers: The current process just got forked... warning

Harmless — suppress it with:

export TOKENIZERS_PARALLELISM=false

anthropic.OverloadedError: Error code: 529

Transient API congestion — not a Sulci Cache issue. Wait a moment and retry, or check status.anthropic.com.

zsh: no matches found: sulci[chroma]

Wrap extras in quotes:

pip install "sulci[chroma]"    # ✓
pip install sulci[chroma]      # ✗ — zsh glob expansion breaks this

pytest: command not found

python -m pytest tests/ -v

Contributing

See CONTRIBUTING.md for branching model, PR process, and coding standards.


License

Apache License 2.0 — see LICENSE.

Copyright 2026 Kathiravan Sengodan.

U.S. Patent Application No. 64/018,452 (pending) covers the context-aware semantic caching algorithm. Apache 2.0 grants users a royalty-free patent license for use of this code.

