High-performance semantic caching for LLM APIs using binary embeddings

Binary Semantic Cache

Cut LLM costs by 50-90% with sub-millisecond latency.

A high-performance, enterprise-grade semantic cache for OpenAI and local LLMs. Powered by a Rust core for maximum speed, memory efficiency, and fast startup.


⚡ Why Use This?

Most semantic caches are slow (Python-only), heavy (require VectorDB), or complex (require Redis).

| Feature | Binary Semantic Cache | Redis / VectorDB | Python (NumPy) |
| --- | --- | --- | --- |
| Latency (100k) | 0.16 ms | ~2-5 ms | ~1.14 ms |
| Memory / Entry | ~52 bytes 🪶 | ~1-2 KB | ~120 bytes |
| Infrastructure | None (Local Lib) | External Service | None |
| Persistence | Fast Binary I/O | Snapshots | Slow (Pickle) |
| Cost | Free | $$$ | Free |

Benchmark Source: benchmarks/results/cache_e2e_bench.json (Intel i7, 100k entries).


🚀 Quick Start

1. Installation

Prerequisites: Python 3.10+. Rust is only needed for source builds.

# Option A: From PyPI (Recommended)
pip install "binary-semantic-cache[openai]"

# Option B: From Source (Development)
git clone https://github.com/matte1782/binary_semantic_cache.git
cd binary_semantic_cache
pip install maturin
maturin develop --release --extras openai

2. Choose Your Backend

A. OpenAI (Production)

Best for production apps. Includes automatic rate limiting and cost tracking.

import os
from binary_semantic_cache import BinarySemanticCache, BinaryEncoder
from binary_semantic_cache.embeddings.openai_backend import OpenAIEmbeddingBackend

# 1. Setup Backend (Tier 1 rate limits default)
os.environ["OPENAI_API_KEY"] = "sk-..."
backend = OpenAIEmbeddingBackend(model="text-embedding-3-small")

# 2. Initialize Cache (1536 dimensions for OpenAI)
encoder = BinaryEncoder(embedding_dim=1536, code_bits=256)
cache = BinarySemanticCache(encoder=encoder, max_entries=10000)

# 3. Use
query = "What is the capital of France?"
embedding = backend.embed_text(query)

# Check Cache
if hit := cache.get(embedding):
    print(f"✅ HIT: {hit.response}")
else:
    # Call LLM (Simulated)
    response = "Paris"
    cache.put(embedding, response)
    print(f"❌ MISS: Cached '{response}'")
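
The hit/miss branch above can be wrapped in a small helper so call sites stay tidy. The sketch below is illustrative only: `cached_completion` and the `embed` / `call_llm` callables are hypothetical names, not part of the library. It relies solely on the `get` / `put` calls and the `.response` attribute shown above, and it assumes `get` returns `None` on a miss.

```python
from typing import Any, Callable, Sequence


def cached_completion(
    cache: Any,
    embed: Callable[[str], Sequence[float]],
    call_llm: Callable[[str], str],
    prompt: str,
) -> tuple[str, bool]:
    """Return (response, was_hit); the LLM is called only on a cache miss."""
    embedding = embed(prompt)
    hit = cache.get(embedding)  # assumed to be None when nothing is similar enough
    if hit is not None:
        return hit.response, True
    response = call_llm(prompt)
    cache.put(embedding, response)
    return response, False
```

With the objects from the example above, a call would look like `cached_completion(cache, backend.embed_text, my_llm_call, query)`, where `my_llm_call` is whatever function sends the prompt to your model.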

B. Ollama / Local (Development)

Best for offline development. Zero API costs.

from binary_semantic_cache import BinarySemanticCache, BinaryEncoder
from binary_semantic_cache.embeddings import OllamaEmbedder

# 1. Setup Local Backend (Requires Ollama running with nomic-embed-text)
embedder = OllamaEmbedder(model_name="nomic-embed-text")

# 2. Initialize Cache (768 dimensions for Nomic)
encoder = BinaryEncoder(embedding_dim=768)
cache = BinarySemanticCache(encoder=encoder)

# 3. Use
vec = embedder.embed_text("Hello Local World")
cache.put(vec, "Stored Locally")

📊 Performance

Phase 2.5 introduces a native Rust storage engine. At 100k entries it cuts mean lookup latency from 1.14 ms to 0.16 ms, roughly 7x faster than the Phase 1 Python baseline.

Latency & Throughput (100k entries)

| Metric | Phase 1 (Python) | Phase 2 (Rust) | Speedup |
| --- | --- | --- | --- |
| Mean Latency | 1.14 ms | 0.16 ms | 7.0x 🚀 |
| Hit Latency | ~0.10 ms | 0.05 ms | 2.0x |
| Miss Latency | ~1.20 ms | 0.30 ms | 4.0x |

Memory Efficiency

| Component | Size | Notes |
| --- | --- | --- |
| Rust Index | 44 bytes | Fixed (Code + Metadata) |
| Python Response | ~8 bytes | Pointer to stored object |
| Total / Entry | ~52 bytes | vs ~120 bytes (Python) |

Note: Actual memory usage depends on the size of your response strings. The cache overhead itself is minimal.
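
For capacity planning, the per-entry figures above reduce to simple arithmetic. The sketch below is a back-of-the-envelope estimate, not a library function: `cache_overhead_bytes` is a hypothetical name, and the 32 + 12 byte split of the index record is inferred from the 256-bit code size rather than stated in the table.

```python
CODE_BITS = 256
INDEX_BYTES_PER_ENTRY = 44    # from the table: code + metadata
POINTER_BYTES_PER_ENTRY = 8   # from the table: reference to the Python response

code_bytes = CODE_BITS // 8                           # 32 bytes of binary code
metadata_bytes = INDEX_BYTES_PER_ENTRY - code_bytes   # 12 bytes (inferred, not documented)


def cache_overhead_bytes(num_entries: int) -> int:
    """Cache overhead only; the response payloads themselves are extra."""
    return num_entries * (INDEX_BYTES_PER_ENTRY + POINTER_BYTES_PER_ENTRY)


print(cache_overhead_bytes(100_000) / 1e6)    # 5.2 (MB)
print(cache_overhead_bytes(1_000_000) / 1e6)  # 52.0 (MB)
```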


🏗️ Architecture

The cache uses a hybrid Python/Rust architecture to combine ease of use with systems-level performance.

graph LR
    A[User App] -->|Python API| B(BinarySemanticCache)
    B -->|Embed| C{Backend}
    C -->|OpenAI/Ollama| D[Embeddings]
    B -->|Search| E[Rust Core 🦀]
    E -->|SIMD Hamming| F[Binary Index]
    E -->|Results| B
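
To make the search step in the diagram concrete, here is a pure-Python sketch of the general technique: each embedding is reduced to a fixed-width binary code, and a query is compared against every stored code by Hamming distance. It assumes sign-of-random-projection hashing, which this README does not confirm is what `BinaryEncoder` does. All function names are hypothetical, and the Python loop stands in for the work the Rust core does with SIMD.

```python
import random
from typing import List, Sequence, Tuple


def make_projection(embedding_dim: int, code_bits: int, seed: int = 0) -> List[List[float]]:
    """One random Gaussian hyperplane per output bit (assumed encoding scheme)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(embedding_dim)] for _ in range(code_bits)]


def encode(embedding: Sequence[float], projection: List[List[float]]) -> int:
    """Pack the sign of each projection into one int, one bit per hyperplane."""
    code = 0
    for row in projection:
        dot = sum(w * x for w, x in zip(row, embedding))
        code = (code << 1) | (1 if dot >= 0.0 else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def nearest(query: int, codes: Sequence[int]) -> Tuple[int, int]:
    """Full linear scan: (index, distance) of the closest code, or (-1, -1) if empty."""
    best_index, best_distance = -1, -1
    for i, code in enumerate(codes):
        distance = hamming(query, code)
        if best_index < 0 or distance < best_distance:
            best_index, best_distance = i, distance
    return best_index, best_distance


projection = make_projection(embedding_dim=8, code_bits=64)
stored = [encode(v, projection) for v in ([1.0] * 8, [-1.0] * 8)]
index, distance = nearest(encode([2.0] * 8, projection), stored)
```

Because the query `[2.0] * 8` points in the same direction as the first stored vector, every projection has the same sign and the scan returns that entry at distance 0.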

Persistence V3 (Dual-File Format)

Persistence uses a split-file strategy, so the search index can be loaded without waiting for the stored responses to be deserialized:

  1. entries.bin: A memory-mappable binary file containing compressed codes, timestamps, and access counts.
    • Index Load Time: < 10ms for 1M entries (search-ready).
    • Full Load Time: ~300ms for 1M entries (includes response hydration).
  2. responses.pkl: A standard Python pickle file for storing arbitrary response objects (strings, dicts, JSON).
    • Integrity: Secured with SHA-256 checksums.
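
The split between a fixed-width index file and a checksummed pickle can be sketched with the standard library alone. Everything specific below is an assumption for illustration: the record layout (32-byte code, 8-byte timestamp, 4-byte access count), the `responses.sha256` sidecar file, and the function names are hypothetical, and the library's real on-disk format may differ.

```python
import hashlib
import pickle
import struct
import tempfile
from pathlib import Path

# Assumed layout, little-endian with no padding: 32 + 8 + 4 = 44 bytes per record.
RECORD = struct.Struct("<32sQI")


def save(directory: Path, entries, responses) -> None:
    """entries: (code_bytes, timestamp, access_count) tuples; responses: any picklable list."""
    with open(directory / "entries.bin", "wb") as f:
        for code, timestamp, access_count in entries:
            f.write(RECORD.pack(code, timestamp, access_count))
    payload = pickle.dumps(responses)
    (directory / "responses.pkl").write_bytes(payload)
    (directory / "responses.sha256").write_text(hashlib.sha256(payload).hexdigest())


def load_index(directory: Path):
    """Fast path: read fixed-width records only, with no unpickling."""
    data = (directory / "entries.bin").read_bytes()
    return [RECORD.unpack_from(data, offset) for offset in range(0, len(data), RECORD.size)]


def load_responses(directory: Path):
    """Slow path: verify the checksum, then unpickle the responses."""
    payload = (directory / "responses.pkl").read_bytes()
    expected = (directory / "responses.sha256").read_text().strip()
    if hashlib.sha256(payload).hexdigest() != expected:
        raise ValueError("responses.pkl failed its SHA-256 check")
    return pickle.loads(payload)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    save(root, [(b"\x01" * 32, 1_700_000_000, 3)], ["Paris"])
    index = load_index(root)          # search-ready without touching the pickle
    responses = load_responses(root)  # hydrated only when needed
```

A checksum stored next to the file detects accidental corruption or truncation. It does not make it safe to unpickle files from an untrusted source.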

⚙️ Configuration

BinarySemanticCache(encoder, max_entries=..., ...)

| Parameter | Default | Description |
| --- | --- | --- |
| max_entries | 1000 | Maximum number of entries before LRU eviction. |
| similarity_threshold | 0.80 | Cosine similarity threshold (0.0-1.0). Lower values produce more hits; higher values produce more precise matches. |
| code_bits | 256 | Size of the binary hash in bits. Fixed at 256 for v1.0.0. |
| storage_mode | "memory" | Currently memory-only (with disk persistence). |
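
Since lookups compare binary codes, it can help to see how a cosine threshold relates to a Hamming distance. The sketch below assumes sign-of-random-projection codes, which this README does not confirm the library uses. Under that assumption, two vectors at angle θ differ in each bit with probability θ/π, so the expected Hamming distance is `code_bits * acos(similarity) / pi`. The function names are hypothetical.

```python
import math


def expected_hamming(cosine_similarity: float, code_bits: int = 256) -> float:
    """Expected number of differing bits for sign-of-random-projection codes."""
    s = max(-1.0, min(1.0, cosine_similarity))  # guard acos against rounding drift
    return code_bits * math.acos(s) / math.pi


def max_hamming_for_threshold(similarity_threshold: float, code_bits: int = 256) -> int:
    """Largest whole-bit distance not exceeding the expected distance at the threshold."""
    return math.floor(expected_hamming(similarity_threshold, code_bits))


print(max_hamming_for_threshold(0.80))  # 52
```

Under this model the default threshold of 0.80 corresponds to about 52 of 256 bits differing. Raising the threshold shrinks the allowed distance, which is why higher values give fewer but more precise hits.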

⚠️ Limitations & Constraints

For a detailed breakdown, see Known Limitations (v1.0).

  • Linear Scan (O(N)): This is not an Approximate Nearest Neighbor (ANN) index (like FAISS/HNSW). It performs a full linear scan.
    • Implication: Extremely fast for N < 1M (Rust SIMD), but scales linearly.
  • Full Load Time: The index itself is search-ready in under 10 ms, but fully hydrating 1M response objects takes ~300 ms because of Python pickle overhead.
  • Memory Resident: The entire index lives in RAM.
    • Implication: 1M entries require ~50 MB of RAM, plus the response data.
  • Global Lock: Uses a global RLock for thread safety.
    • Implication: Concurrent writes are serialized.
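  • Rust Dependency: Installing from source requires a Rust toolchain to build the extension. Pre-built wheels on PyPI cover CPython 3.10 to 3.12 on Windows x86-64, Linux x86-64 (manylinux and musllinux), and macOS (x86-64 and ARM64), plus CPython 3.13 on Windows x86-64; other platforms fall back to a source build.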
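
To illustrate what the global lock and LRU eviction mean in practice, here is a minimal stand-in built on the standard library. `LockedLRU` is a hypothetical class, not the library's implementation: it uses exact keys instead of similarity search, but it shows how a single re-entrant lock serializes writers and how the least recently used entry is evicted once `max_entries` is exceeded.

```python
import threading
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LockedLRU:
    """Exact-key LRU store guarded by one re-entrant lock (illustrative only)."""

    def __init__(self, max_entries: int = 1000) -> None:
        self._max_entries = max_entries
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()
        self._lock = threading.RLock()

    def get(self, key: Hashable) -> Optional[Any]:
        with self._lock:
            if key not in self._items:
                return None
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        with self._lock:  # every writer waits here, so writes never interleave
            self._items[key] = value
            self._items.move_to_end(key)
            while len(self._items) > self._max_entries:
                self._items.popitem(last=False)  # evict the least recently used

    def __len__(self) -> int:
        with self._lock:
            return len(self._items)
```

The trade-off is the one named above: correctness is easy to reason about, but write throughput does not improve with more threads.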

🗺️ Roadmap (Phase 3)

  • Cloud Persistence: S3 / GCS adapters for serverless deployments.
  • Distributed Cache: Redis-backed shared state for multi-instance setups.
  • Approximate Search: Evaluation of HNSW for >1M entry scaling.

🤝 Contributing

We welcome contributions! Please ensure you run the full benchmark suite before submitting PRs.

License: MIT

Maintained by Matteo Panzeri.
