
A robust middleware for hybrid semantic caching, text normalization, and vector search optimization.

Hybrid Semantic Cache Middleware

A robust middleware designed to optimize Large Language Model (LLM) query processing through hybrid semantic caching and advanced text normalization.

Built to accelerate responses from cloud LLMs (such as Google Gemini 1.5 Flash), this library intercepts typo-heavy or colloquial queries, normalizes them, and looks for a semantically similar cached response locally using FAISS and all-MiniLM-L6-v2 embeddings. When a match is found, the response is served from the local cache without a cloud API call, which reduces latency and API usage costs.

🚀 Key Features

  • Query Normalization: Automatically handles typos and slang terms before embedding, increasing cache hit rates.
  • Local Vector Caching: Uses a local FAISS index for fast similarity search and retrieval.
  • LLM Latency Reduction: Bypasses cloud LLM API calls for recurring or semantically similar queries.
  • FastAPI Ready: Designed to be easily integrated into modern asynchronous Python backend systems.
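
To make the normalization idea concrete, here is a minimal sketch of one way typos and slang could be cleaned up before embedding. It is not the package's `normalize_text` implementation: the `SLANG` table, the `VOCABULARY` list, and the `normalize_query` function are hypothetical names invented for this illustration, and it uses the standard library's `difflib` for fuzzy matching.

```python
import difflib
import re

# Hypothetical slang table and vocabulary, for illustration only.
SLANG = {"pls": "please", "u": "you", "thx": "thanks"}
VOCABULARY = ["please", "write", "resignation", "letter", "you", "thanks", "a", "me"]


def normalize_query(text: str, cutoff: float = 0.8) -> str:
    """Lowercase, expand slang, and snap near-miss tokens to known words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    cleaned = []
    for token in tokens:
        token = SLANG.get(token, token)
        if token not in VOCABULARY:
            # get_close_matches returns [] when nothing scores above the cutoff,
            # in which case the token is kept unchanged.
            match = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=cutoff)
            token = match[0] if match else token
        cleaned.append(token)
    return " ".join(cleaned)
```

Two differently misspelled versions of the same request then collapse to one string, which is what lets them share a cache entry.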

📦 Installation

Install the package directly via pip:

pip install hybrid-semantic-cache


💻 Prerequisites

  • Python 3.8 or higher.
  • Google Gemini API Key (if using the default fallback LLM).

Set your API key as an environment variable before running your application:

# Windows (Command Prompt): no quotes, because cmd stores them as part of the value
set GEMINI_API_KEY=YOUR_API_KEY

# macOS/Linux
export GEMINI_API_KEY="YOUR_API_KEY"
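
On the Python side, an application may want to confirm that the variable is set before handling requests. The helper below is a small sketch of that check; `load_gemini_api_key` is a hypothetical name and is not part of this package.

```python
import os


def load_gemini_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before starting the app.")
    return key
```

Calling this once at startup surfaces a missing key immediately instead of on the first cache miss.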


🛠️ Quick Start
Here is a basic example of how to integrate the middleware into your existing Python application:

# Note: adjust the imports below to match the actual names exported by the package
from hybrid_semantic_cache.main import app
from hybrid_semantic_cache.normalizer import normalize_text

# Example: a typo-heavy, colloquial Indonesian query
# (roughly "please write me a resignation letter")
user_query = "Tlong bkinin srt resign dong"

# 1. Normalize the text
clean_query = normalize_text(user_query)

# 2. The middleware then checks the local FAISS cache for a semantic match using all-MiniLM-L6-v2
# 3. It falls back to the cloud LLM (Gemini) only if no local match is found, saving latency and cost.
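
Steps 2 and 3 can be sketched end to end with the standard library alone. This is an illustration under stated assumptions, not the package's code: character-trigram counts stand in for all-MiniLM-L6-v2 embeddings, a brute-force cosine scan stands in for the FAISS index, and `embed`, `cosine`, `SemanticCache`, `answer_query`, and the `0.8` threshold are all hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: character-trigram counts (stand-in for all-MiniLM-L6-v2)."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Brute-force nearest-neighbour cache (stand-in for a FAISS index)."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[Counter, str]] = []

    def lookup(self, query: str) -> Optional[str]:
        """Return the best cached answer if its similarity reaches the threshold."""
        vector = embed(query)
        best_score, best_answer = 0.0, None
        for stored_vector, answer in self.entries:
            score = cosine(vector, stored_vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def store(self, query: str, answer: str) -> None:
        self.entries.append((embed(query), answer))


def answer_query(query: str, cache: SemanticCache, call_llm: Callable[[str], str]) -> Tuple[str, bool]:
    """Return (answer, served_from_cache); call the LLM only on a cache miss."""
    cached = cache.lookup(query)
    if cached is not None:
        return cached, True
    answer = call_llm(query)
    cache.store(query, answer)
    return answer, False
```

Passing the LLM call in as a plain callable keeps the cache logic independent of any particular provider. The similarity threshold is the main tuning knob: a lower value raises the hit rate but increases the risk of returning an answer to a different question.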


📄 License
This project is licensed under the MIT License.
Download files

Source Distribution

hybrid_semantic_cache-0.1.1.tar.gz (7.9 kB)

Uploaded Source

Built Distribution

hybrid_semantic_cache-0.1.1-py3-none-any.whl (8.6 kB)

Uploaded Python 3

File details

Details for the file hybrid_semantic_cache-0.1.1.tar.gz.

File metadata

  • Download URL: hybrid_semantic_cache-0.1.1.tar.gz
  • Upload date:
  • Size: 7.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for hybrid_semantic_cache-0.1.1.tar.gz

Algorithm    Hash digest
SHA256       9c0c3123185ca2d730a46f12e6f182e752dd73cd5e0bf30bd289fcc5a0296ab6
MD5          55592ff94c30b58cdacb722ada67f3fd
BLAKE2b-256  1cb9b6d5dd828f837f256f09671fb6137728fd943696f2875945bf35e68498d1


File details

Details for the file hybrid_semantic_cache-0.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for hybrid_semantic_cache-0.1.1-py3-none-any.whl

Algorithm    Hash digest
SHA256       016efa8cef85b94559d635e91ffe01f12c18901bc3d7b92da99da1dfeb177462
MD5          cc09340280bc706c8a4de716e629bb88
BLAKE2b-256  61333eefca2b232051ec0b74d769537072bf86e8285daba84a8bab8cfda191bd

