A robust middleware for hybrid semantic caching, text normalization, and vector search optimization.
Hybrid Semantic Cache Middleware
A robust middleware designed to optimize Large Language Model (LLM) query processing through hybrid semantic caching and advanced text normalization.
Built to accelerate responses from cloud LLMs (like Google Gemini 1.5 Flash), this library intercepts typo-heavy or colloquial queries, normalizes them, and retrieves cached responses locally using FAISS and all-MiniLM-L6-v2 embeddings. When a query is answered from the local cache, no cloud request is made, which cuts latency and API usage costs for recurring queries.
🚀 Key Features
- Query Normalization: Automatically handles typos and slang terms before embedding, increasing cache hit rates.
- Local Vector Caching: Uses FAISS for fast, local similarity search and retrieval.
- LLM Latency Reduction: Bypasses cloud LLM API calls for recurring or semantically similar queries.
- FastAPI Ready: Designed to be easily integrated into modern asynchronous Python backend systems.
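To make the query normalization feature more concrete, here is a minimal sketch of the general idea using only the Python standard library. It is not the package's implementation: the `SLANG` table, the `VOCAB` list, the `normalize_query` name, and the 0.8 similarity cutoff are all assumptions chosen for illustration, and the example uses English rather than the package's own lexicon.

```python
import difflib
import re

# Hypothetical slang table; the package's real lexicon is not shown in this README.
SLANG = {"pls": "please", "u": "you", "thx": "thanks", "acct": "account"}

# Hypothetical vocabulary used to repair near-miss typos.
VOCAB = ["please", "reset", "my", "password", "account", "you", "thanks", "cancel"]


def normalize_query(text: str) -> str:
    """Lowercase, drop punctuation, expand slang, then repair close typos."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    normalized = []
    for token in tokens:
        if token in SLANG:
            # Known slang term: replace it with its expansion.
            normalized.append(SLANG[token])
        elif token in VOCAB:
            # Already a known word: keep it unchanged.
            normalized.append(token)
        else:
            # Unknown word: snap to the closest vocabulary entry, if any is close enough.
            match = difflib.get_close_matches(token, VOCAB, n=1, cutoff=0.8)
            normalized.append(match[0] if match else token)
    return " ".join(normalized)
```

With these tables, `normalize_query("Pls resett my pasword!!")` and `normalize_query("please reset my password")` both return `"please reset my password"`, so the two spellings would map to the same cache entry.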
📦 Installation
Install the package directly via pip:
pip install hybrid-semantic-cache
💻 Prerequisites
- Python 3.8 or higher.
- Google Gemini API Key (if using the default fallback LLM).
Set your API key as an environment variable before running your application:
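# Windows (Command Prompt; omit the quotes, or they become part of the value)
set GEMINI_API_KEY=YOUR_API_KEY
# Windows (PowerShell)
$env:GEMINI_API_KEY = "YOUR_API_KEY"
# Mac/Linux
export GEMINI_API_KEY="YOUR_API_KEY"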
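It can help to confirm at startup that the variable is actually visible to your Python process. The sketch below is one way to do that; the `load_gemini_api_key` helper is hypothetical and not part of the package, and whether the package reads `GEMINI_API_KEY` itself or expects you to pass the key in is not stated here.

```python
import os
from typing import Mapping


def load_gemini_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return GEMINI_API_KEY from the environment, or fail with a clear message."""
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; export it before starting the application."
        )
    return key
```

Calling `load_gemini_api_key()` once when the application starts surfaces a missing key immediately instead of on the first cache miss.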
🛠️ Quick Start
Here is a basic example of how to integrate the middleware into your existing Python application:
# Note: the import paths below are placeholders; adjust them to the modules and names the installed package actually exposes
from hybrid_semantic_cache.main import app
from hybrid_semantic_cache.normalizer import normalize_text
# Example: a typo-heavy, colloquial Indonesian query (roughly "please write me a resignation letter")
user_query = "Tlong bkinin srt resign dong"
# 1. The middleware normalizes the text
clean_query = normalize_text(user_query)
# 2. It then checks the local FAISS cache for a semantic match using all-MiniLM-L6-v2
# 3. It falls back to the cloud LLM (Gemini) only if no local match is found, saving latency and cost
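Steps 2 and 3 can be sketched as a cache-then-fallback flow. The example below is a standard-library illustration of that control flow only, not the package's code: a toy character-trigram embedding stands in for all-MiniLM-L6-v2, a brute-force scan stands in for the FAISS index, and the `SemanticCache` class, the `answer` function, the `call_llm` callable, and the 0.8 threshold are all hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: character-trigram counts (stand-in for all-MiniLM-L6-v2)."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Brute-force nearest-neighbour cache (stand-in for a FAISS index)."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[Counter, str]] = []

    def lookup(self, query: str) -> Optional[str]:
        """Return the best cached answer if its similarity clears the threshold."""
        vector = embed(query)
        best_score, best_answer = 0.0, None
        for cached_vector, cached_answer in self.entries:
            score = cosine(vector, cached_vector)
            if score > best_score:
                best_score, best_answer = score, cached_answer
        return best_answer if best_score >= self.threshold else None

    def store(self, query: str, response: str) -> None:
        """Remember a response under the embedding of its query."""
        self.entries.append((embed(query), response))


def answer(query: str, cache: SemanticCache,
           call_llm: Callable[[str], str]) -> Tuple[str, bool]:
    """Return (response, served_from_cache); call the LLM only on a miss."""
    cached = cache.lookup(query)
    if cached is not None:
        return cached, True
    fresh = call_llm(query)
    cache.store(query, fresh)
    return fresh, False
```

Passing the LLM call in as a plain callable keeps the sketch independent of any particular client library. The threshold is the main tuning knob: a higher value avoids serving an answer to a merely similar question, while a lower value raises the hit rate.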
📄 License
This project is licensed under the MIT License.
Download files
File details
Details for the file hybrid_semantic_cache-0.1.1.tar.gz.
File metadata
- Download URL: hybrid_semantic_cache-0.1.1.tar.gz
- Upload date:
- Size: 7.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9c0c3123185ca2d730a46f12e6f182e752dd73cd5e0bf30bd289fcc5a0296ab6 |
| MD5 | 55592ff94c30b58cdacb722ada67f3fd |
| BLAKE2b-256 | 1cb9b6d5dd828f837f256f09671fb6137728fd943696f2875945bf35e68498d1 |
File details
Details for the file hybrid_semantic_cache-0.1.1-py3-none-any.whl.
File metadata
- Download URL: hybrid_semantic_cache-0.1.1-py3-none-any.whl
- Upload date:
- Size: 8.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 016efa8cef85b94559d635e91ffe01f12c18901bc3d7b92da99da1dfeb177462 |
| MD5 | cc09340280bc706c8a4de716e629bb88 |
| BLAKE2b-256 | 61333eefca2b232051ec0b74d769537072bf86e8285daba84a8bab8cfda191bd |