LLMCache is an open-source caching solution designed to operate seamlessly within your cloud infrastructure. Beyond basic caching, it offers custom database integrations and automatic retries for enhanced reliability, helping developers manage data-intensive operations efficiently.
⭐ Features
- Customizable Database Integrations: Out-of-the-box support for Firestore and local cache, with the flexibility for custom database integrations to meet your specific needs.
- Efficient Caching: Optimizes your application by caching function responses, significantly reducing the time spent on repeated calls and improving responsiveness.
- Retry Logic with Backoff Intervals: Enhances reliability through robust retry mechanisms, including configurable backoff intervals to handle failures gracefully.
- Reduced Latency: Minimizes delays in data retrieval, ensuring your application runs smoothly and efficiently.
- Scalable and Cloud-Ready: Designed to seamlessly integrate with your cloud infrastructure, making it ideal for scaling applications.
- Open Source: Provides the freedom to modify, extend, and tailor the solution to your project's requirements, backed by a community-driven support system.
- Simplified Data Handling: Streamlines the process of storing and retrieving data, allowing for more focused development on core functionalities.
For feature requests, please reach out to us via this form.
🗄️ Supported Database Integrations
Database Integration | Status
---|---
Local Cache | Completed ✅
Firestore | Completed ✅
MongoDB | In Progress 🚧
Redis | In Progress 🚧
⚙️ Installation
Install via pip with just one command:
pip install llm_cache
🎉 Usage
LLMCache simplifies adding efficient caching to your applications, supporting both local and cloud-based environments. Below are examples of using LLMCache with a local cache and with Firestore, including support for streaming responses.
Local Cache
For applications requiring rapid access without external dependencies, LLMCache offers a local caching solution:
from db_integrations.local_cache import LocalCache
from llm_cache import LLMCache
import openai


def call_openai(model,
                openai_messages,
                temperature,
                timeout=50):
    client = openai.Client()
    completion = client.chat.completions.create(model=model,
                                                messages=openai_messages,
                                                temperature=temperature,
                                                timeout=timeout)
    return completion.choices[0].message.content


# Initialize the local cache
cache_file_path = "test_cache.json"
local_cache = LocalCache(file_path=cache_file_path)
llm_cache = LLMCache(local_cache)

# Example function call using the local cache
res = llm_cache.call(func=call_openai,
                     model="gpt-4",
                     openai_messages=[{"content": "Hello, how are you?", "role": "user"}],
                     temperature=0.8,
                     exclude_cache_params=["timeout"])
Firestore Cache
For a persistent, cloud-based cache, LLMCache integrates with Firestore:
from db_integrations.firestore_cache import FirestoreCache
from llm_cache import LLMCache
import openai


def call_openai(model,
                openai_messages,
                temperature,
                timeout=50):
    client = openai.Client()
    completion = client.chat.completions.create(model=model,
                                                messages=openai_messages,
                                                temperature=temperature,
                                                timeout=timeout)
    return completion.choices[0].message.content


# Configure Firestore cache
collection_name = "test_cache"
firestore_service_account_file = "firestore_key.json"
firestore_cache = FirestoreCache(collection_name=collection_name,
                                 firestore_service_account_file=firestore_service_account_file)
llm_cache = LLMCache(firestore_cache)

# Utilizing Firestore for caching, with retries on failure
# (up to 2 retries, backing off for 5 and then 10 seconds)
res = llm_cache.call(func=call_openai,
                     model="gpt-4",
                     openai_messages=[{"content": "Hello, how are you?", "role": "user"}],
                     temperature=0.8,
                     timeout=40,
                     exclude_cache_params=["timeout"],
                     num_retries_call=2,
                     backoff_intervals_call=[5, 10])
Note on Firestore Cache: When using Firestore as your caching solution, it's recommended to implement a Time-To-Live (TTL) policy for your cache entries. This ensures that your database does not indefinitely grow with stale data, which can lead to increased costs and decreased performance. Setting a TTL allows Firestore to automatically delete entries after a specified duration, keeping your database optimized and your costs in check.
Streaming Support
LLMCache also supports streaming responses for real-time data handling, enhancing applications that require continuous data flow:
def openai_stream_call(model,
                       openai_messages,
                       temperature,
                       timeout=120):
    client = openai.Client()
    streaming_response = client.chat.completions.create(model=model,
                                                        messages=openai_messages,
                                                        temperature=temperature,
                                                        stream=True,
                                                        timeout=timeout)
    # Return the streaming response so llm_cache.stream_call can consume it
    return streaming_response


# Streaming call demonstration with Firestore cache
streaming_messages = [
    {"role": "system", "content": "The following is a conversation with an AI assistant."},
    {"role": "user", "content": "Hello, how are you today?"}
]

streaming_generator = llm_cache.stream_call(
    func=openai_stream_call,
    model="gpt-4",
    openai_messages=streaming_messages,
    temperature=0.8)

# Iterate over and print streaming responses
for chunk in streaming_generator:
    print(chunk.decode("utf-8"))
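
If you also want the full response once streaming finishes, you can accumulate the chunks as you print them, in place of the simple loop above. A small sketch, assuming each chunk is UTF-8-encoded bytes as shown:

# Collect the streamed chunks into one string while printing them.
# (Assumes chunks are UTF-8-encoded bytes, as in the example above.)
collected_chunks = []
for chunk in streaming_generator:
    text = chunk.decode("utf-8")
    print(text, end="")
    collected_chunks.append(text)

full_response = "".join(collected_chunks)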
❓ Frequently Asked Questions
What types of applications can benefit from LLMCache?
LLMCache is particularly beneficial for applications that frequently interact with large language models (LLMs) or perform data-intensive operations. It's ideal for improving response times and reducing API costs in AI-powered apps, web services, and data analysis tools.
Can LLMCache be integrated with any database?
While LLMCache currently has built-in support for Firestore and local caching, it's designed to allow custom database integrations. Developers can extend LLMCache to work with databases like MongoDB and Redis, as well as others according to their project requirements.
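
The exact interface a backend must implement isn't documented here, so treat the sketch below as illustrative only: the RedisCache class and its get_cache/set_cache methods are assumptions rather than LLMCache's actual API. In practice, mirror whatever methods LocalCache and FirestoreCache expose.

import json

import redis  # hypothetical backend; any key-value store would work similarly


class RedisCache:
    """Illustrative custom backend. The method names below are assumptions;
    mirror the interface that LocalCache/FirestoreCache actually expose."""

    def __init__(self, host="localhost", port=6379):
        self.client = redis.Redis(host=host, port=port)

    def get_cache(self, key):
        # Return the cached response for this key, or None on a miss.
        value = self.client.get(key)
        return json.loads(value) if value is not None else None

    def set_cache(self, key, value):
        # Store the response under the computed cache key.
        self.client.set(key, json.dumps(value))


# Usage would then mirror the built-in backends, e.g.:
# llm_cache = LLMCache(RedisCache())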
Is LLMCache open-source?
Yes, LLMCache is an open-source project. This means you can freely use, modify, and distribute it under its licensing terms. It also allows the community to contribute to its development.
Can LLMCache improve the performance of my application?
Absolutely. By caching responses, LLMCache can significantly reduce the time it takes for your application to respond to user requests. This results in faster performance, especially for operations that involve heavy computational work or external API calls.
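
A rough way to see this for yourself is to time a first (uncached) call against a repeat of the same call. This sketch reuses the call_openai helper and llm_cache instance from the usage examples above and assumes the second call is served from the cache:

import time

messages = [{"content": "Hello, how are you?", "role": "user"}]

start = time.perf_counter()
llm_cache.call(func=call_openai, model="gpt-4",
               openai_messages=messages, temperature=0.8)
first_call = time.perf_counter() - start

start = time.perf_counter()
llm_cache.call(func=call_openai, model="gpt-4",
               openai_messages=messages, temperature=0.8)
second_call = time.perf_counter() - start

# The second call should be noticeably faster if it was served from the cache.
print(f"first call: {first_call:.2f}s, cached call: {second_call:.2f}s")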
👥 Contributions
We welcome contributions from everyone who is looking to improve or add value to this project! Whether you're interested in fixing bugs, adding new features, or improving documentation, your help is appreciated.
If you would like to contribute, please fill out this form with your details and how you'd like to help. We'll review your submission and get back to you as soon as possible with the next steps.
Thank you for considering contributing, and we look forward to collaborating with you!
🔔 Subscribe to updates
Stay updated with YOL by subscribing through this form.
⚖️ License
For detailed licensing information, please refer to the LICENSE file.