Drop-in memory wrappers for Anthropic, OpenAI, and Gemini clients
ZeroLatency
Drop-in memory wrappers for Anthropic, OpenAI, and Gemini clients. Add persistent memory to your LLM applications by swapping one import and passing your 0Latency credentials; the rest of your code stays the same.
Installation
pip install zerolatency
Features
- Drop-in replacement - Swap the import and the client constructor; your existing API calls stay unchanged
- Automatic memory recall - Relevant memories are retrieved and injected into context
- Automatic memory storage - Conversations are stored as memories in the background
- Non-blocking storage - Memories are written in background threads, so storing them does not delay the response
- Multi-provider - Works with Anthropic Claude, OpenAI GPT, and Google Gemini
Quick Start
Anthropic Claude
from zerolatency import AnthropicWithMemory

# Replace this:
# from anthropic import Anthropic
# client = Anthropic(api_key="your-api-key")

# With this:
client = AnthropicWithMemory(
    api_key="your-anthropic-key",
    zl_api_key="your-0latency-key",
    agent_id="my-agent",
)

# Use exactly as before
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Hello!"}],
)
OpenAI
from zerolatency import OpenAIWithMemory

# Replace this:
# from openai import OpenAI
# client = OpenAI(api_key="your-api-key")

# With this:
client = OpenAIWithMemory(
    api_key="your-openai-key",
    zl_api_key="your-0latency-key",
    agent_id="my-agent",
)

# Use exactly as before
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
Google Gemini
from zerolatency import GeminiWithMemory

# Replace this:
# import google.generativeai as genai
# genai.configure(api_key="your-api-key")
# model = genai.GenerativeModel("gemini-pro")

# With this:
client = GeminiWithMemory(
    api_key="your-gemini-key",
    zl_api_key="your-0latency-key",
    agent_id="my-agent",
)
model = client.GenerativeModel("gemini-pro")

# Use exactly as before
response = model.generate_content("Hello!")
How It Works
- Memory Recall: Before each API call, relevant memories are retrieved using semantic search
- Context Injection: Memories are automatically injected into the system prompt
- API Call: Your request is sent to the LLM provider with enhanced context
- Memory Storage: The conversation turn is stored as a memory (non-blocking, zero latency)
- Response: The original response is returned unmodified
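The five steps above can be sketched roughly as follows. This is an illustration only, not the package's actual implementation: `MemoryStore`, `chat_with_memory`, and the `call_llm` callback are hypothetical names, keyword overlap stands in for semantic search, and the background thread is returned purely so a caller can wait for it in a demo.

```python
import threading
from typing import Callable, List, Tuple


class MemoryStore:
    """Hypothetical in-memory stand-in for a remote memory service."""

    def __init__(self) -> None:
        self._memories: List[str] = []
        self._lock = threading.Lock()

    def recall(self, query: str, limit: int = 3) -> List[str]:
        # Keyword overlap stands in for semantic search in this sketch.
        words = set(query.lower().split())
        with self._lock:
            scored = [(len(words & set(m.lower().split())), m) for m in self._memories]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [memory for score, memory in ranked if score > 0][:limit]

    def store(self, text: str) -> None:
        with self._lock:
            self._memories.append(text)


def chat_with_memory(
    store: MemoryStore,
    call_llm: Callable[[str, str], str],
    system: str,
    user_message: str,
) -> Tuple[str, threading.Thread]:
    # 1. Memory recall: look up memories related to the new message.
    memories = store.recall(user_message)
    # 2. Context injection: append recalled memories to the system prompt.
    if memories:
        bullet_list = "\n".join(f"- {m}" for m in memories)
        system = f"{system}\n\nRelevant memories:\n{bullet_list}"
    # 3. API call: send the enriched prompt to the provider (a callback here).
    response = call_llm(system, user_message)
    # 4. Memory storage: write in a background thread so the caller is not blocked.
    worker = threading.Thread(target=store.store, args=(user_message,), daemon=True)
    worker.start()
    # 5. Response: hand back the provider's response unchanged.
    return response, worker
```

Because recall runs before the provider call, it sits on the request path, while the storage step does not; only the latter is non-blocking in this sketch.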
Configuration
All wrappers support the following parameters:
client = AnthropicWithMemory(
    api_key="your-llm-api-key",             # Required: Your LLM provider API key
    zl_api_key="your-0latency-key",         # Required: Your 0Latency API key
    agent_id="my-agent",                    # Required: Unique agent identifier
    zl_base_url="https://api.0latency.ai",  # Optional: 0Latency API base URL
    recall_enabled=True,                    # Optional: Enable/disable memory recall
    store_enabled=True,                     # Optional: Enable/disable memory storage
    budget_tokens=4000,                     # Optional: Max tokens for memory context
)
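To keep keys out of source code, the parameters above could be assembled from environment variables. The helper below is a sketch under assumptions: `wrapper_kwargs_from_env` and the variable names `LLM_API_KEY`, `ZL_API_KEY`, `ZL_AGENT_ID`, `ZL_BASE_URL`, `ZL_RECALL_ENABLED`, and `ZL_STORE_ENABLED` are hypothetical, and nothing here implies the package reads them itself.

```python
import os
from typing import Any, Dict, Mapping, Optional


def wrapper_kwargs_from_env(
    env: Optional[Mapping[str, str]] = None,
    budget_tokens: int = 4000,
) -> Dict[str, Any]:
    """Build wrapper keyword arguments from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    # Map each required wrapper parameter to the variable that supplies it.
    required = {
        "api_key": "LLM_API_KEY",
        "zl_api_key": "ZL_API_KEY",
        "agent_id": "ZL_AGENT_ID",
    }
    missing = [name for name in required.values() if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))

    kwargs: Dict[str, Any] = {param: env[name] for param, name in required.items()}
    # Optional settings: only override the base URL when one is provided.
    if env.get("ZL_BASE_URL"):
        kwargs["zl_base_url"] = env["ZL_BASE_URL"]
    kwargs["recall_enabled"] = env.get("ZL_RECALL_ENABLED", "true").lower() == "true"
    kwargs["store_enabled"] = env.get("ZL_STORE_ENABLED", "true").lower() == "true"
    kwargs["budget_tokens"] = budget_tokens
    return kwargs
```

With the variables set, a client could then be created as `AnthropicWithMemory(**wrapper_kwargs_from_env())`.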
Get Your API Key
- Sign up at 0latency.ai
- Generate your API key from the dashboard
- Start building with memory!
Examples
Multi-turn conversation with memory
from zerolatency import AnthropicWithMemory

client = AnthropicWithMemory(
    api_key="your-anthropic-key",
    zl_api_key="your-0latency-key",
    agent_id="customer-support-bot",
)

# First conversation
response1 = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "My favorite color is blue"}],
)

# Later conversation - the agent remembers!
response2 = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "What's my favorite color?"}],
)
# Example response: "Based on our previous conversation, your favorite color is blue."
Disable recall or storage for a client
# Create a client with recall disabled; the flags apply to every call it makes
client = AnthropicWithMemory(
    api_key="your-anthropic-key",
    zl_api_key="your-0latency-key",
    agent_id="my-agent",
    recall_enabled=False,  # Don't recall memories
    store_enabled=True,    # But still store them
)
Architecture
┌─────────────────────────────────────────────────────────────┐
│ Your Application │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ ZeroLatency Wrapper (This Package) │
│ ┌─────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ Recall │ │ Inject │ │ Store (async) │ │
│ │ Memories │→ │ Context │→ │ Memories │ │
│ └─────────────┘ └──────────────┘ └──────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────┴─────────────┐
▼ ▼
┌──────────────────┐ ┌─────────────────┐
│ LLM Provider │ │ 0Latency API │
│ (Anthropic/ │ │ (Memory Store) │
│ OpenAI/Gemini) │ └─────────────────┘
└──────────────────┘
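One common way to build a wrapper layer like the one in the diagram is delegation: the wrapper intercepts the single method it needs to augment and forwards every other attribute to the underlying client. The sketch below illustrates that pattern only; `FakeClient`, `MemoryProxy`, and the `before`/`after` hooks are hypothetical and are not taken from the package's source.

```python
from typing import Any, Callable


class FakeClient:
    """Hypothetical stand-in for a provider SDK client."""

    def create(self, prompt: str) -> str:
        return f"echo: {prompt}"

    def count_tokens(self, prompt: str) -> int:
        return len(prompt.split())


class MemoryProxy:
    """Adds hooks around `create` and delegates everything else unchanged."""

    def __init__(
        self,
        client: Any,
        before: Callable[[str], str],
        after: Callable[[str, str], None],
    ) -> None:
        self._client = client
        self._before = before  # e.g. recall memories and inject them
        self._after = after    # e.g. store the finished turn

    def create(self, prompt: str) -> str:
        enriched = self._before(prompt)
        response = self._client.create(enriched)
        self._after(prompt, response)
        return response  # returned exactly as the client produced it

    def __getattr__(self, name: str) -> Any:
        # Invoked only when normal lookup fails, so any method the proxy
        # does not define itself falls through to the wrapped client.
        return getattr(self._client, name)
```

Delegating through `__getattr__` is what lets existing calls keep working: code that used `count_tokens` on the original client can call it on the proxy without changes.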
Performance
- No added latency from storage - Memory storage happens in background threads
- Fast recall - Memory retrieval typically adds <100ms
- Configurable budget - Control memory context size with budget_tokens
- Smart caching - Frequently accessed memories are cached for speed
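To make the token budget concrete, here is a sketch of how a cap such as `budget_tokens` could limit the memories placed into context. It is an assumption-laden illustration: `estimate_tokens` and `fit_to_budget` are hypothetical helpers, the four-characters-per-token estimate is a rough heuristic, and the package may count tokens and rank memories differently.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token); real tokenizers differ.
    return max(1, len(text) // 4)


def fit_to_budget(memories: List[str], budget_tokens: int) -> List[str]:
    """Keep memories in ranked order until the next one would exceed the budget."""
    selected: List[str] = []
    used = 0
    for memory in memories:
        cost = estimate_tokens(memory)
        if used + cost > budget_tokens:
            break  # stop at the first memory that does not fit
        selected.append(memory)
        used += cost
    return selected
```

A smaller budget trades recall coverage for a shorter prompt, which also lowers the input token count sent to the provider.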
Requirements
- Python 3.8+
- anthropic>=0.18.0
- openai>=1.0.0
- google-generativeai>=0.3.0
- requests>=2.25.0
License
MIT License - see LICENSE file for details
Support
- Documentation: docs.0latency.ai
- Issues: GitHub Issues
- Email: support@0latency.ai
Contributing
Contributions are welcome! Please open an issue or PR on GitHub.
Built with ❤️ by the 0Latency team
File details
Details for the file zerolatency-0.2.1.tar.gz.
File metadata
- Download URL: zerolatency-0.2.1.tar.gz
- Upload date:
- Size: 13.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d0c202c73c9aa5d24f6cf3e8ae9eb9c69f1f4ad382c83eb9403f6ceb152e4c7e |
| MD5 | 3b26138101a07db8c49a9db341214823 |
| BLAKE2b-256 | 4520ffb56dc56bf209a3b46a99ef645319fdd097191855303c6c3142b28607b1 |
File details
Details for the file zerolatency-0.2.1-py3-none-any.whl.
File metadata
- Download URL: zerolatency-0.2.1-py3-none-any.whl
- Upload date:
- Size: 14.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 19b5f6e070d937325c324c6a0bd513e16c2840246e70ddeaa7e295c94d90190c |
| MD5 | fe4f816ca3992832ac8e6bfeebaa3017 |
| BLAKE2b-256 | f931ffc6eaeaaad21ce5b8e73e5549f0282d5a249c88eb65ccfe71eeb3d02600 |