Drop-in memory wrappers for Anthropic, OpenAI, and Gemini clients

ZeroLatency

Drop-in memory wrappers for Anthropic, OpenAI, and Gemini clients. Add persistent memory to your LLM applications by swapping the import and client constructor; your existing API calls stay the same.

Installation

pip install zerolatency

Features

  • Drop-in replacement - Swap the import and client constructor; existing API calls stay unchanged
  • Automatic memory recall - Relevant memories are retrieved and injected into context
  • Automatic memory storage - Conversations are stored as memories in the background
  • Non-blocking storage - Memories are stored in background threads, so storage adds no latency to your calls
  • Multi-provider - Works with Anthropic Claude, OpenAI GPT, and Google Gemini

Quick Start

Anthropic Claude

from zerolatency import AnthropicWithMemory

# Replace this:
# from anthropic import Anthropic
# client = Anthropic(api_key="your-api-key")

# With this:
client = AnthropicWithMemory(
    api_key="your-anthropic-key",
    zl_api_key="your-0latency-key",
    agent_id="my-agent"
)

# Use exactly as before
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,  # required by the Anthropic Messages API
    messages=[{"role": "user", "content": "Hello!"}]
)

OpenAI

from zerolatency import OpenAIWithMemory

# Replace this:
# from openai import OpenAI
# client = OpenAI(api_key="your-api-key")

# With this:
client = OpenAIWithMemory(
    api_key="your-openai-key",
    zl_api_key="your-0latency-key",
    agent_id="my-agent"
)

# Use exactly as before
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)

Google Gemini

from zerolatency import GeminiWithMemory

# Replace this:
# import google.generativeai as genai
# genai.configure(api_key="your-api-key")
# model = genai.GenerativeModel("gemini-pro")

# With this:
client = GeminiWithMemory(
    api_key="your-gemini-key",
    zl_api_key="your-0latency-key",
    agent_id="my-agent"
)
model = client.GenerativeModel("gemini-pro")

# Use exactly as before
response = model.generate_content("Hello!")

How It Works

  1. Memory Recall: Before each API call, relevant memories are retrieved using semantic search
  2. Context Injection: Memories are automatically injected into the system prompt
  3. API Call: Your request is sent to the LLM provider with enhanced context
  4. Memory Storage: The conversation turn is stored as a memory in a background thread, so it does not delay the response
  5. Response: The original response is returned unmodified
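
The five steps above can be sketched in plain Python. This is a minimal illustration, not the package's actual implementation: `InMemoryStore`, `call_with_memory`, and `fake_llm` are hypothetical names, recall uses naive keyword overlap instead of semantic search, and the memory store lives in process rather than behind the 0Latency API.

```python
import threading
from typing import Callable, List, Tuple


class InMemoryStore:
    """Hypothetical stand-in for the remote memory store (not the real client)."""

    def __init__(self) -> None:
        self._memories: List[str] = []
        self._lock = threading.Lock()

    def recall(self, query: str) -> List[str]:
        # Naive keyword overlap; the real service is described as semantic search.
        words = set(query.lower().split())
        with self._lock:
            return [m for m in self._memories if words & set(m.lower().split())]

    def store(self, text: str) -> None:
        with self._lock:
            self._memories.append(text)


def call_with_memory(
    llm: Callable[[str, str], str],
    store: InMemoryStore,
    user_message: str,
    system: str = "",
) -> Tuple[str, threading.Thread]:
    # 1. Memory recall happens before the provider call.
    memories = store.recall(user_message)
    # 2. Context injection: append recalled memories to the system prompt.
    if memories:
        system += "\nRelevant memories:\n" + "\n".join(f"- {m}" for m in memories)
    # 3. API call with the enhanced context.
    response = llm(system, user_message)
    # 4. Memory storage runs in a background thread so the caller is not blocked.
    worker = threading.Thread(target=store.store, args=(user_message,), daemon=True)
    worker.start()
    # 5. The provider's response is returned unmodified.
    return response, worker


def fake_llm(system: str, user_message: str) -> str:
    """Echoes its inputs so the injected context is visible in the reply."""
    return f"system={system!r} user={user_message!r}"


store = InMemoryStore()
_, first = call_with_memory(fake_llm, store, "My favorite color is blue")
first.join()  # wait for the background store so the next call can recall it
reply, second = call_with_memory(fake_llm, store, "What is my favorite color?")
second.join()
print(reply)
```

The sketch returns the worker thread only so the example can wait for storage to finish; a wrapper that stores in the background would normally keep that detail internal.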

Configuration

All wrappers support the following parameters:

client = AnthropicWithMemory(
    api_key="your-llm-api-key",          # Required: Your LLM provider API key
    zl_api_key="your-0latency-key",      # Required: Your 0Latency API key
    agent_id="my-agent",                 # Required: Unique agent identifier
    zl_base_url="https://api.0latency.ai",  # Optional: 0Latency API base URL
    recall_enabled=True,                 # Optional: Enable/disable memory recall
    store_enabled=True,                  # Optional: Enable/disable memory storage
    budget_tokens=4000,                  # Optional: Max tokens for memory context
)
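
To make `budget_tokens` more concrete, here is one way a token budget could cap the memories injected into a prompt. This is a sketch under stated assumptions: `estimate_tokens` and `fit_to_budget` are hypothetical helpers, the four-characters-per-token estimate is a rough heuristic rather than a real tokenizer, and the package may select memories differently.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about four characters per token."""
    return max(1, len(text) // 4)


def fit_to_budget(memories: List[str], budget_tokens: int) -> List[str]:
    """Keep memories in ranked order until the next one would exceed the budget."""
    selected: List[str] = []
    used = 0
    for memory in memories:
        cost = estimate_tokens(memory)
        if used + cost > budget_tokens:
            break
        selected.append(memory)
        used += cost
    return selected


# Memories are assumed to arrive already ranked by relevance.
ranked = [
    "User's favorite color is blue",
    "User prefers email over phone calls",
    "User is on the enterprise plan",
]
print(fit_to_budget(ranked, budget_tokens=16))
```

With a budget of 16 estimated tokens, only the first two memories fit; a larger budget admits more context at the cost of a longer prompt.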

Get Your API Key

  1. Sign up at 0latency.ai
  2. Generate your API key from the dashboard
  3. Start building with memory!

Examples

Multi-turn conversation with memory

from zerolatency import AnthropicWithMemory

client = AnthropicWithMemory(
    api_key="your-anthropic-key",
    zl_api_key="your-0latency-key",
    agent_id="customer-support-bot"
)

# First conversation
response1 = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,  # required by the Anthropic Messages API
    messages=[{"role": "user", "content": "My favorite color is blue"}]
)

# Later conversation - the agent remembers!
response2 = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What's my favorite color?"}]
)
# Example response (exact wording will vary):
# "Based on our previous conversation, your favorite color is blue."

Disable memory recall for a client

# Create client with recall disabled
client = AnthropicWithMemory(
    api_key="your-anthropic-key",
    zl_api_key="your-0latency-key",
    agent_id="my-agent",
    recall_enabled=False,  # Don't recall memories
    store_enabled=True,    # But still store them
)

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Your Application                         │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│               ZeroLatency Wrapper (This Package)             │
│  ┌─────────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │   Recall    │  │   Inject     │  │  Store (async)   │  │
│  │  Memories   │→ │   Context    │→ │    Memories      │  │
│  └─────────────┘  └──────────────┘  └──────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                            │
              ┌─────────────┴─────────────┐
              ▼                           ▼
    ┌──────────────────┐        ┌─────────────────┐
    │  LLM Provider    │        │  0Latency API   │
    │  (Anthropic/     │        │  (Memory Store) │
    │   OpenAI/Gemini) │        └─────────────────┘
    └──────────────────┘

Performance

  • No added latency from storage - Memory storage happens in background threads
  • Fast recall - Memory retrieval runs before each call and typically adds <100ms
  • Configurable budget - Control memory context size with budget_tokens
  • Smart caching - Frequently accessed memories are cached for speed
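
As an illustration of the caching idea, the sketch below memoizes recall results with `functools.lru_cache` from the standard library. It is an assumption about how caching could work, not the package's mechanism: `fetch_memories` and `cached_recall` are hypothetical names, and the backend lookup is simulated.

```python
from functools import lru_cache
from typing import Dict, Tuple

# Counts calls that reach the simulated memory backend.
stats: Dict[str, int] = {"lookups": 0}


def fetch_memories(agent_id: str, query: str) -> Tuple[str, ...]:
    """Hypothetical lookup standing in for a network call to the memory store."""
    stats["lookups"] += 1
    return (f"memory for {agent_id}: {query}",)


@lru_cache(maxsize=256)
def cached_recall(agent_id: str, query: str) -> Tuple[str, ...]:
    # Results are returned as tuples so cached values cannot be mutated by callers.
    return fetch_memories(agent_id, query)


cached_recall("my-agent", "favorite color")
cached_recall("my-agent", "favorite color")  # served from the cache
cached_recall("my-agent", "billing plan")
print(stats["lookups"], cached_recall.cache_info().hits)
```

A cache like this trades freshness for speed: entries would need to be invalidated (for example with `cached_recall.cache_clear()`) when new memories are stored for the same agent.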

Requirements

  • Python 3.8+
  • anthropic>=0.18.0
  • openai>=1.0.0
  • google-generativeai>=0.3.0
  • requests>=2.25.0

License

MIT License - see LICENSE file for details

Contributing

Contributions are welcome! Please open an issue or PR on GitHub.


Built with ❤️ by the 0Latency team

Source Distribution

zerolatency-0.2.1.tar.gz (13.0 kB view details)

Uploaded Source

Built Distribution

zerolatency-0.2.1-py3-none-any.whl (14.0 kB view details)

Uploaded Python 3

File details

Details for the file zerolatency-0.2.1.tar.gz.

File metadata

  • Download URL: zerolatency-0.2.1.tar.gz
  • Upload date:
  • Size: 13.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for zerolatency-0.2.1.tar.gz
Algorithm Hash digest
SHA256 d0c202c73c9aa5d24f6cf3e8ae9eb9c69f1f4ad382c83eb9403f6ceb152e4c7e
MD5 3b26138101a07db8c49a9db341214823
BLAKE2b-256 4520ffb56dc56bf209a3b46a99ef645319fdd097191855303c6c3142b28607b1

File details

Details for the file zerolatency-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: zerolatency-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 14.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for zerolatency-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 19b5f6e070d937325c324c6a0bd513e16c2840246e70ddeaa7e295c94d90190c
MD5 fe4f816ca3992832ac8e6bfeebaa3017
BLAKE2b-256 f931ffc6eaeaaad21ce5b8e73e5549f0282d5a249c88eb65ccfe71eeb3d02600
