ZeroLatency

Drop-in memory wrappers for Anthropic, OpenAI, and Gemini clients. Add persistent memory to your LLM applications by swapping the client import and constructor; your request code stays the same.

Installation

pip install zerolatency

Features

Drop-in replacement - Swap the import and the client constructor; existing request code stays the same
Automatic memory recall - Relevant memories are retrieved and injected into context
Automatic memory storage - Conversations are stored as memories in the background
Non-blocking storage - Memory storage runs in background threads and adds no latency to the call; recall runs before the call (see Performance)
Multi-provider - Works with Anthropic Claude, OpenAI GPT, and Google Gemini

Quick Start

Anthropic Claude

from zerolatency import AnthropicWithMemory

# Replace this:
# from anthropic import Anthropic
# client = Anthropic(api_key="your-api-key")

# With this:
client = AnthropicWithMemory(
    api_key="your-anthropic-key",
    zl_api_key="your-0latency-key",
    agent_id="my-agent"
)

# Use exactly as before
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,  # required by the Anthropic Messages API
    messages=[{"role": "user", "content": "Hello!"}]
)

OpenAI

from zerolatency import OpenAIWithMemory

# Replace this:
# from openai import OpenAI
# client = OpenAI(api_key="your-api-key")

# With this:
client = OpenAIWithMemory(
    api_key="your-openai-key",
    zl_api_key="your-0latency-key",
    agent_id="my-agent"
)

# Use exactly as before
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)

Google Gemini

from zerolatency import GeminiWithMemory

# Replace this:
# import google.generativeai as genai
# genai.configure(api_key="your-api-key")
# model = genai.GenerativeModel("gemini-pro")

# With this:
client = GeminiWithMemory(
    api_key="your-gemini-key",
    zl_api_key="your-0latency-key",
    agent_id="my-agent"
)
model = client.GenerativeModel("gemini-pro")

# Use exactly as before
response = model.generate_content("Hello!")

How It Works

Memory Recall: Before each API call, relevant memories are retrieved using semantic search
Context Injection: Memories are automatically injected into the system prompt
API Call: Your request is sent to the LLM provider with enhanced context
Memory Storage: The conversation turn is stored as a memory in a background thread, so it does not delay the response
Response: The original response is returned unmodified

Configuration

All wrappers support the following parameters:

client = AnthropicWithMemory(
    api_key="your-llm-api-key",          # Required: Your LLM provider API key
    zl_api_key="your-0latency-key",      # Required: Your 0Latency API key
    agent_id="my-agent",                 # Required: Unique agent identifier
    zl_base_url="https://api.0latency.ai",  # Optional: 0Latency API base URL
    recall_enabled=True,                 # Optional: Enable/disable memory recall
    store_enabled=True,                  # Optional: Enable/disable memory storage
    budget_tokens=4000,                  # Optional: Max tokens for memory context
)
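
One way to keep keys out of source code is to read them from environment variables and pass the result to a wrapper constructor. In this sketch only the keyword names come from the parameter list above; the environment variable names (`ANTHROPIC_API_KEY`, `ZL_API_KEY`, `ZL_AGENT_ID`, `ZL_RECALL`, `ZL_STORE`, `ZL_BUDGET_TOKENS`) and the helper `wrapper_kwargs_from_env` are assumptions, not part of the package.

```python
import os
from typing import Any, Dict, Mapping, Optional


def wrapper_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build constructor keyword arguments from environment variables.

    The variable names are hypothetical; adapt them to your deployment.
    """
    env = os.environ if env is None else env

    # Fail early with a clear message if a required key is absent or empty.
    missing = [name for name in ("ANTHROPIC_API_KEY", "ZL_API_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    return {
        "api_key": env["ANTHROPIC_API_KEY"],
        "zl_api_key": env["ZL_API_KEY"],
        "agent_id": env.get("ZL_AGENT_ID", "my-agent"),
        # Setting a flag variable to "0" turns the feature off; anything else keeps it on.
        "recall_enabled": env.get("ZL_RECALL", "1") != "0",
        "store_enabled": env.get("ZL_STORE", "1") != "0",
        "budget_tokens": int(env.get("ZL_BUDGET_TOKENS", "4000")),
    }
```

With the package installed, the result could be passed straight through, for example `AnthropicWithMemory(**wrapper_kwargs_from_env())`.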

Get Your API Key

Sign up at 0latency.ai
Generate your API key from the dashboard
Start building with memory!

Examples

Multi-turn conversation with memory

from zerolatency import AnthropicWithMemory

client = AnthropicWithMemory(
    api_key="your-anthropic-key",
    zl_api_key="your-0latency-key",
    agent_id="customer-support-bot"
)

# First conversation
response1 = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,  # required by the Anthropic Messages API
    messages=[{"role": "user", "content": "My favorite color is blue"}]
)

# Later conversation - the agent can recall the stored memory
response2 = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What's my favorite color?"}]
)
# Example response (exact wording will vary):
# "Based on our previous conversation, your favorite color is blue."

Disable recall or storage for a client

# Create client with recall disabled
client = AnthropicWithMemory(
    api_key="your-anthropic-key",
    zl_api_key="your-0latency-key",
    agent_id="my-agent",
    recall_enabled=False,  # Don't recall memories
    store_enabled=True,    # But still store them
)

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Your Application                         │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│               ZeroLatency Wrapper (This Package)             │
│  ┌─────────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │   Recall    │  │   Inject     │  │  Store (async)   │  │
│  │  Memories   │→ │   Context    │→ │    Memories      │  │
│  └─────────────┘  └──────────────┘  └──────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                            │
              ┌─────────────┴─────────────┐
              ▼                           ▼
    ┌──────────────────┐        ┌─────────────────┐
    │  LLM Provider    │        │  0Latency API   │
    │  (Anthropic/     │        │  (Memory Store) │
    │   OpenAI/Gemini) │        └─────────────────┘
    └──────────────────┘

Performance

No added latency for storage - Memory storage happens in background threads
Fast recall - Memory retrieval runs before the API call and typically adds less than 100 ms
Configurable budget - Control memory context size with budget_tokens
Smart caching - Frequently accessed memories are cached for speed
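
To make the budget_tokens idea concrete, here is a rough sketch of trimming a ranked list of memories to a token budget. How the package actually counts tokens and chooses memories is not described here; the four-characters-per-token estimate and the names `estimate_tokens` and `fit_to_budget` are assumptions made for illustration.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Very rough estimate: about four characters per token (an assumption)."""
    return max(1, (len(text) + 3) // 4)


def fit_to_budget(memories: List[str], budget_tokens: int) -> List[str]:
    """Keep memories in ranked order until the next one would exceed the budget."""
    kept: List[str] = []
    used = 0
    for memory in memories:
        cost = estimate_tokens(memory)
        if used + cost > budget_tokens:
            break  # stop at the first memory that does not fit
        kept.append(memory)
        used += cost
    return kept
```

A smaller budget leaves more of the context window for the conversation itself, at the cost of injecting fewer memories.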

Requirements

Python 3.8+
anthropic>=0.18.0
openai>=1.0.0
google-generativeai>=0.3.0
requests>=2.25.0
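
A quick way to see which of these packages are present in the current environment is the standard-library `importlib.metadata` module. The helper name `installed_versions` is hypothetical, and the sketch only reports versions; it does not compare them against the minimums listed above.

```python
from importlib import metadata  # standard library since Python 3.8
from typing import Dict, Iterable, Optional

# Distribution names taken from the requirements list above.
REQUIRED = ("anthropic", "openai", "google-generativeai", "requests")


def installed_versions(names: Iterable[str] = REQUIRED) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```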

License

MIT License - see LICENSE file for details

Support

Documentation: docs.0latency.ai
Issues: GitHub Issues
Email: support@0latency.ai

Contributing

Contributions are welcome! Please open an issue or PR on GitHub.

Built with ❤️ by the 0Latency team

Release history

0.2.1 - May 10, 2026 (this version)
0.2.0 - Mar 30, 2026 (yanked: broken /v1/ prefix, non-functional release, use 0.2.1)
0.1.0 - Mar 23, 2026

Download files

Source Distribution

zerolatency-0.2.1.tar.gz (13.0 kB)

Uploaded May 10, 2026 Source

Built Distribution

zerolatency-0.2.1-py3-none-any.whl (14.0 kB)

Uploaded May 10, 2026 Python 3

File details

Details for the file zerolatency-0.2.1.tar.gz.

File metadata

Download URL: zerolatency-0.2.1.tar.gz
Upload date: May 10, 2026
Size: 13.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for zerolatency-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`d0c202c73c9aa5d24f6cf3e8ae9eb9c69f1f4ad382c83eb9403f6ceb152e4c7e`
MD5	`3b26138101a07db8c49a9db341214823`
BLAKE2b-256	`4520ffb56dc56bf209a3b46a99ef645319fdd097191855303c6c3142b28607b1`

File details

Details for the file zerolatency-0.2.1-py3-none-any.whl.

File metadata

Download URL: zerolatency-0.2.1-py3-none-any.whl
Upload date: May 10, 2026
Size: 14.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for zerolatency-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`19b5f6e070d937325c324c6a0bd513e16c2840246e70ddeaa7e295c94d90190c`
MD5	`fe4f816ca3992832ac8e6bfeebaa3017`
BLAKE2b-256	`f931ffc6eaeaaad21ce5b8e73e5549f0282d5a249c88eb65ccfe71eeb3d02600`
