Universal LLM memory integration via LiteLLM - works with 100+ providers

hindsight-litellm

Universal LLM memory integration via LiteLLM. Add persistent memory to any LLM application with just a few lines of code.

Features

Universal LLM Support - Works with 100+ LLM providers via LiteLLM (OpenAI, Anthropic, Groq, Azure, AWS Bedrock, Google Vertex AI, and more)
Simple Integration - Just configure, enable, and use hindsight_litellm.completion()
Automatic Memory Injection - Relevant memories are injected into prompts before LLM calls
Automatic Conversation Storage - Conversations are stored to Hindsight for future recall
Two Memory Modes - Choose between reflect (synthesized context) or recall (raw memory retrieval)
Direct Memory APIs - Query, synthesize, and store memories manually
Native Client Wrappers - Alternative wrappers for OpenAI and Anthropic SDKs
Debug Mode - Inspect exactly what memories are being injected

Installation

pip install hindsight-litellm

Quick Start

import hindsight_litellm

# Configure and enable memory integration
hindsight_litellm.configure(
    hindsight_api_url="http://localhost:8888",
    bank_id="my-agent",
)
hindsight_litellm.enable()

# Use the convenience wrapper - memory is automatically injected and stored
response = hindsight_litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What did we discuss about AI?"}]
)

How It Works

Here's what happens under the hood when you call completion():

┌─────────────────────────────────────────────────────────────────────────────┐
│  1. YOUR CODE                                                               │
│  ───────────────────────────────────────────────────────────────────────── │
│  response = hindsight_litellm.completion(                                   │
│      model="gpt-4o-mini",                                                   │
│      messages=[{"role": "user", "content": "Help me with my Python project"}]│
│  )                                                                          │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│  2. MEMORY RETRIEVAL (before LLM call)                                      │
│  ───────────────────────────────────────────────────────────────────────── │
│  # hindsight_litellm queries Hindsight for relevant memories                │
│                                                                             │
│  # If use_reflect=False (default) - raw memories:                           │
│  memories = hindsight.recall(query="Help me with my Python project")        │
│  # Returns: ["User prefers pytest", "User is building a FastAPI app", ...]  │
│                                                                             │
│  # If use_reflect=True - synthesized context:                               │
│  context = hindsight.reflect(query="Help me with my Python project")        │
│  # Returns: "The user is an experienced Python developer working on..."     │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│  3. PROMPT INJECTION                                                        │
│  ───────────────────────────────────────────────────────────────────────── │
│  # Memories are injected into the system message:                           │
│                                                                             │
│  messages = [                                                               │
│      {"role": "system", "content": """                                      │
│          # Relevant Memories                                                │
│          1. [WORLD] User prefers pytest for testing                         │
│          2. [WORLD] User is building a FastAPI app                          │
│          3. [OPINION] User likes type hints                                 │
│      """},                                                                  │
│      {"role": "user", "content": "Help me with my Python project"}          │
│  ]                                                                          │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│  4. LLM CALL                                                                │
│  ───────────────────────────────────────────────────────────────────────── │
│  # The enriched prompt is sent to the LLM                                   │
│  response = litellm.completion(model="gpt-4o-mini", messages=messages)      │
│                                                                             │
│  # LLM now has context and can give personalized responses like:            │
│  # "Since you're working on your FastAPI app, here's how to add tests       │
│  #  with pytest..."                                                         │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│  5. CONVERSATION STORAGE (after LLM call)                                   │
│  ───────────────────────────────────────────────────────────────────────── │
│  # The conversation is stored to Hindsight for future recall                │
│  hindsight.retain(                                                          │
│      content="User: Help me with my Python project\n"                       │
│              "Assistant: Since you're working on FastAPI..."                │
│  )                                                                          │
│  # Hindsight extracts facts: "User asked about Python project help"         │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│  6. RESPONSE RETURNED                                                       │
│  ───────────────────────────────────────────────────────────────────────── │
│  # You receive the response as normal                                       │
│  print(response.choices[0].message.content)                                 │
└─────────────────────────────────────────────────────────────────────────────┘

The memory injection and storage happen automatically - you just use completion() as normal.
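The flow in the diagram can be sketched in plain Python. This is only an illustrative outline, not the library's actual implementation: `inject_memories`, `completion_with_memory`, and the `fake_*` stand-ins are hypothetical names, and using the latest message as the recall query is an assumption based on the diagram.

```python
from typing import Callable

Message = dict[str, str]


def inject_memories(messages: list[Message], memories: list[str]) -> list[Message]:
    """Return a copy of messages with a memory system message prepended."""
    if not memories:
        return list(messages)
    numbered = "\n".join(f"{i}. {m}" for i, m in enumerate(memories, start=1))
    system = {"role": "system", "content": f"# Relevant Memories\n{numbered}"}
    return [system, *messages]


def completion_with_memory(
    messages: list[Message],
    recall: Callable[[str], list[str]],
    call_llm: Callable[[list[Message]], str],
    retain: Callable[[str], None],
) -> str:
    """Hypothetical outline of steps 2-6 from the diagram."""
    query = messages[-1]["content"]                      # assumed: latest message is the query
    enriched = inject_memories(messages, recall(query))  # steps 2 and 3
    answer = call_llm(enriched)                          # step 4
    retain(f"User: {query}\nAssistant: {answer}")        # step 5
    return answer                                        # step 6


# In-memory stand-ins so the sketch runs without a Hindsight server or an LLM.
stored: list[str] = []


def fake_recall(query: str) -> list[str]:
    return ["[WORLD] User prefers pytest for testing"]


def fake_llm(msgs: list[Message]) -> str:
    return f"(reply given {len(msgs)} messages)"


reply = completion_with_memory(
    [{"role": "user", "content": "Help me with my Python project"}],
    recall=fake_recall,
    call_llm=fake_llm,
    retain=stored.append,
)
print(reply)
```

The stand-in LLM receives two messages (the injected system message plus the user message), and the exchange ends up in `stored`, mirroring the retrieval, injection, and storage steps.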

Configuration Options

hindsight_litellm.configure(
    # Required
    hindsight_api_url="http://localhost:8888",  # Hindsight API server URL
    bank_id="my-agent",                          # Memory bank ID

    # Optional - Authentication
    api_key="your-api-key",        # API key for authentication

    # Optional - Memory behavior
    store_conversations=True,      # Store conversations after LLM calls
    inject_memories=True,          # Inject relevant memories into prompts
    use_reflect=False,             # Use reflect API (synthesized) vs recall (raw memories)
    reflect_include_facts=False,   # Include source facts with reflect responses
    max_memories=None,             # Maximum memories to inject (None = unlimited)
    max_memory_tokens=4096,        # Maximum tokens for memory context
    recall_budget="mid",           # Recall budget: "low", "mid", "high"
    fact_types=["world", "agent"], # Filter fact types to inject

    # Optional - Bank Configuration
    bank_name="My Agent",          # Human-readable display name for the memory bank
    background="This agent...",    # Instructions guiding what Hindsight should remember (see below)

    # Optional - Advanced
    injection_mode="system_message",  # or "prepend_user"
    excluded_models=["gpt-3.5*"],     # Exclude certain models
    verbose=True,                     # Enable verbose logging and debug info
)
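To make the two advanced options more concrete, here is a rough sketch of how `excluded_models` and `injection_mode` might behave. The semantics are assumptions, not confirmed behavior: patterns are treated as shell-style globs, `"system_message"` is taken to prepend a new system message, and `"prepend_user"` is taken to prefix the last user message. `is_excluded` and `apply_injection` are hypothetical helpers.

```python
from fnmatch import fnmatchcase


def is_excluded(model: str, patterns: list[str]) -> bool:
    """Assumed glob semantics: "gpt-3.5*" matches any model starting with "gpt-3.5"."""
    return any(fnmatchcase(model, pattern) for pattern in patterns)


def apply_injection(messages, memory_text, injection_mode="system_message"):
    """Return new messages with memory_text placed per the assumed mode semantics."""
    if injection_mode == "system_message":
        # Assumption: memories travel in a separate leading system message.
        return [{"role": "system", "content": memory_text}, *messages]
    if injection_mode == "prepend_user":
        # Assumption: memories are prefixed to the most recent user message.
        out = [dict(m) for m in messages]  # copy so the caller's list is untouched
        for m in reversed(out):
            if m["role"] == "user":
                m["content"] = f"{memory_text}\n\n{m['content']}"
                break
        return out
    raise ValueError(f"unknown injection_mode: {injection_mode!r}")


msgs = [{"role": "user", "content": "hi"}]
print(is_excluded("gpt-3.5-turbo", ["gpt-3.5*"]))
print(apply_injection(msgs, "MEM", "prepend_user"))
```

The `prepend_user` style may be useful for providers or prompts where an extra system message is undesirable.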

Bank Configuration: background and bank_name

The background and bank_name parameters configure the memory bank itself. When provided, configure() will automatically create or update the bank with these settings.

bank_name: A human-readable display name for the memory bank. Useful for identifying banks in the Hindsight UI or when managing multiple banks.
background: Instructions that guide Hindsight on what information is important to extract and remember from conversations. This influences memory extraction during the retain operation and can affect how the bank's "disposition" (skepticism, literalism, empathy) is calibrated.

# Example: Customer support routing agent
hindsight_litellm.configure(
    hindsight_api_url="http://localhost:8888",
    bank_id="support-router",
    bank_name="Customer Support Router",
    background="""This agent routes customer support requests to the appropriate team.
    Remember which types of issues should go to which teams (billing, technical, sales).
    Track customer preferences for communication channels and past issue resolutions.
    Note any escalation patterns or VIP customers who need special handling.""",
)

Memory Modes: Reflect vs Recall

Recall mode (use_reflect=False, default): Retrieves raw memory facts and injects them as a numbered list. Best when you need precise, individual memories.
Reflect mode (use_reflect=True): Synthesizes memories into a coherent context paragraph. Best for natural, conversational memory context.

import hindsight_litellm

# Recall mode - raw memories
hindsight_litellm.configure(
    hindsight_api_url="http://localhost:8888",
    bank_id="my-agent",
    use_reflect=False,  # Default
)
# Injects: "1. [WORLD] User prefers Python\n2. [OPINION] User dislikes Java..."

# Reflect mode - synthesized context
hindsight_litellm.configure(
    hindsight_api_url="http://localhost:8888",
    bank_id="my-agent",
    use_reflect=True,
)
# Injects: "Based on previous conversations, the user is a Python developer who..."
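The numbered-list format used in recall mode could be produced along these lines. This is a sketch under assumptions: the `Memory` dataclass is a hypothetical stand-in whose `fact_type` and `text` fields mirror the attributes used in the Direct Memory APIs section, and treating `max_memories` as a simple cut-off is a guess at its behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Memory:
    """Hypothetical stand-in for a recalled memory."""
    fact_type: str
    text: str


def format_recall(memories: list[Memory], max_memories: Optional[int] = None) -> str:
    """Render raw memories as a numbered list with upper-cased fact-type tags."""
    selected = memories[:max_memories]  # a None bound keeps every memory
    return "\n".join(
        f"{i}. [{m.fact_type.upper()}] {m.text}" for i, m in enumerate(selected, start=1)
    )


memories = [Memory("world", "User prefers Python"), Memory("opinion", "User dislikes Java")]
recall_context = format_recall(memories)
print(recall_context)
```

Reflect mode has no equivalent formatting step in this sketch, since the synthesized paragraph is assumed to come back from Hindsight ready to inject.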

Multi-Provider Support

Works with any LiteLLM-supported provider:

import hindsight_litellm

hindsight_litellm.configure(
    hindsight_api_url="http://localhost:8888",
    bank_id="my-agent",
)
hindsight_litellm.enable()

# OpenAI
hindsight_litellm.completion(model="gpt-4o", messages=[...])

# Anthropic
hindsight_litellm.completion(model="claude-3-5-sonnet-20241022", messages=[...])

# Groq
hindsight_litellm.completion(model="groq/llama-3.1-70b-versatile", messages=[...])

# Azure OpenAI
hindsight_litellm.completion(model="azure/gpt-4", messages=[...])

# AWS Bedrock
hindsight_litellm.completion(model="bedrock/anthropic.claude-3", messages=[...])

# Google Vertex AI
hindsight_litellm.completion(model="vertex_ai/gemini-pro", messages=[...])
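The model strings above follow LiteLLM's convention of an optional `provider/` prefix. As a small illustration only, the prefix can be separated like this; `split_provider` is a hypothetical helper, and LiteLLM's own provider resolution (including how it infers the provider for unprefixed names such as `gpt-4o`) is more involved.

```python
from typing import Optional


def split_provider(model: str) -> tuple[Optional[str], str]:
    """Split "provider/model" on the first slash; unprefixed names yield (None, model)."""
    provider, sep, name = model.partition("/")
    return (provider, name) if sep else (None, model)


for model in ["gpt-4o", "groq/llama-3.1-70b-versatile", "vertex_ai/gemini-pro"]:
    print(split_provider(model))
```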

Direct Memory APIs

Recall - Query raw memories

from hindsight_litellm import configure, recall

configure(bank_id="my-agent", hindsight_api_url="http://localhost:8888")

# Query memories
memories = recall("what projects am I working on?", budget="mid")
for m in memories:
    print(f"- [{m.fact_type}] {m.text}")

# Output:
# - [world] User is building a FastAPI project
# - [opinion] User prefers Python over JavaScript

Reflect - Get synthesized context

from hindsight_litellm import configure, reflect

configure(bank_id="my-agent", hindsight_api_url="http://localhost:8888")

# Get synthesized memory context
result = reflect("what do you know about the user's preferences?")
print(result.text)

# Output:
# "Based on our conversations, the user prefers Python for backend development..."

Retain - Store memories

from hindsight_litellm import configure, retain

configure(bank_id="my-agent", hindsight_api_url="http://localhost:8888")

# Store a memory
result = retain(
    content="User mentioned they're working on a machine learning project",
    context="Discussion about current projects",
)
print(f"Retained successfully: {result.success}, items: {result.items_count}")
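When storing an existing conversation manually, one option is to render it in the same "User: ... / Assistant: ..." shape shown in step 5 of How It Works and pass the result as `content`. The helper below is a hypothetical sketch; skipping system messages is a design choice made here, not documented library behavior.

```python
def to_transcript(messages: list[dict[str, str]]) -> str:
    """Render user/assistant turns as 'Role: text' lines; other roles are skipped."""
    labels = {"user": "User", "assistant": "Assistant"}
    return "\n".join(
        f"{labels[m['role']]}: {m['content']}" for m in messages if m["role"] in labels
    )


conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Help me with my Python project"},
    {"role": "assistant", "content": "Since you're working on FastAPI..."},
]
transcript = to_transcript(conversation)
print(transcript)
```

The resulting string could then be passed as `retain(content=transcript)`.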

Async APIs

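import asyncio
from hindsight_litellm import configure, arecall, areflect, aretain

configure(bank_id="my-agent", hindsight_api_url="http://localhost:8888")

async def main():
    # Async versions of all memory APIs
    memories = await arecall("what do you know about me?")
    context = await areflect("summarize user preferences")
    result = await aretain(content="New information to remember")

asyncio.run(main())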
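One reason to prefer the async variants is that independent lookups can run concurrently. The sketch below shows the pattern with `asyncio.gather`; `fake_arecall` and `fake_areflect` are stand-ins invented here so the example runs without a Hindsight server, and in real code `arecall` and `areflect` would take their place.

```python
import asyncio


async def fake_arecall(query: str) -> list[str]:
    """Stand-in for arecall so the sketch runs without a Hindsight server."""
    await asyncio.sleep(0.01)
    return [f"memory for: {query}"]


async def fake_areflect(query: str) -> str:
    """Stand-in for areflect."""
    await asyncio.sleep(0.01)
    return f"summary for: {query}"


async def gather_context(query: str) -> tuple[list[str], str]:
    # The two lookups are independent, so they can be awaited concurrently.
    memories, summary = await asyncio.gather(fake_arecall(query), fake_areflect(query))
    return memories, summary


memories, summary = asyncio.run(gather_context("user preferences"))
print(memories, summary)
```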

Native Client Wrappers

As an alternative to the LiteLLM callbacks, you can wrap the native OpenAI and Anthropic SDK clients directly:

OpenAI Wrapper

from openai import OpenAI
from hindsight_litellm import wrap_openai

client = OpenAI()
wrapped = wrap_openai(
    client,
    bank_id="my-agent",
    hindsight_api_url="http://localhost:8888",
)

response = wrapped.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What do you know about me?"}]
)

Anthropic Wrapper

from anthropic import Anthropic
from hindsight_litellm import wrap_anthropic

client = Anthropic()
wrapped = wrap_anthropic(
    client,
    bank_id="my-agent",
    hindsight_api_url="http://localhost:8888",
)

response = wrapped.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)

Debug Mode

When verbose=True, you can inspect exactly what memories are being injected:

from hindsight_litellm import configure, enable, completion, get_last_injection_debug

configure(
    bank_id="my-agent",
    hindsight_api_url="http://localhost:8888",
    verbose=True,
    use_reflect=True,
)
enable()

response = completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's my favorite color?"}]
)

# Inspect what was injected
debug = get_last_injection_debug()
if debug:
    print(f"Mode: {debug.mode}")           # "reflect" or "recall"
    print(f"Injected: {debug.injected}")   # True/False
    print(f"Results: {debug.results_count}")
    print(f"Memory context:\n{debug.memory_context}")
    if debug.error:
        print(f"Error: {debug.error}")
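For logging, the debug fields above can be condensed into a single line. This is a hypothetical helper, shown here with a `SimpleNamespace` stand-in rather than a real debug object; it relies only on the attribute names used in the example above.

```python
from types import SimpleNamespace


def summarize_injection(debug) -> str:
    """One-line summary built from the mode, injected, results_count and error fields."""
    if debug is None:
        return "no injection recorded"
    if debug.error:
        return f"{debug.mode}: failed ({debug.error})"
    if not debug.injected:
        return f"{debug.mode}: nothing injected"
    return f"{debug.mode}: injected {debug.results_count} result(s)"


# Stand-in object; in real use this would come from get_last_injection_debug().
sample = SimpleNamespace(
    mode="recall", injected=True, results_count=3,
    memory_context="1. [WORLD] ...", error=None,
)
print(summarize_injection(sample))
```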

Context Manager

from hindsight_litellm import hindsight_memory
import litellm

with hindsight_memory(bank_id="user-123"):
    response = litellm.completion(model="gpt-4", messages=[...])
# Memory integration automatically disabled after context
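The scoping behavior can be pictured with `contextlib.contextmanager`. The sketch below is not how `hindsight_memory` is implemented; `state` and `scoped_memory` are hypothetical stand-ins that only illustrate the enable-then-restore pattern, including restoration when the block raises.

```python
from contextlib import contextmanager

# Hypothetical stand-in for the library's global configuration.
state = {"enabled": False, "bank_id": None}


@contextmanager
def scoped_memory(bank_id: str):
    """Enable the stand-in integration for the block, then restore the prior state."""
    previous = dict(state)
    state.update(enabled=True, bank_id=bank_id)
    try:
        yield
    finally:
        # Runs even if the block raises, so the prior state always comes back.
        state.clear()
        state.update(previous)


with scoped_memory("user-123"):
    inside = dict(state)
print(inside, state)
```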

Disabling and Cleanup

from hindsight_litellm import disable, cleanup

# Temporarily disable memory integration
disable()

# Clean up all resources (call when shutting down)
cleanup()

API Reference

Main Functions

Function	Description
`configure(...)`	Configure global Hindsight settings
`enable()`	Enable memory integration with LiteLLM
`disable()`	Disable memory integration
`is_enabled()`	Check if memory integration is enabled
`cleanup()`	Clean up all resources

Configuration Functions

Function	Description
`get_config()`	Get current configuration
`is_configured()`	Check if Hindsight is configured
`reset_config()`	Reset configuration to defaults

Memory Functions

Function	Description
`recall(query, ...)`	Synchronously query raw memories
`arecall(query, ...)`	Asynchronously query raw memories
`reflect(query, ...)`	Synchronously get synthesized memory context
`areflect(query, ...)`	Asynchronously get synthesized memory context
`retain(content, ...)`	Synchronously store a memory
`aretain(content, ...)`	Asynchronously store a memory

Debug Functions

Function	Description
`get_last_injection_debug()`	Get debug info from last memory injection
`clear_injection_debug()`	Clear stored debug info

Client Wrappers

Function	Description
`wrap_openai(client, ...)`	Wrap OpenAI client with memory
`wrap_anthropic(client, ...)`	Wrap Anthropic client with memory

Requirements

Python >= 3.10
litellm >= 1.40.0
A running Hindsight API server

License

MIT

Project details

Maintainers

vectorize-io

Release history

Version	Release date
0.5.3	May 15, 2026
0.5.2	Apr 24, 2026
0.5.1	Apr 15, 2026
0.5.0	Mar 21, 2026
0.4.19	Mar 18, 2026
0.4.18	Mar 13, 2026
0.4.17	Mar 10, 2026
0.4.16	Mar 5, 2026
0.4.15	Mar 3, 2026
0.4.14	Feb 26, 2026
0.4.13	Feb 19, 2026
0.4.12	Feb 18, 2026
0.4.11	Feb 13, 2026
0.4.10	Feb 9, 2026
0.4.9	Feb 4, 2026
0.4.8	Feb 3, 2026
0.4.7	Jan 31, 2026
0.4.6	Jan 30, 2026
0.4.5	Jan 30, 2026
0.4.4	Jan 30, 2026
0.4.3	Jan 30, 2026
0.4.2	Jan 29, 2026
0.4.1	Jan 29, 2026
0.4.0	Jan 28, 2026
0.3.0	Jan 13, 2026
0.2.1	Jan 5, 2026
0.2.0	Jan 5, 2026
0.1.16	Dec 23, 2025
0.1.15	Dec 23, 2025
0.1.14	Dec 23, 2025
0.1.13	Dec 22, 2025
0.1.12	Dec 22, 2025
0.1.11	Dec 18, 2025
0.1.10	Dec 18, 2025
0.1.9	Dec 18, 2025
0.1.8	Dec 17, 2025
0.1.7	Dec 16, 2025
0.1.6 (this version)	Dec 16, 2025
0.1.5	Dec 15, 2025

Download files

Source Distribution

hindsight_litellm-0.1.6.tar.gz (171.7 kB)

Uploaded Dec 16, 2025 Source

Built Distribution

hindsight_litellm-0.1.6-py3-none-any.whl (25.8 kB)

Uploaded Dec 16, 2025 Python 3

File details

Details for the file hindsight_litellm-0.1.6.tar.gz.

File metadata

Download URL: hindsight_litellm-0.1.6.tar.gz
Upload date: Dec 16, 2025
Size: 171.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for hindsight_litellm-0.1.6.tar.gz
Algorithm	Hash digest
SHA256	`2165941ea2049b2dc2661311a597493687b43435f651b5474afdf008af701e33`
MD5	`420eb3faa5c6299b0e2db85eaf5c065f`
BLAKE2b-256	`03d67fb3061754c0c8f4c8b91895e7df1f9a45903dbc7b9227af00dc44784990`
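A downloaded file can be checked against the published SHA256 digest with the standard library. This is a minimal sketch; `sha256_of` and `matches_published_digest` are hypothetical helper names, and the local file path is whatever location the archive was saved to.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, expected_hex: str) -> bool:
    """Compare the file's SHA256 with a published hex digest, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, `expected_hex` would be the SHA256 value from the table above, for example `matches_published_digest(Path("hindsight_litellm-0.1.6.tar.gz"), expected_hex)`.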

Provenance

The following attestation bundles were made for hindsight_litellm-0.1.6.tar.gz:

Publisher: release.yml on vectorize-io/hindsight

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hindsight_litellm-0.1.6.tar.gz
- Subject digest: 2165941ea2049b2dc2661311a597493687b43435f651b5474afdf008af701e33
- Sigstore transparency entry: 767299905
- Sigstore integration time: Dec 16, 2025
Source repository:
- Permalink: vectorize-io/hindsight@b36807ad3b3cb1e28db7f76d6b4f89da6e481587
- Branch / Tag: refs/tags/v0.1.6
- Owner: https://github.com/vectorize-io
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@b36807ad3b3cb1e28db7f76d6b4f89da6e481587
- Trigger Event: push

File details

Details for the file hindsight_litellm-0.1.6-py3-none-any.whl.

File metadata

Download URL: hindsight_litellm-0.1.6-py3-none-any.whl
Upload date: Dec 16, 2025
Size: 25.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for hindsight_litellm-0.1.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a5a241a9101fb4822c8ad2ae6a41bdae9225b826ed5fd7b6d9c74d6830b1a9be`
MD5	`8244d53fdd3741704f59977e88f574ce`
BLAKE2b-256	`fe403381a7f725170aac183b251c746a7ce87a32982738141d3a4496857f5b42`

Provenance

The following attestation bundles were made for hindsight_litellm-0.1.6-py3-none-any.whl:

Publisher: release.yml on vectorize-io/hindsight

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hindsight_litellm-0.1.6-py3-none-any.whl
- Subject digest: a5a241a9101fb4822c8ad2ae6a41bdae9225b826ed5fd7b6d9c74d6830b1a9be
- Sigstore transparency entry: 767299906
- Sigstore integration time: Dec 16, 2025
Source repository:
- Permalink: vectorize-io/hindsight@b36807ad3b3cb1e28db7f76d6b4f89da6e481587
- Branch / Tag: refs/tags/v0.1.6
- Owner: https://github.com/vectorize-io
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@b36807ad3b3cb1e28db7f76d6b4f89da6e481587
- Trigger Event: push
