
Continuously learning memory layer for LLM applications: signals, stacks, decay, two-tier retrieval.

Street AI

Continuously learning memory layer for LLM applications. Your AI's memory grows forever. Your token bill doesn't.

Street AI sits between your application and the LLM API. It stores conversation as signals organized into stacks, decays old data automatically, and retrieves only what's relevant on each turn — so you send a tiny prompt instead of the full conversation history.

In our 16-turn benchmark, input tokens dropped by 55–80% per turn (average 68%), with the savings growing as the conversation lengthens.

Status

Alpha (0.2.0). API will change. Pin a version if you depend on it.

Install

pip install streetai-memory

The PyPI name is streetai-memory; the import path is streetai:

from streetai import Memory, MemoryRegistry, Config

First use downloads a ~25MB embedding model (all-MiniLM-L6-v2) into a local cache.

To install with provider adapters:

pip install "streetai-memory[anthropic]"  # Anthropic
pip install "streetai-memory[openai]"     # OpenAI (also DeepSeek, Together, Groq)
pip install "streetai-memory[gemini]"     # Google Gemini
pip install "streetai-memory[all]"        # all of the above

Quickstart

from streetai import MemoryRegistry

registry = MemoryRegistry("./memory.db")
mem = registry.get("user_123")

mem.add_message("Hi, I'm planning a trip to Japan.", role="user")
mem.add_message("Great! Which cities?", role="assistant")

prompt = mem.build_prompt("What did I say about Japan?")

# prompt.messages   -> list of {role, content} ready for any LLM API
# prompt.retrieved  -> signals that were pulled in (pass to post_process)
# prompt.inspector  -> debug info (stacks activated, scores, etc.)

# After your LLM responds:
# response_text = your_llm(messages=prompt.messages)
# mem.post_process(prompt.retrieved, response_text)
# mem.add_message("What did I say about Japan?", role="user")
# mem.add_message(response_text, role="assistant")

For a fully runnable version, see examples/quickstart.py.
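
The commented steps above can be folded into one helper per turn. This is a minimal sketch, not the project's own example: chat_turn, the llm callable, and FakeMemory are hypothetical names introduced here, and FakeMemory only records calls so the snippet runs without Street AI installed. The build_prompt, post_process, and add_message calls follow the Quickstart.

```python
from types import SimpleNamespace


def chat_turn(mem, llm, user_text):
    """Run one turn: build the prompt, call the LLM, then update memory."""
    prompt = mem.build_prompt(user_text)
    response_text = llm(prompt.messages)  # hypothetical callable: messages -> str
    mem.post_process(prompt.retrieved, response_text)
    mem.add_message(user_text, role="user")
    mem.add_message(response_text, role="assistant")
    return response_text


class FakeMemory:
    """Stand-in for a Memory object; it records calls and stores nothing."""

    def __init__(self):
        self.calls = []

    def build_prompt(self, query):
        self.calls.append("build_prompt")
        messages = [{"role": "user", "content": query}]
        return SimpleNamespace(messages=messages, retrieved=[])

    def post_process(self, retrieved, response_text):
        self.calls.append("post_process")

    def add_message(self, text, role):
        self.calls.append(f"add_message:{role}")
        return []


fake = FakeMemory()
reply = chat_turn(
    fake,
    lambda messages: "You mentioned a trip to Japan.",
    "What did I say about Japan?",
)
```

With a real memory, pass registry.get("user_123") in place of FakeMemory() and a function that calls your provider in place of the lambda.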

Memory IDs and persistence. Each memory_id is a separate, persistent memory. Use one per user or session; memories never leak into each other. Memory is saved to the SQLite file you pass (./memory.db above) and reloads automatically on the next run, so it survives restarts and is available across processes.

To wipe one memory (for example on a "clear chat" action or account deletion):

registry.reset("user_123")

Editing or deleting a specific past message. Each call to add_message (or each provider call through an adapter) returns the created signals; each signal carries a parent_turn_id that identifies the whole message. Store that id alongside the message in your display DB, then use it when a user edits or deletes:

created = mem.add_message("I am vegetarian", role="user")
pid = created[0].parent_turn_id

mem.update_message(pid, "I am vegan")   # in-place edit; keeps the same turn
mem.delete_message(pid)                  # remove the message entirely
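
One way to keep the id next to the displayed message is a small lookup table. The sketch below uses the standard sqlite3 module; the messages table, its columns, save_message, and turn_id_for are all hypothetical, and the id is a placeholder string because the actual type of parent_turn_id is not documented here.

```python
import sqlite3

# A real app would use its own display database instead of an in-memory one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "message_id INTEGER PRIMARY KEY, body TEXT, parent_turn_id TEXT)"
)


def save_message(conn, body, parent_turn_id):
    """Store the displayed text together with the memory layer's turn id."""
    cur = conn.execute(
        "INSERT INTO messages (body, parent_turn_id) VALUES (?, ?)",
        (body, parent_turn_id),
    )
    conn.commit()
    return cur.lastrowid


def turn_id_for(conn, message_id):
    """Look up the turn id to pass to update_message or delete_message."""
    row = conn.execute(
        "SELECT parent_turn_id FROM messages WHERE message_id = ?",
        (message_id,),
    ).fetchone()
    return row[0] if row else None


# In a real app: created = mem.add_message(...); pid = created[0].parent_turn_id
pid = "turn-001"  # placeholder value for illustration
msg_id = save_message(conn, "I am vegetarian", pid)
looked_up = turn_id_for(conn, msg_id)
```

When the user later edits or deletes the displayed message, turn_id_for gives back the value to hand to mem.update_message or mem.delete_message.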

Drop-in adapters

The adapters wrap a real provider client. You use the same SDK API you already know; memory is read and written transparently on every call.

Anthropic

from anthropic import Anthropic
from streetai.adapters.anthropic import with_memory

client = with_memory(Anthropic(), memory_id="user_123")

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are helpful.",
    messages=[{"role": "user", "content": "What did I mention earlier?"}],
)
print(response.content[0].text)

Full example: examples/anthropic_chat.py.

OpenAI

from openai import OpenAI
from streetai.adapters.openai import with_memory

client = with_memory(OpenAI(), memory_id="user_123")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What did I mention earlier?"}],
)
print(response.choices[0].message.content)

Full example: examples/openai_chat.py.

DeepSeek (uses the OpenAI adapter)

DeepSeek is OpenAI-API-compatible. Use the OpenAI adapter with base_url:

import os
from openai import OpenAI
from streetai.adapters.openai import with_memory

deepseek = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com/v1",
)
client = with_memory(deepseek, memory_id="user_123")

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "What did I mention earlier?"}],
)

The same pattern works for Together, Anyscale, Groq, and any other OpenAI-compatible endpoint. Full example: examples/deepseek_chat.py.

Google Gemini

from google import genai
from streetai.adapters.gemini import with_memory

client = with_memory(genai.Client(api_key="..."), memory_id="user_123")

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="What did I mention earlier?",
)
print(response.text)

Full example: examples/gemini_chat.py.

Async

Every adapter has an async equivalent. Pass an async client; use await. Memory operations run in a worker thread under the hood so the event loop never blocks.

# Anthropic
from anthropic import AsyncAnthropic
from streetai.adapters.anthropic import with_memory_async
client = with_memory_async(AsyncAnthropic(), memory_id="user_123")
response = await client.messages.create(model="claude-sonnet-4-6",
    max_tokens=1024, messages=[{"role": "user", "content": "..."}])

# OpenAI (and DeepSeek via base_url)
from openai import AsyncOpenAI
from streetai.adapters.openai import with_memory_async
client = with_memory_async(AsyncOpenAI(), memory_id="user_123")
response = await client.chat.completions.create(model="gpt-4o-mini",
    messages=[{"role": "user", "content": "..."}])

# Gemini (uses the existing client's .aio namespace internally)
from google import genai
from streetai.adapters.gemini import with_memory_async
client = with_memory_async(genai.Client(api_key="..."), memory_id="user_123")
response = await client.models.generate_content(model="gemini-2.0-flash",
    contents="...")
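
The worker-thread idea can be illustrated with the standard library alone. This is a sketch of the general pattern using asyncio.to_thread (Python 3.9+), not the adapters' actual implementation; blocking_memory_write is a hypothetical stand-in for a synchronous memory operation.

```python
import asyncio
import time


def blocking_memory_write(text):
    """Hypothetical synchronous memory operation (embedding, SQLite write)."""
    time.sleep(0.05)
    return len(text)


async def main():
    ticks = 0

    async def heartbeat():
        # Counts how often the event loop gets to run other work.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    beat = asyncio.create_task(heartbeat())
    # The blocking call runs in a worker thread; the loop keeps ticking.
    stored = await asyncio.to_thread(blocking_memory_write, "hello")
    beat.cancel()
    return stored, ticks


stored, ticks = asyncio.run(main())
```

If blocking_memory_write were called directly inside main, the heartbeat task would not get a chance to run before it is cancelled, and ticks would stay at 0.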

How it works

your message
     |
     v
[1] split into chunks (sentence-sized signals)
     |
     v
[2] embed each chunk to a 384-dim vector
     |
     v
[3] assign to a stack (cluster of related signals) by cosine similarity
     |
     v
[4] when a new query arrives:
       - find top-K most relevant stacks (FAISS)
       - within those stacks, surface signals that pass the activation threshold
       - faded signals stay out — unless they're a strong match, which revives them
     |
     v
[5] build a small prompt:
       [retrieved context] + [last N messages verbatim] + [new query]
     |
     v
[6] after the LLM responds:
       - boost signals that matched the response (they helped)
       - demote signals that didn't (they were noise)
       - decay continues until the signal is used again
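
Step [3] can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: stacks are represented by a running centroid, a new stack opens when no centroid clears the threshold, and the vectors are 3-dimensional toys rather than 384-dimensional embeddings. The 0.55 threshold mirrors the stack_threshold default listed under Configuration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_to_stack(vector, stacks, threshold=0.55):
    """Join the most similar stack, or open a new one if none is close enough."""
    best_index, best_score = None, threshold
    for i, stack in enumerate(stacks):
        score = cosine(vector, stack["centroid"])
        if score >= best_score:
            best_index, best_score = i, score
    if best_index is None:
        stacks.append({"centroid": list(vector), "members": [list(vector)]})
        return len(stacks) - 1
    stack = stacks[best_index]
    stack["members"].append(list(vector))
    n = len(stack["members"])
    # Recompute the centroid as the mean of all member vectors.
    stack["centroid"] = [sum(col) / n for col in zip(*stack["members"])]
    return best_index


stacks = []
a = assign_to_stack([1.0, 0.0, 0.0], stacks)  # first signal opens stack 0
b = assign_to_stack([0.9, 0.1, 0.0], stacks)  # close to stack 0, so it joins
c = assign_to_stack([0.0, 0.0, 1.0], stacks)  # unrelated, so it opens stack 1
```

A linear scan like this is only practical for a handful of stacks; the diagram names FAISS for the top-K stack lookup at query time.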

Decay is measured in interactions, not wall-clock time, so memory survives long idle periods (a user returning weeks later finds it where they left it). Signals refresh their clock when retrieved; frequently useful data stays sharp, unused data fades.
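
A minimal sketch of interaction-based decay, assuming an exponential form (which is what a half-life setting suggests, though the exact formula in the library is not shown here). The strength function and the signal dict are hypothetical; the 50-interaction half-life is the default quoted under Configuration.

```python
import math

HALF_LIFE = 50  # interactions
DECAY_RATE = math.log(2) / HALF_LIFE


def strength(last_used_at, now):
    """Strength after (now - last_used_at) interactions of exponential decay."""
    return math.exp(-DECAY_RATE * (now - last_used_at))


interaction = 0
signal = {"text": "planning a trip to Japan", "last_used_at": 0}

# Fifty turns pass. Idle wall-clock time does not advance this counter.
interaction += 50
faded = strength(signal["last_used_at"], interaction)

# The signal is retrieved, so its clock refreshes.
signal["last_used_at"] = interaction
refreshed = strength(signal["last_used_at"], interaction)
```

Because the counter only advances on interactions, a user who returns after weeks of silence sees the same strengths as when they left.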

Compared to plain chat history

Feature                          Plain chat history   Street AI
Prompt grows with conversation   Yes (linear)         No (near flat)
Recent context kept verbatim     Yes                  Yes (recency window)
Activity-aware (decay)           No                   Yes (per interaction)
Learns from outcomes             No                   Yes (boost/demote)
Self-organizing                  No                   Yes (auto-stacks)
Cross-provider                   Yes                  Yes

Configuration

Override defaults with Config:

import math
from streetai import MemoryRegistry, Config

cfg = Config(
    recency_turns=5,             # last 5 messages verbatim (default 3)
    decay_rate=math.log(2)/100,  # 100-interaction half-life (default half-life: 50 interactions; decay is per-turn)
    stack_threshold=0.65,        # tighter stack assignment (default 0.55)
    activation_threshold=0.1,    # min score for a signal to surface (default 0.15)
    revival_similarity=0.45,     # a faded signal revives on a match this strong (0 disables)
)

registry = MemoryRegistry("./memory.db", config=cfg)

All tunables: see streetai/config.py.
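
When tuning decay_rate it can help to think in half-lives. The helpers below are hypothetical and not part of streetai. They assume plain exponential decay and, as a simplification, that an unused signal's score equals its decayed strength; the real score may also factor in similarity.

```python
import math


def decay_rate_for_half_life(half_life):
    """Decay rate whose exponential decay halves every `half_life` interactions."""
    return math.log(2) / half_life


def interactions_until_below(threshold, decay_rate):
    """Unused interactions before exp(-decay_rate * n) first drops below threshold."""
    return math.ceil(math.log(1 / threshold) / decay_rate)


rate = decay_rate_for_half_life(100)          # same value as math.log(2)/100 above
cutoff = interactions_until_below(0.1, rate)  # 333 interactions
```

Under these assumptions, the example configuration would let an untouched signal stay above activation_threshold=0.1 for 332 interactions and fall below it on the 333rd.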

Limitations (v0.2)

  • Non-streaming only. stream=True raises NotImplementedError.
  • English-tuned defaults. Chunking and thresholds may need tuning for other languages.
  • fastembed is required. Pluggable encoders come in a future version.

Development

git clone https://github.com/Tem-Degu/streetai-memory.git
cd streetai-memory
pip install -e ".[dev]"
pytest

License

Apache 2.0
