
memtomem-stm

PyPI · Python 3.12+ · License: Apache 2.0 · CLA

Short-term memory proxy gateway with proactive memory surfacing for AI agents.

Sits between your AI agent and upstream MCP servers. Compresses responses to save tokens, caches results, and automatically surfaces relevant memories from a memtomem LTM server.

Built for:

  • Agents (Claude Code, Cursor, Claude Desktop, etc.) running multiple MCP servers and burning tokens on noisy upstream responses
  • Long-running coding sessions where the agent should recall prior decisions instead of re-searching
  • Teams running custom MCP servers that need a proxy layer for compression, caching, and observability — no upstream code changes required
Architecture

flowchart TB
    Agent["Agent<br/>(Claude Code, Cursor, …)"]
    subgraph STM["memtomem-stm (STM)"]
        Pipe["CLEAN → COMPRESS → SURFACE → INDEX"]
    end
    LTM[("memtomem LTM<br/>(MCP server)")]
    FS["filesystem<br/>MCP server"]
    GH["github<br/>MCP server"]
    Other["…any MCP server"]

    Agent -->|MCP| STM
    STM <-->|MCP: stdio / SSE / HTTP| FS
    STM <-->|MCP| GH
    STM <-->|MCP| Other
    STM <-.->|surfacing<br/>via MCP| LTM
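In code terms, every proxied response passes through those four stages in order. The following is a minimal sketch in plain Python: the function names and logic are hypothetical stand-ins for the CLEAN → COMPRESS → SURFACE → INDEX stages, not memtomem-stm's actual API.

```python
# Illustrative only: hypothetical stand-ins for the pipeline stages,
# not memtomem-stm's real implementation.

def clean(response: str) -> str:
    # Strip noise: trailing whitespace and blank lines in the raw response.
    return "\n".join(line.rstrip() for line in response.splitlines() if line.strip())

def compress(response: str, budget: int) -> str:
    # Enforce a size budget (the real strategies are far smarter than truncation).
    return response if len(response) <= budget else response[:budget] + " [truncated]"

def surface(response: str, memories: list[str]) -> str:
    # Prepend relevant memories retrieved from an LTM server, if any.
    if not memories:
        return response
    header = "\n".join(f"[memory] {m}" for m in memories)
    return f"{header}\n{response}"

def index(response: str, store: list[str]) -> str:
    # Record the response so later identical calls can be answered from cache.
    store.append(response)
    return response

store: list[str] = []
raw = "line one   \n\n  \nline two"
out = index(surface(compress(clean(raw), budget=100), ["we chose SQLite"]), store)
print(out)
```

The real pipeline adds relevance gating, rate limits, and caching around these steps; the sketch only shows the order of composition.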

Installation

pip install memtomem-stm

Or with uv:

uv tool install memtomem-stm     # install mms / memtomem-stm as global CLI tools
uvx memtomem-stm --help          # or run without installing
uv pip install memtomem-stm      # or install into the active environment

memtomem-stm is independent: it has no Python-level dependency on memtomem core. To enable proactive memory surfacing, point STM at a running memtomem MCP server (or any compatible MCP server) — communication happens entirely through the MCP protocol.

Quick Start

mms is the short alias for memtomem-stm-proxy; the two commands are identical, so use whichever you prefer.

1. Add an upstream MCP server

mms add filesystem \
  --command npx \
  --args "-y @modelcontextprotocol/server-filesystem /home/user/projects" \
  --prefix fs

--prefix is required: it's the namespace under which the upstream server's tools will appear (e.g. fs__read_file). Repeat for each MCP server you want to proxy.
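The namespacing itself is plain string prefixing. A hypothetical helper (not part of the package) shows how a proxied name maps back to its prefix and upstream tool:

```python
def split_prefixed_tool(name: str) -> tuple[str, str]:
    """Split a proxied tool name like 'fs__read_file' into (prefix, upstream tool)."""
    prefix, sep, tool = name.partition("__")
    if not sep:
        raise ValueError(f"not a prefixed tool name: {name!r}")
    return prefix, tool

print(split_prefixed_tool("fs__read_file"))  # ('fs', 'read_file')
```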

mms list      # show what you've added
mms status    # show full config + connectivity

2. Connect your AI client to STM

Point your MCP client at the memtomem-stm server instead of the upstream servers directly. For Claude Code:

claude mcp add memtomem-stm -s user -- memtomem-stm

Or add it to a JSON MCP config:

{
  "mcpServers": {
    "memtomem-stm": {
      "command": "memtomem-stm"
    }
  }
}

3. Use the proxied tools

Your agent now sees proxied tools (fs__read_file, gh__search_repositories, etc.). Every call goes through the 4-stage pipeline automatically — responses are cleaned, compressed, cached, and (when an LTM server is configured) enriched with relevant memories.

To check what's happening, ask the agent to call stm_proxy_stats.
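Since the agent-to-STM and STM-to-upstream hops are all plain MCP (JSON-RPC 2.0), a call to a proxied tool such as fs__read_file is an ordinary tools/call request. A sketch of the wire format, with illustrative values (the id and path are made up):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "fs__read_file",
    "arguments": { "path": "/home/user/projects/README.md" }
  }
}
```

STM rewrites the name to the upstream tool, forwards the call, and runs the response through the pipeline before returning it.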

Tutorial notebooks

Want to see STM's behavior without wiring it into Claude Code first? The notebooks/ directory contains six runnable Jupyter notebooks: a CLI-MCP prelude (00), quickstart setup (01), selective compression (02), memory surfacing (03), a LangChain agent integration (04), and observability/Langfuse tracing (05). Clone the repo, run uv sync, then uv run jupyter lab notebooks/ — no external services are required for notebooks 00–03 and 05.

Key Features

  • 🗜️ 10 compression strategies with auto-selection by content type, query-aware budget allocation, and zero-loss progressive delivery → docs/compression.md
  • 🧠 Proactive memory surfacing from a memtomem LTM server, gated by relevance threshold, rate limit, dedup, and circuit breaker → docs/surfacing.md
  • 💾 Response caching with TTL and eviction; surfacing re-applied on cache hit so injected memories stay fresh → docs/caching.md
  • 🔍 Observability — Langfuse tracing, RPS, latency percentiles (p50/p95/p99), error classification, per-tool metrics → docs/operations.md#observability
  • 📈 Horizontal scaling — PendingStore protocol with InMemory (default) or SQLite-shared backend for multi-instance deployments → docs/operations.md#horizontal-scaling
  • 🛡️ Safety — circuit breaker, retry with backoff, write-tool skip, query cooldown, session/cross-session dedup, sensitive content auto-detection → docs/operations.md#safety--resilience

Documentation

  • Pipeline — the 4-stage CLEAN → COMPRESS → SURFACE → INDEX flow
  • Compression — all 10 strategies, query-aware compression, progressive delivery, model-aware defaults
  • Surfacing — memory surfacing engine, relevance gating, feedback loop, auto-tuning
  • Caching — response cache and auto-indexing
  • Configuration — environment variables and stm_proxy.json reference
  • CLI — mms (= memtomem-stm-proxy) commands and the 10 MCP tools
  • Operations — safety, privacy, horizontal scaling, observability, on-disk state

Development

uv sync                                                    # install dev deps
uv run pytest -m "not ollama"                              # tests (CI filter)
uv run ruff check src && uv run ruff format --check src    # lint (required)
uv run mypy src                                            # typecheck (advisory)

CI runs the same commands on every PR via .github/workflows/ci.yml. Lint (ruff check + ruff format --check) and tests must pass; mypy is advisory.

License

Apache License 2.0. Contributions are accepted under the terms of the Contributor License Agreement.

Download files

Download the file for your platform.

Source Distribution

memtomem_stm-0.1.6.tar.gz (456.9 kB)

Uploaded Source

Built Distribution


memtomem_stm-0.1.6-py3-none-any.whl (102.4 kB)

Uploaded Python 3

File details

Details for the file memtomem_stm-0.1.6.tar.gz.

File metadata

  • Download URL: memtomem_stm-0.1.6.tar.gz
  • Size: 456.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for memtomem_stm-0.1.6.tar.gz
Algorithm Hash digest
SHA256 994e368a3941cbae6240dbeb88468c18f5c70fad45db4591b4750c693708f86a
MD5 ffa1485897fd1781d96b9fbc7f598f33
BLAKE2b-256 e4d36f4f45136380eb9d510a1ee62c77aebbadf9d6515521eb5055caf05012a3


Provenance

The following attestation bundles were made for memtomem_stm-0.1.6.tar.gz:

Publisher: release.yml on memtomem/memtomem-stm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file memtomem_stm-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: memtomem_stm-0.1.6-py3-none-any.whl
  • Size: 102.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for memtomem_stm-0.1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 6f28620f71919a9091cb6a51f59f0bda01f53a92da66e3397f4ea4bec9b707a7
MD5 8b6e1d2ff76485c026e0428a58296908
BLAKE2b-256 36b6411d19ae0b6189ac3f0358057f0263e0b1459975224eb7efebab21db350d


Provenance

The following attestation bundles were made for memtomem_stm-0.1.6-py3-none-any.whl:

Publisher: release.yml on memtomem/memtomem-stm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
