
Short-term memory proxy gateway with proactive memory surfacing for AI agents


memtomem-stm

Official website & docs: https://memtomem.com

PyPI · Python 3.12+ · License: Apache 2.0 · CLA

🚧 Alpha — APIs and defaults may change between 0.1.x releases. Feedback and issue reports are especially welcome: Issues · Discussions.

Spend fewer tokens. Remember more. Ship faster.

memtomem-stm is an MCP proxy that typically cuts token usage by 20–80% and gives your agent memory across sessions — with no changes to your upstream MCP servers.

It sits between your AI agent and its upstream MCP servers, compressing bloated tool responses, caching repeated calls, and automatically surfacing relevant context from prior sessions via a memtomem LTM server.

You need this if:

  • Your agent burns tokens re-reading the same files and search results — STM compresses and caches them (Claude Code, Cursor, Claude Desktop, or any MCP client)
  • Your coding sessions lose context and the agent re-discovers decisions it already made — STM surfaces prior context automatically via memtomem LTM
  • You run custom MCP servers and want compression, caching, and observability without changing upstream code — STM is a drop-in proxy layer
flowchart TB
    Agent["Agent<br/>(Claude Code, Cursor, …)"]
    subgraph STM["memtomem-stm (STM)"]
        Pipe["CLEAN → COMPRESS → SURFACE → INDEX"]
    end
    LTM[("memtomem LTM<br/>(MCP server)")]
    FS["filesystem<br/>MCP server"]
    GH["github<br/>MCP server"]
    Other["…any MCP server"]

    Agent -->|MCP| STM
    STM <-->|MCP: stdio / SSE / HTTP| FS
    STM <-->|MCP| GH
    STM <-->|MCP| Other
    STM <-.->|surfacing<br/>via MCP| LTM

Installation

pip install memtomem-stm

Or with uv:

uv tool install memtomem-stm     # install mms / memtomem-stm as global CLI tools
uvx memtomem-stm --help          # or run without installing
uv pip install memtomem-stm      # or install into the active environment

memtomem-stm is independent: it has no Python-level dependency on memtomem core. To enable proactive memory surfacing, point STM at a running memtomem MCP server (or any compatible MCP server) — communication happens entirely through the MCP protocol.

Quick Start

mms is the short alias for memtomem-stm-proxy — both commands are identical, use whichever you prefer.

1. Add an upstream MCP server

For first-time setup, run the guided wizard — it prompts for name/prefix/command, optionally probes the server, and prints the MCP-client snippet you'll need in step 2:

mms init

Or add servers non-interactively:

mms add filesystem \
  --command npx \
  --args "-y @modelcontextprotocol/server-filesystem /home/user/projects" \
  --prefix fs

--prefix is required: it's the namespace under which the upstream server's tools will appear (e.g. fs__read_file). Repeat for each MCP server you want to proxy.
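To illustrate the naming scheme, here is a minimal Python sketch of how a prefix and an upstream tool name could be combined into a proxied name such as fs__read_file, and split back apart. The helpers proxied_name and split_proxied_name are hypothetical and are not part of memtomem-stm; only the prefix__tool pattern comes from the example above.

```python
SEPARATOR = "__"  # taken from the fs__read_file example; the helpers below are hypothetical


def proxied_name(prefix: str, tool: str) -> str:
    """Build the namespaced name under which a proxied tool would appear."""
    if not prefix or SEPARATOR in prefix:
        raise ValueError("prefix must be non-empty and must not contain '__'")
    return f"{prefix}{SEPARATOR}{tool}"


def split_proxied_name(name: str) -> tuple[str, str]:
    """Split a namespaced name back into (prefix, upstream tool name)."""
    # partition() splits on the first separator only, so the upstream
    # tool name is returned unchanged even if it contains '__' itself.
    prefix, sep, tool = name.partition(SEPARATOR)
    if not sep or not prefix or not tool:
        raise ValueError(f"not a namespaced tool name: {name!r}")
    return prefix, tool


print(proxied_name("fs", "read_file"))
print(split_proxied_name("gh__search_repositories"))
```

Rejecting prefixes that contain the separator is a design choice of this sketch: it keeps the split unambiguous. Whether mms enforces a similar rule is not stated here.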

mms list      # show what you've added
mms status    # show full config + connectivity

2. Connect your AI client to STM

Point your MCP client at the memtomem-stm server instead of the upstream servers directly. For Claude Code:

claude mcp add memtomem-stm -s user -- memtomem-stm

Or add it to a JSON MCP config:

{
  "mcpServers": {
    "memtomem-stm": {
      "command": "memtomem-stm"
    }
  }
}
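If you manage client configs with a script, the same entry can be merged programmatically. The following is a small sketch, assuming the config has already been loaded into a Python dict; the helper add_stm_entry and the "other" server entry are hypothetical, and the location of the config file depends on your MCP client.

```python
import json


def add_stm_entry(config: dict) -> dict:
    """Return a copy of an MCP client config with the memtomem-stm server added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # copy so the input is not mutated
    servers["memtomem-stm"] = {"command": "memtomem-stm"}
    merged["mcpServers"] = servers
    return merged


# Hypothetical pre-existing config with one unrelated server.
existing = {"mcpServers": {"other": {"command": "other-server"}}}
print(json.dumps(add_stm_entry(existing), indent=2))
```

Existing entries are preserved, so the snippet can be run against a config that already lists other servers.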

3. Use the proxied tools

Your agent now sees proxied tools (fs__read_file, gh__search_repositories, etc.). Every call goes through the 4-stage pipeline (CLEAN → COMPRESS → SURFACE → INDEX) automatically: responses are cleaned, compressed, cached, and (when an LTM server is configured) enriched with relevant memories.

To check what's happening, ask the agent to call stm_proxy_stats.
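As a mental model for the CLEAN → COMPRESS → SURFACE → INDEX pipeline, the sketch below chains four toy stages over a tool response. It is not memtomem-stm's implementation: every function name is hypothetical, and the behavior of each stage (whitespace collapsing, truncation, appending memories, recording the result) is an assumption made only to keep the example runnable.

```python
from typing import Callable

Stage = Callable[[str], str]


def clean(text: str) -> str:
    # Assumption: "cleaning" here just collapses runs of whitespace.
    return " ".join(text.split())


def make_compress(max_chars: int) -> Stage:
    # Assumption: "compression" here is naive truncation to a character budget.
    def compress(text: str) -> str:
        if len(text) <= max_chars:
            return text
        return text[:max_chars] + " [truncated]"
    return compress


def make_surface(memories: list[str]) -> Stage:
    # Assumption: surfaced memories are appended as a trailing note.
    def surface(text: str) -> str:
        if not memories:
            return text
        notes = "\n".join(f"- {m}" for m in memories)
        return f"{text}\n\nRelevant memories:\n{notes}"
    return surface


def make_index(store: list[str]) -> Stage:
    # Assumption: indexing records the processed response and passes it through.
    def index(text: str) -> str:
        store.append(text)
        return text
    return index


def run_pipeline(text: str, stages: list[Stage]) -> str:
    """Apply each stage in order, feeding the output of one into the next."""
    for stage in stages:
        text = stage(text)
    return text


indexed: list[str] = []
stages = [clean, make_compress(40), make_surface(["uses pytest"]), make_index(indexed)]
result = run_pipeline("line one\n\n\n   line   two   with   extra   spaces and more text", stages)
print(result)
```

Modelling each stage as a plain function from text to text keeps the order explicit and makes it easy to swap a stage out when experimenting.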

Tutorial notebooks

Try it without wiring into your AI client first. A quickstart Jupyter notebook registers an upstream MCP server, calls a proxied tool, and reads stm_proxy_stats end-to-end. Clone the repo, run uv sync, then run uv run jupyter lab notebooks/. No external services are needed.

Key Features

  • 🗜️ Typically 20–80% fewer tokens per tool call — 10 compression strategies with auto-selection by content type, query-aware budget, and zero-loss progressive delivery → docs/compression.md
  • 🧠 Your agent remembers — proactive memory surfacing from prior sessions, gated by relevance threshold, rate limit, dedup, and circuit breaker → docs/surfacing.md
  • 💾 Repeated calls are free — response cache with TTL and eviction; surfacing re-applied on cache hit so injected memories stay fresh → docs/caching.md
  • 🛡️ Production-safe — circuit breaker, retry with backoff, write-tool skip, query cooldown, dedup, sensitive content auto-detection, Langfuse tracing, horizontal scaling via PendingStore

Documentation

Guide          Topic
Surfacing      How agents recall prior context automatically
Compression    All 10 strategies: pick the right one for your content
Caching        Skip repeated work with response caching
Configuration  Tune settings without touching code
CLI            CLI commands and the 10 MCP tools

Development

uv sync                                                    # install dev deps
uv run pytest -m "not ollama and not bench_qa_meta and not bench_qa_llm_judge"   # tests (CI filter)
uv run ruff check src && uv run ruff format --check src    # lint (required)
uv run mypy src                                            # typecheck (advisory)

CI runs the same commands on every PR via .github/workflows/ci.yml. Lint (ruff check + ruff format --check) and tests must pass; mypy is advisory.

License

Apache License 2.0. Contributions are accepted under the terms of the Contributor License Agreement.
