# memtomem-stm
Short-term memory proxy gateway with proactive memory surfacing for AI agents.
Sits between your AI agent and upstream MCP servers. Compresses responses to save tokens, caches results, and automatically surfaces relevant memories from a memtomem LTM server.
Built for:
- Agents (Claude Code, Cursor, Claude Desktop, etc.) running multiple MCP servers and burning tokens on noisy upstream responses
- Long-running coding sessions where the agent should recall prior decisions instead of re-searching
- Teams running custom MCP servers that need a proxy layer for compression, caching, and observability — no upstream code changes required
```mermaid
flowchart TB
    Agent["Agent<br/>(Claude Code, Cursor, …)"]
    subgraph STM["memtomem-stm (STM)"]
        Pipe["CLEAN → COMPRESS → SURFACE → INDEX"]
    end
    LTM[("memtomem LTM<br/>(MCP server)")]
    FS["filesystem<br/>MCP server"]
    GH["github<br/>MCP server"]
    Other["…any MCP server"]
    Agent -->|MCP| STM
    STM <-->|MCP: stdio / SSE / HTTP| FS
    STM <-->|MCP| GH
    STM <-->|MCP| Other
    STM <-.->|surfacing<br/>via MCP| LTM
```
## Installation
```bash
pip install memtomem-stm
```
Or with uv:
```bash
uv tool install memtomem-stm    # install mms / memtomem-stm as global CLI tools
uvx memtomem-stm --help         # or run without installing
uv pip install memtomem-stm     # or install into the active environment
```
memtomem-stm is independent: it has no Python-level dependency on memtomem core. To enable proactive memory surfacing, point STM at a running memtomem MCP server (or any compatible MCP server) — communication happens entirely through the MCP protocol.
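For illustration only, a hypothetical `stm_proxy.json` fragment pointing STM at an LTM server might look like the following. The key names, the `memtomem-mcp` command, and the threshold value are all assumptions made for this sketch; consult docs/configuration.md for the actual schema.

```json
{
  "ltm": {
    "command": "memtomem-mcp",
    "transport": "stdio",
    "relevance_threshold": 0.7
  }
}
```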
## Quick Start
`mms` is the short alias for `memtomem-stm-proxy`; both commands are identical, so use whichever you prefer.
### 1. Add an upstream MCP server
```bash
mms add filesystem \
    --command npx \
    --args "-y @modelcontextprotocol/server-filesystem /home/user/projects" \
    --prefix fs
```
`--prefix` is required: it's the namespace under which the upstream server's tools appear (e.g. `fs__read_file`). Repeat `mms add` for each MCP server you want to proxy.
```bash
mms list      # show what you've added
mms status    # show full config + connectivity
```
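The prefix namespacing can be sketched in a few lines of Python (the helper function is illustrative, not part of the package):

```python
def namespaced(prefix: str, tool: str) -> str:
    """Join an upstream tool name with its server prefix, double-underscore separated."""
    return f"{prefix}__{tool}"

# With --prefix fs, the filesystem server's tools appear under the fs namespace.
print(namespaced("fs", "read_file"))  # fs__read_file
```

The double-underscore separator keeps tool names from different upstream servers from colliding while remaining valid identifiers.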
### 2. Connect your AI client to STM
Point your MCP client at the memtomem-stm server instead of the upstream servers directly. For Claude Code:
```bash
claude mcp add memtomem-stm -s user -- memtomem-stm
```
Or add it to a JSON MCP config:
```json
{
  "mcpServers": {
    "memtomem-stm": {
      "command": "memtomem-stm"
    }
  }
}
```
### 3. Use the proxied tools
Your agent now sees proxied tools (`fs__read_file`, `gh__search_repositories`, etc.). Every call goes through the 4-stage pipeline automatically: responses are cleaned, compressed, cached, and (when an LTM server is configured) enriched with relevant memories.
To check what's happening, ask the agent to call `stm_proxy_stats`.
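Conceptually, the per-call flow is a composition of four stages. A minimal sketch, with toy stand-ins for the real CLEAN/COMPRESS/SURFACE/INDEX logic:

```python
from typing import Callable

def clean(resp: str) -> str:
    return resp.strip()                   # drop noise/whitespace from the upstream response

def compress(resp: str) -> str:
    return resp[:200]                     # enforce a (toy) size budget

def surface(resp: str) -> str:
    return resp + "\n[related memories]"  # append memories when an LTM server is configured

def index(resp: str) -> str:
    return resp                           # record the response for later recall

def run_pipeline(resp: str, stages: list[Callable[[str], str]]) -> str:
    # Each upstream response passes through every stage, in order.
    for stage in stages:
        resp = stage(resp)
    return resp

print(run_pipeline("  upstream tool output  ", [clean, compress, surface, index]))
```

The real proxy applies these stages per tool call; the stage bodies here only mimic the shape of the flow, not the actual compression or surfacing behavior.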
## Tutorial notebooks
Want to see STM's behavior without wiring it into Claude Code first? The `notebooks/` directory contains four runnable Jupyter notebooks covering quickstart setup, selective compression, memory surfacing, and a LangChain `create_agent` integration. Clone the repo, run `uv sync`, then `uv run jupyter lab notebooks/`; no external services are required for the first three.
## Key Features
- 🗜️ 10 compression strategies with auto-selection by content type, query-aware budget allocation, and zero-loss progressive delivery → docs/compression.md
- 🧠 Proactive memory surfacing from a memtomem LTM server, gated by relevance threshold, rate limit, dedup, and circuit breaker → docs/surfacing.md
- 💾 Response caching with TTL and eviction; surfacing re-applied on cache hit so injected memories stay fresh → docs/caching.md
- 🔍 Observability — Langfuse tracing, RPS, latency percentiles (p50/p95/p99), error classification, per-tool metrics → docs/operations.md#observability
- 📈 Horizontal scaling — `PendingStore` protocol with InMemory (default) or SQLite-shared backend for multi-instance deployments → docs/operations.md#horizontal-scaling
- 🛡️ Safety — circuit breaker, retry with backoff, write-tool skip, query cooldown, session/cross-session dedup, sensitive content auto-detection → docs/operations.md#safety--resilience
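The surfacing gates listed above can be illustrated with a small predicate; the threshold value, field names, and helper are assumptions for this sketch, not the package's API:

```python
def should_surface(score: float, memory_id: str,
                   seen: set[str], threshold: float = 0.7) -> bool:
    """Gate a candidate memory: relevant enough, and not already shown this session."""
    if score < threshold:   # relevance threshold: skip weak matches
        return False
    if memory_id in seen:   # session-level dedup: never show the same memory twice
        return False
    seen.add(memory_id)
    return True

session_seen: set[str] = set()
print(should_surface(0.9, "m1", session_seen))  # True: relevant and new
print(should_surface(0.9, "m1", session_seen))  # False: deduped on second attempt
print(should_surface(0.4, "m2", session_seen))  # False: below threshold
```

The real engine layers further gates on top of this (rate limit, query cooldown, circuit breaker); see docs/surfacing.md.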
## Documentation
| Guide | Topic |
|---|---|
| Pipeline | The 4-stage CLEAN → COMPRESS → SURFACE → INDEX flow |
| Compression | All 10 strategies, query-aware compression, progressive delivery, model-aware defaults |
| Surfacing | Memory surfacing engine, relevance gating, feedback loop, auto-tuning |
| Caching | Response cache and auto-indexing |
| Configuration | Environment variables and stm_proxy.json reference |
| CLI | mms (= memtomem-stm-proxy) commands and the 10 MCP tools |
| Operations | Safety, privacy, horizontal scaling, observability, on-disk state |
## Development
```bash
uv sync                                     # install dev deps
uv run pytest -m "not ollama"               # tests (CI filter)
uv run ruff check src && uv run mypy src    # lint + typecheck
```
CI runs the same three commands on every PR via `.github/workflows/ci.yml`. Mypy is advisory; lint and tests are required to pass.
## License
Apache License 2.0. Contributions are accepted under the terms of the Contributor License Agreement.
## File details: memtomem_stm-0.1.4.tar.gz

- Download URL: memtomem_stm-0.1.4.tar.gz
- Size: 438.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `e5e5f1f057eea7c35263a15477e4540b11bf3ea9ac435206b110eb48c92d56e2` |
| MD5 | `ccfcbf6f26ff0edb239e198ddf11da96` |
| BLAKE2b-256 | `9c853c33286a98ee6b9a680c23d30361e8ac4e2c5acbcabb5f198e015fc8f893` |
### Provenance

The following attestation bundle was made for memtomem_stm-0.1.4.tar.gz:

- Publisher: release.yml on memtomem/memtomem-stm
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: memtomem_stm-0.1.4.tar.gz
- Subject digest: e5e5f1f057eea7c35263a15477e4540b11bf3ea9ac435206b110eb48c92d56e2
- Sigstore transparency entry: 1280695222
- Permalink: memtomem/memtomem-stm@6325ba0b824422e1365829939dc9b8b66aabd455
- Branch / Tag: refs/tags/v0.1.4
- Owner: https://github.com/memtomem
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@6325ba0b824422e1365829939dc9b8b66aabd455
- Trigger Event: push
## File details: memtomem_stm-0.1.4-py3-none-any.whl

- Download URL: memtomem_stm-0.1.4-py3-none-any.whl
- Size: 99.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `348b12af687db9de192d6c7e4eea36b37996a17c162668bf8ff1e42535c9534e` |
| MD5 | `c5a01c50d8efd3669dd6c9fae62d5f25` |
| BLAKE2b-256 | `74a4a77154bed652bf0e62ac397474786fb15d3cb8c1d1a4afa85c81711084bf` |
### Provenance

The following attestation bundle was made for memtomem_stm-0.1.4-py3-none-any.whl:

- Publisher: release.yml on memtomem/memtomem-stm
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: memtomem_stm-0.1.4-py3-none-any.whl
- Subject digest: 348b12af687db9de192d6c7e4eea36b37996a17c162668bf8ff1e42535c9534e
- Sigstore transparency entry: 1280695223
- Permalink: memtomem/memtomem-stm@6325ba0b824422e1365829939dc9b8b66aabd455
- Branch / Tag: refs/tags/v0.1.4
- Owner: https://github.com/memtomem
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@6325ba0b824422e1365829939dc9b8b66aabd455
- Trigger Event: push