Project-agnostic dual-memory tooling for Claude Code, Cursor, and opencode
supamem
Qdrant-backed dual-memory for AI coding agents
Give Claude Code, Cursor, and OpenCode persistent semantic + structural memory across every project.
Built by Dzmitry Sukhau, AI-native Solution / Software Architect / CTO
Available for consulting on AI products, integrating AI into existing products, and business-process automation.
If you're shipping LLM features, evaluating retrieval pipelines, hardening agentic systems, or building an AI-first product from scratch, let's talk.
What is supamem?
supamem is a single-binary CLI that wires up a production-grade memory layer for any AI coding
assistant. Drop it into a fresh repo, run supamem init, and your agents instantly gain:
- Semantic search over project notes, ADRs, decisions, and past conversations (hybrid sparse+dense retrieval)
- MCP server that any compatible client (Claude Code, Cursor, OpenCode) can talk to
- Per-client hooks that auto-load relevant memory at session start and on file edits
- Welford usage stats so you can see what memory is actually being recalled
- Eval harness with a 33-query golden corpus to detect retrieval regressions
Battle-tested inside SoftChat (Phases 80.1–80.5) before being extracted into a standalone package every team can adopt.
Why supamem exists
The problem: Coding agents have no memory between sessions. Every time you open a new conversation in Claude Code / Cursor / OpenCode, the model has zero context about your codebase, past decisions, ADRs, known issues, or conventions. So either:
- You re-paste 5–15 KB of context at the start of every session (slow, error-prone, costly), or
- You let the agent flounder: it grep-walks the repo, asks redundant questions, forgets last week's decisions, and rediscovers the same gotchas you already documented six months ago.
The fix: a persistent semantic + structural memory layer that automatically retrieves the right 1–2 KB of context for the current prompt. No manual pasting, no re-explaining, no context blow-out.
Phase 80.1 bench (33 labeled goldens, real Claude Code sessions): −78.5% tokens vs naive whole-doc retrieval at the same recall, p95 73 ms end-to-end.
The full evaluation is the same one we ran inside SoftChat to lock the production pipeline. Methodology: 33 representative dev queries → 4 retrieval arms compared (baseline_union, tuned_current, tuned_hybrid, mem0_vector) → token count + recall CI + latency measured per arm.
Token consumption: agent with memory vs without
Numbers below are per typical 30-turn Claude Code session assuming a real codebase with ~50 ADRs / insights / rules (roughly what SoftChat ships). YMMV, but the ratio between arms holds.
| Approach | Tokens/turn | Tokens/30-turn session | Notes |
|---|---|---|---|
| ❌ No memory layer | ≈ 0 auto-injected, but you paste context manually | 30,000–80,000 (manual paste, repeated) | You spend cognitive load on copying instead of building |
| ⚠️ Naive RAG (whole-doc embed) | ~5,800 / turn | ~174,000 | Bloated, recalls big files when you only needed a paragraph |
| ✅ supamem tuned_hybrid | ~1,250 / turn | ~37,500 | Same recall, −78.5% tokens vs naive RAG |
Approximate inference cost savings
Anthropic API list pricing (Mar 2026): Sonnet 4.6 = $3 / Mtok input · Opus 4.7 = $15 / Mtok input.
| Model | Tokens saved/session vs naive RAG | Cost saved/session | Monthly (110 sessions) |
|---|---|---|---|
| Sonnet 4.6 | 136,500 | $0.41 | ~$45/dev |
| Opus 4.7 | 136,500 | $2.05 | ~$225/dev |
A 10-engineer team running Opus saves ~$2,250/month on input tokens alone, without counting the cost of slower iteration, lost decisions, and time spent re-pasting context. Output token savings (less hallucination, fewer back-and-forth turns) compound on top.
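The per-session and per-month figures in the tables above follow directly from tokens saved × price. A minimal sanity-check sketch; the token count and prices are the figures quoted in this README, not values fetched from any API:

```python
# Back-of-envelope check of the savings tables above. TOKENS_SAVED and the
# per-Mtok prices come from this README's own numbers (Mar 2026 list pricing).
TOKENS_SAVED_PER_SESSION = 136_500                        # vs naive RAG, 30-turn session
PRICE_PER_MTOK = {"sonnet-4.6": 3.00, "opus-4.7": 15.00}  # USD per million input tokens

def saved_per_session(model: str) -> float:
    """USD saved on input tokens in one session."""
    return TOKENS_SAVED_PER_SESSION * PRICE_PER_MTOK[model] / 1_000_000

def saved_per_month(model: str, sessions_per_month: int = 110) -> float:
    """USD saved per dev per month."""
    return saved_per_session(model) * sessions_per_month

print(round(saved_per_session("sonnet-4.6"), 2))  # 0.41
print(round(saved_per_month("opus-4.7")))         # 225
```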
vs the alternatives
| | No memory | Naive RAG | mem0 / atomic facts | supamem (tuned_hybrid) |
|---|---|---|---|---|
| Auto-inject on session start | ❌ | ⚠️ | ✅ | ✅ |
| Hybrid sparse+dense retrieval | ❌ | ❌ | ❌ | ✅ |
| Code-identifier preservation | ❌ | ✅ | ❌ (drops names) | ✅ |
| Locked schema + golden eval | ❌ | ❌ | ❌ | ✅ |
| Multi-client (Claude/Cursor/OpenCode) | ❌ | ❌ | ⚠️ | ✅ |
| p95 latency | n/a | ~120 ms | ~80 ms | 73 ms |
| Token bloat | High (manual) | Highest | Low but lossy | Lowest with full recall |
Why hybrid? BM25 catches exact identifiers (ChatService.generate, env-var names,
file paths) that dense embeddings smear. Dense catches semantic intent ("how do we
handle billing webhooks?") that BM25 misses. RRF fusion combines both rankings so you
get the best of each.
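The fusion step described above is plain reciprocal rank fusion. A minimal sketch of the technique; the doc IDs are made up and `k=60` is the constant from the original RRF paper, not necessarily the value supamem uses:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge several ranked lists of doc IDs.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked well by multiple arms rise to the top."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Sparse (BM25) surfaces the exact identifier; dense surfaces semantically
# related notes. Fusion keeps the doc that both arms liked on top.
sparse = ["adr-billing", "chatservice-notes", "env-vars"]
dense = ["webhook-guide", "adr-billing", "retry-policy"]
fused = rrf_fuse([sparse, dense])
# fused[0] == "adr-billing": the only doc ranked highly by both arms
```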
Why not mem0? mem0's atomic-fact extraction loses code identifiers; recall on the 33-query bench was 0.015 (effectively zero). Great for personal CRM-style memory, not for code-aware retrieval.
60-second quickstart
```shell
# 1. Install (uv is the fastest path)
uv tool install supamem

# 2. Start Qdrant (one-time, ~30s)
docker run -d -p 6333:6333 -p 6334:6334 -v $HOME/.qdrant:/qdrant/storage qdrant/qdrant:latest

# 3. Bootstrap your project
cd your-project
supamem init

# 4. Wire it into your AI client
supamem install --client claude-code   # or cursor, opencode

# 5. Confirm everything is healthy
supamem doctor
```
That's it. Open Claude Code (or your preferred client) inside the project, and the memory tool is already on the menu.
Features
| Feature | Description |
|---|---|
| Hybrid retrieval | Tuned sparse (BM25) + dense (MiniLM) fusion, locked schema D-25 |
| Markdown chunker | Header-aware, 200-token chunks with 250-token soft max (T-1) |
| MCP server | `stdio` (default) and `http` transports, official `mcp` SDK |
| Multi-client hooks | Claude Code session-start, OpenCode session-start, Cursor MDC |
| One-command install | Atomic config patching with auto-backup and rollback |
| `supamem doctor` | Probe Qdrant, resolve config chain, surface version drift |
| Welford counters | Track recall rate, latency, query volume per project |
| Eval harness | 33-query golden corpus + regression detection |
| Brownfield migration | Detect existing `dev_memory` and migrate non-destructively |
| Stylish CLI | Rich-powered spinners, panels, and color so you always see progress |
Prerequisites
You only really need two things: Python 3.12+ and Qdrant. Everything else is optional.
Python 3.12+
```shell
# macOS (Homebrew)
brew install python@3.12

# Linux (Ubuntu/Debian)
sudo apt install python3.12 python3.12-venv

# Windows (PowerShell)
winget install Python.Python.3.12
```
We strongly recommend installing uv, the fastest Python package manager:
```shell
# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```
Qdrant 1.10+ · vector database (required)
The simplest path is Docker:
```shell
docker run -d --name qdrant \
  -p 6333:6333 -p 6334:6334 \
  -v $HOME/.qdrant:/qdrant/storage \
  qdrant/qdrant:latest
```
Or with docker compose:
```yaml
services:
  qdrant:
    image: qdrant/qdrant:latest
    ports: ["6333:6333", "6334:6334"]
    volumes: ["./qdrant_data:/qdrant/storage"]
    restart: unless-stopped
```
Don't have Docker? Run a managed cluster on Qdrant Cloud (free tier
available) and point supamem at the URL via supamem init.
An MCP-compatible client · pick at least one
| Client | Install | Notes |
|---|---|---|
| Claude Code | `npm install -g @anthropic-ai/claude-code` | First-class MCP support |
| Cursor | Download from cursor.com | Uses MDC rules + MCP |
| OpenCode | `curl -fsSL https://opencode.ai/install \| bash` | Open-source TUI, MCP native |
Install
```shell
# Recommended: uv (fastest, isolated)
uv tool install supamem

# Alternative: pipx (also isolated)
pipx install supamem

# Plain pip (in a venv)
pip install supamem
```
Verify:

```shell
supamem --version
```

You should see a colorful banner and the credit line.
Note: v0.1.0 ships as a git-tag release per D-44. PyPI publish lands in v0.2. To install the pre-release directly from git:

```shell
uv tool install git+https://github.com/dzmitrys-dev/supamem@v0.1.0
```
CLI surface
| Command | Purpose |
|---|---|
| `supamem init` | Greenfield bootstrap: probes Qdrant, creates collection, writes `.supamem/config.toml` |
| `supamem install --client <name>` | Patch a client config (claude-code, cursor, opencode), atomic with backup |
| `supamem index` | Embed dev memories into Qdrant using the locked tuned-hybrid pipeline (D-25) |
| `supamem mcp-server` | Run the MCP server (`--transport stdio` default; `--transport http` for HTTP) |
| `supamem hook <client>` | Per-client session/edit hooks (called by the client itself) |
| `supamem doctor` | Probe Qdrant, print resolved config chain, report version drift |
| `supamem stats` | Welford schema-v2 usage counters from `.supamem/state/` |
| `supamem migrate` | Brownfield migration from a pre-existing `dev_memory` collection |
| `supamem eval` | Run the regression harness against the bundled 33-query golden corpus |
| `supamem uninstall --client <name>` | Reverse `supamem install` cleanly |
Every long-running command shows a live spinner with elapsed time so you always know it's
working. Use --help on any subcommand for details.
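The `stats` counters are based on Welford's online algorithm, which tracks mean and variance in a single pass without storing individual samples. A sketch of the technique; the actual schema-v2 counter layout is internal to supamem, and the class and latency values below are illustrative:

```python
class WelfordCounter:
    """Single-pass mean/variance via Welford's online algorithm.

    Each update folds one observation into the running mean and the
    sum of squared deviations (m2), so memory use stays O(1)."""
    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance (Bessel-corrected); 0.0 until two samples exist."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

# Illustrative per-query latencies in ms
latencies = WelfordCounter()
for ms in [61, 73, 58, 90, 73]:
    latencies.update(ms)
print(latencies.mean)  # 71.0
```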
Wiring into your client
Claude Code
```shell
supamem install --client claude-code
```
Adds an entry to `~/.claude.json` under `mcpServers` and registers a session-start hook under `~/.claude/hooks/`. Preview without applying with `--dry-run`.
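For reference, the kind of `mcpServers` entry the installer writes looks roughly like this. The server name and args are illustrative; prefer running the installer over hand-editing:

```json
{
  "mcpServers": {
    "supamem": {
      "command": "supamem",
      "args": ["mcp-server", "--transport", "stdio"]
    }
  }
}
```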
Cursor
```shell
supamem install --client cursor
```
Patches `.cursor/mcp.json` and writes `.cursor/rules/dual-memory.mdc`.
OpenCode
```shell
supamem install --client opencode
```
Updates `~/.config/opencode/opencode.json` and writes a session-start hook to `~/.config/opencode/hooks/`.
How it works
```
+------------------+  MCP/stdio  +------------------+   REST   +-------------+
| Claude / Cursor  | <---------> |   supamem MCP    | <------> |   Qdrant    |
|   / OpenCode     |             |      server      |          |  (vectors)  |
+------------------+             +------------------+          +-------------+
        |                                 ^
        | session-start hook              | tuned-hybrid retrieval
        v                                 | (BM25 + MiniLM fusion)
+------------------+                      |
|   supamem hook   | <--------------------+
|  (auto-recall)   |
+------------------+
```
- Indexer chunks Markdown by header (T-1 chunker, 200-token target / 250 soft max)
- Embedders produce sparse (BM25) and dense (MiniLM-L6) vectors
- Retrieval runs both arms in parallel, fuses with reciprocal rank fusion, returns top-k
- MCP server exposes `qdrant-find` and `qdrant-store` tools, plus context resources
- Hooks call `supamem hook <client>` at the right moment, so memory loads transparently
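The header-aware chunking step above can be sketched as follows. This illustrates the idea rather than reproducing the T-1 chunker: it counts whitespace-separated words where the real pipeline counts model tokens, and the split heuristics are simplified:

```python
import re

def chunk_markdown(text: str, target: int = 200, soft_max: int = 250) -> list[str]:
    """Header-aware chunking sketch: split on ATX headers, then split any
    section whose word count exceeds soft_max into target-sized pieces.
    (Word counts stand in for model-token counts here.)"""
    sections, current = [], []
    for line in text.splitlines():
        # A new header starts a new section (flush the previous one)
        if re.match(r"^#{1,6} ", line) and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))

    chunks = []
    for section in sections:
        words = section.split()
        if len(words) <= soft_max:
            chunks.append(section)          # small section: keep intact
        else:
            for i in range(0, len(words), target):
                chunks.append(" ".join(words[i : i + target]))
    return chunks

doc = "# A\n" + ("word " * 300).strip() + "\n# B\nshort"
chunks = chunk_markdown(doc)
```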
Contributing
We welcome PRs! Quick start:
```shell
git clone https://github.com/dzmitrys-dev/supamem.git
cd supamem
uv venv && source .venv/bin/activate
uv pip install -e ".[dev]"
pytest
ruff check .
```
Coming from an in-tree dev_memory setup? See MIGRATION.md.
License
MIT. See LICENSE.
Delivered with care by
SoftChat · SoftSkillz
Russian-language AI chat platform · AI-first product engineering
supamem was extracted from SoftChat's production memory stack so every team can run on the same battle-tested pipeline. If it makes your agents smarter, give us a ⭐ and check out what we build with it.
Made with care in Belarus 🇧🇾 · app.softchat.ru · softskillz.ai
File details
Details for the file supamem-0.1.3.tar.gz.
File metadata
- Download URL: supamem-0.1.3.tar.gz
- Size: 196.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `eafe50f9b44be2485cd06bafbae9b8d55b7a7a60c616149079cf8d99d932e83f` |
| MD5 | `6ef24dd9548bfa19ae2cf7d66d3bff15` |
| BLAKE2b-256 | `56f777e1d05915604194bbfd23b200e5bc9bc9e58dbdc07719b1c5837385ddb9` |
Provenance
The following attestation bundle was made for supamem-0.1.3.tar.gz:
Publisher: release.yml on dzmitrys-dev/supamem
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: supamem-0.1.3.tar.gz
- Subject digest: eafe50f9b44be2485cd06bafbae9b8d55b7a7a60c616149079cf8d99d932e83f
- Sigstore transparency entry: 1401366370
- Permalink: dzmitrys-dev/supamem@17db78c5d149619e6e22f535f197c880aff74524
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/dzmitrys-dev
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@17db78c5d149619e6e22f535f197c880aff74524
- Trigger Event: push
File details
Details for the file supamem-0.1.3-py3-none-any.whl.
File metadata
- Download URL: supamem-0.1.3-py3-none-any.whl
- Size: 80.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `42453bca96a13b4fb8a7973b2c9a4881062e96c78050535da50271ba891a90f8` |
| MD5 | `c403d4fcdaa492e7180418c12a815c64` |
| BLAKE2b-256 | `73688c17990b1213b2286c35e94c01eb3a8ffc7f838a0211ff6b461e91e31abc` |
Provenance
The following attestation bundle was made for supamem-0.1.3-py3-none-any.whl:
Publisher: release.yml on dzmitrys-dev/supamem
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: supamem-0.1.3-py3-none-any.whl
- Subject digest: 42453bca96a13b4fb8a7973b2c9a4881062e96c78050535da50271ba891a90f8
- Sigstore transparency entry: 1401366421
- Permalink: dzmitrys-dev/supamem@17db78c5d149619e6e22f535f197c880aff74524
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/dzmitrys-dev
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@17db78c5d149619e6e22f535f197c880aff74524
- Trigger Event: push