
Project-agnostic dual-memory tooling for Claude Code, Cursor, and opencode


🧠 supamem

Qdrant-backed dual-memory for AI coding agents

Give Claude Code, Cursor, and OpenCode persistent semantic + structural memory across every project.

PyPI Python License Qdrant MCP Powered by SoftChat


👋 Built by Dzmitry Sukhau — AI-native Solution / Software Architect / CTO

Available for consulting on AI products, integrating AI into existing products, and business-process automation.

If you're shipping LLM features, evaluating retrieval pipelines, hardening agentic systems, or building an AI-first product from scratch — let's talk.

LinkedIn — Dzmitry Sukhau · Open to Consulting


✨ What is supamem?

supamem is a single-binary CLI that wires up a production-grade memory layer for any AI coding assistant. Drop it into a fresh repo, run supamem init, and your agents instantly gain:

  • 🔍 Semantic search over project notes, ADRs, decisions, and past conversations (hybrid sparse+dense retrieval)
  • 🤖 MCP server that any compatible client (Claude Code, Cursor, OpenCode) can talk to
  • 🪝 Per-client hooks that auto-load relevant memory at session start and on file edits
  • 📊 Welford usage stats so you can see what memory is actually being recalled
  • 🧪 Eval harness with a 33-query golden corpus to detect retrieval regressions

Battle-tested inside SoftChat (Phases 80.1–80.5) before being extracted into a standalone package every team can adopt.


🎯 Why supamem exists

The problem: Coding agents have no memory between sessions. Every time you open a new conversation in Claude Code / Cursor / OpenCode, the model has zero context about your codebase, past decisions, ADRs, known issues, or conventions. So either:

  1. You re-paste 5–15 KB of context at the start of every session (slow, error-prone, costly), or
  2. You let the agent flounder — it grep-walks the repo, asks redundant questions, forgets last week's decisions, and rediscovers the same gotchas you already documented six months ago.

The fix: A persistent semantic + structural memory layer that automatically retrieves the right 1–2 KB of context for the current prompt — no manual pasting, no re-explaining, no context blow-out.

Phase 80.1 bench (33 labeled goldens, real Claude Code sessions): −78.5% tokens vs naive whole-doc retrieval at the same recall, p95 73 ms end-to-end.

The full evaluation is the same one we ran inside SoftChat to lock the production pipeline. Methodology: 33 representative dev queries → 4 retrieval arms compared (baseline_union, tuned_current, tuned_hybrid, mem0_vector) → token count + recall CI + latency measured per arm.
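To make the per-arm comparison concrete, here is a minimal sketch of how recall and token cost can be scored for one retrieval arm. This is an illustration with made-up data, not supamem's actual eval harness; the function and variable names are hypothetical.

```python
# Hypothetical per-arm scoring: recall against labeled goldens, plus
# average retrieved-token cost per query (illustrative, not supamem code).
def score_arm(results, goldens, token_counts):
    """results: {query: [chunk_id, ...]} retrieved per query.
    goldens: {query: {chunk_id, ...}} labeled relevant chunks.
    token_counts: {chunk_id: int} size of each retrieved chunk."""
    hits, total_relevant, tokens = 0, 0, 0
    for query, relevant in goldens.items():
        retrieved = results.get(query, [])
        hits += len(set(retrieved) & relevant)
        total_relevant += len(relevant)
        tokens += sum(token_counts[c] for c in retrieved)
    recall = hits / total_relevant if total_relevant else 0.0
    return recall, tokens / max(len(goldens), 1)  # (recall, mean tokens/query)

goldens = {"q1": {"a"}, "q2": {"b", "c"}}
results = {"q1": ["a", "x"], "q2": ["b"]}
tokens = {"a": 200, "x": 800, "b": 150}
recall, avg_tokens = score_arm(results, goldens, tokens)
```

Running the same goldens through each arm and comparing (recall, tokens) pairs is what makes a "−78.5% tokens at the same recall" claim checkable.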

📊 Token consumption: agent with memory vs without

Numbers below are per typical 30-turn Claude Code session assuming a real codebase with ~50 ADRs / insights / rules (≈ what SoftChat ships). YMMV — but the ratio between arms holds.

| Approach | Tokens/turn | Tokens/30-turn session | Notes |
|---|---|---|---|
| ❌ No memory layer | ≈0 auto-injected (you paste context manually) | 30,000–80,000 (manual paste, repeated) | You spend cognitive load on copying instead of building |
| ⚠️ Naive RAG (whole-doc embed) | ~5,800 | ~174,000 | Bloated, recalls big files when you only needed a paragraph |
| ✅ supamem tuned_hybrid | ~1,250 | ~37,500 | Same recall, −78.5% tokens vs naive RAG |

💰 Approximate inference cost savings

Anthropic API list pricing (Mar 2026): Sonnet 4.6 = $3 / Mtok input · Opus 4.7 = $15 / Mtok input.

| Model | Tokens saved/session vs naive RAG | Cost saved/session | Monthly (110 sessions) |
|---|---|---|---|
| Sonnet 4.6 | 136,500 | $0.41 | ~$45/dev |
| Opus 4.7 | 136,500 | $2.05 | ~$225/dev |

A 10-engineer team running Opus saves ~$2,250/month on input tokens alone — without counting the cost of slower iteration, lost decisions, and time spent re-pasting context. Output-token savings (less hallucination, fewer back-and-forth turns) compound on top.
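The cost table follows from straight arithmetic on the per-session token numbers and the list prices quoted above:

```python
# Reproduce the savings table from the figures quoted above.
naive_session = 174_000      # tokens/session, naive RAG (30 turns)
supamem_session = 37_500     # tokens/session, supamem tuned_hybrid
saved = naive_session - supamem_session          # tokens saved per session

price_per_mtok = {"Sonnet 4.6": 3.0, "Opus 4.7": 15.0}  # $ per Mtok input
for model, price in price_per_mtok.items():
    per_session = saved / 1_000_000 * price
    monthly = per_session * 110                  # 110 sessions/dev/month
    print(f"{model}: ${per_session:.2f}/session, ~${monthly:.0f}/dev/month")
```

136,500 tokens saved works out to about $0.41/session on Sonnet-class pricing and $2.05/session on Opus-class pricing, matching the table.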

🥊 vs the alternatives

| | No memory | Naive RAG | mem0 / atomic facts | supamem (tuned_hybrid) |
|---|---|---|---|---|
| Auto-inject on session start | ❌ | ⚠️ | ✅ | ✅ |
| Hybrid sparse+dense retrieval | ❌ | ❌ | ❌ | ✅ |
| Code-identifier preservation | ❌ | ✅ | ❌ (drops names) | ✅ |
| Locked schema + golden eval | ❌ | ❌ | ❌ | ✅ |
| Multi-client (Claude/Cursor/OpenCode) | ❌ | ❌ | ⚠️ | ✅ |
| p95 latency | n/a | ~120 ms | ~80 ms | 73 ms |
| Token bloat | High (manual) | Highest | Low but lossy | Lowest with full recall |

Why hybrid? BM25 catches exact identifiers (ChatService.generate, env-var names, file paths) that dense embeddings smear. Dense catches semantic intent ("how do we handle billing webhooks?") that BM25 misses. RRF fusion combines both rankings so you get the best of each.

Why not mem0? mem0's atomic-fact extraction loses code identifiers — recall on the 33-query bench was 0.015 (effectively zero). Great for personal CRM-style memory, not for code-aware retrieval.
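Reciprocal rank fusion itself is simple enough to sketch in a few lines. This is a generic textbook RRF (score = Σ 1/(k + rank), k = 60 by convention), not supamem's internal implementation; the document names below are made up for illustration.

```python
# Generic reciprocal rank fusion: merge ranked lists by summing
# 1 / (k + rank) per document across all lists. k=60 is the usual default.
def rrf(rankings, k=60):
    scores = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# BM25 nails the exact identifier; dense retrieval nails the paraphrase.
bm25 = ["ChatService.generate", "billing_webhook.md", "env_vars.md"]
dense = ["billing_webhook.md", "payments_adr.md", "ChatService.generate"]
fused = rrf([bm25, dense])  # billing_webhook.md wins: high rank in both arms
```

A document ranked well by both arms outscores one ranked first by only a single arm, which is exactly the "best of each" behavior described above.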


โšก๏ธ 60-second quickstart

# 1. Install (uv is the fastest path)
uv tool install supamem

# 2. Start Qdrant (one-time, ~30s)
docker run -d -p 6333:6333 -p 6334:6334 -v $HOME/.qdrant:/qdrant/storage qdrant/qdrant:latest

# 3. Bootstrap your project
cd your-project
supamem init

# 4. Wire it into your AI client
supamem install --client claude-code   # or cursor, opencode

# 5. Confirm everything is healthy
supamem doctor

That's it. Open Claude Code (or your preferred client) inside the project — the memory tool is already on the menu. ✨


🚀 Features

| Feature | Description |
|---|---|
| 🔍 Hybrid retrieval | Tuned sparse (BM25) + dense (MiniLM) fusion, locked schema D-25 |
| 📚 Markdown chunker | Header-aware, 200-token chunks with 250-token soft max (T-1) |
| 🤖 MCP server | stdio (default) and http transports, official mcp SDK |
| 🪝 Multi-client hooks | Claude Code session-start, OpenCode session-start, Cursor MDC |
| 🧰 One-command install | Atomic config patching with auto-backup and rollback |
| 🩺 supamem doctor | Probe Qdrant, resolve config chain, surface version drift |
| 📊 Welford counters | Track recall rate, latency, query volume per project |
| 🧪 Eval harness | 33-query golden corpus + regression detection |
| 🔁 Brownfield migration | Detect existing dev_memory and migrate non-destructively |
| 🎨 Stylish CLI | Rich-powered spinners, panels, and color so you always see progress |
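The Welford counters in the table refer to Welford's online algorithm: it maintains a running count, mean, and variance in O(1) memory, so per-query latency and recall stats can be aggregated without storing every sample. A minimal self-contained version of the algorithm (illustrative; not supamem's stats module):

```python
# Welford's online algorithm: numerically stable streaming mean/variance.
class Welford:
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n          # update running mean
        self.m2 += delta * (x - self.mean)   # accumulate squared deviations

    @property
    def variance(self):  # sample variance
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

w = Welford()
for latency_ms in [70, 73, 68, 75, 73]:
    w.add(latency_ms)
print(w.n, w.mean, w.variance)
```

Unlike the naive "sum of squares minus square of sum" formula, this form does not lose precision when the mean is large relative to the spread.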

📋 Prerequisites

You only really need two things: Python 3.12+ and Qdrant. Everything else is optional.

๐Ÿ Python 3.12+ ย ยทย  click to expand install commands
# macOS (Homebrew)
brew install python@3.12

# Linux (Ubuntu/Debian)
sudo apt install python3.12 python3.12-venv

# Windows (PowerShell)
winget install Python.Python.3.12

We strongly recommend installing uv — the fastest Python package manager:

# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
🗄️ Qdrant 1.10+ · vector database (required)

The simplest path is Docker:

docker run -d --name qdrant \
  -p 6333:6333 -p 6334:6334 \
  -v $HOME/.qdrant:/qdrant/storage \
  qdrant/qdrant:latest

Or with docker compose:

services:
  qdrant:
    image: qdrant/qdrant:latest
    ports: ["6333:6333", "6334:6334"]
    volumes: ["./qdrant_data:/qdrant/storage"]
    restart: unless-stopped

Don't have Docker? Run a managed cluster on Qdrant Cloud (free tier available) and point supamem at the URL via supamem init.

🤖 An MCP-compatible client · pick at least one
| Client | Install | Notes |
|---|---|---|
| Claude Code | npm install -g @anthropic-ai/claude-code | First-class MCP support |
| Cursor | Download from cursor.com | Uses MDC rules + MCP |
| OpenCode | curl -fsSL https://opencode.ai/install \| bash | Open-source TUI, MCP native |

📦 Install

# Recommended: uv (fastest, isolated)
uv tool install supamem

# Alternative: pipx (also isolated)
pipx install supamem

# Plain pip (in a venv)
pip install supamem

Verify:

supamem --version

You should see a colorful banner and the credit line. 🎨

Note: v0.1.0 ships as a git-tag release per D-44. PyPI publish lands in v0.2. To install the pre-release directly from git:

uv tool install git+https://github.com/dzmitrys-dev/supamem@v0.1.0

🎯 CLI surface

| Command | Purpose |
|---|---|
| supamem init | Greenfield bootstrap — probes Qdrant, creates collection, writes .supamem/config.toml |
| supamem install --client <name> | Patch a client config (claude-code, cursor, opencode) — atomic with backup |
| supamem index | Embed dev memories into Qdrant using the locked tuned-hybrid pipeline (D-25) |
| supamem mcp-server | Run the MCP server (--transport stdio default; --transport http for HTTP) |
| supamem hook <client> | Per-client session/edit hooks (called by the client itself) |
| supamem doctor | 🩺 Probe Qdrant, print resolved config chain, report version drift |
| supamem stats | Welford schema-v2 usage counters from .supamem/state/ |
| supamem migrate | Brownfield migration from a pre-existing dev_memory collection |
| supamem eval | Run the regression harness against the bundled 33-query golden corpus |
| supamem uninstall --client <name> | Reverse supamem install cleanly |

Every long-running command shows a live spinner with elapsed time so you always know it's working. Use --help on any subcommand for details.


🪛 Wiring into your client

Claude Code
supamem install --client claude-code

Adds an entry to ~/.claude.json under mcpServers and registers a session-start hook under ~/.claude/hooks/. Preview without applying with --dry-run.
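For orientation, a stdio MCP server entry under mcpServers generally has the shape sketched below. The exact values supamem writes may differ; this fragment is illustrative, so prefer running the install command (with --dry-run to preview) over editing ~/.claude.json by hand.

```json
{
  "mcpServers": {
    "supamem": {
      "command": "supamem",
      "args": ["mcp-server", "--transport", "stdio"]
    }
  }
}
```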

Cursor
supamem install --client cursor

Patches .cursor/mcp.json and writes .cursor/rules/dual-memory.mdc.

OpenCode
supamem install --client opencode

Updates ~/.config/opencode/opencode.json and writes a session-start hook to ~/.config/opencode/hooks/.


🧠 How it works

┌─────────────────┐    MCP/stdio     ┌─────────────────┐    REST    ┌─────────────┐
│ Claude / Cursor │ ───────────────► │  supamem MCP    │ ─────────► │   Qdrant    │
│   / OpenCode    │ ◄─────────────── │     server      │ ◄───────── │  (vectors)  │
└─────────────────┘                  └─────────────────┘            └─────────────┘
        │                                    ▲
        │ session-start hook                 │ tuned-hybrid retrieval
        ▼                                    │ (BM25 + MiniLM fusion)
┌─────────────────┐                          │
│ supamem hook    │ ─────────────────────────┘
│  (auto-recall)  │
└─────────────────┘
  • Indexer chunks Markdown by header (T-1 chunker, 200-token target / 250 soft max)
  • Embedders produce sparse (BM25) and dense (MiniLM-L6) vectors
  • Retrieval runs both arms in parallel, fuses with reciprocal rank fusion, returns top-k
  • MCP server exposes qdrant-find and qdrant-store tools, plus context resources
  • Hooks call supamem hook <client> at the right moment, so memory loads transparently
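The header-aware chunking step above can be sketched roughly as follows. This is a simplified illustration of the idea, not the T-1 chunker itself: it splits on Markdown headers first, then packs paragraphs of oversized sections toward a target budget, and it approximates tokens with whitespace-separated words rather than a real tokenizer.

```python
import re

# Simplified header-aware Markdown chunking (illustrative, not T-1).
def chunk_markdown(text, target=200, soft_max=250):
    # Split *before* each header line so the header stays with its body.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = []
    for section in filter(str.strip, sections):
        if len(section.split()) <= soft_max:
            chunks.append(section.strip())
            continue
        # Oversized section: pack paragraphs up to the target budget.
        current = []
        for para in section.split("\n\n"):
            if current and len(" ".join(current).split()) + len(para.split()) > target:
                chunks.append("\n\n".join(current))
                current = []
            current.append(para)
        if current:
            chunks.append("\n\n".join(current))
    return chunks

doc = "# ADR-1\nUse Qdrant.\n\n# ADR-2\nUse RRF fusion."
chunks = chunk_markdown(doc)  # one chunk per ADR header
```

Keeping the header attached to its body is what lets retrieval return a self-describing paragraph instead of an anonymous slice of a large file.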

๐Ÿค Contributing

We welcome PRs! Quick start:

git clone https://github.com/dzmitrys-dev/supamem.git
cd supamem
uv venv && source .venv/bin/activate
uv pip install -e ".[dev]"
pytest
ruff check .

Coming from an in-tree dev_memory setup? See MIGRATION.md.


📜 License

MIT — see LICENSE.


💜 Delivered with care by

SoftChat · SoftSkillz

Russian-language AI chat platform · AI-first product engineering

supamem was extracted from SoftChat's production memory stack so every team can run on the same battle-tested pipeline. If it makes your agents smarter, give us a ⭐ — and check out what we build with it.

Made with care in Belarus 🇧🇾 · app.softchat.ru · softskillz.ai
