callmem
Persistent memory for LLM coding agents — local-first, model-agnostic, SQLite-backed.
Renamed from llm-mem in v0.2.0. Existing installs keep working — the llm-mem console script, the python -m llm_mem.mcp.server entry point, the .llm-mem/ config dir, and the LLM_MEM_* env vars are all honored as aliases. Run callmem migrate in a project to update its names to the new canonical forms.
callmem gives coding agents a durable, searchable memory that survives across sessions. It captures what happened, compresses it in the background using a local LLM, and serves a compact briefing when the next session starts — so the agent picks up where you left off without manual context management.
Works with OpenCode and Claude Code side by side. Both tools write to the same memory database, so you can swap between them mid-project (say, when one hits a rate limit) without losing context. The extraction LLM is also swappable at any time — switch models in config.toml and restart the daemon; prior memories stay intact and remain usable.
Inspired by claude-mem, but built from the ground up with a different philosophy: model-agnostic, local-first, and designed around pluggable LLM backends rather than a single vendor.
Key Features
Automatic Context at Startup
When a new session begins, callmem generates a structured briefing with context economics, an emoji-coded observation timeline, and a session summary. This is written to SESSION_SUMMARY.md in your project root, where your coding agent picks it up automatically — no manual context management needed.
Real-Time Capture and Extraction
During the session, callmem ingests prompts, responses, tool calls, and file changes from your coding agent — either via OpenCode's SSE stream or by tailing Claude Code's JSONL transcripts at ~/.claude/projects/<slug>/*.jsonl. Both run concurrently, so a project using both agents sees unified history. A background worker runs entity extraction through your local LLM, pulling out decisions, facts, TODOs, bugs, features, and discoveries. New observations appear in the web UI within milliseconds via Server-Sent Events.
Layered Compression
Raw events are compressed through multiple layers to keep token usage in check:
- Entity extraction — structured knowledge pulled from raw conversation
- Chunk summaries — rolling mid-session compression every N events (sketched after this list)
- Session summaries — structured wrap-up when a session ends (Investigated / Learned / Completed / Next Steps)
- Cross-session summaries — periodic project-level rollups
- Compaction — old events archived to keep the database lean
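The chunk-summary layer is the one doing rolling work mid-session. A minimal sketch of the idea — function and variable names here are hypothetical, not callmem's internals; the summarise callable stands in for whatever LLM backend is configured:

CHUNK_SIZE = 10  # "every N events"

def maybe_compress(events: list[str], summaries: list[str], summarise) -> None:
    # Once a full chunk of raw events has accumulated, fold it into one
    # rolling summary. The raw events still exist for the later layers;
    # only the working set handed to the briefing shrinks.
    while len(events) >= CHUNK_SIZE:
        chunk = events[:CHUNK_SIZE]
        del events[:CHUNK_SIZE]
        summaries.append(summarise("\n".join(chunk)))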
Dual Content Views
Each observation has two representations:
- Key Points — bullet-point summary (~50-100 tokens), cheap for context injection
- Synopsis — flowing prose paragraph (~200-400 tokens), loaded on demand
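A sketch of how the two views can sit side by side on one record — field names are illustrative, not callmem's actual schema:

from dataclasses import dataclass

@dataclass
class Observation:
    category: str            # e.g. "decision", "bugfix"
    key_points: list[str]    # bullet summary, ~50-100 tokens
    synopsis: str            # prose paragraph, ~200-400 tokens

def briefing_lines(obs: Observation) -> str:
    # Briefings inject the cheap view; the synopsis is fetched on demand.
    return "\n".join(f"- {point}" for point in obs.key_points)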
Pluggable LLM Backend
callmem doesn't care which model you use for coding. It uses a separate local model for memory maintenance:
- Ollama (recommended) — fully local, zero API cost, works offline
- OpenAI-compatible — any /v1/chat/completions API (LM Studio, vLLM, etc.)
- None — pattern-only mode when no LLM is available
The extraction model can be swapped at any time. Change [ollama].model (or [openai_compat].model) in config.toml and restart the daemon — previously extracted entities remain valid and fully searchable, and new events are extracted with whichever model is currently configured. Tested swaps: qwen3:8b ↔ qwen3:30b ↔ gemma4:e4b. You can also flip [llm].backend between ollama, openai_compat, and none without touching the database.
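The "pluggable" part boils down to a narrow interface the memory pipeline calls into. A sketch of the shape of such a boundary — class and method names here are illustrative, not callmem's actual code:

from typing import Protocol
import json, urllib.request

class Backend(Protocol):
    # Anything that can complete a prompt can drive extraction.
    def complete(self, prompt: str) -> str: ...

class OllamaBackend:
    def __init__(self, model: str, api_base: str = "http://localhost:11434"):
        self.model, self.api_base = model, api_base

    def complete(self, prompt: str) -> str:
        # Ollama's non-streaming generate endpoint returns {"response": ...}
        req = urllib.request.Request(
            f"{self.api_base}/api/generate",
            data=json.dumps({"model": self.model, "prompt": prompt,
                             "stream": False}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"]

Because past entities are stored as extracted rows rather than model-specific state, swapping the class (or the model name) changes only future extractions.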
Web UI
A local web interface for browsing and managing memories:
- Card-based feed with colour-coded category badges
- Expandable cards with Key Points / Synopsis toggle
- Real-time updates via SSE
- Full-text search across all entities
- Session browser with event timeline
- Briefing preview showing exactly what the agent sees
- Accessible from your Tailscale network (default bind 0.0.0.0)
Sensitive Data Handling
Two-layer detection (pattern matching + LLM classification) catches secrets, credentials, and PII at ingest time. Detected items are redacted from memory and stored in an encrypted vault with configurable false-positive management.
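A sketch of what the pattern layer can look like — these regexes are illustrative examples, not callmem's actual rule set, and the LLM classifier forms the second layer on top:

import re

PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}

def redact(text: str) -> tuple[str, list[tuple[str, str]]]:
    # Replace each hit with a typed placeholder; return the hits so the
    # caller can move them into the encrypted vault.
    hits = []
    for name, pattern in PATTERNS.items():
        hits += [(name, m.group(0)) for m in pattern.finditer(text)]
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text, hits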
MCP Integration
Exposes tools via the Model Context Protocol so compatible agents can query memory on demand:
- search — full-text search across all observations
- get_briefing — generate a startup briefing
- search_by_file — find observations related to specific files
- timeline — chronological context around an observation
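As an illustration, a script can exercise these tools over stdio using the reference MCP Python SDK (the mcp package). The exact argument schema — and whether your client sees the tools with a prefix, as with mem_get_briefing later in this README — are assumptions here:

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    server = StdioServerParameters(
        command="python3",
        args=["-m", "callmem.mcp.server", "--project", "."],
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("search", {"query": "auth"})
            print(result)

asyncio.run(main())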
Requirements
- Python 3.10+
- Ollama with a summarisation model (recommended: qwen3:8b or qwen3:30b)
- Linux (tested on Ubuntu/Debian x86_64 and ARM64)
- SQLite 3.35+ (FTS5 support required)
Tested Environments
| Environment | Status |
|---|---|
| Ubuntu 24.04, x86_64 | Tested |
| Ubuntu 24.04, ARM64 | Tested |
| Ollama with qwen3:8b | Tested |
| Ollama with qwen3:30b | Tested |
| Ollama with gemma4:e4b | Tested |
| OpenCode as coding agent | Tested |
| Claude Code as coding agent | Tested |
| Running both OpenCode and Claude Code against one project | Tested |
macOS and Windows are untested but should work anywhere Python and Ollama run. The systemd service integration is Linux-only.
Quick Start
1. Prerequisites
# Install Ollama (if not already installed)
curl -fsSL https://ollama.com/install.sh | sh
# Pull a summarisation model (any instruction-following model works)
ollama pull qwen3:8b
2. Install callmem
# Install with pip (recommended)
pip install callmem
# Or with uv
uv pip install callmem
# Or install from source for development
git clone https://github.com/DANgerous25/callmem.git
cd callmem
uv sync --extra dev
3. Run the Setup Wizard
uv run callmem setup
The wizard walks you through:
- Choosing your LLM backend and model
- Setting the web UI port (with multi-project conflict detection)
- Configuring network bind address
- Picking your coding tools (OpenCode / Claude Code / both) — writes opencode.json and/or .mcp.json as needed, and patches AGENTS.md / CLAUDE.md with MCP usage instructions
- Importing existing OpenCode sessions from SQLite
- Optionally installing a systemd user service for auto-start
The setup script is safe to re-run — it reconfigures without wiping data and backs up your config.toml before changes.
4. Start the Daemon
# All-in-one: web UI + background workers + OpenCode SSE adapter
# + Claude Code JSONL tailer. Each adapter is independently gated
# by [adapters].opencode / [adapters].claude_code in config.toml.
uv run callmem daemon
# Or via make
make daemon
# Or via systemd (if installed during setup)
make start
Restarting after an upgrade or config change. The systemd service holds the old code in memory, so git pull / pip install -e . / config.toml edits only take effect after a restart:
make restart # this project (preferred)
make logs # tail journalctl for the service
# Or raw systemctl — the unit is named callmem-<project-dir-name>:
systemctl --user restart callmem-<project>.service
systemctl --user status callmem-<project>.service
Each project that ran callmem setup with systemd enabled has its own unit, so restart the one matching your project directory.
5. Configure Your Coding Agent
callmem setup and callmem init both write the right config file(s) automatically. If you prefer to wire it up by hand:
OpenCode — add to opencode.json in your project:
{
"mcp": {
"callmem": {
"type": "local",
"command": ["python3", "-m", "callmem.mcp.server", "--project", "."],
"enabled": true
}
}
}
Claude Code — add to .mcp.json in your project (note the split command/args):
{
"mcpServers": {
"callmem": {
"command": "python3",
"args": ["-m", "callmem.mcp.server", "--project", "."]
}
}
}
Both can coexist; they share the same SQLite database.
Once configured, you can type /briefing in OpenCode to display the startup briefing on demand. In Claude Code, ask the agent to call the mem_get_briefing MCP tool, or read SESSION_SUMMARY.md directly.
6. Open the Web UI
Navigate to http://localhost:9090 (or your configured host:port).
How It Works
┌─────────────────┐ SSE ┌─────────────────┐ Extract ┌──────────────┐
│ OpenCode │ ──────────▶ │ callmem │ ─────────────▶ │ Local LLM │
└─────────────────┘ │ Adapters │ │ (Ollama or │
┌─────────────────┐ JSONL │ (opencode + │ │ OpenAI-like)│
│ Claude Code │ ──────────▶ │ claude_code) │ └──────────────┘
└─────────────────┘ └────────┬────────┘
│
▼
┌─────────────────┐
│ SQLite DB │
│ + FTS5 │
└────────┬────────┘
│
┌───────────┼───────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Web UI │ │ MCP │ │ Briefing │
│ Feed │ │ Server │ │ Writer │
└──────────┘ └──────────┘ └──────────┘
- Capture: The OpenCode adapter subscribes to its SSE stream; the Claude Code adapter tails each transcript file under ~/.claude/projects/<slug>/ using a persistent byte offset, so a restart resumes mid-file instead of replaying or dropping records (see the sketch after this list). Both run inside the same daemon process.
- Extract: A background worker sends event batches to your local LLM for entity extraction — decisions, facts, TODOs, bugs, features, discoveries. Switching the extraction model later does not invalidate past entities.
- Compress: Summaries are generated at chunk, session, and cross-session levels.
- Serve: The briefing writer generates SESSION_SUMMARY.md in your project root; the MCP server responds to on-demand queries from whichever agent you're using; the web UI shows everything in real time.
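The persistent byte offset is what makes Claude Code capture crash-safe. A minimal sketch of the tailing idea — function and variable names are hypothetical, not callmem's actual code:

import json

def tail_jsonl(path: str, offset: int):
    # Yield (new_offset, record) for each complete line past `offset`.
    # A partially written trailing line is left for the next poll.
    with open(path, "rb") as f:
        f.seek(offset)
        for line in f:
            if not line.endswith(b"\n"):
                break
            offset += len(line)
            yield offset, json.loads(line)

# The daemon persists the offset after each record, so a restart resumes
# exactly where it left off instead of replaying or dropping events.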
CLI Reference
callmem setup # Interactive setup wizard
callmem daemon # Start UI + workers + adapter in one process
callmem ui # Start web UI only
callmem serve # Start MCP server only
callmem import --source opencode --all # Import OpenCode sessions from SQLite
callmem import --source claude-code --all # Import Claude Code transcripts (JSONL)
callmem import --status # Show current/last import progress
callmem status # Show service status
callmem search <query> # Search memories from the command line
callmem briefing # Generate and print a briefing
callmem briefing --write # Write briefing to SESSION_SUMMARY.md
make restart # Restart the systemd unit for this project
make logs # Tail journalctl for the service
Entity Categories
callmem extracts these observation types, each with a colour-coded badge in the UI:
| Category | Icon | Description |
|---|---|---|
| Feature | 🟢 | New functionality added |
| Bugfix | 🔴 | Bug identified and/or fixed |
| Discovery | 🔵 | Notable insight or finding |
| Decision | ⚖️ | Architectural or design choice |
| Todo | 📋 | Task to be done |
| Fact | 📝 | Durable project knowledge |
| Failure | ❌ | Error or failure encountered |
| Research | 🔬 | Investigation or analysis |
| Change | 🔄 | General code or file change |
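In code, the category set maps naturally onto an enum — a sketch, not callmem's actual model definitions:

from enum import Enum

class Category(str, Enum):
    FEATURE = "feature"      # 🟢 new functionality added
    BUGFIX = "bugfix"        # 🔴 bug identified and/or fixed
    DISCOVERY = "discovery"  # 🔵 notable insight or finding
    DECISION = "decision"    # ⚖️ architectural or design choice
    TODO = "todo"            # 📋 task to be done
    FACT = "fact"            # 📝 durable project knowledge
    FAILURE = "failure"      # ❌ error or failure encountered
    RESEARCH = "research"    # 🔬 investigation or analysis
    CHANGE = "change"        # 🔄 general code or file change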
Configuration
All settings live in .callmem/config.toml in your project root. Key options:
[llm]
backend = "ollama" # ollama | openai_compat | none
model = "qwen3:8b"
api_base = "http://localhost:11434"
[ui]
host = "0.0.0.0" # Bind address (0.0.0.0 for Tailscale access)
port = 9090
[adapters]
opencode = true # Listen on OpenCode's SSE stream
claude_code = true # Tail ~/.claude/projects/<slug>/*.jsonl
claude_code_poll_interval = 2.0 # seconds between disk scans
claude_code_idle_timeout = 300 # seconds before an idle CC session is closed
[briefing]
max_tokens = 2000 # Token budget for startup briefing
auto_write_session_summary = true
session_summary_filename = "SESSION_SUMMARY.md"
[extraction]
batch_size = 10 # Events per extraction batch
See docs/config.md for the full reference.
Why SQLite, Not a Vector DB?
Most coding memory retrieval is structured:
- "What decisions did we make about auth?"
- "What TODOs are still open?"
- "What happened in the last 3 sessions?"
These are better served by structured tables + full-text search (FTS5) than embedding similarity. SQLite is zero-dependency, single-file, trivially backed up, and fast enough for tens of thousands of memories.
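A self-contained taste of FTS5 via the stdlib sqlite3 module — table and column names here are illustrative, not callmem's actual schema:

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE VIRTUAL TABLE obs USING fts5(category, key_points);
    INSERT INTO obs VALUES ('decision', 'use argon2 for password auth');
    INSERT INTO obs VALUES ('todo', 'add rate limiting to the auth endpoint');
""")
# FTS5 supports boolean operators and per-column filters, which covers
# structured questions like "decisions about auth" directly.
rows = con.execute(
    "SELECT category, key_points FROM obs WHERE obs MATCH ? ORDER BY rank",
    ("auth AND category:decision",),
).fetchall()
print(rows)  # [('decision', 'use argon2 for password auth')]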
Vector embeddings are a planned enhancement for semantic retrieval when keyword search isn't enough. The schema is designed to accommodate them without migration pain.
Project Structure
callmem/
├── src/callmem/
│ ├── adapters/ # OpenCode SSE + import, Claude Code live tailer + import
│ ├── core/ # Engine, extraction, briefing, compression, event bus
│ ├── mcp/ # MCP server and tool definitions
│ ├── models/ # Data models, config, entity types
│ └── ui/ # Web UI (FastAPI + Jinja2 + htmx + SSE)
├── tests/ # 630+ tests (unit + integration)
├── docs/ # Architecture, schema, config, roadmap docs
└── pyproject.toml
Development
# Install with dev dependencies
uv sync --extra dev
# Run tests
make test
# Run linter
make lint
# Run both
make check
Roadmap
See docs/roadmap.md for the full plan. Highlights:
- Progressive disclosure search — 3-layer MCP search pattern (index → timeline → full details)
- File-level tracking — associate observations with specific files
- Knowledge agents — build queryable corpora from observation history
- Settings panel — web UI for config with live briefing preview
- Vector search — optional semantic retrieval via sentence-transformers
Acknowledgements
callmem was inspired by claude-mem by Alex Newman. We share the same goal — giving coding agents persistent memory — but callmem is built from scratch with a focus on model-agnostic operation, local-first architecture, and pluggable LLM backends.
Known Issues
Auto-briefing plugin does not trigger on session start
Setup installs an OpenCode plugin (.opencode/plugins/auto-briefing.js) that should auto-display the briefing when a new session starts. However, due to an upstream OpenCode bug where session.created events do not fire for plugins, this does not currently work. Use the /briefing command in OpenCode as a workaround. The plugin will activate automatically once the bug is fixed upstream — no changes needed.
Claude Code: tool results and thinking blocks are not ingested
The Claude Code adapter maps user prompts, assistant text, and tool_use blocks into the memory feed. tool_result blocks (system-side responses to tool calls) and thinking blocks are skipped in the current release to keep signal-to-noise high. A follow-up will revisit this — until then, a tool call appears in the feed but its outcome does not.
Python 3.10 compatibility shims
callmem supports Python 3.10+ but requires tomli (backport of tomllib) and typing_extensions on Python 3.10. These are installed automatically as conditional dependencies. On Python 3.11+, the stdlib equivalents are used.
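The shim itself follows the standard pattern — a sketch, not callmem's exact code:

import sys

if sys.version_info >= (3, 11):
    import tomllib                # stdlib since Python 3.11
else:
    import tomli as tomllib       # PyPI backport with an identical API

with open(".callmem/config.toml", "rb") as f:
    config = tomllib.load(f)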
License