
Local-first agent memory for Claude Code: episodic + semantic memory in one SQLite file.


snowpack

The snowpack is the season's memory — every storm recorded as a layer.

Local-first agent memory for Claude Code. Snowpack ingests Claude Code session transcripts into episodic memory (what happened across sessions) and semantic memory (durable facts, entities, relationships), all in a single SQLite file with vector + keyword search. The agent reaches it through an ordinary CLI — no MCP server, no daemon, no infrastructure.

The memory model

Agent memory is usefully split into four types. Claude Code ships with strong native support for two of them; snowpack exists for the other two.

| Type | What it holds | Claude Code natively | What snowpack does |
|---|---|---|---|
| Episodic (what happened) | Past sessions: decisions, errors, paths | Nothing: sessions are invisible to each other | Owns it. Hooks ingest every transcript into searchable episodes |
| Semantic (what's true) | Durable facts, entities, relationships | Nothing | Owns it. Facts extracted from episodes, with provenance, validity windows, and supersession |
| Working (what's happening now) | Current task state | The context window: strong, until compaction silently drops detail | Backstops it. stash checkpoints, PreCompact capture, and resume re-injection carry task state across compaction |
| Procedural (how to work) | Standing instructions, workflows | CLAUDE.md and skills: strong | Feeds it. sinter mines repeated corrections into reviewable CLAUDE.md candidates |

The asymmetry is deliberate (ADR-001 D8): snowpack owns the two types Claude Code lacks entirely, and stays out of the way of the two it does well — backstopping working memory only at the one point it fails (compaction), and feeding procedural memory as human-reviewed candidates rather than competing with CLAUDE.md as a second source of truth.
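To make "provenance, validity windows, and supersession" concrete, here is a minimal sketch of how a semantic fact could be modelled. It is not snowpack's schema: the `Fact` class, every field name, and the `supersede` helper are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    # All field names here are hypothetical, chosen for illustration only.
    fact_id: int
    subject: str
    predicate: str
    value: str
    source_episode_id: int               # provenance: the episode it came from
    valid_from: datetime                 # start of the validity window
    valid_to: Optional[datetime] = None  # None means "still current"
    superseded_by: Optional[int] = None  # id of the fact that replaced this one

    def is_current(self, at: datetime) -> bool:
        """True if the fact holds at the given moment."""
        return self.valid_from <= at and (self.valid_to is None or at < self.valid_to)


def supersede(old: Fact, new: Fact) -> None:
    """Close the old fact's window where the new one starts, and link them."""
    old.valid_to = new.valid_from
    old.superseded_by = new.fact_id
```

Under this sketch a superseded fact is never deleted: its window is closed and it points at its replacement, so a query can still ask what was believed at an earlier date.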

Status

Core pipeline implemented (episodic + semantic memory, hybrid retrieval, telemetry, distillation). See docs/adr/ADR-001-memory-architecture.md for the architecture and decision record, docs/hooks.md for ingestion hook setup, and docs/claude-md-snippet.md for the agent-facing usage docs. Whether retrieval is worth its tokens is deliberately an open, measured question: docs/adr/ADR-006-memory-value-assessment.md records the honest assessment (what's stored, what's kept, when an agent will actually reach for it) and docs/gate-log.md tracks the usefulness gate that Phase 2 waits on.

Quick start

# 1. Install
pip install snowpack       # or: uv tool install snowpack

# 2. Wire everything up (idempotent, re-runnable, prompts before writing)
snowpack setup

snowpack setup does the following:

  • checks Ollama, printing install/pull commands if it's down (it's a soft requirement; see "Embeddings" below)
  • creates ~/.snowpack/snowpack.db
  • merges the ingestion, compaction-survival, and session-orientation hooks into ~/.claude/settings.json (timestamped backup first)
  • installs the memory snippet into ~/.claude/CLAUDE.md between managed markers
  • adds the snowpack permission allowlist

--dry-run shows the diffs first, --check is a doctor that audits every integration point, and --uninstall removes exactly what setup added.

# 3. Use it
snowpack probe "auth decisions"   # hybrid retrieval (vector + keyword + graph + recency)

(Ingestion runs out-of-band via the installed hooks; snowpack obs ingest also works manually.)

Embeddings: Ollama setup and choosing a model

Vector search needs an embedding model — by default a local one served by Ollama. It is a soft requirement: without it snowpack still works in vectorless mode — ingest stores episodes un-embedded, probe degrades to keyword + graph + recency search, and the next ingest after the provider comes up backfills the missing vectors automatically.
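The degradation described above can be pictured as rank fusion that skips any channel that is unavailable. The sketch below uses weighted reciprocal rank fusion as a stand-in technique; this README does not say which fusion method snowpack actually uses, and the `fuse` function, the channel names, and the weights are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def fuse(
    channels: Dict[str, Optional[List[str]]],
    weights: Dict[str, float],
    k: int = 60,
) -> List[str]:
    """Weighted reciprocal rank fusion over whichever channels returned results.

    A channel mapped to None (for example "vector" when no embedding provider
    is reachable) is skipped, so the remaining channels still produce a ranking.
    """
    scores: Dict[str, float] = defaultdict(float)
    for name, ranked in channels.items():
        if ranked is None:
            continue  # channel unavailable: degrade instead of failing
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += weights.get(name, 1.0) / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Passing `{"vector": None, "keyword": [...], "recency": [...]}` models vectorless mode: the result is still a ranked list, just built from fewer signals.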

Where local models aren't allowed (e.g. a workplace that can't sandbox them), point snowpack at a hosted or gateway /embeddings endpoint, or skip embeddings entirely and run vectorless — the full no-Ollama path (setup, reindexing an existing database, what it costs a work account) is docs/no-ollama.md.

Install and run Ollama

# macOS
brew install ollama        # or download the app from https://ollama.com

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# start the server (the desktop app does this automatically)
ollama serve

# fetch the default embedding model (~270 MB)
ollama pull nomic-embed-text

Prefer it sandboxed? A hardened Docker setup (localhost-only API, dropped capabilities, isolated model storage) ships in docker/docker-compose.yml:

docker compose -f docker/docker-compose.yml up -d
docker compose -f docker/docker-compose.yml exec ollama ollama pull nomic-embed-text

See docs/ollama-docker.md for GPU setup and the macOS caveat (containers can't use Apple Silicon's GPU — native Ollama is faster there).

Verify it's answering:

curl -s http://localhost:11434/api/embed \
  -d '{"model": "nomic-embed-text", "input": ["hello"]}' | head -c 120

If Ollama runs somewhere other than localhost:11434 (a container, another machine), point snowpack at it with SNOWPACK_OLLAMA_URL:

export SNOWPACK_OLLAMA_URL=http://gpu-box:11434
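The same check can be scripted. The sketch below mirrors the curl call above in Python and reads SNOWPACK_OLLAMA_URL the way a client plausibly would; how snowpack itself resolves the variable is not shown in this README, and the function names are hypothetical. It assumes the `embeddings` field that Ollama's /api/embed returns.

```python
import json
import os
import urllib.request
from typing import List


def build_embed_request(texts: List[str], model: str = "nomic-embed-text") -> urllib.request.Request:
    """Build a POST to Ollama's /api/embed, honouring SNOWPACK_OLLAMA_URL."""
    base = os.environ.get("SNOWPACK_OLLAMA_URL", "http://localhost:11434").rstrip("/")
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts: List[str], model: str = "nomic-embed-text") -> List[List[float]]:
    """Send the request and return one vector per input text."""
    with urllib.request.urlopen(build_embed_request(texts, model), timeout=10) as resp:
        return json.load(resp)["embeddings"]
```

Calling `len(embed(["hello"])[0])` against a running server also reveals the model's output dimension.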

Choosing the embedding model

The model is fixed per database at snowpack init, because the vector tables are created with that model's output dimension (vec0 columns are fixed-width):

snowpack init                                  # nomic-embed-text (768-d)
snowpack init --model mxbai-embed-large       # higher quality, 1024-d
snowpack init --model all-minilm              # smaller/faster, 384-d

You normally don't pass --dim: init asks the running Ollama what dimension the model actually produces (and refuses a --dim that contradicts it). If Ollama isn't running, init falls back to a built-in table for common models (nomic-embed-text, mxbai-embed-large, all-minilm, snowflake-arctic-embed, bge-m3) — for anything else, either start Ollama first or pass --dim explicitly.
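The resolution order described above (live probe first, then an explicit --dim, then the built-in table) can be sketched as follows. This is an illustration, not snowpack's code: `resolve_dim` and `KNOWN_DIMS` are hypothetical names, the table lists only the three dimensions stated in this README, and the behaviour when Ollama is offline and --dim disagrees with the table is an assumption.

```python
from typing import Optional

# Dimensions stated in this README; the real built-in table covers more models.
KNOWN_DIMS = {"nomic-embed-text": 768, "mxbai-embed-large": 1024, "all-minilm": 384}


def resolve_dim(model: str, probed: Optional[int], requested: Optional[int]) -> int:
    """Pick the vector width for a new database.

    probed    -- dimension reported by a live embedding call, or None if offline
    requested -- value of an explicit --dim, or None if not given
    """
    if probed is not None:
        if requested is not None and requested != probed:
            raise ValueError(f"--dim {requested} contradicts {model} ({probed}-d)")
        return probed
    if requested is not None:
        return requested  # assumption: offline, an explicit --dim is trusted
    if model in KNOWN_DIMS:
        return KNOWN_DIMS[model]
    raise ValueError(f"unknown model {model!r}: start Ollama or pass --dim")
```

Resolving the width once, at creation time, matters because a fixed-width vector column cannot accept vectors of a different length later.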

The configured model, dimension, and task prefixes are recorded in the database (meta table) and used for every subsequent embed, so you never specify the model again after init — obs ingest and probe read it from the database. To see what a database was initialized with:

sqlite3 ~/.snowpack/snowpack.db "SELECT * FROM meta"

Changing models later is one command — it re-embeds everything in place with zero loss of episodes, facts, or telemetry:

snowpack reindex --model all-minilm

The new model must be live (reindex re-embeds with it, so there's no offline fallback). The database file is backed up first and probe keeps working throughout; if the run is interrupted, rerun with --resume to continue from where it stopped.

Upgrading snowpack across a schema change is also one command: when a new version needs a newer database schema, every command refuses with a pointer to snowpack migrate, which backs up the file and applies the pending migrations.

Fact extraction (semantic memory)

Episodes become durable facts via snowpack obs extract, which calls a model API. It runs out-of-band — manually or from a cron/timer, not a hook (see docs/hooks.md).

Fact extraction defaults to Anthropic's OpenAI-compatible endpoint and needs an API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or SNOWPACK_EXTRACTION_API_KEY); override the endpoint and model with SNOWPACK_EXTRACTION_BASE_URL and SNOWPACK_EXTRACTION_MODEL (localhost endpoints such as Ollama's /v1 need no key). Keys are read from the environment only and never stored.

No API key at all? Extraction can fall back to Claude Code itself (claude -p, headless) under your existing subscription login, with no key and no new model runtime. Because that bills your subscription, the fallback is consent-gated. Approval can come from any of:

  • --claude-fallback, which approves a single run
  • SNOWPACK_CLAUDE_FALLBACK=1, which grants standing approval
  • the interactive confirm, which defaults to no

Non-interactive runs without approval refuse with exit 3. SNOWPACK_EXTRACTION_PROVIDER=claude-code forces the provider outright; SNOWPACK_EXTRACTION_PROVIDER=openai-compatible forbids the fallback. As a paid transport, the fallback also needs an explicit bound before it starts (--limit, --token-budget, or --cost-budget), and each run reports tokens and cost; snowpack stats shows lifetime extraction spend.
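The consent rules above amount to a small decision procedure. The sketch below is one reading of them, not snowpack's implementation: the function names are hypothetical, the precedence of openai-compatible over --claude-fallback is an assumption, and the claude-code provider setting is deliberately left unmodelled because the README does not say how it interacts with consent.

```python
from typing import Mapping, Optional

EXIT_REFUSED = 3  # exit code the README gives for unapproved non-interactive runs


def fallback_approved(
    flag: bool,
    env: Mapping[str, str],
    interactive: bool,
    answer: Optional[str] = None,
) -> bool:
    """Decide whether the Claude Code fallback may run.

    flag        -- True if --claude-fallback was passed for this run
    env         -- environment variables
    interactive -- True if a user can be prompted
    answer      -- the user's reply to the confirm prompt, if one was shown
    """
    if env.get("SNOWPACK_EXTRACTION_PROVIDER") == "openai-compatible":
        return False  # assumption: this setting wins over any approval
    if flag or env.get("SNOWPACK_CLAUDE_FALLBACK") == "1":
        return True   # per-run or standing approval
    if not interactive:
        raise SystemExit(EXIT_REFUSED)
    # The confirm defaults to "no": only an explicit yes approves.
    return (answer or "").strip().lower() in {"y", "yes"}


def has_bound(limit=None, token_budget=None, cost_budget=None) -> bool:
    """A paid run needs at least one of --limit, --token-budget, --cost-budget."""
    return any(v is not None for v in (limit, token_budget, cost_budget))
```

The point of the default-to-no branch is that silence never spends money: an empty answer at the prompt and a cron job with no approval both end without a billed call.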

CLI surface

| Command | Purpose |
|---|---|
| snowpack setup | One-command onboarding: hooks, CLAUDE.md, permissions, db (--check doctor, --dry-run, --uninstall) |
| snowpack init | Create and configure the database |
| snowpack config | Persist provider defaults (extraction endpoint/model, Ollama URL) in the db, so no env vars are needed (list, set, unset) |
| snowpack obs ingest | Ingest new transcript exchanges (incremental, idempotent) |
| snowpack obs extract | Extract durable facts from episodes (API-assisted) |
| snowpack obs list | List recent episodes |
| snowpack probe "query" | Hybrid retrieval (vector + keyword + graph + recency) with telemetry |
| snowpack feedback | Mark retrieved memories as used (trains ranking) |
| snowpack stash | Working-memory checkpoint per project (the compaction backstop) |
| snowpack resume | Re-injection payload for SessionStart hooks (compaction survival) |
| snowpack thaw | Replay or search this session's pre-compaction exchanges, in order (post-compaction recovery) |
| snowpack redact | Retroactive secret scan/rewrite over stored memory (--scan, --apply) |
| snowpack stats | Telemetry overview; --refresh recomputes usefulness |
| snowpack sinter | Mine repeated corrections into CLAUDE.md candidates (the procedural feed) |
| snowpack prune | Telemetry-nominated pruning: candidates, then audited soft archive/keep/restore, log |
| snowpack entity merge | Point a duplicate entity at its canonical form |
| snowpack reindex | Switch embedding models: re-embed and swap in place (--resume) |
| snowpack migrate | Upgrade the database schema after a snowpack upgrade (backup first) |
| snowpack pit | Local web UI: entity graph + telemetry dashboard |

Privacy: secret redaction

Transcripts carry whatever your tools printed — env dumps, tokens, connection strings. Snowpack redacts known secret shapes (AWS keys, GitHub tokens, JWTs, PEM blocks, URL credentials, password = … assignments, and more) at ingest, before content is hashed, embedded, or indexed; stash writes get the same pass. Hits become [redacted:<type>] markers, the ingest report counts them, and snowpack stats shows the lifetime total. For data stored before redaction existed, snowpack redact --scan reports hits and snowpack redact --apply rewrites them in place (database backup first).

This is best-effort, known-shape detection — defense in depth, not a guarantee. Custom patterns and an allowlist for documented example keys live in ~/.snowpack/redaction.toml; see docs/redaction.md.
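Known-shape redaction of this kind is typically a set of typed regular expressions applied before anything else touches the text. The sketch below shows the idea with three illustrative patterns; they are not snowpack's built-in set, and the type names inside the markers are hypothetical.

```python
import re
from collections import Counter
from typing import Tuple

# Illustrative patterns only; not snowpack's built-in set.
PATTERNS = {
    "aws-access-key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github-token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "url-credentials": re.compile(r"(?<=://)[^/\s:@]+:[^/\s@]+(?=@)"),
}


def redact(text: str) -> Tuple[str, Counter]:
    """Replace each match with a [redacted:<type>] marker and count hits by type."""
    hits: Counter = Counter()
    for kind, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[redacted:{kind}]", text)
        hits[kind] += n
    return text, hits
```

Running the pass before hashing, embedding, or indexing means the secret never reaches a derived store; the per-type counter is what an ingest report would summarize.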

Pruning: telemetry nominates, the agent decides

Memory accumulates; not all of it stays worth retrieving. No decay formula can safely tell a dead memory from a rarely-needed-but-load-bearing one, so snowpack splits the job (ADR-003 D7): telemetry nominates, and the consuming agent (or you) reads the evidence and judges each candidate.

snowpack prune candidates --json nominates four kinds of candidate, each with its evidence:

  • dead facts (never retrieved, >30 days)
  • weak layers (retrieved ≥5×, never used)
  • closed supersession chains
  • stale episodes (>90 days, never retrieved, provenance-guarded)

Decisions are explicit and audited:

  • prune archive <ids> --reason "…" is a soft delete that hides the memory from every retrieval channel
  • prune keep <ids> --reason "…" records a survivor and suppresses re-nomination for 90 days
  • prune restore reverses any archive intact
  • prune log shows the full trail

Nothing here hard-deletes. That is the job of snowpack gc, the mechanical sweep layer underneath, which:

  • hard-deletes only memories archived more than 90 days ago
  • rolls raw telemetry older than 180 days into daily aggregates (channel win-rates are preserved, and feedback-bearing rows stay raw for the future eval corpus)
  • drops entities left with no facts and stale ingest watermarks
  • runs VACUUM+ANALYZE when enough space is reclaimable

It never touches a current or unarchived memory; snowpack gc --dry-run prints the exact work order without changing anything.
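Two of the nomination rules (dead facts and weak layers) are simple enough to sketch directly from their stated thresholds. The `FactStats` record and `nominate` function below are hypothetical; snowpack's real telemetry schema and candidate output are not shown in this README.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class FactStats:
    # Hypothetical telemetry record; field names are illustrative.
    fact_id: int
    created_at: datetime
    retrievals: int   # times the fact was returned by a search
    uses: int         # times a retrieved result was marked as used


def nominate(facts: List[FactStats], now: datetime) -> List[Tuple[int, str]]:
    """Return (fact_id, reason) pairs; a nomination is evidence, not a verdict."""
    out = []
    for f in facts:
        if f.retrievals == 0 and now - f.created_at > timedelta(days=30):
            out.append((f.fact_id, "dead fact: never retrieved, older than 30 days"))
        elif f.retrievals >= 5 and f.uses == 0:
            out.append((f.fact_id, "weak layer: retrieved 5+ times, never used"))
    return out
```

Note that the function only returns reasons. Archiving stays a separate, explicit step, which matches the split between nomination and decision described above.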

The pit (web UI)

snowpack pit            # serves http://127.0.0.1:8617 and opens the browser

A read-only, single-page UI over the same SQLite file (no extra dependencies, no build step; the graph library is vendored so it works offline):

  • Graph tab — entities as nodes, facts as edges. Visual weights are real telemetry, not decoration: node size = usage, edge width = retrieval frequency, color = staleness, and dead gray = never retrieved — your pruning candidates at a glance. Click through node → fact → provenance episode; toggle superseded facts; search to highlight.
  • Stats tab — totals, retrieval latency, channel win-rate (vector vs keyword vs graph — how to rebalance fusion weights), zero-result queries (gap detection), most/least-used facts, persistent weak layers, and recent retrievals expandable to per-result channels/scores/used flags.

The server binds to 127.0.0.1 only and never mutates user data (the one write is recomputing derived usefulness scores on demand). The full guide, including how to read the visual encoding and troubleshooting, is in docs/pit.md; stack decisions are in docs/adr/ADR-002-pit-ui.md.

Documentation map

  • docs/adr/ — architecture decision records (ADR-001 core, ADR-002 pit UI, ADR-003 pre-Phase-2 hardening program, ADR-004 spend visibility and cost controls, ADR-005 post-compaction recovery, ADR-006 memory-value assessment: what's stored, what's kept, what actually gets referenced)
  • docs/plans/ — point-in-time implementation plans approved before each build round, with outcomes
  • docs/pit.md — running and reading the pit UI
  • docs/redaction.md — secret redaction: built-in patterns, redaction.toml, retroactive cleanup
  • docs/hooks.md — out-of-band ingestion hooks
  • docs/ollama-docker.md — sandboxed Ollama
  • docs/no-ollama.md — running without Ollama entirely (work machines)
  • docs/claude-md-snippet.md — agent-facing usage docs for CLAUDE.md
  • docs/skill-memory-maintenance.md — the pruning loop as candidate skill text for an agent
  • docs/releasing.md — publishing wheels to PyPI (trusted publishing)

Roadmap

The agent-memory market is crowded with cloud-first offerings (Mem0, Zep, Letta). Snowpack enters from the opposite direction: a local-first core that syncs up when you want it to. Local-first is the foundation later phases build on, not a stage to discard.

  1. Phase 1 — local dev tool (now). Everything in this repo: single SQLite file, CLI + hooks integration, telemetry from day one. Goal: prove retrieval quality and accumulate the usage data later tuning depends on.
  2. Phase 2 — local-first + sync. The SQLite file stays the on-device source of truth; optional sync to a hosted backend adds multi-device use, backup, and selective team sharing. The integration surface broadens beyond Claude Code: MCP server plus a language-agnostic SDK/HTTP API.
  3. Phase 3 — hosted platform. A managed, multi-tenant memory service covering all four memory types (episodic, semantic, working, procedural). Self-hosting stays a first-class path.

Full reasoning, the fixed-vs-provisional decision table, and migration risks live in docs/adr/ADR-001-memory-architecture.md ("Phasing & evolution").

Development

uv sync
uv run pytest
uv run ruff check

Demo data

To try the full surface without real transcripts (and without touching ~/.snowpack), seed a sandboxed demo — synthetic transcripts for two fake projects, pre-extracted facts (including a superseded pair), and probe telemetry:

uv run python scripts/seed_demo.py        # creates ~/.snowpack-demo
export SNOWPACK_DB=~/.snowpack-demo/snowpack.db
export SNOWPACK_CLAUDE_PROJECTS=~/.snowpack-demo/projects
snowpack probe "what did we decide about auth" --all-projects
snowpack stats
snowpack pit

It works without Ollama (probe degrades to keyword + graph + recency search, exactly as in real use); with Ollama running, the same script embeds everything.
