Local-first agent memory for Claude Code: episodic + semantic memory in one SQLite file.
snowpack
The snowpack is the season's memory — every storm recorded as a layer.
Local-first agent memory for Claude Code. Snowpack ingests Claude Code session transcripts into episodic memory (what happened across sessions) and semantic memory (durable facts, entities, relationships), all in a single SQLite file with vector + keyword search. The agent reaches it through an ordinary CLI — no MCP server, no daemon, no infrastructure.
Status
Core pipeline implemented (episodic + semantic memory, hybrid retrieval,
telemetry, distillation). See docs/adr/ADR-001-memory-architecture.md for
the architecture and decision record, docs/hooks.md for ingestion hook
setup, and docs/claude-md-snippet.md for the agent-facing usage docs.
Quick start
# 1. Install
pip install snowpack # or: uv tool install snowpack
# 2. Wire everything up (idempotent, re-runnable, prompts before writing)
snowpack setup
snowpack setup checks Ollama (printing install/pull commands if it's down —
it's a soft requirement, see "Embeddings" below), creates
~/.snowpack/snowpack.db, merges the ingestion, compaction-survival, and session-orientation hooks
into ~/.claude/settings.json (timestamped backup first), installs the memory
snippet into ~/.claude/CLAUDE.md between managed markers, and adds the
snowpack permission allowlist. --dry-run shows the diffs first,
--check is a doctor that audits every integration point, and --uninstall
removes exactly what setup added.
# 3. Use it
snowpack probe "auth decisions" # hybrid retrieval (vector + keyword + graph + recency)
(Ingestion runs out-of-band via the installed hooks; snowpack obs ingest
also works manually.)
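The "between managed markers" behaviour that setup applies to ~/.claude/CLAUDE.md is what makes it safe to re-run and to uninstall. A minimal sketch of that idea follows. The marker strings and function names are hypothetical, chosen for illustration; the markers snowpack actually writes may differ.

```python
import re

# Hypothetical marker strings; the real ones written by `snowpack setup` may differ.
BEGIN = "<!-- snowpack:begin -->"
END = "<!-- snowpack:end -->"

_BLOCK = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)


def upsert_managed_block(text: str, snippet: str) -> str:
    """Insert or replace the managed block, leaving all other content untouched."""
    block = f"{BEGIN}\n{snippet.strip()}\n{END}"
    if _BLOCK.search(text):
        # A function replacement avoids backslash processing in the snippet.
        return _BLOCK.sub(lambda _match: block, text, count=1)
    if not text:
        return block + "\n"
    gap = "\n" if text.endswith("\n") else "\n\n"
    return f"{text}{gap}{block}\n"


def remove_managed_block(text: str) -> str:
    """Remove only the managed block (the uninstall direction)."""
    pattern = re.compile(
        r"\n?" + re.escape(BEGIN) + r".*?" + re.escape(END) + r"\n?", re.DOTALL
    )
    return pattern.sub("", text, count=1)
```

Running the upsert twice with the same snippet yields the same file, which is the property "idempotent, re-runnable" refers to.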
Embeddings: Ollama setup and choosing a model
Vector search needs an embedding model — by default a local one served by Ollama. It is a soft requirement: without it snowpack still works in vectorless mode — ingest stores episodes un-embedded, probe degrades to keyword + graph + recency search, and the next ingest after the provider comes up backfills the missing vectors automatically.
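To make the vectorless behaviour concrete, here is a small sketch of the two halves: picking retrieval channels based on whether an embedder is reachable, and backfilling rows that were stored without vectors. The `episodes` table, its columns, and both function names are assumptions made for this example, not snowpack's actual schema or API.

```python
import sqlite3
from typing import Callable, List


def plan_channels(embedder_up: bool) -> List[str]:
    """Drop the vector channel when no embedding provider is reachable."""
    channels = ["keyword", "graph", "recency"]
    return ["vector", *channels] if embedder_up else channels


def backfill_missing_vectors(
    conn: sqlite3.Connection, embed: Callable[[str], bytes]
) -> int:
    """Embed every episode stored without a vector; return how many were filled.

    Assumes a hypothetical table: episodes(id, text, embedding BLOB NULL).
    """
    rows = conn.execute(
        "SELECT id, text FROM episodes WHERE embedding IS NULL"
    ).fetchall()
    for episode_id, text in rows:
        conn.execute(
            "UPDATE episodes SET embedding = ? WHERE id = ?",
            (embed(text), episode_id),
        )
    conn.commit()
    return len(rows)
```

Because the backfill selects only rows whose embedding is NULL, running it again after a successful pass is a no-op.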
Where local models aren't allowed (e.g. a workplace that can't sandbox
them), snowpack init --provider openai-compatible targets any hosted or
gateway /embeddings endpoint instead: set SNOWPACK_EMBEDDING_BASE_URL
and, for non-localhost endpoints, an API key
(SNOWPACK_EMBEDDING_API_KEY or OPENAI_API_KEY). Or skip embeddings
entirely and run vectorless — snowpack setup --check reports which mode
you're in.
Install and run Ollama
# macOS
brew install ollama # or download the app from https://ollama.com
# Linux
curl -fsSL https://ollama.com/install.sh | sh
# start the server (the desktop app does this automatically)
ollama serve
# fetch the default embedding model (~270 MB)
ollama pull nomic-embed-text
Prefer it sandboxed? A hardened Docker setup (localhost-only API,
dropped capabilities, isolated model storage) ships in
docker/docker-compose.yml:
docker compose -f docker/docker-compose.yml up -d
docker compose -f docker/docker-compose.yml exec ollama ollama pull nomic-embed-text
See docs/ollama-docker.md for GPU setup and the macOS caveat (containers
can't use Apple Silicon's GPU — native Ollama is faster there).
Verify it's answering:
curl -s http://localhost:11434/api/embed \
  -d '{"model": "nomic-embed-text", "input": ["hello"]}' | head -c 120
If Ollama runs somewhere other than localhost:11434 (a container, another
machine), point snowpack at it with SNOWPACK_OLLAMA_URL:
export SNOWPACK_OLLAMA_URL=http://gpu-box:11434
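The same check can be scripted. The sketch below builds the request the curl command above sends and, when Ollama is reachable, reports the vector length it returns. The function names are illustrative; reading SNOWPACK_OLLAMA_URL here mirrors the documented variable but is this example's own logic, not snowpack code.

```python
import json
import os
import urllib.request
from typing import Optional, Sequence


def build_embed_request(
    texts: Sequence[str],
    model: str = "nomic-embed-text",
    base_url: Optional[str] = None,
) -> urllib.request.Request:
    """Build a POST to Ollama's /api/embed endpoint."""
    base = base_url or os.environ.get("SNOWPACK_OLLAMA_URL") or "http://localhost:11434"
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        base.rstrip("/") + "/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embedding_dim(texts: Sequence[str] = ("hello",), **kwargs) -> int:
    """Send the request and return the length of the first embedding vector."""
    request = build_embed_request(texts, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    return len(payload["embeddings"][0])
```

Calling `embedding_dim()` needs a running server with the model pulled; with the default model it should report the 768 dimensions listed in the next section.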
Choosing the embedding model
The model is fixed per database at snowpack init, because the vector
tables are created with that model's output dimension (vec0 columns are
fixed-width):
snowpack init # nomic-embed-text (768-d)
snowpack init --model mxbai-embed-large # higher quality, 1024-d
snowpack init --model all-minilm # smaller/faster, 384-d
You normally don't pass --dim: init asks the running Ollama what dimension
the model actually produces (and refuses a --dim that contradicts it). If
Ollama isn't running, init falls back to a built-in table for common models
(nomic-embed-text, mxbai-embed-large, all-minilm,
snowflake-arctic-embed, bge-m3) — for anything else, either start Ollama
first or pass --dim explicitly.
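The resolution order described above can be written down as a small decision function. This is a sketch of the documented rules, not snowpack's source: the function name and error messages are made up, the table lists only the three dimensions this README states, and trusting an explicit --dim over the table when Ollama is down is an assumption.

```python
from typing import Optional

# Only the dimensions stated in this README; the real built-in table has more entries.
KNOWN_DIMS = {
    "nomic-embed-text": 768,
    "mxbai-embed-large": 1024,
    "all-minilm": 384,
}


def resolve_dim(
    model: str,
    probed_dim: Optional[int] = None,     # what a running Ollama reported, if any
    requested_dim: Optional[int] = None,  # the user's --dim, if any
) -> int:
    """Pick the vector dimension for a new database."""
    if probed_dim is not None:
        if requested_dim is not None and requested_dim != probed_dim:
            raise ValueError(
                f"--dim {requested_dim} contradicts {model}, which produces {probed_dim}"
            )
        return probed_dim
    if requested_dim is not None:
        # Assumption: with no live model to check against, an explicit value wins.
        return requested_dim
    if model in KNOWN_DIMS:
        return KNOWN_DIMS[model]
    raise ValueError(f"unknown model {model!r}: start Ollama first or pass --dim")
```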
The configured model, dimension, and task prefixes are recorded in the
database (meta table) and used for every subsequent embed, so you never
specify the model again after init — obs ingest and probe read it from
the database. To see what a database was initialized with:
sqlite3 ~/.snowpack/snowpack.db "SELECT * FROM meta"
Changing models later is one command — it re-embeds everything in place with zero loss of episodes, facts, or telemetry:
snowpack reindex --model all-minilm
The new model must be live (reindex re-embeds with it, so there's no
offline fallback). The database file is backed up first and probe keeps
working throughout; if the run is interrupted, rerun with --resume to
continue from where it stopped.
Upgrading snowpack across a schema change is also one command: when a
new version needs a newer database schema, every command refuses with a
pointer to snowpack migrate, which backs up the file and applies the
pending migrations.
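One common way to implement this kind of gate in SQLite is the `user_version` pragma. The sketch below uses it purely as an illustration; whether snowpack tracks its schema version this way is not stated here, and the names `require_current_schema`, `migrate`, and `SchemaOutdated` are hypothetical.

```python
import sqlite3
from typing import Dict


class SchemaOutdated(RuntimeError):
    """Raised when the database is older than the code expects."""


def schema_version(conn: sqlite3.Connection) -> int:
    return conn.execute("PRAGMA user_version").fetchone()[0]


def require_current_schema(conn: sqlite3.Connection, expected: int) -> None:
    """Refuse to run against an older schema, pointing at the fix."""
    found = schema_version(conn)
    if found < expected:
        raise SchemaOutdated(
            f"database schema v{found} is older than v{expected}; run `snowpack migrate`"
        )


def migrate(
    conn: sqlite3.Connection,
    migrations: Dict[int, str],  # target version -> SQL script
    expected: int,
    backup_to: sqlite3.Connection,
) -> None:
    """Back up first, then apply each pending migration in order."""
    conn.backup(backup_to)
    for version in range(schema_version(conn) + 1, expected + 1):
        conn.executescript(migrations[version])
        # PRAGMA values cannot be bound as parameters; version is an int from range().
        conn.execute(f"PRAGMA user_version = {version}")
```

Calling `require_current_schema` at the top of every command gives the "every command refuses" behaviour with a single check.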
CLI surface
| Command | Purpose |
|---|---|
| snowpack setup | One-command onboarding: hooks, CLAUDE.md, permissions, db (--check doctor, --dry-run, --uninstall) |
| snowpack init | Create and configure the database |
| snowpack config | Persist provider defaults (extraction endpoint/model, Ollama URL) in the db, so no env vars are needed (list, set, unset) |
| snowpack obs ingest | Ingest new transcript exchanges (incremental, idempotent) |
| snowpack obs extract | Extract durable facts from episodes (API-assisted) |
| snowpack obs list | List recent episodes |
| snowpack probe "query" | Hybrid retrieval (vector + keyword + graph + recency) with telemetry |
| snowpack feedback | Mark retrieved memories as used; trains ranking |
| snowpack stash | Working-memory checkpoint per project |
| snowpack resume | Re-injection payload for SessionStart hooks (compaction survival) |
| snowpack redact | Retroactive secret scan/rewrite over stored memory (--scan, --apply) |
| snowpack stats | Telemetry overview; --refresh recomputes usefulness |
| snowpack sinter | Mine repeated corrections into CLAUDE.md candidates |
| snowpack prune | Telemetry-nominated pruning: candidates, then audited soft archive/keep/restore, log |
| snowpack entity merge | Point a duplicate entity at its canonical form |
| snowpack reindex | Switch embedding models: re-embed and swap in place (--resume) |
| snowpack migrate | Upgrade the database schema after a snowpack upgrade (backup first) |
| snowpack pit | Local web UI: entity graph + telemetry dashboard |
Privacy: secret redaction
Transcripts carry whatever your tools printed — env dumps, tokens, connection
strings. Snowpack redacts known secret shapes (AWS keys, GitHub tokens, JWTs,
PEM blocks, URL credentials, password = … assignments, and more) at
ingest, before content is hashed, embedded, or indexed; stash writes get the
same pass. Hits become [redacted:<type>] markers, the ingest report counts
them, and snowpack stats shows the lifetime total. For data stored before
redaction existed, snowpack redact --scan reports hits and
snowpack redact --apply rewrites them in place (database backup first).
This is best-effort, known-shape detection — defense in depth, not a
guarantee. Custom patterns and an allowlist for documented example keys live
in ~/.snowpack/redaction.toml; see docs/redaction.md.
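As an illustration of known-shape detection, the sketch below redacts four of the shapes named above and counts the hits. The regular expressions and the type names inside the markers are this example's own approximations, not snowpack's built-in patterns, and they will miss variants just as any shape-based scanner does.

```python
import re
from collections import Counter
from typing import Tuple

# Approximate shapes for illustration; type names are hypothetical.
PATTERNS = {
    "aws-access-key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github-token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    # user:password between "://" and "@" in a URL
    "url-credentials": re.compile(r"(?<=://)[^/\s:@]+:[^/\s@]+(?=@)"),
}


def redact(text: str) -> Tuple[str, Counter]:
    """Replace each hit with a [redacted:<type>] marker and count hits per type."""
    counts: Counter = Counter()
    for kind, pattern in PATTERNS.items():
        text, hits = pattern.subn(f"[redacted:{kind}]", text)
        if hits:
            counts[kind] += hits
    return text, counts
```

Running the pass before hashing or embedding, as described above, means the secret never reaches any derived index.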
Pruning: telemetry nominates, the agent decides
Memory accumulates; not all of it stays worth retrieving. No decay formula
can safely tell a dead memory from a rarely-needed-but-load-bearing one, so
snowpack splits the job (ADR-003 D7): snowpack prune candidates --json
nominates from telemetry — dead facts (never retrieved, >30 days), weak
layers (retrieved ≥5×, never used), closed supersession chains, stale
episodes (>90 days, never retrieved, provenance-guarded) — each with its
evidence, and the consuming agent (or you) reads and judges each one.
Decisions are explicit and audited: prune archive <ids> --reason "…" is a
soft delete that hides the memory from every retrieval channel,
prune keep <ids> --reason "…" records a survivor and suppresses
re-nomination for 90 days, prune restore brings any archived memory back intact, and
prune log shows the full trail. Nothing here hard-deletes — that's a
later, mechanical GC pass over already-archived rows only.
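The nomination half of that split is mechanical, so it can be sketched directly from the thresholds above. The data class, field names, and candidate labels are hypothetical, and only the dead-fact and weak-layer rules plus the 90-day keep suppression are shown; supersession chains and stale episodes are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


@dataclass
class FactStats:
    fact_id: int
    created_at: datetime
    retrieved: int  # times returned by a probe
    used: int       # times marked as used via feedback


def nominate(
    facts: List[FactStats],
    now: datetime,
    kept_at: Dict[int, datetime],  # fact_id -> time of the last "keep" decision
) -> List[Tuple[int, str, str]]:
    """Return (fact_id, kind, evidence) for each pruning candidate."""
    candidates = []
    for fact in facts:
        kept = kept_at.get(fact.fact_id)
        if kept is not None and now - kept < timedelta(days=90):
            continue  # a recent keep decision suppresses re-nomination
        age = now - fact.created_at
        if fact.retrieved == 0 and age > timedelta(days=30):
            candidates.append(
                (fact.fact_id, "dead-fact", f"never retrieved in {age.days} days")
            )
        elif fact.retrieved >= 5 and fact.used == 0:
            candidates.append(
                (fact.fact_id, "weak-layer", f"retrieved {fact.retrieved}x, never used")
            )
    return candidates
```

The function only nominates and attaches evidence; archiving or keeping stays a separate, explicit decision, which is the point of the design.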
The pit (web UI)
snowpack pit # serves http://127.0.0.1:8617 and opens the browser
A read-only, single-page UI over the same SQLite file (no extra dependencies, no build step; the graph library is vendored so it works offline):
- Graph tab — entities as nodes, facts as edges. Visual weights are real telemetry, not decoration: node size = usage, edge width = retrieval frequency, color = staleness, and dead gray = never retrieved — your pruning candidates at a glance. Click through node → fact → provenance episode; toggle superseded facts; search to highlight.
- Stats tab — totals, retrieval latency, channel win-rate (vector vs keyword vs graph — how to rebalance fusion weights), zero-result queries (gap detection), most/least-used facts, persistent weak layers, and recent retrievals expandable to per-result channels/scores/used flags.
The server binds 127.0.0.1 only and never mutates user data (the one write is
recomputing derived usefulness scores on demand). Full guide — including how
to read the visual encoding and troubleshooting — in docs/pit.md; stack
decisions in docs/adr/ADR-002-pit-ui.md.
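The Stats tab mentions rebalancing fusion weights across channels. This README does not say which fusion method snowpack uses, so the sketch below substitutes a widely used one, weighted reciprocal rank fusion, to show what a per-channel weight does to the final ordering. The function name, the constant k=60, and the example weights are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def fuse(
    rankings: Dict[str, List[str]],  # channel -> result ids, best first
    weights: Dict[str, float],       # channel -> fusion weight
    k: int = 60,
) -> List[str]:
    """Weighted reciprocal rank fusion: score = sum of weight / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for channel, ranked_ids in rankings.items():
        weight = weights.get(channel, 1.0)
        for rank, item in enumerate(ranked_ids, start=1):
            scores[item] += weight / (k + rank)
    # Highest score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))
```

With equal weights, a result that several channels agree on rises to the top; raising one channel's weight lets its top hit win even without agreement, which is the lever a channel win-rate report would inform.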
Documentation map
- docs/adr/: architecture decision records (ADR-001 core, ADR-002 pit UI, ADR-003 pre-Phase-2 hardening program, ADR-004 spend visibility and cost controls)
- docs/plans/: point-in-time implementation plans approved before each build round, with outcomes
- docs/pit.md: running and reading the pit UI
- docs/redaction.md: secret redaction (built-in patterns, redaction.toml, retroactive cleanup)
- docs/hooks.md: out-of-band ingestion hooks
- docs/ollama-docker.md: sandboxed Ollama
- docs/claude-md-snippet.md: agent-facing usage docs for CLAUDE.md
- docs/skill-memory-maintenance.md: the pruning loop as candidate skill text for an agent
- docs/releasing.md: publishing wheels to PyPI (trusted publishing)
Roadmap
The agent-memory market is crowded with cloud-first offerings (Mem0, Zep, Letta). Snowpack takes the opposite approach: a local-first core that syncs up when you want it to. Local-first is the foundation later phases build on, not a stage to discard.
- Phase 1 — local dev tool (now). Everything in this repo: single SQLite file, CLI + hooks integration, telemetry from day one. Goal: prove retrieval quality and accumulate the usage data later tuning depends on.
- Phase 2 — local-first + sync. The SQLite file stays the on-device source of truth; optional sync to a hosted backend adds multi-device use, backup, and selective team sharing. The integration surface broadens beyond Claude Code: MCP server plus a language-agnostic SDK/HTTP API.
- Phase 3 — hosted platform. A managed, multi-tenant memory service covering all four memory types (episodic, semantic, working, procedural). Self-hosting stays a first-class path.
Full reasoning, the fixed-vs-provisional decision table, and migration risks
live in docs/adr/ADR-001-memory-architecture.md ("Phasing & evolution").
Fact extraction defaults to Anthropic's OpenAI-compatible endpoint and
needs an API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, or
SNOWPACK_EXTRACTION_API_KEY); override with
SNOWPACK_EXTRACTION_BASE_URL / _MODEL (localhost endpoints like
Ollama's /v1 need no key). Keys are read from the environment only and
never stored.
No API key at all? Extraction falls back automatically to Claude Code
itself (claude -p, headless) under your existing subscription login — no
key, no new model runtime — announcing the fallback when it happens
(SNOWPACK_EXTRACTION_PROVIDER=claude-code forces it;
=openai-compatible forbids it). It consumes your subscription usage, so
runs report tokens/cost and accept --token-budget / --cost-budget
stops alongside --limit; snowpack stats shows lifetime extraction
spend.
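The provider rules above reduce to a short decision function. This is a reading of the documented behaviour rather than snowpack's code: the function name is made up, and treating any of the three key variables as equivalent (with no precedence among them) is an assumption.

```python
from typing import Mapping
from urllib.parse import urlparse

KEY_VARS = ("SNOWPACK_EXTRACTION_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY")
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def choose_extraction_provider(env: Mapping[str, str]) -> str:
    """Return "openai-compatible" or "claude-code" for the given environment."""
    forced = env.get("SNOWPACK_EXTRACTION_PROVIDER")
    if forced == "claude-code":
        return "claude-code"

    has_key = any(env.get(name) for name in KEY_VARS)
    host = urlparse(env.get("SNOWPACK_EXTRACTION_BASE_URL", "")).hostname
    keyless_endpoint = host in LOCAL_HOSTS  # e.g. Ollama's /v1 on localhost

    if has_key or keyless_endpoint:
        return "openai-compatible"
    if forced == "openai-compatible":
        raise RuntimeError("no API key available and the claude-code fallback is forbidden")
    return "claude-code"  # automatic fallback; consumes subscription usage
```

Passing the environment in as a mapping, instead of reading `os.environ` inside the function, keeps the rule easy to test; in real use it would be called with `os.environ`.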
Development
uv sync
uv run pytest
uv run ruff check
Demo data
To try the full surface without real transcripts (and without touching
~/.snowpack), seed a sandboxed demo — synthetic transcripts for two fake
projects, pre-extracted facts (including a superseded pair), and probe
telemetry:
uv run python scripts/seed_demo.py # creates ~/.snowpack-demo
export SNOWPACK_DB=~/.snowpack-demo/snowpack.db
export SNOWPACK_CLAUDE_PROJECTS=~/.snowpack-demo/projects
snowpack probe "what did we decide about auth" --all-projects
snowpack stats
snowpack pit
It works without Ollama (probe degrades to keyword + graph + recency, exactly as in real use); with Ollama running, the same script embeds everything.