Your local AI brain: persistent memory + full observability for any model. Data never leaves your machine.

🧠 Memstash

Persistent memory + full cost/observability for any LLM — local-first, in one SQLite file. No server, no account, no telemetry. Your data never leaves your machine.

pipx install 'memstash[all]'      # or: pip install 'memstash[openai]'

LLM apps have two chronic problems:

They forget you. Switch models or start a new session and you re-explain everything.
You can't see what they're doing. Which model? How many tokens? How much did that cost?

Memstash fixes both, locally, for any model — in a single SQLite file that holds your memories and every call's tokens/cost/latency. Switch GPT → Claude → DeepSeek → Qwen and your memory and your bill follow you. No server to run, no account to create, nothing leaves your laptop.

Why Memstash

Memory tools (mem0, Letta, Zep) don't track cost; observability tools (Langfuse, Helicone, Phoenix) don't do memory; and the serious ones all want a server or a database. Memstash is the one tool that does memory + observability + evals in a single local file — pip install, done.

Tool	memory	cost/obs	evals	local-first, no server	install
Memstash	✅	✅	✅	✅	`pip install memstash`
mem0	✅	❌	❌	⚠️ lib-only; graph/prod needs a DB	`pip` + (cloud)
Langfuse	❌	✅	✅	❌ Postgres + ClickHouse	Docker stack
Zep / Graphiti	✅	❌	❌	❌ needs a graph DB	Docker + graph DB
simonw/llm	❌	logs only	❌	✅	`pip`

What you get

🧠 Persistent memory across sessions and across models
🔎 Hybrid retrieval — semantic + BM25 keyword search, fused (no extra deps)
🤖 Auto-memory — captures your preferences from conversation (EN + 中文)
♻️ Conflict resolution (opt-in) — ADD/UPDATE/DELETE/NOOP so facts stay current
⏳ Lifecycle + bi-temporal — soft-forget, recency ranking, "what did I know on date X?"
🕸️ Graph-lite — entity relationships in SQLite (no graph DB)
📄 Document ingestion — drop in .md/.txt/.pdf, it's searchable
🌊 Streaming + interactive REPL — memstash chat, multi-turn
📊 Observability — tokens/cost/latency per call, per-turn trace trees, evals
📡 Capture other apps — sink LangChain/OpenAI-SDK calls via memstash.instrument()
💰 Daily budget — warnings + optional hard-stop
🔌 22 providers — cloud, Chinese clouds, fast-inference hosts, local
🔗 MCP server — any agent (Claude Desktop/Code, Cursor) reads/writes your memory
🏠 100% local — no accounts, no servers, no telemetry

┌───────────────┐     ┌──────────────────────────────┐
│  any model    │     │  Memstash (local SQLite)     │
│  GPT / Claude │ ◄──►│  • memories → auto-injected  │
│  DeepSeek/Qwen│     │  • traces   → tokens & cost  │
└───────────────┘     └──────────────────────────────┘
        nothing leaves your machine
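
The hybrid retrieval bullet above says semantic and BM25 results are fused, but this README does not say which fusion method Memstash uses. One common choice is reciprocal rank fusion (RRF), sketched here purely as an illustration. The function name `rrf_fuse`, the constant `k=60`, and the memory ids are assumptions, not Memstash internals.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of ids with reciprocal rank fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. A higher fused score ranks first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

semantic = [3, 1, 2]   # hypothetical memory ids from embedding search, best first
keyword = [1, 4, 3]    # hypothetical memory ids from BM25, best first
fused = rrf_fuse([semantic, keyword])
print(fused)
```

Ids that appear near the top of both lists rise to the front, which is the point of fusing the two signals rather than picking one.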

Quickstart (30 seconds)

# Recommended: isolated CLI install with everything wired up
pipx install 'memstash[all]'        # or: uv tool install 'memstash[all]'

# Or pick what you need (base = keyword memory + tracing, no heavy deps):
pip install 'memstash[openai]'      # GPT / DeepSeek / Qwen / OpenAI-compatible
#   pip install 'memstash[anthropic]'   # Claude
#   pip install 'memstash[gemini]'      # Gemini
#   pip install 'memstash[embeddings]'  # semantic memory search (downloads a model)
#   pip install 'memstash[dashboard]'   # web dashboard
#   pip install 'memstash[mcp]'         # MCP server
#   pip install 'memstash[otel]'        # OpenTelemetry export
#   pip install memstash                # base only (keyword + tracing)

# not on PyPI yet? install straight from source:
#   pipx install 'git+https://github.com/zionLyl/recall.git#egg=memstash[all]'

# 0. (optional) guided setup: pick a default model, detect API keys
memstash init

# 1. Teach it about you (once)
memstash add "I prefer concise answers with tables" --tags style
memstash add "I do A-share & HK quant research"     --tags work

# 2. Chat with ANY model — it already knows you, and the call is traced
export OPENAI_API_KEY=sk-...
memstash chat openai gpt-4o-mini "How should you reply to me?"
#   ↑ also auto-captures new preferences you mention

# with defaults configured, just:
memstash chat "what do I work on?"

# or drop into an interactive, multi-turn chat (memory + tracing on):
memstash chat

# 3. See exactly what you spent (and your budget)
memstash stats

Example output:

Memories stored : 2
Model calls     : 1
Tokens          : 312 in / 88 out
Total cost      : $0.0001
Avg latency     : 740 ms
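
The total cost in the sample output can be checked by hand from the token counts. The sketch below assumes list prices of $0.15 per 1M input tokens and $0.60 per 1M output tokens for gpt-4o-mini; those prices and the helper name `call_cost_usd` are assumptions for illustration, and `memstash pricing` reports what the tool actually resolves.

```python
def call_cost_usd(input_tokens, output_tokens, in_price_per_1m, out_price_per_1m):
    """Cost of one call in USD, given per-1M-token prices."""
    return (input_tokens * in_price_per_1m
            + output_tokens * out_price_per_1m) / 1_000_000

# Assumed prices (USD per 1M tokens): 0.15 input, 0.60 output.
cost = call_cost_usd(312, 88, 0.15, 0.60)
print(f"${cost:.4f}")
```

With these assumed prices, 312 input and 88 output tokens come to about $0.0000996, which rounds to the $0.0001 shown above.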

Switch model, same memory, same ledger:

export ANTHROPIC_API_KEY=sk-...
memstash chat anthropic claude-3-5-sonnet "Remind me what I work on"
# → still remembers your A-share / HK quant work

Use as a library

from memstash import Memstash

r = Memstash()
r.remember("I prefer concise answers", tags=["style"])

out = r.chat("openai", "gpt-4o-mini", "How should you reply to me?")
print(out.text)          # the model already knows your preference

print(r.stats())         # {'calls': 1, 'cost_usd': ..., ...}

Web dashboard

pip install 'memstash[dashboard]'
memstash dashboard          # → http://127.0.0.1:8745

A single local page: memory cards, cost-by-model, recent calls. No build step, no telemetry, no cloud.

Streaming

Replies stream by default — you see tokens as the model produces them, then the usual cost/latency footer.

memstash chat "draft a haiku about memory"   # streams token-by-token
memstash chat                                # interactive multi-turn REPL
memstash chat --no-stream "..."              # wait for the full reply instead
memstash config set stream false             # make non-streaming the default

In the REPL, each turn keeps the in-session conversation history and injects your long-term memories. Type /exit or press Ctrl-D to leave.

From the library, pass an on_token callback; you still get the full outcome:

from memstash import Memstash
r = Memstash()
out = r.stream("openai", "gpt-4o-mini", "tell me a joke",
               on_token=lambda t: print(t, end="", flush=True))
print(out.cost_usd, out.output_tokens)     # full accounting after streaming

Smarter memory extraction (opt-in)

By default memstash captures memories with fast, free heuristics (regex cues, EN + 中文). Flip on LLM extraction to have a model read each message and pull durable first-person facts — higher recall, at the cost of one extra (cheap) call that's also traced toward your budget.

memstash config set extraction_mode llm          # heuristic (default) | llm
memstash config set extraction_model gpt-4o-mini # optional; defaults to chat model

If the extraction call ever fails (no key, network, bad output) memstash silently falls back to the heuristic extractor, so chat never breaks.
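
The fallback behaviour described above can be sketched in a few lines. This is not memstash's extractor: the cue phrases, the function names, and the `llm_extractor` callable are all hypothetical, chosen only to show the "try the model, fall back to regex cues" shape.

```python
import re

# Hypothetical cue phrases; the real heuristics are not shown in this README.
_CUES = re.compile(r"\b(I prefer|I like|I work on|I use|my name is)\b", re.IGNORECASE)

def heuristic_extract(message):
    """Return the sentences that contain a first-person cue phrase."""
    sentences = re.split(r"(?<=[.!?])\s+", message.strip())
    return [s for s in sentences if _CUES.search(s)]

def extract(message, llm_extractor=None):
    """Try the LLM extractor first; fall back to heuristics on any failure."""
    if llm_extractor is not None:
        try:
            return llm_extractor(message)
        except Exception:
            pass  # no key, network error, bad output: fall through
    return heuristic_extract(message)

def broken_llm(_message):
    # Stand-in for an extraction call that fails.
    raise RuntimeError("no API key")

facts = extract("Hi there. I prefer concise answers. What's the weather?", broken_llm)
print(facts)
```

Because the failure is caught inside `extract`, the caller always gets a list back and the chat path never has to handle extraction errors.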

Curate your memory

memstash edit 3 "I prefer concise answers with tables"   # rewrite a memory
memstash edit 3 --tags style,format                       # or just retag it

# Merge near-duplicates that pile up from auto-capture (needs embeddings)
memstash dedupe --dry-run        # preview which memories would merge
memstash dedupe --threshold 0.9  # keep the earliest, union tags, drop the rest
memstash config set dedupe_similarity 0.95   # also suppress near-dupes on add

Editing re-embeds the memory so semantic search stays accurate. Dedupe groups memories whose embeddings are ≥ the threshold, keeps the earliest as canonical, and unions tags onto it — exact-duplicate skipping still works even without embeddings installed.
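
A minimal sketch of the merge rule described above, assuming cosine similarity and a greedy pass that compares each memory only against the canonical entries kept so far. The dict keys (`id`, `created`, `tags`, `embedding`) and the function names are hypothetical; the actual grouping strategy inside memstash may differ.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def dedupe(memories, threshold=0.9):
    """Greedy near-duplicate merge; returns (kept, dropped_ids)."""
    kept, dropped = [], []
    # Oldest first, so the earliest memory of a group becomes canonical.
    for mem in sorted(memories, key=lambda m: m["created"]):
        for canon in kept:
            if cosine(mem["embedding"], canon["embedding"]) >= threshold:
                canon["tags"] |= mem["tags"]   # union tags onto the earliest
                dropped.append(mem["id"])
                break
        else:
            kept.append({**mem, "tags": set(mem["tags"])})
    return kept, dropped

mems = [
    {"id": 1, "created": 1, "tags": {"style"}, "embedding": [1.0, 0.0]},
    {"id": 2, "created": 2, "tags": {"format"}, "embedding": [0.99, 0.05]},
    {"id": 3, "created": 3, "tags": {"work"}, "embedding": [0.0, 1.0]},
]
kept, dropped = dedupe(mems)
print([m["id"] for m in kept], dropped)
```

Raising the threshold toward 1.0 merges only near-identical memories; lowering it merges more aggressively, which is why a dry run is worth doing first.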

Semantic search without the model download

By default, semantic search uses a local sentence-transformers model (pip install 'memstash[embeddings]', ~80MB on first use). If you'd rather not pull in PyTorch, point memstash at any OpenAI-compatible /embeddings endpoint — e.g. a local Ollama or LM Studio you already run:

memstash config set embedding_backend api
memstash config set embedding_base_url http://localhost:11434/v1   # Ollama
memstash config set embedding_model nomic-embed-text
# cloud endpoints: also set embedding_api_key_env to the env var holding the key

Now memstash add / memstash search get semantic embeddings over HTTP — no heavy local dependency. If the endpoint is unreachable, memstash transparently falls back to keyword/BM25 search.
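
For reference, this is roughly what a request to an OpenAI-compatible /embeddings endpoint looks like, using only the standard library. It is a sketch of the wire format, not memstash's client code; the helper names are made up, and the network call is left commented out because it needs a running server.

```python
import json
import urllib.request

def build_embedding_request(base_url, model, text, api_key=None):
    """Build a POST request for an OpenAI-compatible /embeddings endpoint."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/embeddings",
        data=body, headers=headers, method="POST",
    )

def parse_embedding(response_json):
    """Pull the first vector out of an OpenAI-style embeddings response."""
    return response_json["data"][0]["embedding"]

req = build_embedding_request("http://localhost:11434/v1", "nomic-embed-text", "hello")
print(req.full_url)
# To actually send it (needs a running server):
#   with urllib.request.urlopen(req, timeout=10) as resp:
#       vector = parse_embedding(json.load(resp))
```

Trying this request directly is a quick way to confirm that the endpoint and model name are right before pointing memstash at them.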

MCP server — plug memstash into any agent

Expose your local memory to any MCP-aware client (Claude Desktop, Claude Code, Cursor, …) so the agent can read and write the same brain you use from the CLI.

pip install 'memstash[mcp]'
memstash mcp        # runs an MCP server over stdio

Wire it into your MCP client config:

{
  "mcpServers": {
    "memstash": { "command": "memstash", "args": ["mcp"] }
  }
}

Tools exposed: remember, recall_search, list_memories, forget, usage_stats. Same local SQLite store — nothing leaves your machine.

Capture your existing app's LLM calls

Already using LangChain, LlamaIndex, or the OpenAI SDK? Make memstash a local sink for their calls — no server, no cloud (the local counterpart to Phoenix/Langfuse auto-instrumentation):

import memstash
memstash.instrument()                     # spans now land in ~/.memstash/memstash.db

from openinference.instrumentation.openai import OpenAIInstrumentor
OpenAIInstrumentor().instrument()       # (memstash auto-enables this if installed)
# ...your normal OpenAI/LangChain code now shows up in `memstash recent` / `stats`.

Needs pip install 'memstash[otel]' plus whichever OpenInference instrumentor you use. Captured calls are tagged kind="instrumented".

Supported models (22 providers)

Mix and match across clouds, Chinese providers, fast inference hosts, and local models — your memory and cost ledger follow you everywhere.

Provider	`provider` arg	Example models	API key env
OpenAI	`openai`	`gpt-4o`, `gpt-4o-mini`, `gpt-4.1`	`OPENAI_API_KEY`
Anthropic	`anthropic`	`claude-3-5-sonnet`, `claude-3-5-haiku`	`ANTHROPIC_API_KEY`
Google Gemini	`gemini`	`gemini-1.5-pro`, `gemini-2.0-flash`	`GEMINI_API_KEY`
DeepSeek	`deepseek`	`deepseek-chat`, `deepseek-reasoner`	`DEEPSEEK_API_KEY`
Qwen (DashScope)	`qwen`	`qwen-plus`, `qwen-max`	`DASHSCOPE_API_KEY`
Moonshot (Kimi)	`moonshot`	`moonshot-v1-8k`, `moonshot-v1-32k`	`MOONSHOT_API_KEY`
Zhipu (GLM)	`zhipu`	`glm-4`, `glm-4-flash`	`ZHIPU_API_KEY`
MiniMax	`minimax`	`abab6.5s`	`MINIMAX_API_KEY`
Baichuan	`baichuan`	`Baichuan4`	`BAICHUAN_API_KEY`
01.AI (Yi)	`yi`	`yi-large`, `yi-lightning`	`YI_API_KEY`
StepFun	`stepfun`	`step-1`	`STEPFUN_API_KEY`
Mistral	`mistral`	`mistral-large`, `mistral-small`	`MISTRAL_API_KEY`
xAI (Grok)	`xai`	`grok-2`, `grok-beta`	`XAI_API_KEY`
Groq	`groq`	`llama-3.3-70b-versatile`	`GROQ_API_KEY`
Together	`together`	open models	`TOGETHER_API_KEY`
Fireworks	`fireworks`	open models	`FIREWORKS_API_KEY`
DeepInfra	`deepinfra`	open models	`DEEPINFRA_API_KEY`
Perplexity	`perplexity`	`sonar`, `sonar-pro`	`PERPLEXITY_API_KEY`
OpenRouter	`openrouter`	400+ models, one key	`OPENROUTER_API_KEY`
Ollama (local)	`ollama`	`llama3`, `qwen2.5`	—
LM Studio (local)	`lmstudio`	any loaded model	—
Any OpenAI-compatible	`openai-compatible`	set `--base-url`	`MEMSTASH_API_KEY`

memstash models   # list all providers + key env vars + base URLs

Most providers speak the OpenAI API, so they share one adapter — just point at the right base URL (handled automatically). Gemini has its own native adapter. Local models (Ollama / LM Studio) need no key and no cloud.
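
The "one adapter, many base URLs" idea can be pictured as a small lookup table. The sketch below is illustrative only: the base URLs are the ones each provider documents publicly, the key variables come from the table above, and the `resolve` helper and its dictionaries are assumptions rather than memstash code.

```python
import os

# Publicly documented OpenAI-compatible base URLs (a subset, for illustration).
BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "deepseek": "https://api.deepseek.com",
    "groq": "https://api.groq.com/openai/v1",
    "openrouter": "https://openrouter.ai/api/v1",
    "ollama": "http://localhost:11434/v1",
    "lmstudio": "http://localhost:1234/v1",
}
# Env vars as listed in the provider table; local providers have none.
KEY_ENV = {
    "openai": "OPENAI_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
    "groq": "GROQ_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}

def resolve(provider, env=os.environ):
    """Return (base_url, api_key) for an OpenAI-compatible provider."""
    base_url = BASE_URLS[provider]
    key_var = KEY_ENV.get(provider)
    api_key = env.get(key_var) if key_var else None
    return base_url, api_key

print(resolve("ollama", {}))
```

Everything downstream of `resolve` can then be the same request code, which is why adding a provider is mostly a matter of adding a row.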

Scopes, budget & config

# Isolate memory per project
memstash scope work            # switch active scope
memstash add "deadline Friday" # stored in 'work'
memstash scope                 # list all scopes
memstash list --all            # see every scope

# Set a daily spend cap (warns at 80% and 100%)
memstash config set daily_budget_usd 1.0
memstash config set budget_enforce true   # hard-stop: refuse calls once the cap is hit

# Defaults so you can just `memstash chat "..."`
memstash config set default_provider deepseek
memstash config set default_model deepseek-chat
memstash config show

# Backup / move your brain
memstash export my-brain.json
memstash import my-brain.json
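
The budget rules in the comments above (warn at 80% and 100%, optional hard-stop) amount to a small decision function. This is a sketch of that logic under stated assumptions: the function name, the returned labels, and the treatment of a non-positive cap as "no budget" are all invented for illustration.

```python
def budget_status(spent_usd, daily_budget_usd, enforce=False):
    """Classify today's spend against a daily cap.

    Returns "ok", "warn" (at or above 80% of the cap), "over" (at or
    above 100%, warning only), or "blocked" (at or above 100% with
    enforcement on).
    """
    if daily_budget_usd <= 0:
        return "ok"   # assumption: a non-positive cap means no budget is set
    ratio = spent_usd / daily_budget_usd
    if ratio >= 1.0:
        return "blocked" if enforce else "over"
    if ratio >= 0.8:
        return "warn"
    return "ok"

print(budget_status(0.85, 1.0))
print(budget_status(1.20, 1.0, enforce=True))
```

Keeping the warning and the hard-stop as separate outcomes mirrors the two config keys: `daily_budget_usd` sets the cap, and `budget_enforce` decides whether crossing it refuses the call.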

CLI reference

Command	What it does
`memstash init`	Guided first-time setup
`memstash doctor`	Show which providers have keys
`memstash add "..." [--tags a,b] [--scope s]`	Store a memory
`memstash ingest <file.md/.txt/.pdf>`	Ingest a document into searchable memory
`memstash search "..." [--all]`	Semantic (or keyword) search
`memstash list [--all] [--at WHEN]`	List memories (active, or valid as-of a past time)
`memstash show <id>`	Inspect a memory + its provenance (source chat)
`memstash edit <id> ["new content"] [--tags ...]`	Edit a memory in place
`memstash forget <id> [--soft]`	Delete (or soft-forget) a memory
`memstash prune [--older-than DAYS] [--unused] [--all]`	Soft-forget stale memories
`memstash dedupe [--threshold 0.9] [--all] [--dry-run]`	Merge near-duplicate memories
`memstash graph [entity] [--add "s|p|o"]`	View / add entity relationships
`memstash scope [name]`	Switch / list scopes
`memstash chat [provider model] "..." [-T tmpl -V k=v] [--no-stream]`	Chat with memory + tracing + auto-memory
`memstash chat`	Interactive multi-turn chat (REPL)
`memstash stats`	Tokens, cost & budget overview
`memstash recent`	Recent model calls (with trace IDs)
`memstash trace`	Recent turns as call trees
`memstash eval <id> [--contains/--regex/--judge/--suite ...]`	Score a traced reply (rules / LLM judge)
`memstash evals [--trace id]`	List eval results
`memstash eval-suite save/list/rm`	Manage reusable eval suites
`memstash pricing [model]`	Show resolved per-1M-token pricing
`memstash benchmark`	Reproducible retrieval/extraction quality numbers
`memstash models`	Supported providers
`memstash prompt save/list/show/use/rm`	Manage prompt templates
`memstash export/import <file>`	Backup / restore memories
`memstash config show/set/path`	View & edit configuration
`memstash dashboard`	Launch local web UI
`memstash mcp`	Run as an MCP server (stdio) for any agent
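
The `--at WHEN` option on `memstash list` implies an "as-of" query over validity intervals. The sketch below shows what such a query can look like in SQLite. The table name, column names, and sample rows are hypothetical, and it models only the valid-time axis, so it is an illustration of the idea rather than memstash's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: valid_to is NULL while a fact is still current.
conn.execute(
    "CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT, "
    "valid_from TEXT NOT NULL, valid_to TEXT)"
)
conn.executemany(
    "INSERT INTO memories (content, valid_from, valid_to) VALUES (?, ?, ?)",
    [
        ("I use Python 3.11", "2025-01-01", "2025-09-01"),
        ("I use Python 3.13", "2025-09-01", None),
    ],
)

def as_of(conn, when):
    """Return memory contents that were valid at ISO date `when`."""
    rows = conn.execute(
        "SELECT content FROM memories "
        "WHERE valid_from <= ? AND (valid_to IS NULL OR valid_to > ?) "
        "ORDER BY id",
        (when, when),
    )
    return [r[0] for r in rows]

print(as_of(conn, "2025-06-01"))
print(as_of(conn, "2025-12-01"))
```

Closing an interval instead of deleting the row is what makes soft-forget and "what did I know on date X?" possible from the same table.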

Where is my data?

A single SQLite file at ~/.memstash/recall.db (override with MEMSTASH_HOME). That's it. No accounts, no servers, no telemetry. Back it up, sync it, delete it — it's yours.
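
Because the store is an ordinary SQLite file, it can be inspected with Python's standard library. The helper below is a sketch that opens a database read-only and lists its tables; the name `list_tables` is made up, and no table names are assumed because the schema is not documented here.

```python
import sqlite3
from pathlib import Path

def list_tables(db_path):
    """Return the table names in a SQLite file, opened read-only."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        return [r[0] for r in rows]
    finally:
        conn.close()

# Example (path taken from the paragraph above):
#   list_tables(Path.home() / ".memstash" / "recall.db")
```

Opening with `mode=ro` means a stray query cannot modify the file while you are looking around.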

Why local-first?

Privacy — your memories and prompts stay on your disk.
Portability — one file you can move, version, or sync yourself.
No lock-in — works across providers; swap models freely.

Benchmark

memstash ships a reproducible, key-free quality benchmark:

memstash benchmark

It seeds a fixed, hand-labeled memory set and measures retrieval quality (recall@1, recall@k, precision@k, MRR) plus heuristic-extraction fact-recall, and it labels whether the run used semantic or keyword/BM25 mode. Keyword baseline: recall@1 ≈ 0.50, MRR ≈ 0.69, extraction fact-recall 1.00 with 0 false captures; installing [embeddings] (or configuring an api backend) scores higher. Numbers are deterministic, so you can track them across changes.
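
For readers unfamiliar with the retrieval metrics named above, here is a small sketch using their standard definitions. The function names and the sample queries are made up for illustration; this is not the benchmark's code or data.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant ids that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)

def precision_at_k(ranked, relevant, k):
    """Fraction of the top k results that are relevant."""
    return len(set(ranked[:k]) & set(relevant)) / k

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def mrr(queries):
    """Mean reciprocal rank over (ranked, relevant) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)

# Two made-up queries: (ranked result ids, set of relevant ids).
queries = [([5, 2, 9], {2}), ([7, 1, 3], {7})]
print(mrr(queries))
```

In this toy data the first query finds its answer at rank 2 and the second at rank 1, so the MRR is the mean of 1/2 and 1.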

Roadmap

See ROADMAP.md for how memstash compares to mem0 / Letta / Zep / Langfuse / LiteLLM / simonw's llm, what it does better, and what's planned next.

Auto-extract memories from conversations
Budget alerts ("you've spent $X today")
Gemini + local Ollama / LM Studio adapters
Export / import memories
Memory scopes
Streaming chat output
LLM-based memory extraction (opt-in, higher recall)
MCP server so any agent can read/write memstash memory
PyPI release (pip install memstash) — automated via tag push
Memory editing & merge / dedupe by similarity

Contributing

Issues and PRs welcome. Run tests with:

pip install 'memstash[dev]'
pytest

License

MIT © zionLyl

Project details

Maintainers

zionfly

Release history

0.15.1 (Jun 1, 2026)

0.15.0 (Jun 1, 2026, this version)

Download files

Source Distribution

memstash-0.15.0.tar.gz (85.1 kB)

Uploaded Jun 1, 2026 Source

Built Distribution

memstash-0.15.0-py3-none-any.whl (70.2 kB)

Uploaded Jun 1, 2026 Python 3

File details

Details for the file memstash-0.15.0.tar.gz.

File metadata

Download URL: memstash-0.15.0.tar.gz
Upload date: Jun 1, 2026
Size: 85.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for memstash-0.15.0.tar.gz
Algorithm	Hash digest
SHA256	`7d18dbea0e3c2b056f82888e855645a721554c7c1b949cf15aedaa704bb77535`
MD5	`1818a27d455abfb71701f596770e8791`
BLAKE2b-256	`50ed840914d0d39a0629e0ecc54ca7628688f2fe58b6a2c59a9f625e2641ba9d`

Provenance

The following attestation bundles were made for memstash-0.15.0.tar.gz:

Publisher: publish.yml on zionLyl/recall

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: memstash-0.15.0.tar.gz
- Subject digest: 7d18dbea0e3c2b056f82888e855645a721554c7c1b949cf15aedaa704bb77535
- Sigstore transparency entry: 1688666581
- Sigstore integration time: Jun 1, 2026
Source repository:
- Permalink: zionLyl/recall@ac760ffde03cd5326164e3f1180917b8f0565a54
- Branch / Tag: refs/tags/v0.15.0
- Owner: https://github.com/zionLyl
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ac760ffde03cd5326164e3f1180917b8f0565a54
- Trigger Event: push

File details

Details for the file memstash-0.15.0-py3-none-any.whl.

File metadata

Download URL: memstash-0.15.0-py3-none-any.whl
Upload date: Jun 1, 2026
Size: 70.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for memstash-0.15.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1d22da3dca2cd17e33aafd7cbd50c84f2ecf14820298649180f1b964e824d388`
MD5	`d5ab5f38e4695c1e5c22ea29e44da23a`
BLAKE2b-256	`04cbdef370a9a90f31926073dedfb3b0755921b5453efcec5f42812c47430639`

Provenance

The following attestation bundles were made for memstash-0.15.0-py3-none-any.whl:

Publisher: publish.yml on zionLyl/recall

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: memstash-0.15.0-py3-none-any.whl
- Subject digest: 1d22da3dca2cd17e33aafd7cbd50c84f2ecf14820298649180f1b964e824d388
- Sigstore transparency entry: 1688666641
- Sigstore integration time: Jun 1, 2026
Source repository:
- Permalink: zionLyl/recall@ac760ffde03cd5326164e3f1180917b8f0565a54
- Branch / Tag: refs/tags/v0.15.0
- Owner: https://github.com/zionLyl
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ac760ffde03cd5326164e3f1180917b8f0565a54
- Trigger Event: push
