Semantic memory server for AI agent teams
Annal
A tool built by tools, for tools.
Early stage — this project is under active development and not yet ready for production use. APIs, config formats, and storage schemas may change without notice. If you're curious, feel free to explore and open issues, but expect rough edges.
Semantic memory server for AI agent teams. Stores, searches, and retrieves knowledge across sessions using pluggable vector backends (ChromaDB or Qdrant) with local ONNX embeddings, exposed as an MCP server.
Designed for multi-agent workflows where analysts, architects, developers, and reviewers need shared institutional memory — decisions made months ago surface automatically when relevant, preventing contradictions and preserving context that no single session can hold.
How it works
Annal runs as a persistent MCP server (stdio or HTTP) and provides tools for storing, searching, updating, and managing memories. Memories are embedded locally using all-MiniLM-L6-v2 (ONNX) and stored in a vector backend (ChromaDB by default, Qdrant optional), namespaced per project. When using Qdrant, hybrid search combines dense vector similarity with BM25 keyword matching via reciprocal rank fusion for better recall.
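Reciprocal rank fusion merges ranked lists without comparing their raw scores. Each document gets the sum of 1 / (k + rank) over every list it appears in. The sketch below shows the general technique with k=60, the constant from the original RRF paper. It is not Annal's or Qdrant's code, and the memory IDs are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists (best first) into one list sorted by RRF score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); a larger k flattens the
            # gap between top-ranked and lower-ranked documents.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["m3", "m1", "m7"]  # e.g. order by embedding similarity
bm25 = ["m1", "m9", "m3"]   # e.g. order by keyword match
fused = reciprocal_rank_fusion([dense, bm25])
print([doc_id for doc_id, _ in fused])  # → ['m1', 'm3', 'm9', 'm7']
```

Documents found by both retrievers (m1, m3) outrank documents found by only one. That is the recall benefit hybrid search aims for.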
File indexing is optional. Point Annal at directories to watch and it will chunk markdown files by heading, track modification times for incremental re-indexing, and keep the store current via watchdog filesystem events. For large repos, file watching can be disabled per-project — agents trigger re-indexing on demand via index_files.
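Here is a minimal sketch of heading-based chunking with an mtime cache for incremental re-indexing. It loosely mirrors two behaviors the roadmap mentions: heading context kept with each chunk, and empty parent-heading chunks skipped. The chunk dict shape, the " > " heading path, and both function names are hypothetical. Annal's real chunker may handle setext headings, front matter, and other edge cases differently.

```python
import os
import re

# ATX heading: 1-6 '#' characters, whitespace, title, optional closing '#'s.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")


def chunk_by_heading(text: str) -> list[dict]:
    """Split markdown into one chunk per heading section, labelled with its heading path."""
    chunks: list[dict] = []
    path: list[tuple[int, str]] = []  # (level, title) stack for the current section
    body: list[str] = []
    in_fence = False

    def flush() -> None:
        content = "\n".join(body).strip()
        if content:  # headings with no text of their own produce no chunk
            chunks.append({"heading": " > ".join(t for _, t in path), "content": content})
        body.clear()

    for line in text.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # '#' lines inside code fences are not headings
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            while path and path[-1][0] >= level:
                path.pop()  # close sibling and deeper sections
            path.append((level, match.group(2)))
        else:
            body.append(line)
    flush()
    return chunks


def files_needing_reindex(paths: list[str], mtime_cache: dict[str, float]) -> list[str]:
    """Return paths whose modification time differs from the cached value, updating the cache."""
    changed = []
    for p in paths:
        mtime = os.path.getmtime(p)
        if mtime_cache.get(p) != mtime:
            changed.append(p)
            mtime_cache[p] = mtime  # a real indexer might update only after indexing succeeds
    return changed
```

For example, "# Guide / ## Empty / ### Deep" yields a chunk labelled "Guide > Empty > Deep", and no chunk for the empty "Empty" section.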
Indexing is non-blocking. init_project and index_files return immediately while reconciliation runs in the background. Agents poll index_status to track progress, which shows elapsed time and chunk counts.
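The return-immediately-then-poll pattern can be sketched with a worker thread and a lock-protected status dict. This is an illustration under assumptions, not Annal's implementation. The class, its methods, and the `index_one` callback, which returns the number of chunks written for a file, are hypothetical.

```python
import threading
import time


class BackgroundIndexer:
    """Run indexing on a worker thread; callers poll status() instead of blocking."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state = {"state": "idle", "chunks": 0, "started_at": None, "finished_at": None}
        self._thread = None

    def start(self, files, index_one) -> bool:
        """Kick off indexing and return at once; refuse to start a second concurrent pass."""
        with self._lock:
            if self._state["state"] == "indexing":
                return False
            self._state = {"state": "indexing", "chunks": 0,
                           "started_at": time.monotonic(), "finished_at": None}
        self._thread = threading.Thread(target=self._run, args=(files, index_one), daemon=True)
        self._thread.start()
        return True

    def _run(self, files, index_one) -> None:
        try:
            for path in files:
                n = index_one(path)  # chunks written for this file
                with self._lock:
                    self._state["chunks"] += n
        finally:  # always leave the "indexing" state, even if a file fails
            with self._lock:
                self._state["state"] = "idle"
                self._state["finished_at"] = time.monotonic()

    def status(self) -> dict:
        """Snapshot of state, chunk count, and elapsed seconds."""
        with self._lock:
            snap = dict(self._state)
        started, finished = snap.pop("started_at"), snap.pop("finished_at")
        if started is not None:
            end = finished if finished is not None else time.monotonic()
            snap["elapsed_s"] = round(end - started, 3)
        return snap

    def join(self, timeout=None) -> None:
        if self._thread is not None:
            self._thread.join(timeout)


indexer = BackgroundIndexer()
indexer.start(["a.md", "b.md"], lambda path: 3)  # stand-in indexer: 3 chunks per file
indexer.join()
status = indexer.status()
print(status["state"], status["chunks"])  # → idle 6
```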
Agent memories and file-indexed content coexist in the same search space but are distinguished by tags (memory, decision, pattern, bug, indexed, etc.), so agents can search everything or filter to just what they need.
A web dashboard (HTMX + Jinja2) runs alongside the server, providing a browser-based view of memories with search, browsing, bulk delete, and live SSE updates when memories are stored or indexing is in progress.
Quick start
pip install annal
# One-shot setup: creates service, configures MCP clients, starts the daemon
annal install
Or from source:
git clone https://github.com/heyhayes/annal.git
cd annal
pip install -e ".[dev]"
# Run in stdio mode (single session)
annal
# Run as HTTP daemon (shared across sessions)
annal --transport streamable-http
annal install detects your OS and sets up the appropriate service (systemd on Linux, launchd on macOS, scheduled task on Windows). It also writes MCP client configs for Claude Code, Codex, and Gemini CLI.
MCP client integration
Claude Code
Add to ~/.mcp.json for stdio mode:
{
  "mcpServers": {
    "annal": {
      "command": "/path/to/annal/.venv/bin/annal"
    }
  }
}
For HTTP daemon mode (recommended when running multiple concurrent sessions):
{
  "mcpServers": {
    "annal": {
      "type": "http",
      "url": "http://localhost:9200/mcp"
    }
  }
}
Codex / Gemini CLI
annal install writes the appropriate config files automatically. See annal install output for paths.
Agent configuration
For agents to actually use Annal, they need instructions that explain why it matters, not just how to call it. Add one of these snippets to your CLAUDE.md, AGENT.md, or equivalent agent instructions file.
Recommended snippet
<annal_semantic_memory>
You have persistent semantic memory via Annal (mcp__annal__* tools). Memories survive across
sessions and are searchable by meaning. This is your long-term memory — MEMORY.md is a cheat
sheet, Annal is deep storage.
Why this matters: every session starts blank. Without Annal, you repeat investigations,
rediscover patterns, and miss prior decisions. With it, you inherit your past self's
understanding of the codebase.
When to search (use mode="probe" to scan, then expand_memories for details):
- Session start: load context for the current task area
- Unfamiliar code: before diving into a module you haven't seen this session
- "What happened" questions: anything about recent work, prior decisions, project state
- Before architectural changes: check for prior decisions in the same domain
- Familiar-feeling bugs: search for prior root causes
When to store (tag with type + domain, e.g. tags=["decision", "auth"]):
- Bug root causes and the fix that worked
- Architectural decisions and their rationale
- Codebase patterns that took effort to discover
- User preferences for workflow, tools, style
- Key file paths and module responsibilities in unfamiliar codebases
After completing a task, before moving on, always ask: what did I learn that I'd want to know
next time? If you discovered a root cause, mapped unfamiliar architecture, or found a pattern
that took effort — store it. This is the single most important habit for cross-session value.
Project name: use the basename of the current working directory.
</annal_semantic_memory>
Minimal snippet
If you prefer something shorter:
<annal_semantic_memory>
You have persistent semantic memory via Annal (mcp__annal__* tools). Unlike MEMORY.md which
resets with context, Annal memories survive across sessions and are searchable by meaning.
This matters because you lose all context when a session ends. Annal is how you recover it.
Search before starting work — your past self may have already mapped the architecture,
debugged this module, or recorded a decision that saves you from repeating the investigation.
Search: at session start, when touching unfamiliar code, when the user asks "what did we
decide about X", and before proposing architectural changes. Use mode="probe" to scan cheaply.
Store: bug root causes, architectural decisions, codebase patterns, surprising discoveries —
anything you'd want to know if you started a fresh session tomorrow. Tag with a type
(decision, bug, pattern, memory) plus domain tags. After completing a task, always ask: what
did I learn? Store it before moving on.
Project name: use the basename of the current working directory.
</annal_semantic_memory>
Project setup
On first use, call init_project with watch paths for file indexing, or just start storing memories — unknown projects are auto-registered in the config.
init_project(project_name="myapp", watch_paths=["/home/user/projects/myapp"])
Every tool takes a project parameter. Use the directory name of the codebase you're working in (e.g. "myapp", "annal").
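Outside an agent, you can also call these tools from a script through any MCP client. The sketch below uses the official MCP Python SDK (`pip install mcp`) over stdio and assumes `annal` is on PATH. The init_project and store_memory argument names come from this page. The `query` argument to search_memories is an assumption, so check the schemas returned by `session.list_tools()`. The memory content and source label are illustrative.

```python
import os


def project_name_for(path: str) -> str:
    """Project naming convention from the agent snippet: basename of the working directory."""
    return os.path.basename(os.path.normpath(path))


def init_project_args(path: str) -> dict:
    # Argument names follow the init_project example above.
    return {"project_name": project_name_for(path), "watch_paths": [os.path.abspath(path)]}


async def main(path: str) -> None:
    # Imported here so the helpers above work without the SDK installed.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    project = project_name_for(path)
    params = StdioServerParameters(command="annal", args=[])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            await session.call_tool("init_project", arguments=init_project_args(path))
            await session.call_tool("store_memory", arguments={
                "project": project,
                "content": "Auth tokens are validated in middleware, not per route.",  # illustrative
                "tags": ["decision", "auth"],
                "source": "example-script",  # hypothetical source label
            })
            # "query" is an ASSUMED parameter name for the search text.
            result = await session.call_tool("search_memories", arguments={
                "project": project, "query": "where are auth tokens checked?", "mode": "probe",
            })
            for block in result.content:
                print(getattr(block, "text", block))
```

Run it from a project directory with `asyncio.run(main(os.getcwd()))`.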
Tools
store_memory — Store knowledge with tags and source attribution. Near-duplicates (>95% similarity) are automatically skipped. When a similar memory is found (80-95% similarity), a hint suggests using the supersedes parameter to replace it. Pass supersedes=<old_id> to mark the old memory as replaced — it drops out of search but remains for audit.
store_batch — Store multiple memories in a single call. Each item takes the same fields as store_memory (content, tags, source, supersedes). More efficient than repeated store_memory when storing 2+ memories at once.
search_memories — Natural language search with optional tag filtering and similarity scores. Three output modes: mode="summary" (default) returns first 200 chars with metadata, mode="probe" returns compact one-line summaries for scanning large result sets, and mode="full" returns complete content. Optional min_score filter suppresses low-relevance noise. Tags use fuzzy matching (semantic similarity) so tags=["auth"] finds memories tagged authentication. Temporal filtering with after and before (ISO 8601 dates) scopes results by creation date. Optional projects parameter enables cross-project search (projects="*" searches all configured projects). Pass include_superseded=True to surface replaced memories. Returned results include hit_count and last_accessed_at for access tracking.
expand_memories — Retrieve full content for specific memory IDs. Use after a probe search to fetch details for relevant results.
update_memory — Revise content, tags, or source on an existing memory without losing its ID or creation timestamp. Tracks updated_at alongside the original.
retag_memory — Modify tags on an existing memory without changing content. Supports add_tags/remove_tags for incremental edits, or set_tags to replace all tags at once.
delete_memory — Remove a specific memory by ID.
prune_stale — Review and delete stale agent memories. Identifies memories with last_accessed_at older than max_age_days (default 60) and optionally those never accessed. Runs in dry_run=True mode by default, returning a summary of what would be deleted. Set dry_run=False to execute deletion. Only targets agent memories — file-indexed chunks are managed by the file watcher.
list_topics — Show all tags and their frequency counts.
init_project — Register a project with watch paths, patterns, and exclusions for file indexing. Indexing starts in the background and returns immediately.
index_files — Full re-index: clears all file-indexed chunks and re-indexes from scratch. Use after changing exclude patterns to remove stale chunks.
index_status — Per-project diagnostics: total chunks, file-indexed vs agent memory counts, stale and never-accessed memory counts, indexing state with elapsed time, and last reconcile timestamp.
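The store_memory thresholds and the prune_stale selection rule can be sketched as two small functions. Assumptions: "similarity" means cosine similarity between embeddings, the 0.80 bound is inclusive, and memories are plain dicts with a hypothetical `id` key plus the documented `last_accessed_at` field. Whether prune_stale includes never-accessed memories by default is not stated above, so here it is opt-in. The example vectors are 2-dimensional for readability; real all-MiniLM-L6-v2 embeddings have 384 dimensions.

```python
import math
from datetime import datetime, timedelta, timezone


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_new_memory(new_vec, existing):
    """Apply the store_memory thresholds to the closest existing memory.

    existing maps memory ID -> vector (hypothetical shape). Returns
    ("skip", id) above 0.95, ("hint_supersede", id) for 0.80-0.95, else ("store", None).
    """
    if not existing:
        return ("store", None)
    best_id, best = max(
        ((mem_id, cosine(new_vec, vec)) for mem_id, vec in existing.items()),
        key=lambda pair: pair[1],
    )
    if best > 0.95:
        return ("skip", best_id)  # near-duplicate: not stored
    if best >= 0.80:
        return ("hint_supersede", best_id)  # suggest supersedes=best_id
    return ("store", None)


def select_stale(memories, max_age_days=60, include_never_accessed=False, now=None):
    """Pick prune candidates: last access older than max_age_days, optionally never accessed."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for mem in memories:
        last = mem.get("last_accessed_at")  # timezone-aware datetime or None
        if last is None:
            if include_never_accessed:
                stale.append(mem["id"])
        elif last < cutoff:
            stale.append(mem["id"])
    return stale
```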
Configuration
~/.annal/config.yaml:
data_dir: ~/.annal/data
port: 9200
projects:
  myapp:
    watch_paths:
      - /home/user/projects/myapp
    watch_patterns:
      - "**/*.md"
      - "**/*.yaml"
      - "**/*.toml"
      - "**/*.json"
    watch_exclude:
      - "**/node_modules/**"
      - "**/vendor/**"
      - "**/.git/**"
      - "**/.venv/**"
      - "**/__pycache__/**"
      - "**/dist/**"
      - "**/build/**"
  large-repo:
    watch: false  # disable file watching, use index_files on demand
    watch_paths:
      - /home/user/projects/large-repo
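This page does not spell out how watch_patterns and watch_exclude are matched. The sketch below uses the common convention: `**/` spans zero or more directories, `*` stays within one path segment, and a file is indexed if it matches a pattern and no exclude. Both functions are hypothetical, and paths are assumed to be relative to a watch path with "/" separators. Annal's matcher may behave differently.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a watch glob into a regex where ** spans directories and * does not."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("/**", i) and i + 3 == len(pattern):
            out.append("(?:/.*)?")  # everything below this directory
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # any characters within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def should_index(rel_path: str, patterns: list[str], excludes: list[str]) -> bool:
    """True if rel_path matches some watch pattern and no exclude pattern."""
    def hits(globs: list[str]) -> bool:
        return any(glob_to_regex(g).fullmatch(rel_path) for g in globs)
    return hits(patterns) and not hits(excludes)


patterns = ["**/*.md", "**/*.yaml"]
excludes = ["**/node_modules/**", "**/.git/**"]
print(should_index("docs/guide.md", patterns, excludes))               # → True
print(should_index("node_modules/pkg/README.md", patterns, excludes))  # → False
```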
Backend configuration
By default, Annal uses ChromaDB (local, file-based, no extra dependencies). To switch to Qdrant for native tag filtering, hybrid BM25+vector search, and concurrent write support, add a storage section:
storage:
  backend: qdrant
  backends:
    qdrant:
      url: http://localhost:6333
      hybrid: true  # enable BM25 sparse vectors (default: true)
    chromadb:
      path: ~/.annal/data  # kept for migration
Install the Qdrant client dependency: pip install "annal[qdrant]" (the quotes stop shells such as zsh from treating the brackets as a glob pattern).
To migrate existing data between backends: annal migrate --from chromadb --to qdrant --project myapp
Export / Import
Back up and restore memories as JSONL:
annal export --project myapp > backup.jsonl
annal import --project myapp backup.jsonl
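JSONL stores one JSON object per line, so you can inspect a backup with a few lines of Python before re-importing it. The sketch below tallies tags across an export. The `tags` field name, and the assumption that it holds a list of strings, describe Annal's export schema and are unverified, so check a real export first. Both function names are hypothetical.

```python
import json
from collections import Counter


def summarize_backup(lines) -> Counter:
    """Count tags across JSONL records (one JSON object per non-blank line)."""
    counts: Counter = Counter()
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: not valid JSON ({exc.msg})") from exc
        counts.update(record.get("tags", []))  # ASSUMED field name and list shape
    return counts


def print_top_tags(path: str, limit: int = 10) -> None:
    """Print the most common tags in an export file such as backup.jsonl."""
    with open(path, encoding="utf-8") as fh:
        for tag, n in summarize_backup(fh).most_common(limit):
            print(f"{n:5d}  {tag}")
```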
Running as a daemon
The recommended approach is annal install, which sets up the service for your OS automatically.
For manual setup, use the service scripts in contrib/:
Linux (systemd)
cp contrib/annal.service ~/.config/systemd/user/
# Edit ExecStart path, then:
systemctl --user daemon-reload
systemctl --user enable --now annal
macOS (launchd)
cp contrib/com.annal.server.plist ~/Library/LaunchAgents/
# Edit the ProgramArguments path, then:
launchctl load ~/Library/LaunchAgents/com.annal.server.plist
Windows (scheduled task)
.\contrib\annal-service.ps1 -Action install -AnnalPath "C:\path\to\annal\.venv\Scripts\annal.exe"
Start-ScheduledTask -TaskName "Annal MCP Server"
Dashboard
When running as an HTTP daemon, the dashboard is available at http://localhost:9200. It provides:
- Memory browsing with pagination and filters (by type, source, tags)
- Semantic search with cross-project support
- Expandable content previews
- Bulk delete by selection or filter
- A "Show superseded" toggle for viewing replaced memories
- A "Show stale only" filter that surfaces memories not accessed in 60+ days or never accessed
- Live SSE updates when memories are stored or deleted, or while indexing is in progress
Clickable tag pills in the table jump to filtered views. The project overview table shows stale memory counts per project, with links to the filtered view.
Disable with --no-dashboard if not needed.
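The dashboard's live updates use Server-Sent Events: a text/event-stream of `event:` and `data:` lines, with each event ended by a blank line. A script could follow the same stream with a small parser like the one below. The parser follows the SSE line format. The endpoint path and event names the dashboard emits are not documented here, so the example URL, the event names in the test data, and `follow_dashboard` are hypothetical.

```python
from urllib.request import urlopen


def parse_sse(lines):
    """Yield (event, data) pairs from text/event-stream lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # blank line: dispatch the buffered event, if it has data
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):  # comment / keep-alive
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)  # multiple data lines are joined with newlines
        # "id" and "retry" fields are ignored in this sketch


def follow_dashboard(url: str) -> None:
    """Print events from an SSE endpoint, e.g. a HYPOTHETICAL http://localhost:9200/events."""
    with urlopen(url) as resp:
        for event, data in parse_sse(raw_line.decode("utf-8") for raw_line in resp):
            print(event, data[:80])
```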
Roadmap
0.1.0 — Foundation (shipped)
Core memory store, semantic search, file indexing, MCP server, web dashboard, one-shot install.
0.2.0 — Operational Readiness (shipped)
Async indexing, thread safety, index_status diagnostics, mtime cache performance, optional file watching.
0.3.0 — Search & Retrieval (shipped)
Temporal filtering, structured JSON output, heading context in embeddings.
0.4.0 — Bug Sweep + Features (shipped)
Six bug fixes (date filter, dual config, startup lock, pool lock safety, browse pagination, config I/O under lock). Fuzzy tag matching via ONNX embeddings. Cross-project search with fan-out and score-based merge.
0.5.0 — Stress-Test Bug Sweep (shipped)
Seven fixes from stress testing: min_score no longer masks fuzzy tag matches, cross-project search always includes primary project, empty parent heading chunks skipped, invalid dates raise errors instead of silently returning empty, dedup checks all agent-memory candidates, daemon threads joined on shutdown, fuzzy tag threshold lowered to 0.72.
0.6.0 — Vector Backend Abstraction + Qdrant (shipped)
VectorBackend protocol with pluggable backends. ChromaDB extracted behind protocol. QdrantBackend with native tag filtering, hybrid BM25+vector search via RRF, deterministic UUID mapping. Config-driven backend selection. Migration CLI (annal migrate).
0.6.1 — Retag + Dashboard UX (shipped)
retag_memory tool for incremental tag editing. Dashboard improvements: project overview table, clickable tag pills, cross-project search. Search default mode changed from full to summary.
0.6.2 — Hardening + Export/Import (shipped)
Export/import CLI (annal export, annal import) for JSONL-based backup and restore. Bug fixes for dedup, tag normalization, and startup reconciliation. Backend conformance improvements.
0.6.3 — Memory Supersession (shipped)
supersedes parameter on store_memory marks old memories as replaced. Superseded memories hidden from search/browse by default, visible with include_superseded=True. $not_exists post-filter operator for both backends. Similarity hints (0.80-0.95) suggest supersession to agents. Dashboard "Show superseded" toggle. Backend conformance test suite extracted into parametrized shared tests.
0.7.0 — Search Improvements & Stale Memory Management (shipped)
store_batch tool for efficient multi-memory storage. Hit tracking on agent memories — search_memories, expand_memories, and get_by_ids record hit_count and last_accessed_at. Overfetch cap on tag-filtered searches to bound post-filter work. prune_stale tool for reviewing and deleting memories that haven't been accessed in a configurable number of days (dry-run by default). index_status now reports stale and never-accessed memory counts. Dashboard stale column on project overview, "Show stale only" filter on the memories page, and stale/never-accessed badges on memory rows. GitHub Actions CI and PyPI publish workflows.
Future
Proactive context injection. Memory relationships beyond supersession.
Development
pip install -e ".[dev]"
pytest -v
266 tests cover store operations, search, hit tracking, stale detection, supersession, batch storage, indexing, file watching, dashboard routes, SSE events, CLI installation, export/import, migration, and a shared backend conformance suite that runs against both ChromaDB and Qdrant.
License
MIT — see LICENSE.
Download files
File details
Details for the file annal-0.7.1.tar.gz.
File metadata
- Download URL: annal-0.7.1.tar.gz
- Upload date:
- Size: 184.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3c952c492edc044055a645135bba5e157ad0349eab8e329da0747a5b71c539fb |
| MD5 | 2ed2195b87b55a1724d2be341ab779cd |
| BLAKE2b-256 | 7e2339eb90ded44b9f8648ee32d93fea31313adb49880cff241147a13d83fd5e |
Provenance
The following attestation bundles were made for annal-0.7.1.tar.gz:
Publisher: publish.yml on heyhayes/annal
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: annal-0.7.1.tar.gz
- Subject digest: 3c952c492edc044055a645135bba5e157ad0349eab8e329da0747a5b71c539fb
- Sigstore transparency entry: 983764731
- Sigstore integration time:
- Permalink: heyhayes/annal@18ab82ccfd830a27f1c73470c45ed4b9adabe6a6
- Branch / Tag: refs/tags/v0.7.1
- Owner: https://github.com/heyhayes
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@18ab82ccfd830a27f1c73470c45ed4b9adabe6a6
- Trigger Event: push
File details
Details for the file annal-0.7.1-py3-none-any.whl.
File metadata
- Download URL: annal-0.7.1-py3-none-any.whl
- Upload date:
- Size: 58.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 14274cce9566fbc69e9efb74e29257da8e57d2d483f6083d8a5a99a6f535d72c |
| MD5 | 9456df397bad9791a53e0c8483298fca |
| BLAKE2b-256 | f6eb9f7ecd2d7b659f28ed53dc230d384025799dc6bd9a6fe5dc0531dcbbab93 |
Provenance
The following attestation bundles were made for annal-0.7.1-py3-none-any.whl:
Publisher: publish.yml on heyhayes/annal
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: annal-0.7.1-py3-none-any.whl
- Subject digest: 14274cce9566fbc69e9efb74e29257da8e57d2d483f6083d8a5a99a6f535d72c
- Sigstore transparency entry: 983764735
- Sigstore integration time:
- Permalink: heyhayes/annal@18ab82ccfd830a27f1c73470c45ed4b9adabe6a6
- Branch / Tag: refs/tags/v0.7.1
- Owner: https://github.com/heyhayes
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@18ab82ccfd830a27f1c73470c45ed4b9adabe6a6
- Trigger Event: push