recall. 🧠
Better contextual retrieval for AI agents. Three-path RRF retrieval (ANN + keyword SQL JOIN + FTS5) in pure SQLite. No LLM at query time. ~80ms latency. 1400 real memories.
from recall import retrieve_relevant
store.add("User prefers docker-compose over Dockerfile")
results = retrieve_relevant("How should I deploy?", store)
# → "User prefers docker-compose over Dockerfile"
Prerequisites: Embedding Model
recall. uses nomic-embed-text-v1.5 (768-dim) running in LM Studio. No LLM — embedding models are tiny (~150MB), fast, and cost zero tokens.
1. Install LM Studio
Download from lmstudio.ai.
2. Load the embedding model
| Step | Screenshot / Cmd |
|---|---|
| Open LM Studio → Models tab | — |
| Search nomic-embed-text-v1.5 → Download | ~150MB |
| Switch to Local Inference Server tab | — |
| Select nomic-embed-text-v1.5 in the model dropdown | — |
| Click Start Server | port defaults to 1234 |
| Verify it's working: | curl http://127.0.0.1:1234/v1/models |
Expected response:
{"object":"list","data":[{"id":"nomic-embed-text-v1.5","object":"model",...}]}
That's it. No API keys, no cloud services, no GPU required beyond what LM Studio needs (~2GB VRAM, also runs on CPU).
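The same check can be scripted. This is a minimal sketch that assumes the server answers with the OpenAI-style list shown above; `model_ids` and `list_models` are hypothetical helper names and are not part of recall.

```python
import json
import urllib.request


def model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(port: int = 1234, timeout: float = 2.0) -> list[str]:
    """Query the local LM Studio server; raises OSError if it is unreachable."""
    url = f"http://127.0.0.1:{port}/v1/models"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return model_ids(json.load(resp))


# Offline demonstration using the response shape documented above.
sample = {"object": "list", "data": [{"id": "nomic-embed-text-v1.5", "object": "model"}]}
print(model_ids(sample))
```

Calling `list_models()` with LM Studio running should include `nomic-embed-text-v1.5` in the returned list.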
Port configuration
Default port is 1234. To change it, set EMBED_PORT in src/recall/embed.py:
EMBED_PORT = 1234 # change to match your LM Studio port
If LM Studio is down
Graceful degradation kicks in automatically:
- retrieval (mcp_recall_recall / recall query) falls back to keyword + FTS5 search: no crash, just no ANN path
- storage (mcp_recall_store_memory / recall add) saves memories without embeddings: still findable via keywords
- CLI (recall add/stats/delete) unaffected: doesn't use embeddings at all
No error, no crash, no data loss. Just slightly less precise results.
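The fallback described above can be pictured roughly as follows. This is an illustrative sketch, not recall's actual code: `embed_or_none`, `active_paths`, and `offline_embed` are hypothetical names, and the assumption is that an unreachable server surfaces as an `OSError` (which covers connection-refused and timeout errors in Python).

```python
from typing import Callable, Optional, Sequence


def embed_or_none(text: str,
                  embed: Callable[[str], Sequence[float]]) -> Optional[Sequence[float]]:
    """Return an embedding, or None when the embedding server is unreachable."""
    try:
        return embed(text)
    except OSError:  # connection refused, timeout, ...
        return None


def active_paths(vector: Optional[Sequence[float]]) -> list[str]:
    """Pick retrieval paths; the ANN path is dropped without a query vector."""
    paths = ["keyword", "fts5"]
    if vector is not None:
        paths.insert(0, "ann")
    return paths


def offline_embed(text: str) -> Sequence[float]:
    """Stand-in for an embedding call made while LM Studio is down."""
    raise ConnectionRefusedError("LM Studio is not running")


print(active_paths(embed_or_none("How to deploy?", offline_embed)))
print(active_paths(embed_or_none("How to deploy?", lambda t: [0.0] * 768)))
```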
Quick start
pip install numpy
pip install -e .
recall add "User prefers docker-compose for local dev"
recall query "How to deploy?"
Or via MCP server for Hermes Agent / Antigravity IDE / Gemini CLI:
Hermes (local install)
If recall is installed in the same Python env as Hermes:
# ~/.hermes/config.yaml
mcp_servers:
  recall:
    command: "python"
    args: ["-m", "recall.recall_mcp"]
    timeout: 30
    cwd: "/path/to/recall-memory" # optional, needed for DB path resolution
Hermes (Docker)
# ~/.hermes/config.yaml
mcp_servers:
  recall:
    command: docker
    args:
      - run
      - -i
      - --rm
      - --network=host
      - -v
      - recall-data:/data
      - recall-memory:latest
    timeout: 30
Build the image first:
cd /path/to/recall-memory
docker compose build
Architecture
store.py — SQLite backend + tier management
embed.py — Nomic Embed via LM Studio REST API (768-dim)
retrieve.py — Three-path RRF retrieval + tier router
cli.py — Typer CLI (add / query / stats / delete / gc)
recall_mcp.py — MCP server for agent integration
Tiered Storage (v0.2.0+)
Memories are split into three tiers to reduce compute and memory:
| Tier | Capacity | Retrieval | Compute Cost |
|---|---|---|---|
| Hot | ~500 | ANN + keywords + FTS5 (3-path RRF) | Highest |
| Warm | ~5000 | keywords + FTS5 only (2-path RRF) | Medium |
| Cold | Unlimited | Not indexed, fill-gap fallback only | ~Zero |
- Hot: full vectors in ANN index. Fastest search.
- Warm: keyword/FTS5 only, no vectors. 66–99% less ANN work.
- Cold: doesn't participate in normal queries. Only searched when hot+warm results are insufficient.
Promotion/demotion is automatic based on access frequency. Cold memories are sampled every N queries for keyword overlap—if relevant, they're promoted back to warm. No cron, no UI, no configuration needed.
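The routing order across tiers can be sketched as below. This is a simplified illustration under stated assumptions, not the library's router: the per-tier search functions are hypothetical callables returning memory ids, results are merged by simple de-duplication rather than RRF, and "insufficient" is taken to mean fewer than `k` results.

```python
from typing import Callable

Search = Callable[[str, int], list[str]]


def route(query: str, k: int, hot: Search, warm: Search, cold: Search,
          include_cold: bool = False) -> list[str]:
    """Search hot and warm; touch cold only to fill a gap or when asked to."""
    results: list[str] = []

    def extend(found: list[str]) -> None:
        for mem_id in found:
            if mem_id not in results:
                results.append(mem_id)

    extend(hot(query, k))
    extend(warm(query, k))
    if include_cold or len(results) < k:
        extend(cold(query, k))
    return results[:k]


# Fake tiers for demonstration only.
hot = lambda q, k: ["m1"]
warm = lambda q, k: ["m1", "m2"]
cold = lambda q, k: ["m9"]

print(route("deploy", 2, hot, warm, cold))  # hot + warm are enough
print(route("deploy", 3, hot, warm, cold))  # cold fills the gap
```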
Three parallel retrieval paths, fused via RRF (Reciprocal Rank Fusion):
Path V: Vector search (ANN) — sqlite-vec cosine similarity (hot tier only)
Path K: Keyword SQL JOIN — multi-hop keyword expansion (all tiers)
Path F: FTS5 full-text search — porter tokenizer + unicode61 (all tiers)
Tier router → hot 3-path → warm 2-path → cold fill-gap
No LLM calls at query time. No vector database. Just SQLite.
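For readers unfamiliar with RRF: each path contributes `1 / (k + rank)` for every item it returns, and the contributions are summed per item. The sketch below is a generic implementation of that formula, not recall's own code; `rrf_fuse` is a hypothetical name, and `k = 60` is the constant commonly used in the IR literature (whether recall uses the same value is an assumption).

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked id lists with Reciprocal Rank Fusion, best score first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, mem_id in enumerate(ranking, start=1):
            scores[mem_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_path = ["m3", "m1", "m7"]   # Path V
keyword_path = ["m1", "m4"]        # Path K
fts_path = ["m1", "m3"]            # Path F

fused = rrf_fuse([vector_path, keyword_path, fts_path])
print([mem_id for mem_id, _ in fused])  # → ['m1', 'm3', 'm4', 'm7']
```

Because only ranks are used, the paths need no score normalization or per-path weights: `m1` wins here simply because all three paths rank it near the top.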
Installation
Dependencies
| Dependency | Required? | Notes |
|---|---|---|
| Python ≥3.10 | ✅ | — |
| numpy | ✅ | Cosine similarity + vector ops |
| typer | ✅ | CLI interface |
| sqlite-vec | ✅ | SQLite extension for ANN |
| LM Studio (port 1234) | ✅ | Runs nomic-embed-text-v1.5. See Prerequisites above. |
| pytest | ❌ | Only needed for development (pip install -e ".[dev]") |
| sentence-transformers | ❌ | Not used. The actual embedding calls go through LM Studio's HTTP API. |
pip install numpy
pip install -e . # installs the package (published as recall-sqlite) + pulls sqlite-vec
Verify installation
recall stats
# → Memories: 0 Keywords: 0
CLI
recall add "content" # Store a memory
recall query "question" # Retrieve relevant memories (tiered)
recall query "question" --include-cold # Search cold tier too
recall stats # Store statistics
recall stats --verbose # + tier distribution
recall gc --dry-run # Preview eviction candidates
recall gc # Run garbage collection
recall delete <id> # Remove a memory
MCP Tools (Hermes / Antigravity / Gemini)
Four tools exposed via stdio MCP transport:
| Tool | Parameters | Returns |
|---|---|---|
| recall | query: str (required), k: int (default 5), include_cold: bool (default false) | {memories: [...], count: int} |
| store_memory | content: str (required), session_id: str, tag: str | {id: str, status: "stored"} |
| memory_stats | (none) | {memories: int, keywords: int, tiers: {hot, warm, cold}} |
| gc_memory | dry_run: bool (default false) | {evicted / candidates: int, db_size_mb: float} |
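On the wire, an MCP client invokes these tools with JSON-RPC 2.0 `tools/call` requests. The sketch below only builds such a request for the `recall` tool using the parameters listed in the table; `tool_call` is a hypothetical helper, and a real client would first complete the MCP `initialize` handshake, which is omitted here.

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as one line of JSON-RPC 2.0."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


line = tool_call(1, "recall", {"query": "How should I deploy?", "k": 5, "include_cold": False})
print(line)
```

Agent hosts such as the ones listed above normally generate these messages themselves, so this is mainly useful for debugging the server by hand.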
Status
Production-ready MVP with tiered storage (v0.2.0). Tested against AIngram (tied on 1400 memories × 40 queries).
Memories: 1400 (from Honcho)
Keywords: 10560
Latency: ~80ms/query (hot), ~60ms/query (warm fill-gap)
ANN scan: -66% (now) → -99% (at 50K memories)
Memory: ~1.5MB fixed for hot tier vs linear growth
Eval: recall@5 comparable to AIngram with full extractor
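The ANN scan figures follow from the hot-tier cap: only about 500 vectors are ever scanned, however many memories exist. A quick back-of-the-envelope check, assuming a hot capacity of exactly 500 (the tier table gives it as approximate, so the result lands near the quoted figure rather than exactly on it); `ann_scan_reduction` is a hypothetical helper:

```python
def ann_scan_reduction(total_memories: int, hot_capacity: int = 500) -> float:
    """Fraction of memories the ANN path no longer has to scan."""
    if total_memories <= hot_capacity:
        return 0.0
    return 1.0 - hot_capacity / total_memories


for total in (1_400, 50_000):
    print(f"{total} memories: {ann_scan_reduction(total):.0%} fewer vectors scanned")
```

With these assumptions the reduction is roughly 64% at 1,400 memories and 99% at 50,000.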
Upgrading
From v0.1.x to v0.2.0
pip install --upgrade recall-sqlite
Schema migration is automatic — SQLite ALTER TABLE runs on first start.
No manual steps needed. Your existing memories are preserved and will start
in the "hot" tier.
To verify:
recall stats --verbose
# Should show the same memory count with tier distribution
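An automatic migration of this kind is typically an idempotent `ALTER TABLE ... ADD COLUMN` guarded by a schema check. The sketch below shows the general pattern only; the table name `memories`, the column name `tier`, and the function `ensure_tier_column` are assumptions for illustration and may not match recall's real schema.

```python
import sqlite3


def ensure_tier_column(conn: sqlite3.Connection, table: str = "memories") -> bool:
    """Add a `tier` column defaulting to 'hot' if missing. True if it was added."""
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if "tier" in columns:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN tier TEXT NOT NULL DEFAULT 'hot'")
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT)")
conn.execute("INSERT INTO memories (content) VALUES ('User prefers docker-compose')")

print(ensure_tier_column(conn))  # first start: column is added
print(ensure_tier_column(conn))  # later starts: nothing to do
print(conn.execute("SELECT content, tier FROM memories").fetchall())
```

Existing rows pick up the column default, which is consistent with pre-upgrade memories starting in the hot tier.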
Rollback
pip install recall-sqlite==0.1.0
Design decisions
| Decision | Rationale |
|---|---|
| Three-path RRF | ANN + SQL JOIN + FTS5 covers different failure modes |
| No LLM re-rank | Extra latency + cost; not needed for retrieval quality |
| SQLite first | Zero-deployment, portable, git-committable |
| Nomic embed via LM Studio | 768-dim, better than MiniLM, no Python packaging hell |
| RRF fusion | No weight tuning needed; standard IR technique |
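As one illustration of the "different failure modes" point, a stemming FTS5 index can match a word form that an exact token comparison misses. The sketch below uses only Python's standard `sqlite3` module and assumes the bundled SQLite was compiled with FTS5 (true for most builds); the table name `mem_fts` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Porter stemming layered on the unicode61 tokenizer, as named for Path F.
conn.execute(
    "CREATE VIRTUAL TABLE mem_fts USING fts5(content, tokenize = 'porter unicode61')"
)
conn.execute(
    "INSERT INTO mem_fts (content) VALUES ('User runs containers with docker-compose')"
)

text = conn.execute("SELECT content FROM mem_fts").fetchone()[0]

# Exact token match: 'container' is not literally present ('containers' is).
exact_hit = "container" in text.lower().split()

# FTS5 with stemming: both forms reduce to the same stem, so the row matches.
fts_hits = conn.execute(
    "SELECT content FROM mem_fts WHERE mem_fts MATCH ?", ("container",)
).fetchall()

print(exact_hit, len(fts_hits))
```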
Comparison with AIngram
| System | R@5 (40 mems) | R@5 (1400 mems) | Latency |
|---|---|---|---|
| recall. | 0.579 | ~0.58 | ~80ms |
| AIngram | 0.583 | ~0.58 | ~27ms |
Both systems tie when run on the same embedding model. recall.'s advantage is its three-path architecture (AIngram falls back to two paths when its extractor is unavailable).
License
Apache 2.0