recall. 🧠

Better contextual retrieval for AI agents. Three-path RRF retrieval (ANN + keyword SQL JOIN + FTS5) in pure SQLite. No LLM at query time. About 80ms per query, measured on a store of 1400 real memories.

from recall import retrieve_relevant

# `store` is an already-initialized recall memory store (setup not shown here)
store.add("User prefers docker-compose over Dockerfile")
results = retrieve_relevant("How should I deploy?", store)
# → "User prefers docker-compose over Dockerfile"

Prerequisites: Embedding Model

recall. uses nomic-embed-text-v1.5 (768-dim) running in LM Studio. No LLM — embedding models are tiny (~150MB), fast, and cost zero tokens.

1. Install LM Studio

Download from lmstudio.ai.

2. Load the embedding model

  • Open LM Studio → Models tab
  • Search nomic-embed-text-v1.5 → Download (~150MB)
  • Switch to the Local Inference Server tab
  • Select nomic-embed-text-v1.5 in the model dropdown
  • Click Start Server (port defaults to 1234)
  • Verify it's working: curl http://127.0.0.1:1234/v1/models

Expected response:

{"object":"list","data":[{"id":"nomic-embed-text-v1.5","object":"model",...}]}

That's it. No API keys, no cloud services. A GPU is optional: LM Studio needs about 2GB of VRAM when it uses one, and it also runs on CPU.
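The same check can be scripted. This is a minimal sketch, assuming the server answers at the default address with the OpenAI-style list shown above; `list_model_ids` and `fetch_model_ids` are illustrative helpers and not part of recall.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:1234/v1"  # LM Studio default port; adjust if you changed it


def list_model_ids(payload: dict) -> list[str]:
    """Extract model ids from a /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """Query the running server; raises OSError (e.g. URLError) if it is unreachable."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return list_model_ids(json.load(resp))


# Usage, with LM Studio running:
#   "nomic-embed-text-v1.5" in fetch_model_ids()
```

Keeping the parsing separate from the HTTP call makes the check easy to exercise without a running server.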

Port configuration

Default port is 1234. To change it, set EMBED_PORT in src/recall/embed.py:

EMBED_PORT = 1234  # change to match your LM Studio port

If LM Studio is down

Graceful degradation kicks in automatically:

  • Retrieval (mcp_recall_recall / recall query) falls back to keyword + FTS5 search: no crash, just no ANN path.
  • Storage (mcp_recall_store_memory / recall add) saves memories without embeddings; they remain findable via keywords.
  • Other CLI commands (recall stats / recall delete) are unaffected because they don't use embeddings.

No error, no crash, no data loss. Just slightly less precise results.
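The fallback described above can be pictured roughly as follows. This is a sketch under assumptions, not recall's actual code: `select_paths` and the path labels are hypothetical names, and the embedding call is passed in as a plain function.

```python
from typing import Callable, Optional, Sequence


def select_paths(
    query: str,
    embed: Callable[[str], Sequence[float]],
) -> tuple[Optional[Sequence[float]], list[str]]:
    """Return (query_vector, active_paths); drop the ANN path if embedding fails."""
    try:
        vector = embed(query)
    except OSError:
        # Connection refused, timeout, etc.: the embedding server is unreachable.
        return None, ["keyword", "fts5"]
    return vector, ["ann", "keyword", "fts5"]


def embed_down(_text: str) -> Sequence[float]:
    """Stand-in for an embedding call made while LM Studio is stopped."""
    raise ConnectionRefusedError("LM Studio is not running")


vector, paths = select_paths("How to deploy?", embed_down)
```

Catching `OSError` covers refused connections and socket timeouts, so a stopped server only narrows the set of paths instead of raising to the caller.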


Quick start

pip install numpy
pip install -e .

recall add "User prefers docker-compose for local dev"
recall query "How to deploy?"

Or via MCP server for Hermes Agent / Antigravity IDE / Gemini CLI:

Hermes (local install)

If recall is installed in the same Python env as Hermes:

# ~/.hermes/config.yaml
mcp_servers:
  recall:
    command: "python"
    args: ["-m", "recall.recall_mcp"]
    timeout: 30
    cwd: "/path/to/recall-memory"   # optional, needed for DB path resolution

Hermes (Docker)

# ~/.hermes/config.yaml
mcp_servers:
  recall:
    command: docker
    args:
      - run
      - -i
      - --rm
      - --network=host
      - -v
      - recall-data:/data
      - recall-memory:latest
    timeout: 30

Build the image first:

cd /path/to/recall-memory
docker compose build

Architecture

store.py       — SQLite backend + tier management
embed.py       — Nomic Embed via LM Studio REST API (768-dim)
retrieve.py    — Three-path RRF retrieval + tier router
cli.py         — Typer CLI (add / query / stats / delete / gc)
recall_mcp.py  — MCP server for agent integration

Tiered Storage (v0.2.0+)

Memories are split into three tiers to reduce compute and memory:

| Tier | Capacity | Retrieval | Compute cost |
|------|----------|-----------|--------------|
| Hot | ~500 | ANN + keywords + FTS5 (3-path RRF) | Highest |
| Warm | ~5000 | keywords + FTS5 only (2-path RRF) | Medium |
| Cold | Unlimited | Not indexed, fill-gap fallback only | ~Zero |

  • Hot: full vectors in ANN index. Fastest search.
  • Warm: keyword/FTS5 only, no vectors. Keeping these out of the ANN index cuts ANN scan work by about 66% at 1400 memories and about 99% at 50K memories.
  • Cold: doesn't participate in normal queries. Only searched when hot+warm results are insufficient.

Promotion/demotion is automatic based on access frequency. Cold memories are sampled every N queries for keyword overlap—if relevant, they're promoted back to warm. No cron, no UI, no configuration needed.
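One way the cold-tier sampling could look is sketched below. The README does not state N, the sample size, or the relevance test, so `every_n`, `sample_size`, the "at least one shared keyword" rule, and the function name are all assumptions made for illustration.

```python
import random
from typing import Optional


def promote_from_cold(
    query_count: int,
    query_keywords: set[str],
    cold: dict[str, set[str]],          # memory id -> its keywords (hypothetical layout)
    every_n: int = 10,                  # assumed value for "every N queries"
    sample_size: int = 20,              # assumed sample size
    rng: Optional[random.Random] = None,
) -> list[str]:
    """Return ids of cold memories to promote to warm on this query."""
    if query_count % every_n != 0 or not cold:
        return []
    rng = rng or random.Random()
    sampled = rng.sample(sorted(cold), min(sample_size, len(cold)))
    # "Relevant" is taken here to mean at least one shared keyword.
    return sorted(mid for mid in sampled if cold[mid] & query_keywords)


cold_tier = {
    "m1": {"docker", "compose"},
    "m2": {"python"},
    "m3": {"deploy", "docker"},
}
promoted = promote_from_cold(10, {"docker"}, cold_tier)
```

Because the check only runs on every N-th query and only touches a sample, the cost of scanning the cold tier stays small and bounded.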

Three parallel retrieval paths, fused via RRF (Reciprocal Rank Fusion):

Path V: Vector search (ANN) — sqlite-vec cosine similarity (hot tier only)
Path K: Keyword SQL JOIN — multi-hop keyword expansion (all tiers)
Path F: FTS5 full-text search — porter tokenizer + unicode61 (all tiers)
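To illustrate Path F, here is a small FTS5 sketch using Python's built-in sqlite3 module with the porter + unicode61 tokenizer named above. The table name `memories_fts` and the sample rows are made up for the example, and it assumes the local SQLite build has FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; requires an SQLite build with FTS5 enabled.
conn.execute(
    "CREATE VIRTUAL TABLE memories_fts USING fts5(content, tokenize='porter unicode61')"
)
conn.executemany(
    "INSERT INTO memories_fts(content) VALUES (?)",
    [
        ("User prefers docker-compose for local dev",),
        ("Staging runs three containers behind nginx",),
    ],
)


def fts_search(db: sqlite3.Connection, term: str, k: int = 5) -> list[str]:
    """Return up to k matching rows, best FTS5 rank first."""
    rows = db.execute(
        "SELECT content FROM memories_fts WHERE memories_fts MATCH ? ORDER BY rank LIMIT ?",
        (term, k),
    )
    return [content for (content,) in rows]


# Porter stemming lets the singular query match the plural "containers".
hits = fts_search(conn, "container")
```

The stemmer is what lets a query match inflected forms that a plain keyword comparison would miss.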

Tier router → hot 3-path → warm 2-path → cold fill-gap

No LLM calls at query time. No vector database. Just SQLite.
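For readers new to RRF, the fusion step can be sketched in a few lines. Each path contributes 1 / (k + rank) for every memory it returns, and the contributions are summed per memory. The constant k = 60 is the value commonly used in the IR literature; the value recall uses is not stated here, and `rrf_fuse` is an illustrative name.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60, top_n: int = 5) -> list[str]:
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by id for determinism.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:top_n]


ann = ["m2", "m1", "m3"]   # Path V results, best first
keyword = ["m1", "m4"]     # Path K results
fts = ["m1", "m2"]         # Path F results
fused = rrf_fuse([ann, keyword, fts])
```

Here `m1` comes out first because all three paths return it, even though it is only second in the vector ranking. RRF uses ranks only, so the raw scores of the three paths never need to be normalized against each other.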

Installation

Dependencies

| Dependency | Required? | Notes |
|------------|-----------|-------|
| Python | Yes | ≥3.10 |
| numpy | Yes | Cosine similarity + vector ops |
| typer | Yes | CLI interface |
| sqlite-vec | Yes | SQLite extension for ANN |
| LM Studio (port 1234) | For the ANN path | Runs nomic-embed-text-v1.5. See Prerequisites above. |
| pytest | No | Only needed for development (pip install -e ".[dev]") |
| sentence-transformers | No | Not used. The actual embedding calls go through LM Studio's HTTP API. |

pip install numpy
pip install -e .      # installs recall from the local recall-memory checkout and pulls in sqlite-vec

Verify installation

recall stats
# → Memories: 0  Keywords: 0

CLI

recall add "content"           # Store a memory
recall query "question"        # Retrieve relevant memories (tiered)
recall query "question" --include-cold  # Search cold tier too
recall stats                   # Store statistics
recall stats --verbose         # + tier distribution
recall gc --dry-run            # Preview eviction candidates
recall gc                      # Run garbage collection
recall delete <id>             # Remove a memory

MCP Tools (Hermes / Antigravity / Gemini)

Four tools exposed via stdio MCP transport:

| Tool | Parameters | Returns |
|------|------------|---------|
| recall | query: str (required), k: int (default 5), include_cold: bool (default false) | {memories: [...], count: int} |
| store_memory | content: str (required), session_id: str, tag: str | {id: str, status: "stored"} |
| memory_stats | (none) | {memories: int, keywords: int, tiers: {hot, warm, cold}} |
| gc_memory | dry_run: bool (default false) | {evicted / candidates: int, db_size_mb: float} |
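An MCP client normally builds these calls for you, but it can help to see what goes over stdio. The sketch below serializes a standard MCP `tools/call` request (JSON-RPC 2.0) for the `recall` tool using the parameters listed above. `tool_call` is an illustrative helper, and a real session also needs the MCP initialize handshake first.

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a single JSON-RPC 2.0 line."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


line = tool_call(1, "recall", {"query": "How should I deploy?", "k": 5})
```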

Status

Production-ready MVP with tiered storage (v0.2.0). Tested against AIngram (tied on 1400 memories × 40 queries).

Memories: 1400 (from Honcho)
Keywords: 10560
Latency:  ~80ms/query (hot), ~60ms/query (warm fill-gap)
ANN scan: -66% (now) → -99% (at 50K memories)
Memory:   ~1.5MB fixed for hot tier vs linear growth
Eval:     recall@5 comparable to AIngram with full extractor
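The memory and ANN-scan figures can be sanity-checked with a little arithmetic. This sketch assumes vectors are stored as float32 (4 bytes per dimension) and that the hot tier holds about 500 memories; both constants are assumptions, and the helper names are illustrative.

```python
DIM = 768              # nomic-embed-text-v1.5 dimensions
BYTES_PER_FLOAT = 4    # assumption: float32 storage
HOT_CAPACITY = 500     # approximate hot-tier capacity


def hot_tier_megabytes(capacity: int = HOT_CAPACITY) -> float:
    """Vector storage for a full hot tier, in MB (10^6 bytes)."""
    return capacity * DIM * BYTES_PER_FLOAT / 1_000_000


def ann_scan_reduction(total: int, capacity: int = HOT_CAPACITY) -> float:
    """Fraction of vectors the ANN path skips when only the hot tier is indexed."""
    return 1 - min(capacity, total) / total


hot_mb = hot_tier_megabytes()            # 500 * 768 * 4 bytes
now = ann_scan_reduction(1_400)          # current store size
later = ann_scan_reduction(50_000)       # projected store size
```

Under these assumptions the hot tier needs about 1.5MB, and the scan reduction works out to roughly 64% at 1400 memories and 99% at 50K, which is roughly in line with the figures reported above.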

Upgrading

From v0.1.x to v0.2.0

pip install --upgrade recall-sqlite

Schema migration is automatic — SQLite ALTER TABLE runs on first start. No manual steps needed. Your existing memories are preserved and will start in the "hot" tier.

To verify:

recall stats --verbose
# Should show the same memory count with tier distribution
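For the curious, an automatic migration of this kind can be as simple as checking for the column and adding it if it is missing. The sketch below uses a hypothetical `memories` table and `tier` column; recall's real schema may differ.

```python
import sqlite3


def ensure_tier_column(conn: sqlite3.Connection) -> bool:
    """Add a `tier` column if it is missing; return True if the schema changed."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(memories)")}
    if "tier" in columns:
        return False
    # Existing rows pick up the default, so old memories start in the hot tier.
    conn.execute("ALTER TABLE memories ADD COLUMN tier TEXT NOT NULL DEFAULT 'hot'")
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
# Stand-in for a v0.1.x-style table (assumed layout).
conn.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT)")
conn.execute("INSERT INTO memories(content) VALUES ('User prefers docker-compose')")
changed_first = ensure_tier_column(conn)
changed_again = ensure_tier_column(conn)
tiers = [row[0] for row in conn.execute("SELECT tier FROM memories")]
```

Checking `PRAGMA table_info` first makes the function safe to call on every start: the second call is a no-op.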

Rollback

pip install recall-sqlite==0.1.0

Design decisions

| Decision | Rationale |
|----------|-----------|
| Three-path RRF | ANN + SQL JOIN + FTS5 covers different failure modes |
| No LLM re-rank | Extra latency + cost; not needed for retrieval quality |
| SQLite first | Zero-deployment, portable, git-committable |
| Nomic embed via LM Studio | 768-dim, better than MiniLM, no Python packaging hell |
| RRF fusion | No weight tuning needed; standard IR technique |

Comparison with AIngram

| System | R@5 (40 mems) | R@5 (1400 mems) | Latency |
|--------|---------------|-----------------|---------|
| recall. | 0.579 | ~0.58 | ~80ms |
| AIngram | 0.583 | ~0.58 | ~27ms |

The two systems tie on recall@5 when they use the same embedding model; AIngram is faster (~27ms vs ~80ms). recall.'s advantage is its three-path architecture (AIngram uses two paths when its extractor is unavailable).
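For reference, R@5 here denotes recall@5. The exact evaluation protocol is not spelled out, so the sketch below shows one common definition (the fraction of relevant memories that appear in the top k, averaged over queries) with illustrative function names.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of relevant ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int = 5) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one per query."""
    return sum(recall_at_k(found, rel, k) for found, rel in runs) / len(runs)


runs = [
    (["a", "b", "c"], {"a", "d"}),   # one of two relevant memories found
    (["x", "y"], {"x"}),             # the only relevant memory found
]
score = mean_recall_at_k(runs)
```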

License

Apache 2.0
