Better contextual retrieval for AI agents. Three-path RRF retrieval in SQLite.

recall. 🧠

Better contextual retrieval for AI agents. Three-path RRF retrieval (ANN + keyword SQL JOIN + FTS5) in pure SQLite. No LLM at query time. ~80ms latency per query, measured on a store of 1400 real memories.

from recall import retrieve_relevant

# `store` is assumed to be an already-initialized recall memory store
store.add("User prefers docker-compose over Dockerfile")
results = retrieve_relevant("How should I deploy?", store)
# → "User prefers docker-compose over Dockerfile"

Prerequisites: Embedding Model

recall. uses nomic-embed-text-v1.5 (768-dim) running in LM Studio. No LLM is involved: embedding models are small (~150MB), fast, and cost zero tokens.

1. Install LM Studio

Download from lmstudio.ai.

2. Load the embedding model

Step	Notes / Command
Open LM Studio → Models tab	—
Search `nomic-embed-text-v1.5` → Download	~150MB
Switch to Local Inference Server tab	—
Select `nomic-embed-text-v1.5` in the model dropdown	—
Click Start Server	port defaults to `1234`
Verify it's working:	`curl http://127.0.0.1:1234/v1/models`

Expected response:

{"object":"list","data":[{"id":"nomic-embed-text-v1.5","object":"model",...}]}

That's it. No API keys and no cloud services. A GPU is optional: LM Studio needs about 2GB of VRAM for this model, and it also runs on CPU.
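
The same check can be scripted. The sketch below assumes the `/v1/models` response has the shape shown in the expected response above; `model_is_loaded` and `fetch_models` are illustrative helper names, not part of recall.

```python
import json
import urllib.request

MODELS_URL = "http://127.0.0.1:1234/v1/models"  # default LM Studio port
MODEL_ID = "nomic-embed-text-v1.5"


def model_is_loaded(payload: dict, model_id: str = MODEL_ID) -> bool:
    """Return True if model_id appears in a /v1/models response body."""
    return any(entry.get("id") == model_id for entry in payload.get("data", []))


def fetch_models(url: str = MODELS_URL, timeout: float = 2.0) -> dict:
    """GET the model list from the local server and parse the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

# Usage, with the LM Studio server running:
#   model_is_loaded(fetch_models())
```

Keeping the parsing separate from the HTTP call makes the check easy to test without a running server.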

Port configuration

Default port is 1234. To change it, set EMBED_PORT in src/recall/embed.py:

EMBED_PORT = 1234  # change to match your LM Studio port

If LM Studio is down

Graceful degradation kicks in automatically:

- Retrieval (mcp_recall_recall / recall query) falls back to keyword + FTS5 search. Nothing crashes; only the ANN path is skipped.
- Storage (mcp_recall_store_memory / recall add) saves memories without embeddings, so they are still findable via keywords.
- Other CLI commands (recall stats / recall delete) are unaffected because they don't use embeddings at all.

No error, no crash, no data loss. Just slightly less precise results.
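
One way to picture this fallback is an embedding call that returns nothing instead of raising. This is a minimal sketch, not recall's actual code: `try_embed` and `active_paths` are hypothetical names, and the request assumes LM Studio's OpenAI-compatible `/v1/embeddings` endpoint on the default port.

```python
import json
import urllib.request

EMBED_URL = "http://127.0.0.1:1234/v1/embeddings"  # assumed LM Studio endpoint


def try_embed(text: str, url: str = EMBED_URL, timeout: float = 2.0):
    """Return the embedding as a list of floats, or None if the server is unavailable."""
    payload = json.dumps({"model": "nomic-embed-text-v1.5", "input": text}).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return json.load(resp)["data"][0]["embedding"]
    except (OSError, ValueError, KeyError, IndexError, TypeError):
        # Connection refused, timeout, HTTP error, or malformed body:
        # degrade instead of crashing.
        return None


def active_paths(embedding) -> list[str]:
    """Pick retrieval paths: the ANN path needs a query vector, the others do not."""
    if embedding is None:
        return ["keyword", "fts5"]
    return ["ann", "keyword", "fts5"]
```

With the server down, `active_paths(try_embed("How to deploy?"))` yields only the keyword and FTS5 paths.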

Quick start

pip install numpy
pip install -e .      # run from a clone of the recall-memory repository

recall add "User prefers docker-compose for local dev"
recall query "How to deploy?"

Or via MCP server for Hermes Agent / Antigravity IDE / Gemini CLI:

Hermes (local install)

If recall is installed in the same Python env as Hermes:

# ~/.hermes/config.yaml
mcp_servers:
  recall:
    command: "python"
    args: ["-m", "recall.recall_mcp"]
    timeout: 30
    cwd: "/path/to/recall-memory"   # optional; set it if DB path resolution depends on the working directory

Hermes (Docker)

# ~/.hermes/config.yaml
mcp_servers:
  recall:
    command: docker
    args:
      - run
      - -i
      - --rm
      - --network=host
      - -v
      - recall-data:/data
      - recall-memory:latest
    timeout: 30

Build the image first:

cd /path/to/recall-memory
docker compose build

Architecture

store.py       — SQLite backend + tier management
embed.py       — Nomic Embed via LM Studio REST API (768-dim)
retrieve.py    — Three-path RRF retrieval + tier router
cli.py         — Typer CLI (add / query / stats / delete / gc)
recall_mcp.py  — MCP server for agent integration

Tiered Storage (v0.2.0+)

Memories are split into three tiers to reduce compute and memory:

Tier	Capacity	Retrieval	Compute Cost
Hot	~500	ANN + keywords + FTS5 (3-path RRF)	Highest
Warm	~5000	keywords + FTS5 only (2-path RRF)	Medium
Cold	Unlimited	Not indexed, fill-gap fallback only	~Zero

Hot: full vectors in ANN index. Fastest search.
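Warm: keyword/FTS5 only, no vectors. Keeping warm memories out of the ANN index cuts ANN scan work by about 66% at the current 1400 memories and by about 99% at 50K memories.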
Cold: doesn't participate in normal queries. Only searched when hot+warm results are insufficient.

Promotion and demotion are automatic, based on access frequency. Cold memories are sampled every N queries for keyword overlap; if relevant, they're promoted back to warm. No cron, no UI, no configuration needed.
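
As a rough illustration of frequency-based tiering, the sketch below ranks memories by access count and fills the tiers in order. The exact policy recall uses is not described here, so the ranking rule, the function names, and the "any shared keyword" promotion test are all assumptions; the capacities are the approximate ones from the table.

```python
def assign_tiers(access_counts: dict[str, int],
                 hot_capacity: int = 500,
                 warm_capacity: int = 5000) -> dict[str, str]:
    """Map memory id -> tier, most frequently accessed memories first."""
    # Sort by descending access count; break ties by id for determinism.
    ranked = sorted(access_counts, key=lambda mid: (-access_counts[mid], mid))
    tiers = {}
    for position, memory_id in enumerate(ranked):
        if position < hot_capacity:
            tiers[memory_id] = "hot"
        elif position < hot_capacity + warm_capacity:
            tiers[memory_id] = "warm"
        else:
            tiers[memory_id] = "cold"
    return tiers


def promote_if_relevant(tier: str, query_keywords: set[str],
                        memory_keywords: set[str]) -> str:
    """Promote a sampled cold memory to warm when it shares a keyword with the query."""
    if tier == "cold" and query_keywords & memory_keywords:
        return "warm"
    return tier
```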

Three parallel retrieval paths, fused via RRF (Reciprocal Rank Fusion):

Path V: Vector search (ANN) — sqlite-vec cosine similarity (hot tier only)
Path K: Keyword SQL JOIN — multi-hop keyword expansion (all tiers)
Path F: FTS5 full-text search — porter tokenizer + unicode61 (all tiers)
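
For readers new to RRF, the fusion step can be sketched in a few lines. Each path contributes `1 / (k + rank)` for every memory it returns, and the contributions are summed. The constant `k = 60` is the value commonly used in the IR literature and is an assumption here, as are the function name and the tie-breaking rule.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


path_v = ["m1", "m2", "m3"]   # vector (ANN) results
path_k = ["m2", "m1"]         # keyword SQL JOIN results
path_f = ["m2", "m4"]         # FTS5 results
fused = rrf_fuse([path_v, path_k, path_f])
print([memory_id for memory_id, _ in fused])  # ['m2', 'm1', 'm4', 'm3']
```

Because only ranks are used, the paths' raw scores (cosine similarity, BM25) never need to be put on a common scale.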

Tier router → hot 3-path → warm 2-path → cold fill-gap
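
The cascade can be read as a fill-gap loop: query the next tier only while fewer than `k` results have been collected. The sketch below assumes each tier is exposed as a search callable; the `route` function and its signature are hypothetical, and the `--include-cold` behavior is not modeled.

```python
from typing import Callable

Search = Callable[[str, int], list[str]]  # (query, limit) -> ranked memory ids


def route(query: str, k: int, tiers: list[Search]) -> list[str]:
    """Collect up to k ids, visiting tiers in order (hot, warm, cold)."""
    results: list[str] = []
    for search in tiers:
        if len(results) >= k:
            break  # earlier tiers already filled the request
        for memory_id in search(query, k - len(results)):
            if memory_id not in results:
                results.append(memory_id)
    return results[:k]
```

With this shape, the cold tier costs nothing on queries that hot and warm can already satisfy.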

No LLM calls at query time. No vector database. Just SQLite.
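
To make "just SQLite" concrete, Path F can be approximated with Python's standard `sqlite3` module, provided the underlying SQLite build includes FTS5. The table and column names below are illustrative and not recall's real schema; only the tokenizer setting (`porter unicode61`) comes from the description above.

```python
import sqlite3


def build_fts_index(memories: list[str]) -> sqlite3.Connection:
    """Create an in-memory FTS5 table with porter stemming over unicode61."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE memory_fts "
        "USING fts5(content, tokenize='porter unicode61')"
    )
    conn.executemany(
        "INSERT INTO memory_fts(content) VALUES (?)", [(m,) for m in memories]
    )
    return conn


def fts_search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[str]:
    """Return matching rows, best match first (FTS5's built-in BM25 `rank`)."""
    rows = conn.execute(
        "SELECT content FROM memory_fts WHERE memory_fts MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [content for (content,) in rows]
```

Thanks to stemming, a query for `deploy` also matches a memory that says "deploying".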

Installation

Dependencies

Dependency	Required?	Notes
Python ≥3.10	✅	—
numpy	✅	Cosine similarity + vector ops
typer	✅	CLI interface
sqlite-vec	✅	SQLite extension for ANN
LM Studio (port 1234)	✅	Runs nomic-embed-text-v1.5. See Prerequisites above.
pytest	❌	Only needed for development (`pip install -e ".[dev]"`)
sentence-transformers	❌	Not used. The actual embedding calls go through LM Studio's HTTP API.

pip install numpy
pip install -e .      # installs recall-memory package + pulls sqlite-vec

Verify installation

recall stats
# → Memories: 0  Keywords: 0

CLI

recall add "content"           # Store a memory
recall query "question"        # Retrieve relevant memories (tiered)
recall query "question" --include-cold  # Search cold tier too
recall stats                   # Store statistics
recall stats --verbose         # + tier distribution
recall gc --dry-run            # Preview eviction candidates
recall gc                      # Run garbage collection
recall delete <id>             # Remove a memory

MCP Tools (Hermes / Antigravity / Gemini)

Four tools exposed via stdio MCP transport:

Tool	Parameters	Returns
`recall`	`query: str` (required), `k: int (default 5)`, `include_cold: bool (default false)`	`{memories: [...], count: int}`
`store_memory`	`content: str` (required), `session_id: str`, `tag: str`	`{id: str, status: "stored"}`
`memory_stats`	(none)	`{memories: int, keywords: int, tiers: {hot, warm, cold}}`
`gc_memory`	`dry_run: bool (default false)`	`{evicted / candidates: int, db_size_mb: float}`
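
MCP clients normally build these requests for you, but it can help to see what a call looks like on the wire. The sketch below serializes a `tools/call` message in the JSON-RPC 2.0 shape defined by the MCP specification, using the parameters from the table. The helper name is hypothetical, and the initialization handshake that a real client performs first is omitted.

```python
import json


def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as one JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# A `recall` call with the documented parameters.
request = tools_call(1, "recall", {"query": "How to deploy?", "k": 5, "include_cold": False})
```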

Status

Production-ready MVP with tiered storage (v0.2.0). Tested against AIngram (tied on 1400 memories × 40 queries).

Memories: 1400 (from Honcho)
Keywords: 10560
Latency:  ~80ms/query (hot), ~60ms/query (warm fill-gap)
ANN scan: -66% (now) → -99% (at 50K memories)
Memory:   ~1.5MB fixed for hot tier vs linear growth
Eval:     recall@5 comparable to AIngram with full extractor
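
The ~1.5MB figure is consistent with a back-of-the-envelope estimate, assuming vectors are stored as 32-bit floats (the storage format is an assumption, not something stated above):

```python
HOT_CAPACITY = 500     # approximate hot-tier size from the tier table
DIMENSIONS = 768       # nomic-embed-text-v1.5 output size
BYTES_PER_FLOAT = 4    # assumption: float32 storage

hot_bytes = HOT_CAPACITY * DIMENSIONS * BYTES_PER_FLOAT
all_hot_bytes_50k = 50_000 * DIMENSIONS * BYTES_PER_FLOAT  # if every memory kept a vector

print(f"hot tier: {hot_bytes / 1e6:.2f} MB")                    # hot tier: 1.54 MB
print(f"50K untiered vectors: {all_hot_bytes_50k / 1e6:.1f} MB")  # 50K untiered vectors: 153.6 MB
```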

Upgrading

From v0.1.x to v0.2.0

pip install --upgrade recall-sqlite

Schema migration is automatic — SQLite ALTER TABLE runs on first start. No manual steps needed. Your existing memories are preserved and will start in the "hot" tier.
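
For context, an idempotent migration of this kind usually checks for the column before altering the table. The sketch below is an illustration under assumed names (a `memories` table and a `tier` column), not recall's actual schema or migration code.

```python
import sqlite3


def migrate_add_tier(conn: sqlite3.Connection) -> bool:
    """Add a `tier` column if it is missing; return True when a migration ran."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(memories)")}
    if "tier" in columns:
        return False
    # Existing rows pick up the default, so old memories start in the hot tier.
    conn.execute("ALTER TABLE memories ADD COLUMN tier TEXT NOT NULL DEFAULT 'hot'")
    conn.commit()
    return True
```

Running it a second time is a no-op, which is what makes "runs on first start" safe.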

To verify:

recall stats --verbose
# Should show the same memory count with tier distribution

Rollback

pip install recall-sqlite==0.1.0

Design decisions

Decision	Rationale
Three-path RRF	ANN + SQL JOIN + FTS5 covers different failure modes
No LLM re-rank	Extra latency + cost; not needed for retrieval quality
SQLite first	Zero-deployment, portable, git-committable
Nomic embed via LM Studio	768-dim, better than MiniLM, no Python packaging hell
RRF fusion	No weight tuning needed; standard IR technique

Comparison with AIngram

System	R@5 (40 memories)	R@5 (1400 memories)	Latency
recall.	0.579	~0.58	~80ms
AIngram	0.583	~0.58	~27ms

Both systems tie when run with the same embedding model. recall.'s advantage is its three-path architecture (AIngram falls back to two paths when its extractor is unavailable).
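
The evaluation script is not shown here, so as a reference point, this is the standard definition of recall@k averaged over queries. Whether the R@5 numbers in the table were computed exactly this way is an assumption, and the function names are illustrative.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the relevant ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int = 5) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one pair per query."""
    return sum(recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)
```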

License

Apache 2.0

Project details

Maintainers

jnocode

Release history

This version

0.2.0

Jun 23, 2026

Download files

Source Distribution

recall_sqlite-0.2.0.tar.gz (23.3 kB)

Uploaded Jun 23, 2026 Source

Built Distribution

recall_sqlite-0.2.0-py3-none-any.whl (20.7 kB)

Uploaded Jun 23, 2026 Python 3

File details

Details for the file recall_sqlite-0.2.0.tar.gz.

File metadata

Download URL: recall_sqlite-0.2.0.tar.gz
Upload date: Jun 23, 2026
Size: 23.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for recall_sqlite-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`e657a988e9bb56d34fcbee29f9b290f7f1a2dfceafa140cf0ea2940810a1cece`
MD5	`3c05edf5366e7433ec7456dcac31d398`
BLAKE2b-256	`2ac9d614a7b438b908146b4ef451d200f00e897d211f8cd78b4b8698afd4bb8b`
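
A downloaded file can be checked against the SHA256 digest above with the standard library. This is a small sketch: the helper name is illustrative, and the file is assumed to be in the current directory.

```python
import hashlib

EXPECTED_SDIST_SHA256 = "e657a988e9bb56d34fcbee29f9b290f7f1a2dfceafa140cf0ea2940810a1cece"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Usage, after downloading the source distribution:
#   sha256_of("recall_sqlite-0.2.0.tar.gz") == EXPECTED_SDIST_SHA256
```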

Provenance

The following attestation bundles were made for recall_sqlite-0.2.0.tar.gz:

Publisher: publish.yml on Jnocode/recall-memory

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: recall_sqlite-0.2.0.tar.gz
- Subject digest: e657a988e9bb56d34fcbee29f9b290f7f1a2dfceafa140cf0ea2940810a1cece
- Sigstore transparency entry: 1920027549
- Sigstore integration time: Jun 23, 2026
Source repository:
- Permalink: Jnocode/recall-memory@590c361b05e89c6206bf76dcfe87f36a7027ff9a
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/Jnocode
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@590c361b05e89c6206bf76dcfe87f36a7027ff9a
- Trigger Event: push

File details

Details for the file recall_sqlite-0.2.0-py3-none-any.whl.

File metadata

Download URL: recall_sqlite-0.2.0-py3-none-any.whl
Upload date: Jun 23, 2026
Size: 20.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for recall_sqlite-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`33440810ccb68005059407229259667cee9b6e0b4572981ca08f354ed7d7877c`
MD5	`1922e6316de608d10ca83004747cae6a`
BLAKE2b-256	`6ec5a7b40059175f60e385943561946efa72e4218144e8e349c93f7f1f98f44d`

Provenance

The following attestation bundles were made for recall_sqlite-0.2.0-py3-none-any.whl:

Publisher: publish.yml on Jnocode/recall-memory

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: recall_sqlite-0.2.0-py3-none-any.whl
- Subject digest: 33440810ccb68005059407229259667cee9b6e0b4572981ca08f354ed7d7877c
- Sigstore transparency entry: 1920027639
- Sigstore integration time: Jun 23, 2026
Source repository:
- Permalink: Jnocode/recall-memory@590c361b05e89c6206bf76dcfe87f36a7027ff9a
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/Jnocode
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@590c361b05e89c6206bf76dcfe87f36a7027ff9a
- Trigger Event: push
