RecallForge - Cross-Modal Vision-Language Search Engine
Every modality, one search. Local first.
Standard RAG only works on text. Drop a PDF with charts, a photo of a whiteboard, or a video recording — and your AI agent goes blind. RecallForge gives agents eyes and ears over your local filesystem. Text, images, documents, and video all live in one unified search space, and nothing ever leaves your machine.
What this enables
You: "What did the whiteboard look like in our last meeting?"
Claude: (Searches your local `~/Documents`, finds a photo of a whiteboard from an iPhone, reads the handwriting via Qwen3-VL, and surfaces the image with context.)
You: "Find the architecture diagram from that PDF I downloaded last week."
Claude: (Indexes the PDF, matches your query against extracted text and embedded figures, returns the relevant page.)
You: (Drops an image of a circuit board) "Find my notes related to this."
Claude: (Reverse image-to-text search across your indexed notes. Returns matching documents.)
One query. Any modality. All local.
What makes RecallForge different
| Capability | RecallForge | Chroma | Mem0 | Qdrant | Weaviate |
|---|---|---|---|---|---|
| Cross-modal search | ✅ Native | ✅ OpenCLIP | ❌ Text only | ❌ | ✅ CLIP modules |
| Video support [Beta] | ✅ | ❌ | ❌ | ❌ | ❌ |
| Document ingest (PDF/DOCX/PPTX) | ✅ | ❌ | ❌ | ❌ | ❌ |
| Built-in reranking | ✅ Multimodal | ❌ | ❌ | ✅ ColBERT | ✅ Modules |
| Query expansion | ✅ Multimodal | ❌ | ❌ | ❌ | ✅ Generative |
| MCP-native | ✅ 16 tools | ❌ | ❌ | ❌ | ❌ |
| 100% local | ✅ | ✅ | ⚠️ Cloud default | ✅ | ✅ Docker |
| Apple Silicon optimized | ✅ MLX 4-bit | ❌ | ❌ | ❌ | ❌ |
| Cloud option | ❌ | ✅ | ✅ | ✅ | ✅ |
| JS/TS SDK | ❌ | ✅ | ✅ | ✅ | ✅ |
Use RecallForge when: You need multimodal memory for AI agents that runs entirely on your machine, especially on Apple Silicon. One search across text, images, documents, and video.
Use something else when: You need cloud hosting, massive scale (millions+ vectors), or a JS/TS-first ecosystem.
Performance
4 modalities (text, images, documents, video) unified in a single MLX-optimized local vector space. Sub-60ms search latency. Under 400MB resident memory.
Measured on Mac mini M4 16GB, MLX 4-bit, embed mode:
| Metric | MLX 4-bit | PyTorch fp16 |
|---|---|---|
| Warm search p50 | 53ms | 599ms |
| Warm search p95 | 55ms | — |
| Cold start | 7.6s | ~20s |
| Peak RSS (embed) | 329MB* | ~4GB |
| Text indexing | 5.0 docs/sec | — |
*MLX maps model weights lazily via memory-mapped files. RSS reflects resident pages, not full model size (~1.7GB on disk for embed mode). Actual memory pressure is low.
Search quality comes from the multi-stage pipeline (BM25 + vector + RRF fusion + cross-encoder reranking), not raw embedding accuracy alone.
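The RRF fusion stage above can be sketched in a few lines. This is a minimal illustration of the standard Reciprocal Rank Fusion formula (each document scores `1 / (k + rank)` per ranked list it appears in), not RecallForge's actual internals; the function name and `k=60` default are illustrative.

```python
def rrf_fuse(bm25_ranking, vector_ranking, k=60):
    """Fuse two ranked lists of doc IDs via Reciprocal Rank Fusion.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by BOTH retrievers rise to the top.
    """
    scores = {}
    for ranking in (bm25_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" appears in both lists, so it outranks "a", which tops only one:
fused = rrf_fuse(["a", "b", "c"], ["b", "c", "d"])
```

Note how fusion needs only ranks, not the raw BM25 and cosine scores, which sidesteps the problem of blending incomparable score scales.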
Installation
```bash
pip install recallforge[mlx]    # Apple Silicon (recommended, 4-bit quantization)
pip install recallforge[cuda]   # NVIDIA GPU
pip install recallforge[torch]  # CPU / other PyTorch targets
pip install recallforge[docs]   # add richer PDF extraction (optional)
```
Note: `pip install recallforge` installs the core without a backend. You need at least one of `[mlx]`, `[cuda]`, or `[torch]` to run inference.
From source:
```bash
git clone https://github.com/brianmeyer/recallforge.git
cd recallforge
pip install -e ".[mlx]"
```
Requirements
- Python 3.12 or 3.13 required (3.14 not yet supported, pending pyarrow wheel)
- Disk: ~2-5GB free for model downloads on first run
- RAM (MLX 4-bit): ~1.7GB (`embed`) to ~4.4GB (`full`)
- `ffmpeg` recommended for video indexing/search
- First run downloads models automatically and may take a few minutes
MCP Server (primary use)
RecallForge is designed as a Model Context Protocol server for AI agents. Configure in Claude Desktop (or any MCP-compatible agent host):
```json
{
  "mcpServers": {
    "recallforge": {
      "command": "recallforge",
      "args": ["serve", "--mode", "full"]
    }
  }
}
```
Run manually:
```bash
recallforge serve --mode embed --backend mlx --quantize 4bit
```
Exposes 16 tools for agents: `ingest`, `search`, `search_fts`, `search_vec`, `index_document`, `index_image`, `memory_add`, `memory_update`, `memory_delete`, `status`, `rebuild_fts`, `list_collections`, `list_namespaces`, `batch`, `get_config`, `set_config`.
See docs/mcp-tools.md for the full tool reference.
Search modes
| Mode | Models loaded | Memory (MLX 4-bit) | Quality | Best for |
|---|---|---|---|---|
| `embed` | Embedder | ~1.7GB | Good | Memory-constrained, fast searches |
| `hybrid` | + Reranker | ~3.4GB | Better | Balanced quality and memory |
| `full` | + Query Expander | ~4.4GB | Best | Maximum retrieval quality |
Video [Beta] note: Video support requires `ffmpeg`. The torch backend video path has a known upstream issue (see QwenLM/Qwen3.5#58).
How it works
RecallForge encodes text, images, and video frames into the same 2048-dimensional vector space using Qwen3-VL. This means "find notes about this diagram" works whether the diagram is text, an image, or a frame from a video. A 3-stage pipeline handles the rest:
```mermaid
graph TD
    subgraph Local Filesystem
        Docs[📄 Documents]
        Imgs[🖼️ Images]
        Vids[🎬 Video]
    end
    subgraph RecallForge Ingest
        Docs --> TxtExt[Text Extractor]
        Imgs --> VLM[Qwen3-VL Encoder]
        Vids --> Frame[Frame & Audio Extractor]
        Frame --> VLM
        TxtExt --> VLM
    end
    subgraph LanceDB Storage
        VLM -->|2048-dim Vectors| VecDB[(Vector Space)]
        TxtExt -->|Text/Transcripts| FTS[(Tantivy FTS)]
    end
    subgraph MCP Search Pipeline
        Query[Agent Query] --> BM25[BM25 Text Search]
        Query --> Dense[Vector Similarity Search]
        BM25 --> RRF[RRF Fusion]
        Dense --> RRF
        RRF --> Rerank[Cross-Encoder Reranker]
        Rerank --> Output[Final Context to Agent]
    end
```
Pipeline: BM25 probe → Query expansion (full mode) → Parallel BM25 + Vector → RRF fusion → Reranking (hybrid/full) → Score blending
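Because every modality lands in the same 2048-dimensional space, the vector stage reduces to one similarity function regardless of whether a stored embedding came from text, an image, or a video frame. A toy sketch of that idea (4-dimensional stand-ins for the real 2048-dimensional Qwen3-VL embeddings; all values invented for illustration):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy vectors standing in for cross-modal embeddings. One comparison
# function serves every modality pair because they share one space.
text_query  = [0.1, 0.9, 0.2, 0.0]
image_doc   = [0.1, 0.8, 0.3, 0.1]   # hypothetical image embedding
video_frame = [0.9, 0.1, 0.0, 0.2]   # hypothetical video-frame embedding

# The image embedding nearest the text query wins the vector stage:
assert cosine(text_query, image_doc) > cosine(text_query, video_frame)
```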
CLI (development & debugging)
```bash
# Index anything
recallforge index ./photos ./docs
recallforge index ~/Movies/demo.mp4
recallforge index ~/Documents/roadmap.pptx

# Search any modality
recallforge search "whiteboard diagram from last meeting"
recallforge search --image ./photos/whiteboard.png
recallforge search --video ~/Movies/demo.mp4

# Watch a folder for changes (auto-index)
recallforge watch start ~/Documents --collection docs
recallforge watch list
recallforge watch stop ~/Documents

# Status
recallforge status
```
RecallForge auto-detects MLX on Apple Silicon, PyTorch elsewhere.
Python API
```python
from recallforge import get_backend, get_storage
from recallforge.search import HybridSearcher

backend = get_backend()
storage = get_storage()
backend.warm_up()

# Index
storage.index_document(
    path="notes.md",
    text="My notes about AI...",
    collection="my_docs",
    model="Qwen3-VL-Embedding-2B",
    embed_func=backend.embed_text,
)

# Search
searcher = HybridSearcher(backend=backend, storage=storage, limit=10)
results = searcher.search("artificial intelligence")
for r in results:
    print(f"[{r.score:.3f}] {r.title}")
```
Configuration
| Variable | Default | Description |
|---|---|---|
| `RECALLFORGE_BACKEND` | `auto` | `auto`, `mlx`, `torch` |
| `RECALLFORGE_MODE` | `full` | `embed`, `hybrid`, `full` |
| `RECALLFORGE_MLX_QUANTIZE` | `4bit` | `4bit`, `bf16` |
| `RECALLFORGE_STORE_PATH` | `~/.recallforge` | Storage directory |
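A minimal sketch of how these variables might be resolved in client code. The variable names and defaults come from the table above; the `load_config` helper itself is hypothetical, not RecallForge's actual configuration loader.

```python
import os

# Documented defaults for each RECALLFORGE_* variable (from the table above).
DEFAULTS = {
    "RECALLFORGE_BACKEND": "auto",
    "RECALLFORGE_MODE": "full",
    "RECALLFORGE_MLX_QUANTIZE": "4bit",
    "RECALLFORGE_STORE_PATH": "~/.recallforge",
}

def load_config(env=os.environ):
    """Resolve settings from the environment, falling back to defaults."""
    cfg = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    # Expand ~ so the store path is usable as a real filesystem path.
    cfg["RECALLFORGE_STORE_PATH"] = os.path.expanduser(cfg["RECALLFORGE_STORE_PATH"])
    return cfg

# Unset variables fall back to their documented defaults:
cfg = load_config(env={"RECALLFORGE_MODE": "embed"})
```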
Project structure
```
src/recallforge/
├── backends/
│   ├── mlx_backend.py      # MLX 4-bit/bf16 (Apple Silicon)
│   └── torch_backend.py    # PyTorch (CUDA/MPS/CPU)
├── storage/
│   └── lancedb_backend.py  # LanceDB + Tantivy FTS
├── cache.py                # LRU embedding cache
├── search.py               # Hybrid search pipeline (BM25 + vector + RRF)
├── server.py               # MCP server (16 tools)
├── documents.py            # PDF/DOCX/PPTX extraction
├── video.py                # Frame/transcript extraction
├── watch_folder.py         # Folder monitoring with dedup
└── cli.py                  # CLI interface
```
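The LRU embedding cache noted for `cache.py` exists because re-embedding identical input is the expensive step. A minimal sketch of the pattern using `collections.OrderedDict`; the class name and API here are illustrative, not `cache.py`'s actual interface:

```python
from collections import OrderedDict

class LRUEmbeddingCache:
    """Map input text to its embedding vector, evicting least-recently-used."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._store = OrderedDict()

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, key, vector):
        self._store[key] = vector
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used

cache = LRUEmbeddingCache(capacity=2)
cache.put("a", [0.1])
cache.put("b", [0.2])
cache.get("a")          # touch "a", so "b" is now the oldest entry
cache.put("c", [0.3])   # capacity exceeded: "b" is evicted
```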
Development
```bash
pytest tests/ -m "not live"   # Unit tests (no model download needed)
pytest tests/ -m live -v      # Integration tests (requires models)
```
See CONTRIBUTING.md for full development guidelines.
Attribution
RecallForge is inspired by QMD by Tobi. QMD pioneered the multi-stage retrieval pipeline (embedding, reranking, query expansion). RecallForge extends this pattern to vision-language with cross-modal retrieval and multi-backend support.
License
MIT License
File details
Details for the file recallforge-0.1.1.tar.gz.
File metadata
- Download URL: recallforge-0.1.1.tar.gz
- Upload date:
- Size: 103.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `2a7df7b973fbfce1e9fa45f3815d84b1924d5c1f1bec90f5fc9609c01193ecba` |
| MD5 | `8bedf75578223a47a07d8efdcb9ad944` |
| BLAKE2b-256 | `6bd1cdb32ed5c9f345f9697556d52627722bf60f805bc8d4ca9594abeb44e4e4` |
Provenance
The following attestation bundles were made for recallforge-0.1.1.tar.gz:

Publisher: publish.yml on brianmeyer/recallforge
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: recallforge-0.1.1.tar.gz
- Subject digest: 2a7df7b973fbfce1e9fa45f3815d84b1924d5c1f1bec90f5fc9609c01193ecba
- Sigstore transparency entry: 1106430268
- Sigstore integration time:
- Permalink: brianmeyer/recallforge@2fd7fd3c8d49915522cc037eb7a199c02a779617
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/brianmeyer
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2fd7fd3c8d49915522cc037eb7a199c02a779617
- Trigger Event: push
File details
Details for the file recallforge-0.1.1-py3-none-any.whl.
File metadata
- Download URL: recallforge-0.1.1-py3-none-any.whl
- Upload date:
- Size: 78.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `37ab4ffe22880f0e0b01017199c79e9403007e5a3fc29c5862e6d5d16d842503` |
| MD5 | `3bb649c7110445cce82927fbfbd94c82` |
| BLAKE2b-256 | `cd3df46f72e616b3905f88686ef41882ec6d0bbd4bd53906ed9b32ac356fe151` |
Provenance
The following attestation bundles were made for recallforge-0.1.1-py3-none-any.whl:

Publisher: publish.yml on brianmeyer/recallforge
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: recallforge-0.1.1-py3-none-any.whl
- Subject digest: 37ab4ffe22880f0e0b01017199c79e9403007e5a3fc29c5862e6d5d16d842503
- Sigstore transparency entry: 1106430269
- Sigstore integration time:
- Permalink: brianmeyer/recallforge@2fd7fd3c8d49915522cc037eb7a199c02a779617
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/brianmeyer
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2fd7fd3c8d49915522cc037eb7a199c02a779617
- Trigger Event: push