RecallForge - Cross-Modal Vision-Language Search Engine

RecallForge

Every modality, one search. Local first.

RecallForge — Your Files → One Search

Standard RAG only works on text. Drop a PDF with charts, a photo of a whiteboard, or a video recording — and your AI agent goes blind. RecallForge gives agents eyes and ears over your local filesystem. Text, images, documents, and video all live in one unified search space, and nothing ever leaves your machine.

What this enables

You: "What did the whiteboard look like in our last meeting?"

Claude: (Searches your local ~/Documents, finds a photo of a whiteboard from an iPhone, reads the handwriting via Qwen3-VL, and surfaces the image with context.)

You: "Find the architecture diagram from that PDF I downloaded last week."

Claude: (Indexes the PDF, matches your query against extracted text and embedded figures, returns the relevant page.)

You: (Drops an image of a circuit board) "Find my notes related to this."

Claude: (Reverse image-to-text search across your indexed notes. Returns matching documents.)

One query. Any modality. All local.

What makes RecallForge different

Capability RecallForge Chroma Mem0 Qdrant Weaviate
Cross-modal search ✅ Native ✅ OpenCLIP ❌ Text only ✅ CLIP modules
Video support [Beta]
Document ingest (PDF/DOCX/PPTX)
Built-in reranking ✅ Multimodal ✅ ColBERT ✅ Modules
Query expansion ✅ Multimodal ✅ Generative
MCP-native ✅ 16 tools
100% local ⚠️ Cloud default ✅ Docker
Apple Silicon optimized ✅ MLX 4-bit
Cloud option
JS/TS SDK

Use RecallForge when: You need multimodal memory for AI agents that runs entirely on your machine, especially on Apple Silicon. One search across text, images, documents, and video.

Use something else when: You need cloud hosting, massive scale (millions+ vectors), or a JS/TS-first ecosystem.

Performance

4 modalities (text, images, documents, video) unified in a single MLX-optimized local vector space. Sub-60ms search latency in embed mode. Under 400MB resident memory.

Pipeline ablation (Mac mini M4 16GB, MLX 4-bit)

Each stage of the pipeline improves retrieval quality. The architecture is the product.

Stage R@1 R@5 R@10 MRR p50
Vector-only 68.3% 68.3% 70.0% 68.5% 17ms
BM25-only 55.0% 55.0% 85.0% 60.0% 15ms
Vector + BM25 (RRF) 71.7% 86.7% 86.7% 76.4% 93ms
+ Reranker (hybrid mode) 83.3% 91.7% 95.0% 86.9% 3.9s
+ Query expansion (full mode) 83.3% 90.0% 93.3% 85.7% 5.7s

The reranker is the big win: +15 points R@1 over raw embeddings (a 22% relative gain), pushing R@10 to 95%. Embed mode gives you 17ms searches for speed-sensitive workloads. Hybrid/full mode gives you 83%+ R@1 when quality matters.

Measured on 200 text documents + 50 images with ground-truth queries. See benchmarks/ for methodology.
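The metrics above (R@k and MRR) can be computed as in the sketch below. This is an illustrative definition only, not the actual harness; see benchmarks/ for the real methodology.

```python
# Illustrative computation of R@k and MRR over ground-truth queries.

def recall_at_k(ranked_ids, relevant_id, k):
    """1.0 if the relevant result appears in the top-k, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0

def mrr(ranked_ids, relevant_id):
    """Reciprocal rank of the first relevant result (0.0 if absent)."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0

# Two toy queries: relevant doc retrieved at rank 1 and rank 3.
runs = [(["a", "b", "c"], "a"), (["x", "y", "z"], "z")]
r1 = sum(recall_at_k(ids, rel, 1) for ids, rel in runs) / len(runs)
mean_rr = sum(mrr(ids, rel) for ids, rel in runs) / len(runs)
print(r1)       # 0.5
print(mean_rr)  # (1.0 + 1/3) / 2 ≈ 0.667
```

Reported numbers in the table are these per-query scores averaged over the full query set.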

Latency & resource usage

Metric MLX 4-bit PyTorch fp16
Warm search p50 (embed) 53ms 599ms
Warm search p95 (embed) 55ms
Cold start 7.6s ~20s
Peak RSS (embed) 329MB* ~4GB
Text indexing 5.0 docs/sec

*MLX maps model weights lazily via memory-mapped files. RSS reflects resident pages, not full model size (~1.7GB on disk for embed mode). Actual memory pressure is low.

COCO 1K retrieval (raw embeddings, no pipeline)

For transparency: raw embedding quality on the standard COCO benchmark (1,000 images, no BM25/reranking/expansion). These numbers reflect the Qwen3-VL-2B embedder alone, not the full pipeline.

Direction R@1 R@5 R@10
Text → Image 24.5% 42.3% 49.9%
Image → Text 34.3% 42.0% 44.1%

Qwen3-VL is a generative VLM, not a contrastive model like CLIP. The pipeline ablation above shows how BM25 fusion and reranking compensate for this.

Installation

pip install recallforge[mlx]       # Apple Silicon (recommended, 4-bit quantization)
pip install recallforge[cuda]      # NVIDIA GPU
pip install recallforge[torch]     # CPU / other PyTorch targets
pip install recallforge[docs]      # add richer PDF extraction (optional)

Note: pip install recallforge installs the core without a backend. You need at least one of [mlx], [cuda], or [torch] to run inference.

From source:

git clone https://github.com/brianmeyer/recallforge.git
cd recallforge
pip install -e ".[mlx]"

Requirements

  • Python 3.12 or 3.13 required (3.14 not yet supported, pending pyarrow wheel)
  • Disk: ~2-5GB free for model downloads on first run
  • RAM (MLX 4-bit): ~1.7GB (embed) to ~4.4GB (full)
  • ffmpeg recommended for video indexing/search
  • First run downloads models automatically and may take a few minutes

MCP Server (primary use)

RecallForge is designed as a Model Context Protocol server for AI agents. Configure in Claude Desktop (or any MCP-compatible agent host):

{
  "mcpServers": {
    "recallforge": {
      "command": "recallforge",
      "args": ["serve", "--mode", "full"]
    }
  }
}

Run manually:

recallforge serve --mode embed --backend mlx --quantize 4bit

Exposes 16 tools for agents: ingest, search, search_fts, search_vec, index_document, index_image, memory_add, memory_update, memory_delete, status, rebuild_fts, list_collections, list_namespaces, batch, get_config, set_config.

See docs/mcp-tools.md for the full tool reference.

Search modes

Mode Models loaded Memory (MLX 4-bit) Quality Best for
embed Embedder ~1.7GB Good Memory-constrained, fast searches
hybrid + Reranker ~3.4GB Better Balanced quality and memory
full + Query Expander ~4.4GB Best Maximum retrieval quality

Video [Beta] note: Video support requires ffmpeg. The torch backend video path has a known upstream issue (see QwenLM/Qwen3.5#58).

How it works

RecallForge encodes text, images, and video frames into the same 2048-dimensional vector space using Qwen3-VL. This means "find notes about this diagram" works whether the diagram is text, an image, or a frame from a video. A 3-stage pipeline handles the rest:
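The shared-space idea can be shown with a toy example: once everything is embedded into one vector space, a text query is compared to image and video-frame vectors with the same similarity function. The 4-dimensional vectors and file names below are invented stand-ins for the real 2048-dimensional Qwen3-VL embeddings.

```python
# Toy cross-modal retrieval: one cosine similarity ranks vectors from
# any modality. Numbers are illustrative, not real embeddings.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query_vec = [0.9, 0.1, 0.0, 0.4]               # stand-in for embed_text("whiteboard diagram")
index = {
    "whiteboard.png": [0.8, 0.2, 0.1, 0.5],    # image embedding
    "roadmap.pptx":   [0.1, 0.9, 0.3, 0.0],    # document embedding
    "demo.mp4#frame": [0.7, 0.0, 0.2, 0.6],    # video-frame embedding
}
best = max(index, key=lambda name: cosine(query_vec, index[name]))
print(best)  # whiteboard.png
```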

graph TD
    subgraph Local Filesystem
        Docs[📄 Documents]
        Imgs[🖼️ Images]
        Vids[🎬 Video]
    end

    subgraph RecallForge Ingest
        Docs --> TxtExt[Text Extractor]
        Imgs --> VLM[Qwen3-VL Encoder]
        Vids --> Frame[Frame & Audio Extractor]
        Frame --> VLM
        TxtExt --> VLM
    end

    subgraph LanceDB Storage
        VLM -->|2048-dim Vectors| VecDB[(Vector Space)]
        TxtExt -->|Text/Transcripts| FTS[(Tantivy FTS)]
    end

    subgraph MCP Search Pipeline
        Query[Agent Query] --> BM25[BM25 Text Search]
        Query --> Dense[Vector Similarity Search]
        BM25 --> RRF[RRF Fusion]
        Dense --> RRF
        RRF --> Rerank[Cross-Encoder Reranker]
        Rerank --> Output[Final Context to Agent]
    end

Pipeline: BM25 probe → Query expansion (full mode) → Parallel BM25 + Vector → RRF fusion → Reranking (hybrid/full) → Score blending
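The RRF fusion step named above can be sketched as follows. k=60 is the constant from the original RRF formulation; the value RecallForge uses internally is not stated here.

```python
# Reciprocal Rank Fusion sketch: merge ranked lists from BM25 and
# vector search by summing 1/(k + rank) per document per list.

def rrf_fuse(rankings, k=60):
    """Return doc ids sorted by fused score, highest first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["notes.md", "spec.pdf", "photo.png"]
vector_hits = ["photo.png", "notes.md", "demo.mp4"]
print(rrf_fuse([bm25_hits, vector_hits]))
# ['notes.md', 'photo.png', 'spec.pdf', 'demo.mp4']
```

A document ranked well by both retrievers ("notes.md") outranks one ranked highly by only one list, which is the point of the fusion stage.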

CLI (development & debugging)

# Index anything
recallforge index ./photos ./docs
recallforge index ~/Movies/demo.mp4
recallforge index ~/Documents/roadmap.pptx

# Search any modality
recallforge search "whiteboard diagram from last meeting"
recallforge search --image ./photos/whiteboard.png
recallforge search --video ~/Movies/demo.mp4

# Watch a folder for changes (auto-index)
recallforge watch start ~/Documents --collection docs
recallforge watch list
recallforge watch stop ~/Documents

# Status
recallforge status

RecallForge auto-detects MLX on Apple Silicon, PyTorch elsewhere.
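A plausible sketch of that auto-detection, assuming it keys off the platform; the function name and logic here are illustrative, not RecallForge's actual implementation.

```python
# Illustrative backend auto-detection: MLX only runs on Apple Silicon
# macOS, so anything else falls back to PyTorch.
import platform

def pick_backend():
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mlx"
    return "torch"

print(pick_backend())
```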

Python API

from recallforge import get_backend, get_storage
from recallforge.search import HybridSearcher

backend = get_backend()
storage = get_storage()
backend.warm_up()

# Index
storage.index_document(
    path="notes.md",
    text="My notes about AI...",
    collection="my_docs",
    model="Qwen3-VL-Embedding-2B",
    embed_func=backend.embed_text,
)

# Search
searcher = HybridSearcher(backend=backend, storage=storage, limit=10)
results = searcher.search("artificial intelligence")
for r in results:
    print(f"[{r.score:.3f}] {r.title}")

Configuration

Variable Default Description
RECALLFORGE_BACKEND auto auto, mlx, torch
RECALLFORGE_MODE full embed, hybrid, full
RECALLFORGE_MLX_QUANTIZE 4bit 4bit, bf16
RECALLFORGE_STORE_PATH ~/.recallforge Storage directory
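The variables above can also be set programmatically; the assumption here (mine, not stated in the docs) is that they must be set before RecallForge reads its configuration at startup.

```python
# Set RecallForge configuration via environment variables
# before launching the server or importing the package.
import os

os.environ["RECALLFORGE_BACKEND"] = "mlx"
os.environ["RECALLFORGE_MODE"] = "hybrid"
os.environ["RECALLFORGE_MLX_QUANTIZE"] = "4bit"
os.environ["RECALLFORGE_STORE_PATH"] = os.path.expanduser("~/.recallforge")
```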

Project structure

src/recallforge/
├── backends/
│   ├── mlx_backend.py    # MLX 4-bit/bf16 (Apple Silicon)
│   └── torch_backend.py  # PyTorch (CUDA/MPS/CPU)
├── storage/
│   └── lancedb_backend.py # LanceDB + Tantivy FTS
├── cache.py              # LRU embedding cache
├── search.py             # Hybrid search pipeline (BM25 + vector + RRF)
├── server.py             # MCP server (16 tools)
├── documents.py          # PDF/DOCX/PPTX extraction
├── video.py              # Frame/transcript extraction
├── watch_folder.py       # Folder monitoring with dedup
└── cli.py                # CLI interface

Development

pytest tests/ -m "not live"    # Unit tests (no model download needed)
pytest tests/ -m live -v       # Integration tests (requires models)

See CONTRIBUTING.md for full development guidelines.

Attribution

RecallForge is inspired by QMD by Tobi. QMD pioneered the multi-stage retrieval pipeline (embedding, reranking, query expansion). RecallForge extends this pattern to vision-language with cross-modal retrieval and multi-backend support.

License

MIT License



