Local-first multimodal semantic memory for your machine — searchable text + screenshots, MCP-native, runs on CPU.
Lookback
Local-first, multimodal semantic memory for your machine.
Index your files, code, PDFs, browser history, and screenshots into a LanceDB store on disk. Query by meaning from the CLI or from any MCP-capable AI tool (Claude Code, Cursor, Continue, ChatGPT Desktop, Windsurf, Zed). Everything runs on-device — no cloud, no GPU.
Highlights
- Multimodal. Real semantic search over text + screenshots in a single index. Cross-modal: search for "fluffy clouds in the sky" and you'll get back the screenshot, not just text mentioning clouds.
- Local-first. Models (Nomic Embed v1.5 + MobileCLIP2-S2) run on CPU via ONNX Runtime. Your data and your queries never leave your laptop.
- MCP-native. A single `lookback serve` makes the index available as a tool to any MCP-capable AI assistant. See MCP_SETUP.md.
- Dev-grade DX. A single `pip install`, sensible defaults, one config file, and every subcommand documented.
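Cross-modal search works because MobileCLIP's text tower and vision tower embed into the same space (M8a), so a text query vector can be ranked directly against screenshot vectors. The sketch below is a minimal illustration using cosine similarity over toy 3-d vectors standing in for 512-d MobileCLIP embeddings; the vectors are made up, `rank_images` is a hypothetical helper, and higher means more similar here, which may not match the convention of the `score` column in Lookback's own output (DESIGN.md covers the distance metrics it actually uses).

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_images(query_vec: list[float], image_vecs: dict, k: int = 5) -> list:
    """Return the k image ids most similar to the query, best first."""
    scored = [(cosine(query_vec, vec), image_id) for image_id, vec in image_vecs.items()]
    scored.sort(reverse=True)
    return [(image_id, round(score, 3)) for score, image_id in scored[:k]]

# Toy stand-ins for 512-d image-tower vectors (values are invented).
images = {
    "sky.png": [0.9, 0.1, 0.0],
    "dog.png": [0.1, 0.9, 0.2],
    "diagram.png": [0.0, 0.2, 0.9],
}
# Pretend this came from the text tower for "fluffy clouds in the sky".
query = [0.8, 0.2, 0.1]
print(rank_images(query, images, k=2))
```

Because both towers share one space, the same ranking code serves text→image queries with no caption or OCR step in between.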
Status
| Milestone | Scope | State |
|---|---|---|
| M0 | Design + scaffold | ✅ |
| M1 | Lance schema + store | ✅ |
| M2 | Text embedder ABC + mock + Nomic adapter; chunking; markdown extractor; indexer | ✅ |
| M3 | Image embedder mock + screenshot extractor | ✅ |
| M4 | PDF + code extractors | ✅ |
| M5 | CLI: init / index / search / stats / models | ✅ |
| M6 | Model registry, system probe, recommendation, init model selection | ✅ |
| M7 | Real Nomic + MobileCLIP weights wired end-to-end, `@needs_models` smoke tests | ✅ |
| M8a | Cross-modal text→image search via MobileCLIP joint text tower; `--modality` flag | ✅ |
| M8b | File watcher (`lookback watch`); MCP server (`lookback serve`); hybrid FTS + vector (`--hybrid`); MCP setup docs | ✅ |
194 tests, all green (10 of them are gated on real model weights and auto-skip when the weights are absent). Run `uv sync && uv run pytest -q` to verify.
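The auto-skip behaviour can be sketched with the standard library alone. This is a minimal, hypothetical version of a `@needs_models`-style gate built on `unittest.skipUnless` (pytest also treats the resulting `unittest.SkipTest` as a skip); the required-file list mirrors the default storage layout in this README, and `weights_present` and the test function are illustrative names, not Lookback's actual implementation, which may well be a pytest marker instead.

```python
import unittest
from pathlib import Path

# Default models_dir from the storage layout; the real one is read from config.toml.
MODELS_DIR = Path.home() / ".lookback" / "models"

# Weight files a real-model smoke test would need (per the storage layout).
REQUIRED = [
    "nomic-v1.5/onnx/model.onnx",
    "mobileclip-s2/onnx/s2/vision_model.onnx",
    "mobileclip-s2/onnx/s2/text_model.onnx",
]

def weights_present(models_dir: Path = MODELS_DIR) -> bool:
    """True only if every required weight file exists on disk."""
    return all((models_dir / rel).is_file() for rel in REQUIRED)

# Evaluated once at import: run decorated tests if weights exist, else skip them.
needs_models = unittest.skipUnless(weights_present(), "model weights not downloaded")

@needs_models
def test_real_embedding_smoke():
    ...  # hypothetical: load the ONNX models and embed a sample string
```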
Quick start
```bash
# Install
pip install lookback-ai  # PyPI distribution; imports as `lookback`
# OR for local development:
uv sync && uv tool install --editable .

# Bootstrap config with system-aware model recommendation
uv run lookback init
# Detected: Darwin · arm64 · Apple Silicon · 16.0 GB RAM · 8 CPU
# Recommended: text=nomic-v1.5 image=mobileclip-s2

# Download weights (~700 MB total: Nomic v1.5 + MobileCLIP2-S2 vision + text + tokenizer)
uv run lookback models download nomic-v1.5 mobileclip-s2

# First-time index pass over directories you care about
uv run lookback index ~/Documents
uv run lookback index ~/Pictures/Screenshots

# Search
uv run lookback search "transformer attention notes"
uv run lookback search "a diagram with red and blue arrows" --modality image
uv run lookback search "IVF_PQ tuning" --hybrid  # FTS + vector RRF fusion

# Keep the index up to date as files change
uv run lookback watch ~/Documents

# Expose to AI tools via MCP
uv run lookback serve                                # stdio (IDE-friendly)
uv run lookback serve --transport http --port 7777   # HTTP for remote
```
See MCP_SETUP.md for Claude Code / Cursor / Continue / ChatGPT Desktop / Windsurf / Zed configuration snippets.
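The `Detected:` line printed by `lookback init` can be reproduced from the standard library. This is a minimal sketch of such a system probe; `probe` and `recommend` are hypothetical names, RAM detection uses POSIX `os.sysconf` and is left as `None` where unavailable (e.g. Windows), and the low-RAM fallback in `recommend` is an invented rule for illustration, not Lookback's actual recommendation logic.

```python
import os
import platform

def probe() -> dict:
    """Gather the facts `lookback init` reports: OS, arch, RAM, CPU count."""
    system, machine = platform.system(), platform.machine()
    info = {
        "os": system,                  # e.g. "Darwin", "Linux", "Windows"
        "arch": machine,               # e.g. "arm64", "x86_64"
        "apple_silicon": system == "Darwin" and machine == "arm64",
        "cpus": os.cpu_count() or 1,   # cpu_count() may return None
        "ram_gb": None,                # stays None if it cannot be read
    }
    try:
        # POSIX only (os.sysconf is absent on Windows): page size * physical pages.
        total = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        info["ram_gb"] = round(total / 1024**3, 1)
    except (AttributeError, ValueError, OSError):
        pass
    return info

def recommend(info: dict) -> dict:
    """HYPOTHETICAL rule: the README's default pair, minus the image model on tiny RAM."""
    ram = info.get("ram_gb")
    if ram is not None and ram < 4:  # made-up threshold, for illustration only
        return {"text": "nomic-v1.5", "image": None}
    return {"text": "nomic-v1.5", "image": "mobileclip-s2"}

if __name__ == "__main__":
    facts = probe()
    print(facts, recommend(facts))
```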
Commands at a glance
| Command | What it does |
|---|---|
| `lookback init` | Detect system, recommend models, write `~/.lookback/config.toml`. Flags: `--text-model`, `--image-model`, `--interactive`. |
| `lookback models list` | Show every registered model with HF repo and disk-size estimate. |
| `lookback models download <name> [<name> …]` | Fetch weights into `models_dir`. |
| `lookback index <path>` | Walk a path, hash + skip-if-unchanged, embed new/changed files, write to Lance. |
| `lookback search <query>` | Semantic search. Flags: `--modality text\|…` |
| `lookback stats` | Row counts per table. |
| `lookback watch <path>` | Foreground watcher; re-indexes on file-system events. |
| `lookback serve` | MCP server. `--transport stdio\|…` |
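The "hash + skip-if-unchanged" step of `lookback index` can be sketched as a planning pass over file digests. This is a minimal illustration assuming SHA-256 content hashes and a plain dict standing in for `files.lance`; the README names neither the hash algorithm nor the state layout, and `file_digest` and `plan_index` are hypothetical names.

```python
import hashlib
from pathlib import Path

def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the file's bytes, read in 1 MiB blocks to bound memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

def plan_index(paths, state: dict) -> tuple:
    """Split paths into (to_embed, skipped).

    state maps str(path) -> digest from the previous pass (a stand-in for
    files.lance). to_embed holds (key, digest) pairs; the caller should
    record each digest in state only after that file's rows are written.
    """
    to_embed, skipped = [], []
    for p in paths:
        key, digest = str(p), file_digest(Path(p))
        if state.get(key) == digest:
            skipped.append(key)             # unchanged: no re-embedding
        else:
            to_embed.append((key, digest))  # new or modified file
    return to_embed, skipped
```

Recording the digest only after the rows are committed means a crash mid-embed leaves the file looking changed, so the next pass simply retries it.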
Storage layout
```text
~/.lookback/
├── config.toml            # one TOML, hand-editable
├── models/
│   ├── nomic-v1.5/
│   │   ├── onnx/model.onnx
│   │   └── tokenizer.json
│   └── mobileclip-s2/
│       ├── onnx/s2/vision_model.onnx
│       ├── onnx/s2/text_model.onnx
│       └── tokenizer.json
└── data/
    ├── chunks_text.lance  (Nomic 768-d)
    ├── chunks_image.lance (MobileCLIP 512-d)
    └── files.lance        (file-level state for incremental indexing)
```
What it indexes by default
Tier 1 (configured in `roots`, on by default):
- Markdown / plaintext: `.md`, `.markdown`, `.mdx`, `.txt`, `.log`, `.rst`
- PDFs: text-layer extraction via pypdf (OCR for image-only PDFs is planned for M9)
- Source code: 40+ languages (Python, TS/JS, Go, Rust, Java, Swift, C/C++, Ruby, …), with the language tag recorded as `source_kind`
- Screenshots: `.png`, `.jpg`, `.webp`, `.gif`, `.bmp`, visually searchable via MobileCLIP

Skipped: hidden directories, `.gitignore`'d paths, `node_modules`/`.venv`/`target`/`build`/`dist`/etc., files larger than `max_file_bytes` (50 MiB by default), and symlinks (unless `follow_symlinks = true`).
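The default include and skip rules amount to a per-file predicate. This is a minimal sketch with a hypothetical `should_index` helper: it applies the hidden-directory, excluded-directory, symlink, extension, and `max_file_bytes` rules, but leaves out `.gitignore` matching and the 40+ source-code extensions, and its excluded-directory set covers only the names the README spells out (the README's "etc." means the real list is longer).

```python
from pathlib import Path

# Extensions from the Tier 1 list; source-code extensions omitted for brevity.
TEXT_EXTS = {".md", ".markdown", ".mdx", ".txt", ".log", ".rst"}
IMAGE_EXTS = {".png", ".jpg", ".webp", ".gif", ".bmp"}
ALLOWED_EXTS = TEXT_EXTS | IMAGE_EXTS | {".pdf"}
# Only the excluded directory names the README lists explicitly.
EXCLUDED_DIRS = {"node_modules", ".venv", "target", "build", "dist"}

def should_index(path: Path, root: Path,
                 max_file_bytes: int = 50 * 1024**2,
                 follow_symlinks: bool = False) -> bool:
    """Apply the documented skip rules (no .gitignore handling)."""
    # Check every directory between root and the file.
    for part in path.relative_to(root).parts[:-1]:
        if part.startswith(".") or part in EXCLUDED_DIRS:
            return False
    if path.is_symlink() and not follow_symlinks:
        return False
    if path.suffix.lower() not in ALLOWED_EXTS:
        return False
    return path.stat().st_size <= max_file_bytes
```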
Hero demos (with real weights)
```text
$ lookback search "fluffy clouds in the sky" --modality image
Image hits
┃ score ┃ kind ┃ meta ┃
│ 0.779 │ screenshot │ {"filename": "sky.png"} │
│ 0.926 │ screenshot │ {"filename": "dog.png"} │

$ lookback search "transformer attention paper"
Text hits
│ 0.309 │ markdown │ {"section": "Attention is all you need", ...} │
Image hits
│ 0.836 │ screenshot │ {"filename": "diagram.png"} │

$ lookback search "IVF_PQ tuning" --hybrid
Text hits
│ 0.033 │ markdown │ {"section": "IVF_PQ index tuning", ...} │   # exact-keyword boost
```
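`--hybrid` fuses the full-text (FTS) and vector rankings with Reciprocal Rank Fusion (RRF). In standard RRF each document scores the sum of 1/(k + rank) over the lists it appears in. This minimal sketch uses k = 60, the constant commonly used for RRF; whether Lookback uses that value is an assumption, though under it a chunk ranked first in both lists scores 2/61 ≈ 0.033, consistent with the `--hybrid` demo above. The document ids are made up.

```python
from collections import defaultdict

def rrf(ranked_lists: list, k: int = 60) -> list:
    """Reciprocal Rank Fusion: score(d) = sum of 1 / (k + rank), ranks from 1."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical result lists for "IVF_PQ tuning".
fts_hits = ["ivf_pq_tuning", "lance_notes", "pq_basics"]        # keyword order
vector_hits = ["ivf_pq_tuning", "ann_overview", "lance_notes"]  # semantic order

for doc_id, score in rrf([fts_hits, vector_hits]):
    print(f"{score:.3f}  {doc_id}")
```

RRF scores are small by construction (at most 2/(k + 1) with two lists), so a fused score of 0.033 signals a top hit in both lists, not a weak match. RRF uses only ranks, which is why it can combine BM25-style and cosine scores that live on incompatible scales.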
Design + architecture
See DESIGN.md for:
- Lance schema (chunks_text + chunks_image + files) and the perf-guide-driven decisions
- Embedding choices, dim selection, distance metrics
- Per-extractor chunking strategies
- Index types (IVF_PQ vector + bitmap/btree scalar + FTS inverted)
- Session-by-session implementation log
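DESIGN.md documents the real per-extractor chunking strategies. As a hypothetical illustration only, a heading-aware markdown chunker could attach the enclosing section title to each chunk, which is the kind of `section` metadata the markdown hits in the demos carry. `chunk_markdown` and `max_chars` are assumed names, and the sketch does not special-case `#` lines inside fenced code.

```python
import re

# ATX heading: 1-6 '#' characters, whitespace, then the title text.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")

def chunk_markdown(text: str, max_chars: int = 1500) -> list:
    """Split markdown at ATX headings; each chunk records its section title.

    Sections longer than max_chars are cut into consecutive pieces.
    Text before the first heading gets section None.
    """
    chunks, buf = [], []
    section = None

    def flush():
        body = "\n".join(buf).strip()
        for start in range(0, len(body), max_chars):
            chunks.append({"section": section, "text": body[start:start + max_chars]})
        buf.clear()

    for line in text.splitlines():
        m = HEADING.match(line)
        if m:
            flush()               # close out the previous section first
            section = m.group(2)
        else:
            buf.append(line)
    flush()
    return chunks
```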
License
MIT
Download files
File details
Details for the file lookback_ai-0.1.0.tar.gz.
File metadata
- Download URL: lookback_ai-0.1.0.tar.gz
- Upload date:
- Size: 180.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `32da85797953494309a92516df5b40f32bb23046c7af074501f25f29660ac0f7` |
| MD5 | `43ce65712de00cf6bc5c3da6a898533d` |
| BLAKE2b-256 | `5398c1bd046ad5933be0681c5fcd994217681a309937fbb362f69cc40da111ac` |
File details
Details for the file lookback_ai-0.1.0-py3-none-any.whl.
File metadata
- Download URL: lookback_ai-0.1.0-py3-none-any.whl
- Upload date:
- Size: 48.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `3e5db094bdbb7de1f9a5f3e4a8d8724e2c167e73fbed047d34d892202209e8bb` |
| MD5 | `882594d0a259f49ff2280cb108d87f4b` |
| BLAKE2b-256 | `06fdb019b077c707cf6efd841476f29f7db83454c575b31447a7bb2e4315a318` |
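To check a downloaded file against the published SHA256 digests, hash it locally and compare. This is a minimal sketch using `hashlib`; `sha256_of` and `verify` are hypothetical helper names, and the digests are the SHA256 values listed in the tables for the 0.1.0 files.

```python
import hashlib
from pathlib import Path

# Published SHA256 digests for the 0.1.0 release files.
EXPECTED_SHA256 = {
    "lookback_ai-0.1.0.tar.gz":
        "32da85797953494309a92516df5b40f32bb23046c7af074501f25f29660ac0f7",
    "lookback_ai-0.1.0-py3-none-any.whl":
        "3e5db094bdbb7de1f9a5f3e4a8d8724e2c167e73fbed047d34d892202209e8bb",
}

def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file, streamed in 1 MiB blocks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def verify(path: Path) -> bool:
    """True only if the file name is known and its digest matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

pip can enforce the same check at install time: a requirements line such as `lookback-ai==0.1.0 --hash=sha256:<sdist digest> --hash=sha256:<wheel digest>` puts pip into hash-checking mode, and it refuses any downloaded file whose digest is not listed.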