Archivist-AI
Local-first AI image search & management: no cloud, no API keys, 100% private.
Search your entire photo library with plain English. Runs fully offline on any CPU.
Quick Start · Features · Web UI · CLI · How It Works · Roadmap
Why Archivist-AI?
Most photo search tools send your images to the cloud. Google Photos, Apple Photos, and Amazon Photos all require accounts, upload your data to remote servers, and lock you into their ecosystem.
Archivist-AI runs entirely on your machine. Your photos never leave your computer.
| | Archivist-AI | Google Photos | Apple Photos |
|---|---|---|---|
| Works offline | โ | โ | โ |
| No account needed | โ | โ | โ |
| Images stay on your machine | โ | โ | โ |
| Natural language search | โ | โ | โ |
| Reverse image search | โ | โ | โ |
| Duplicate detection | โ | โ | โ |
| Open source | โ | โ | โ |
| Works on any folder | โ | โ | โ |
Features
- Natural language search: Type `"birthday cake with candles"` or `"sunset over mountains"` and find the right photo instantly. Powered by SigLIP (Google's state-of-the-art vision-language model).
- Reverse image search: Drag in any image to find visually similar photos in your library.
- Duplicate detection: Finds near-duplicate images using perceptual similarity, so it catches re-encoded, cropped, or slightly edited copies.
- Zero-shot auto-tagging: Automatically tag images using natural categories (`portrait`, `sunset`, `dog`, `indoor`) with no training required.
- Smart organiser: Copy or move search results to a new folder, or rename them by query.
- Folder watcher: Monitor directories and auto-index new images in real time.
- Date filtering: Filter searches by EXIF date or file modification date.
- ONNX acceleration: Export the model to ONNX for 3-5× faster CPU inference.
- Gradio web UI: A clean local browser interface for all features.
- Full CLI: Scriptable, composable, pipe-friendly.
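The duplicate detection feature above can be pictured as a similarity threshold over image embeddings. The following is a minimal sketch, assuming embeddings are plain lists of floats and using the `0.97` cosine cutoff that appears as `duplicate_threshold` in the configuration. The function names and the toy vectors are hypothetical and are not Archivist-AI's actual API.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_near_duplicates(embeddings, threshold=0.97):
    """Return (name_a, name_b, similarity) for every pair at or above threshold.

    `embeddings` maps a file name to its embedding vector. All pairs are
    compared, which is O(n^2): fine for a sketch, slow for a large library.
    """
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(sorted(embeddings.items()), 2):
        sim = cosine_similarity(vec_a, vec_b)
        if sim >= threshold:
            pairs.append((name_a, name_b, sim))
    return pairs


# Toy 3-dimensional vectors standing in for real image embeddings.
library = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "beach_resized.jpg": [0.89, 0.11, 0.01],
    "cat.jpg": [0.0, 0.2, 0.95],
}
duplicates = find_near_duplicates(library)
```

Raising the threshold toward `0.99` keeps only near-identical copies, while lowering it admits looser matches such as crops or edits.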
Quick Start
1. Install
pip install archivist-ai
Requirements: Python 3.9+. No GPU needed.
2. Index your photos
archivist index ~/Pictures
The first run downloads the SigLIP model (~375 MB, once). Subsequent runs only process new images.
3. Search
archivist search "people laughing at a dinner table"
4. Launch the web UI
archivist ui
Open http://127.0.0.1:7860 in your browser.
Web UI
Launch with archivist ui and get a full-featured browser interface:
| Tab | What it does |
|---|---|
| Text Search | Natural language search with similarity threshold and date filters |
| Reverse Image Search | Upload any image to find visually similar ones |
| Find Duplicates | Scan for near-duplicates and delete extras with one click |
| Index Folder | Add a new folder to the index from the browser |
| Stats | Index size, date range, storage breakdown |
CLI Reference
archivist index <dirs...> Index image directories (incremental)
archivist search <query> Natural language search
archivist similar <image> Reverse image search
archivist dupes Find near-duplicate images
archivist tag Auto-tag all untagged images
archivist copy <query> <dest> Copy search results to a folder
archivist watch <dirs...> Watch folders and auto-index new arrivals
archivist clean Remove stale entries for deleted files
archivist stats Show index statistics
archivist export-onnx Export model to ONNX (3-5× faster)
archivist ui Launch the Gradio web UI
Examples:
# Search with stricter threshold and more results
archivist search "cats playing" --top-k 50 --threshold 0.3
# Index multiple folders, non-recursive
archivist index ~/Photos ~/Downloads --no-recursive
# Find only near-identical duplicates
archivist dupes --threshold 0.99
# Preview what would be copied without doing it
archivist copy "wedding photos" ~/Desktop/Wedding --dry-run
# Watch a folder and auto-index as new photos arrive
archivist watch ~/Downloads
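To illustrate what a folder watcher has to do, here is a minimal polling sketch using only the standard library. It is an assumption-laden illustration: the extension set, the `find_new_images` helper, and the polling approach are all hypothetical, and the real `archivist watch` command may rely on filesystem events instead.

```python
import os
import tempfile

# Assumed set of extensions, for illustration only.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def find_new_images(directory, seen):
    """Return image paths under `directory` not yet in `seen`, and record them."""
    new_paths = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            path = os.path.join(root, name)
            ext = os.path.splitext(name)[1].lower()
            if ext in IMAGE_EXTENSIONS and path not in seen:
                seen.add(path)
                new_paths.append(path)
    return new_paths


# Demo: two polls over a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    seen = set()
    open(os.path.join(folder, "a.jpg"), "wb").close()
    open(os.path.join(folder, "notes.txt"), "wb").close()
    first_poll = [os.path.basename(p) for p in find_new_images(folder, seen)]

    open(os.path.join(folder, "b.PNG"), "wb").close()
    second_poll = [os.path.basename(p) for p in find_new_images(folder, seen)]
```

A long-running watcher would call `find_new_images` in a loop with a short `time.sleep` between polls and pass each new path to the indexer.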
Speed: ONNX Mode
For significantly faster indexing and search on CPU:
# Export the model once (takes ~1 minute)
archivist export-onnx
# All subsequent commands use ONNX automatically
archivist search "golden retriever"
ONNX mode enables int8 quantization and skips PyTorch entirely at inference time.
| Mode | ~Time per image |
|---|---|
| SigLIP (PyTorch, default) | ~0.30 s |
| SigLIP + quantization | ~0.15 s |
| ONNX (after export) | ~0.06-0.10 s |
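To put the per-image figures in perspective, the short calculation below turns them into rough indexing times for a whole library. It uses only the numbers from the table above. The library size of 10,000 images is an arbitrary example, and real timings depend on hardware and image size.

```python
# Per-image times in seconds, copied from the table: (low, high).
SECONDS_PER_IMAGE = {
    "SigLIP (PyTorch, default)": (0.30, 0.30),
    "SigLIP + quantization": (0.15, 0.15),
    "ONNX (after export)": (0.06, 0.10),
}


def estimate_minutes(image_count):
    """Return {mode: (min_minutes, max_minutes)} for indexing `image_count` images."""
    return {
        mode: (image_count * low / 60, image_count * high / 60)
        for mode, (low, high) in SECONDS_PER_IMAGE.items()
    }


estimates = estimate_minutes(10_000)
for mode, (low, high) in estimates.items():
    print(f"{mode}: {low:.0f}-{high:.0f} min")
```

For 10,000 images this works out to about 50 minutes with the default mode, 25 minutes with quantization, and 10 to 17 minutes with ONNX.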
How It Works
Your Photos
     │
     ▼
┌─────────────────────────────────────────┐
│ Indexer                                 │
│  • SHA-256 dedup (skip unchanged files) │
│  • EXIF date extraction                 │
│  • SigLIP / ONNX embedding              │
└────────────────┬────────────────────────┘
                 │ 768-dim float32 vector
        ┌────────▼────────┐
        │   FAISS Index   │  ← vector similarity search
        │  (IndexFlatIP)  │
        └────────┬────────┘
                 │
        ┌────────▼────────┐
        │    SQLite DB    │  ← file path, hash, tags, date
        └────────┬────────┘
                 │
   ┌─────────────▼──────────────────┐
   │ Query                          │
   │ "people in suits"              │ ──▶ text embedding → FAISS → ranked results
   │ query_image.jpg                │ ──▶ image embedding → FAISS → ranked results
   └────────────────────────────────┘
Why SigLIP over CLIP? SigLIP, from Google, replaces CLIP's softmax contrastive loss with a pairwise sigmoid loss, which the SigLIP authors report improves zero-shot retrieval. This helps most with complex or multi-concept queries.
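The difference between the two training losses can be sketched in a few lines. This is an illustrative sketch, not the training code of either model: it assumes `logits[i][j]` already holds the scaled similarity between image `i` and text `j` (temperature and bias folded in), with matching pairs on the diagonal. Archivist-AI itself only runs inference and never computes these losses.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sigmoid_pairwise_loss(logits):
    """SigLIP-style loss: every image-text pair is an independent binary problem.

    Diagonal pairs get label +1, all other pairs get label -1, and the summed
    negative log-likelihood is divided by the batch size.
    """
    n = len(logits)
    total = 0.0
    for i in range(n):
        for j in range(n):
            label = 1.0 if i == j else -1.0
            total += log_sigmoid(label * logits[i][j])
    return -total / n


def softmax_contrastive_loss(logits):
    """CLIP-style loss: softmax cross-entropy over rows and columns, averaged."""
    n = len(logits)

    def cross_entropy(rows):
        loss = 0.0
        for i, row in enumerate(rows):
            peak = max(row)  # subtract the max for numerical stability
            log_norm = peak + math.log(sum(math.exp(v - peak) for v in row))
            loss += log_norm - row[i]
        return loss / n

    columns = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))


uninformative = [[0.0, 0.0], [0.0, 0.0]]
well_separated = [[10.0, -10.0], [-10.0, 10.0]]
```

The softmax loss normalizes each row and column across the whole batch, while the sigmoid loss scores each pair on its own, so it needs no batch-wide normalization.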
Why FAISS? Facebook's FAISS performs exact inner-product search in milliseconds even across 100,000+ images, with no server required.
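Exact inner-product search is conceptually a brute-force scan: score the query against every stored vector and keep the best matches. The sketch below shows that idea in pure Python as a stand-in for what a flat inner-product index such as FAISS's `IndexFlatIP` computes. It does not use FAISS, and the `search` helper and its `top_k` and `threshold` parameters are hypothetical names that mirror the CLI flags. Vectors are normalized to unit length so the inner product equals cosine similarity.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def search(query, index, top_k=20, threshold=0.0):
    """Exact (brute-force) inner-product search over unit-length vectors.

    `index` maps file names to normalized embeddings. Returns up to `top_k`
    (name, score) pairs with score >= threshold, best match first.
    """
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, vec)), name)
        for name, vec in index.items()
    )
    hits = [(score, name) for score, name in scored if score >= threshold]
    best = heapq.nlargest(top_k, hits)
    return [(name, score) for score, name in best]


# Toy 3-dimensional vectors standing in for 768-dimensional embeddings.
index = {
    "sunset.jpg": normalize([1.0, 0.0, 0.0]),
    "beach.jpg": normalize([4.0, 3.0, 0.0]),
    "cat.jpg": normalize([0.0, 0.0, 1.0]),
}
results = search([1.0, 0.0, 0.0], index, top_k=2, threshold=0.3)
```

A dedicated library does the same arithmetic with vectorized code, which is why it stays fast at the library sizes mentioned above.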
Incremental indexing:
Files are identified by SHA-256 hash. Re-running archivist index on the same folder is near-instant, because only new or changed files are embedded.
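The hash-based skip can be sketched as follows. This is a minimal illustration, assuming the set of known hashes is held in memory. The README says Archivist-AI stores hashes in SQLite, and the helper names here are hypothetical.

```python
import hashlib
import os
import tempfile


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_embed(paths, known_hashes):
    """Return paths whose content hash is not yet known, and record the hash."""
    pending = []
    for path in paths:
        file_hash = file_sha256(path)
        if file_hash not in known_hashes:
            known_hashes.add(file_hash)
            pending.append(path)
    return pending


# Demo: index twice, then change one file and index again.
with tempfile.TemporaryDirectory() as folder:
    a = os.path.join(folder, "a.jpg")
    b = os.path.join(folder, "b.jpg")
    with open(a, "wb") as f:
        f.write(b"first image")
    with open(b, "wb") as f:
        f.write(b"second image")

    known = set()
    first_run = [os.path.basename(p) for p in files_to_embed([a, b], known)]
    second_run = [os.path.basename(p) for p in files_to_embed([a, b], known)]

    with open(a, "wb") as f:
        f.write(b"edited image")
    third_run = [os.path.basename(p) for p in files_to_embed([a, b], known)]
```

Hashing still reads every file, but that is far cheaper than running the model. Note that in this sketch two files with identical bytes count as one, which is a simplification.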
Installation Options
Stable (pip):
pip install archivist-ai
With ONNX acceleration:
pip install "archivist-ai[onnx]"
From source:
git clone https://github.com/abdullahkousa2/archivist-ai
cd archivist-ai
pip install -e ".[dev]"
Configuration
The config file lives at ~/.archivist/config.json and is created automatically on first run.
{
  "model_id": "google/siglip-base-patch16-224",
  "device": "cpu",
  "quantize": true,
  "use_onnx": false,
  "batch_size": 16,
  "top_k": 20,
  "duplicate_threshold": 0.97,
  "autotag_on_index": false
}
| Key | Default | Description |
|---|---|---|
| `model_id` | `google/siglip-base-patch16-224` | Vision-language model |
| `quantize` | `true` | Dynamic int8 quantization (faster, no quality loss) |
| `use_onnx` | `false` | Use ONNX runtime (run export-onnx first) |
| `batch_size` | `16` | Images per embedding batch |
| `duplicate_threshold` | `0.97` | Cosine similarity cutoff for duplicates |
| `autotag_on_index` | `false` | Auto-tag every image during indexing (slower) |
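For scripts that want to read the same settings, a config loader might overlay the file on top of the documented defaults. The sketch below assumes that behaviour. How Archivist-AI itself treats missing keys is not documented here, and `load_config` is a hypothetical helper rather than part of the package.

```python
import json
import os
import tempfile

# Default values copied from the configuration example above.
DEFAULTS = {
    "model_id": "google/siglip-base-patch16-224",
    "device": "cpu",
    "quantize": True,
    "use_onnx": False,
    "batch_size": 16,
    "top_k": 20,
    "duplicate_threshold": 0.97,
    "autotag_on_index": False,
}


def load_config(path):
    """Return DEFAULTS overlaid with any keys found in the JSON file at `path`."""
    config = dict(DEFAULTS)
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as handle:
            config.update(json.load(handle))
    return config


# Demo with a temporary file; the real file would be
# os.path.expanduser("~/.archivist/config.json").
with tempfile.TemporaryDirectory() as folder:
    config_path = os.path.join(folder, "config.json")
    with open(config_path, "w", encoding="utf-8") as handle:
        json.dump({"use_onnx": True, "top_k": 50}, handle)
    config = load_config(config_path)
    fallback = load_config(os.path.join(folder, "missing.json"))
```

Keys absent from the file keep their default values, so a partial config file stays valid.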
Roadmap
- OCR search: find images containing specific text
- Face clustering: group photos by person (fully local)
- Smart albums: saved searches that auto-update
- Metadata editing: write tags back to EXIF
- Plugin API: bring your own embedder
- Desktop app (Electron/Tauri wrapper)
Contributing
Contributions are very welcome. See CONTRIBUTING.md to get started.
git clone https://github.com/abdullahkousa2/archivist-ai
cd archivist-ai
pip install -e ".[dev]"
pytest tests/
Please open an issue before submitting large PRs so we can discuss the approach first.
License
MIT © 2025. See LICENSE for details.
If Archivist-AI is useful to you, a ⭐ on GitHub goes a long way.
Built for people who believe their photos belong to them.