🗂️ Archivist-AI

Local-first AI image search & management: no cloud, no API keys, 100% private.

Search your entire photo library with plain English. Runs fully offline on any CPU.

Quick Start · Features · Web UI · CLI · How It Works · Roadmap


Why Archivist-AI?

Most photo search tools send your images to the cloud. Google Photos, Apple Photos, and Amazon Photos all require accounts, upload your data to remote servers, and lock you into their ecosystem.

Archivist-AI runs entirely on your machine. Your photos never leave your computer.

| Feature | Archivist-AI | Google Photos | Apple Photos |
| --- | --- | --- | --- |
| Works offline | ✅ | ❌ | ❌ |
| No account needed | ✅ | ❌ | ❌ |
| Images stay on your machine | ✅ | ❌ | ❌ |
| Natural language search | ✅ | ✅ | ✅ |
| Reverse image search | ✅ | ✅ | ❌ |
| Duplicate detection | ✅ | ✅ | ❌ |
| Open source | ✅ | ❌ | ❌ |
| Works on any folder | ✅ | ❌ | ❌ |

✨ Features

  • 🔍 Natural language search: Type "birthday cake with candles" or "sunset over mountains" and find the right photo instantly. Powered by SigLIP (Google's state-of-the-art vision-language model).
  • 🖼️ Reverse image search: Drag in any image to find visually similar photos in your library.
  • 🔎 Duplicate detection: Finds near-duplicate images using perceptual similarity, catching re-encoded, cropped, or slightly edited copies.
  • 🏷️ Zero-shot auto-tagging: Automatically tag images using natural categories (portrait, sunset, dog, indoor) with no training required.
  • 📁 Smart organiser: Copy or move search results to a new folder, or rename them by query.
  • 👁️ Folder watcher: Monitor directories and auto-index new images in real time.
  • 📅 Date filtering: Filter searches by EXIF date or file modification date.
  • ⚡ ONNX acceleration: Export the model to ONNX for 3–5× faster CPU inference.
  • 🖥️ Gradio web UI: A clean local browser interface for all features.
  • ⌨️ Full CLI: Scriptable, composable, pipe-friendly.
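
To make the zero-shot auto-tagging idea concrete, here is a minimal sketch of how a tag can be chosen by comparing an image embedding with the text embeddings of candidate labels. It is an illustration under assumptions, not Archivist-AI's actual code: the function names, the `min_score` cutoff, and the toy 3-dimensional vectors (standing in for real SigLIP embeddings) are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def zero_shot_tags(image_vec, label_vecs, min_score=0.3):
    """Return labels whose text embedding scores at least min_score, best first."""
    scored = [(label, cosine(image_vec, vec)) for label, vec in label_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, score in scored if score >= min_score]

# Hypothetical toy embeddings; a real model would produce these from text and pixels.
labels = {
    "dog": [1.0, 0.0, 0.0],
    "sunset": [0.0, 1.0, 0.0],
    "indoor": [0.0, 0.0, 1.0],
}
image = [0.9, 0.1, 0.4]
tags = zero_shot_tags(image, labels)
print(tags)
```

No training is involved: adding a new tag only requires embedding one more label string.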

🚀 Quick Start

1. Install

pip install archivist-ai

Requirements: Python 3.9+. No GPU needed.

2. Index your photos

archivist index ~/Pictures

The first run downloads the SigLIP model (~375 MB, once). Subsequent runs only process new images.

3. Search

archivist search "people laughing at a dinner table"

4. Launch the web UI

archivist ui

Open http://127.0.0.1:7860 in your browser.


🖥️ Web UI

Launch with archivist ui and get a full-featured browser interface:

| Tab | What it does |
| --- | --- |
| 🔍 Text Search | Natural language search with similarity threshold and date filters |
| 🖼️ Reverse Image Search | Upload any image to find visually similar ones |
| 🔎 Find Duplicates | Scan for near-duplicates and delete extras with one click |
| 📁 Index Folder | Add a new folder to the index from the browser |
| 📊 Stats | Index size, date range, storage breakdown |

⌨️ CLI Reference

archivist index <dirs...>        Index image directories (incremental)
archivist search <query>         Natural language search
archivist similar <image>        Reverse image search
archivist dupes                  Find near-duplicate images
archivist tag                    Auto-tag all untagged images
archivist copy <query> <dest>    Copy search results to a folder
archivist watch <dirs...>        Watch folders and auto-index new arrivals
archivist clean                  Remove stale entries for deleted files
archivist stats                  Show index statistics
archivist export-onnx            Export model to ONNX (3–5× faster)
archivist ui                     Launch the Gradio web UI

Examples:

# Search with stricter threshold and more results
archivist search "cats playing" --top-k 50 --threshold 0.3

# Index multiple folders, non-recursive
archivist index ~/Photos ~/Downloads --no-recursive

# Find only near-identical duplicates
archivist dupes --threshold 0.99

# Preview what would be copied without doing it
archivist copy "wedding photos" ~/Desktop/Wedding --dry-run

# Watch a folder and auto-index as new photos arrive
archivist watch ~/Downloads
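
The `--threshold` passed to `dupes` above is described in the configuration as a cosine similarity cutoff. The sketch below shows what such a cutoff means in practice, under the assumption that duplicates are found by comparing embedding vectors pairwise. The function names and toy vectors are hypothetical, and the exhaustive pairwise loop is only meant for illustration; it is not necessarily how the tool scans a large library.

```python
import math
from itertools import combinations

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def find_duplicates(embeddings, threshold=0.97):
    """Return (path_a, path_b, similarity) for every pair at or above the threshold."""
    unit = {path: normalize(vec) for path, vec in embeddings.items()}
    pairs = []
    for (path_a, vec_a), (path_b, vec_b) in combinations(unit.items(), 2):
        similarity = sum(x * y for x, y in zip(vec_a, vec_b))
        if similarity >= threshold:
            pairs.append((path_a, path_b, similarity))
    return pairs

# Hypothetical toy embeddings: a_copy.jpg is a slightly altered version of a.jpg.
toy = {
    "a.jpg": [1.0, 0.0, 0.0],
    "a_copy.jpg": [0.99, 0.05, 0.0],
    "b.jpg": [0.0, 1.0, 0.0],
}
for path_a, path_b, similarity in find_duplicates(toy, threshold=0.99):
    print(f"{path_a} ~ {path_b} ({similarity:.4f})")
```

Raising the threshold toward 1.0 keeps only near-identical pairs; lowering it also catches heavier edits at the cost of more false positives.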

⚡ Speed: ONNX Mode

For significantly faster indexing and search on CPU:

# Export the model once (takes ~1 minute)
archivist export-onnx

# All subsequent commands use ONNX automatically
archivist search "golden retriever"

ONNX mode enables int8 quantization and skips PyTorch entirely at inference time.

| Mode | Approx. time per image |
| --- | --- |
| SigLIP (PyTorch, default) | ~0.30 s |
| SigLIP + quantization | ~0.15 s |
| ONNX (after export) | ~0.06–0.10 s |
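
To translate these per-image figures into a total indexing time, the arithmetic can be sketched as follows. The library size of 10,000 images is a hypothetical example, and the per-image times are the approximate values from the table, so the results are rough estimates rather than benchmarks.

```python
# Approximate seconds per image from the table above, as (low, high) ranges.
SECONDS_PER_IMAGE = {
    "pytorch": (0.30, 0.30),
    "pytorch_quantized": (0.15, 0.15),
    "onnx": (0.06, 0.10),
}

def estimate_minutes(num_images, mode):
    """Return the (low, high) estimated indexing time in minutes for one mode."""
    low, high = SECONDS_PER_IMAGE[mode]
    return (num_images * low / 60, num_images * high / 60)

library_size = 10_000  # hypothetical library size
for mode in SECONDS_PER_IMAGE:
    low, high = estimate_minutes(library_size, mode)
    print(f"{mode}: {low:.1f} to {high:.1f} minutes")
```

For 10,000 images this works out to roughly 50 minutes in default PyTorch mode versus about 10 to 17 minutes with ONNX, which is where the 3–5× figure comes from.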

🔧 How It Works

Your Photos
    │
    ▼
┌─────────────────────────────────────────┐
│  Indexer                                │
│  • SHA-256 dedup (skip unchanged files) │
│  • EXIF date extraction                 │
│  • SigLIP / ONNX embedding              │
└────────────────┬────────────────────────┘
                 │  768-dim float32 vector
        ┌────────▼────────┐
        │  FAISS Index    │  ← vector similarity search
        │  (IndexFlatIP)  │
        └────────┬────────┘
                 │
        ┌────────▼────────┐
        │  SQLite DB      │  ← file path, hash, tags, date
        └────────┬────────┘
                 │
    ┌────────────▼──────────────────┐
    │  Query                        │
    │  "people in suits" ────────── │──▶ text embedding → FAISS → ranked results
    │  query_image.jpg ──────────── │──▶ image embedding → FAISS → ranked results
    └───────────────────────────────┘

Why SigLIP over CLIP? SigLIP is trained with a pairwise sigmoid loss instead of CLIP's softmax-based contrastive loss, which tends to improve zero-shot retrieval, especially for complex or multi-concept queries. It was developed by Google Research.
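
The difference between the two training losses can be illustrated on a tiny similarity matrix. This is a simplified sketch of the general idea, not training code from either model: `sim[i][j]` is assumed to be the similarity between image `i` and text `j`, with matching pairs on the diagonal, and the `t` (temperature) and `b` (bias) defaults are placeholder values.

```python
import math

def softmax_contrastive_loss(sim):
    """CLIP-style loss: each image competes against every text in the batch, and vice versa."""
    n = len(sim)
    loss = 0.0
    for i in range(n):
        row = sim[i]
        col = [sim[j][i] for j in range(n)]
        loss -= sim[i][i] - math.log(sum(math.exp(s) for s in row))  # image -> text
        loss -= sim[i][i] - math.log(sum(math.exp(s) for s in col))  # text -> image
    return loss / (2 * n)

def sigmoid_pairwise_loss(sim, t=1.0, b=0.0):
    """SigLIP-style loss: every image-text pair is an independent yes/no decision."""
    n = len(sim)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            z = 1.0 if i == j else -1.0  # +1 for matching pairs, -1 otherwise
            loss += math.log1p(math.exp(-z * (t * sim[i][j] + b)))  # -log(sigmoid(x))
    return loss / n

# Hypothetical similarities: 'aligned' scores matching pairs highly, 'swapped' does not.
aligned = [[5.0, 0.0], [0.0, 5.0]]
swapped = [[0.0, 5.0], [5.0, 0.0]]
print(softmax_contrastive_loss(aligned), softmax_contrastive_loss(swapped))
print(sigmoid_pairwise_loss(aligned), sigmoid_pairwise_loss(swapped))
```

Both losses are lower for the aligned matrix; the sigmoid version simply does not need a normalization over the whole batch.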

Why FAISS? Facebook's FAISS performs exact inner-product search in milliseconds even across 100,000+ images, with no server required.
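
What an exact inner-product search computes can be shown in plain Python. The sketch below is an illustration only and does not call FAISS: it scores every stored vector against the query and keeps the best `k`, on the assumption that all vectors are normalized to unit length so the inner product equals cosine similarity. The function names and toy 2-dimensional vectors are hypothetical.

```python
import heapq
import math

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def top_k_inner_product(query, index, k=3):
    """Exhaustively score every stored vector and return the k best (path, score) pairs."""
    q = normalize(query)
    scores = ((sum(a * b for a, b in zip(q, vec)), path) for path, vec in index.items())
    return [(path, score) for score, path in heapq.nlargest(k, scores)]

# Hypothetical toy index; real embeddings would be 768-dimensional.
index = {
    "beach.jpg": normalize([0.9, 0.1]),
    "office.jpg": normalize([0.1, 0.9]),
    "dunes.jpg": normalize([0.7, 0.3]),
}
for path, score in top_k_inner_product([1.0, 0.0], index, k=2):
    print(f"{path}: {score:.3f}")
```

Because every vector is scored, the result is exact rather than approximate; a library such as FAISS performs the same computation with optimized native code.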

Incremental indexing: Files are identified by SHA-256 hash. Re-running archivist index on the same folder is near-instant, because only new or changed files are embedded.
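
The hash-based skip can be sketched with the standard library alone. This is a minimal illustration of the idea, assuming the set of already-indexed hashes is available in memory; the function names are hypothetical, and the real indexer presumably looks hashes up in its SQLite database instead.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images are never loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def files_to_embed(paths, known_hashes):
    """Return (path, hash) pairs whose content hash has not been indexed yet."""
    pending = []
    for path in paths:
        file_hash = sha256_of(path)
        if file_hash not in known_hashes:
            pending.append((Path(path), file_hash))
    return pending
```

Because the hash depends only on file contents, a renamed or moved copy of an indexed image is recognized as already known, while an edited file produces a new hash and is embedded again.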


📦 Installation Options

Stable (pip):

pip install archivist-ai

With ONNX acceleration:

pip install "archivist-ai[onnx]"

From source:

git clone https://github.com/abdullahkousa2/archivist-ai
cd archivist-ai
pip install -e ".[dev]"

⚙️ Configuration

The config file lives at ~/.archivist/config.json and is created automatically on first run.

{
  "model_id": "google/siglip-base-patch16-224",
  "device": "cpu",
  "quantize": true,
  "use_onnx": false,
  "batch_size": 16,
  "top_k": 20,
  "duplicate_threshold": 0.97,
  "autotag_on_index": false
}

| Key | Default | Description |
| --- | --- | --- |
| model_id | google/siglip-base-patch16-224 | Vision-language model |
| quantize | true | Dynamic int8 quantization (faster, with minimal quality loss) |
| use_onnx | false | Use ONNX runtime (run export-onnx first) |
| batch_size | 16 | Images per embedding batch |
| duplicate_threshold | 0.97 | Cosine similarity cutoff for duplicates |
| autotag_on_index | false | Auto-tag every image during indexing (slower) |
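
If you want to read this file from your own scripts, a small loader that falls back to the documented defaults might look like the sketch below. The `load_config` helper is hypothetical and not part of Archivist-AI's API; only the path and the default values are taken from this section.

```python
import json
from pathlib import Path

# Defaults copied from the example config above.
DEFAULTS = {
    "model_id": "google/siglip-base-patch16-224",
    "device": "cpu",
    "quantize": True,
    "use_onnx": False,
    "batch_size": 16,
    "top_k": 20,
    "duplicate_threshold": 0.97,
    "autotag_on_index": False,
}

def load_config(path=None):
    """Merge settings from the JSON file over the defaults; a missing file yields the defaults."""
    config_path = Path(path) if path else Path.home() / ".archivist" / "config.json"
    config = dict(DEFAULTS)
    if config_path.exists():
        config.update(json.loads(config_path.read_text(encoding="utf-8")))
    return config
```

For example, `load_config()["duplicate_threshold"]` returns 0.97 unless the file overrides it.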

🗺️ Roadmap

  • OCR search: find images containing specific text
  • Face clustering: group photos by person (fully local)
  • Smart albums: saved searches that auto-update
  • Metadata editing: write tags back to EXIF
  • Plugin API: bring your own embedder
  • Desktop app (Electron/Tauri wrapper)

🤝 Contributing

Contributions are very welcome. See CONTRIBUTING.md to get started.

git clone https://github.com/abdullahkousa2/archivist-ai
cd archivist-ai
pip install -e ".[dev]"
pytest tests/

Please open an issue before submitting large PRs so we can discuss the approach first.


📄 License

MIT © 2025. See LICENSE for details.


If Archivist-AI is useful to you, a ⭐ on GitHub goes a long way.

Built for people who believe their photos belong to them.
