CLI tool to cluster images by face similarity using InsightFace and Agglomerative Clustering

Face Cluster CLI

Command-line tool that clusters images by face similarity, using InsightFace for face detection and embedding extraction and agglomerative clustering to group the faces.


Overview

Face Cluster CLI scans a directory of images, detects every face using the InsightFace antelopev2 model (512-dimensional embeddings), groups them into identity clusters via agglomerative hierarchical clustering with cosine distance, and exports the results into neatly organised subdirectories — one per person.

Features

  • InsightFace antelopev2 — state-of-the-art face detection and recognition with automatic model download
  • Agglomerative cosine clustering — hierarchical clustering with configurable distance threshold and linkage
  • GPU-first inference — ships onnxruntime-gpu on Linux x86_64 / Windows for automatic NVIDIA CUDA acceleration; falls back to CPU on macOS and other platforms
  • Rich terminal UX — progress bars, coloured logging, and a summary panel at the end
  • Environment variable configuration — every setting overridable via FACE_CLUSTER_* env vars or .env file
  • Corrupted image resilience — unreadable files are logged and skipped without aborting the pipeline
  • Interactive overwrite protection — prompts before cleaning an existing output directory (--force to skip)
  • Strict type safety — full mypy --strict compliance with Pydantic validation at every boundary
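
The overwrite protection listed above can be pictured with a small sketch. This is not the project's code: `prepare_output_dir`, `ask_user`, and the prompt wording are hypothetical names chosen for illustration, and the sketch assumes "cleaning" means removing the existing directory before recreating it.

```python
import shutil
from pathlib import Path
from typing import Callable


def ask_user(question: str) -> bool:
    """Hypothetical prompt helper: only an explicit 'y' or 'yes' counts as consent."""
    return input(f"{question} [y/N] ").strip().lower() in {"y", "yes"}


def prepare_output_dir(
    output_dir: Path,
    force: bool = False,
    confirm: Callable[[str], bool] = ask_user,
) -> bool:
    """Return True if output_dir is ready for export, False if the user declined."""
    if output_dir.exists() and any(output_dir.iterdir()):
        # Assumption: a non-empty directory triggers the prompt unless force is set.
        if not force and not confirm(f"Clean existing directory {output_dir}?"):
            return False
        shutil.rmtree(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    return True
```

Passing `confirm` as a parameter keeps the function testable without a terminal; `force=True` plays the role of `--force`.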

Requirements

| Requirement | Details |
|---|---|
| Python | >= 3.10.19 |
| OS | Linux, macOS, Windows |
| GPU (auto-detected) | NVIDIA GPU with CUDA 12.x drivers (Linux x86_64 / Windows). CPU fallback on macOS and other platforms. |

GPU Support

On Linux x86_64 and Windows, Face Cluster installs onnxruntime-gpu which bundles CUDA 12.x and cuDNN 9.x libraries. If an NVIDIA GPU with compatible drivers is detected, inference runs on GPU automatically. Otherwise, execution falls back to CPU transparently.

On macOS and other platforms (e.g. Linux aarch64), the CPU-only onnxruntime package is installed instead — no NVIDIA GPU wheels exist for these platforms.

Tip: Run with --verbose to see which ONNX Runtime execution providers are active.
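
To check the same thing outside the tool, ONNX Runtime can report which execution providers it sees. The sketch below is a minimal example and not part of Face Cluster: `pick_providers`, `available_providers`, and the preference order in `PREFERRED` are assumptions made for illustration. Only `onnxruntime.get_available_providers()` is a real library call.

```python
from typing import Sequence

# Assumed preference order for illustration: CUDA first, CPU as the fallback.
PREFERRED = ("CUDAExecutionProvider", "CPUExecutionProvider")


def pick_providers(available: Sequence[str]) -> list[str]:
    """Keep the preferred providers that are actually available, in preference order."""
    return [name for name in PREFERRED if name in available]


def available_providers() -> list[str]:
    """Ask ONNX Runtime for its providers; return an empty list if it is not installed."""
    try:
        import onnxruntime as ort
    except ImportError:
        return []
    return list(ort.get_available_providers())


if __name__ == "__main__":
    print(pick_providers(available_providers()))
```

If `CUDAExecutionProvider` is missing from the printed list on a machine with an NVIDIA GPU, the driver or CUDA version is a likely place to look first.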

Installation

Run without installing (recommended for one-off use)

uvx face-cluster ./photos

Install as a tool

# With uv
uv tool install face-cluster

# With pip
pip install face-cluster

From source

git clone https://github.com/j-about/Face-Cluster.git
cd Face-Cluster
uv sync
uv run face-cluster --help

Quick Start

face-cluster ./photos --output ./clusters --verbose

This will:

  1. Scan ./photos for supported images
  2. Detect all faces and extract 512-d embeddings
  3. Cluster faces by identity using agglomerative clustering
  4. Copy images into ./clusters/cluster_000/, cluster_001/, etc.
  5. Print a summary panel:
╭──── Pipeline Summary ────╮
│  Total images          42 │
│  Images with faces     38 │
│  Total faces detected  51 │
│  Clusters               4 │
│  Largest cluster       18 │
│  Outliers               3 │
╰──────────────────────────╯

Usage

face-cluster [OPTIONS] INPUT_DIR

Arguments

| Argument | Description |
|---|---|
| INPUT_DIR | Required. Path to a directory containing images to cluster. Must exist and be readable. |

Options

| Option | Short | Default | Type | Constraint | Description |
|---|---|---|---|---|---|
| --output | -o | ./face_clusters | PATH | | Directory for exported cluster folders. |
| --distance-threshold | | 0.8 | FLOAT | > 0 | Cosine distance threshold above which clusters are not merged. |
| --linkage | | complete | TEXT | average, complete, single | Linkage criterion: 'average', 'complete', or 'single'. |
| --min-cluster-size | | 2 | INT | >= 2 | Clusters smaller than this are reclassified as outliers. |
| --batch-size | | 32 | INT | >= 1 | Number of images per progress-bar tick. |
| --force | -f | false | FLAG | | Overwrite output directory without confirmation. |
| --verbose | -v | false | FLAG | | Enable debug-level logging. |
| --help | | | | | Show help message and exit. |
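
As an illustration of what --min-cluster-size does, the sketch below relabels every face whose cluster is smaller than the minimum. It is a guess at the behaviour described in the table, not the project's implementation: `reclassify_small_clusters` and the `OUTLIER` sentinel value are hypothetical.

```python
from collections import Counter

OUTLIER = -1  # hypothetical sentinel label; the tool's internal marker is not documented


def reclassify_small_clusters(labels: list[int], min_cluster_size: int = 2) -> list[int]:
    """Relabel members of clusters with fewer than min_cluster_size faces as outliers."""
    sizes = Counter(labels)
    return [label if sizes[label] >= min_cluster_size else OUTLIER for label in labels]


if __name__ == "__main__":
    # Cluster 1 has a single face, so it becomes an outlier with the default minimum of 2.
    print(reclassify_small_clusters([0, 0, 1, 2, 2, 2]))
```

Raising the minimum trades recall for cleaner clusters: with a value of 3, the two-face cluster 0 in the example would also move to the outliers.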

Examples

# Basic usage
face-cluster ./photos

# Custom output directory and stricter clustering
face-cluster ./photos -o ./results --distance-threshold 0.5 --linkage complete

# Force overwrite with verbose logging
face-cluster ./photos -o ./results -f -v

# Run from source
uv run face-cluster ./photos --output ./clusters

Output Structure

face_clusters/
├── cluster_000/          # Identity A
│   ├── photo_001.jpg
│   ├── photo_007.jpg
│   └── photo_012.jpg
├── cluster_001/          # Identity B
│   ├── photo_003.jpg
│   └── photo_009.jpg
├── cluster_002/          # Identity C
│   └── ...
└── outliers/             # Faces in clusters too small (< min_cluster_size)
    ├── photo_022.jpg
    └── photo_035.jpg

  • Cluster directories use zero-padded labels (cluster_000, cluster_001, ...).
  • Images are copied (originals are never modified).
  • If an image contains faces belonging to multiple clusters, it is copied into each relevant directory.
  • Filename collisions are resolved automatically by appending a counter suffix (photo_1.jpg, photo_2.jpg, ...).
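
The collision rule in the last bullet can be sketched as follows. This is an illustrative reading of the documented naming scheme (photo_1.jpg, photo_2.jpg, ...), not the exporter's actual code; `unique_destination` is a hypothetical helper name.

```python
from pathlib import Path


def unique_destination(dest_dir: Path, filename: str) -> Path:
    """Return a path in dest_dir that does not exist yet, adding _1, _2, ... if needed."""
    candidate = dest_dir / filename
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    while candidate.exists():
        # Keep the original extension and append the counter to the stem.
        candidate = dest_dir / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate
```

A copy step could then call `shutil.copy2(source, unique_destination(cluster_dir, source.name))`, which leaves the original file untouched.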

Configuration

All settings can be overridden via environment variables with the FACE_CLUSTER_ prefix. A .env file in the working directory is also supported.

| Environment Variable | Default | Type | Description |
|---|---|---|---|
| FACE_CLUSTER_MODEL_NAME | antelopev2 | str | InsightFace model pack name. |
| FACE_CLUSTER_DET_SIZE | (640, 640) | tuple[int, int] | Detection input size (width, height). |
| FACE_CLUSTER_BATCH_SIZE | 32 | int | Number of images per progress-bar tick. |
| FACE_CLUSTER_DISTANCE_THRESHOLD | 0.8 | float | Cosine distance threshold above which clusters are not merged (> 0). |
| FACE_CLUSTER_LINKAGE | complete | str | Linkage criterion: average, complete, or single. |
| FACE_CLUSTER_MIN_CLUSTER_SIZE | 2 | int | Clusters smaller than this are reclassified as outliers (>= 2). |
| FACE_CLUSTER_OUTPUT_DIR | ./face_clusters | Path | Output directory for cluster folders. |
| FACE_CLUSTER_FORCE | false | bool | Skip overwrite confirmation. |
| FACE_CLUSTER_VERBOSE | false | bool | Enable debug logging. |

Note: CLI options take priority over environment variables.

Validation rule: FACE_CLUSTER_LINKAGE must be one of average, complete, or single.
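
The precedence described above (CLI option, then environment variable, then default) can be sketched with the standard library alone. The project itself uses Pydantic settings; the helpers below, `resolve_setting` and `validate_linkage`, are hypothetical stand-ins written for illustration, and .env file loading is left out.

```python
import os
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")
ALLOWED_LINKAGES = {"average", "complete", "single"}


def resolve_setting(
    name: str,
    cli_value: Optional[T],
    default: T,
    cast: Callable[[str], T],
    env: Optional[Mapping[str, str]] = None,
) -> T:
    """Pick the CLI value if given, else FACE_CLUSTER_<NAME> from env, else the default."""
    env = os.environ if env is None else env
    if cli_value is not None:
        return cli_value
    raw = env.get(f"FACE_CLUSTER_{name.upper()}")
    if raw is not None:
        return cast(raw)  # environment values arrive as strings
    return default


def validate_linkage(value: str) -> str:
    """Reject linkage names outside the documented set."""
    if value not in ALLOWED_LINKAGES:
        raise ValueError(f"linkage must be one of {sorted(ALLOWED_LINKAGES)}, got {value!r}")
    return value
```

For example, with `FACE_CLUSTER_DISTANCE_THRESHOLD=0.5` exported and no `--distance-threshold` on the command line, `resolve_setting("distance_threshold", None, 0.8, float)` yields 0.5.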

Supported Image Formats

Extension
.jpg
.jpeg
.png
.bmp
.webp
.tiff
.tif

Image discovery is non-recursive (only the top-level directory is scanned). Extension matching is case-insensitive.
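
A minimal sketch of discovery under these rules, assuming nothing about the project's internals beyond what is stated here: `discover_images` and `SUPPORTED_EXTENSIONS` are illustrative names, and the extension set is copied from the table above.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp", ".tiff", ".tif"}


def discover_images(input_dir: Path) -> list[Path]:
    """List supported image files directly inside input_dir, sorted by path."""
    return sorted(
        path
        for path in input_dir.iterdir()  # iterdir() does not descend into subdirectories
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Lower-casing the suffix is what makes `IMG_0001.JPG` and `img_0001.jpg` equally acceptable; files in nested folders are ignored because only the top level is listed.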

Exit Codes

| Code | Meaning | Examples |
|---|---|---|
| 0 | Success | Pipeline completed normally. |
| 1 | User error | Input directory does not exist, contains no images, or user aborted overwrite. |
| 2 | System error | Model failed to load, clustering failed, or export I/O error. |
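
When calling the tool from a script, the exit codes above can be mapped to readable outcomes. The wrapper below is a sketch, not something shipped with Face Cluster: `ExitCode`, `describe_exit`, and `run_face_cluster` are hypothetical names, and it assumes `face-cluster` is on the PATH.

```python
import subprocess
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    USER_ERROR = 1
    SYSTEM_ERROR = 2


DESCRIPTIONS = {
    ExitCode.SUCCESS: "success",
    ExitCode.USER_ERROR: "user error (bad input directory, no images, or aborted overwrite)",
    ExitCode.SYSTEM_ERROR: "system error (model load, clustering, or export I/O failure)",
}


def describe_exit(code: int) -> str:
    """Translate a documented exit code into a short description."""
    try:
        return DESCRIPTIONS[ExitCode(code)]
    except ValueError:
        return f"unexpected exit code {code}"


def run_face_cluster(input_dir: str, output_dir: str) -> int:
    """Run the CLI non-interactively; --force skips the overwrite prompt."""
    result = subprocess.run(["face-cluster", input_dir, "--output", output_dir, "--force"])
    return result.returncode
```

A caller might write `print(describe_exit(run_face_cluster("./photos", "./clusters")))` and retry only on a system error.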

Development

Setup

git clone https://github.com/j-about/Face-Cluster.git
cd Face-Cluster
uv sync

Run Tests

uv run pytest

Type Checking

uv run mypy --strict src/

Run from Source

uv run face-cluster ./photos --verbose

Architecture

INPUT_DIR
    |
    v
 Discover ──> Detect ──> Cluster ──> Export
 (pipeline)   (detector)  (clustering) (exporter)
    |             |            |            |
    v             v            v            v
 Image paths   Embedding    Cluster      Organised
 (sorted)      records      labels       directories

Pipeline Steps

  1. Discover — scan INPUT_DIR for supported image files (non-recursive)
  2. Detect — extract 512-d face embeddings via InsightFace antelopev2
  3. Cluster — L2-normalise embeddings, then run agglomerative clustering with cosine distance
  4. Export — copy images into per-cluster subdirectories
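
The Cluster step can be illustrated with a naive pure-Python version of complete-linkage agglomerative clustering. This is a teaching sketch, not the project's code: a real pipeline would more likely rely on a library such as scikit-learn's AgglomerativeClustering. The function names are hypothetical, embeddings are assumed to be non-zero vectors, and stopping once the closest pair is at or above the threshold is a choice made here.

```python
import math
from itertools import combinations


def l2_normalise(vector: list[float]) -> list[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """For unit vectors, cosine distance is 1 minus the dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def complete_linkage_clusters(
    embeddings: list[list[float]], distance_threshold: float = 0.8
) -> list[int]:
    """Merge the closest clusters until the closest pair reaches the threshold."""
    points = [l2_normalise(v) for v in embeddings]
    clusters = [[i] for i in range(len(points))]

    def linkage(c1: list[int], c2: list[int]) -> float:
        # Complete linkage: distance between the two farthest members.
        return max(cosine_distance(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        if linkage(clusters[a], clusters[b]) >= distance_threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for index in members:
            labels[index] = label
    return labels
```

With toy 2-d "embeddings" `[[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]` and the default threshold, the first two and last two vectors end up in separate clusters because the farthest cross-pair is at cosine distance 1.0. The loop is roughly cubic in the number of faces, so it is only suitable for illustration.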

Module Map

| Module | Responsibility |
|---|---|
| cli.py | Typer entry point, argument parsing, Rich summary panel |
| config.py | Pydantic BaseSettings with FACE_CLUSTER_* env var support |
| models.py | Data models: EmbeddingRecord, ClusterResult, PipelineSummary |
| exceptions.py | Custom exception hierarchy (FaceClusterError base) |
| detector.py | InsightFace lazy singleton, face detection, embedding extraction |
| clustering.py | L2 normalisation, agglomerative clustering |
| exporter.py | Copy images into per-cluster directories with collision handling |
| pipeline.py | Orchestrator connecting all pipeline steps |
| logging_setup.py | Rich RichHandler configuration with shared Console |
