CLI tool to cluster images by face similarity using InsightFace and Agglomerative Clustering
# Face Cluster CLI

A command-line tool that groups images by face similarity, using InsightFace for face detection and embedding extraction and agglomerative clustering to group faces by identity.
## Overview
Face Cluster CLI scans a directory of images, detects every face using the InsightFace antelopev2 model (512-dimensional embeddings), groups them into identity clusters via agglomerative hierarchical clustering with cosine distance, and exports the results into neatly organised subdirectories — one per person.
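For reference, these are the standard definitions of cosine distance and of the three linkage criteria that `--linkage` accepts (they match, for example, scikit-learn's conventions; the README does not state which library Face Cluster uses). Here $\hat{\mathbf{a}} = \mathbf{a} / \lVert \mathbf{a} \rVert$ is an L2-normalised embedding, $A$ and $B$ are clusters of faces, and $\tau$ is the distance threshold.

$$
d(\mathbf{a}, \mathbf{b}) = 1 - \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert},
\qquad
d(\hat{\mathbf{a}}, \hat{\mathbf{b}}) = 1 - \hat{\mathbf{a}} \cdot \hat{\mathbf{b}} = \tfrac{1}{2} \lVert \hat{\mathbf{a}} - \hat{\mathbf{b}} \rVert^{2}
$$

$$
D_{\text{single}}(A, B) = \min_{a \in A,\, b \in B} d(a, b),
\qquad
D_{\text{complete}}(A, B) = \max_{a \in A,\, b \in B} d(a, b),
\qquad
D_{\text{average}}(A, B) = \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} d(a, b)
$$

At each step the two closest clusters are merged, and merging stops once the closest pair of clusters is no longer within $\tau$. With complete linkage and the default $\tau = 0.8$, every pair of faces in a final cluster therefore has a cosine distance of at most $0.8$, i.e. a cosine similarity of at least $1 - 0.8 = 0.2$.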
## Features

- InsightFace antelopev2: state-of-the-art face detection and recognition with automatic model download
- Agglomerative cosine clustering: hierarchical clustering with a configurable distance threshold and linkage
- GPU-first inference: ships `onnxruntime-gpu` on Linux x86_64 / Windows for automatic NVIDIA CUDA acceleration; falls back to CPU on macOS and other platforms
- Rich terminal UX: progress bars, coloured logging, and a summary panel at the end
- Environment variable configuration: every setting can be overridden via `FACE_CLUSTER_*` environment variables or a `.env` file
- Corrupted image resilience: unreadable files are logged and skipped without aborting the pipeline
- Interactive overwrite protection: prompts before cleaning an existing output directory (`--force` to skip)
- Strict type safety: full `mypy --strict` compliance with Pydantic validation at every boundary
## Requirements
| Requirement | Details |
|---|---|
| Python | >= 3.10.19 |
| OS | Linux, macOS, Windows |
| GPU (auto-detected) | NVIDIA GPU with CUDA 12.x drivers (Linux x86_64 / Windows). CPU fallback on macOS and other platforms. |
### GPU Support

On Linux x86_64 and Windows, Face Cluster installs `onnxruntime-gpu`, which bundles CUDA 12.x and cuDNN 9.x libraries. If an NVIDIA GPU with compatible drivers is detected, inference runs on the GPU automatically; otherwise, execution falls back to the CPU transparently.

On macOS and other platforms (e.g. Linux aarch64), the CPU-only `onnxruntime` package is installed instead, because no NVIDIA GPU wheels exist for these platforms.

Tip: Run with `--verbose` to see which ONNX Runtime execution providers are active.
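To inspect the execution providers outside of a Face Cluster run, a short script can query ONNX Runtime directly. `onnxruntime.get_available_providers()` is part of ONNX Runtime's Python API; the `preferred_providers` helper and its CUDA-then-CPU ordering are an illustrative assumption, not Face Cluster's actual code. Keep in mind that the list reflects the installed build, so `CUDAExecutionProvider` can appear even when no usable GPU or driver is present.

```python
"""Inspect ONNX Runtime execution providers (illustrative sketch)."""

# Hypothetical preference order: CUDA when the installed build offers it,
# otherwise the CPU provider, which every ONNX Runtime build includes.
PREFERRED_ORDER = ("CUDAExecutionProvider", "CPUExecutionProvider")


def preferred_providers(available: list[str]) -> list[str]:
    """Return the entries of PREFERRED_ORDER that are available, in that order."""
    chosen = [name for name in PREFERRED_ORDER if name in available]
    # Always keep a CPU fallback, even for an unexpected or empty input list.
    return chosen or ["CPUExecutionProvider"]


if __name__ == "__main__":
    try:
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed in this environment")
    else:
        available = ort.get_available_providers()
        print("Available providers:", available)
        print("Preferred order:", preferred_providers(available))
```

Once a session has been created with a provider list, `InferenceSession.get_providers()` should report which providers were actually registered for it, which is the more reliable signal that the GPU is in use.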
## Installation

### Run without installing (recommended for one-off use)

```bash
uvx face-cluster ./photos
```

### Install as a tool

```bash
# With uv
uv tool install face-cluster

# With pip
pip install face-cluster
```

### From source

```bash
git clone https://github.com/j-about/Face-Cluster.git
cd Face-Cluster
uv sync
uv run face-cluster --help
```
## Quick Start

```bash
face-cluster ./photos --output ./clusters --verbose
```

This will:

- Scan `./photos` for supported images.
- Detect all faces and extract 512-d embeddings.
- Cluster faces by identity using agglomerative clustering.
- Copy images into `./clusters/cluster_000/`, `cluster_001/`, etc.
- Print a summary panel:

```text
╭──── Pipeline Summary ────╮
│ Total images          42 │
│ Images with faces     38 │
│ Total faces detected  51 │
│ Clusters               4 │
│ Largest cluster       18 │
│ Outliers               3 │
╰──────────────────────────╯
```
## Usage

```bash
face-cluster [OPTIONS] INPUT_DIR
```

### Arguments

| Argument | Description |
|---|---|
| `INPUT_DIR` | Required. Path to a directory containing images to cluster. Must exist and be readable. |
### Options

| Option | Short | Default | Type | Constraint | Description |
|---|---|---|---|---|---|
| `--output` | `-o` | `./face_clusters` | PATH | — | Directory for exported cluster folders. |
| `--distance-threshold` | — | `0.8` | FLOAT | > 0 | Cosine distance threshold above which clusters are not merged. |
| `--linkage` | — | `complete` | TEXT | `average`, `complete`, `single` | Linkage criterion: `average`, `complete`, or `single`. |
| `--min-cluster-size` | — | `2` | INT | >= 2 | Clusters smaller than this are reclassified as outliers. |
| `--batch-size` | — | `32` | INT | >= 1 | Number of images per progress-bar tick. |
| `--force` | `-f` | `false` | FLAG | — | Overwrite the output directory without confirmation. |
| `--verbose` | `-v` | `false` | FLAG | — | Enable debug-level logging. |
| `--help` | — | — | — | — | Show the help message and exit. |
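The `--min-cluster-size` rule can be pictured as a relabelling pass over the per-face cluster labels. The sketch below is an assumption about how such a pass might look: it takes non-negative integer labels, uses `-1` as a hypothetical outlier marker, and renumbers the surviving clusters contiguously so they map onto `cluster_000`, `cluster_001`, and so on. The real numbering order is not documented here.

```python
import numpy as np

OUTLIER_LABEL = -1  # hypothetical sentinel for faces that end up in outliers/


def relabel_small_clusters(labels, min_cluster_size: int = 2) -> np.ndarray:
    """Mark clusters with fewer than min_cluster_size faces as outliers.

    Surviving clusters are renumbered 0..k-1 in ascending order of their
    original label (an assumption made for this sketch).
    """
    if min_cluster_size < 2:
        raise ValueError("min_cluster_size must be >= 2")
    labels = np.asarray(labels, dtype=int)
    unique, counts = np.unique(labels, return_counts=True)  # unique is sorted
    kept = unique[counts >= min_cluster_size]
    result = np.full_like(labels, OUTLIER_LABEL)
    for new_label, old_label in enumerate(kept):
        result[labels == old_label] = new_label
    return result


print(relabel_small_clusters([0, 0, 1, 2, 2, 2]).tolist())  # [0, 0, -1, 1, 1, 1]
```

Raising `--min-cluster-size` therefore moves more faces into `outliers/` without changing how the remaining faces were grouped.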
### Examples

```bash
# Basic usage
face-cluster ./photos

# Custom output directory and stricter clustering
face-cluster ./photos -o ./results --distance-threshold 0.5 --linkage complete

# Force overwrite with verbose logging
face-cluster ./photos -o ./results -f -v

# Run from source
uv run face-cluster ./photos --output ./clusters
```
## Output Structure

```text
face_clusters/
├── cluster_000/          # Identity A
│   ├── photo_001.jpg
│   ├── photo_007.jpg
│   └── photo_012.jpg
├── cluster_001/          # Identity B
│   ├── photo_003.jpg
│   └── photo_009.jpg
├── cluster_002/          # Identity C
│   └── ...
└── outliers/             # Faces in clusters too small (< min_cluster_size)
    ├── photo_022.jpg
    └── photo_035.jpg
```
- Cluster directories use zero-padded labels (`cluster_000`, `cluster_001`, ...).
- Images are copied; originals are never modified.
- If an image contains faces belonging to multiple clusters, it is copied into each relevant directory.
- Filename collisions are resolved automatically by appending a counter suffix (`photo_1.jpg`, `photo_2.jpg`, ...).
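A minimal sketch of the copy step, assuming negative labels denote outliers and that the counter is appended to the file stem (`photo.jpg`, then `photo_1.jpg`, `photo_2.jpg`, ...). The helpers `cluster_dir_name`, `unique_destination`, and `export_image` are hypothetical names, not the exporter module's real API.

```python
import shutil
from pathlib import Path


def cluster_dir_name(label: int) -> str:
    """Map a cluster label to its folder name (negative labels = outliers)."""
    return "outliers" if label < 0 else f"cluster_{label:03d}"


def unique_destination(directory: Path, filename: str) -> Path:
    """Return a not-yet-existing path in directory, adding _1, _2, ... if needed."""
    candidate = directory / filename
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    while candidate.exists():
        candidate = directory / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate


def export_image(source: Path, label: int, output_root: Path) -> Path:
    """Copy source into its cluster folder; the original file is left untouched."""
    target_dir = output_root / cluster_dir_name(label)
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = unique_destination(target_dir, source.name)
    shutil.copy2(source, destination)  # copy2 also preserves file timestamps
    return destination
```

An image whose faces fall into several clusters would simply be exported once per distinct label.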
## Configuration

All settings can be overridden via environment variables with the `FACE_CLUSTER_` prefix. A `.env` file in the working directory is also supported.

| Environment Variable | Default | Type | Description |
|---|---|---|---|
| `FACE_CLUSTER_MODEL_NAME` | `antelopev2` | `str` | InsightFace model pack name. |
| `FACE_CLUSTER_DET_SIZE` | `(640, 640)` | `tuple[int, int]` | Detection input size (width, height). |
| `FACE_CLUSTER_BATCH_SIZE` | `32` | `int` | Number of images per progress-bar tick. |
| `FACE_CLUSTER_DISTANCE_THRESHOLD` | `0.8` | `float` | Cosine distance threshold above which clusters are not merged (> 0). |
| `FACE_CLUSTER_LINKAGE` | `complete` | `str` | Linkage criterion: `average`, `complete`, or `single`. |
| `FACE_CLUSTER_MIN_CLUSTER_SIZE` | `2` | `int` | Clusters smaller than this are reclassified as outliers (>= 2). |
| `FACE_CLUSTER_OUTPUT_DIR` | `./face_clusters` | `Path` | Output directory for cluster folders. |
| `FACE_CLUSTER_FORCE` | `false` | `bool` | Skip overwrite confirmation. |
| `FACE_CLUSTER_VERBOSE` | `false` | `bool` | Enable debug logging. |

Note: CLI options take priority over environment variables.

Validation rule: `FACE_CLUSTER_LINKAGE` must be one of `average`, `complete`, or `single`.
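The precedence rule (CLI option first, then environment variable, then default) can be illustrated with the standard library alone. This is a sketch under assumptions: the real `config.py` uses Pydantic `BaseSettings`, which additionally reads the `.env` file and validates values, and the `resolve_setting` and `parse_bool` helpers below are hypothetical. Assuming pydantic-settings' default behaviour, tuple fields such as `FACE_CLUSTER_DET_SIZE` are read from a JSON string, e.g. `FACE_CLUSTER_DET_SIZE='[640, 640]'`.

```python
import os
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
ENV_PREFIX = "FACE_CLUSTER_"


def parse_bool(raw: str) -> bool:
    """Interpret common truthy spellings; anything else counts as False."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def resolve_setting(
    name: str,
    cli_value: Optional[T],
    default: T,
    parse: Callable[[str], T],
) -> T:
    """Return the CLI value if given, else FACE_CLUSTER_<NAME>, else the default.

    Assumes an option the user did not pass arrives as None.
    """
    if cli_value is not None:
        return cli_value
    raw = os.environ.get(ENV_PREFIX + name.upper())
    if raw is not None:
        return parse(raw)
    return default
```

For instance, with `FACE_CLUSTER_DISTANCE_THRESHOLD=0.6` exported, `resolve_setting("distance_threshold", None, 0.8, float)` yields `0.6`, while an explicit `--distance-threshold 0.5` would still win.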
## Supported Image Formats

| Extension |
|---|
| `.jpg` |
| `.jpeg` |
| `.png` |
| `.bmp` |
| `.webp` |
| `.tiff` |
| `.tif` |

Image discovery is non-recursive (only the top-level directory is scanned), and extension matching is case-insensitive.
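Non-recursive, case-insensitive discovery amounts to a single directory listing filtered on the lower-cased file suffix. A sketch, assuming results are sorted by path as the "(sorted)" label in the Architecture diagram suggests; `discover_images` is a hypothetical name.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = frozenset(
    {".jpg", ".jpeg", ".png", ".bmp", ".webp", ".tiff", ".tif"}
)


def discover_images(input_dir: Path) -> list[Path]:
    """Return supported image files directly inside input_dir, sorted by path."""
    if not input_dir.is_dir():
        raise NotADirectoryError(f"Not a directory: {input_dir}")
    return sorted(
        path
        for path in input_dir.iterdir()  # iterdir() does not descend into subfolders
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Files such as `IMG_0001.JPG` are therefore picked up, while images inside subdirectories are ignored.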
## Exit Codes

| Code | Meaning | Examples |
|---|---|---|
| `0` | Success | Pipeline completed normally. |
| `1` | User error | Input directory does not exist, contains no images, or the user aborted the overwrite. |
| `2` | System error | Model failed to load, clustering failed, or an export I/O error occurred. |
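When Face Cluster is driven from another script, the exit code is the simplest signal to branch on. The sketch below uses `subprocess` and passes `--force` so the interactive overwrite prompt cannot block a non-interactive run; `run_face_cluster`, `describe_exit`, and `EXIT_MEANINGS` are hypothetical helpers that simply mirror the table above. Note that Typer is built on Click, which also exits with status 2 on invalid command-line usage (such as an unknown option), so a 2 does not necessarily mean the pipeline itself failed.

```python
import subprocess
from pathlib import Path

# Mirrors the Exit Codes table; this mapping is documentation, not a library API.
EXIT_MEANINGS = {0: "success", 1: "user error", 2: "system error"}


def describe_exit(code: int) -> str:
    """Translate a face-cluster exit status into a short description."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_face_cluster(input_dir: Path, output_dir: Path) -> int:
    """Run the CLI non-interactively and return its exit status."""
    command = [
        "face-cluster",
        str(input_dir),
        "--output",
        str(output_dir),
        "--force",  # skip the overwrite prompt, which would otherwise wait for input
    ]
    completed = subprocess.run(command, check=False)
    return completed.returncode
```

For example, `describe_exit(run_face_cluster(Path("photos"), Path("clusters")))` returns `"success"` after a normal run.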
## Development

### Setup

```bash
git clone https://github.com/j-about/Face-Cluster.git
cd Face-Cluster
uv sync
```

### Run Tests

```bash
uv run pytest
```

### Type Checking

```bash
uv run mypy --strict src/
```

### Run from Source

```bash
uv run face-cluster ./photos --verbose
```
## Architecture

```text
INPUT_DIR
    |
    v
Discover ────> Detect ──────> Cluster ─────> Export
(pipeline)     (detector)     (clustering)   (exporter)
    |              |              |              |
    v              v              v              v
Image paths    Embedding      Cluster        Organised
(sorted)       records        labels         directories
```
### Pipeline Steps

- Discover: scan `INPUT_DIR` for supported image files (non-recursive).
- Detect: extract 512-d face embeddings via InsightFace antelopev2.
- Cluster: L2-normalise the embeddings, then run agglomerative clustering with cosine distance.
- Export: copy images into per-cluster subdirectories.
### Module Map

| Module | Responsibility |
|---|---|
| `cli.py` | Typer entry point, argument parsing, Rich summary panel |
| `config.py` | Pydantic `BaseSettings` with `FACE_CLUSTER_*` environment variable support |
| `models.py` | Data models: `EmbeddingRecord`, `ClusterResult`, `PipelineSummary` |
| `exceptions.py` | Custom exception hierarchy (`FaceClusterError` base) |
| `detector.py` | Lazy InsightFace singleton, face detection, embedding extraction |
| `clustering.py` | L2 normalisation, agglomerative clustering |
| `exporter.py` | Copies images into per-cluster directories with collision handling |
| `pipeline.py` | Orchestrator connecting all pipeline steps |
| `logging_setup.py` | Rich `RichHandler` configuration with a shared `Console` |
## File Details

### face_cluster-0.1.1.tar.gz (source distribution)

- Download URL: face_cluster-0.1.1.tar.gz
- Upload date:
- Size: 13.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.0 (`uv publish`, Debian GNU/Linux 12 "bookworm")

| Algorithm | Hash digest |
|---|---|
| SHA256 | `ec7ab4c238ca6997beea340245ddc13574cb53da0a224097ede07848822c57e9` |
| MD5 | `fc8f4037b56ce505779816cbb0e084b5` |
| BLAKE2b-256 | `dda51d5564ddea6e375c8fc9a35216a59dc66df62215d47c5b47121579e3efad` |

### face_cluster-0.1.1-py3-none-any.whl (built distribution)

- Download URL: face_cluster-0.1.1-py3-none-any.whl
- Upload date:
- Size: 17.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.0 (`uv publish`, Debian GNU/Linux 12 "bookworm")

| Algorithm | Hash digest |
|---|---|
| SHA256 | `ae19fbc46222e465a02bff9fdc51d677774397279a4bf9ae1706c237ff5bf7e8` |
| MD5 | `d756be08f1f41dcdb504872bf1e24522` |
| BLAKE2b-256 | `77fc81a6900594ceffbd1d0b229353a66dc33f0f98a3156894b038d593a511f0` |