Approximate image retrieval using ORB descriptors and FAISS IVF indexing.

imret

imret is a C++ image retrieval library. Given a query image, it finds the closest matching image from a previously ingested collection. It uses ORB binary feature descriptors and a FAISS inverted-file index (IVF) for approximate nearest-neighbour search, with a two-tier search strategy and per-keypoint voting to produce a confidence score.

How it works

  1. Feature extraction: ORB extracts up to max_features keypoints per image, each producing a 256-bit (32-byte) binary descriptor.
  2. Indexing: build() runs k-means on all accumulated descriptors to partition them into Voronoi cells (IndexBinaryIVF). The number of cells scales with the total feature count.
  3. Search: A query image is described with ORB. Each descriptor votes for the image it matches (filtered by Hamming distance). Tier 1 searches fast_cells cells; if the top vote fraction falls below confidence_threshold, Tier 2 searches deep_cells cells.
  4. Result: The image with the most votes wins, returning its label and a confidence value in [0, 1].
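
The voting and fallback steps above can be sketched in a few lines of plain Python. This is an illustration of the idea, not imret's implementation: search_with_nprobe is a hypothetical callable standing in for the FAISS lookup, and dividing the winner's votes by the number of query descriptors is an assumption based on the description of the confidence value.

```python
from collections import Counter

def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def vote(query_descs, nearest, max_hamming_distance=45):
    """Tally one vote per query descriptor whose nearest neighbour is close enough.

    `nearest(desc)` is a hypothetical callable returning (label, stored_desc)
    for the closest stored descriptor, or None if nothing was found.
    """
    votes = Counter()
    for desc in query_descs:
        hit = nearest(desc)
        if hit is None:
            continue
        label, stored = hit
        if hamming(desc, stored) <= max_hamming_distance:
            votes[label] += 1
    if not votes:
        return "Unknown", 0.0
    label, count = votes.most_common(1)[0]
    # Assumption: confidence is the winner's share of all query descriptors.
    return label, count / len(query_descs)

def two_tier_search(query_descs, search_with_nprobe,
                    fast_cells=8, deep_cells=64, confidence_threshold=0.15):
    """Try a cheap probe first; fall back to a deeper probe if confidence is low.

    `search_with_nprobe(n)` is a hypothetical factory returning a `nearest`
    callable that inspects n cells.
    """
    label, conf = vote(query_descs, search_with_nprobe(fast_cells))
    if conf >= confidence_threshold:
        return label, conf, False
    label, conf = vote(query_descs, search_with_nprobe(deep_cells))
    return label, conf, True  # third element mirrors fallback_used
```

The returned tuple mirrors the label, confidence, and fallback_used fields of the search result; the default arguments are the OrbConfig defaults.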

Dependencies

  • CMake >= 3.15
  • C++17 compiler
  • OpenCV 4.x
  • FAISS
  • OpenMP (macOS: brew install libomp)

Building (C++)

cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel

This produces build/imret_cli and build/libimret_core.a.

Running the tests

C++ tests:

ctest --test-dir build --output-on-failure

Python binding tests (requires imret installed and pytest):

pip install pytest
python -m pytest bindings/python/tests/ -v

C++ API

Include vault.hpp and link against libimret_core.a and its dependencies (OpenCV, faiss, omp).

OrbConfig

#include "imret.hpp"

OrbConfig cfg;
cfg.max_features         = 500;   // ORB keypoints per image
cfg.resize_dim           = 0;     // 0 = no resize; >0 = resize to (N x N) before extraction
cfg.fast_cells           = 8;     // IVF cells probed in tier-1 search
cfg.deep_cells           = 64;    // IVF cells probed in tier-2 fallback
cfg.max_hamming_distance = 45;    // maximum Hamming distance for a keypoint to count as a match
cfg.confidence_threshold = 0.15f; // confidence below this triggers the tier-2 fallback

Ingest, build, search

#include "vault.hpp"

OrbConfig cfg;
Vault vault(cfg);

// Ingest images (grayscale cv::Mat)
vault.add(image_a, "label_a");
vault.add(image_b, "label_b");

// Bulk ingest with OpenMP parallelism — preferred for large collections
vault.add_batch({image_a, image_b, image_c}, {"label_a", "label_b", "label_c"});

// Build the index — required before searching
vault.build();

// Query
MatchResult result = vault.search(query_image);
// result.label        — label of the best match, or "Unknown"
// result.confidence   — fraction of keypoints that voted for the winner [0, 1]
// result.fallback_used — true if the tier-2 search was triggered

add() and add_batch() can be called after build(). Call build() again afterwards to retrain the index over all accumulated data.
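
One way to avoid searching an index that predates the latest add() is a thin wrapper that remembers whether a rebuild is pending. The sketch below is written in Python for brevity and is not part of imret: AutoRebuildVault is a hypothetical helper that only assumes the wrapped object exposes add(), build(), and search() as described above.

```python
class AutoRebuildVault:
    """Hypothetical wrapper: rebuilds lazily before the first search after an add."""

    def __init__(self, vault):
        self._vault = vault
        self._dirty = False  # True when data was added since the last build()

    def add(self, image, label):
        self._vault.add(image, label)
        self._dirty = True

    def search(self, query):
        if self._dirty:
            self._vault.build()  # retrain over all accumulated data
            self._dirty = False
        return self._vault.search(query)
```

Because build() retrains the whole index, a lazy rebuild can make one search much slower than the rest; for large collections it may be preferable to batch additions and rebuild explicitly.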

Stats

Vault::Stats s = vault.stats();
// s.n_images    — number of unique images in the vault
// s.n_features  — total feature vectors accumulated
// s.nlist       — number of IVF clusters (0 if not yet built)
// s.is_built    — whether the index has been trained

All fields are O(1) reads from in-memory structures.

Persistence

vault.save("/path/to/prefix");   // writes prefix.faiss and prefix.meta
vault.load("/path/to/prefix");   // restores the index and label map; no rebuild needed

The .meta file stores the OrbConfig alongside the label map, so a loaded vault always uses the config it was originally built with.
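
Since a saved vault consists of two files that share a prefix, it can be useful to confirm both are present before loading. The helper names below are hypothetical; the sketch relies only on the documented prefix.faiss and prefix.meta naming.

```python
from pathlib import Path

def vault_files(prefix: str):
    """Return the two paths that save() is documented to write for a prefix."""
    return Path(prefix + ".faiss"), Path(prefix + ".meta")

def missing_vault_files(prefix: str):
    """List whichever of the two files is absent; an empty list means both exist."""
    return [path for path in vault_files(prefix) if not path.is_file()]
```

A caller might check that missing_vault_files("/path/to/prefix") is empty and report the missing file names otherwise, rather than passing an incomplete pair to load().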

Python

Install

pip install imret

Pre-built binary wheels are available for Linux x86_64 and macOS arm64, covering Python 3.9–3.13. Google Colab is supported without any additional setup.

Usage

import cv2
import imret

cfg = imret.OrbConfig()
cfg.max_features         = 500
cfg.resize_dim           = 800   # resize images to 800x800 before extraction
cfg.fast_cells           = 8
cfg.deep_cells           = 64
cfg.max_hamming_distance = 45
cfg.confidence_threshold = 0.15

vault = imret.Vault(cfg)

# Ingest — images must be grayscale uint8 numpy arrays
gray = cv2.imread("painting.jpg", cv2.IMREAD_GRAYSCALE)
vault.add(gray, "my_label")

# Bulk ingest (parallel via OpenMP)
vault.add_batch([gray_a, gray_b, gray_c], ["label_a", "label_b", "label_c"])

# Build the index
vault.build()

# Search
result = vault.search(query_gray)
print(result.label, result.confidence, result.fallback_used)

# Vault stats (all O(1))
s = vault.stats()
print(s["n_images"])    # images in the vault
print(s["n_features"])  # total ORB descriptor vectors stored
print(s["nlist"])       # IVF Voronoi clusters (0 before build)
print(s["is_built"])    # whether the index has been trained

# Save and load
vault.save("/tmp/my_vault")
vault2 = imret.Vault.load_from_disk("/tmp/my_vault", cfg)

add() expects a 2-D numpy.ndarray with dtype uint8. If resize_dim > 0, resizing is applied internally before extraction.
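
A small guard can catch the two most common input mistakes before they reach add(): a failed cv2.imread (which returns None) and a colour image loaded without cv2.IMREAD_GRAYSCALE (which is 3-D). check_grayscale is a hypothetical helper, not part of imret; it inspects only the ndim and dtype attributes, so it needs no imports of its own.

```python
def check_grayscale(image, name="image"):
    """Raise ValueError unless `image` looks like a 2-D uint8 array; return it otherwise."""
    if image is None:
        raise ValueError(f"{name}: no data (cv2.imread returns None when a file cannot be read)")
    ndim = getattr(image, "ndim", None)
    dtype = str(getattr(image, "dtype", ""))
    if ndim != 2:
        raise ValueError(f"{name}: expected a 2-D array, got ndim={ndim}")
    if dtype != "uint8":
        raise ValueError(f"{name}: expected dtype uint8, got {dtype or 'unknown'}")
    return image
```

Because the function returns its argument, it can wrap the read directly: pass the result of cv2.imread(path, cv2.IMREAD_GRAYSCALE) through check_grayscale before handing it to vault.add().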

Building from source

Requirements: Python >= 3.9, pybind11, scikit-build-core, OpenCV, FAISS, OpenMP.

On macOS, install OpenMP first:

brew install libomp

Then build and install the Python package:

cd bindings/python
pip install scikit-build-core pybind11
pip install .

The CMakeLists detects the Homebrew libomp prefix automatically.

CLI

./build/imret_cli <vault_prefix> <path_to_image>

Loads the vault at <vault_prefix>.faiss / <vault_prefix>.meta, searches with the given image, and prints the matched label to stdout. Exits with code 1 and prints UNKNOWN if confidence is below confidence_threshold.
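
The CLI can be driven from a script by checking the exit code and stdout. The sketch below assumes a successful match exits with code 0, which the text above implies but does not state; interpret_cli and match_image are hypothetical helper names.

```python
import subprocess

def interpret_cli(returncode: int, stdout: str):
    """Map the documented CLI behaviour to a label, or None for a low-confidence result."""
    text = stdout.strip()
    if returncode == 0:  # assumption: 0 means a confident match
        return text
    if returncode == 1 and text == "UNKNOWN":
        return None
    raise RuntimeError(f"imret_cli failed with exit code {returncode}: {text!r}")

def match_image(vault_prefix: str, image_path: str, cli: str = "./build/imret_cli"):
    """Run the CLI once and return the matched label, or None if it printed UNKNOWN."""
    proc = subprocess.run([cli, vault_prefix, image_path],
                          capture_output=True, text=True)
    return interpret_cli(proc.returncode, proc.stdout)
```

Keeping the interpretation separate from the subprocess call makes the exit-code handling easy to test without a built binary.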

OrbConfig reference

Field                 Default  Description
max_features          500      Maximum ORB keypoints extracted per image
resize_dim            0        If > 0, resize each image to resize_dim x resize_dim before extraction
fast_cells            8        IVF cells probed during tier-1 search
deep_cells            64       IVF cells probed during tier-2 fallback
max_hamming_distance  45       Keypoints with Hamming distance above this threshold are excluded from voting
confidence_threshold  0.15     Vote fraction below this triggers the tier-2 fallback

Benchmarks

Generated by benchmark.py on macOS arm64, max_features=500, 20 queries per size.

imret and imret-batch differ only at ingest: the batch variant uses add_batch(), which extracts ORB features in parallel via OpenMP, reducing build time by up to 37% at large N. (In the synthetic run tabulated below the gap is smaller, roughly 3–12%.) Search latency is the same within measurement noise.

Speed comparison — synthetic images, no transform

All methods score 100% accuracy on structurally distinct synthetic images, so this measures speed only.

Search latency p50 (ms)

N      imret  imret-batch  bfmatcher  imagehash
100    3.19   3.11         8.60       0.20
500    3.26   3.27         39.97      0.52
1,000  3.30   3.33         85.14      0.90
2,000  3.36   3.33         166.58     1.67

Build time (s)

N      imret  imret-batch  bfmatcher  imagehash
100    1.11   0.98         0.11       0.44
500    15.64  15.12        0.56       0.07
1,000  31.19  30.18        1.11       0.17
2,000  62.20  60.27        2.25       0.33

imret search latency is nearly flat (3.19ms → 3.36ms across 20× more images). BFMatcher grows linearly. Build time is higher because imret trains a FAISS IVF index upfront — a one-time cost at index time.
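
The trade-off between build time and search latency can be made concrete with the figures from the two synthetic tables above. The break-even estimate treats the p50 latency as a stand-in for the average query cost, which is an assumption rather than something the benchmark reports.

```python
import math

# Values copied from the synthetic tables above (N=100 and N=2,000).
latency_ms = {"imret": (3.19, 3.36), "bfmatcher": (8.60, 166.58)}
build_s = {"imret": 62.20, "bfmatcher": 2.25}  # build time at N=2,000

def growth(first, last):
    """Ratio of the last measurement to the first."""
    return last / first

imret_growth = growth(*latency_ms["imret"])
bf_growth = growth(*latency_ms["bfmatcher"])

# Extra seconds spent building the IVF index, and seconds saved per query at N=2,000.
extra_build_s = build_s["imret"] - build_s["bfmatcher"]
saved_per_query_s = (latency_ms["bfmatcher"][1] - latency_ms["imret"][1]) / 1000
break_even_queries = math.ceil(extra_build_s / saved_per_query_s)

print(f"imret latency grows {imret_growth:.2f}x, bfmatcher {bf_growth:.2f}x")
print(f"index pays for itself after about {break_even_queries} queries at N=2,000")
```

Under these assumptions the upfront index cost is recovered after a few hundred queries at N=2,000, and sooner as the collection grows, because the BFMatcher cost per query keeps rising while imret's stays nearly constant.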

Accuracy comparison — WikiArt paintings, wall-photo transform

The wall transform adds a grey background border and light Gaussian blur, simulating a gallery photo. This is where the methods diverge. Perceptual hashing (imagehash) is designed for near-duplicate detection; adding a border shifts the global DCT hash enough to confuse it with unrelated paintings. imret's ORB keypoints are local to the painting interior and are unaffected.

Accuracy (%)

N      imret  imret-batch  bfmatcher  imagehash
100    100.0  100.0        100.0      15.0
500    100.0  100.0        100.0      15.0
1,000  100.0  100.0        100.0      10.0

Search latency p50 (ms)

N      imret  imret-batch  bfmatcher  imagehash
100    21.64  21.54        29.74      1.69
500    22.53  22.27        71.64      2.07
1,000  22.21  22.33        130.68     2.42

imagehash is fast but unreliable on real images under this transform: it matches only 10–15% of queries (2 or 3 of the 20 queries per size). imret maintains 100% accuracy. BFMatcher matches imret's accuracy, but its latency grows linearly with N.

To reproduce:

python benchmark.py --dataset wikiart --transform wall --sizes 100,500,1000 --plot
python benchmark.py --sizes 100,500,1000,2000 --plot   # synthetic speed comparison

Publishing a release

Wheels for Linux x86_64 and macOS arm64 are built automatically via GitHub Actions using cibuildwheel. To publish a new version to PyPI:

  1. Update version in pyproject.toml.
  2. Push a version tag:
    git tag v0.1.0
    git push origin v0.1.0
    
  3. The workflow builds wheels for all supported platforms and Python versions, then uploads to PyPI using trusted publishing (no API token required — configure the PyPI project to trust this repository's Actions environment named pypi).
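
Before pushing a tag, it can help to confirm that it agrees with the version in pyproject.toml. The check below is a hypothetical pre-release helper, not part of the repository's workflow; it assumes the version is declared statically on a line of the form version = "..." and that tags use a leading v, as in the example above.

```python
import re

def pyproject_version(pyproject_text: str):
    """Return the value of the first `version = "..."` line, or None if absent."""
    match = re.search(r'^version\s*=\s*"([^"]+)"', pyproject_text, flags=re.MULTILINE)
    return match.group(1) if match else None

def tag_matches(tag: str, pyproject_text: str) -> bool:
    """True when a tag such as v0.1.0 corresponds to the declared version."""
    return tag.startswith("v") and tag[1:] == pyproject_version(pyproject_text)
```

For example, reading pyproject.toml into a string and calling tag_matches("v0.1.0", text) before running git tag would flag a forgotten version bump.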
