Approximate image retrieval using ORB descriptors and FAISS IVF indexing.

Maintainers

semajyllek


imret

imret is a C++ image retrieval library. Given a query image, it finds the closest matching image from a previously ingested collection. It uses ORB binary feature descriptors and a FAISS inverted-file index (IVF) for approximate nearest-neighbour search, with a two-tier search strategy and per-keypoint voting to produce a confidence score.

How it works

Feature extraction — ORB extracts up to max_features keypoints per image, each producing a 256-bit (32-byte) binary descriptor.
Indexing — build() runs k-means on all accumulated descriptors to partition them into Voronoi cells (IndexBinaryIVF). The number of cells scales with the total feature count.
Search — A query image is described with ORB. Each descriptor votes for the image it matches (filtered by Hamming distance). Tier 1 searches fast_cells cells; if the top vote fraction falls below confidence_threshold, Tier 2 searches deep_cells cells.
Result — The image with the most votes wins, returning its label and a confidence value in [0, 1].
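The search and result steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not imret's implementation: a brute-force nearest-neighbour loop stands in for the FAISS IVF probe, and the names hamming, vote, and two_tier_search, along with the fast_candidates and deep_candidates lists, are hypothetical.

```python
from collections import Counter


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def vote(query_descs, candidates, max_hamming_distance=45):
    """Let each query descriptor vote for the label of its nearest candidate.

    candidates is a list of (descriptor, label) pairs; it is a hypothetical
    stand-in for the descriptors stored in the probed IVF cells.
    """
    if not query_descs or not candidates:
        return "Unknown", 0.0
    votes = Counter()
    for q in query_descs:
        dist, label = min((hamming(q, d), lbl) for d, lbl in candidates)
        if dist <= max_hamming_distance:  # distant matches do not vote
            votes[label] += 1
    if not votes:
        return "Unknown", 0.0
    label, count = votes.most_common(1)[0]
    # Confidence is the winner's share of all query descriptors.
    return label, count / len(query_descs)


def two_tier_search(query_descs, fast_candidates, deep_candidates,
                    max_hamming_distance=45, confidence_threshold=0.15):
    """Try the small candidate set first; fall back to the larger one."""
    label, conf = vote(query_descs, fast_candidates, max_hamming_distance)
    if conf >= confidence_threshold:
        return label, conf, False
    label, conf = vote(query_descs, deep_candidates, max_hamming_distance)
    return label, conf, True
```

The third element of the returned tuple plays the role of fallback_used: it is True only when the first pass did not reach confidence_threshold.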

Dependencies

CMake >= 3.15
C++17 compiler
OpenCV 4.x
FAISS
OpenMP (macOS: brew install libomp)

Building (C++)

cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel

This produces build/imret_cli and build/libimret_core.a.

Running the tests

C++ tests:

ctest --test-dir build --output-on-failure

Python binding tests (requires imret installed and pytest):

pip install pytest
python -m pytest bindings/python/tests/ -v

C++ API

Include vault.hpp and link against libimret_core.a and its dependencies (OpenCV, faiss, omp).

OrbConfig

#include "vault.hpp"

OrbConfig cfg;
cfg.max_features         = 500;   // ORB keypoints per image
cfg.resize_dim           = 0;     // 0 = no resize; >0 = resize to (N x N) before extraction
cfg.fast_cells           = 8;     // IVF cells probed in tier-1 search
cfg.deep_cells           = 64;    // IVF cells probed in tier-2 fallback
cfg.max_hamming_distance = 45;    // maximum Hamming distance for a keypoint to count as a match
cfg.confidence_threshold = 0.15f; // confidence below this triggers the tier-2 fallback

Ingest, build, search

#include "vault.hpp"

OrbConfig cfg;
Vault vault(cfg);

// Ingest images (grayscale cv::Mat)
vault.add(image_a, "label_a");
vault.add(image_b, "label_b");

// Bulk ingest with OpenMP parallelism — preferred for large collections
vault.add_batch({image_a, image_b, image_c}, {"label_a", "label_b", "label_c"});

// Build the index — required before searching
vault.build();

// Query
MatchResult result = vault.search(query_image);
// result.label        — label of the best match, or "Unknown"
// result.confidence   — fraction of keypoints that voted for the winner [0, 1]
// result.fallback_used — true if the tier-2 search was triggered

add() and add_batch() can be called after build(). Call build() again afterwards to retrain the index over all accumulated data.

Stats

Vault::Stats s = vault.stats();
// s.n_images    — number of unique images in the vault
// s.n_features  — total feature vectors accumulated
// s.nlist       — number of IVF clusters (0 if not yet built)
// s.is_built    — whether the index has been trained

All fields are O(1) reads from in-memory structures.

Persistence

vault.save("/path/to/prefix");   // writes prefix.faiss and prefix.meta
vault.load("/path/to/prefix");   // restores the index and label map; no rebuild needed

The .meta file stores the OrbConfig alongside the label map, so a loaded vault always uses the config it was originally built with.

Python

Install

pip install imret

Pre-built binary wheels are available for Linux x86_64 and macOS arm64, covering Python 3.9–3.13. Google Colab is supported without any additional setup.

Usage

import cv2
import imret

cfg = imret.OrbConfig()
cfg.max_features         = 500
cfg.resize_dim           = 800   # resize images to 800x800 before extraction
cfg.fast_cells           = 8
cfg.deep_cells           = 64
cfg.max_hamming_distance = 45
cfg.confidence_threshold = 0.15

vault = imret.Vault(cfg)

# Ingest — images must be grayscale uint8 numpy arrays
gray = cv2.imread("painting.jpg", cv2.IMREAD_GRAYSCALE)
vault.add(gray, "my_label")

# Bulk ingest (parallel via OpenMP)
vault.add_batch([gray_a, gray_b, gray_c], ["label_a", "label_b", "label_c"])

# Build the index
vault.build()

# Search
result = vault.search(query_gray)
print(result.label, result.confidence, result.fallback_used)

# Vault stats (all O(1))
s = vault.stats()
print(s["n_images"])    # images in the vault
print(s["n_features"])  # total ORB descriptor vectors stored
print(s["nlist"])       # IVF Voronoi clusters (0 before build)
print(s["is_built"])    # whether the index has been trained

# Save and load
vault.save("/tmp/my_vault")
vault2 = imret.Vault.load_from_disk("/tmp/my_vault", cfg)

add() expects a 2-D numpy.ndarray with dtype uint8. If resize_dim > 0, resizing is applied internally before extraction.

Building from source

Requirements: Python >= 3.9, pybind11, scikit-build-core, OpenCV, FAISS, OpenMP.

On macOS, install OpenMP first:

brew install libomp

Then build and install the Python package:

cd bindings/python
pip install scikit-build-core pybind11
pip install .

The CMakeLists detects the Homebrew libomp prefix automatically.

CLI

./build/imret_cli <vault_prefix> <path_to_image>

Loads the vault at <vault_prefix>.faiss / <vault_prefix>.meta, searches with the given image, and prints the matched label to stdout. Exits with code 1 and prints UNKNOWN if confidence is below confidence_threshold.
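The CLI can also be driven from a script. The following is a sketch that relies only on the behaviour described above (label on stdout, exit code 1 and UNKNOWN for a low-confidence result); the helper names interpret_cli_result and query_cli are hypothetical, and the default binary path assumes the build directory used earlier.

```python
import subprocess


def interpret_cli_result(returncode: int, stdout: str):
    """Return the matched label, or None when the CLI reported no match."""
    label = stdout.strip()
    if returncode != 0 or label == "UNKNOWN":
        return None
    return label


def query_cli(vault_prefix: str, image_path: str, cli: str = "./build/imret_cli"):
    """Run imret_cli on one image and return its label (or None)."""
    proc = subprocess.run(
        [cli, vault_prefix, image_path],
        capture_output=True,
        text=True,
    )
    return interpret_cli_result(proc.returncode, proc.stdout)
```

Keeping the parsing in its own function makes it possible to test the exit-code handling without a built binary.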

OrbConfig reference

Field	Default	Description
`max_features`	500	Maximum ORB keypoints extracted per image
`resize_dim`	0	If > 0, resize each image to `resize_dim x resize_dim` before extraction
`fast_cells`	8	IVF cells probed during tier-1 search
`deep_cells`	64	IVF cells probed during tier-2 fallback
`max_hamming_distance`	45	Keypoints with Hamming distance above this threshold are excluded from voting
`confidence_threshold`	0.15	Vote fraction below this triggers the tier-2 fallback

Benchmarks

Generated by benchmark.py on macOS arm64, max_features=500, 20 queries per size.
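For readers who want to take a comparable measurement on their own data, a p50 (median) latency can be collected as in the sketch below. It is not taken from benchmark.py; the function name p50_latency_ms is hypothetical, and search_fn can be any callable that takes one query.

```python
import statistics
import time


def p50_latency_ms(search_fn, queries):
    """Median wall-clock time of search_fn over the queries, in milliseconds."""
    samples = []
    for query in queries:
        start = time.perf_counter()
        search_fn(query)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

With a built vault this could be called as p50_latency_ms(vault.search, query_images), where query_images is a list of 20 grayscale arrays.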

imret and imret-batch differ only at ingest: the batch variant uses add_batch(), which extracts ORB features in parallel via OpenMP. In the build times reported below this shortens the build by about 12% at N = 100 and about 3% at N = 500 and above. Search latency is the same within measurement noise.

Speed comparison — synthetic images, no transform

All methods score 100% accuracy on structurally distinct synthetic images, so this measures speed only.

Search latency p50 (ms)

N	imret	imret-batch	bfmatcher	imagehash
100	3.19	3.11	8.60	0.20
500	3.26	3.27	39.97	0.52
1,000	3.30	3.33	85.14	0.90
2,000	3.36	3.33	166.58	1.67

Build time (s)

N	imret	imret-batch	bfmatcher	imagehash
100	1.11	0.98	0.11	0.44
500	15.64	15.12	0.56	0.07
1,000	31.19	30.18	1.11	0.17
2,000	62.20	60.27	2.25	0.33

imret search latency is nearly flat (3.19ms → 3.36ms across 20× more images). BFMatcher grows linearly. Build time is higher because imret trains a FAISS IVF index upfront — a one-time cost at index time.
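The figures in the two tables above can be checked with a few lines of arithmetic. The sketch below only reuses numbers copied from those tables; the helper names reduction_percent and growth_factor are hypothetical.

```python
# (imret, imret-batch) build times in seconds, copied from the build-time table.
build_time_s = {
    100: (1.11, 0.98),
    500: (15.64, 15.12),
    1000: (31.19, 30.18),
    2000: (62.20, 60.27),
}

# Search latency p50 in ms at N=100 and N=2,000, copied from the latency table.
latency_ms = {"imret": (3.19, 3.36), "bfmatcher": (8.60, 166.58)}


def reduction_percent(baseline, improved):
    """Relative saving of improved over baseline, in percent."""
    return (baseline - improved) / baseline * 100.0


def growth_factor(first, last):
    """How many times larger the last measurement is than the first."""
    return last / first


build_reduction = {n: reduction_percent(a, b) for n, (a, b) in build_time_s.items()}
latency_growth = {name: growth_factor(lo, hi) for name, (lo, hi) in latency_ms.items()}

for n, pct in build_reduction.items():
    print(f"N={n}: add_batch() build is {pct:.1f}% shorter")
for name, factor in latency_growth.items():
    print(f"{name}: latency grows {factor:.2f}x from N=100 to N=2000")
```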

Accuracy comparison — WikiArt paintings, wall-photo transform

The wall transform adds a grey background border and light Gaussian blur, simulating a gallery photo. This is where the methods diverge. Perceptual hashing (imagehash) is designed for near-duplicate detection; adding a border shifts the global DCT hash enough to confuse it with unrelated paintings. imret's ORB keypoints are local to the painting interior and are unaffected.
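To make the transform concrete, here is a dependency-free sketch of a border-plus-blur operation on a grayscale image stored as a list of rows. It is an illustration only: the actual transform in benchmark.py is not shown here, and the border width, the grey value of 128, the 3x3 kernel, and the function names are all assumptions.

```python
# 3x3 binomial approximation of a Gaussian kernel; the weights sum to 16.
KERNEL = ((1, 2, 1), (2, 4, 2), (1, 2, 1))


def add_border(img, border, grey=128):
    """Surround img (a list of rows of 0-255 ints) with a uniform grey frame."""
    width = len(img[0]) + 2 * border
    framed = [[grey] * width for _ in range(border)]
    for row in img:
        framed.append([grey] * border + list(row) + [grey] * border)
    framed += [[grey] * width for _ in range(border)]
    return framed


def blur3x3(img):
    """Apply the 3x3 kernel, clamping coordinates at the image edges."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += KERNEL[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = (acc + 8) // 16  # rounded integer division
    return out


def wall_transform(img, border=2, grey=128):
    """Grey frame followed by a light blur (assumed parameters)."""
    return blur3x3(add_border(img, border, grey))
```

A global hash sees the whole framed image, border included, whereas local keypoints detected inside the painting still describe the same neighbourhoods.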

Accuracy (%)

N	imret	imret-batch	bfmatcher	imagehash
100	100.0	100.0	100.0	15.0
500	100.0	100.0	100.0	15.0
1,000	100.0	100.0	100.0	10.0

Search latency p50 (ms)

N	imret	imret-batch	bfmatcher	imagehash
100	21.64	21.54	29.74	1.69
500	22.53	22.27	71.64	2.07
1,000	22.21	22.33	130.68	2.42

imagehash is fast but unreliable on real images under this transform: it identifies the correct painting for only 10–15% of queries (2 or 3 of the 20 queries per size). imret maintains 100% accuracy. BFMatcher matches imret's accuracy, but its latency grows roughly linearly with N.

To reproduce:

python benchmark.py --dataset wikiart --transform wall --sizes 100,500,1000 --plot
python benchmark.py --sizes 100,500,1000,2000 --plot   # synthetic speed comparison

Publishing a release

Wheels for Linux x86_64 and macOS arm64 are built automatically via GitHub Actions using cibuildwheel. To publish a new version to PyPI:

Update version in pyproject.toml.
Push a version tag:
```
git tag v0.1.0
git push origin v0.1.0
```
The workflow builds wheels for all supported platforms and Python versions, then uploads to PyPI using trusted publishing (no API token required — configure the PyPI project to trust this repository's Actions environment named pypi).

Release history

0.1.1: Jun 24, 2026

0.1.0: Jun 24, 2026 (this version)

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

imret-0.1.0-cp310-cp310-macosx_15_0_arm64.whl (80.3 MB)

Uploaded Jun 24, 2026. CPython 3.10, macOS 15.0+ ARM64.

File details

Details for the file imret-0.1.0-cp310-cp310-macosx_15_0_arm64.whl.

File metadata

Download URL: imret-0.1.0-cp310-cp310-macosx_15_0_arm64.whl
Upload date: Jun 24, 2026
Size: 80.3 MB
Tags: CPython 3.10, macOS 15.0+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for imret-0.1.0-cp310-cp310-macosx_15_0_arm64.whl
Algorithm	Hash digest
SHA256	`6b819c7300a6a38c579dad135349f35531493e894baba8cb309194875b4cd9f0`
MD5	`3e9ea28075f12efbbef02c9839a52709`
BLAKE2b-256	`c92314440ce4b0e832c4f036ed34afd078a424612e7eb64d1cde4a36f85c6dba`

Provenance

The following attestation bundles were made for imret-0.1.0-cp310-cp310-macosx_15_0_arm64.whl:

Publisher: release.yml on semajyllek/imret

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: imret-0.1.0-cp310-cp310-macosx_15_0_arm64.whl
- Subject digest: 6b819c7300a6a38c579dad135349f35531493e894baba8cb309194875b4cd9f0
- Sigstore transparency entry: 1933729475
- Sigstore integration time: Jun 24, 2026
Source repository:
- Permalink: semajyllek/imret@d854370985102e017c878b9a1edcf16ae41b34c7
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/semajyllek
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d854370985102e017c878b9a1edcf16ae41b34c7
- Trigger Event: push
