GraDupe

Sobel Gradient Image Deduplication

Usage

Install the CLI tool with pip install -U gradupe, or download it from PyPI manually.

gradupe init initializes a cache in the current directory for long-term management.
gradupe scan scans the current directory for duplicates, using the cache if available.

For further information and options, refer to gradupe and gradupe [command] --help.

Motive

Classical algorithms based on image hashes can be inaccurate, while newer ones based on RNNs can be inefficient. With the demand for image storage growing rapidly over the past decade, we need a fast solution that combines the benefits of both.

At one point, it occurred to me that Sobel gradients could serve as a decent fingerprint for an image. As with finite differences and derivatives, two distinct images share the same gradient only if they differ by a constant. Reading an image in grayscale gives a 2D matrix suitable for Sobel operators.

Images of different dimensions are downscaled to a common square grid. Convolutions are already blazingly fast on modern hardware, so the point of downscaling is to unify dimensions and speed up diffing. After downscaling, enough informative bits remain for the diffing in the next step.
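The downscaling step can be sketched as follows. This is only an illustration: the function name downscale_to_square and the default grid size of 16 are assumptions, and since the project lists OpenCV in its credits it presumably relies on an OpenCV resize rather than the plain NumPy block averaging used here.

```python
import numpy as np

def downscale_to_square(gray: np.ndarray, side: int = 16) -> np.ndarray:
    """Downscale a 2D grayscale image to side x side by averaging blocks.

    Hypothetical helper; assumes both image dimensions are >= side.
    """
    h, w = gray.shape
    # Edges that split rows and columns into `side` nearly equal blocks.
    row_edges = np.linspace(0, h, side + 1).astype(int)
    col_edges = np.linspace(0, w, side + 1).astype(int)
    out = np.empty((side, side), dtype=np.float64)
    for i in range(side):
        for j in range(side):
            block = gray[row_edges[i]:row_edges[i + 1],
                         col_edges[j]:col_edges[j + 1]]
            out[i, j] = block.mean()
    return out
```

Whatever the original aspect ratio, every image ends up with the same shape, so all fingerprints have the same length and can be compared element by element.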

Sobel operators are traditionally used for edge detection, but their nature lies in differentiating an image. Computing the Sobel gradient of an image in both the x and y directions yields two matrices, which we flatten and concatenate into a contiguous array.
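A minimal sketch of the gradient fingerprint, assuming a grayscale image already loaded as a 2D array. The names correlate3x3 and sobel_fingerprint are hypothetical, and the filter is written in plain NumPy without padding to stay self-contained; the project itself presumably uses OpenCV's Sobel routine.

```python
import numpy as np

# Standard 3x3 Sobel kernels for the x and y directions.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def correlate3x3(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Apply a 3x3 kernel without padding; output shrinks by 2 per axis."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.float64)
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * img[di:di + h - 2, dj:dj + w - 2]
    return out

def sobel_fingerprint(gray: np.ndarray) -> np.ndarray:
    """Flatten and concatenate the x and y Sobel gradients into one array."""
    img = gray.astype(np.float64)
    gx = correlate3x3(img, SOBEL_X)
    gy = correlate3x3(img, SOBEL_Y)
    return np.concatenate([gx.ravel(), gy.ravel()])
```

Because each kernel sums to zero, adding a constant brightness offset to the image leaves the fingerprint unchanged.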

The gradients are thresholded into bitmasks because Hamming distance can be computed with SIMD XOR instructions, making it orders of magnitude faster than the Euclidean norm. Mapping each pair of image indices (i, j) to a single combinatorial index lets a densely packed flat array serve as the distance matrix, which saves memory and enables parallel computation.
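The bitmask and indexing ideas can be illustrated with a short sketch. The threshold of 0, the helper names, and the byte-wise popcount table are all assumptions made for this example; the project credits Numba, so its real loops are presumably compiled and parallelized rather than run in plain Python as they are here.

```python
import numpy as np

def to_bitmask(fingerprint: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Threshold a gradient vector and pack the resulting bits into bytes."""
    return np.packbits(fingerprint > threshold)

# Lookup table: number of set bits in every possible byte value.
POPCOUNT = np.array([bin(b).count("1") for b in range(256)], dtype=np.uint8)

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between two packed bitmasks (XOR, then count bits)."""
    return int(POPCOUNT[np.bitwise_xor(a, b)].sum())

def condensed_index(i: int, j: int, n: int) -> int:
    """Position of the pair i < j (out of n items) in the flat distance array."""
    return i * n - i * (i + 1) // 2 + (j - i - 1)

def pairwise_hamming(masks: list) -> np.ndarray:
    """Fill a flat array of length n*(n-1)/2 with all pairwise distances."""
    n = len(masks)
    dist = np.empty(n * (n - 1) // 2, dtype=np.int64)
    for i in range(n):
        for j in range(i + 1, n):
            dist[condensed_index(i, j, n)] = hamming(masks[i], masks[j])
    return dist
```

The flat array stores each unordered pair once, roughly halving the memory of a full n x n matrix, and every slot can be written independently of the others.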

The flat distance array can then be thresholded into a boolean mask, again with SIMD instructions. All that remains is to filter the image pairs with the mask (combinatorial indexing ensures the correct correspondence), yielding a list of duplicate pairs that is then merged into groups via union-find.
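The final filtering and grouping step might look like the sketch below. The function names and the max_distance parameter are hypothetical, and pure Python is used for clarity. It relies on itertools.combinations producing pairs in the same lexicographic order as the flat distance array.

```python
from itertools import combinations, compress

def duplicate_pairs(dist, n, max_distance):
    """Keep the index pairs whose distance is within max_distance."""
    mask = [d <= max_distance for d in dist]
    # combinations(range(n), 2) yields (0,1), (0,2), ..., matching dist order.
    return list(compress(combinations(range(n), 2), mask))

def group_duplicates(pairs, n):
    """Merge duplicate pairs into groups with a simple union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    # Only groups with more than one member contain duplicates.
    return [g for g in groups.values() if len(g) > 1]
```

For example, with five images and the duplicate pairs (0, 1), (1, 2) and (3, 4), the grouping step returns two groups: images 0, 1, 2 and images 3, 4.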

Credits

Library
OpenCV, NumPy, Numba

Cache
SQLite

CLI
Typer, Rich


