Sobel Gradient Image Deduplication

GraDupe
Usage
Install the CLI tool with pip install -U gradupe, or download it from PyPI manually.
Running the CLI with administrator privileges is recommended.
gradupe init initializes a cache in the current directory for long-term
management.
gradupe scan scans the current directory for duplicates, using the cache if
available.
For further information and options, see the help output of gradupe and
gradupe [command] --help. If caching is enabled, the cache is stored in the
.gradupe SQLite database in the current directory.
Motive
Classical algorithms based on image hashes can be inaccurate, while newer ones based on RNNs can be inefficient. As demand for image storage has grown rapidly over the past decade, we need a fast solution that combines the benefits of both.
At one point, it occurred to me that Sobel gradients make a decent fingerprint for an image. As with derivatives and their discrete counterparts, finite differences, two distinct images have the same gradient only if they differ by a constant. Reading an image in grayscale yields a 2D matrix suitable for Sobel operators.
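The constant-offset property can be checked directly with a small pure-Python sketch: it converts RGB pixels to a grayscale matrix and compares the forward finite differences of an image and a uniformly brightened copy. The helper names are hypothetical, and the ITU-R BT.601 luma weights (the ones Pillow documents for mode "L") are an assumption, since the reader gradupe actually uses is not described here.

```python
# Minimal sketch with hypothetical helpers, not gradupe's code.
# Assumption: grayscale via BT.601 luma weights (R*299 + G*587 + B*114) / 1000.


def to_grayscale(rgb: list[list[tuple[int, int, int]]]) -> list[list[float]]:
    """Convert rows of (R, G, B) pixels into a 2D grayscale matrix."""
    return [[(r * 299 + g * 587 + b * 114) / 1000 for r, g, b in row] for row in rgb]


def finite_differences(img: list[list[float]]) -> tuple[list[list[float]], list[list[float]]]:
    """Forward differences along x (between columns) and y (between rows)."""
    dx = [[row[c + 1] - row[c] for c in range(len(row) - 1)] for row in img]
    dy = [
        [img[r + 1][c] - img[r][c] for c in range(len(img[0]))]
        for r in range(len(img) - 1)
    ]
    return dx, dy


gray = to_grayscale([[(10, 10, 10), (20, 20, 20)], [(30, 30, 30), (40, 40, 40)]])
brighter = [[v + 25 for v in row] for row in gray]  # same image plus a constant offset

# The constant cancels in every difference, so both images share one gradient.
assert finite_differences(gray) == finite_differences(brighter)
```

This also shows the fingerprint's one blind spot: a uniform brightness shift leaves every difference unchanged.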
Images of different dimensions are downscaled to a fixed square grid. Convolutions are already very fast on modern hardware, so this is done mainly to unify dimensions and speed up diffing. After downscaling, enough informative bits remain for diffing in the next step.
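One way to bring every image onto an n x n grid is block averaging, sketched below in pure Python. The grid size and the averaging method are assumptions for illustration; the description does not say which resampling filter gradupe uses.

```python
# Sketch: downscale a 2D grayscale matrix to an n x n grid by block averaging.
# Assumption: area averaging; gradupe's real resampling filter is not specified.


def downscale(img: list[list[float]], n: int) -> list[list[float]]:
    h, w = len(img), len(img[0])
    if h < n or w < n:
        raise ValueError("image is smaller than the target grid")
    out = []
    for i in range(n):
        # Row range of cell i; integer division spreads leftover rows evenly.
        r0, r1 = i * h // n, (i + 1) * h // n
        row = []
        for j in range(n):
            c0, c1 = j * w // n, (j + 1) * w // n
            block = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(block) / len(block))  # mean brightness of the cell
        out.append(row)
    return out


quadrants = [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
print(downscale(quadrants, 2))  # [[1.0, 2.0], [3.0, 4.0]]
```

Because every cell covers at least one pixel when the image is no smaller than the grid, non-square inputs (say 5 x 7) still map cleanly onto the same n x n shape.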
Sobel operators are traditionally used for edge detection, but at heart they differentiate an image. Computing the Sobel gradient in both the x and y directions yields two matrices, which we flatten and concatenate into one contiguous array.
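The gradient fingerprint can be sketched with the standard 3 x 3 Sobel kernels. This sketch assumes a "valid" pass with no padding (an n x n input yields (n - 2) x (n - 2) values per direction) and places the x gradients before the y gradients; gradupe's padding and ordering may differ.

```python
# Sketch: Sobel gradients in x and y, flattened and concatenated.
# Assumptions: no padding ("valid" region), x gradients first, then y.

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply_kernel(img: list[list[float]], kernel: list[list[int]]) -> list[list[float]]:
    """Slide a 3x3 kernel over the interior of img (cross-correlation).

    A true convolution would flip the kernel, which for Sobel only flips the sign.
    """
    h, w = len(img), len(img[0])
    return [
        [
            sum(
                kernel[dr + 1][dc + 1] * img[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            )
            for c in range(1, w - 1)
        ]
        for r in range(1, h - 1)
    ]


def sobel_features(img: list[list[float]]) -> list[float]:
    """Flatten the x and y gradient matrices into one contiguous list."""
    gx = apply_kernel(img, SOBEL_X)
    gy = apply_kernel(img, SOBEL_Y)
    return [v for row in gx for v in row] + [v for row in gy for v in row]


ramp = [[c for c in range(4)] for _ in range(4)]  # brightness rises left to right
print(sobel_features(ramp))  # [8, 8, 8, 8, 0, 0, 0, 0]
```

A horizontal ramp produces a constant x gradient and no y gradient, as expected.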
The gradients are thresholded into bitmasks, since Hamming distance can be computed with SIMD XOR and popcount instructions, making it orders of magnitude faster than the Euclidean norm. By mapping the sub-indices of each pair to a combinatorial index, a densely packed flat array can serve as the distance matrix, saving memory and enabling parallel computation.
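The two ideas in this paragraph can be sketched together: packing a thresholded gradient into an integer bitmask so Hamming distance becomes XOR plus a bit count, and mapping a pair (i, j) with i < j to its slot in a condensed upper-triangle array (the same ordering SciPy's pdist uses). Thresholding at zero, i.e. keeping only the gradient's sign, is an assumption; gradupe may use a different threshold. Python integers stand in for SIMD vectors here.

```python
# Sketch: bitmask fingerprints, Hamming distance, condensed pair indexing.
# Assumption: bit k is set where gradient value k is positive.
from itertools import combinations


def to_bitmask(features: list[float]) -> int:
    """Pack one bit per feature into a Python int."""
    mask = 0
    for k, value in enumerate(features):
        if value > 0:
            mask |= 1 << k
    return mask


def hamming(a: int, b: int) -> int:
    """Number of differing bits: XOR, then count the set bits (popcount)."""
    return bin(a ^ b).count("1")


def pair_index(n: int, i: int, j: int) -> int:
    """Slot of pair (i, j), i < j < n, in an array of n * (n - 1) // 2 entries."""
    return n * i - i * (i + 1) // 2 + (j - i - 1)


def condensed_distances(masks: list[int]) -> list[int]:
    """All pairwise Hamming distances, in the same order as pair_index."""
    return [hamming(masks[i], masks[j]) for i, j in combinations(range(len(masks)), 2)]


masks = [to_bitmask(f) for f in ([1.0, -2.0, 3.0], [1.0, 2.0, 3.0], [-1.0, -2.0, -3.0])]
dist = condensed_distances(masks)
print(dist)  # [1, 2, 3]
print(dist[pair_index(3, 1, 2)])  # 3
```

Since itertools.combinations yields pairs in lexicographic order, position k of the flat list and pair_index agree, so the array replaces an n x n matrix with fewer than half the entries.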
The flat distance array can then be thresholded into a boolean mask with SIMD instructions. All that remains is to compress the list of image pairs with the mask (combinatorial indexing guarantees that entries correspond), yielding a list of duplicate pairs that is then merged into groups via union-find.
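The final step can be sketched as follows: threshold the condensed distance array, keep only the flagged pairs with itertools.compress, and merge them with a union-find (disjoint-set) structure. The inclusive threshold (a distance at or below it counts as a duplicate) and the helper names are assumptions for illustration.

```python
# Sketch: mask the condensed distances, compress pairs, group with union-find.
# Assumption: a pair is a duplicate when its distance is <= threshold.
from itertools import combinations, compress


def find(parent: list[int], x: int) -> int:
    """Return the root of x, halving the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def duplicate_groups(n: int, distances: list[int], threshold: int) -> list[list[int]]:
    mask = [d <= threshold for d in distances]  # boolean mask over the flat array
    pairs = compress(combinations(range(n), 2), mask)  # same order as the array
    parent = list(range(n))
    for i, j in pairs:
        root_i, root_j = find(parent, i), find(parent, j)
        if root_i != root_j:
            parent[root_j] = root_i  # merge the two groups
    groups: dict[int, list[int]] = {}
    for x in range(n):
        groups.setdefault(find(parent, x), []).append(x)
    return sorted(g for g in groups.values() if len(g) > 1)


# Pair order: (0,1) (0,2) (0,3) (0,4) (1,2) (1,3) (1,4) (2,3) (2,4) (3,4)
distances = [1, 9, 9, 9, 2, 9, 9, 9, 9, 0]
print(duplicate_groups(5, distances, threshold=3))  # [[0, 1, 2], [3, 4]]
```

Union-find makes grouping transitive: images 0 and 2 land in one group through their shared match with image 1, even though their own distance exceeds the threshold.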
Credits
Cache: SQLite
Download files
Source distribution: gradupe-3.0.2.tar.gz
Built distribution: gradupe-3.0.2-py3-none-any.whl
File details
Details for the file gradupe-3.0.2.tar.gz.
File metadata
- Download URL: gradupe-3.0.2.tar.gz
- Upload date:
- Size: 5.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0f3af19587ef06c29642c3312a7c8bc6511c9f083e8ae8d42480b9609821f567 |
| MD5 | 7e1fe7e5e5dd16b5cfc04671df918575 |
| BLAKE2b-256 | 1197385882081c11bce89376ad58ad90c6831b7f40fb4d115efe515ccd0eb3e1 |
Provenance
The following attestation bundles were made for gradupe-3.0.2.tar.gz:
Publisher: publish.yml on wavim/gradupe
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gradupe-3.0.2.tar.gz
- Subject digest: 0f3af19587ef06c29642c3312a7c8bc6511c9f083e8ae8d42480b9609821f567
- Sigstore transparency entry: 833559877
- Sigstore integration time:
- Permalink: wavim/gradupe@1e2e7c446138d0a1f26def93f1029db1dc24bfb9
- Branch / Tag: refs/heads/main
- Owner: https://github.com/wavim
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1e2e7c446138d0a1f26def93f1029db1dc24bfb9
- Trigger Event: push
File details
Details for the file gradupe-3.0.2-py3-none-any.whl.
File metadata
- Download URL: gradupe-3.0.2-py3-none-any.whl
- Upload date:
- Size: 7.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 11a30e75e0563271e2db51607545480a2fb7b4b496dad8f4e535b24f8b919964 |
| MD5 | d00328cdc4c26432c01cf3ccdd8bdd8b |
| BLAKE2b-256 | 24e049a59abd8a6a8894fe21cf021be11baa48a38a1a716bdb23dfdfb6edb086 |
Provenance
The following attestation bundles were made for gradupe-3.0.2-py3-none-any.whl:
Publisher: publish.yml on wavim/gradupe
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gradupe-3.0.2-py3-none-any.whl
- Subject digest: 11a30e75e0563271e2db51607545480a2fb7b4b496dad8f4e535b24f8b919964
- Sigstore transparency entry: 833559878
- Sigstore integration time:
- Permalink: wavim/gradupe@1e2e7c446138d0a1f26def93f1029db1dc24bfb9
- Branch / Tag: refs/heads/main
- Owner: https://github.com/wavim
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1e2e7c446138d0a1f26def93f1029db1dc24bfb9
- Trigger Event: push