Video deduplicator utility for Hydrus Network
Hydrus Video Deduplicator
Hydrus Video Deduplicator detects similar video files and marks them as potential duplicates through the Hydrus API.
How It Works:
The deduplicator compares videos by computing a perceptual hash for each one.
A perceptual hash is a compact fingerprint that characterizes a video in small chunks, so that visually similar videos produce similar hashes.
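To make the idea concrete, here is a minimal sketch of a toy perceptual hash for a single grayscale frame. It is not the vpdq algorithm this project relies on; the `average_hash` and `hamming_distance` helpers and the 4x4 frames are hypothetical and only illustrate why similar images hash to nearby values.

```python
def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Hypothetical 4x4 grayscale frame (dark left half, bright right half).
frame = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [10, 20, 200, 210],
    [15, 25, 205, 215],
]
brighter = [[value + 5 for value in row] for row in frame]    # same picture, slightly brighter
inverted = [[255 - value for value in row] for row in frame]  # visually very different

print(hamming_distance(average_hash(frame), average_hash(brighter)))  # small distance
print(hamming_distance(average_hash(frame), average_hash(inverted)))  # large distance
```

A small brightness change leaves the hash unchanged, while the inverted frame flips every bit. That is the property a cryptographic hash such as SHA256 deliberately lacks.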
The perceptual hashes are stored in a database file in the directory the program is run from, so they do not have to be recomputed on every run.
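The caching idea can be sketched as follows. This assumes a SQLite database purely for illustration; the actual storage format, the `phashes` table, and the `open_cache` and `get_or_compute` helpers are hypothetical and not taken from the project.

```python
import sqlite3


def open_cache(path):
    """Open (or create) a hash cache. The schema here is an assumption for illustration."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS phashes ("
        "video_id TEXT PRIMARY KEY, phash TEXT NOT NULL)"
    )
    return conn


def get_or_compute(conn, video_id, compute):
    """Return the cached hash for video_id, computing and storing it only if missing."""
    row = conn.execute(
        "SELECT phash FROM phashes WHERE video_id = ?", (video_id,)
    ).fetchone()
    if row is not None:
        return row[0]
    phash = compute(video_id)
    conn.execute(
        "INSERT INTO phashes (video_id, phash) VALUES (?, ?)", (video_id, phash)
    )
    conn.commit()
    return phash


calls = []


def fake_hasher(video_id):
    """Stand-in for the expensive hashing step; records how often it runs."""
    calls.append(video_id)
    return "hash-of-" + video_id


conn = open_cache(":memory:")  # a real cache would use a file path instead
first = get_or_compute(conn, "abc123", fake_hasher)
second = get_or_compute(conn, "abc123", fake_hasher)
print(first == second, len(calls))
```

The second lookup is served from the cache, so the expensive hashing function runs only once per video.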
Once all perceptual hashes for all the videos in your database are computed, they are compared against each other to detect if they're similar. If they are similar, they will be marked as potential duplicates in Hydrus.
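A rough sketch of the all-pairs comparison step, under stated assumptions: each video is reduced to a single integer hash, and `close_enough` is a hypothetical similarity test based on Hamming distance. The real tool compares vpdq hashes and reports matching pairs to Hydrus through its API, which is not shown here.

```python
from itertools import combinations


def find_potential_duplicates(hashes, is_similar):
    """Compare every pair of videos once and collect the pairs judged similar."""
    pairs = []
    for (id_a, hash_a), (id_b, hash_b) in combinations(sorted(hashes.items()), 2):
        if is_similar(hash_a, hash_b):
            pairs.append((id_a, id_b))
    return pairs


def close_enough(a, b, max_distance=2):
    """Hypothetical similarity test: at most max_distance differing bits."""
    return bin(a ^ b).count("1") <= max_distance


# Hypothetical 8-bit hashes keyed by video identifier.
hashes = {
    "video_a": 0b11110000,
    "video_b": 0b11110001,  # one bit away from video_a
    "video_c": 0b00001111,  # far from both
}

pairs = find_potential_duplicates(hashes, close_enough)
print(pairs)
```

Each pair is visited exactly once, so the number of comparisons grows quadratically with the number of videos.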
The accuracy is extremely good because of vpdq. You can adjust the threshold of similarity using --threshold. The default is 75%.
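One way to read a percentage threshold is as the share of frame hashes in one video that find a close match in the other. The sketch below assumes that interpretation, a hypothetical per-frame `max_distance`, and a check in both directions; none of these details are confirmed by this README, and the functions are illustrative rather than the project's or vpdq's actual API.

```python
def match_percent(query_frames, compared_frames, max_distance=2):
    """Percentage of query frame hashes that have a close match in compared_frames."""
    if not query_frames:
        return 0.0
    matched = sum(
        1
        for q in query_frames
        if any(bin(q ^ c).count("1") <= max_distance for c in compared_frames)
    )
    return 100.0 * matched / len(query_frames)


def is_similar(frames_a, frames_b, threshold=75.0):
    """Assumed rule: both directions must reach the threshold percentage."""
    return (
        match_percent(frames_a, frames_b) >= threshold
        and match_percent(frames_b, frames_a) >= threshold
    )


# Hypothetical 8-bit frame hashes: three of the four frames are identical.
original = [0x00, 0x0F, 0xF0, 0xFF]
reencoded = [0x00, 0x0F, 0xF0, 0xAA]

print(match_percent(original, reencoded))
print(is_similar(original, reencoded))                  # passes at the default 75
print(is_similar(original, reencoded, threshold=80.0))  # fails at a stricter threshold
```

Raising the threshold reduces false positives at the cost of missing pairs that differ in more frames, for example trimmed or heavily re-encoded copies.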
For more information, check out the wiki and the FAQ.
Installation:
Linux:
Then install with pip:
pip install hydrusvideodeduplicator
Usage:
python3 -m hydrusvideodeduplicator --api-key="<your key>"
TODO:
- Option to rollback and remove potential duplicates
- Option to enter custom Hydrus tag search parameters
- Parallelize hashing and duplicate search
- Automatically generate access key with Hydrus API
- Upload to PyPI
- Docker container
- Windows compatibility without WSL or Docker
Please create an issue on GitHub if you have any problems or questions! Pull requests are also welcome on this or my VideoHash fork.
There is a lot to improve and clean up, and I'm more experienced in C than Python, so please fix stuff.
Credits:
Hydrus Network by dev
Hydrus API Library by Cryzed
vpdq by Meta
various other files from threatexchange by Meta