Video deduplicator utility for Hydrus Network

Hydrus Video Deduplicator

Hydrus Video Deduplicator detects similar video files and marks them as potential duplicates through the Hydrus API.


How It Works:

The deduplicator compares videos by computing a perceptual hash for each one.

A perceptual hash characterizes a video's visual content in small chunks, so visually similar videos produce similar hashes even when the files themselves differ.

The perceptual hashes are stored in a database file in the running directory so they don't have to be recomputed on every run.
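
The caching idea can be sketched roughly as follows. This is an illustration only, not the project's actual code: it assumes an SQLite file accessed through Python's built-in sqlite3 module, and the table name `phashes`, the helper names, and the `compute_hash` callback are all hypothetical.

```python
import sqlite3
from typing import Callable


def open_cache(path: str) -> sqlite3.Connection:
    """Open (or create) a hash cache. The schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS phashes ("
        "video_id TEXT PRIMARY KEY, phash TEXT NOT NULL)"
    )
    return conn


def get_or_compute(
    conn: sqlite3.Connection,
    video_id: str,
    compute_hash: Callable[[str], str],
) -> str:
    """Return the cached hash for video_id, computing and storing it on a miss."""
    row = conn.execute(
        "SELECT phash FROM phashes WHERE video_id = ?", (video_id,)
    ).fetchone()
    if row is not None:
        return row[0]  # cache hit: skip the expensive hashing step
    phash = compute_hash(video_id)  # cache miss: hash once, then remember it
    conn.execute(
        "INSERT INTO phashes (video_id, phash) VALUES (?, ?)", (video_id, phash)
    )
    conn.commit()
    return phash
```

Calling `get_or_compute` twice with the same `video_id` invokes `compute_hash` only the first time; the second call is answered from the database file.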

Once perceptual hashes have been computed for all the videos in your database, they are compared against each other to detect similar videos. Videos that are similar are marked as potential duplicates in Hydrus.

Accuracy is very good thanks to vpdq. You can adjust the similarity threshold with --threshold; the default is 75%.
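
To make the comparison step concrete, here is a rough sketch of threshold-based matching. It is an illustration under assumptions, not vpdq's actual implementation: each video is modeled as a list of small integer frame hashes, two frames count as a match when their Hamming distance is at most a hypothetical `max_distance`, and two videos are treated as similar when the percentage of matching frames reaches the threshold in both directions.

```python
from typing import List


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


def match_percent(query: List[int], other: List[int], max_distance: int = 2) -> float:
    """Percentage of frames in `query` that have a close frame in `other`."""
    if not query:
        return 0.0
    matched = sum(
        1 for q in query if any(hamming(q, o) <= max_distance for o in other)
    )
    return 100.0 * matched / len(query)


def is_potential_duplicate(
    a: List[int],
    b: List[int],
    threshold: float = 75.0,  # mirrors the 75% default described above
    max_distance: int = 2,    # hypothetical per-frame tolerance
) -> bool:
    """True when both videos match each other at or above the threshold."""
    return (
        match_percent(a, b, max_distance) >= threshold
        and match_percent(b, a, max_distance) >= threshold
    )
```

Lowering the threshold makes the check more permissive (more pairs flagged, more false positives); raising it makes the check stricter.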

For more information, check out the wiki and the FAQ.


Installation:

Windows requires WSL

Linux:

Install dependencies

Then install with pip:

pip install hydrusvideodeduplicator

Usage:

python3 -m hydrusvideodeduplicator --api-key="<your key>"
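
If you want to launch the deduplicator from a script with a custom threshold, a sketch could look like the following. The --api-key and --threshold flags are the ones named in this README; passing the threshold as a bare number such as 50 (rather than 50%) is an assumption, and `build_command` is a hypothetical helper.

```python
import shlex
from typing import List, Optional


def build_command(api_key: str, threshold: Optional[float] = None) -> List[str]:
    """Build the argument list for running the deduplicator as a module."""
    cmd = ["python3", "-m", "hydrusvideodeduplicator", f"--api-key={api_key}"]
    if threshold is not None:
        # Assumption: the value is given as a plain number, not "50%".
        cmd.append(f"--threshold={threshold}")
    return cmd


if __name__ == "__main__":
    # Print the command rather than running it; pass the list to
    # subprocess.run(...) to actually execute it.
    print(shlex.join(build_command("<your key>", threshold=50)))
```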

TODO:

  • Option to rollback and remove potential duplicates
  • Option to enter custom Hydrus tag search parameters
  • Parallelize hashing and duplicate search
  • Automatically generate access key with Hydrus API
  • Upload to PyPI
  • Docker container
  • Windows compatibility without WSL or Docker

Please create an issue on GitHub if you have any problems or questions! Pull requests are also welcome on this or my VideoHash fork.

There is a lot to improve and clean up, and I'm more experienced in C than Python, so fix stuff please.


Credits:

Hydrus Network by dev

Hydrus API Library by Cryzed

vpdq by Meta

various other files from threatexchange by Meta

Download files

Source Distribution

hydrusvideodeduplicator-0.1.19.tar.gz (40.4 kB)

Uploaded Source

Built Distribution

hydrusvideodeduplicator-0.1.19-py3-none-any.whl (43.2 kB)

Uploaded Python 3

File details

Details for the file hydrusvideodeduplicator-0.1.19.tar.gz.

File hashes

Hashes for hydrusvideodeduplicator-0.1.19.tar.gz
Algorithm    Hash digest
SHA256       1bba6375a46824a919ac0bde10ec193dab879d144a6615a8829391dfa7c7e16b
MD5          d3ec6a03b61b83ac37c8e499160b1a17
BLAKE2b-256  53b13ec9ed67b9b5c60cdc5f26e25490e14fcccd18851a639ae10c4e6157efe0

File details

Details for the file hydrusvideodeduplicator-0.1.19-py3-none-any.whl.

File hashes

Hashes for hydrusvideodeduplicator-0.1.19-py3-none-any.whl
Algorithm    Hash digest
SHA256       ddfc18f934b004db13546ee83f8ab4042cb8fd05d8f3e243430205172c93ecfa
MD5          aeb6cf2c44825506934ce3bebc0edb5b
BLAKE2b-256  39a655e648a1da3735d337a786bb14fe69bb563653ad7314487dec25bf810cad
