Video deduplicator utility for Hydrus Network

Hydrus Video Deduplicator

Hydrus Video Deduplicator detects similar video files and marks them as potential duplicates through the Hydrus API.


How It Works:

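The deduplicator compares videos by computing a perceptual hash for each one.

A perceptual hash is a compact fingerprint of a video's visual content, computed over small chunks of the video, so that visually similar videos produce similar hashes.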

The perceptual hashes are stored in a database file in the directory the program is run from, so they do not have to be recomputed on every run.
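
The caching idea can be sketched roughly as follows. This is only an illustration, not the project's actual storage code: the use of sqlite3, the table layout, the default file name, and the helper names open_cache and get_or_compute are all assumptions made for the example.

```python
import sqlite3
from typing import Callable


def open_cache(path: str = "hash_cache.sqlite3") -> sqlite3.Connection:
    """Open (or create) a cache file. File name and schema are hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS phash ("
        "video_id TEXT PRIMARY KEY, hash BLOB NOT NULL)"
    )
    conn.commit()
    return conn


def get_or_compute(
    conn: sqlite3.Connection, video_id: str, compute: Callable[[], bytes]
) -> bytes:
    """Return the cached hash for video_id, computing and storing it if missing."""
    row = conn.execute(
        "SELECT hash FROM phash WHERE video_id = ?", (video_id,)
    ).fetchone()
    if row is not None:
        return row[0]  # cache hit: skip the expensive hashing step
    value = compute()
    conn.execute(
        "INSERT INTO phash (video_id, hash) VALUES (?, ?)", (video_id, value)
    )
    conn.commit()
    return value
```

With a cache like this, a second run only pays the hashing cost for videos that were not seen before.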

Once the perceptual hashes for all the videos in your database have been computed, they are compared against each other. Videos whose hashes are similar are marked as potential duplicates in Hydrus.

Similarity detection relies on vpdq, which gives very good accuracy. You can adjust the similarity threshold with --threshold. The default is 75%.
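
To give a feel for how a percentage threshold can be applied to perceptual hashes, here is a simplified sketch. It is not vpdq itself and not this project's code: it assumes each video is represented as a list of per-frame integer hashes, treats two frames as matching when their Hamming distance is small, and reports the share of matching frames. The names hamming, similarity_percent, is_potential_duplicate and the max_distance parameter are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of bits that differ between two integer hashes."""
    return bin(a ^ b).count("1")


def similarity_percent(
    query: list[int], target: list[int], max_distance: int = 1
) -> float:
    """Percentage of query frame hashes that have a close match in target."""
    if not query:
        return 0.0
    matched = sum(
        1
        for q in query
        if any(hamming(q, t) <= max_distance for t in target)
    )
    return 100.0 * matched / len(query)


def is_potential_duplicate(
    a: list[int], b: list[int], threshold: float = 75.0
) -> bool:
    """Apply a similarity threshold, analogous in spirit to --threshold."""
    return similarity_percent(a, b) >= threshold


# Toy 8-bit frame hashes; real perceptual hashes are much longer.
video_a = [0x00, 0xFF, 0x0F, 0xF0]
video_b = [0x01, 0xFF, 0x0F, 0x3C]
print(similarity_percent(video_a, video_b))
print(is_potential_duplicate(video_a, video_b, threshold=75.0))
```

Raising the threshold makes matching stricter, so fewer pairs are flagged. Lowering it flags more pairs at the cost of more false positives.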

For more information, check out the wiki and the FAQ.


Installation:

Dependencies:

  • Python >=3.10
  • FFmpeg

python3 -m pip install hydrusvideodeduplicator
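
Before installing, it can help to confirm that both dependencies are present. The snippet below is a small sketch for doing so and is not part of the package: check_dependencies is a hypothetical helper, and it assumes the FFmpeg executable is named ffmpeg and is expected on PATH.

```python
import shutil
import sys


def check_dependencies(min_python: tuple = (3, 10)) -> list:
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < min_python:
        required = ".".join(str(part) for part in min_python)
        problems.append(f"Python >= {required} is required")
    # Assumption: the FFmpeg binary is called "ffmpeg" and is found via PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg was not found on PATH")
    return problems


for problem in check_dependencies():
    print(problem)
```

If nothing is printed, both requirements listed above appear to be satisfied.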

Usage:

python3 -m hydrusvideodeduplicator --api-key="<your key>"

TODO:

  • Option to rollback and remove potential duplicates
  • OR predicates for --query
  • Parallelize hashing and duplicate search
  • Automatically generate access key with Hydrus API
  • Docker container
  • Upload Docker container to Docker Hub (GitHub Action)
  • Pure Python port of vpdq
  • Windows compatibility without WSL or Docker

Please create an issue on GitHub if you have any problems or questions! Pull requests are also welcome.


Credits:

Hydrus Network by dev

Hydrus API Library by Cryzed

pdq by Meta

vpdq by Meta, ported to Python by me.

Big Buck Bunny clips by Blender Foundation (CC BY 3.0)


Download files

Download the file for your platform.

Source Distribution

hydrusvideodeduplicator-0.2.2.tar.gz (46.4 kB)

Built Distribution

hydrusvideodeduplicator-0.2.2-py3-none-any.whl (52.6 kB)

File details

Details for the file hydrusvideodeduplicator-0.2.2.tar.gz.

File hashes

Hashes for hydrusvideodeduplicator-0.2.2.tar.gz
Algorithm Hash digest
SHA256 7f74dc21f050f96ae4a7022f274eaf9afe5a11ca02765da381ce4d3708b89a7d
MD5 6f1f217435ac66dbc5dfe0d9f3ba6067
BLAKE2b-256 7c21cd408e3e7b736b8d1b3288c32ea3b5bcdb1613f54a632e4339331c01341b

File details

Details for the file hydrusvideodeduplicator-0.2.2-py3-none-any.whl.

File hashes

Hashes for hydrusvideodeduplicator-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 1b7fe84ccd78ccd85d792c20c970a40acd06ca092961ea45eaa2bd488dc09d6c
MD5 2f8e084e8f07d228067ef07410a93b77
BLAKE2b-256 96f2ed78ec640df013f9e5e1c4918b6bad8ac24623f72e0226e74dbf7bb1df5d
