Python package for Near Duplicate Video Detection (Perceptual Video Hashing) - Get a 64-bit comparable hash-value for any video.


The Python package for near duplicate video detection


Introduction

Videohash is a Python package for detecting near-duplicate videos (Perceptual Video Hashing). It takes any input video and generates a 64-bit hash value for it. Videohash is much faster than comparing the imagehash values of individual frames of the video, and more reliable than hashing keyframes.

The video hash values for identical or near-duplicate videos are the same or similar. If a video is resized (upscaled or downscaled), transcoded, stabilized, or cropped, or if its watermark, colors, frame rate, aspect ratio, or black bars are added, removed, or changed, the hash value should remain unchanged or not vary substantially.

How the hash values are calculated

  • One frame is extracted from the input video every second, and each frame is shrunk to a 144x144 pixel square.
  • A collage is constructed from all of the resized square frames. The bit-list of the collage's wavelet hash is the first bit-list.
  • The extracted frames are also stitched together horizontally, and the stitched image is divided into 64 equal-sized images. The dominant color of each of these 64 images is detected and compared with a predefined pattern of dominant colors: if they match, the corresponding bit is set; otherwise it is unset. This gives the second bit-list.
  • The two bit-lists are combined with a bitwise XOR, and the resulting bits are joined to form the final 64-bit hash value of the input video.
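
The final combining step described above can be sketched in plain Python. This is only an illustration, not the package's actual code: the two input bit-lists are made-up placeholders standing in for the wavelet-hash bits and the dominant-color bits, the helper names are hypothetical, and the most-significant-bit-first ordering is an assumption.

```python
def xor_bitlists(first, second):
    """Bitwise XOR of two equal-length lists of 0/1 values."""
    if len(first) != len(second):
        raise ValueError("bit-lists must have the same length")
    return [a ^ b for a, b in zip(first, second)]


def bitlist_to_int(bits):
    """Join bits into one integer (assumption: first bit is most significant)."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


# Placeholder inputs, NOT real hash output: 64 bits each.
wavelet_bits = [1, 0] * 32            # stands in for the collage's wavelet hash
color_bits = [1] * 32 + [0] * 32      # stands in for the dominant-color bits

combined_bits = xor_bitlists(wavelet_bits, color_bits)
hash_value = bitlist_to_int(combined_bits)
hash_hex = f"0x{hash_value:016x}"
print(hash_hex)  # 0x55555555aaaaaaaa
```

Because XOR works bit by bit, a change in either bit-list flips only the corresponding bits of the result, so small visual changes should move the final hash by only a few bits.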

When not to use Videohash

  • Videohash cannot be used to verify whether one video is a part of another (video fingerprinting). If the video is reversed or rotated by a substantial angle (greater than 10 degrees), Videohash will not produce the same or a similar hash value. You can, however, reverse the video manually and generate the hash value for the reversed video.

How to compare the video hash values stored in a database
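
One common way to compare stored 64-bit hashes is to count the bits in which they differ (the Hamming distance). The sketch below assumes the hashes were saved as hex strings; the function names, the in-memory dictionary standing in for a database table, and the threshold of 10 bits are all hypothetical choices for illustration, not part of the videohash API.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Number of bit positions in which two integer hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def find_near_duplicates(query_hex, stored_hashes, max_distance=10):
    """Return (video_id, distance) pairs within max_distance, closest first.

    stored_hashes maps a video id to its hash as a hex string; it stands in
    for rows fetched from a database. max_distance is an assumed threshold.
    """
    query = int(query_hex, 16)
    matches = []
    for video_id, stored_hex in stored_hashes.items():
        distance = hamming_distance(query, int(stored_hex, 16))
        if distance <= max_distance:
            matches.append((video_id, distance))
    return sorted(matches, key=lambda match: match[1])


# Made-up example values, not hashes of real videos.
stored = {
    "video_a": "0x55555555aaaaaaaa",
    "video_b": "0x55555555aaaaaaa9",
    "video_c": "0xaaaaaaaa55555555",
}
result = find_near_duplicates("0x55555555aaaaaaaa", stored)
print(result)  # [('video_a', 0), ('video_b', 2)]
```

A linear scan like this is fine for small collections; a larger system would need an index suited to Hamming-distance search.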


Installation

To use this software, you must have FFmpeg installed. Please read how to install FFmpeg if you don't already know how.

Install videohash

Upgrade pip

python3 -m pip install --upgrade pip

If you do not want to upgrade pip and the installation fails, try appending --prefer-binary to the following installation command(s).

Install from PyPI (recommended):

pip install videohash

Using conda, from conda-forge (recommended):

Maintainer is @step21

conda install -c conda-forge videohash

Install directly from the GitHub repository (NOT recommended):

pip install git+https://github.com/akamhy/videohash.git

Features

  • Generate the videohash of a video directly from its URL (uses yt-dlp) or its path.
  • Can be used as the core of a scalable Near Duplicate Video Retrieval (NDVR) system.
  • The end user can access the image representation (the collage) of the video.
  • A videohash instance can be compared to a 64-bit stored hash, its hex representation, bitlist, and other videohash instances.

Usage

In the following usage example, the first, second, and fourth instances of the VideoHash class compute the hash for the same video (the same content, though the files need not have the same checksum), and the third instance is a different video.

>>> from videohash import VideoHash
>>> url1 = "https://user-images.githubusercontent.com/64683866/168872267-7c6682f8-7294-4d9a-8a68-8c6f44c06df6.mp4"
>>> videohash1 = VideoHash(url=url1)
>>> 
>>> url2 = "https://user-images.githubusercontent.com/64683866/168869109-1f77c839-6912-4e24-8738-42cb15f3ab47.mp4"
>>> videohash2 = VideoHash(url=url2)
>>> videohash2 - videohash1
2
>>> videohash2.is_similar(videohash1)
True
>>> 
>>> url3 = "https://user-images.githubusercontent.com/64683866/148960165-a210f2d2-6c41-4349-bd8d-a4cb673bc0af.mp4"
>>> videohash3 = VideoHash(url=url3)
>>> videohash3.is_similar(videohash1)
False
>>> videohash3.is_diffrent(videohash2)
True
>>> videohash3-videohash1
34
>>> videohash3-videohash2
34
>>> path4 = "/home/akamhy/Downloads/168872267-7c6682f8-7294-4d9a-8a68-8c6f44c06df6.mp4"
>>> videohash4 = VideoHash(path=path4)
>>> videohash4 == videohash1
True
>>> videohash4 - videohash1
0
>>> videohash4.is_similar(videohash2)
True
>>> videohash4.is_similar(videohash4)
True
>>> videohash4.is_similar(videohash3)
False
>>> 

Extended Usage : https://github.com/akamhy/videohash/wiki/Extended-Usage

API Reference : https://github.com/akamhy/videohash/wiki/API-Reference


Credits


License

License: MIT

Copyright (c) 2021-2022 Akash Mahanty. See license for details.

The VideoHash logo was created by iconolocode. See license for details.

Videos are from NASA and are in the public domain.

NASA copyright policy states that "NASA material is not protected by copyright unless noted".
