Python package for Near Duplicate Video Detection (Perceptual Video Hashing) - Get a 64-bit comparable hash-value for any video.

VideoHash

The Python package for near duplicate video detection


⭐️ Introduction

Videohash is a Python library for detecting near-duplicate videos (perceptual video hashing). It builds a 64-bit hash value from any input video. Identical or near-duplicate videos produce the same or similar hash values: if a video is resized (upscaled/downscaled), transcoded, slightly cropped, or has black bars added or removed, its hash value should stay the same or change only slightly.

How the hash values are calculated.

  • One frame is extracted from the input video every second. Each frame is shrunk to a 144x144 pixel square, and all of the resized square frames are assembled into a collage. The wavelet hash of that collage is the video hash value of the original input video.
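
The sampling and tiling steps above can be sketched in plain Python. This is only an illustration: the near-square grid and the helper names `frame_timestamps` and `collage_layout` are assumptions made for this sketch, not the package's actual collage geometry, and the frame extraction (FFmpeg) and wavelet hashing steps are not shown.

```python
import math

TILE = 144  # edge length of each resized frame, in pixels (from the text above)


def frame_timestamps(duration_seconds):
    """Timestamps (in seconds) at which one frame per second is sampled."""
    return list(range(max(1, math.ceil(duration_seconds))))


def collage_layout(frame_count, tile=TILE):
    """Return (columns, rows, boxes) for a near-square grid of tiles.

    The near-square grid is an assumption of this sketch; each box is the
    (x, y) pixel offset of a tile's top-left corner inside the collage.
    """
    columns = math.ceil(math.sqrt(frame_count))
    rows = math.ceil(frame_count / columns)
    boxes = [
        ((i % columns) * tile, (i // columns) * tile)
        for i in range(frame_count)
    ]
    return columns, rows, boxes


# A hypothetical 10-second video yields 10 frames laid out on a 4x3 grid.
timestamps = frame_timestamps(10.0)
columns, rows, boxes = collage_layout(len(timestamps))
print(columns, rows, boxes[:5])
```

Under these assumptions the collage grows with the video length, while the final hash stays at 64 bits regardless of how many frames were tiled.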

When not to use Videohash.

  • Videohash cannot be used to verify whether one video is part of another (video fingerprinting). If a video is reversed or rotated by a substantial angle (greater than 10 degrees), Videohash will not produce the same or a similar hash value. You can, however, reverse the video manually and generate the hash value of the reversed video.

How to compare the video hash values stored in a database.
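
One possible way to compare stored hash values is to keep the hex representation in a table and rank rows by Hamming distance to a query hash. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table layout, the row labels, the `hamming` helper, and the `MAX_DISTANCE` threshold are all assumptions made for illustration; the two hash values are the ones shown in the Usage section.

```python
import sqlite3


def hamming(hex_a, hex_b):
    """Number of differing bits between two '0x'-prefixed hex hashes."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


conn = sqlite3.connect(":memory:")
# Expose the Python helper to SQL as a two-argument function.
conn.create_function("hamming", 2, hamming)
conn.execute("CREATE TABLE videos (name TEXT, hash_hex TEXT)")
conn.executemany(
    "INSERT INTO videos VALUES (?, ?)",
    [
        ("rocket", "0x341fefff8f780000"),  # hash value from the Usage section
        ("moon", "0x7cffff000000eff0"),    # hash value from the Usage section
    ],
)

MAX_DISTANCE = 4  # hypothetical near-duplicate threshold, tune for your data

query_hash = "0x341fefff8f780000"
matches = conn.execute(
    "SELECT name, hamming(hash_hex, ?) AS distance FROM videos "
    "WHERE hamming(hash_hex, ?) <= ? ORDER BY distance",
    (query_hash, query_hash, MAX_DISTANCE),
).fetchall()
print(matches)
```

This query scans every row, so it is only a starting point. A larger collection would likely need an index structure suited to Hamming-distance search.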


🏗 Installation

To use this software, you must have FFmpeg installed. If you don't already know how, please read how to install FFmpeg.

  • Install videohash from PyPI:
$ pip install videohash
  • Install directly from GitHub:
$ pip install git+https://github.com/akamhy/videohash.git

🌱 Features

  • Generate videohash of a video directly from its URL or its path.
  • Can be used to implement scalable Near Duplicate Video Retrieval.
  • The end-user can access the image representation (the collage) of the video.
  • A videohash instance can be compared to a 64-bit stored hash, its hex representation, bitlist, and other videohash instances.
  • Faster than the old method of comparing each frame individually. The videohash package generates a single 64-bit hash value per video, which saves a significant amount of database space and sharply reduces the number of comparisons required.

🚀 Usage

>>> from videohash import VideoHash
>>> hash1 = VideoHash(url="https://www.youtube.com/watch?v=PapBjpzRhnA", download_worst=False) # video : Artemis I Hot Fire Test
>>> str(hash1) # str representation of VideoHash object (the output is video's videohash value)
'0b0011010000011111111011111111111110001111011110000000000000000000'
>>> hash1.hash # video hash value of the file, value is same as str(hash1)
'0b0011010000011111111011111111111110001111011110000000000000000000'
>>> hash1.bitlist # If you are XOR'ing bits to get the hamming distance store the bitlist in your database and not the hash itself.
[0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>>> len(hash1.bitlist) # 64 bits of the hashvalue
64
>>> hash1 == hash1.bitlist  # VideoHash supports the '==' operator with lists, but the list must be a bitlist.
True
>>> hash1.hash_hex # hexadecimal representation of the videohash value
'0x341fefff8f780000'
>>> repr(hash1) # representation of VideoHash object
'VideoHash(hash=0b0011010000011111111011111111111110001111011110000000000000000000, hash_hex=0x341fefff8f780000, collage_path=/tmp/tmpvfr41629/temp_storage_dir/79c95zh4bq0s/collage/collage.jpg, bits_in_hash=64)'
>>> hash1.video_path # path of the downloaded video
'/tmp/tmpvfr41629/temp_storage_dir/79c95zh4bq0s/video/video.webm'
>>> hash1.storage_path # the storage directory
'/tmp/tmpvfr41629/temp_storage_dir/79c95zh4bq0s/'
>>> hash1.collage_path # path of the generated collage, the wavelet hash of this special collage is videohash value of the input video
'/tmp/tmpvfr41629/temp_storage_dir/79c95zh4bq0s/collage/collage.jpg'
>>> hash1.delete_storage_path() # To delete the storage path, deleting it will also delete the collage, extracted frames, and the downloaded video.
>>> hash1.bits_in_hash # how many bits in the hash, always 64 (a constant)
64
>>> len(hash1) # length of the hash value string, 64(no of bits in hash) + 2(prefix '0b')
66
>>> hash2 = VideoHash(url="https://raw.githubusercontent.com/akamhy/videohash/main/assets/rocket.mkv") # video : Artemis I Hot Fire Test, yes same as hash1(downscaled)
>>> hash2.hash
'0b0011010000011111111011111111111110001111011110000000000000000000'
>>> hash2.hash_hex
'0x341fefff8f780000'
>>> hash1.hash_hex
'0x341fefff8f780000'
>>> hash1 - hash2 # videohash objects support application of '-' operator on them. The other value must be a string (prefixed with '0x' or '0b') or another VideoHash object
0
>>> hash2 - "0x341fefff8f780000"
0
>>> hash1 - "0b0011010000011111111011111111111110001111011110000000000000000000"
0
>>> hash1 - "0b1111111111111111111111111111111111111111111111111111111111111111"
32
>>> hash1 == hash2 # videohash objects support application of '==' operator on them. The other value must be a string (prefixed with '0x' or '0b') or another VideoHash object.
True
>>> hash1 == "0b0011010000011111111011111111111110001111011110000000000000000000"
True
>>> hash1 != hash2 # videohash objects support application of '!=' operator on them. The other value must be a string (prefixed with '0x' or '0b') or another VideoHash object.
False
>>> hash3 = VideoHash(path="/home/akamhy/Downloads/rocket.mkv") # video : Artemis I Hot Fire Test, yes same as hash2 (downloaded locally)
>>> hash3.hash_hex
'0x341fefff8f780000'
>>> hash3.hash
'0b0011010000011111111011111111111110001111011110000000000000000000'
>>> hash3 - hash2
0
>>> hash3 == hash1
True
>>> hash3 == hash2
True
>>> hash4 = VideoHash(url="https://www.youtube.com/watch?v=_T8cn2J13-4") #  video : How We Are Going to the Moon - 4K, a completely different video from the first 3 videos
>>> hash4.hash_hex
'0x7cffff000000eff0'
>>> hash4 - "0x7cffff000000eff0"
0
>>> hash4.hash
'0b0111110011111111111111110000000000000000000000001110111111110000'
>>> hash4 - "0b0111110011111111111111110000000000000000000000001110111111110000"
0
>>> hash4 == hash3
False
>>> hash4 - hash2
34
>>> hash4 != hash2
True
>>> hash4 - "0b0011010000011111111011111111111110001111011110000000000000000000"
34
>>>

Run the above code @ https://replit.com/@akamhy/videohash-usage-2xx-example-code-for-video-hashing#main.py

Wiki/Extended Usage/Docs : https://github.com/akamhy/videohash/wiki
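
The three representations shown in the session above (the '0b' string, the '0x' string, and the bitlist) carry the same 64 bits, so they can be converted into one another with plain Python. The sketch below is a minimal illustration that does not use the videohash package; the helper names are hypothetical, and the hash values are copied from the session.

```python
def bin_to_bitlist(bin_str):
    """'0b0011...' -> [0, 0, 1, 1, ...]"""
    return [int(ch) for ch in bin_str[2:]]


def bitlist_to_bin(bits):
    """[0, 0, 1, 1, ...] -> '0b0011...'"""
    return "0b" + "".join(str(bit) for bit in bits)


def bin_to_hex(bin_str):
    """'0b...' (64 bits) -> '0x' followed by 16 zero-padded hex digits."""
    return "0x" + format(int(bin_str, 2), "016x")


def bitlist_distance(bits_a, bits_b):
    """Hamming distance: XOR the bits pairwise and count the ones."""
    return sum(a ^ b for a, b in zip(bits_a, bits_b))


hash1_bin = "0b0011010000011111111011111111111110001111011110000000000000000000"
hash4_bin = "0b0111110011111111111111110000000000000000000000001110111111110000"

bits1 = bin_to_bitlist(hash1_bin)
bits4 = bin_to_bitlist(hash4_bin)

print(bin_to_hex(hash1_bin))           # 0x341fefff8f780000
print(bitlist_distance(bits1, bits4))  # 34, matching hash4 - hash2 above
```

Storing the bitlist, as the session comment suggests, avoids re-parsing the string on every comparison when the distance is computed by XOR'ing bits.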


🛡 License

License: MIT

Released under the MIT License. See license for details.

The videos used are from NASA and are in the public domain. NASA copyright policy states that "NASA material is not protected by copyright unless noted".

