Python package for Perceptual Video Hashing (Near Duplicate Video Detection) - Get a 64-bit comparable hash-value for any video.

VideoHash

Python package for Perceptual Video Hashing


Introduction

Videohash is a Python package for perceptual video hashing (near-duplicate video detection). It generates a 64-bit comparable hash value for any input video. Identical and near-duplicate videos get the same or similar hash values, so the hash should stay unchanged, or change only slightly, when a video is resized (upscaled or downscaled), transcoded, slightly cropped, or has black bars added or removed.

How are the hash values calculated?

  • One frame of the input video is extracted every second.
  • Each frame is resized to a 144x144 pixel square.
  • A collage is created that embeds all the resized, square-shaped frames.
  • The wavelet hash value of the collage is computed; this value is the video hash value of the original input video.
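The collage step above can be sketched in code. The text does not say how the frames are arranged in the collage, so the near-square grid below is an assumption, and `collage_layout` is a hypothetical helper rather than part of the videohash API. It only computes where each 144x144 frame would be pasted; frame extraction and the wavelet hash are left out.

```python
import math

FRAME_SIDE = 144  # each extracted frame is resized to a 144x144 square


def collage_layout(frame_count, side=FRAME_SIDE):
    """Return (collage_width, collage_height, positions) for a near-square grid.

    Assumption: frames are placed left to right, top to bottom.
    The real package may arrange them differently.
    """
    if frame_count < 1:
        raise ValueError("need at least one frame")
    columns = math.ceil(math.sqrt(frame_count))
    rows = math.ceil(frame_count / columns)
    # Top-left pixel coordinate of every frame inside the collage.
    positions = [
        ((index % columns) * side, (index // columns) * side)
        for index in range(frame_count)
    ]
    return columns * side, rows * side, positions


# A 10-second video sampled once per second yields about 10 frames.
width, height, positions = collage_layout(10)
print(width, height)  # 576 432
print(positions[4])   # (0, 144)
```

Under this assumed layout, ten frames fill a grid of 4 columns and 3 rows, with the last two cells left empty.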

When not to use Videohash?

  • Videohash cannot be used to verify whether one video is part of another video (video fingerprinting). It also does not produce the same or a similar hash value if the video is reversed or rotated by a significant angle (more than 10 degrees), although you can reverse the video yourself and generate the hash value of the reversed video.
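One possible way to reverse a video yourself is with FFmpeg's `reverse` video filter. The sketch below is not part of videohash; `reverse_command` and `reverse_video` are hypothetical helpers and the file names are placeholders. Audio is dropped with `-an` on the assumption that only the frames matter for the hash. Note that the `reverse` filter buffers the whole clip in memory, so it suits short videos best.

```python
import subprocess


def reverse_command(source, destination):
    """Build an FFmpeg command that writes a reversed copy of `source`."""
    return [
        "ffmpeg",
        "-i", source,
        "-vf", "reverse",  # emit the video frames in reverse order
        "-an",             # drop the audio stream
        destination,
    ]


def reverse_video(source, destination):
    """Run the command; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(reverse_command(source, destination), check=True)


# Print the command instead of running it, so the sketch works without a video file.
print(" ".join(reverse_command("input.mp4", "reversed.mp4")))
```

The reversed file could then be hashed like any other local video, for example with `VideoHash(path="reversed.mp4")`.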

Installation

You must have FFmpeg installed to use this software. If you don't know how to install FFmpeg, please read how to install FFmpeg.
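A quick way to check the FFmpeg requirement from Python is to look for the executable on `PATH`. This is a minimal sketch that uses only the standard library; `ffmpeg_available` is a hypothetical helper, and it assumes the executable is named `ffmpeg`.

```python
import shutil


def ffmpeg_available(executable="ffmpeg"):
    """Return True if an executable with this name is found on PATH."""
    return shutil.which(executable) is not None


if ffmpeg_available():
    print("FFmpeg found at", shutil.which("ffmpeg"))
else:
    print("FFmpeg not found on PATH; install it before using videohash")
```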

Install videohash

  • Install from PyPI:
pip install videohash
  • Install directly from GitHub:
pip install git+https://github.com/akamhy/videohash.git

Features

  • Generate videohash of a video directly from its URL or its path.
  • Can be used to implement scalable Near Duplicate Video Retrieval.
  • Image representation of the video is accessible by the end-user.
  • An instance of VideoHash can be compared with a stored 64-bit hash, with the hex representation of a hash, and with other instances of VideoHash.
  • Faster than the primitive process of comparing all the frames one by one. Because the videohash package produces a single 64-bit hash per video, a lot of database space is saved and the number of comparisons required drops significantly.
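To illustrate why a stored hash is cheap to compare, the sketch below counts the differing bits (the Hamming distance) between two hash strings in either the `0b...` or the `0x...` form. It assumes that this bit count is what the comparison measures; the values in the Usage section are consistent with that. `parse_hash` and `hamming_distance` are hypothetical helpers, not part of the videohash API.

```python
def parse_hash(value):
    """Convert a '0x...' or '0b...' hash string to an int."""
    if value.startswith("0x"):
        return int(value, 16)
    if value.startswith("0b"):
        return int(value, 2)
    raise ValueError("expected a string starting with '0x' or '0b'")


def hamming_distance(first, second):
    """Count the bit positions in which the two hashes differ."""
    return bin(parse_hash(first) ^ parse_hash(second)).count("1")


rocket_hex = "0x341fefff8f780000"
rocket_bin = "0b0011010000011111111011111111111110001111011110000000000000000000"
other_hex = "0x7cffff000000eff0"

print(hamming_distance(rocket_hex, rocket_bin))  # 0
print(hamming_distance(rocket_hex, other_hex))   # 34
```

A single XOR and a bit count per pair replaces a frame-by-frame comparison, which is where the saving in comparisons comes from.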

Usage

>>> from videohash import VideoHash
>>> hash1 = VideoHash(url="https://www.youtube.com/watch?v=PapBjpzRhnA", download_worst=False)
>>> str(hash1)
'0b0011010000011111111011111111111110001111011110000000000000000000'
>>> hash1.hash
'0b0011010000011111111011111111111110001111011110000000000000000000'
>>> hash1.hash_hex
'0x341fefff8f780000'
>>> repr(hash1)
'VideoHash(hash=0b0011010000011111111011111111111110001111011110000000000000000000, hash_hex=0x341fefff8f780000, collage_path=/tmp/tmpe07d_b1g/temp_storage_dir/acn6zsdcb40q/collage/collage.jpg, bits_in_hash=64)'
>>> hash1.collage_path
'/tmp/tmpe07d_b1g/temp_storage_dir/acn6zsdcb40q/collage/collage.jpg'
>>> hash1.bits_in_hash
64
>>> len(hash1)
66
>>> hash2 = VideoHash(url="https://raw.githubusercontent.com/akamhy/videohash/main/assets/rocket.mkv")
>>> hash2.hash
'0b0011010000011111111011111111111110001111011110000000000000000000'
>>> hash2.hash_hex
'0x341fefff8f780000'
>>> hash1.hash_hex
'0x341fefff8f780000'
>>> hash1 - hash2
0
>>> hash2 - "0x341fefff8f780000"
0
>>> hash1 - "0b0011010000011111111011111111111110001111011110000000000000000000"
0
>>> hash1 == hash2
True
>>> hash1 != hash2
False
>>> hash3 = VideoHash(path="/home/akamhy/Downloads/rocket.mkv")
>>> hash3.hash_hex
'0x341fefff8f780000'
>>> hash3.hash
'0b0011010000011111111011111111111110001111011110000000000000000000'
>>> hash3 - hash2
0
>>> hash3 == hash1
True
>>> hash3 == hash2
True
>>> hash4 = VideoHash(url="https://www.youtube.com/watch?v=_T8cn2J13-4")
>>> hash4.hash_hex
'0x7cffff000000eff0'
>>> hash4 - "0x7cffff000000eff0"
0
>>> hash4.hash
'0b0111110011111111111111110000000000000000000000001110111111110000'
>>> hash4 - "0b0111110011111111111111110000000000000000000000001110111111110000"
0
>>> hash4 == hash3
False
>>> hash4 - hash2
34
>>> hash4 != hash2
True
>>> hash4 - "0b0011010000011111111011111111111110001111011110000000000000000000"
34
>>>
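The session above compares hashes pairwise. For retrieval against many stored hashes, one simple approach is a linear scan that keeps every entry within a chosen bit-difference threshold. The sketch below is an illustration only: the threshold of 4 bits, the store contents, and `find_near_duplicates` are assumptions, not part of videohash, and a large collection would need an index rather than a full scan.

```python
# Hypothetical store mapping a label to a 64-bit hash value.
STORED_HASHES = {
    "rocket.mkv": 0x341FEFFF8F780000,
    "other-video": 0x7CFFFF000000EFF0,
}


def find_near_duplicates(query, stored, max_distance=4):
    """Return (label, distance) pairs within max_distance bits, closest first."""
    matches = []
    for label, value in stored.items():
        distance = bin(query ^ value).count("1")  # number of differing bits
        if distance <= max_distance:
            matches.append((label, distance))
    return sorted(matches, key=lambda item: item[1])


# A made-up query that differs from the rocket.mkv hash in one bit.
query_hash = 0x341FEFFF8F780001
print(find_near_duplicates(query_hash, STORED_HASHES))  # [('rocket.mkv', 1)]
```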

Run the above code @ https://replit.com/@akamhy/videohash-usage-2xx-example-code-for-video-hashing#main.py

Wiki/Extended Usage/Docs : https://github.com/akamhy/videohash/wiki


License

Released under the MIT License. See license for details.

The videos are from NASA and are in the public domain. NASA's copyright policy states that "NASA material is not protected by copyright unless noted".


Download files

Source Distribution

videohash-2.0.2.tar.gz (16.1 kB view details)

Uploaded Source

Built Distribution

videohash-2.0.2-py3-none-any.whl (16.8 kB view details)

Uploaded Python 3

File details

Details for the file videohash-2.0.2.tar.gz.

File metadata

  • Download URL: videohash-2.0.2.tar.gz
  • Upload date:
  • Size: 16.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for videohash-2.0.2.tar.gz

Algorithm    Hash digest
SHA256       96ff27326e77b319248648acda6cfaaa09474425b19fc95824a6f140266af25c
MD5          b0b2737e1bb63cc6c8a578e57486ac34
BLAKE2b-256  29b76a7e4fb42195ecdcde3cef85ebbfe1ae900625a57f660224efda39c32a72

File details

Details for the file videohash-2.0.2-py3-none-any.whl.

File metadata

  • Download URL: videohash-2.0.2-py3-none-any.whl
  • Upload date:
  • Size: 16.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for videohash-2.0.2-py3-none-any.whl

Algorithm    Hash digest
SHA256       bb77e0dccd3b8f916665acdc628d9f4ab8266617ba69b907b65834a2edf04169
MD5          8440b17b1fbe95aa708e680b52b16078
BLAKE2b-256  764b6ee736b8264edd5537ba74c74f5d17657862ebd0d693d385f346e1002800
