Skip to main content

Python package for Near Duplicate Video Detection (Perceptual Video Hashing) - Get a 64-bit comparable hash-value for any video.

Project description


The Python package for near duplicate video detection

Build Status Build Status Build Status codecov Total alerts Language grade: Python pypi Downloads GitHub lastest commit PyPI - Python Version


⭐️ Introduction

Videohash is a Python library for detecting near-duplicate videos (Perceptual Video Hashing). Any video input can be used to generate a 64-bit equivalent hash value with this package.

The video-hash-values for identical/near-duplicate videos are the same or similar, implying that if the video is resized (upscaled/downscaled), transcoded, watermark added/removed, changed color, changed frame rate, changed aspect ratio, slightly cropped, or black-bars added/removed, the hash-value should remain unchanged or not vary substantially.

How the hash values are calculated

  • Every one second, a frame from the input video is extracted, the frames are shrunk to a 144x144 pixel square, a collage is constructed that contains all of the resized frames(square-shaped), the collage's wavelet hash is the video hash value for the original input video.

When not to use Videohash

  • Videohash cannot be used to verify whether one video is a part of another (video fingerprinting). If the video is reversed or rotated by a substantial angle (greater than 10 degrees), Videohash will not provide the same or similar hash result, but you can always reverse the video manually and generate the hash value for reversed video.

How to compare the video hash values stored in a database


🏗 Installation

To use this software, you must have FFmpeg installed. Please read how to install FFmpeg if you don't already know how.

Install videohash

pip install videohash
  • Install directly from GitHub:
pip install git+https://github.com/akamhy/videohash.git

🌱 Features

  • Generate videohash of a video directly from its URL or its path.
  • Can be used to implement scalable Near Duplicate Video Retrieval.
  • The end-user can access the image representation(the collage) of the video.
  • A videohash instance can be compared to a 64-bit stored hash, its hex representation, bitlist, and other videohash instances.
  • Faster than the old method of comparing each frame individually. The videohash package generates a single 64-bit video hash value, which saves a significant amount of database space. And the number of comparisons required drops significantly.

🚀 Usage

In the following usage example the first three instance of VideoHash class are computing the hash for the same video(not same as in checksum) and the last one is a different video.

>>> from videohash import VideoHash
>>> # video: Artemis I Hot Fire Test
>>> url1 = "https://www.youtube.com/watch?v=PapBjpzRhnA"
>>> videohash1 = VideoHash(url=url1, download_worst=False)
>>>
>>> videohash1.hash # video hash value of the file, value is same as str(videohash1)
'0b0011010000011111111011111111111110001111011110000000000000000000'
>>>
>>> #VIDEO:Artemis I Hot Fire Test
>>> url2="https://raw.githubusercontent.com/akamhy/videohash/main/assets/rocket.mkv"
>>> videohash2 = VideoHash(url=url2)
>>> videohash2.hash
'0b0011010000011111111011111111111110001111011110000000000000000000'
>>> videohash2.hash_hex
'0x341fefff8f780000'
>>> videohash2.hash_hex
'0x341fefff8f780000'
>>> videohash1 - videohash2
0
>>> videohash1 == videohash2
True
>>> videohash1 == "0b0011010000011111111011111111111110001111011110000000000000000000"
True
>>> videohash1 != videohash2
False
>>> path3 = "/home/akamhy/Downloads/rocket.mkv" #VIDEO: Artemis I Hot Fire Test
>>> videohash3 = VideoHash(path=path3)
>>> videohash3.hash
'0b0011010000011111111011111111111110001111011110000000000000000000'
>>> videohash3 - videohash2
0
>>> videohash3 == videohash1
True
>>> url4 = "https://www.youtube.com/watch?v=_T8cn2J13-4" #VIDEO: How We Are Going to the Moon
>>> videohash4 = VideoHash(url=url4)
>>> videohash4.hash_hex
'0x7cffff000000eff0'
>>> videohash4 - "0x7cffff000000eff0"
0
>>> videohash4.hash
'0b0111110011111111111111110000000000000000000000001110111111110000'
>>> videohash4 - videohash2
34
>>> videohash4 != videohash2
True

Run the above code @ https://replit.com/@akamhy/videohash-usage-2xx-example-code-for-video-hashing#main.py

Extended Usage : https://github.com/akamhy/videohash/wiki/Extended-Usage

API Reference : https://github.com/akamhy/videohash/wiki/API-Reference


🛡 License

License: MIT

Released under the MIT License. See license for details.

The VideoHash logo was created by iconolocode. See license for details.

Videos are from NASA and are in the public domain.

NASA copyright policy states that "NASA material is not protected by copyright unless noted".

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

videohash-2.1.2.tar.gz (18.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

videohash-2.1.2-py3-none-any.whl (18.9 kB view details)

Uploaded Python 3

File details

Details for the file videohash-2.1.2.tar.gz.

File metadata

  • Download URL: videohash-2.1.2.tar.gz
  • Upload date:
  • Size: 18.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for videohash-2.1.2.tar.gz
Algorithm Hash digest
SHA256 f00d67718eecce3d9d39ef01f4520ec9d0702ab6e09e61a633955a70d2790072
MD5 a13e3d93e659833c14d65ef5fd221863
BLAKE2b-256 732321c0f917e684f8759f57e6b23cda21d2c19ea2cd1b66dc4ec2478d27759a

See more details on using hashes here.

File details

Details for the file videohash-2.1.2-py3-none-any.whl.

File metadata

  • Download URL: videohash-2.1.2-py3-none-any.whl
  • Upload date:
  • Size: 18.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for videohash-2.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 9b9d8222c3c640f44cbde88086f3f275169afb8117e2bb90c1ca9bad9de1cd0d
MD5 6ec448261dfe131f9ca2968ad3af0995
BLAKE2b-256 788dfbaa8ae2ff55921973451a0f1060da25e869af5a9e803bc583a91771f52a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page