A package to generate visual hashes.
A visual hash is an image that is generated from a large string, just as an ordinary cryptographic hash is usually represented as a hexadecimal string. The advantage of a visual hash is that it is easier for humans to remember and compare.
What makes a hash good enough?
A good visual hash should have the following properties:
A high information content, as measured by its Shannon entropy. This results in a hash that has a low chance of collision. This provides pre-image resistance, i.e. it makes it difficult to create a second input that results in a hash identical to a given hash.
A high minimum self-information. i.e. the lowest self-information value should be as high as possible. This property implies the former property, but is itself distinct. The lowest self-information output is most prone to collisions. Testing for this property is challenging, more challenging than the mean information content discussed previously.
A second-best for this property would be to be able to identify hashes that have low self-information. Of course, if we can reliably identify them, we could (in principle) eliminate them entirely by checking for a low self-information during the hashing process, and trying again when this is encountered.
Second pre-image resistance, which means that knowing the hash input, we should not be able to produce a similar (or identical) hash. We achieve this by using a cryptographic hash as input to our visual hashes, which ensures the second pre-image resistance is identical to the first pre-image resistance.
In order to generate testable visual hashes, we actually aim for raw algorithms that lack second pre-image resistance (which we then add on via the cryptographic hash). This allows us to more readily explore “similar” images in order to estimate the information content in the visual hash, and thus its first pre-image resistance.