Library and CLI for computing O(K)Hash checksums.

Project description

O(K)hash

Introduction

O(K)hash (pronounced OK hash or oꭓash) is a hash function based on SHA-256 whose cost is $O(1)$ with respect to file size. It is designed to efficiently calculate hashes for large files by reading only a fixed subset of data from random positions within the file. The 'K' parameter denotes the strength of the hashing, with lower values reading less data. For example, a hash with K=1 reads only 1 KiB of data, while a hash with K=2 reads 1 MiB of data. Hashes are downgradable: a hash of strength K=3 can be used to validate a hash of the same file calculated with K=1.

Rationale

Calculating the hash of a large file normally involves reading the entire file, which can be time-consuming, especially with slow I/O operations. O(K)hash addresses this issue by reading a fixed number of bytes from random positions inside the file, so the amount of data read is $O(1)$ in the file size. This approach generates a reliable fingerprint for identifying duplicate files and validating files efficiently.
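
As a rough illustration of the sampling idea, and not the O(K)hash algorithm itself, the sketch below hashes a file's size plus a fixed number of blocks read at pseudo-random offsets. The function name `sampled_fingerprint`, its parameters, and the choice to seed the offsets from the file size are all assumptions made for this example.

```python
import hashlib
import os
import random


def sampled_fingerprint(path, sample_bytes=1024, block_size=64):
    """Hypothetical sketch: SHA-256 over the file size plus sampled blocks.

    This is NOT the okhash implementation; it only shows why the amount
    of data read can stay constant as the file grows.
    """
    size = os.path.getsize(path)
    digest = hashlib.sha256()
    digest.update(size.to_bytes(8, "big"))  # mix in the file size

    with open(path, "rb") as f:
        if size < 2 * sample_bytes:
            # Small file: hash all of its content instead of sampling.
            for chunk in iter(lambda: f.read(65536), b""):
                digest.update(chunk)
            return digest.digest()

        # Seeding from the size (an assumption of this sketch) makes two
        # files of equal size get sampled at the same offsets.
        rng = random.Random(size)
        n_blocks = sample_bytes // block_size
        offsets = sorted(
            rng.randrange(0, size - block_size + 1) for _ in range(n_blocks)
        )
        for offset in offsets:
            f.seek(offset)
            digest.update(f.read(block_size))

    return digest.digest()
```

However large the file is, the sampled path reads at most `sample_bytes` bytes, which is the property the paragraph above describes.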

Limitations

After all, it's just an OK hash:

  • Not Suitable for Detecting Corruptions: O(K)hash is not suitable for detecting file corruptions that do not change the size of the file. For a bit flip or other small corruption, the probability of detecting it is lower than $\frac{\text{base size}}{\text{file size}}$.
  • Consider File Size Checking: Depending on the nature and number of large files you are working with, it may be more effective to check the file size before calculating a conventional hash to ensure data integrity.
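
To get a feel for the bound in the first bullet, the sketch below evaluates base size divided by file size for a given K, using the base size $2^{10K}$ from the strengths table. The helper name `detection_upper_bound` is hypothetical and not part of the okhash package. Returning 1.0 for small files is this sketch's reading of the documented fallback to a full SHA-256.

```python
def detection_upper_bound(k: int, file_size: int) -> float:
    """Upper bound on the chance that a small, size-preserving corruption
    is noticed, following the base size / file size ratio from the text.

    Hypothetical helper for illustration; not an okhash API.
    """
    base_size = 2 ** (10 * k)  # bytes sampled at strength k
    if file_size < 2 * base_size:
        # Below the minimum size the whole file is hashed with SHA-256,
        # so any change to the content affects the hash input.
        return 1.0
    return base_size / file_size


# A 10 GiB file hashed with K=2 (1 MiB sampled):
print(detection_upper_bound(2, 10 * 2**30))
```

For that 10 GiB file the ratio is 1/10240, so a single flipped bit would almost always go unnoticed at K=2.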

Quickstart

Installation

You can install O(K)hash using pip:

pip install okhash

Usage

  • Calculate O(K)Hash of a String:

    import okhash
    
    data = "Hello, world!"
    checksum = okhash.okhash(data.encode('utf-8'), K=3)
    print(checksum.hex())
    
  • Calculate & compare O(K)Hash of Files:

    import okhash
    
    file1_checksum = okhash.okhash_filepath("file1.bin")
    file2_checksum = okhash.okhash_filepath("file2.bin")
    
    if okhash.compare_okhashes(file1_checksum, file2_checksum):
        print("Checksums match.")
    else:
        print("Checksums do not match.")
    

Command Line Usage

  • Calculate Checksums: To calculate checksums for a file with a specified K value (default is K=2), use the following command:

    python3 -m okhash -K 3 file.bin
    
  • Check Checksums: Save the output of a checksum calculation to a file, then pass that file to --check to verify multiple files:

    python3 -m okhash *.bin > okhashes.txt
    python3 -m okhash --check okhashes.txt
    
  • Additional Options:

    python3 -m okhash --help
    

The Strengths (K)

Here's a table describing the strengths (K) and their corresponding parameters:

| K | Base Size (Subset Data Size for Hash Calculation) | Block Size |
|---|---|---|
| 1 | 1024 B = 1 KiB | 1024 B |
| 2 | 1048576 B = 1 MiB | 4096 B |
| 3 | 1073741824 B = 1 GiB | 262144 B |
| 4 | 1099511627776 B = 1 TiB | 16777216 B |
| K | $2^{10K}$ | $1024 \times \lceil \frac{2^{6K}}{1024} \rceil$ |

The minimum file size for a given K is twice the base size. For files smaller than that, the hash calculation falls back to SHA-256 over the entire file.
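
The general formulas in the table's last row can be checked with a few lines of Python. The function name `okhash_params` and the returned dictionary keys are hypothetical and not part of the okhash package. The sketch only reproduces the arithmetic stated above.

```python
def okhash_params(k: int) -> dict:
    """Sizes in bytes implied by the table's formulas for strength k.

    Hypothetical helper for illustration; not an okhash API.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    base_size = 2 ** (10 * k)
    # Integer ceiling of 2^(6k) / 1024, avoiding floating point.
    block_size = 1024 * -(-(2 ** (6 * k)) // 1024)
    return {
        "base_size": base_size,
        "block_size": block_size,
        "min_file_size": 2 * base_size,  # smaller files use full SHA-256
    }


for k in range(1, 5):
    print(k, okhash_params(k))
```

For K=1 through K=4 the base and block sizes match the table rows, and `min_file_size` gives the threshold below which the whole file is hashed.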

License

O(K)hash is released under the MIT License.

Project details


This version

1.0

Download files

Source Distribution

okhash-1.0.tar.gz (7.0 kB)

Uploaded Source

File details

Details for the file okhash-1.0.tar.gz.

File metadata

  • Download URL: okhash-1.0.tar.gz
  • Upload date:
  • Size: 7.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for okhash-1.0.tar.gz
Algorithm Hash digest
SHA256 8621fe0c31daf50c8acd4490a2c4a6d61d6d375d232c7856b44884892d6ccfa2
MD5 dd6a487816a457e9f558c8214d82eda7
BLAKE2b-256 d7ca9d07b37cb5a7d4fb22c564ff4684ca95b2084c52d14fc89a71dbd1692a16
