hamming-check: File integrity checker

hamming-check

A command-line tool and Python library for encoding and decoding data with a Hamming code whose block size (in bytes) is configurable.

Hamming Code

Hamming codes are a family of error-correcting codes that detect and correct errors introduced while data is stored or transmitted from a sender to a receiver. The technique was developed by R. W. Hamming.

You can read more in the Wikipedia article, the MSU notes, and the excellent videos by 3Blue1Brown: Hamming pt1 and Hamming pt2.
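
To make the idea concrete, here is a minimal sketch of the classic Hamming(7,4) code, which protects 4 data bits with 3 parity bits. It is a textbook illustration only: the function names are hypothetical and it does not show how hamming_check is implemented internally.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    # Codeword layout, positions 1..7: p1 p2 d1 p4 d2 d3 d4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c):
    """Return (data_bits, error_position); position 0 means no error was seen."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome, read as a binary number, is the 1-based position of the flip.
    position = s1 + 2 * s2 + 4 * s4
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], position


if __name__ == "__main__":
    codeword = hamming74_encode([1, 0, 1, 1])
    print(codeword)  # [0, 1, 1, 0, 0, 1, 1]
    codeword[4] ^= 1  # corrupt position 5
    print(hamming74_decode(codeword))  # ([1, 0, 1, 1], 5)
```

The three syndrome bits spell out the position of the flipped bit, which is why a single error can be corrected without retransmission.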

Installing

Locally

Clone the repo.

git clone git@github.com:Tomcat-42/hamming_check.git

Run setup.py

sudo python setup.py install

Using pip

hamming_check is available on PyPI.

sudo pip install hamming_check

Command Line Interface

Description

hamming_check is a CLI tool that creates a secure copy of a file as a Hamming-encoded output file and repairs single-bit corruptions in that secure file. It can also detect double-bit corruptions, but it cannot fix them.
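
The usual way to get "correct one bit, detect two" behaviour is to add one overall parity bit to a Hamming code (often called SECDED). The sketch below shows this on an 8-bit extended Hamming(8,4) word. It is an assumption about the general technique, not a description of the tool's internals, and the `Status` enum is a local stand-in rather than the library's `DecodeStatus`.

```python
from enum import Enum


class Status(Enum):  # hypothetical stand-in, not hamming_check.DecodeStatus
    NO_ERROR = 0
    SINGLE_ERROR_CORRECTED = 1
    DOUBLE_ERROR_DETECTED = 2


def secded_encode(d):
    """Encode 4 data bits as Hamming(7,4) plus one overall parity bit."""
    d1, d2, d3, d4 = d
    word = [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d1, d2 ^ d3 ^ d4, d2, d3, d4]
    overall = 0
    for bit in word:
        overall ^= bit
    return word + [overall]


def secded_decode(c):
    """Return (data_bits, Status) for an 8-bit extended codeword."""
    c = list(c)
    # XOR of the 1-based positions of all set bits is 0 for a valid word.
    syndrome = 0
    for position in range(1, 8):
        if c[position - 1]:
            syndrome ^= position
    overall = 0
    for bit in c:
        overall ^= bit
    if syndrome == 0 and overall == 0:
        status = Status.NO_ERROR
    elif overall == 1:
        # Odd parity: treat as a single flip. A zero syndrome means the
        # overall parity bit itself was hit, so the data needs no repair.
        if syndrome:
            c[syndrome - 1] ^= 1
        status = Status.SINGLE_ERROR_CORRECTED
    else:
        # Even parity with a non-zero syndrome: two flips, not repairable.
        status = Status.DOUBLE_ERROR_DETECTED
    return [c[2], c[4], c[5], c[6]], status
```

With two flipped bits the returned data bits are not trustworthy; only the status is meaningful in that case.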

Usage

usage: hamming_check [-h] (-e | -d) [-v] [-b BUFFER_SIZE]
                     [input_file] [output_file]

positional arguments:
  input_file            file used for reading data. If not specified,
                        data is read from stdin.
  output_file           file used for writing data. If not specified,
                        data is written to stdout.

options:
  -h, --help            show this help message and exit
  -e, --encode          encode a file into a hamming-encoded file
  -d, --decode          decode a hamming-encoded file into a file
  -v, --verbose         increase output verbosity (can be used
                        multiple times)
  -b BUFFER_SIZE, --buffer-size BUFFER_SIZE
                        change the buffer size (in bytes) used for
                        encoding/decoding
  • input_file: the original file to copy securely, or a secure file to recover. If not provided, data is read from STDIN.
  • output_file: the secure file to create, or the file to recover from a secure file. If not provided, data is written to STDOUT.
  • -e|--encode: Selects the encoding operation. File -> Secure File.
  • -d|--decode: Selects the decoding operation. Secure File -> File, with error checking and correction.
  • -b|--buffer-size: Sets the number of bytes used for each Hamming code block. The default is 1. Higher values tend to speed up encoding.
  • -v: Sets the verbosity. Without it, the tool runs in quiet mode. -v prints only errors, -vv also prints the result of the encoding/decoding operations, and -vvv prints every step of the Hamming algorithm.
  • -h: Prints the help text.
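
To get a feel for what the buffer size means for redundancy, the sketch below uses the standard Hamming bound: r parity bits can protect k data bits when 2^r >= k + r + 1. This is general coding theory, not the tool's exact format. The real encoded size may include an extra parity bit or padding, so treat `get_number_of_output_bytes()` as the authoritative value. The function names here are hypothetical.

```python
def parity_bits_needed(data_bits):
    """Smallest r such that 2**r >= data_bits + r + 1."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


def parity_overhead(buffer_size_bytes):
    """Return (parity_bits, fraction of the codeword spent on parity)."""
    k = 8 * buffer_size_bytes
    r = parity_bits_needed(k)
    return r, r / (k + r)


if __name__ == "__main__":
    # Parity bits per block: 1 byte -> 4, 2 bytes -> 5, 4096 bytes -> 16
    for size in (1, 2, 4096):
        bits, fraction = parity_overhead(size)
        print(f"{size} byte(s): {bits} parity bits, {fraction:.4%} overhead")
```

Larger blocks spend a much smaller share of the output on parity, but a block of any size can still repair only one flipped bit.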

Examples

  • Encode the file cat.jpg into the secure file cat.jpg.wham

hamming_check -e cat.jpg cat.jpg.wham

  • Decode the secure file cat.jpg.wham into the file cat.jpg

hamming_check -d cat.jpg.wham cat.jpg

  • Encode the file cat.jpg into the secure file cat.jpg.wham using a 4096-byte Hamming code

hamming_check -e -b 4096 cat.jpg cat.jpg.wham

  • Decode the secure file cat.jpg.wham into the file cat.jpg using a 4096-byte Hamming code

hamming_check -d -b 4096 cat.jpg.wham cat.jpg

  • Encode the string "test" into the secure file file.txt.wham

echo -n "test" | hamming_check -e file.txt.wham

  • Encode the string "test" and print the encoded result to STDOUT

echo -n "test" | hamming_check -e

  • Decode the encoded string and print the decoded result to STDOUT

echo -n <STR> | hamming_check -d

  • Decode the encoded string and save the result to file.txt

echo -n <STR> | hamming_check -d file.txt

  • Decode the file.txt.wham and print the results to STDOUT

hamming_check -d file.txt.wham

hamming_check library

Description

hamming_check is a library for encoding and decoding binary data using a Hamming code.

Usage

Hamming Module

Encodes and decodes data using a Hamming code with a given buffer_size in bytes.

from hamming_check import Hamming, DecodeStatus, DecodeResult, VerbosityTypes
...
hamming = Hamming(buffer_size=1, verbose=VerbosityTypes.QUIET)
size_of_encoded_data = hamming.get_number_of_output_bytes()
encoded_data = hamming.encode(b't')
...
decoded_result = hamming.decode(encoded_data)
decoded_data, decoded_status = decoded_result.get_data(), decoded_result.get_status()

io Module

Abstractions over files and bytes. The Bytes class inherits from bitarray, and the File class is a thin wrapper around the Python file interface.

from hamming_check import Hamming, DecodeStatus, DecodeResult, VerbosityTypes, File, Bytes
...

hamming = Hamming(buffer_size=2, verbose=VerbosityTypes.QUIET)
input_file = File(open("input_file.txt", "rb"), bytes_per_read=2)
output_file = File(open("output_file.txt", "wb"))

# read each chunk, encode it, flip its first bit, and write the result
for data in input_file:
  encoded_data = hamming.encode(data)
  bits = Bytes(encoded_data)  # named bits to avoid shadowing the built-in bytes
  bits[0] ^= 1
  output_file.write(bits.tobytes())

input_file.close()
output_file.close()
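
For readers without the package at hand, the read-in-chunks and flip-a-bit steps can be sketched with the standard library alone. `iter_chunks` and `flip_bit` are hypothetical helpers that only mimic what the File and Bytes classes are described as doing. They are not part of hamming_check, and the bit numbering (bit 0 is the most significant bit of the first byte) is an assumption made for this sketch.

```python
import io
from functools import partial


def iter_chunks(stream, bytes_per_read):
    """Return an iterator over chunks of at most bytes_per_read bytes until EOF."""
    return iter(partial(stream.read, bytes_per_read), b"")


def flip_bit(chunk, bit_index):
    """Return a copy of chunk with one bit flipped (bit 0 = MSB of byte 0)."""
    corrupted = bytearray(chunk)
    corrupted[bit_index // 8] ^= 0x80 >> (bit_index % 8)
    return bytes(corrupted)


if __name__ == "__main__":
    source = io.BytesIO(b"hello")
    for chunk in iter_chunks(source, 2):
        print(chunk, flip_bit(chunk, 0))
```

Note that the final chunk can be shorter than `bytes_per_read`, so any code that indexes into a chunk should use its actual length.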

Example

Send an encoded file over the network and check it for corruption.

Client Code

  • client.py: Reads an image 4096 bytes at a time, encodes each chunk, adds random noise to the encoded data, and sends it over the network.
#!/usr/bin/env python
from random import randint, random
import socket
from argparse import ArgumentParser

from hamming_check.hamming import Hamming


def main():
    # argparser
    parser = ArgumentParser()
    parser.add_argument("-p", "--port", type=int, default=8080)
    parser.add_argument("-f", "--file", type=str)
    parser.add_argument("-b", "--bytes", type=int, default=4096)
    parser.add_argument("-d", "--double-noise", action="store_true")
    args = parser.parse_args()

    # opens the socket connection and the file
    s = socket.socket()
    s.connect(("localhost", args.port))
    filetosend = open(args.file, "rb")

    # Hamming check
    hamming = Hamming(args.bytes)
    bytes_to_send = hamming.get_number_of_output_bytes()

    # sends the encoded blocks
    while data := filetosend.read(args.bytes):
        encoded_data = bytearray(hamming.encode(data))
        # 30% chance of sending the data with noise
        if random() < 0.3:
            print("Sending data with noise")
            # randint is inclusive on both ends, so stop at the last valid index
            encoded_data[randint(0, len(encoded_data) - 1)] ^= 1 << randint(0, 7)
        # if enabled, 50% chance of flipping another bit
        if args.double_noise and random() > 0.5:
            print("Sending data with double noise")
            encoded_data[randint(0, len(encoded_data) - 1)] ^= 1 << randint(0, 7)
        # sendall keeps writing until the whole block has been sent
        s.sendall(encoded_data)

    filetosend.close()
    s.send(b"DONE")
    print("Done Sending.")
    s.shutdown(2)
    s.close()
    exit(0)


if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:
        print("\nExiting...")
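
One subtlety when injecting noise with two independent random flips is that both can land on the same bit and cancel out. A small sketch of a noise helper that always flips distinct bits is shown here. `add_noise` and `count_bit_differences` are hypothetical helpers written for this illustration and do not come from hamming_check.

```python
import random


def add_noise(encoded, flips, rng):
    """Return a copy of encoded with `flips` distinct bits inverted."""
    noisy = bytearray(encoded)
    # sample() draws without replacement, so no bit is flipped twice.
    for bit in rng.sample(range(len(noisy) * 8), flips):
        noisy[bit // 8] ^= 1 << (bit % 8)
    return bytes(noisy)


def count_bit_differences(a, b):
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so a test run is repeatable
    block = bytes(16)
    noisy = add_noise(block, 2, rng)
    print(count_bit_differences(block, noisy))  # 2
```

Passing a seeded `random.Random` instance makes a failing transfer reproducible, which helps when checking how the decoder reports single and double errors.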

Server Code

  • server.py: Receives encoded data over the network, decodes it, tries to recover noisy data, and then saves it to an output file.
#!/usr/bin/env python
import socket
from argparse import ArgumentParser

from hamming_check.hamming import DecodeResult, DecodeStatus, Hamming
from hamming_check.types.verbosity_types import VerbosityTypes


def main() -> None:
    # ArgumentParser
    parser = ArgumentParser()
    parser.add_argument("-p", "--port", type=int, default=8080)
    parser.add_argument("-f", "--file", type=str)
    parser.add_argument("-b", "--bytes", type=int, default=4096)
    args = parser.parse_args()

    # opens socket
    s = socket.socket()
    s.bind(("localhost", args.port))
    s.listen(1)
    c, a = s.accept()
    filetodown = open(args.file, "wb")

    # Hamming check
    hamming = Hamming(args.bytes, VerbosityTypes.QUIET)
    bytes_to_receive = hamming.get_number_of_output_bytes()

    while True:
        data = c.recv(bytes_to_receive, socket.MSG_WAITALL)

        if data == b"DONE" or len(data) == 0:
            print("Done Receiving.")
            break

        decoded_result = hamming.decode(data)

        # any status other than DecodeStatus.NO_ERROR or
        # DecodeStatus.SINGLE_ERROR_CORRECTED means the block is damaged
        bytes_received = decoded_result.get_data()
        status = decoded_result.get_status()

        if status == DecodeStatus.SINGLE_ERROR_CORRECTED:
            print("One error detected, and corrected")
        elif status == DecodeStatus.DOUBLE_ERROR_DETECTED:
            print("Two errors detected, your file is corrupted")

        filetodown.write(bytes_received)
        filetodown.flush()

    filetodown.close()
    c.shutdown(2)
    c.close()
    s.close()


if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:
        print("\nBye!")
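
The server depends on every `recv` call returning one complete encoded block, which it requests with `socket.MSG_WAITALL`. The same guarantee can be written as an explicit loop, which makes the short-read case visible. The sketch below uses a hypothetical `recv_exact` helper that is not part of hamming_check, and it assumes a connected stream socket.

```python
import socket


def recv_exact(conn, size):
    """Read exactly `size` bytes, or fewer if the peer closes first."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = conn.recv(remaining)
        if not chunk:  # empty read: the peer closed the connection
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    left, right = socket.socketpair()
    left.sendall(b"abc")
    left.sendall(b"defgh")
    print(recv_exact(right, 6))  # b'abcdef'
    left.close()
    print(recv_exact(right, 10))  # b'gh'
    right.close()
```

A result shorter than `size` signals the end of the stream, which is how a trailing marker such as `b"DONE"` would show up at the receiver.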

Putting it all together

Run server code

./examples/send_over_network/server.py -f out.jpg

Run client code

./examples/send_over_network/client.py -f ./examples/send_over_network/really_cool_cat.jpg

Check out.jpg

Even though noise was added to the data, the server was able to recover the image.
