Library containing Huffman algorithms and a bespoke compressor
Huffman compression and decompression functions and decorators (1.0.2)
Compress functions
import huffpress
from huffpress.press.compress import compress
help(huffpress.press.compress)
Help on module huffpress.press.compress in huffpress.press:
NAME
huffpress.press.compress - (c) 2021 Usman Ahmad https://github.com/selphaware
DESCRIPTION
compress.py
Contains all compression functions using the Huffman encoding algorithm to
encode most frequently occurring terms with short binary sequences,
and encoding least frequently occurring terms with longer binary sequences.
Process:
-------
1) Input string -> 2) ASCII ordinal values -> 3) Huffman encoding
-> 4) Replace characters with their encodings.
NOTE: The most frequent characters receive shorter encodings than the
least frequent characters. The idea is that the many frequent characters
can each be replaced by a short binary encoding.
5) Pack the binaries into buckets of length 8, which combines 2 or more
characters into a single 8-bit byte.
6) Convert each binary of length 8 to a decimal.
7) The decimals are ASCII ordinal values which can be converted to
characters (ASCII values between 1 and 255).
FUNCTIONS
add_huff_map(final_seq: bytearray, huff_map: huffpress.huff.htypes.HuffCode) -> bytearray
add_huff_map(final_seq: bytearray, huff_map: HuffCode) -> bytearray
Concatenate the final generated Huffman sequence with the Huffman map,
which is required for decoding the Huffman sequence.
:param final_seq: final compressed Huffman sequence binaries computed by
compress_seq_bins function
:param huff_map: Huffman map containing terms and their encoding
:return: concatenated final_seq + huff_map in a bytearray sequence
compress(inp: str, verbose: bool = False, mode: huffpress.auxi.modes.Mode = <Mode.DEFAULT: 0>) -> huffpress.huff.htypes.CompData
compress(inp: str, verbose: bool = False,
mode: Mode = Mode.DEFAULT) -> CompData:
Generic compression function taking in input either filename or
string to compress.
:param inp: filename or string text to compress
:param verbose: set to True for printing console outputs
:param mode:
Mode.DEFAULT --> if file exists, compress file, otherwise
compress string text
Mode.FILE --> compress file
Mode.RAW --> compress string text
:return: if compressed file, return compressed output filename. otherwise,
return bytearray compressed data
compress_bytes(inp_bytes: bytes, verbose: bool = False) -> bytearray
compress_bytes(inp_bytes: bytes, verbose: bool = False) -> bytearray:
Compress input data bytes using the Huffman Encoding algorithm.
Function compress_string takes an input string which transforms to bytes,
then calls this function to compress.
:param inp_bytes: input data bytes to be compressed
:param verbose: set to True for printing console outputs
:return: Final compressed bytearray sequence
compress_file(inp_file: str, verbose: bool = False)
compress_file(inp_file: str, verbose: bool = False):
Compresses the contents of a file and outputs to a file
with extension ".hac"
e.g. some_file.ext --- compressed to --> some_file.ext.hac
:param inp_file: input file to compress
:param verbose: set to True for printing console outputs
:return: name of the compressed output file
compress_seq_bins(final_bins: List[str], verbose: bool = False) -> bytearray
compress_seq_bins(final_bins: List[str],
verbose: bool = False) -> bytearray:
From a given list of binaries constructed from the final Huffman sequence
i.e. create_seq_bins function, compress the binaries (converting) to an
ASCII ordinal value.
:param final_bins: list of binary sequences constructed from the final
Huffman sequence
:param verbose: set to True for printing console outputs
:return: bytearray of ASCII ordinal values constructed from the Huffman
sequence of binaries, which collapses 2 or
more characters into fewer characters for the most frequently
occurring terms in the original raw data
compress_string(inp_st: str, verbose: bool = False) -> bytearray
compress_string(inp_st: str, verbose=False) -> bytearray:
Compresses input string using the Huffman Encoding algorithm
:param inp_st: input string to be compressed
:param verbose: set to True for printing console outputs
:return: compressed data in bytearray format
create_final_sequence(huff_seq_rem: Tuple[int, str], verbose: bool = False) -> str
create_final_sequence(huff_seq_rem: Tuple[int, str],
verbose: bool = False) -> str:
From a given Huffman encoded sequence (computed by create_huff_sequence
function), convert to a binary sequence.
:param huff_seq_rem: tuple of 0:Huffman sequence and 1:remainder length
(to be used for '0' padding)
:param verbose: set to True for printing console outputs
:return: final Huffman sequence converted to a binary sequence
create_huff_sequence(huff: huffpress.huff.htypes.HuffCode, inp_data: huffpress.huff.htypes.InputData, verbose: bool = False) -> Tuple[int, str]
create_huff_sequence(huff: HuffCode, inp_data: InputData,
verbose: bool = False) -> Tuple[int, str]:
Creates an encoded Huffman sequence from a given Huffman tree dictionary
and input data string text.
:param huff: Huffman tree dictionary (encoded sequences per term)
computed by hfunctions.create_huff_tree_encoding
:param inp_data: input data string text to be encoded
:param verbose: set to True for printing console outputs
:return: (number of 0 paddings required, new encoded sequence)
create_seq_bins(final_seq: str, verbose: bool = False) -> List[str]
create_seq_bins(final_seq: str, verbose: bool = False) -> List[str]:
From a given final Huffman sequence (computed by create_final_sequence
function) extract the sequence of binaries of length 8 and store in a list
:param final_seq: Final Huffman sequence string of binaries
:param verbose: set to True for printing console outputs
:return:
DATA
List = typing.List
Optional = typing.Optional
Tuple = typing.Tuple
FILE
c:\programdata\anaconda3\lib\site-packages\huffpress\press\compress.py
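The encoding idea described in the module help above (frequent terms get short binary sequences, rare terms get longer ones) can be sketched with the standard library alone. This is a minimal illustration of textbook Huffman coding, not huffpress's own implementation; `build_huffman_codes` and the other names are hypothetical.

```python
import heapq
from collections import Counter
from typing import Dict


def build_huffman_codes(data: bytes) -> Dict[int, str]:
    """Map each byte value to a binary code string; frequent bytes get shorter codes."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (total frequency, unique tie-breaker, {symbol: partial code}).
    # The tie-breaker guarantees the dicts themselves are never compared.
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one more bit to every code inside them.
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]


text = b"Hello this is some text that will be encoded by the Huffman algorithm."
codes = build_huffman_codes(text)
encoded_bits = "".join(codes[b] for b in text)
print(len(text) * 8, "bits before,", len(encoded_bits), "bits after")
```

The resulting table plays the same role as the Huffman map in the library: it is needed again at decode time, which is why the compressed output carries the map alongside the data.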
some_str = "Hello this is some text that will be encoded by the Huffman algorithm."
comp_str = compress(some_str)
comp_str
CompData(data=bytearray(b'\x07p\xbbNU\xb3\xdb?)\xa1\xc8\x98\xf2\xbb>v\xbb\xef\x1cB\x84\x88\x8f}<\xa8\xee\xaaR\xd6\xe1\xf7u\xa6\x0cWX\x80{"72":"1A","101":"8","108":"R","111":"K","32":"F","116":"9","104":"L","105":"M","115":"1L","109":"1M","120":"2U","97":"1N","119":"2V","98":"1B","110":"1C","99":"2W","100":"1D","121":"2X","117":"2Y","102":"1E","103":"2Z","114":"34","46":"35"}11110001'))
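Steps 5 to 7 of the process in the module help (pack the bits into buckets of 8, then convert each bucket to an ordinal value) can be illustrated as below. This is a sketch under assumptions: where huffpress places its '0' padding and how it records the padding length are not shown here, so padding at the end is an illustrative choice, and `pack_bits` is a hypothetical name.

```python
from typing import List, Tuple


def pack_bits(bit_string: str) -> Tuple[bytearray, int]:
    """Pad a string of '0'/'1' to a multiple of 8 and pack it into bytes.

    Returns the packed bytes and the number of padding bits that were added.
    """
    padding = (8 - len(bit_string) % 8) % 8
    padded = bit_string + "0" * padding  # assumption: pad at the end
    chunks: List[str] = [padded[i:i + 8] for i in range(0, len(padded), 8)]
    return bytearray(int(chunk, 2) for chunk in chunks), padding


packed_bytes, pad = pack_bits("0100100001101")  # 13 bits -> 2 bytes, 3 padding bits
print(packed_bytes, pad)
```

A decoder has to know the padding length, otherwise the trailing zeros could be misread as encoded symbols.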
Decompress functions
from huffpress.press.decompress import decompress
help(huffpress.press.decompress)
Help on module huffpress.press.decompress in huffpress.press:
NAME
huffpress.press.decompress - (c) 2021 Usman Ahmad https://github.com/selphaware
DESCRIPTION
decompress.py
Contains all decompression functions by first extracting the Huffman map
from the input data and using the map to convert back the compressed
characters to the original characters.
FUNCTIONS
decompress(inp: huffpress.huff.htypes.CompData, outfile: Union[str, NoneType] = None, verbose=False)
decompress(inp: CompData, outfile: Optional[str] = None, verbose=False):
Decompress bytearray data or contents of a file
:param inp: either bytearray compressed data or the filename containing the
data
:param outfile: name of the output file name (optional)
:param verbose: set to True for printing console outputs
:return: either decompressed bytearray data or name of decompressed output
file
decompress_bytes(inp_bytes: bytes, verbose=False) -> bytearray
decompress_bytes(inp_bytes: bytes, verbose=False) -> bytearray:
Main function to decompress input bytes by extracting the Huffman map
and using the map to replace the encoded sequences with the original
characters.
:param inp_bytes: Input data to be decompressed
:param verbose: set to True for printing console outputs
:return: decompressed bytearray data
decompress_file(inp_file: str, outfile: Union[str, NoneType] = None, verbose=False)
decompress_file(inp_file: str, outfile: Optional[str] = None,
verbose=False):
Decompress file
:param inp_file: File to be decompressed
:param outfile: Output file for decompressed contents to be saved
:param verbose: set to True for printing console outputs
:return: name and path of the output file
extract_huff_map(inp_bytes: bytes, verbose: bool = False) -> Tuple[huffpress.huff.htypes.HuffCode, int]
extract_huff_map(inp_bytes: bytes,
verbose: bool = False) -> Tuple[HuffCode, int]:
Extract Huffman encoding dictionary map from the input data.
:param inp_bytes: input sequence of bytes containing compressed data
and Huffman map
:param verbose: set to True for printing console outputs
:return: Huffman map dictionary and the length of the map
reverse_final_sequence(bstr: bytes, verbose: bool = False) -> str
reverse_final_sequence(bstr: bytearray, verbose: bool = False) -> str:
Convert the input (already compressed sequence) of ascii ordinal values to
a binary sequence string, which is the encoded Huffman sequence
:param bstr: input sequence of ascii ordinal values (compressed data
bytearray format)
:param verbose: set to True for printing console outputs
:return: binary string of compressed data (Huffman encoded sequence
of 0's and 1's)
reverse_huff_sequence(huff_map: huffpress.huff.htypes.HuffCode, seq: str, verbose: bool = False) -> bytearray
reverse_huff_sequence(huff: HuffCode, seq: str,
verbose: bool = False) -> bytearray:
Reverse the input binary string Huffman encoded sequence --> back to the
original characters. This is done by traversing through the sequence in
order and identifying any of the Huffman encoded sequence from the
given (huff) Huffman map. Since all encodings are unique at any length,
we can replace in this forward travelling manner.
:param huff_map: Huffman map containing the binary encodings to original
character
:param seq: input binary string of Huffman encoded sequence
:param verbose: set to True for printing console outputs
:return:
DATA
Optional = typing.Optional
Tuple = typing.Tuple
FILE
c:\programdata\anaconda3\lib\site-packages\huffpress\press\decompress.py
decomp_byt = decompress(comp_str)
decomp_byt
bytearray(b'Hello this is some text that will be encoded by the Huffman algorithm.')
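The help text for reverse_huff_sequence says the sequence is traversed in order and each encoding is replaced as soon as it is recognised. That works because no Huffman code is a prefix of another. A minimal sketch of such a forward scan follows; it is not the library's code, and `decode_bits` and the toy code table are hypothetical.

```python
from typing import Dict


def decode_bits(bit_string: str, codes: Dict[int, str]) -> bytearray:
    """Decode a prefix-free bit string by scanning it from left to right."""
    lookup = {code: sym for sym, code in codes.items()}
    out = bytearray()
    current = ""
    for bit in bit_string:
        current += bit
        if current in lookup:
            # A complete code was read; emit its symbol and start a new code.
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return out


toy_codes = {ord("a"): "0", ord("b"): "10", ord("c"): "11"}
print(decode_bits("010110", toy_codes))  # decodes to a, b, c, a
```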
# Now let's compress a longer string
long_str = "This is the start of a very long text.. It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy. Various versions have evolved over the years, sometimes by accident, sometimes on purpose (injected humour and the like). Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum. There are many variations of passages of Lorem Ipsum available, but the majority have suffered alteration in some form, by injected humour, or randomised words which don't look even slightly believable. If you are going to use a passage of Lorem Ipsum, you need to be sure there isn't anything embarrassing hidden in the middle of text. All the Lorem Ipsum generators on the Internet tend to repeat predefined chunks as necessary, making this the first true generator on the Internet. It uses a dictionary of over 200 Latin words, combined with a handful of model sentence structures, to generate Lorem Ipsum which looks reasonable. The generated Lorem Ipsum is therefore always free from repetition, injected humour, or non-characteristic words etc."
comp_long_str = compress(long_str)
comp_long_str
CompData(data=bytearray(b'\x07\x9e\xc8k\x8d|\x91\xf5\x88\x19\xden\x9e\xf9\x0b<]\xad\xf8oy:w\xd1\x9ck\xa7\x8b\xb5\xbc\xd6&\xe8\x8dH\xaf\xcc\x97g\xc9&t\xe0Qd>\xa1\x8c~\xe3\xd65\x80\x97a_\xddg\xc9\x1e\x05\x16\x9b\xa2}\xce\xd8[;\xcd\xd3\xec\x96\x9f\xaaE\xbc]\xf4\x8b[\xd38\xe5x\xa5\x8e\xac\x9d\xe7\xb2?g\x16\xce\xf3uP\xb5\xbf}\xc0\xc3\xe8\xda\x95\x0e5\xf2I\x9cs\xc9+\xa7\x83\x81\x9e8O"j\xbb8A#\xd65\x80;\x95\x82\xed\xdeo\x13\x98B\xb5\xf4\xae\xfbl\xf4W\xf1\xfa\xa8Z\xdf\xbd\xbcN\xd8[<\x88\x1d~\xe7l-\x9eD\x0e\xf7_\x82i\x16\xb7\x8ex\xbb\xe9\xf1\x1d\'\x81E\xa6\xe8\x9fx\xda\xd8\x8dJw\x9f\x91\xac\xeb5\xa7\x1fo\xb2\xb7DjB\xd6\xfd\x92\xed)i\xae\x8c\xbf\xa8\xee\xfb%\xa7\x95\x8e8Wg\xd5\xd5G\xef\xb8\x18}\x1bR\xa1\xd2\xbeH\x88u\x99\x91Q\xcf\x07Y\x8f\xc3{\xcd}\x19}=\x14\x0b\xa5\xe6p\xfb\xd8\xb8\x18q\xda\x95\r\xef\xd41\x8fS\\\xf7\xc8x#Y\xf5\x1d\xde\x87\rz\xc1\x8cx\xb7\xc9\x11\x0e-25\xd6N\xf7\x91\x00]U\xef\x90\xa1v\xaf$\xbey\xbe\xf1\xbeW\xde\xf9\x0f\x92>\xc2\x81Z\xfd<\x0e\x0c\r}\xd6t\xba\xe1Yl\xd7\xe9\xe0p`k\xbb}\x94\x1b=\x1fyE\xa7\xc6\xec+\xf2T\x1dA\xd1\x97\xf2G\xc4t\x9b\xcd;\xef\xb8\x18}\x1bR\xa1\xc6\xbd\x0c6\x8d\x9dj\x84,\xf8oy\xdeo\x92?`\x16\xc1kz2\xfeY\xb1\xa3\x98-o\x16Z\xab\x02\xc9\xdf}\xc0\xc3\xe8\xda\x95\x0f$\xaf\xb8\x96\xf9#\xc5\x96\xaa\xc0\xb3{^\xb1\x19h\x0b\xebT!g\xc3{\xceo\x90\xf4-q\xf9#\xe7\xfb\xcf\xbc\xdeV\xbf\xaaE\xba7Sig\xd4\xdf`\x16\xc2\x1f\x1d\xf4\xf4\xf6\xa4bl\xef7\xcb6=\x19~\xae\x04\x86\xe8\x95\xf1\xcf\x8f\xe0\x9aOO\x96l}ln\x18\x16\xfb\x9d\xf4\xce\xfa3\xc9+\xd2\x82\xf8\xdf+\xec\xf9\xdd\xa3g\x98o\x9fqlP\t\xad\x7fr\xb3\xa4i\xfeH\xf8\x94\xdb\x8bc\xf3\x13v\x03\xb0\xdd\xf2\xcd\x8d\x1c\xc1ku\xf00Aaky\xaa-\x82\x91\x8d\x9dMt\x91\xad+\x9d\xf4g\xd4\x95\xf6}\x95\x14\x01\xa2\xbe-\xf2G\xcf\xf7\xa2\xf4\xef+\xea\x1c\x97\xc9\x1e\x06%(\xf7\x9b\xef\x9c\x04\xa3\x9e\xa4NW\xb9\xdb\x10XZ\xdf\xbe\xe0a\xf4mJ\x87\xd9*\x92\xd3Z\xfa2\xfc\x1c\x0f\x03ql\x8d\x9fP\xe4\xba\xcdi\xc7\xdb\xec\xad\xd1\x1a\x90\xb5\xbe\x9el\xd4\x80|GI\xf7\x84V\xaa\xf7\xa9-3\xf2i!\xc5\xae\x8a\x96-o\xbeB\x85\xda\xbb\xcd\xf7\x
dc\x0c>\x8d\xa9P\x9d\xe7\xb2 z\x01\xf0F\xb3\xdf@\x14\xc1v\xae\xf3}\x92\xa9-5\xdeo\xbe\xe0a\xf4mJ\x87K\xe81M\xd1:\xfe\xe5g\xc9\x1f\x04\x9f8\x0eY\xe4\x97\xcf\xa5M1\x02\xbe\x91\xc2\t\x82\xed\xc5\xbd<\x0f\x99\xc2\x1a\xfe\xeb8\xb4\xf8\xdd\x85~J\x83\xa85\xf7\x0e\x08\xcbx\r\x15\xfdN\x0b\xaf\xaaCt\xba\xdd\xb7\xb9\xe2\xef\xa7\x9b\xe5\xbdDm\x96F\xcf\xb8\xc4M\xf4\xdd\x13;\xe8\x9b\xd8\xea\xe8\x07\xda\xe2\xd6\xfc~\xaa=>\xc9T\x96\x9e\xf3}\xf7\x03\x0f\xa3jT5\xfb\x1d]\x89_\xc7\xfb\x8f\xa5\x01\xf9"\x07\x8d6\xf7:5\x99!ky\x86\xe4\x00J\xa1k|\x85k-\xc5\xbeH\xf8\n\xd7\x13\xdeo\x86\xf7\x93\xbd\xe1\x18\xfc\x91\xfb\xee\x06\x1fF\xd4\xa8{K\x10Lp\xae\xed\xf2G\xe8l \xc7>\x16_\xc7\xe0v)\x9f`+3\x0b\x15\xfb\xa4\xa6\xd3]+\xb1\xb8\xd5 Y\xaf\xc14\x8b[\xf2C_$|\xc2\x15\x9f\x01G\xdaX\x82c\x87v\xf9#\xf46\x10c\x93\xbe\x8c\xea\xa3]:\xc6\xec\x17d\x0b;\xcd\xde\xf9\x0fz\xf7\x9b\xcf\xbe\x98-\xf58.\xb5\xfb\x9e\x1b\x85\x8a\xfe\xa1\xc9t\xf2F\\\xca\x8fy\xbc\x1df>\x8bak\x8f\xac\x05]\x8a\x03Z\xfe?\xb4\xb1\x04\xc3\xf7\xdc\x0c>\x8d\xa9P\xfa\xa47K\xc5\xdfMp)N\xc9\xba&w\x9e\xc8\xfbK\x10L+\xfb\xee\x06\x1fF\xd4\xa8q\xaf\x92 fp=#\xa9,\xaf0\x13\xe6\x07\x87\x03\xb1\xc1\xc1v\xd7\xc5\xa7\xc6\xec+\xf2T\x1dA\xaf\xb8vv\x9e]$\x04\xbb\x08\r`\xdd\xf58.\xb9\xcb\xa7\x00{"84":"N1","104":"1E","105":"H","115":"Q","32":"F","116":"S","101":"9","97":"K","114":"G","111":"N","102":"2U","118":"67","121":"30","108":"1D","110":"M","103":"31","120":"OR","46":"5Q","73":"6G","98":"6M","100":"17","99":"32","119":"6I","112":"3A","107":"6H","117":"16","76":"DB","109":"1C","45":"N0","44":"6J","39":"QL","67":"2QQ","69":"2QR","77":"1A6","86":"2QS","40":"2QT","106":"N2","41":"2QU","49":"1A7","53":"2QV","48":"QK","57":"2QW","54":"2QX","65":"1DC","80":"2QY","50":"2QZ"}111100110'))
print(f"Length of original text: {len(long_str)}\nLength of compressed text: {len(comp_long_str.data)}")
Length of original text: 1979
Length of compressed text: 1584
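The two lengths printed above translate into a compression ratio as follows. The numbers are taken directly from that output; only the arithmetic is new.

```python
original_len = 1979    # len(long_str), from the output above
compressed_len = 1584  # len(comp_long_str.data), from the output above

ratio = compressed_len / original_len
saving_pct = (1 - ratio) * 100
print(f"Compressed size is {ratio:.1%} of the original ({saving_pct:.1f}% saved)")
```

The compressed length includes the Huffman map stored at the end of the data, so very short inputs, such as the 70-character example earlier, can come out larger than the original.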
Compress and Decompress file
!dir *.txt
Volume in drive C has no label.
Volume Serial Number is 6600-2488
Directory of C:\Users\datas\PycharmProjects\main\TMP
14/09/2021 18:08 481,072 outfile.txt
08/09/2021 18:33 481,072 text_file.txt
2 File(s) 962,144 bytes
0 Dir(s) 478,878,687,232 bytes free
comp_file = compress("text_file.txt")
comp_file
CompData(data='text_file.txt.hac')
!dir text_file.*
Volume in drive C has no label.
Volume Serial Number is 6600-2488
Directory of C:\Users\datas\PycharmProjects\main\TMP
08/09/2021 18:33 481,072 text_file.txt
14/09/2021 18:24 287,084 text_file.txt.hac
2 File(s) 768,156 bytes
0 Dir(s) 478,878,621,696 bytes free
from huffpress.huff.htypes import InputData
decomp_file = decompress(InputData(data="text_file.txt.hac"), "outfile.txt")
decomp_file
'outfile.txt'
!dir *.txt*
Volume in drive C has no label.
Volume Serial Number is 6600-2488
Directory of C:\Users\datas\PycharmProjects\main\TMP
14/09/2021 18:24 481,072 outfile.txt
08/09/2021 18:33 481,072 text_file.txt
14/09/2021 18:24 287,084 text_file.txt.hac
3 File(s) 1,249,228 bytes
0 Dir(s) 478,878,556,160 bytes free
!fc text_file.txt outfile.txt
Comparing files text_file.txt and OUTFILE.TXT
FC: no differences encountered
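The `fc` command used above is Windows-specific. On other platforms the same round-trip check can be done from Python with the standard `filecmp` module. The sketch below writes two throwaway files so that it runs on its own; in practice the two paths would be text_file.txt and outfile.txt.

```python
import filecmp
import os
import tempfile

# Throwaway files stand in for text_file.txt and outfile.txt.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "text_file.txt")
    restored = os.path.join(tmp, "outfile.txt")
    for path in (original, restored):
        with open(path, "wb") as fh:
            fh.write(b"same contents\n")

    # shallow=False compares the file contents, not just the os.stat() signature.
    identical = filecmp.cmp(original, restored, shallow=False)
    print("identical" if identical else "different")
```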
Decorators
from huffpress.press.decorators import comp, decomp
help(huffpress.press.decorators)
Help on module huffpress.press.decorators in huffpress.press:
NAME
huffpress.press.decorators - (c) 2021 Usman Ahmad https://github.com/selphaware
DESCRIPTION
decorators.py
Contains decorators for compressing and decompressing variable objects
e.g.
# Function below with @comp decorator returns the compressed
# bytearray of "Hello world"
@comp
def some_fn():
return "Hello world"
# Function below decompresses contents of var2 first before proceeding
# with the rest of the function
@decomp("var2")
def some_fn2(var1, var2):
...
FUNCTIONS
comp(fun)
Compression decorator, which compresses final string result
:param fun: Function where string output will be compressed
:return: compressed string
decomp(*bytearray_vars)
Decompression decorator, which first decompresses given bytearray variable
objects before proceeding with the rest of the function
:param bytearray_vars: Bytearray variables to decompress first
:return: function is run as normal with the input bytearray_vars
variables being decompressed first
FILE
c:\programdata\anaconda3\lib\site-packages\huffpress\press\decorators.py
Compression decorator
help(comp)
Help on function comp in module huffpress.press.decorators:
comp(fun)
Compression decorator, which compresses final string result
:param fun: Function where string output will be compressed
:return: compressed string
@comp
def multiply_string(inp_string):
# do some processing...
final_string = inp_string
return final_string * 500
in_st = "This will be Huffman encoded many times."
dec_string = multiply_string(in_st)
dec_string[-1000:] # inspect the last 1000 bytes of the compressed data
bytearray(b'3\x9f\xde!3\xda\xa9\x00\xab\xfa\xc6\x88\xe1m\xfds\xcf\xce\xe1l,g?\xbcBg\xb5R\x01W\xf5\x8d\x11\xc2\xdb\xfa\xe7\x9f\x9d\xc2\xd8X\xce\x7fx\x84\xcfj\xa4\x02\xaf\xeb\x1a#\x85\xb7\xf5\xcf?;\x85\xb0\xb1\x9c\xfe\xf1\t\x9e\xd5H\x05_\xd64G\x0bo\xeb\x9e~w\x0bac9\xfd\xe2\x13=\xaa\x90\n\xbf\xach\x8e\x16\xdf\xd7<\xfc\xee\x16\xc2\xc6s\xfb\xc4&{U \x15\x7fX\xd1\x1c-\xbf\xaey\xf9\xdc-\x85\x8c\xe7\xf7\x88L\xf6\xaa@*\xfe\xb1\xa28[\x7f\\\xf3\xf3\xb8[\x0b\x19\xcf\xef\x10\x99\xedT\x80U\xfdcDp\xb6\xfe\xb9\xe7\xe7p\xb6\x163\x9f\xde!3\xda\xa9\x00\xab\xfa\xc6\x88\xe1m\xfds\xcf\xce\xe1l,g?\xbcBg\xb5R\x01W\xf5\x8d\x11\xc2\xdb\xfa\xe7\x9f\x9d\xc2\xd8X\xce\x7fx\x84\xcfj\xa4\x02\xaf\xeb\x1a#\x85\xb7\xf5\xcf?;\x85\xb0\xb1\x9c\xfe\xf1\t\x9e\xd5H\x05_\xd64G\x0bo\xeb\x9e~w\x0bac9\xfd\xe2\x13=\xaa\x90\n\xbf\xach\x8e\x16\xdf\xd7<\xfc\xee\x16\xc2\xc6s\xfb\xc4&{U \x15\x7fX\xd1\x1c-\xbf\xaey\xf9\xdc-\x85\x8c\xe7\xf7\x88L\xf6\xaa@*\xfe\xb1\xa28[\x7f\\\xf3\xf3\xb8[\x0b\x19\xcf\xef\x10\x99\xedT\x80U\xfdcDp\xb6\xfe\xb9\xe7\xe7p\xb6\x163\x9f\xde!3\xda\xa9\x00\xab\xfa\xc6\x88\xe1m\xfds\xcf\xce\xe1l,g?\xbcBg\xb5R\x01W\xf5\x8d\x11\xc2\xdb\xfa\xe7\x9f\x9d\xc2\xd8X\xce\x7fx\x84\xcfj\xa4\x02\xaf\xeb\x1a#\x85\xb7\xf5\xcf?;\x85\xb0\xb1\x9c\xfe\xf1\t\x9e\xd5H\x05_\xd64G\x0bo\xeb\x9e~w\x0bac9\xfd\xe2\x13=\xaa\x90\n\xbf\xach\x8e\x16\xdf\xd7<\xfc\xee\x16\xc2\xc6s\xfb\xc4&{U \x15\x7fX\xd1\x1c-\xbf\xaey\xf9\xdc-\x85\x8c\xe7\xf7\x88L\xf6\xaa@*\xfe\xb1\xa28[\x7f\\\xf3\xf3\xb8[\x0b\x19\xcf\xef\x10\x99\xedT\x80U\xfdcDp\xb6\xfe\xb9\xe7\xe7p\xb6\x163\x9f\xde!3\xda\xa9\x00\xab\xfa\xc6\x88\xe1m\xfds\xcf\xce\xe1l,g?\xbcBg\xb5R\x01W\xf5\x8d\x11\xc2\xdb\xfa\xe7\x9f\x9d\xc2\xd8X\xce\x7fx\x84\xcfj\xa4\x02\xaf\xeb\x1a#\x85\xb7\xf5\xcf?;\x85\xb0\xb1\x9c\xfe\xf1\t\x9e\xd5H\x05_\xd64G\x0bo\xeb\x9e~w\x0bac9\xfd\xe2\x13=\xaa\x90\n\xbf\xach\x8e\x16\xdf\xd7<\xfc\xee\x16\xc2\xc6s\xfb\xc4&{U 
\x15\x7fX\xd1\x1c-\xbf\xaey\xf9\xdc-\x85\x8c\xe7\xf7\x88L\xf6\xaa@*\xfe\xb1\xa28[\x7f\\\xf3\xf3\xb8[\x0b\x19\xcf\xef\x10\x99\xedT\x80U\xfdcDp\xb6\xfe\xb9\xe7\xe7p\xb6\x163\x9f\xde!3\xda\xa9\x00\xab\xfa\xc6\x88\xe1m\xfds\xcf\xce\xe1l,g?\xbcBg\xb5R\x01W\xf5\x8d\x11\xc2\xdb\xfa\xe7\x9f\x9d\xc2\xd8X\xce\x7fx\x84\xcfj\xa4\x02\xaf\xeb\x1a#\x85\xb7\xf5\xcf?;\x85\xb0\xb1\x9c\xfe\xf1\t\x9e\xd5H\x05_\xd64G\x0bo\xeb\x9e~w\x0bac9\xfd\xe2\x13=\xaa\x90\n\xbf\xach\x8e\x16\xdf\xd7<\xfc\xee\x16\xc2\xc6s\xfb\xc0{"84":"14","104":"15","105":"P","115":"1P","32":"D","119":"16","108":"G","98":"17","101":"V","72":"18","117":"19","102":"H","109":"S","97":"I","110":"T","99":"1A","111":"1B","100":"J","121":"1C","116":"1D","46":"1O"}11011000')
print(f"Length of original string: {len(in_st) * 500}\nLength of compressed string: {len(dec_string)}")
Length of original string: 20000
Length of compressed string: 10663
decompress(InputData(data=dec_string))[-2000:] # last 2000 chars of decompressed data
bytearray(b'This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman 
encoded many times.')
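How a decorator like `comp` might wrap a function can be sketched as below. This is not the huffpress source: `comp_sketch` is a hypothetical name, and `zlib` is swapped in for the Huffman compressor purely so that the example runs without the package installed.

```python
import functools
import zlib
from typing import Callable


def comp_sketch(fun: Callable[..., str]) -> Callable[..., bytearray]:
    """Compress the string returned by `fun` (zlib stands in for Huffman here)."""
    @functools.wraps(fun)
    def wrapper(*args, **kwargs) -> bytearray:
        result = fun(*args, **kwargs)
        return bytearray(zlib.compress(result.encode("utf-8")))
    return wrapper


@comp_sketch
def repeat_string(inp_string: str) -> str:
    return inp_string * 500


compressed = repeat_string("This will be Huffman encoded many times.")
print(type(compressed).__name__, len(compressed))
```

`functools.wraps` keeps the wrapped function's name and docstring, which matters when `help()` is called on decorated functions.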
Decompression decorator
from huffpress.press.decorators import decomp
help(decomp)
Help on function decomp in module huffpress.press.decorators:
decomp(*bytearray_vars)
Decompression decorator, which first decompresses given bytearray variable
objects before proceeding with the rest of the function
:param bytearray_vars: Bytearray variables to decompress first
:return: function is run as normal with the input bytearray_vars
variables being decompressed first
@decomp("in_var")
def process_string(in_var: str):
print(in_var[-1000:])
print(dec_string[-1000:])
print("\n\nDecompressing via decomp decorator\n\n")
process_string(in_var=dec_string) # the argument must be passed by keyword, i.e. in_var=...
bytearray(b'3\x9f\xde!3\xda\xa9\x00\xab\xfa\xc6\x88\xe1m\xfds\xcf\xce\xe1l,g?\xbcBg\xb5R\x01W\xf5\x8d\x11\xc2\xdb\xfa\xe7\x9f\x9d\xc2\xd8X\xce\x7fx\x84\xcfj\xa4\x02\xaf\xeb\x1a#\x85\xb7\xf5\xcf?;\x85\xb0\xb1\x9c\xfe\xf1\t\x9e\xd5H\x05_\xd64G\x0bo\xeb\x9e~w\x0bac9\xfd\xe2\x13=\xaa\x90\n\xbf\xach\x8e\x16\xdf\xd7<\xfc\xee\x16\xc2\xc6s\xfb\xc4&{U \x15\x7fX\xd1\x1c-\xbf\xaey\xf9\xdc-\x85\x8c\xe7\xf7\x88L\xf6\xaa@*\xfe\xb1\xa28[\x7f\\\xf3\xf3\xb8[\x0b\x19\xcf\xef\x10\x99\xedT\x80U\xfdcDp\xb6\xfe\xb9\xe7\xe7p\xb6\x163\x9f\xde!3\xda\xa9\x00\xab\xfa\xc6\x88\xe1m\xfds\xcf\xce\xe1l,g?\xbcBg\xb5R\x01W\xf5\x8d\x11\xc2\xdb\xfa\xe7\x9f\x9d\xc2\xd8X\xce\x7fx\x84\xcfj\xa4\x02\xaf\xeb\x1a#\x85\xb7\xf5\xcf?;\x85\xb0\xb1\x9c\xfe\xf1\t\x9e\xd5H\x05_\xd64G\x0bo\xeb\x9e~w\x0bac9\xfd\xe2\x13=\xaa\x90\n\xbf\xach\x8e\x16\xdf\xd7<\xfc\xee\x16\xc2\xc6s\xfb\xc4&{U \x15\x7fX\xd1\x1c-\xbf\xaey\xf9\xdc-\x85\x8c\xe7\xf7\x88L\xf6\xaa@*\xfe\xb1\xa28[\x7f\\\xf3\xf3\xb8[\x0b\x19\xcf\xef\x10\x99\xedT\x80U\xfdcDp\xb6\xfe\xb9\xe7\xe7p\xb6\x163\x9f\xde!3\xda\xa9\x00\xab\xfa\xc6\x88\xe1m\xfds\xcf\xce\xe1l,g?\xbcBg\xb5R\x01W\xf5\x8d\x11\xc2\xdb\xfa\xe7\x9f\x9d\xc2\xd8X\xce\x7fx\x84\xcfj\xa4\x02\xaf\xeb\x1a#\x85\xb7\xf5\xcf?;\x85\xb0\xb1\x9c\xfe\xf1\t\x9e\xd5H\x05_\xd64G\x0bo\xeb\x9e~w\x0bac9\xfd\xe2\x13=\xaa\x90\n\xbf\xach\x8e\x16\xdf\xd7<\xfc\xee\x16\xc2\xc6s\xfb\xc4&{U \x15\x7fX\xd1\x1c-\xbf\xaey\xf9\xdc-\x85\x8c\xe7\xf7\x88L\xf6\xaa@*\xfe\xb1\xa28[\x7f\\\xf3\xf3\xb8[\x0b\x19\xcf\xef\x10\x99\xedT\x80U\xfdcDp\xb6\xfe\xb9\xe7\xe7p\xb6\x163\x9f\xde!3\xda\xa9\x00\xab\xfa\xc6\x88\xe1m\xfds\xcf\xce\xe1l,g?\xbcBg\xb5R\x01W\xf5\x8d\x11\xc2\xdb\xfa\xe7\x9f\x9d\xc2\xd8X\xce\x7fx\x84\xcfj\xa4\x02\xaf\xeb\x1a#\x85\xb7\xf5\xcf?;\x85\xb0\xb1\x9c\xfe\xf1\t\x9e\xd5H\x05_\xd64G\x0bo\xeb\x9e~w\x0bac9\xfd\xe2\x13=\xaa\x90\n\xbf\xach\x8e\x16\xdf\xd7<\xfc\xee\x16\xc2\xc6s\xfb\xc4&{U 
\x15\x7fX\xd1\x1c-\xbf\xaey\xf9\xdc-\x85\x8c\xe7\xf7\x88L\xf6\xaa@*\xfe\xb1\xa28[\x7f\\\xf3\xf3\xb8[\x0b\x19\xcf\xef\x10\x99\xedT\x80U\xfdcDp\xb6\xfe\xb9\xe7\xe7p\xb6\x163\x9f\xde!3\xda\xa9\x00\xab\xfa\xc6\x88\xe1m\xfds\xcf\xce\xe1l,g?\xbcBg\xb5R\x01W\xf5\x8d\x11\xc2\xdb\xfa\xe7\x9f\x9d\xc2\xd8X\xce\x7fx\x84\xcfj\xa4\x02\xaf\xeb\x1a#\x85\xb7\xf5\xcf?;\x85\xb0\xb1\x9c\xfe\xf1\t\x9e\xd5H\x05_\xd64G\x0bo\xeb\x9e~w\x0bac9\xfd\xe2\x13=\xaa\x90\n\xbf\xach\x8e\x16\xdf\xd7<\xfc\xee\x16\xc2\xc6s\xfb\xc0{"84":"14","104":"15","105":"P","115":"1P","32":"D","119":"16","108":"G","98":"17","101":"V","72":"18","117":"19","102":"H","109":"S","97":"I","110":"T","99":"1A","111":"1B","100":"J","121":"1C","116":"1D","46":"1O"}11011000')
Decompressing via decomp decorator
This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.This will be Huffman encoded many times.
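The note that the input variable must be given by its key name suggests the decorator looks its targets up among the keyword arguments. A sketch of that pattern follows, under that assumption; `decomp_sketch` is hypothetical, and `zlib` again stands in for the Huffman decompressor so the example is self-contained.

```python
import functools
import zlib
from typing import Callable


def decomp_sketch(*var_names: str) -> Callable:
    """Decompress the named keyword arguments before calling the function."""
    def decorator(fun: Callable) -> Callable:
        @functools.wraps(fun)
        def wrapper(*args, **kwargs):
            for name in var_names:
                # Assumption: only keyword arguments are inspected, so a
                # positional argument would pass through still compressed.
                if name in kwargs:
                    raw = zlib.decompress(bytes(kwargs[name]))
                    kwargs[name] = raw.decode("utf-8")
            return fun(*args, **kwargs)
        return wrapper
    return decorator


@decomp_sketch("in_var")
def tail(in_var: str) -> str:
    return in_var[-6:]


blob = bytearray(zlib.compress(("This will be Huffman encoded many times." * 3).encode("utf-8")))
print(tail(in_var=blob))
```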
Data structure types
help(huffpress.huff.htypes)
Help on module huffpress.huff.htypes in huffpress.huff:
NAME
huffpress.huff.htypes - (c) 2021 Usman Ahmad https://github.com/selphaware
DESCRIPTION
htypes.py
Contains all Huffman Data Structure Types
CLASSES
builtins.object
CompData
HuffCode
HuffSeq
HuffTerm
HuffTuple
InputData
Leaves
SortedTree
TermFreq
class CompData(builtins.object)
| CompData(data: Union[str, bytearray]) -> None
|
| data = Union[str, bytearray]
|
| Data to be compressed will either be the filename (str) or compressed data
| (bytearray)
|
| Methods defined here:
|
| __eq__(self, other)
|
| __init__(self, data: Union[str, bytearray]) -> None
|
| __repr__(self)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| __annotations__ = {'data': typing.Union[str, bytearray]}
|
| __dataclass_fields__ = {'data': Field(name='data',type=typing.Union[st...
|
| __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,or...
|
| __hash__ = None
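The `__dataclass_fields__` and `__hash__ = None` entries above are what Python's `dataclasses` module generates for a plain `@dataclass`. The stand-in class below is hypothetical (`CompDataSketch` is not part of huffpress) and only mirrors the single `data` field shown in the help text.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class CompDataSketch:
    data: Union[str, bytearray]


a = CompDataSketch(data="text_file.txt.hac")
b = CompDataSketch(data="text_file.txt.hac")
print(a == b)                   # True: the generated __eq__ compares fields
print(CompDataSketch.__hash__)  # None: eq=True without frozen=True disables hashing
```

Instances of such a class therefore cannot be used as dictionary keys or set members.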
class HuffCode(builtins.object)
| HuffCode(data: Dict[int, str]) -> None
|
| HuffCode = Dict[int, str]
|
| Final encoded Huffman encoded sequences with key as the ordinal ASCII value
| and the value as the binary sequence string
|
| Methods defined here:
|
| __eq__(self, other)
|
| __init__(self, data: Dict[int, str]) -> None
|
| __repr__(self)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| __annotations__ = {'data': typing.Dict[int, str]}
|
| __dataclass_fields__ = {'data': Field(name='data',type=typing.Dict[int...
|
| __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,or...
|
| __hash__ = None
class HuffSeq(builtins.object)
| HuffSeq(seq_term: str, huff_term: huffpress.huff.htypes.HuffTerm) -> None
|
| seq_term = str
| huff_term = HuffTerm
|
| Huffman sequence made up of the sequence of terms string and the HuffTerm
|
| Methods defined here:
|
| __eq__(self, other)
|
| __init__(self, seq_term: str, huff_term: huffpress.huff.htypes.HuffTerm) -> None
|
| __repr__(self)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| __annotations__ = {'huff_term': <class 'huffpress.huff.htypes.HuffTerm...
|
| __dataclass_fields__ = {'huff_term': Field(name='huff_term',type=<clas...
|
| __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,or...
|
| __hash__ = None
class HuffTerm(builtins.object)
| HuffTerm(freq: int, node: Union[huffpress.huff.HuffNode.HuffNode, NoneType]) -> None
|
| freq = int
| node = Optional[HuffNode]
|
| For a single Huffman Node we have a total number of frequency
| occurrences, and we have the node (which can be null)
|
| Methods defined here:
|
| __eq__(self, other)
|
| __init__(self, freq: int, node: Union[huffpress.huff.HuffNode.HuffNode, NoneType]) -> None
|
| __repr__(self)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| __annotations__ = {'freq': <class 'int'>, 'node': typing.Union[huffpre...
|
| __dataclass_fields__ = {'freq': Field(name='freq',type=<class 'int'>,d...
|
| __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,or...
|
| __hash__ = None
class HuffTuple(builtins.object)
| HuffTuple(seq_term: str = '', total_freq: int = -1, node: Union[huffpress.huff.HuffNode.HuffNode, NoneType] = None) -> None
|
| seq_term = str
| total_freq = int
| node = Optional[HuffNode]
|
| Similar structure to SortedTree where we have string term,
| total frequency, and the HuffNode (which could be null)
|
| Methods defined here:
|
| __eq__(self, other)
|
| __init__(self, seq_term: str = '', total_freq: int = -1, node: Union[huffpress.huff.HuffNode.HuffNode, NoneType] = None) -> None
|
| __repr__(self)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| __annotations__ = {'node': typing.Union[huffpress.huff.HuffNode.HuffNo...
|
| __dataclass_fields__ = {'node': Field(name='node',type=typing.Union[hu...
|
| __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,or...
|
| __hash__ = None
|
| node = None
|
| seq_term = ''
|
| total_freq = -1
class InputData(builtins.object)
| InputData(data: Union[str, bytes]) -> None
|
| data = Union[str, bytes]
|
| Input data to be compressed will either be a string or sequence of bytes
| string e.g. "Hello"
| bytes e.g. b"ABC" or bytes([65, 66, 67])
|
| Methods defined here:
|
| __eq__(self, other)
|
| __init__(self, data: Union[str, bytes]) -> None
|
| __repr__(self)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| __annotations__ = {'data': typing.Union[str, bytes]}
|
| __dataclass_fields__ = {'data': Field(name='data',type=typing.Union[st...
|
| __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,or...
|
| __hash__ = None
class Leaves(builtins.object)
| Leaves(data: Dict[str, huffpress.huff.htypes.HuffTerm]) -> None
|
| data = Dict[str, HuffTerm]
|
| Initial set of leaves stored as a dictionary whose keys are terms made up of
| comma-delimited ordinal ASCII values, and whose values are the HuffTerm.
|
| Methods defined here:
|
| __eq__(self, other)
|
| __init__(self, data: Dict[str, huffpress.huff.htypes.HuffTerm]) -> None
|
| __repr__(self)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| __annotations__ = {'data': typing.Dict[str, huffpress.huff.htypes.Huff...
|
| __dataclass_fields__ = {'data': Field(name='data',type=typing.Dict[str...
|
| __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,or...
|
| __hash__ = None
class SortedTree(builtins.object)
| SortedTree(data: List[huffpress.huff.htypes.HuffSeq]) -> None
|
| data = List[HuffSeq]
|
| Huffman tree structure, which is a list of HuffSeq entries, each pairing a
| term (comma-delimited ordinal ASCII values) with its HuffTerm. The list is
| sorted by total frequency in ascending order.
|
| Methods defined here:
|
| __eq__(self, other)
|
| __init__(self, data: List[huffpress.huff.htypes.HuffSeq]) -> None
|
| __repr__(self)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| __annotations__ = {'data': typing.List[huffpress.huff.htypes.HuffSeq]}
|
| __dataclass_fields__ = {'data': Field(name='data',type=typing.List[huf...
|
| __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,or...
|
| __hash__ = None
class TermFreq(builtins.object)
| TermFreq(tf: Dict[int, int]) -> None
|
| tf = Dict[int, int]
|
| When calculating collections.Counter on an input string or bytes,
| we return a dictionary whose keys are the ordinal ASCII values and
| whose values are their frequency of occurrence in the input data.
|
| Methods defined here:
|
| __eq__(self, other)
|
| __init__(self, tf: Dict[int, int]) -> None
|
| __repr__(self)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| __annotations__ = {'tf': typing.Dict[int, int]}
|
| __dataclass_fields__ = {'tf': Field(name='tf',type=typing.Dict[int, in...
|
| __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,or...
|
| __hash__ = None
DATA
Dict = typing.Dict
List = typing.List
Optional = typing.Optional
Union = typing.Union
FILE
c:\programdata\anaconda3\lib\site-packages\huffpress\huff\htypes.py
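The container types above are plain dataclasses, and TermFreq is described as the result of running collections.Counter over the input. The following is a minimal sketch of that idea using only the standard library. The names TermFreqSketch and count_terms are hypothetical and are not part of huffpress; the library's own calc_term_freq may differ in detail.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class TermFreqSketch:
    """Hypothetical stand-in for TermFreq: ordinal value -> occurrence count."""
    tf: Dict[int, int]


def count_terms(data: Union[str, bytes]) -> TermFreqSketch:
    # Iterating over bytes already yields ints; str characters need ord().
    ordinals = map(ord, data) if isinstance(data, str) else data
    return TermFreqSketch(dict(Counter(ordinals)))


print(count_terms("ABBcCC"))
print(count_terms(b"ABC"))
```

A dataclass created with the default eq=True and without frozen=True has its __hash__ set to None, which is why each class listing above shows `__hash__ = None`.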
Huffman functions
help(huffpress.huff.hfunctions)
Help on module huffpress.huff.hfunctions in huffpress.huff:
NAME
huffpress.huff.hfunctions - (c) 2021 Usman Ahmad https://github.com/selphaware
DESCRIPTION
hfunctions.py
Contains all Huffman building and deconstructing functions
FUNCTIONS
build_leaves(term_freq: huffpress.huff.htypes.TermFreq, verbose: bool = False) -> huffpress.huff.htypes.Leaves
build_leaves(term_freq: TermFreq,
verbose: bool = False) -> Leaves:
Builds initial leaf HuffNode's from a given dictionary of character
frequency occurrence counts
:param term_freq: dictionary of frequency occurrence counts of a given
string computed by calc_term_freq function
:param verbose: set to True for printing console outputs
:return: dictionary of leaf HuffNode's for a given character frequency
count dictionary
build_tree(sorted_new_tree: huffpress.huff.htypes.SortedTree, verbose: bool = False) -> Union[huffpress.huff.HuffNode.HuffNode, NoneType]
build_tree(sorted_new_tree: SortedTree,
verbose: bool = False) -> Optional[HuffNode]:
Builds Huffman tree made out of HuffNode's, constructed from initial
HuffNode leaves
:param sorted_new_tree: sorted [ term, (total-frequency, HuffNode) ]
:param verbose: set to True for printing console outputs
:return: Built Huffman tree from initial asc sorted list of leaves
HuffNode's computed by build_leaves function and sorted by
sort_tree function
calc_term_freq(data: huffpress.huff.htypes.InputData) -> huffpress.huff.htypes.TermFreq
calc_term_freq(data: InputData) -> TermFreq:
Returns dictionary of frequency occurrence counts for each character
of a given string
e.g. "ABBcCC" --> { 65: 1, 66: 2, 99: 1, 67: 2 } (ordinals of A, B, c, C)
:param data: input data (str or bytes)
:return: dictionary of character frequency occurrence counts
create_huff_tree(data, verbose: bool = False)
creates Huffman tree, calling either:
create_huff_tree(InputData, bool); or
create_huff_tree(TermFreq, bool)
:param data: InputData (str or bytes) or TermFreq term frequency counts
:param verbose: bool - verbose for printing
encode(data, tree: Union[huffpress.huff.HuffNode.HuffNode, NoneType], path: str = '', verbose: bool = False)
encode function calling either:
encode(int, HuffNode, str); or
encode(Leaves, HuffNode, bool)
:param verbose: bool for verbose printing
:param path: str for 1, 0 paths visited
:param data: either int (term) or Leaves (initial set of term leaves)
:param tree: Huffman tree
print_node(node: Union[huffpress.huff.HuffNode.HuffNode, NoneType], depth: int = 0, verbose: bool = True) -> str
print_node(node: HuffNode, depth: int = 0, verbose: bool = True) -> str:
Recursive printing of the HuffNode tree showing all branches, leaves and
their terms and total-frequencies
:param node: HuffNode tree i.e. Huffman tree
:param depth: How many whitespaces to print to represent depth level
(starting at depth 0)
:param verbose: set to True to print to console, False to return string
output
:return: string rendering of the Huffman tree (printed to console when verbose is True)
sort_tree(tree: huffpress.huff.htypes.Leaves, verbose: bool = False) -> huffpress.huff.htypes.SortedTree
sort_tree(tree: Leaves, verbose: bool = False) -> SortedTree:
Sorts a Huffman tree dictionary by total frequency ascending order
returning a list
e.g.
{ "D": (4, HuffNode), "E": (3, HuffNode),
"F": (7, HuffNode), "ABC": (2, HuffNode) }
-->
[ ("ABC", (2, HuffNode)), ("E", (3, HuffNode)),
("D", (4, HuffNode)), ("F", (7, HuffNode)) ]
:param tree: dictionary of HuffNode's { term : (total-frequency, HuffNode) }
:param verbose: set to True for printing console outputs
:return: sorted dictionary of HuffNode's converted to a list
DATA
ItemsView = typing.ItemsView
List = typing.List
Optional = typing.Optional
FILE
c:\programdata\anaconda3\lib\site-packages\huffpress\huff\hfunctions.py
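The build and encode steps documented above (leaves, ascending sort, tree building, path encoding) follow the classic Huffman construction. Below is a rough, self-contained sketch of that construction. It is not the huffpress implementation: SketchNode, build_sketch_tree and assign_codes are hypothetical names, and the sketch uses heapq as the priority queue instead of the sorted list that sort_tree and build_tree describe.

```python
import heapq
from itertools import count
from typing import Dict, Optional


class SketchNode:
    """Hypothetical stand-in for HuffNode (not the library class)."""

    def __init__(self, term: str, freq: int, left=None, right=None):
        self.term = term    # comma-delimited ordinals, e.g. "99,98"
        self.freq = freq    # total occurrences under this node
        self.left = left
        self.right = right

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def build_sketch_tree(freqs: Dict[int, int]) -> Optional[SketchNode]:
    """Repeatedly merge the two least frequent nodes until one root remains."""
    tie = count()  # unique tie-breaker so the heap never compares nodes
    heap = [(f, next(tie), SketchNode(str(t), f)) for t, f in sorted(freqs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        parent = SketchNode(f"{a.term},{b.term}", f1 + f2, a, b)
        heapq.heappush(heap, (parent.freq, next(tie), parent))
    return heap[0][2] if heap else None


def assign_codes(node: Optional[SketchNode], path: str = "") -> Dict[int, str]:
    """Walk the tree, appending 0 for left and 1 for right branches."""
    if node is None:
        return {}
    if node.is_leaf:
        return {int(node.term): path or "0"}  # a lone symbol still needs one bit
    codes = assign_codes(node.left, path + "0")
    codes.update(assign_codes(node.right, path + "1"))
    return codes


# Frequencies for "aaaabbc": the most frequent term gets the shortest code.
codes = assign_codes(build_sketch_tree({97: 4, 98: 2, 99: 1}))
print(codes)
```

In this sketch the term 97 ("a") receives a one-bit code while 98 and 99 receive two-bit codes, which is the property the compressor relies on.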
Huffman node class
help(huffpress.huff.HuffNode)
Help on module huffpress.huff.HuffNode in huffpress.huff:
NAME
huffpress.huff.HuffNode - (c) 2021 Usman Ahmad https://github.com/selphaware
DESCRIPTION
HuffNode.py
Huffman tree class
CLASSES
builtins.object
HuffNode
class HuffNode(builtins.object)
| HuffNode(term: str, freq: int, left_child=None, right_child=None)
|
| A class representing a Huffman Binary tree (Huffman node)
|
| ...
|
| Attributes
| ----------
| term : str
| ordinal character terms i.e. ascii values delimited by comma
| freq : int
| total number of occurrences of this term
| left_child : HuffNode
| left child node / recursive left branch
| right_child : HuffNode
| right child node / recursive right branch
|
| Methods
| -------
| is_leaf():
| Returns True if leaf node, otherwise False
|
| Methods defined here:
|
| __init__(self, term: str, freq: int, left_child=None, right_child=None)
| __init__(self, term: str, freq: int, left_child=None, right_child=None):
|
| Constructs HuffNode with all necessary attributes
|
| :param term: (str) ordinal character terms i.e. ascii values
| delimited by comma
| :param freq: (int) total number of occurrences of this term
| :param left_child: (HuffNode) left child node / recursive left branch
| :param right_child: (HuffNode) right child node / recursive right branch
|
| ----------------------------------------------------------------------
| Readonly properties defined here:
|
| is_leaf
| @property
| def is_leaf(self) -> bool:
|
| Checks if current node is a leaf node
|
| :return: True if leaf, False otherwise
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
FILE
c:\programdata\anaconda3\lib\site-packages\huffpress\huff\huffnode.py
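The HuffNode attributes above, together with the depth-indented recursive printing that print_node describes, can be illustrated with a small stand-alone sketch. DemoNode and render are hypothetical names that mirror the documented attributes; the actual output format of print_node is not shown in this document, so the format below is an assumption.

```python
from typing import Optional


class DemoNode:
    """Hypothetical node mirroring the documented HuffNode attributes."""

    def __init__(self, term: str, freq: int, left_child=None, right_child=None):
        self.term = term
        self.freq = freq
        self.left_child = left_child
        self.right_child = right_child

    @property
    def is_leaf(self) -> bool:
        # A leaf has no branches on either side.
        return self.left_child is None and self.right_child is None


def render(node: Optional[DemoNode], depth: int = 0) -> str:
    """Return one line per node, indented by one space per depth level."""
    if node is None:
        return ""
    line = " " * depth + f"{node.term} ({node.freq})\n"
    return line + render(node.left_child, depth + 1) + render(node.right_child, depth + 1)


leaf_a = DemoNode("65", 1)
leaf_b = DemoNode("66", 2)
root = DemoNode("65,66", 3, leaf_a, leaf_b)
print(render(root), end="")
```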
Auxiliary functions
BaseN for converting numbers between different bases
help(huffpress.auxi.basen)
Help on module huffpress.auxi.basen in huffpress.auxi:
NAME
huffpress.auxi.basen - (c) 2021 Usman Ahmad https://github.com/selphaware
DESCRIPTION
BaseN.py
Contains numeric functionality for converting back and forth between decimal
and base-N numbers, e.g. binary, hex, base 5, etc.
CLASSES
builtins.object
BaseRange
class BaseRange(builtins.object)
| BaseRange class holding the following static consts:
|
| -- dec: dict --
| decimal to base digit range, i.e. small values map to themselves
| {1: 1, 2: 2, ..., 9: 9} and larger values map to letters {10: A, 11: B, ..., 35: Z}
|
| -- rev: dict --
|
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| __annotations__ = {'dec': <class 'huffpress.auxi.imdict.ImDict'>, 'rev...
|
| dec = <huffpress.auxi.imdict.ImDict object>
|
| rev = <huffpress.auxi.imdict.ImDict object>
FUNCTIONS
basen(in_num, fbase: int = 10, tbase: int = 2, out_str: bool = False)
converts number in_num from base fbase to base tbase
see overloads below
:param in_num: number to convert: either str or List[str]
:param fbase: from base
:param tbase: to base
:param out_str: if False, output is List[str]; if True, output is a single str
:return: converted value, as List[str] or str depending on out_str
nmod(x: int, y: int) -> str
BaseN modulo operator
nmod(10, 16) = "A"
nmod(10, 2) = "0"
:param x: number
:param y: modulo
:return: remainder
to_basen(num: int, base: int = 2) -> List[str]
to_basen(num: int, base: int = 2) -> List[str]:
Convert decimal integer to a list of digits in the given base
:param base: base number. for binary, base = 2. for hex, base = 16
:param num: decimal number
:return: list of digits in the given base (1's and 0's for binary)
to_dec(in_bin: List[str], base: int = 2) -> int
to_dec(in_bin: List[str], base: int = 2) -> int:
Convert a list of digits in the given base to a decimal integer
:param base: base number. for binary, base = 2. for hex, base = 16
:param in_bin: list of digits in the given base (1's and 0's for binary)
:return: decimal integer converted from the input digits
DATA
List = typing.List
Union = typing.Union
FILE
c:\programdata\anaconda3\lib\site-packages\huffpress\auxi\basen.py
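The conversions documented above boil down to repeated division (decimal to base N) and positional accumulation (base N to decimal). The sketch below shows both directions with the standard library only. The names to_digits, from_digits and convert are hypothetical and are not the huffpress functions; the digit alphabet 0-9 followed by A-Z is assumed from the BaseRange description.

```python
import string
from typing import List, Sequence

DIGITS = string.digits + string.ascii_uppercase  # "0".."9" then "A".."Z"


def to_digits(num: int, base: int = 2) -> List[str]:
    """Decimal integer -> list of digits in the given base (2 to 36)."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    if num < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    if num == 0:
        return ["0"]
    out = []
    while num:
        num, rem = divmod(num, base)  # remainder is the next lowest digit
        out.append(DIGITS[rem])
    return out[::-1]


def from_digits(digits: Sequence[str], base: int = 2) -> int:
    """Digits in the given base (a str or a list of one-character strings) -> int."""
    total = 0
    for digit in digits:
        value = DIGITS.index(digit.upper())
        if value >= base:
            raise ValueError(f"digit {digit!r} is not valid in base {base}")
        total = total * base + value
    return total


def convert(text: str, fbase: int = 10, tbase: int = 2) -> str:
    """Convert a number written in base fbase to its base tbase string."""
    return "".join(to_digits(from_digits(text, fbase), tbase))


print(convert("1234", 10, 16))
print(convert("64FP", 27, 5))
```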
BaseN examples converting numbers of any base to any base
from huffpress.auxi.basen import basen
help(basen)
Help on function basen in module huffpress.auxi.basen:
basen(in_num, fbase: int = 10, tbase: int = 2, out_str: bool = False)
converts number in_num from base fbase to base tbase
see overloads below
:param in_num: number to convert: either str or List[str]
:param fbase: from base
:param tbase: to base
:param out_str: if False, output is List[str]; if True, output is a single str
:return: converted value, as List[str] or str depending on out_str
# Converting from number of a base to a different base
basen("1234", 10, 2) # converting from base 10 to 2 (binary)
['1', '0', '0', '1', '1', '0', '1', '0', '0', '1', '0']
basen("1234", 10, 2, True) # converting from base 10 to 2 (binary) output as one string
'10011010010'
basen("1234", 10, 16, True) # converting from base 10 to 16 (hex) output as one string
'4D2'
basen("Z", 36, 10, True) # converting from base 36 to 10 (dec) output as one string
'35'
basen("64FP", 27, 5, True) # converting from base 27 to 5 output as one string
'12341234'
# Example of compressing a string via base conversions of the ASCII values
hello = "Hello world"
hello_ascii = list(map(ord, list(hello)))
hello_ascii
[72, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100]
hello_ascii_join = ",".join(list(map(str, hello_ascii)))
print(hello_ascii_join)
print(f"Length = {len(hello_ascii_join)}")
72,101,108,108,111,32,119,111,114,108,100
Length = 41
hello_36 = list(map(lambda x: basen(str(x), 10, 36, True), hello_ascii))
hello_36
['20', '2T', '30', '30', '33', 'W', '3B', '33', '36', '30', '2S']
hello_36_join = ",".join(list(map(str, hello_36)))
print(hello_36_join)
print(f"Length = {len(hello_36_join)}")
20,2T,30,30,33,W,3B,33,36,30,2S
Length = 31
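The base-36 form is only useful if it can be turned back into the original text. As a hedged illustration, the sketch below reproduces the two lengths from the example above and then reverses the encoding with Python's built-in int(token, 36). The helper to_base36 is a hypothetical stand-in and does not use the huffpress basen function.

```python
import string

DIGITS = string.digits + string.ascii_uppercase


def to_base36(value: int) -> str:
    """Non-negative integer -> base-36 string (hypothetical helper)."""
    if value == 0:
        return "0"
    out = ""
    while value:
        value, rem = divmod(value, 36)
        out = DIGITS[rem] + out
    return out


text = "Hello world"
decimal_form = ",".join(str(ord(ch)) for ch in text)
base36_form = ",".join(to_base36(ord(ch)) for ch in text)

# Reverse the encoding: each token is parsed as base 36 and mapped back to a character.
restored = "".join(chr(int(token, 36)) for token in base36_form.split(","))

print(len(decimal_form), len(base36_form), restored == text)
```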
Compression modes
help(huffpress.auxi.modes)
Help on module huffpress.auxi.modes in huffpress.auxi:
NAME
huffpress.auxi.modes - (c) 2021 Usman Ahmad https://github.com/selphaware
DESCRIPTION
modes.py
Contains all modes used in the rest of the codebase (currently a single
enum, Mode, with three members)
CLASSES
enum.Enum(builtins.object)
Mode
class Mode(enum.Enum)
| Mode(value, names=None, *, module=None, qualname=None, type=None, start=1)
|
| Compression modes
|
| 0 - Default (File or Raw input data)
| 1 - File compression only
| 2 - Raw input data compression only
|
| Method resolution order:
| Mode
| enum.Enum
| builtins.object
|
| Data and other attributes defined here:
|
| DEFAULT = <Mode.DEFAULT: 0>
|
| FILE = <Mode.FILE: 1>
|
| RAW = <Mode.RAW: 2>
|
| ----------------------------------------------------------------------
| Data descriptors inherited from enum.Enum:
|
| name
| The name of the Enum member.
|
| value
| The value of the Enum member.
|
| ----------------------------------------------------------------------
| Readonly properties inherited from enum.EnumMeta:
|
| __members__
| Returns a mapping of member name->value.
|
| This mapping lists all enum members, including aliases. Note that this
| is a read-only view of the internal mapping.
FILE
c:\programdata\anaconda3\lib\site-packages\huffpress\auxi\modes.py
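The three modes suggest that DEFAULT accepts either a filename or raw data, while FILE and RAW force one interpretation. How huffpress actually decides between the two in DEFAULT mode is not shown here, so the sketch below is only an assumption: it treats the input as a file when a file with that name exists. SketchMode and resolve_mode are hypothetical names.

```python
import os
from enum import Enum


class SketchMode(Enum):
    """Hypothetical mirror of the documented Mode values."""
    DEFAULT = 0  # file or raw input data
    FILE = 1     # file compression only
    RAW = 2      # raw input data compression only


def resolve_mode(inp: str, mode: SketchMode = SketchMode.DEFAULT) -> SketchMode:
    """Pick FILE or RAW for DEFAULT; explicit modes are returned unchanged."""
    if mode is not SketchMode.DEFAULT:
        return mode
    # Assumption: an existing file path means file compression.
    return SketchMode.FILE if os.path.isfile(inp) else SketchMode.RAW


print(resolve_mode("Hello world, this is raw text and not a file name"))
print(resolve_mode("some_input.txt", SketchMode.FILE))
```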
Immutable Dictionary class
help(huffpress.auxi.imdict)
Help on module huffpress.auxi.imdict in huffpress.auxi:
NAME
huffpress.auxi.imdict - (c) 2021 Usman Ahmad https://github.com/selphaware
DESCRIPTION
imdict.py
Immutable dictionary
CLASSES
collections.abc.Mapping(collections.abc.Collection)
ImDict
class ImDict(collections.abc.Mapping)
| ImDict(*args, **kwargs)
|
| ImDict
|
| Immutable dictionary class, can be used as a normal dictionary but
| is immutable.
|
| Method resolution order:
| ImDict
| collections.abc.Mapping
| collections.abc.Collection
| collections.abc.Sized
| collections.abc.Iterable
| collections.abc.Container
| builtins.object
|
| Methods defined here:
|
| __getitem__(self, key)
|
| __hash__(self)
| Return hash(self).
|
| __init__(self, *args, **kwargs)
| Initialize self. See help(type(self)) for accurate signature.
|
| __iter__(self)
|
| __len__(self)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| __abstractmethods__ = frozenset()
|
| ----------------------------------------------------------------------
| Methods inherited from collections.abc.Mapping:
|
| __contains__(self, key)
|
| __eq__(self, other)
| Return self==value.
|
| get(self, key, default=None)
| D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None.
|
| items(self)
| D.items() -> a set-like object providing a view on D's items
|
| keys(self)
| D.keys() -> a set-like object providing a view on D's keys
|
| values(self)
| D.values() -> an object providing a view on D's values
|
| ----------------------------------------------------------------------
| Data and other attributes inherited from collections.abc.Mapping:
|
| __reversed__ = None
|
| ----------------------------------------------------------------------
| Class methods inherited from collections.abc.Collection:
|
| __subclasshook__(C) from abc.ABCMeta
| Abstract classes can override this to customize issubclass().
|
| This is invoked early on by abc.ABCMeta.__subclasscheck__().
| It should return True, False or NotImplemented. If it returns
| NotImplemented, the normal algorithm is used. Otherwise, it
| overrides the normal algorithm (and the outcome is cached).
DATA
__warningregistry__ = {'version': 7}
FILE
c:\programdata\anaconda3\lib\site-packages\huffpress\auxi\imdict.py
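The method list above (__getitem__, __iter__, __len__ and __hash__ on top of collections.abc.Mapping) is the usual recipe for a read-only, hashable dictionary. A minimal sketch of that recipe follows. FrozenMapSketch is a hypothetical name, and the hashing strategy is an assumption that requires hashable values; the real ImDict may be implemented differently.

```python
from collections.abc import Mapping


class FrozenMapSketch(Mapping):
    """Read-only mapping: lookups work, item assignment is not defined."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)
        self._hash = None  # computed lazily on first use

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __hash__(self):
        # Mapping defines __eq__, so __hash__ must be supplied explicitly.
        if self._hash is None:
            self._hash = hash(frozenset(self._data.items()))
        return self._hash


digits = FrozenMapSketch({10: "A", 11: "B", 35: "Z"})
print(digits[10], len(digits), digits.get(12, "?"))

try:
    digits[12] = "C"  # Mapping provides no __setitem__
except TypeError as err:
    print("immutable:", err)
```

Because Mapping supplies get, keys, items, values, __contains__ and __eq__ as mixin methods, only the three abstract methods and __hash__ need to be written by hand.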