Huffman compression implemented in pure Python.

Compression Tool (Huffman Coding in Pure Python)

A lightweight, fully tested Huffman compression library with both byte-level and file-level APIs.

This project implements a complete Huffman compression pipeline from scratch, including:

  • Frequency counting
  • Min-heap construction
  • Huffman tree generation
  • Code map building
  • Bit packing/unpacking
  • Header encoding/decoding
  • High-level compression & decompression
  • File-based compressor and decompressor
  • Full test suite (unit + integration)
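The first four stages of the pipeline above (frequency counting, min-heap construction, tree generation, code map building) can be sketched with the standard library alone. This is a minimal illustration built on `heapq` and `collections.Counter`, not compression_tool's actual internals; the function name `huffman_code_map` is hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_map(data: bytes) -> dict[int, str]:
    """Return a {byte_value: bit_string} prefix code for `data` (sketch)."""
    freqs = Counter(data)  # frequency counting: byte value -> weight
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        (symbol,) = freqs
        return {symbol: "0"}

    # Min-heap of (weight, tie_breaker, node). A leaf is a byte value (int);
    # an internal node is a (left, right) tuple. The tie-breaker stops heapq
    # from ever comparing two nodes directly.
    tie = count()
    heap = [(weight, next(tie), symbol) for symbol, weight in freqs.items()]
    heapq.heapify(heap)

    # Tree generation: repeatedly merge the two lightest nodes.
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (left, right)))

    # Code map building: a left edge appends "0", a right edge appends "1".
    codes: dict[int, str] = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            codes[node] = prefix
    return codes


data = b"hellloo"
codes = huffman_code_map(data)
print(codes, sum(len(codes[b]) for b in data))
```

For `b"hellloo"`, whose byte frequencies (104:1, 101:1, 108:3, 111:2) match the example in the Header Format section, an optimal code needs 13 bits in total, which leaves 3 padding bits in the final byte, consistent with `pad=3` in that example.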

The goal is to provide a clear, modular reference implementation that is easy to study, extend, and reuse, including on a future demo page.


✨ Features

  • 🔧 Pure Python implementation
  • 🧪 Fully tested with pytest
  • 📄 File compression support
  • 🔁 Round-trip safe
  • 📦 Simple API:
    • compress_bytes(data: bytes) -> bytes
    • decompress_bytes(data: bytes) -> bytes
    • compress_file(path) -> CompressionResult
    • decompress_file(path) -> DecompressionResult
  • 🧩 Modular internal structure
  • 📦 Published as a PyPI package

📦 Installation

Install from PyPI:

pip install jv-compression-tool

The PyPI distribution is named jv-compression-tool, but the package you import is compression_tool:

import compression_tool

🧩 Usage

1. Compressing bytes

from compression_tool import compress_bytes, decompress_bytes

data = b"hello world"
compressed = compress_bytes(data)
restored = decompress_bytes(compressed)

assert restored == data
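Round-trip safety rests on the prefix-free property of Huffman codes: no code word is a prefix of another, so a decoder can read bits left to right and emit a symbol as soon as the accumulated bits match a code word. The sketch below illustrates this with a hand-written code table; `decode_bits` and the table are hypothetical and not part of compression_tool's API. In a real stream, the `pad` bits recorded in the header would be dropped before this step.

```python
def decode_bits(bits: str, code_map: dict[int, str]) -> bytes:
    """Decode a string of '0'/'1' characters using a prefix-free code map."""
    inverse = {code: symbol for symbol, code in code_map.items()}
    out = bytearray()
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:  # a complete code word has been read
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code word")
    return bytes(out)


# Hypothetical prefix-free table covering the bytes of b"hello".
code_map = {ord("l"): "0", ord("h"): "10", ord("e"): "110", ord("o"): "111"}
bits = "".join(code_map[b] for b in b"hello")
assert decode_bits(bits, code_map) == b"hello"
```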

2. Compressing files

from pathlib import Path

from compression_tool import compress_file, decompress_file

result = compress_file("example.txt")

print("Original size:", result.original_size)
print("Compressed size:", result.compressed_size)

decomp = decompress_file(result.output_path)
assert decomp.output_path.read_bytes() == Path("example.txt").read_bytes()

The file APIs return simple dataclasses:

  • CompressionResult
  • DecompressionResult

🧱 Project Structure

The project uses a src/ layout for clean packaging:

jv_compression_tool/
├── pyproject.toml
├── README.md
├── LICENSE
├── src/
│   └── compression_tool/
│       ├── __init__.py
│       ├── compressor.py
│       ├── decompressor.py
│       ├── file_compressor.py
│       ├── file_decompressor.py
│       ├── frequency.py
│       ├── tree.py
│       ├── build_tree.py
│       ├── code_map.py
│       ├── lookup.py
│       ├── header.py
│       └── utils/
│           ├── heapify.py
│           └── bitutils.py
└── tests/
    ├── test_header.py
    ├── test_heapify.py
    ├── test_build_tree.py
    ├── test_compressor.py
    ├── test_decompressor.py
    ├── test_file_io.py
    └── data/
        └── test.txt

(Test file names are illustrative; your exact structure may differ.)


๐Ÿ” Header Format

This project currently uses a simple text-based header:

HUF1|pad=<pad_len>|freq=symbol:weight,...|
  • HUF1: magic string + version tag
  • pad: number of padding bits added to the final byte
  • freq: a comma-separated table of <symbol>:<weight>
  • Symbols use their integer byte value (e.g. 104 = b"h")
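The `pad` field exists because the encoded bit stream rarely ends on a byte boundary. Here is a minimal sketch of bit packing and unpacking, assuming bits are packed most-significant-bit first and the final byte is filled with zero bits (neither detail is specified above); `pack_bits` and `unpack_bits` are hypothetical names, not necessarily what `utils/bitutils.py` provides.

```python
def pack_bits(bits: str) -> tuple[bytes, int]:
    """Pack a '0'/'1' string into bytes (MSB first); return (packed, pad_len)."""
    pad_len = (8 - len(bits) % 8) % 8  # bits needed to fill the final byte
    padded = bits + "0" * pad_len        # assumption: padding uses zero bits
    packed = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return packed, pad_len


def unpack_bits(packed: bytes, pad_len: int) -> str:
    """Expand bytes back into a '0'/'1' string and drop the padding bits."""
    bits = "".join(f"{byte:08b}" for byte in packed)
    return bits[: len(bits) - pad_len]


packed, pad_len = pack_bits("1" * 13)  # 13 bits -> 2 bytes, 3 padding bits
print(packed, pad_len)
```

Storing `pad_len` in the header is what lets the decoder tell real trailing bits from filler.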

Example

HUF1|pad=3|freq=104:1,101:1,108:3,111:2|
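Assuming the compressed payload follows immediately after the final `|` (the layout above does not state this explicitly), the header can be built and split off with plain byte operations. `build_header` and `parse_container` are hypothetical helpers for illustration, not the API of `header.py`.

```python
def build_header(pad_len: int, freqs: dict[int, int]) -> bytes:
    """Serialize HUF1|pad=<pad_len>|freq=<symbol>:<weight>,...| as ASCII bytes."""
    table = ",".join(f"{symbol}:{weight}" for symbol, weight in freqs.items())
    return f"HUF1|pad={pad_len}|freq={table}|".encode("ascii")


def parse_container(blob: bytes) -> tuple[int, dict[int, int], bytes]:
    """Split a HUF1 stream into (pad_len, freqs, payload)."""
    # The header contains exactly three '|' separators; maxsplit=3 keeps any
    # '|' bytes inside the binary payload intact.
    parts = blob.split(b"|", 3)
    if len(parts) != 4 or parts[0] != b"HUF1":
        raise ValueError("not a HUF1 stream")
    _, pad_field, freq_field, payload = parts
    if not (pad_field.startswith(b"pad=") and freq_field.startswith(b"freq=")):
        raise ValueError("malformed header fields")
    pad_len = int(pad_field[len(b"pad="):].decode("ascii"))
    freqs: dict[int, int] = {}
    for entry in freq_field[len(b"freq="):].decode("ascii").split(","):
        if entry:  # an empty table (empty input) yields one empty entry
            symbol, weight = entry.split(":")
            freqs[int(symbol)] = int(weight)
    return pad_len, freqs, payload


example = {104: 1, 101: 1, 108: 3, 111: 2}
blob = build_header(3, example) + b"\x8f|\x00"  # made-up payload bytes
print(parse_container(blob))
```

Splitting with `maxsplit=3` matters because the payload is arbitrary binary data and may itself contain the byte value of `|`.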

A more compact binary header format may be introduced later.


🛠 Development

To set up a local development environment:

git clone https://github.com/JesseVahlfors/jv_compression_tool.git
cd jv_compression_tool

python -m venv .venv
.\.venv\Scripts\activate          # Windows; Git Bash: source .venv/Scripts/activate; macOS/Linux: source .venv/bin/activate

pip install -r requirements.txt
pip install -e .

🧪 Testing

Run all tests:

pytest

Run fast tests only (skipping slow ones):

pytest -m "not slow"

Slow tests (e.g., those using a large “Les Misérables” file) are marked as follows; because of the skipif condition, they are skipped unless RUN_SLOW_TESTS is true:

@pytest.mark.slow
@pytest.mark.skipif(not RUN_SLOW_TESTS, reason="Slow test disabled")

📈 Performance Notes

Huffman compression works best when:

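  • Input is large enough that the frequency header is small relative to the encoded data
  • Symbol distribution is uneven (a few byte values occur much more often than others)
  • Individual byte values repeat often; Huffman coding works on single-byte frequencies, so repeated multi-byte sequences help only insofar as they skew those frequencies

Small files may grow slightly because the header overhead outweighs the savings; this is expected.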
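To see why the distribution matters more than raw repetition, the order-0 Shannon entropy of the byte frequencies gives a lower bound on the average code length of any byte-by-byte prefix code, Huffman included; a Huffman code stays within 1 bit per byte of that bound and uses at least 1 bit per symbol. The helper `entropy_bits_per_byte` below is hypothetical and not part of the package.

```python
import math
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Order-0 Shannon entropy of the byte frequencies, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    # sum of p * log2(1/p) over the distinct byte values
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


samples = {
    "uniform": bytes(range(256)) * 4,   # every byte value equally common
    "skewed": b"a" * 900 + b"b" * 100,  # one byte value dominates
    "repetitive": b"ab" * 500,          # very repetitive, but two equally common bytes
}
for name, data in samples.items():
    print(f"{name:>10}: {entropy_bits_per_byte(data):.3f} bits/byte")
```

The repetitive input sits at 1 bit per byte, the same as any input with two equally common byte values: its long-range repetition is invisible to a frequency-only coder. Uniformly distributed bytes need the full 8 bits per byte, so Huffman coding cannot shrink them and the header makes the output larger.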


🗺 Future Improvements

  • CLI tool (e.g., huff compress file.txt)
  • Binary header format
  • Streaming compression
  • Web/demo integration

📜 License

Licensed under the MIT License.


Download files

Source Distribution

jv_compression_tool-0.1.2.tar.gz (19.3 kB)

Built Distribution

jv_compression_tool-0.1.2-py3-none-any.whl (16.1 kB)

File details

Details for the file jv_compression_tool-0.1.2.tar.gz.

File metadata

  • Download URL: jv_compression_tool-0.1.2.tar.gz
  • Size: 19.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for jv_compression_tool-0.1.2.tar.gz:

  • SHA256: c644310b9a280400022fe6b534d64022e6249be40a84849056dc85efac3593ef
  • MD5: d09f0fff180fc8f653f3a02ea0f8ad35
  • BLAKE2b-256: c0f8b2bc692a2ffc2a80ff0d9f0ccad03c4506abe8b6bc9819304ec4ba7fc05a

File details

Details for the file jv_compression_tool-0.1.2-py3-none-any.whl.

File hashes

Hashes for jv_compression_tool-0.1.2-py3-none-any.whl:

  • SHA256: e7dc1aec81cf697c09093b0b706fcf970de3cd7f69da82fd3871419ac439a6a6
  • MD5: a593f84f4297f96a1817738d5bc7e686
  • BLAKE2b-256: f596c5efb449d7fbd1f00937b175d9f2bfa9ebff99aebd1c2aa8365bfe697819
