Huffman compression implemented in pure Python.
Compression Tool (Huffman Coding in Pure Python)
A lightweight, fully tested Huffman compression library with both byte-level and file-level APIs.
This project implements a complete Huffman compression pipeline from scratch, including:
- Frequency counting
- Min-heap construction
- Huffman tree generation
- Code map building
- Bit packing/unpacking
- Header encoding/decoding
- High-level compression & decompression
- File-based compressor and decompressor
- Full test suite (unit + integration)
The goal is to provide a clear, modular reference implementation that is easy to study, extend, and reuse, including on a future demo page.
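The core stages listed above (frequency counting, min-heap construction, tree generation, code map building) can be sketched in a few dozen lines. This is an illustrative sketch built on the standard library's `heapq` and `collections.Counter`, not the package's actual internals; the names `build_code_map`, `encode`, and `decode` are hypothetical, and the bit strings are kept as text for readability rather than packed into bytes.

```python
import heapq
from collections import Counter
from itertools import count


def build_code_map(data: bytes) -> dict[int, str]:
    """Return a {byte value: bit string} Huffman code map for `data`."""
    freq = Counter(data)                       # frequency counting
    if not freq:
        return {}
    if len(freq) == 1:                         # a lone symbol still needs one bit
        return {next(iter(freq)): "0"}

    tie = count()                              # tie-breaker so nodes are never compared
    # Heap entries are (weight, tie, node). A node is either a byte value
    # (leaf) or a (left, right) tuple (internal node).
    heap = [(weight, next(tie), sym) for sym, weight in freq.items()]
    heapq.heapify(heap)                        # min-heap construction

    while len(heap) > 1:                       # tree generation: merge the two lightest nodes
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (left, right)))

    codes: dict[int, str] = {}
    stack = [(heap[0][2], "")]
    while stack:                               # code map building: walk from root to leaves
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            codes[node] = prefix
    return codes


def encode(data: bytes, codes: dict[int, str]) -> str:
    """Concatenate the code of every input byte into one bit string."""
    return "".join(codes[b] for b in data)


def decode(bits: str, codes: dict[int, str]) -> bytes:
    """Invert `encode`; works because Huffman codes are prefix-free."""
    reverse = {code: sym for sym, code in codes.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return bytes(out)


data = b"hello world"
codes = build_code_map(data)
assert decode(encode(data, codes), codes) == data
```

More frequent bytes never receive longer codes than rarer ones, which is where the size reduction comes from.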
Features
- Pure Python implementation
- Fully tested with pytest
- File compression support
- Round-trip safe
- Simple API:
  - compress_bytes(data: bytes) -> bytes
  - decompress_bytes(data: bytes) -> bytes
  - compress_file(path) -> CompressionResult
  - decompress_file(path) -> DecompressionResult
- Modular internal structure
- Published as a PyPI package
Installation
Install from PyPI:
pip install jv-compression-tool
Then:
import compression_tool
Usage
1. Compressing bytes
from compression_tool import compress_bytes, decompress_bytes
data = b"hello world"
compressed = compress_bytes(data)
restored = decompress_bytes(compressed)
assert restored == data
2. Compressing files
from pathlib import Path
from compression_tool import compress_file, decompress_file
result = compress_file("example.txt")
print("Original size:", result.original_size)
print("Compressed size:", result.compressed_size)
decomp = decompress_file(result.output_path)
assert decomp.output_path.read_bytes() == Path("example.txt").read_bytes()
The file APIs return simple dataclasses:
- CompressionResult
- DecompressionResult
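For orientation, the result types might look roughly like the sketch below. Only the attribute names `output_path`, `original_size`, and `compressed_size` come from the usage example above; the `frozen=True` setting, the `ratio` property, the `.huf` suffix, and the exact field set of `DecompressionResult` are assumptions made for illustration, so check the package source for the real definitions.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CompressionResult:
    output_path: Path       # where the compressed file was written
    original_size: int      # size of the input file, in bytes
    compressed_size: int    # size of the output file, in bytes

    @property
    def ratio(self) -> float:
        """Hypothetical helper: compressed size as a fraction of the original."""
        return self.compressed_size / self.original_size if self.original_size else 0.0


@dataclass(frozen=True)
class DecompressionResult:
    output_path: Path       # where the restored file was written


# The file name and sizes here are made up for the example.
result = CompressionResult(Path("example.txt.huf"), original_size=1000, compressed_size=600)
print(f"{result.ratio:.0%}")  # 60%
```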
Project Structure
The project uses a src/ layout for clean packaging:
jv_compression_tool/
├── pyproject.toml
├── README.md
├── LICENSE
├── src/
│   └── compression_tool/
│       ├── __init__.py
│       ├── compressor.py
│       ├── decompressor.py
│       ├── file_compressor.py
│       ├── file_decompressor.py
│       ├── frequency.py
│       ├── tree.py
│       ├── build_tree.py
│       ├── code_map.py
│       ├── lookup.py
│       ├── header.py
│       └── utils/
│           ├── heapify.py
│           └── bitutils.py
└── tests/
    ├── test_header.py
    ├── test_heapify.py
    ├── test_build_tree.py
    ├── test_compressor.py
    ├── test_decompressor.py
    ├── test_file_io.py
    └── data/
        └── test.txt
(Test file names are illustrative; your exact structure may differ.)
Header Format
This project currently uses a simple text-based header:
HUF1|pad=<pad_len>|freq=symbol:weight,...|
- HUF1: Magic string + version tag
- pad: Number of padding bits added to the final byte
- freq: A comma-separated table of <symbol>:<weight>
  - Symbols use their integer byte value (e.g. 104 = b"h")
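The `pad` field exists because an encoded bit stream rarely ends on a byte boundary. A minimal sketch of how padding could arise when packing bits is shown below; the helper names `pack_bits` and `unpack_bits`, the zero-fill, and the most-significant-bit-first ordering are assumptions for illustration and may differ from what `bitutils.py` does.

```python
def pack_bits(bits: str) -> tuple[bytes, int]:
    """Pack a string of '0'/'1' characters into bytes.

    Returns (packed, pad_len), where pad_len is the number of zero bits
    appended to reach a multiple of 8.
    """
    pad_len = (-len(bits)) % 8
    padded = bits + "0" * pad_len
    packed = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return packed, pad_len


def unpack_bits(packed: bytes, pad_len: int) -> str:
    """Expand bytes back into a bit string and drop the trailing padding."""
    bits = "".join(f"{byte:08b}" for byte in packed)
    return bits[:len(bits) - pad_len] if pad_len else bits


packed, pad_len = pack_bits("10110")   # 5 bits need 3 padding bits
assert (packed, pad_len) == (b"\xb0", 3)
assert unpack_bits(packed, pad_len) == "10110"
```

Without the recorded padding length, a decoder could misread the filler bits as extra symbols.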
Example
HUF1|pad=3|freq=104:1,101:1,108:3,111:2|
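A header in this layout can be produced and parsed with plain string handling. The sketch below is one possible reading of the format, not the package's `header.py`: the function names `build_header` and `parse_header` are hypothetical, and it assumes the compressed payload follows immediately after the third `|`.

```python
MAGIC = "HUF1"


def build_header(pad_len: int, freq: dict[int, int]) -> bytes:
    """Serialise the padding length and frequency table as ASCII text."""
    table = ",".join(f"{sym}:{weight}" for sym, weight in freq.items())
    return f"{MAGIC}|pad={pad_len}|freq={table}|".encode("ascii")


def parse_header(blob: bytes) -> tuple[int, dict[int, int], bytes]:
    """Split `blob` into (pad_len, freq, payload)."""
    # Split on the first three '|' only, so '|' bytes inside the binary
    # payload are left untouched.
    parts = blob.split(b"|", 3)
    if len(parts) != 4 or parts[0] != MAGIC.encode("ascii"):
        raise ValueError("not a HUF1 stream")
    _, pad_field, freq_field, payload = parts
    if not pad_field.startswith(b"pad=") or not freq_field.startswith(b"freq="):
        raise ValueError("malformed header")

    pad_len = int(pad_field[len(b"pad="):].decode("ascii"))
    freq: dict[int, int] = {}
    table = freq_field[len(b"freq="):].decode("ascii")
    for entry in filter(None, table.split(",")):   # skip the empty table case
        sym, weight = entry.split(":")
        freq[int(sym)] = int(weight)
    return pad_len, freq, payload


header = build_header(3, {104: 1, 101: 1, 108: 3, 111: 2})
assert header == b"HUF1|pad=3|freq=104:1,101:1,108:3,111:2|"
assert parse_header(header + b"|\x00") == (3, {104: 1, 101: 1, 108: 3, 111: 2}, b"|\x00")
```

Writing symbols as integers rather than raw bytes means a `|`, `,`, or `:` in the input data cannot collide with the header's delimiters.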
A more compact binary header format may be introduced later.
Development
To set up a local development environment:
git clone https://github.com/JesseVahlfors/jv_compression_tool.git
cd jv_compression_tool
python -m venv .venv
.\.venv\Scripts\activate # Windows; Git Bash: source .venv/Scripts/activate; Linux/macOS: source .venv/bin/activate
pip install -r requirements.txt
pip install -e .
Testing
Run all tests:
pytest
Run fast tests only (skipping slow ones):
pytest -m "not slow"
Slow tests (e.g., using a large "Les Misérables" file) are marked:
@pytest.mark.slow
@pytest.mark.skipif(not RUN_SLOW_TESTS, reason="Slow test disabled")
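The markers above rely on a `RUN_SLOW_TESTS` flag whose definition is not shown here. One common way to provide such a flag is an environment variable; the sketch below assumes a variable of the same name and a hypothetical helper `slow_tests_enabled`, and may not match how the test suite actually defines it.

```python
import os


def slow_tests_enabled(environ=None) -> bool:
    """Return True when the (assumed) RUN_SLOW_TESTS variable is set to a truthy value."""
    env = os.environ if environ is None else environ
    return env.get("RUN_SLOW_TESTS", "").strip().lower() in {"1", "true", "yes"}


RUN_SLOW_TESTS = slow_tests_enabled()

# In a test module, the flag would then feed the markers shown above:
#
#   @pytest.mark.slow
#   @pytest.mark.skipif(not RUN_SLOW_TESTS, reason="Slow test disabled")
#   def test_large_file_round_trip(): ...
```

pytest warns about unknown markers, so a custom `slow` marker is normally declared under `markers` in the pytest configuration.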
Performance Notes
Huffman compression works best when:
- Input is large
- Symbol distribution is uneven
- Repetition exists in the data
Small files may grow slightly due to header overhead; this is expected.
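The effect of an uneven symbol distribution can be estimated before compressing anything. The sketch below computes the Shannon entropy of the byte distribution, which is a lower bound on the average Huffman code length (the Huffman average is always less than one bit above it). The helper name `entropy_bits_per_byte` is hypothetical and is not part of the package's API.

```python
import math
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per input byte."""
    if not data:
        return 0.0
    n = len(data)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(data).values())


skewed = b"a" * 90 + b"b" * 10   # very uneven distribution
uniform = bytes(range(256))      # every byte value exactly once

print(round(entropy_bits_per_byte(skewed), 3))  # well below 8 bits per byte
print(entropy_bits_per_byte(uniform))           # 8.0
```

An entropy close to 8 bits per byte means Huffman coding has almost nothing to gain, and the header overhead will make the output larger than the input. Because every symbol needs at least one whole bit, the payload also cannot shrink below one eighth of the input size, however skewed the distribution.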
Future Improvements
- CLI tool (e.g., huff compress file.txt)
- Binary header format
- Streaming compression
- Web/demo integration
License
Licensed under the MIT License.