
Python wrapper around HDiffPatch C++ library for efficient binary diff/patch operations.

Project description

hdiffpatch-python is a Python wrapper around the HDiffPatch C++ library, providing binary diff and patch operations with compression support.


Installation

hdiffpatch requires Python >=3.9 and can be installed via:

pip install hdiffpatch

For development installation:

git clone https://github.com/BrianPugh/hdiffpatch-python.git
cd hdiffpatch-python
poetry install

Quick Start

hdiffpatch primarily provides two simple functions:

  • diff for creating a patch.
  • apply for applying a patch.

Basic Usage

import hdiffpatch

# Create binary data
old_data = b"Hello, world!"
new_data = b"Hello, HDiffPatch!"

# Create a diff
diff = hdiffpatch.diff(old_data, new_data)

# Apply the diff
result = hdiffpatch.apply(old_data, diff)
assert result == new_data

With Simple Compression

import hdiffpatch

old_data = b"Large binary data..." * 1000
new_data = b"Modified binary data..." * 1000

# Create a compressed diff
diff = hdiffpatch.diff(old_data, new_data, compression="zlib")

# Apply patch
result = hdiffpatch.apply(old_data, diff)
assert result == new_data

With Advanced Compression Configuration

import hdiffpatch

old_data = b"Large binary data..." * 1000
new_data = b"Modified binary data..." * 1000

# Use configuration classes for fine-grained control
config = hdiffpatch.ZlibConfig(level=9, window=12)

diff = hdiffpatch.diff(old_data, new_data, compression=config)
result = hdiffpatch.apply(old_data, diff)
assert result == new_data

Recompressing Diffs

import hdiffpatch

old_data = b"Large binary data..." * 1000
new_data = b"Modified binary data..." * 1000

# Create a diff with zlib compression
diff_zlib = hdiffpatch.diff(old_data, new_data, compression="zlib")

# Recompress the same diff with zstd
diff_zstd = hdiffpatch.recompress(diff_zlib, compression="zstd")

# Remove compression entirely
diff_uncompressed = hdiffpatch.recompress(diff_zlib, compression="none")

# Both diffs produce the same result when applied
result1 = hdiffpatch.apply(old_data, diff_zlib)
result2 = hdiffpatch.apply(old_data, diff_zstd)
assert result1 == result2 == new_data

API Reference

Core Functions

def diff(old_data, new_data, compression="none", *, validate=True) -> bytes

Create a binary diff between two byte sequences.

Parameters:

  • old_data (bytes): Original data.
  • new_data (bytes): Modified data.
  • compression (str or config object): Compression type as string ("none", "zlib", "lzma", "lzma2", "zstd", "bzip2", "tamp") or a compression configuration object.
  • validate (bool): Verify that the generated patch reconstructs new_data from old_data before returning. This check is computationally inexpensive. Defaults to True.

Returns: bytes - Binary diff data that can be used with apply() and old_data to generate new_data.


def apply(old_data, diff_data) -> bytes

Apply a binary patch to reconstruct new data.

Parameters:

  • old_data (bytes): Original data.
  • diff_data (bytes): Patch data from diff().

Returns: bytes - The reconstructed data, i.e. the new_data that was originally passed to diff().


def recompress(diff_data, compression=None) -> bytes

Recompress a diff with a different compression algorithm.

Parameters:

  • diff_data (bytes): The diff data to recompress.
  • compression (str or config object, optional): Target compression type as string ("none", "zlib", "lzma", "lzma2", "zstd", "bzip2", "tamp") or a compression configuration object. If None, removes compression.

Returns: bytes - The recompressed diff data.

Compression Configuration

For advanced compression control, hdiffpatch provides configuration classes for each compression algorithm:

ZStdConfig

Fine-grained control over Zstandard compression:

# Basic configuration
config = hdiffpatch.ZStdConfig(level=15, window=20, workers=2)

# Preset configurations
config = hdiffpatch.ZStdConfig.fast()             # Optimized for speed
config = hdiffpatch.ZStdConfig.balanced()         # Balanced speed/compression
config = hdiffpatch.ZStdConfig.best_compression() # Maximum compression
config = hdiffpatch.ZStdConfig.minimal_memory()   # Minimal memory usage

# Use with diff
diff = hdiffpatch.diff(old_data, new_data, compression=config)

Parameters:

  • level (1-22): Compression level, higher = better compression
  • window (10-27): Window size as log2, larger = better compression
  • workers (0-200): Number of threads, 0 = single-threaded

ZlibConfig

Fine-grained control over zlib compression:

# Basic configuration
config = hdiffpatch.ZlibConfig(
    level=9,
    memory_level=8,
    window=15,
    strategy=hdiffpatch.ZlibStrategy.DEFAULT
)

# Preset configurations
config = hdiffpatch.ZlibConfig.fast()
config = hdiffpatch.ZlibConfig.balanced()
config = hdiffpatch.ZlibConfig.best_compression()
config = hdiffpatch.ZlibConfig.minimal_memory()
config = hdiffpatch.ZlibConfig.png_optimized()    # Optimized for PNG-like data

Parameters:

  • level (0-9): Compression level
  • memory_level (1-9): Memory usage level
  • window (9-15): Window size as power of 2
  • strategy: Compression strategy (DEFAULT, FILTERED, HUFFMAN_ONLY, RLE, FIXED)

LzmaConfig and Lzma2Config

Fine-grained control over LZMA compression:

# LZMA configuration
config = hdiffpatch.LzmaConfig(level=9, window=23, thread_num=1)

# LZMA2 configuration (supports more threads)
config = hdiffpatch.Lzma2Config(level=9, window=23, thread_num=4)

# Preset configurations available for both
config = hdiffpatch.LzmaConfig.fast()
config = hdiffpatch.LzmaConfig.balanced()
config = hdiffpatch.LzmaConfig.best_compression()
config = hdiffpatch.LzmaConfig.minimal_memory()

Parameters:

  • level (0-9): Compression level
  • window (12-30): Window size as log2
  • thread_num: Number of threads (1-2 for LZMA, 1-64 for LZMA2)

BZip2Config

Fine-grained control over bzip2 compression:

config = hdiffpatch.BZip2Config(level=9, work_factor=30)

# Preset configurations
config = hdiffpatch.BZip2Config.fast()
config = hdiffpatch.BZip2Config.balanced()
config = hdiffpatch.BZip2Config.best_compression()
config = hdiffpatch.BZip2Config.minimal_memory()

Parameters:

  • level (1-9): Compression level
  • work_factor (0-250): Work factor for worst-case scenarios

TampConfig

Fine-grained control over Tamp compression (embedded-friendly):

config = hdiffpatch.TampConfig(window=10)

# Preset configurations
config = hdiffpatch.TampConfig.fast()
config = hdiffpatch.TampConfig.balanced()
config = hdiffpatch.TampConfig.best_compression()
config = hdiffpatch.TampConfig.minimal_memory()

Parameters:

  • window (8-15): Window size as power of 2

Exceptions

hdiffpatch.HDiffPatchError

Compression Performance

Different compression algorithms offer trade-offs between compression ratio and speed:

  • zlib: Good balance of speed and compression. Very common.
  • zstd: Fast compression with good ratios.
  • lzma/lzma2: Very high compression ratios, slower.
  • bzip2: Good compression, moderate speed.
  • tamp: Embedded-friendly compression, minimal memory usage.

Basic Compression Comparison

import hdiffpatch

# Large repetitive data
old_data = b"A" * 10000 + b"B" * 10000
new_data = b"A" * 10000 + b"C" * 10000

# Compare compression effectiveness
for compression in ["none", "zlib", "zstd", "lzma", "bzip2", "tamp"]:
    diff = hdiffpatch.diff(old_data, new_data, compression=compression)
    print(f"{compression}: {len(diff)} bytes")

Advanced Configuration Comparison

import hdiffpatch

old_data = b"Large binary data..." * 1000
new_data = b"Modified binary data..." * 1000

# Compare different configuration approaches
configs = {
    "zstd_fast": hdiffpatch.ZStdConfig.fast(),
    "zstd_best": hdiffpatch.ZStdConfig.best_compression(),
    "zlib_balanced": hdiffpatch.ZlibConfig.balanced(),
    "lzma2_custom": hdiffpatch.Lzma2Config(level=6, window=20, thread_num=4),
}

for name, config in configs.items():
    diff = hdiffpatch.diff(old_data, new_data, compression=config)
    print(f"{name}: {len(diff)} bytes")

Real-World Example: MicroPython Firmware

Here's a comprehensive comparison using actual MicroPython firmware files with a 12-bit window size (4096 bytes). This window size was chosen because it is typically a good trade-off between memory usage and compression performance for embedded targets.

Since we're using compression for the diff, a natural question would be: "If I'm adding a decompression library to my target project, then how much smaller is the patch compared to just compressing the firmware?" To answer this question, we compare the size of the compressed patch to the compressed firmware.

| Algorithm | Size (HDiffPatch) | Size (firmware) | Improvement |
|-----------|-------------------|-----------------|-------------|
| none      | 209.7 KB          | 652.0 KB        | 3.11x       |
| tamp      | 143.1 KB          | 322.8 KB        | 2.26x       |
| zstd      | 133.4 KB          | 277.6 KB        | 2.08x       |
| zlib      | 125.5 KB          | 251.8 KB        | 2.01x       |
| bzip2     | 128.6 KB          | 246.2 KB        | 1.91x       |
| lzma      | 116.9 KB          | 222.7 KB        | 1.91x       |

In this example, using hdiffpatch resulted in a ~3x smaller update when compared to a naive uncompressed firmware update, and ~2x smaller when comparing against an equivalently-compressed firmware update.

To reproduce these results:

poetry run python tools/micropython-binary-demo.py
