Python wrapper around HDiffPatch C++ library for efficient binary diff/patch operations.
hdiffpatch-python is a Python wrapper around the HDiffPatch C++ library, providing binary diff and patch operations with compression support.
Installation
hdiffpatch requires Python >=3.9 and can be installed via:
pip install hdiffpatch
For development installation:
git clone https://github.com/BrianPugh/hdiffpatch-python.git
cd hdiffpatch-python
poetry install
Quick Start
hdiffpatch primarily provides 2 simple functions:
- diff for creating a patch.
- apply for applying a patch.
Basic Usage
import hdiffpatch
# Create binary data
old_data = b"Hello, world!"
new_data = b"Hello, HDiffPatch!"
# Create a diff
diff = hdiffpatch.diff(old_data, new_data)
# Apply the diff
result = hdiffpatch.apply(old_data, diff)
assert result == new_data
With Simple Compression
import hdiffpatch
old_data = b"Large binary data..." * 1000
new_data = b"Modified binary data..." * 1000
# Create a compressed diff
diff = hdiffpatch.diff(old_data, new_data, compression="zlib")
# Apply patch
result = hdiffpatch.apply(old_data, diff)
assert result == new_data
With Advanced Compression Configuration
import hdiffpatch
old_data = b"Large binary data..." * 1000
new_data = b"Modified binary data..." * 1000
# Use configuration classes for fine-grained control
config = hdiffpatch.ZlibConfig(level=9, window=12)
diff = hdiffpatch.diff(old_data, new_data, compression=config)
result = hdiffpatch.apply(old_data, diff)
assert result == new_data
Recompressing Diffs
import hdiffpatch
old_data = b"Large binary data..." * 1000
new_data = b"Modified binary data..." * 1000
# Create a diff with zlib compression
diff_zlib = hdiffpatch.diff(old_data, new_data, compression="zlib")
# Recompress the same diff with zstd
diff_zstd = hdiffpatch.recompress(diff_zlib, compression="zstd")
# Remove compression entirely
diff_uncompressed = hdiffpatch.recompress(diff_zlib, compression="none")
# Both diffs produce the same result when applied
result1 = hdiffpatch.apply(old_data, diff_zlib)
result2 = hdiffpatch.apply(old_data, diff_zstd)
assert result1 == result2 == new_data
API Reference
Core Functions
def diff(old_data, new_data, compression="none", *, validate=True) -> bytes
Create a binary diff between two byte sequences.
Parameters:
- old_data (bytes): Original data.
- new_data (bytes): Modified data.
- compression (str or config object): Compression type as a string ("none", "zlib", "lzma", "lzma2", "zstd", "bzip2", "tamp") or a compression configuration object.
- validate (bool): Test that the patch successfully converts old_data to new_data. This is a computationally inexpensive operation. Defaults to True.
Returns: bytes - Binary diff data that can be used with apply() and old_data to generate new_data.
def apply(old_data, diff_data) -> bytes
Apply a binary patch to reconstruct new data.
Parameters:
- old_data (bytes): Original data.
- diff_data (bytes): Patch data from diff().
Returns: bytes - Reconstructed data. The new_data that was passed to diff().
def recompress(diff_data, compression=None) -> bytes
Recompress a diff with a different compression algorithm.
Parameters:
- diff_data (bytes): The diff data to recompress.
- compression (str or config object, optional): Target compression type as a string ("none", "zlib", "lzma", "lzma2", "zstd", "bzip2", "tamp") or a compression configuration object. If None, removes compression.
Returns: bytes - The recompressed diff data
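The core functions operate on in-memory bytes. The following is a minimal sketch of a file-based workflow built only on diff() and apply() as documented above, assuming both files fit in memory. The helper names create_patch_file and apply_patch_file are hypothetical and not part of hdiffpatch; the import is deferred into each function only so the helpers can be defined without the package installed.

```python
from pathlib import Path


def create_patch_file(old_path, new_path, patch_path, compression="zlib"):
    """Hypothetical helper: diff two files and write the patch to disk."""
    import hdiffpatch  # deferred import; requires `pip install hdiffpatch`

    old_data = Path(old_path).read_bytes()
    new_data = Path(new_path).read_bytes()
    patch = hdiffpatch.diff(old_data, new_data, compression=compression)
    Path(patch_path).write_bytes(patch)
    return len(patch)  # patch size in bytes


def apply_patch_file(old_path, patch_path, out_path):
    """Hypothetical helper: rebuild the new file from the old file and a patch."""
    import hdiffpatch  # deferred import, as above

    old_data = Path(old_path).read_bytes()
    patch = Path(patch_path).read_bytes()
    Path(out_path).write_bytes(hdiffpatch.apply(old_data, patch))
```

A caller would run create_patch_file on the build machine, ship only the patch file, and run apply_patch_file where the old file already exists.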
Compression Configuration
For advanced compression control, hdiffpatch provides configuration classes for each compression algorithm:
ZStdConfig
Fine-grained control over Zstandard compression:
# Basic configuration
config = hdiffpatch.ZStdConfig(level=15, window=20, workers=2)
# Preset configurations
config = hdiffpatch.ZStdConfig.fast() # Optimized for speed
config = hdiffpatch.ZStdConfig.balanced() # Balanced speed/compression
config = hdiffpatch.ZStdConfig.best_compression() # Maximum compression
config = hdiffpatch.ZStdConfig.minimal_memory() # Minimal memory usage
# Use with diff
diff = hdiffpatch.diff(old_data, new_data, compression=config)
Parameters:
- level (1-22): Compression level, higher = better compression
- window (10-27): Window size as log2, larger = better compression
- workers (0-200): Number of threads, 0 = single-threaded
ZlibConfig
Fine-grained control over zlib compression:
# Basic configuration
config = hdiffpatch.ZlibConfig(
    level=9,
    memory_level=8,
    window=15,
    strategy=hdiffpatch.ZlibStrategy.DEFAULT,
)
# Preset configurations
config = hdiffpatch.ZlibConfig.fast()
config = hdiffpatch.ZlibConfig.balanced()
config = hdiffpatch.ZlibConfig.best_compression()
config = hdiffpatch.ZlibConfig.minimal_memory()
config = hdiffpatch.ZlibConfig.png_optimized() # Optimized for PNG-like data
Parameters:
- level (0-9): Compression level
- memory_level (1-9): Memory usage level
- window (9-15): Window size as a power of 2
- strategy: Compression strategy (DEFAULT, FILTERED, HUFFMAN_ONLY, RLE, FIXED)
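These parameter names and ranges mirror zlib's own deflate settings. As an illustration only, and assuming ZlibConfig forwards its values to zlib unchanged (the text above does not state this), the same knobs can be explored with Python's standard zlib module, independent of hdiffpatch. The helper name deflate is hypothetical.

```python
import zlib

data = b"Large binary data..." * 1000


def deflate(payload, level=9, window=15, memory_level=8,
            strategy=zlib.Z_DEFAULT_STRATEGY):
    """Compress with explicit zlib settings using only the standard library."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, window, memory_level, strategy)
    return compressor.compress(payload) + compressor.flush()


default_blob = deflate(data)
small_window_blob = deflate(data, window=12, memory_level=1)
rle_blob = deflate(data, strategy=zlib.Z_RLE)

for name, blob in [("default", default_blob),
                   ("window=12, memory_level=1", small_window_blob),
                   ("strategy=RLE", rle_blob)]:
    # Every variant must decompress back to the original input.
    assert zlib.decompress(blob) == data
    print(f"{name}: {len(blob)} bytes")
```

Smaller window and memory_level values reduce the memory needed on both ends, usually at some cost in output size, which is why they matter for embedded targets.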
LzmaConfig and Lzma2Config
Fine-grained control over LZMA compression:
# LZMA configuration
config = hdiffpatch.LzmaConfig(level=9, window=23, thread_num=1)
# LZMA2 configuration (supports more threads)
config = hdiffpatch.Lzma2Config(level=9, window=23, thread_num=4)
# Preset configurations available for both
config = hdiffpatch.LzmaConfig.fast()
config = hdiffpatch.LzmaConfig.balanced()
config = hdiffpatch.LzmaConfig.best_compression()
config = hdiffpatch.LzmaConfig.minimal_memory()
Parameters:
- level (0-9): Compression level
- window (12-30): Window size as log2
- thread_num: Number of threads (1-2 for LZMA, 1-64 for LZMA2)
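To illustrate what a log2 window means for LZMA, the sketch below uses Python's standard lzma module rather than hdiffpatch. It assumes that window corresponds to an LZMA dictionary size of 2**window bytes; the text above does not confirm that mapping, and the helper name xz_compress is hypothetical.

```python
import lzma

data = b"Large binary data..." * 1000


def xz_compress(payload, level=9, window=23):
    """Compress with an explicit LZMA2 preset and dictionary size (stdlib only)."""
    filters = [{
        "id": lzma.FILTER_LZMA2,
        "preset": level,            # source of defaults for unspecified options
        "dict_size": 1 << window,   # dictionary size in bytes (assumed mapping)
    }]
    return lzma.compress(payload, format=lzma.FORMAT_XZ, filters=filters)


blob = xz_compress(data, level=9, window=23)
assert lzma.decompress(blob) == data
print(f"{len(data)} bytes -> {len(blob)} bytes")
```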
BZip2Config
Fine-grained control over bzip2 compression:
config = hdiffpatch.BZip2Config(level=9, work_factor=30)
# Preset configurations
config = hdiffpatch.BZip2Config.fast()
config = hdiffpatch.BZip2Config.balanced()
config = hdiffpatch.BZip2Config.best_compression()
config = hdiffpatch.BZip2Config.minimal_memory()
Parameters:
- level (1-9): Compression level
- work_factor (0-250): Work factor for worst-case scenarios
TampConfig
Fine-grained control over Tamp compression (embedded-friendly):
config = hdiffpatch.TampConfig(window=10)
# Preset configurations
config = hdiffpatch.TampConfig.fast()
config = hdiffpatch.TampConfig.balanced()
config = hdiffpatch.TampConfig.best_compression()
config = hdiffpatch.TampConfig.minimal_memory()
Parameters:
- window (8-15): Window size as a power of 2
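A window given as a power of 2 corresponds to 2**window bytes, so the 12-bit window used in the MicroPython firmware comparison later in this document is 4096 bytes. A short sketch of that arithmetic over the TampConfig range (the helper name window_bytes is hypothetical):

```python
def window_bytes(window):
    """Convert a log2 window setting to a size in bytes."""
    return 1 << window


for window in range(8, 16):  # TampConfig accepts 8 through 15
    print(f"window={window}: {window_bytes(window)} bytes")
```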
Exceptions
hdiffpatch.HDiffPatchError
Compression Performance
Different compression algorithms offer trade-offs between compression ratio and speed:
- zlib: Good balance of speed and compression. Very common.
- zstd: Fast compression with good ratios.
- lzma/lzma2: Very high compression ratios, slower.
- bzip2: Good compression, moderate speed.
- tamp: Embedded-friendly compression, minimal memory usage.
Basic Compression Comparison
import hdiffpatch
# Large repetitive data
old_data = b"A" * 10000 + b"B" * 10000
new_data = b"A" * 10000 + b"C" * 10000
# Compare compression effectiveness
for compression in ["none", "zlib", "zstd", "lzma", "bzip2", "tamp"]:
    diff = hdiffpatch.diff(old_data, new_data, compression=compression)
    print(f"{compression}: {len(diff)} bytes")
Advanced Configuration Comparison
import hdiffpatch
# Compare different configuration approaches
# (old_data and new_data are defined as in the previous example)
configs = {
    "zstd_fast": hdiffpatch.ZStdConfig.fast(),
    "zstd_best": hdiffpatch.ZStdConfig.best_compression(),
    "zlib_balanced": hdiffpatch.ZlibConfig.balanced(),
    "lzma2_custom": hdiffpatch.Lzma2Config(level=6, window=20, thread_num=4),
}
for name, config in configs.items():
    diff = hdiffpatch.diff(old_data, new_data, compression=config)
    print(f"{name}: {len(diff)} bytes")
Real-World Example: MicroPython Firmware
Here's a comprehensive comparison using actual MicroPython firmware files with a 12-bit window size (4096 bytes). This window size was chosen because it is typically a good trade-off between memory usage and compression performance for embedded targets.
- RPI_PICO-20241129-v1.24.1.uf2: 651 KB
- RPI_PICO-20250415-v1.25.0.uf2: 652 KB
Since we're using compression for the diff, a natural question would be: "If I'm adding a decompression library to my target project, then how much smaller is the patch compared to just compressing the firmware?" To answer this question, we compare the size of the compressed patch to the compressed firmware.
| Algorithm | Patch size (HDiffPatch) | Firmware size (compressed) | Improvement |
|---|---|---|---|
| none | 209.7 KB | 652.0 KB | 3.11x |
| tamp | 143.1 KB | 322.8 KB | 2.26x |
| zstd | 133.4 KB | 277.6 KB | 2.08x |
| zlib | 125.5 KB | 251.8 KB | 2.01x |
| bzip2 | 128.6 KB | 246.2 KB | 1.91x |
| lzma | 116.9 KB | 222.7 KB | 1.91x |
In this example, using hdiffpatch resulted in a ~3x smaller update when compared to a naive uncompressed firmware update, and ~2x smaller when comparing against an equivalently-compressed firmware update.
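The Improvement column is the firmware size divided by the patch size for the same algorithm. A short sketch that recomputes it from the table values above (sizes in KB, copied from the table; the helper name improvement is hypothetical):

```python
# (patch size, firmware size) in KB, copied from the table above
sizes_kb = {
    "none": (209.7, 652.0),
    "tamp": (143.1, 322.8),
    "zstd": (133.4, 277.6),
    "zlib": (125.5, 251.8),
    "bzip2": (128.6, 246.2),
    "lzma": (116.9, 222.7),
}


def improvement(patch_kb, firmware_kb):
    """How many times smaller the patch is than the firmware image."""
    return firmware_kb / patch_kb


for algorithm, (patch_kb, firmware_kb) in sizes_kb.items():
    print(f"{algorithm}: {improvement(patch_kb, firmware_kb):.2f}x")
```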
To reproduce these results:
poetry run python tools/micropython-binary-demo.py