
Python implementation of VCDIFF (RFC 3284) delta compression format


VCDIFF Python Decoder

A Python library and command-line tool for decoding VCDIFF (RFC 3284) deltas, the format used for efficient binary differencing and compression.

Overview

The library provides a VCDIFF decoder that can decode delta files created according to RFC 3284 - The VCDIFF Generic Differencing and Compression Data Format. VCDIFF is a format for expressing one data stream as a variant of another data stream, commonly used for binary differencing, compression, and patch applications.
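VCDIFF rebuilds the target from three kinds of instructions: ADD appends literal bytes, COPY copies bytes that are already available (from the source segment or from earlier target output), and RUN repeats a single byte. The toy model below is a minimal sketch of that idea and is not this library's implementation. The `Add`, `Copy`, `Run` classes and `apply_instructions` are hypothetical names, the whole source is assumed to be the window's source segment, and instruction code tables, address encoding and window headers are ignored.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Add:
    data: bytes  # literal bytes appended to the target


@dataclass
class Copy:
    size: int
    addr: int  # offset into the source segment followed by the target built so far


@dataclass
class Run:
    size: int
    byte: int  # one byte value (0-255) repeated `size` times


Instruction = Union[Add, Copy, Run]


def apply_instructions(source: bytes, instructions: List[Instruction]) -> bytes:
    """Toy model of one VCDIFF window: rebuild the target from instructions."""
    target = bytearray()
    for inst in instructions:
        if isinstance(inst, Add):
            target += inst.data
        elif isinstance(inst, Run):
            target += bytes([inst.byte]) * inst.size
        else:
            if inst.addr < 0:
                raise ValueError("negative COPY address")
            # Copy byte by byte so a COPY that reaches into the target can
            # overlap the bytes it is itself producing.
            for i in range(inst.size):
                pos = inst.addr + i
                if pos < len(source):
                    target.append(source[pos])
                elif pos - len(source) < len(target):
                    target.append(target[pos - len(source)])
                else:
                    raise ValueError("COPY reads past the decoded data")
    return bytes(target)


if __name__ == "__main__":
    delta = [Copy(size=6, addr=0), Add(b"there"), Run(size=3, byte=ord("!"))]
    print(apply_instructions(b"hello world", delta))  # b'hello there!!!'
```

A COPY whose address points past the end of the source segment reads from the target being built, and because bytes are copied one at a time it may overlap its own output: with an empty source, ADD b"ab" followed by a 4-byte COPY from address 0 yields b"ababab".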

Features

  • Python Library: RFC 3284-compliant VCDIFF decoding with a clean, Pythonic API
  • Command-Line Tool: Apply deltas and inspect VCDIFF file structure
  • Full Instruction Support: Decodes all VCDIFF instruction types (ADD, COPY, RUN)
  • Address Caching: Implements the RFC 3284 near and same address caches used to decode COPY addresses
  • Checksum Validation: Full Adler-32 checksum validation support
  • Robust Error Handling: Detailed error messages for debugging malformed files
  • Extensive Testing: 85 test cases with reference implementation validation
  • Type Safety: Full type hints for better development experience
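COPY addresses are not stored verbatim: RFC 3284 encodes them relative to a small cache of recently used addresses. The sketch below models that cache with the RFC's default sizes (4 near slots and 3 same blocks of 256 entries). `AddressCache` and its `decode(mode, here, value)` method are hypothetical names, not this library's API; `here` is assumed to be the current position in the source-segment-plus-target address space, and `value` is assumed to have been read from the address section already.

```python
from typing import List


class AddressCache:
    """Near/same address cache from RFC 3284 (default s_near=4, s_same=3)."""

    VCD_SELF = 0  # address stored as-is
    VCD_HERE = 1  # address stored as a backwards offset from `here`

    def __init__(self, s_near: int = 4, s_same: int = 3) -> None:
        self.s_near = s_near
        self.s_same = s_same
        self.near: List[int] = [0] * s_near
        self.same: List[int] = [0] * (s_same * 256)
        self.next_slot = 0

    def _update(self, addr: int) -> None:
        # Every decoded address is remembered in both caches.
        if self.s_near > 0:
            self.near[self.next_slot] = addr
            self.next_slot = (self.next_slot + 1) % self.s_near
        if self.s_same > 0:
            self.same[addr % (self.s_same * 256)] = addr

    def decode(self, mode: int, here: int, value: int) -> int:
        """Turn an encoded COPY address into an absolute address.

        `value` is the integer read for the SELF, HERE and near modes,
        or the single byte read for the same modes.
        """
        if not 0 <= mode < 2 + self.s_near + self.s_same:
            raise ValueError(f"invalid address mode {mode}")
        if mode == self.VCD_SELF:
            addr = value
        elif mode == self.VCD_HERE:
            addr = here - value
        elif mode < 2 + self.s_near:
            # Near mode: offset from one of the last s_near addresses.
            addr = self.near[mode - 2] + value
        else:
            # Same mode: the byte selects an exact cached address.
            if not 0 <= value <= 255:
                raise ValueError("same-mode value must be a single byte")
            m = mode - (2 + self.s_near)
            addr = self.same[m * 256 + value]
        self._update(addr)
        return addr
```

RFC 3284 resets both caches to zero at the start of every target window, so a decoder following this sketch would create a fresh `AddressCache` per window.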

Limitations

  • Application Headers: This implementation does not handle application header information
  • Secondary Compression: This decoder does not support secondary compression (e.g., gzip, bzip2)
  • Compatibility: Works with VCDIFF deltas created using xdelta3 -e -S -A (no secondary compression, no application header)

Checksum Support

  • VCD_ADLER32: This implementation detects and parses the VCD_ADLER32 extension (bit 0x04 in window indicator)
  • Non-standard Extension: The Adler-32 checksum is not part of RFC 3284 but is supported by some implementations
  • Validation: Full Adler-32 checksum validation is implemented and performed during decoding
  • Display: Checksums are displayed in the CLI output as Adler32: 0x########
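The checksum handling described above comes down to a bit test on the window indicator followed by an Adler-32 comparison. Below is a minimal sketch using Python's standard `zlib.adler32`, assuming the checksum value has already been parsed from the window header and that it covers the window's reconstructed target bytes (as in xdelta3). `has_checksum` and `verify_window` are hypothetical helpers, not this library's API, and the sketch raises a plain `ValueError` where the library reports `InvalidChecksumError`.

```python
import zlib

# Window indicator bits. VCD_SOURCE and VCD_TARGET are defined by RFC 3284;
# VCD_ADLER32 is the non-standard extension described above.
VCD_SOURCE = 0x01
VCD_TARGET = 0x02
VCD_ADLER32 = 0x04


def has_checksum(win_indicator: int) -> bool:
    """True if the window header carries an Adler-32 checksum."""
    return bool(win_indicator & VCD_ADLER32)


def verify_window(target_window: bytes, expected: int) -> None:
    """Raise ValueError if the decoded window does not match its checksum."""
    actual = zlib.adler32(target_window)
    if actual != expected:
        raise ValueError(
            f"Adler32 mismatch: expected 0x{expected:08x}, got 0x{actual:08x}"
        )


if __name__ == "__main__":
    window = b"Wikipedia"
    if has_checksum(VCD_SOURCE | VCD_ADLER32):
        verify_window(window, 0x11E60398)
    print(f"Adler32: 0x{zlib.adler32(window):08x}")  # Adler32: 0x11e60398
```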

Installation

pip install vcdiff-decoder

The distribution is published as vcdiff-decoder; the package is imported as vcdiff.

Development Installation

poetry install

Cloning with Test Suite

This repository includes the VCDIFF test suite as a git submodule. To clone the repository with all test cases:

git clone --recursive https://github.com/ably/vcdiff-py.git

If you've already cloned the repository without the submodule, initialize it:

git submodule update --init --recursive

To update the test suite submodule to the latest version:

git submodule update --remote

Quick Start

Library Usage

import vcdiff

# Read the source file
with open("original.txt", "rb") as f:
    source = f.read()

# Read the VCDIFF delta file
with open("changes.vcdiff", "rb") as f:
    delta_data = f.read()

# Apply the delta to reconstruct the target
try:
    result = vcdiff.decode(source, delta_data)
    print(f"Decoded result: {result}")
except vcdiff.VCDIFFError as e:
    print(f"Decoding failed: {e}")

API Reference

Core Functions

vcdiff.decode(source: bytes, delta: bytes) -> bytes

Decodes a VCDIFF delta file using the provided source data and returns the reconstructed target data.

Parameters:

  • source: The original source data (may be empty for deltas that don't reference source)
  • delta: The VCDIFF delta file data

Returns:

  • Decoded target data as bytes

Raises:

  • VCDIFFError: If decoding fails (malformed delta, checksum validation failure, etc.)

vcdiff.Decoder(source: bytes)

Creates a new decoder instance with the specified source data. Useful for decoding multiple deltas against the same source.

Parameters:

  • source: The source data for decoding operations

Returns:

  • A Decoder instance that can be used to decode multiple deltas

decoder.decode(delta: bytes) -> bytes

Decodes a single VCDIFF delta using the decoder's source data.

Error Handling

The decoder provides detailed error messages for various failure conditions:

  • Invalid VCDIFF format or magic bytes
  • Malformed varint encoding
  • Out-of-bounds memory access attempts
  • Checksum validation failures
  • Truncated or corrupted delta files
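Two of the checks listed above are easy to show in isolation: the file header (the bytes 0xD6 0xC3 0xC4, i.e. "VCD" with the high bit set, followed by version byte 0x00) and the RFC 3284 variable-length integer, which stores base-128 digits most significant first and sets the high bit on every byte except the last. `check_header` and `read_varint` are hypothetical helpers rather than this library's API, and the 64-bit ceiling on decoded values is an assumption, not an RFC requirement.

```python
from typing import Tuple

VCDIFF_MAGIC = b"\xd6\xc3\xc4"  # 'V', 'C', 'D' with the high bit set
MAX_VALUE = (1 << 64) - 1  # assumption: reject values wider than 64 bits


def check_header(data: bytes) -> None:
    """Reject input that does not start with the magic bytes and version 0."""
    if len(data) < 4 or data[:3] != VCDIFF_MAGIC:
        raise ValueError("invalid VCDIFF magic bytes")
    if data[3] != 0x00:
        raise ValueError(f"unsupported VCDIFF version {data[3]}")


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode an RFC 3284 variable-length integer starting at data[pos].

    Returns (value, next_pos). Each byte holds one base-128 digit, most
    significant first; every byte except the last has the 0x80 bit set.
    """
    value = 0
    for i in range(pos, len(data)):
        byte = data[i]
        value = (value << 7) | (byte & 0x7F)
        if value > MAX_VALUE:
            raise ValueError("varint value too large")
        if not (byte & 0x80):  # high bit clear: this was the last digit
            return value, i + 1
    raise ValueError("truncated varint")  # ran out of input mid-integer
```

Using the RFC's own example, `read_varint(b"\xba\xef\x9a\x15", 0)` returns `(123456789, 4)`.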

Exception Hierarchy:

  • VCDIFFError: Base exception for all VCDIFF-related errors
  • InvalidMagicError: Invalid VCDIFF magic bytes
  • InvalidVersionError: Unsupported VCDIFF version
  • InvalidFormatError: Malformed delta structure
  • CorruptedDataError: Data corruption detected
  • InvalidChecksumError: Checksum validation failure

Testing

Prerequisites

For comprehensive testing, this project uses xdelta3 as a reference implementation to verify the correctness of the decoder.

Installing xdelta3

macOS (Homebrew):

brew install xdelta

Linux (Ubuntu/Debian):

sudo apt-get install xdelta3

Running Tests

To run the Python unit tests:

poetry run pytest

To run tests with coverage:

poetry run pytest --cov=vcdiff tests/

To run the comprehensive test suite against the VCDIFF test cases (requires submodule):

poetry run pytest tests/test_vcdiff.py -v

The test suite includes:

  • 20 general positive tests: Valid VCDIFF files that should decode successfully
  • 33 targeted negative tests: Invalid VCDIFF files that should be rejected with appropriate errors
  • 32 targeted positive tests: Specific feature validation tests
  • Total: 85 test cases with 100% pass rate

Test Results

  • General Positive Tests: 20/20 passed ✅
  • Targeted Negative Tests: 33/33 passed ✅
  • Targeted Positive Tests: 32/32 passed ✅

Contributing

Contributions are welcome. Please follow these guidelines:

Getting Started

  1. Fork the repository
  2. Clone your fork with submodules: git clone --recursive <your-fork-url>
  3. Create a feature branch: git checkout -b feature/your-feature-name
  4. Make your changes
  5. Test your changes thoroughly
  6. Submit a pull request

Development Guidelines

  • Code Style: Follow PEP 8 formatting (black . for automatic formatting)
  • Type Hints: All new code should include proper type annotations
  • Testing: All new features must include tests
  • Documentation: Update documentation for any API changes
  • Commits: Use clear, descriptive commit messages

Before Submitting

Ensure your contribution passes all checks:

# Run all tests
pytest

# Format code
black .

# Lint code
flake8 vcdiff/

# Type checking
mypy vcdiff/

Reporting Issues

When reporting bugs, please include:

  • Python version
  • Operating system
  • Minimal reproduction case
  • Expected vs actual behavior
  • Sample VCDIFF files (if applicable)

Feature Requests

For new features, please:

  • Check existing issues first
  • Describe the use case
  • Provide RFC 3284 references if applicable
  • Consider backwards compatibility

Requirements

  • Python: 3.8 or higher
  • Development Dependencies:
    • pytest>=7.0.0
    • pytest-cov>=4.0.0
    • black>=22.0.0
    • flake8>=5.0.0
    • mypy>=1.0.0

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.



Download files

Download the file for your platform.

Source Distribution

vcdiff_decoder-0.1.0a1.tar.gz (18.6 kB)

Uploaded Source

Built Distribution


vcdiff_decoder-0.1.0a1-py3-none-any.whl (26.3 kB)

Uploaded Python 3

File details

Details for the file vcdiff_decoder-0.1.0a1.tar.gz.

File metadata

  • Download URL: vcdiff_decoder-0.1.0a1.tar.gz
  • Upload date:
  • Size: 18.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.9.6 Darwin/24.5.0

File hashes

Hashes for vcdiff_decoder-0.1.0a1.tar.gz
Algorithm     Hash digest
SHA256        734a181bae80ad9b44570e0905106dffc847e2351f8c8ffe7cc58c5c79c7890b
MD5           fa98c3cbc8c6399e75ddb8363adbee09
BLAKE2b-256   44744b0dd51a3ee4f38ba32470d315a033ba3ed91bb5b9da2d895599a5e2ab9d


File details

Details for the file vcdiff_decoder-0.1.0a1-py3-none-any.whl.

File metadata

  • Download URL: vcdiff_decoder-0.1.0a1-py3-none-any.whl
  • Upload date:
  • Size: 26.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.9.6 Darwin/24.5.0

File hashes

Hashes for vcdiff_decoder-0.1.0a1-py3-none-any.whl
Algorithm     Hash digest
SHA256        5d33ac9d2e7dfc141bcbf75b59a31f3b9e9cce59e92169995831f128be63d87d
MD5           00b8f3d804c13042ddca0732d15a352e
BLAKE2b-256   6639324428a0b6261bb407f3ccbd79635b4fcb812dcc0e1df0a7bebed0a924ee

