Reversible adjacent XOR differencing transform algo

XOR ∆

A reversible adjacent XOR differencing transform algo

"To compress, perchance to save..."

Synopsis

xor-delta is a small experimental Python package that explores XOR-adjacent delta encoding as a preprocessing transform for compression.

It answers a very specific question:

Does XORing adjacent values reduce entropy in a way that helps real compressors?

  • Short answer: maybe 🤷🏼‍♂️
  • Long answer: Step into my office...

What is XOR-delta?

For Those in a Hurry...

The core transform used by this project is:

$$A_i^{(k+1)} = A_i^{(k)} \oplus A_{i+1}^{(k)} \quad \forall i$$

Where:

  • $A^{(k)}$ is the sequence after $k$ applications of the transform, with $A^{(0)}$ the original input
  • $\oplus$ denotes bitwise XOR
  • One endpoint value (the anchor) is stored to make the transform reversible
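
Why a single anchor is enough can be seen from a short derivation. Dropping the step superscript and writing $D_i = A_i \oplus A_{i+1}$ for the stored differences (the symbol $D_i$ is introduced here only for illustration), and using the fact that XOR is its own inverse ($x \oplus x = 0$):

$$A_i \oplus D_i = A_i \oplus (A_i \oplus A_{i+1}) = (A_i \oplus A_i) \oplus A_{i+1} = A_{i+1}$$

So, starting from the first value $A_0$, every later value follows from $A_{i+1} = A_i \oplus D_i$. Anchoring on the last value works the same way in the other direction, since $A_i = D_i \oplus A_{i+1}$.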

How Does It Work?

Given a sequence of values:

v0, v1, v2, v3, ...

XOR-delta encoding stores:

  • one anchor value (first or last)
  • a list of XORs between adjacent values:
v0 ^ v1, v1 ^ v2, v2 ^ v3, ...
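
As a minimal sketch of the scheme just described (not the package's actual implementation), the following anchors on the first value. The names `encode_sketch` and `decode_sketch` are hypothetical and exist only for this illustration.

```python
from typing import List, Tuple


def encode_sketch(values: List[int]) -> Tuple[int, List[int]]:
    """Return (anchor, diffs), anchoring on the first value."""
    if not values:
        raise ValueError("need at least one value to anchor on")
    # XOR each value with its right-hand neighbour: v0 ^ v1, v1 ^ v2, ...
    diffs = [a ^ b for a, b in zip(values, values[1:])]
    return values[0], diffs


def decode_sketch(anchor: int, diffs: List[int]) -> List[int]:
    """Rebuild the original values by XOR-ing forward from the anchor."""
    values = [anchor]
    for d in diffs:
        values.append(values[-1] ^ d)
    return values


anchor, diffs = encode_sketch([10, 11, 12, 13])
print(anchor, diffs)  # 10 [1, 7, 1]
assert decode_sketch(anchor, diffs) == [10, 11, 12, 13]
```

Note that N input values become one anchor plus N-1 differences, so the output is exactly as large as the input.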

This transform is:

  • lossless
  • reversible
  • cheap
  • not compression by itself

It’s a preprocessing step you can feed into standard compressors like zlib, bz2, or lzma.
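
To make the "preprocessing step" idea concrete, here is a rough sketch of the full round trip with zlib from the standard library. The helpers `xor_adjacent` and `undo_xor_adjacent` are hypothetical stand-ins written for this example, not functions exported by xor-delta, and the payload is synthetic.

```python
import zlib


def xor_adjacent(data: bytes) -> bytes:
    """Keep the first byte as the anchor; the rest are XORs of neighbours."""
    if not data:
        return b""
    return bytes([data[0]]) + bytes(a ^ b for a, b in zip(data, data[1:]))


def undo_xor_adjacent(blob: bytes) -> bytes:
    """Invert xor_adjacent by XOR-ing forward from the anchor byte."""
    out = bytearray()
    prev = 0
    for i, b in enumerate(blob):
        prev = b if i == 0 else prev ^ b
        out.append(prev)
    return bytes(out)


payload = bytes(range(256)) * 16  # synthetic, counter-like test data

# Transform, then compress; to restore, decompress, then invert.
packed = zlib.compress(xor_adjacent(payload), 9)
restored = undo_xor_adjacent(zlib.decompress(packed))
assert restored == payload

# Original size, zlib on raw bytes, zlib on XOR-adjacent bytes.
print(len(payload), len(zlib.compress(payload, 9)), len(packed))
```

The transformed buffer has the same length as the input, which is why any savings have to come from the compressor that runs afterwards.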

Installation

pip install xor-delta

Or

git clone https://github.com/DJStompZone/xor_delta
cd xor_delta
pip install . # to run tests, install the dev dependencies too (with Poetry: `poetry install --with dev`)

Python API

Integer sequences

from xor_delta import xor_delta_encode_ints, xor_delta_decode_ints

data = [10, 11, 12, 13]

encoded = xor_delta_encode_ints(data)
decoded = xor_delta_decode_ints(encoded)

assert decoded == data

Byte sequences

from xor_delta import xor_delta_encode_bytes, xor_delta_decode_bytes

data = b"hello world"

anchor, diffs, side = xor_delta_encode_bytes(data)
restored = xor_delta_decode_bytes(anchor, diffs, side)

assert restored == data

CLI Benchmark Tool

xor-delta ships with a benchmarking CLI that compares compression before and after XOR-delta.

Run the default benchmark (Shakespeare)

xor-delta-bench

This automatically downloads Shakespeare from Project Gutenberg, caches it locally, and benchmarks:

  • raw bytes
  • XOR-adjacent bytes

using:

  • zlib
  • bz2
  • lzma

Example output:

corpus_cache/pg100.txt.<hash>
  RAW      raw=5,638,525  zlib=2,138,296 (0.379x)  bz2=1,586,908 (0.281x)  lzma=1,673,804 (0.297x)
  XOR      raw=5,638,525  zlib=2,546,436 (0.452x)  bz2=1,708,046 (0.303x)  lzma=1,890,440 (0.335x)
  xor-vs-raw  zlib +19.09%   bz2 +7.63%   lzma +12.94%

Interpretation:
XOR-delta made compression worse for English text across all tested compressors.

That’s the point: we measured it instead of guessing.
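
The xor-vs-raw percentages in the output above are consistent with a simple relative change in compressed size. The sketch below reproduces them from the listed byte counts; the formula is inferred from the numbers rather than taken from the tool's source, and `xor_vs_raw` is a hypothetical helper name.

```python
# (compressed size of raw input, compressed size of XOR-adjacent input)
sizes = {
    "zlib": (2_138_296, 2_546_436),
    "bz2": (1_586_908, 1_708_046),
    "lzma": (1_673_804, 1_890_440),
}


def xor_vs_raw(raw_compressed: int, xor_compressed: int) -> float:
    """Relative size change in percent; positive means XOR made it worse."""
    return (xor_compressed / raw_compressed - 1.0) * 100.0


for name, (raw_c, xor_c) in sizes.items():
    print(f"{name}: {xor_vs_raw(raw_c, xor_c):+.2f}%")
# zlib: +19.09%
# bz2: +7.63%
# lzma: +12.94%
```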

Benchmark your own files

xor-delta-bench myfile.bin
xor-delta-bench mydir/

Use a Gutenberg preset

xor-delta-bench --gutenberg shakespeare
# Feel free to send a PR if you want more presets <3

Or any URL

xor-delta-bench --gutenberg-url https://example.com/text.txt

Downloads are cached in corpus_cache/.


When does XOR-delta help?

XOR-adjacent transforms can help when:

  • data has small local variation
  • values are structured, not textual
  • adjacent samples are correlated

Examples:

  • counters
  • timestamps
  • some sensor streams
  • monotonic-ish numeric data
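
A quick way to see the counter case is a synthetic experiment like the one below. It is only a sketch under stated assumptions: the values are XORed as whole integers (not byte by byte), serialised as big-endian 32-bit words, and compressed with zlib; `pack_u32` is a hypothetical helper and none of this uses the xor-delta API. For a plain counter, each difference `i ^ (i + 1)` has the form 2^k - 1, so the transformed stream contains only a handful of distinct, highly repetitive values.

```python
import struct
import zlib

N = 65_536
counter = list(range(N))

# Value-wise XOR of neighbours; the first value is kept as the anchor.
xored = [counter[0]] + [a ^ b for a, b in zip(counter, counter[1:])]


def pack_u32(values):
    """Serialise as big-endian unsigned 32-bit integers."""
    return struct.pack(f">{len(values)}I", *values)


raw_size = len(zlib.compress(pack_u32(counter), 9))
xor_size = len(zlib.compress(pack_u32(xored), 9))

# Exact sizes depend on the zlib build, so none are quoted here.
print(f"raw={raw_size:,}  xor={xor_size:,}")
```

Real counters and timestamps are rarely this regular, so treat this as a best case rather than a prediction.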

It can hurt when:

  • data is already high-entropy
  • the compressor already exploits the data's structure better on its own (for example, text with LZ-family compressors)
  • XOR destroys symbol locality

Development

Run tests:

pytest
# Or if you're using Poetry
poetry run pytest

License

MIT

Credits

Created by DJ Stomp https://github.com/DJStompZone/xor_delta

Inspired by spectcow's original description of the algorithm; full credit for the core concept goes to them.
