William: A tool for data compression and machine learning automatization

Project description

WILLIAM - A general purpose data compression algorithm

Overview

WILLIAM is an inductive programming system based on the theory of Incremental Compression (IC) [Franz et al. 2021]. Its core principle is that learning = compression:
given a dataset x, the algorithm searches for short descriptions in the form of compositional features f_1, f_2, …, f_s such that

x = f_1(f_2(... f_s(r_s)))

with each step achieving some compression; r_s is the residual that remains after step s. This corresponds to an incremental approximation of the Kolmogorov complexity K(x):

K(x) ≈ Σ_{i=1}^{s} l(f_i^*) + K(r_s) + O(s · log l(x))

where each f_i^* is the shortest compressing feature at step i and l(·) denotes length.
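
The decomposition above can be illustrated with a toy sketch. It is not WILLIAM's search and does not use its API: it assumes a single hand-picked feature (cumulative sum), a crude integer code length `bits`, and a hypothetical fixed cost `FEATURE_COST_BITS` standing in for l(f). It only shows the bookkeeping of peeling off one feature per step while the total description keeps shrinking.

```python
from itertools import accumulate

FEATURE_COST_BITS = 8  # hypothetical fixed cost l(f) charged per feature


def bits(values):
    """Crude description length: one sign bit plus magnitude bits per integer."""
    return sum(abs(v).bit_length() + 1 for v in values)


def delta_encode(values):
    """Residual under the cumulative-sum feature: first value, then differences."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def cumulative_sum(residual):
    """The feature itself: maps a residual back to the data it describes."""
    return list(accumulate(residual))


def incremental_compress(x):
    """Greedily peel off features while each step still shortens the description."""
    residual, steps = list(x), 0
    while FEATURE_COST_BITS + bits(delta_encode(residual)) < bits(residual):
        residual, steps = delta_encode(residual), steps + 1
    return steps, residual


def decompress(steps, residual):
    """Apply the feature `steps` times: x = f(f(... f(r_s)))."""
    for _ in range(steps):
        residual = cumulative_sum(residual)
    return residual


x = [10 + 3 * i for i in range(32)]  # an arithmetic sequence
steps, r_s = incremental_compress(x)
total = steps * FEATURE_COST_BITS + bits(r_s)
print(f"steps={steps}, raw bits={bits(x)}, compressed bits={total}")
assert decompress(steps, r_s) == x  # the description is lossless
```

On this arithmetic sequence the loop stops after two steps, because a third difference would cost more than it saves. A real search would choose among many candidate features at each step instead of reusing one.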

WILLIAM differs from classical ML approaches in that it does not optimize parameters in a fixed representation, but searches a broad algorithmic space for compressing autoencoders.
This yields machine learning algorithms (centralization, regression, classification, decision trees, outlier detection) as emergent special cases of general compression.

For theoretical background, see:

  • A Theory of Incremental Compression (Franz, Antonenko, Soletskyi, 2021)
  • WILLIAM: A Monolithic Approach to AGI (Franz, Gogulya, Löffler, 2019)
  • Experiments on the Generalization of Machine Learning Algorithms (Franz, 2020)

Key Concepts

  • Incremental Compression
    Decomposes data into features and residuals step by step, ensuring that each feature is independent and incompressible.

  • Features as Properties
    Features formalize algorithmic properties of data and can be related to Martin-Löf randomness tests:
    non-random regularities correspond to compressible features.

  • Universality
    Unlike specialized ML algorithms, WILLIAM discovers short descriptions exhaustively via directed acyclic graphs (DAGs) of operators, reusing values and cutting at information bottlenecks.

  • Emergent ML Algorithms
    Without any tuning, WILLIAM naturally rediscovers:

    • data centralization
    • outlier detection
    • linear regression
    • linear classification
    • decision tree induction

Limitations and Future Work

  • Overhead accumulation: IC theory implies additive overhead terms.

  • Alternative descriptions: currently only one compression path is explored at a time.

  • Reuse of functions: theory of memory/retrieval still open.

  • Performance: the Python prototype handles graphs of depth 4–5; C++/Rust backend and parallelization are natural next steps.

Despite these challenges, IC theory provides guarantees: incremental compression reaches Kolmogorov complexity up to logarithmic precision.

Installation

For a standard installation, use:

pip install william-occam

For a full installation of all dependencies for further development, testing and graphical output use:

pip install william-occam[dev]
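
To confirm which version ended up in the active environment, a small check along these lines may help. It relies only on the standard library's `importlib.metadata` and on the distribution name `william-occam` from the commands above; the helper name `installed_version` is made up for this example.

```python
from importlib import metadata


def installed_version(dist_name="william-occam"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print(installed_version() or "william-occam is not installed")
```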

Compression examples

You can run various compression tests directly with pytest. Set

export WILLIAM_DEBUG=3

to get visual output after every compression step. Set it to 2 if you only want to see the compression results after every task. Now run:

pytest -v -s william/tests/test_alice.py

Type c and press Enter to step through the compression steps in the debugger and look at the generated graphs.

During execution, WILLIAM will:

  • Generate synthetic training data for several regression problems.
  • Search for a minimal program (tree/DAG) that explains the data.
  • Display the compression progress (how the description length decreases).
  • Render the resulting Directed Acyclic Graphs (DAGs) as PDF files in your working directory.
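
If you prefer to launch the same test run from Python rather than from the shell, the following sketch sets `WILLIAM_DEBUG` for the child process only. The environment variable and the test path come from the instructions above; the helper names `build_test_run` and `run_tests` are hypothetical, and the sketch assumes pytest is installed and that it is run from a checkout containing the tests.

```python
import os
import subprocess
import sys


def build_test_run(debug_level, test_path="william/tests/test_alice.py"):
    """Return the pytest command and an environment with WILLIAM_DEBUG set."""
    env = dict(os.environ, WILLIAM_DEBUG=str(debug_level))
    cmd = [sys.executable, "-m", "pytest", "-v", "-s", test_path]
    return cmd, env


def run_tests(debug_level=2):
    """Run the tests; returns pytest's exit code (0 means all tests passed)."""
    cmd, env = build_test_run(debug_level)
    return subprocess.run(cmd, env=env, check=False).returncode
```

Calling `run_tests(2)` prints results after every task, while `run_tests(3)` stops after every compression step, so it needs an interactive terminal.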

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). You are free to use, share, and modify the code for non-commercial purposes only, with proper attribution to the original author. For full license details, see the LICENSE.md file.

Releasing

Releases are published automatically when a tag is pushed to GitLab.

# Example for version 1.2.3
export RELEASE=v1.2.3

# Create an annotated tag
git tag -a $RELEASE

# Push the specific tag to trigger the CI pipeline
git push origin $RELEASE


Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

william_occam-0.2.2-cp314-cp314-manylinux_2_34_x86_64.whl (2.8 MB)

CPython 3.14, manylinux: glibc 2.34+ x86-64

william_occam-0.2.2-cp313-cp313-manylinux_2_34_x86_64.whl (2.8 MB)

CPython 3.13, manylinux: glibc 2.34+ x86-64

william_occam-0.2.2-cp312-cp312-manylinux_2_34_x86_64.whl (2.8 MB)

CPython 3.12, manylinux: glibc 2.34+ x86-64

william_occam-0.2.2-cp311-cp311-manylinux_2_34_x86_64.whl (2.8 MB)

CPython 3.11, manylinux: glibc 2.34+ x86-64

File details

Details for the file william_occam-0.2.2-cp314-cp314-manylinux_2_34_x86_64.whl.

File metadata

  • Download URL: william_occam-0.2.2-cp314-cp314-manylinux_2_34_x86_64.whl
  • Upload date:
  • Size: 2.8 MB
  • Tags: CPython 3.14, manylinux: glibc 2.34+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.30 {"installer":{"name":"uv","version":"0.9.30","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Debian GNU/Linux","version":"13","id":"trixie","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for william_occam-0.2.2-cp314-cp314-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 e5dcbc8948d08e092f5a7930ee4171fb05124e8d94ff23a28b1937379086e3d7
MD5 d8e4ca0134c7b075aa9bcb9eba61b4b6
BLAKE2b-256 d4eb8aa039b5a5c4ab5dab8eb54682839bd224e8902639e746e8fc6604dcaf3b

File details

Details for the file william_occam-0.2.2-cp313-cp313-manylinux_2_34_x86_64.whl.

File metadata

  • Download URL: william_occam-0.2.2-cp313-cp313-manylinux_2_34_x86_64.whl
  • Upload date:
  • Size: 2.8 MB
  • Tags: CPython 3.13, manylinux: glibc 2.34+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.30 {"installer":{"name":"uv","version":"0.9.30","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Debian GNU/Linux","version":"13","id":"trixie","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for william_occam-0.2.2-cp313-cp313-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 db364418c880cf2c6af313358cc1cc916e9233aa1e4128fa28d50659694bb276
MD5 458bc41d7a85c8fa6214f028de6d23e9
BLAKE2b-256 804c1f50bf23c57277ef497f9a1bb9b74411a4c85b7db7245817899df770dba8

File details

Details for the file william_occam-0.2.2-cp312-cp312-manylinux_2_34_x86_64.whl.

File metadata

  • Download URL: william_occam-0.2.2-cp312-cp312-manylinux_2_34_x86_64.whl
  • Upload date:
  • Size: 2.8 MB
  • Tags: CPython 3.12, manylinux: glibc 2.34+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.30 {"installer":{"name":"uv","version":"0.9.30","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Debian GNU/Linux","version":"13","id":"trixie","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for william_occam-0.2.2-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 2d90c779316b880e84d510f621fee620ce333c17380ace555602c9a14ea66be5
MD5 4658ba0e881c73362706950cb6346cd0
BLAKE2b-256 aa470a79d1d146d03be9a339ffa0079761a57a1d264b51069a66bfd9d8689b3d

File details

Details for the file william_occam-0.2.2-cp311-cp311-manylinux_2_34_x86_64.whl.

File metadata

  • Download URL: william_occam-0.2.2-cp311-cp311-manylinux_2_34_x86_64.whl
  • Upload date:
  • Size: 2.8 MB
  • Tags: CPython 3.11, manylinux: glibc 2.34+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.30 {"installer":{"name":"uv","version":"0.9.30","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Debian GNU/Linux","version":"13","id":"trixie","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for william_occam-0.2.2-cp311-cp311-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 398e37729a52763a743152b2141ff4702a110d27cd2865c5fd3977a03012e8cb
MD5 013259df917e5be4d36eec2bec5a7c68
BLAKE2b-256 9b7ddcdf32ef39f7099291688f6c23ce048e4779257703cb14eaf0cb46b2238e
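
The SHA256 digests listed above can be checked against a downloaded wheel before installing it. A minimal sketch using only the standard library follows; the helper names `sha256_of` and `matches` are made up for this example, and the path passed in is whatever location you saved the wheel to.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("william_occam-0.2.2-cp311-cp311-manylinux_2_34_x86_64.whl", "398e37729a52763a743152b2141ff4702a110d27cd2865c5fd3977a03012e8cb")` should return True for an intact copy of the CPython 3.11 wheel.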
