William: A tool for data compression and machine learning automatization

WILLIAM - A general purpose data compression algorithm

Overview

WILLIAM is an inductive programming system based on the theory of Incremental Compression (IC) [Franz et al. 2021]. Its core principle is that learning = compression:
given a dataset x, the algorithm searches for short descriptions in the form of compositional features f_1, f_2, …, f_s such that

x = f_1(f_2(… f_s(r_s)))

where r_s is the residual left after step s, with each step achieving some compression. This corresponds to an incremental approximation of the Kolmogorov complexity K(x):

K(x) ≈ Σ_{i=1}^{s} l(f_i*) + K(r_s) + O(s · log l(x))

where each f_i* is the shortest compressing feature at step i and l(·) denotes length.
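
The decomposition above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the two features (cumulative sum and run-length decoding), the crude integer code in `bits`, and all function names are hypothetical and are not part of WILLIAM. The cost of describing the features themselves, l(f_i), is ignored here.

```python
from itertools import accumulate


def bits(values):
    """Toy code length: 1 sign bit plus the binary length of |v| (at least 1) per integer."""
    return sum(1 + max(1, abs(v).bit_length()) for v in values)


def f1(residual):
    """Hypothetical feature 1: cumulative sum of its input."""
    return list(accumulate(residual))


def f1_inverse(x):
    """Differencing recovers the residual that f1 maps back to x."""
    return [x[0]] + [b - a for a, b in zip(x, x[1:])]


def f2(residual):
    """Hypothetical feature 2: run-length decoding of a flat [value, count, ...] list."""
    out = []
    for value, count in zip(residual[0::2], residual[1::2]):
        out.extend([value] * count)
    return out


def f2_inverse(x):
    """Run-length encoding recovers the residual that f2 maps back to x."""
    out = []
    for v in x:
        if out and out[-2] == v:
            out[-1] += 1          # extend the current run
        else:
            out.extend([v, 1])    # start a new run
    return out


x = list(range(3, 63, 3))   # 3, 6, 9, ..., 60
r1 = f1_inverse(x)          # residual after peeling off f1
r2 = f2_inverse(r1)         # residual after peeling off f2

# Each step shortens the residual under the toy code, and x = f1(f2(r2)) holds exactly.
print(bits(x), bits(r1), bits(r2))
print(f1(f2(r2)) == x)
```

Each inverse step plays the role of one compression step: the data is replaced by a shorter residual plus a feature that reproduces the data from it.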

WILLIAM differs from classical ML approaches in that it does not optimize parameters in a fixed representation, but searches a broad algorithmic space for compressing autoencoders.
This yields machine learning algorithms (centralization, regression, classification, decision trees, outlier detection) as emergent special cases of general compression.
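
To make the "learning = compression" reading of regression concrete, here is a minimal sketch under stated assumptions. It does not use WILLIAM; the toy code length `bits`, the synthetic data, and the helper `fit_line` are all hypothetical. It assumes the inputs `xs` are known to both encoder and decoder, so only the targets need to be described.

```python
import random


def bits(values):
    """Toy code length: 1 sign bit plus the binary length of |v| (at least 1) per integer."""
    return sum(1 + max(1, abs(v).bit_length()) for v in values)


def fit_line(xs, ys):
    """Ordinary least squares for a single input variable (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rng = random.Random(0)
xs = list(range(100))
ys = [7 * x + 40 + rng.randint(-2, 2) for x in xs]   # noisy line, integer valued

slope, intercept = fit_line(xs, ys)
a, b = round(slope), round(intercept)                # integer parameters are cheap to encode
residuals = [y - (a * x + b) for x, y in zip(xs, ys)]

raw_length = bits(ys)                                # describe the targets directly
model_length = bits([a, b]) + bits(residuals)        # describe parameters plus residuals

print(raw_length, model_length)
print([a * x + b + r for x, r in zip(xs, residuals)] == ys)   # lossless reconstruction
```

The fitted line acts as a feature and the small residuals as what remains to be described, which is why the second description is shorter under this toy code.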

For theoretical background, see:

  • A Theory of Incremental Compression (Franz, Antonenko, Soletskyi, 2021)
  • WILLIAM: A Monolithic Approach to AGI (Franz, Gogulya, Löffler, 2019)
  • Experiments on the Generalization of Machine Learning Algorithms (Franz, 2020)

Key Concepts

  • Incremental Compression
    Decomposes data into features and residuals step by step, ensuring that each feature is independent and incompressible.

  • Features as Properties
    Features formalize algorithmic properties of data and can be related to Martin-Löf randomness tests:
    non-random regularities correspond to compressible features.

  • Universality
    Unlike specialized ML algorithms, WILLIAM discovers short descriptions exhaustively via directed acyclic graphs (DAGs) of operators, reusing values and cutting at information bottlenecks.

  • Emergent ML Algorithms
    Without any tuning, WILLIAM naturally rediscovers:

    • data centralization
    • outlier detection
    • linear regression
    • linear classification
    • decision tree induction
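
The idea of describing data with a DAG of operators that reuses intermediate values can be sketched as follows. This is a hypothetical mini-evaluator written for illustration only: the dictionary layout, the operator choices, and the names `evaluate`, `dag`, and `leaves` are assumptions and do not reflect WILLIAM's internal data structures.

```python
def evaluate(dag, leaves, node, cache, calls):
    """Evaluate `node`, computing every operator node at most once."""
    if node in leaves:
        return leaves[node]
    if node not in cache:
        op, inputs = dag[node]
        calls.append(node)   # record that this operator was actually executed
        args = [evaluate(dag, leaves, name, cache, calls) for name in inputs]
        cache[node] = op(*args)
    return cache[node]


# Leaves hold the values that remain to be described; here a single small integer.
leaves = {"n": 5}

# Each entry maps a node name to (operator, names of its input nodes).
dag = {
    "idx": (lambda n: list(range(n)), ("n",)),
    "sq": (lambda v: [a * a for a in v], ("idx",)),
    "out": (lambda u, v: [a + b for a, b in zip(u, v)], ("sq", "idx")),
}

calls = []
result = evaluate(dag, leaves, "out", cache={}, calls=calls)
print(result)   # the sequence i*i + i for i in range(5)
print(calls)    # "idx" appears once although two nodes consume it
```

Because `idx` feeds both `sq` and `out`, a tree would have to describe it twice, while the DAG describes and evaluates it once.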

Limitations and Future Work

  • Overhead accumulation: IC theory implies additive overhead terms.
  • Alternative descriptions: currently only one compression path is explored at a time.
  • Reuse of functions: the theory of memory/retrieval is still open.
  • Performance: the Python prototype handles graphs of depth 4–5; a C++/Rust backend and parallelization are natural next steps.

Despite these challenges, IC theory provides guarantees: incremental compression reaches Kolmogorov complexity up to logarithmic precision.

Installation

For a standard installation, use:

pip install william-occam

For a full installation, including the dependencies for further development, testing and graphical output, use:

pip install "william-occam[dev]"
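
To confirm that the package landed in the active environment, a small check with the standard library's `importlib.metadata` can help. This is a sketch; the helper name `installed_version` is hypothetical, and only the distribution name `william-occam` comes from the commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="william-occam"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


print(installed_version() or "william-occam is not installed in this environment")
```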

Compression examples

You can run various compression tests directly with pytest. Set

export WILLIAM_DEBUG=3

to get visual output after every compression step. Set it to 2 if you only want to see the compression results after each task. Now run:

pytest -v -s william/tests/test_alice.py

Type c and press Enter to step through the compression steps in the debugger and inspect the generated graphs.

During execution, WILLIAM will:

  • Generate synthetic training data for several regression problems.
  • Search for a minimal program (tree/DAG) that explains the data.
  • Display the compression progress (how the description length decreases).
  • Render the resulting Directed Acyclic Graphs (DAGs) as PDF files in your working directory.
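
The environment variable and the pytest invocation described above can also be combined in a small launcher script. This is a sketch under assumptions: the helper names `build_test_run` and `run_tests` are hypothetical, and the relative test path only resolves if the working directory contains `william/tests/test_alice.py`.

```python
import os
import subprocess
import sys


def build_test_run(debug_level=3, test_path="william/tests/test_alice.py"):
    """Return the pytest command and an environment with WILLIAM_DEBUG set."""
    env = dict(os.environ, WILLIAM_DEBUG=str(debug_level))
    command = [sys.executable, "-m", "pytest", "-v", "-s", test_path]
    return command, env


def run_tests(debug_level=3):
    """Run the compression tests and return pytest's exit code."""
    command, env = build_test_run(debug_level)
    return subprocess.run(command, env=env).returncode
```

Calling `run_tests(debug_level=2)` would show only the compression results after each task, matching the description of level 2 above.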

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). You are free to use, share, and modify the code for non-commercial purposes only, with proper attribution to the original author. For full license details, see the LICENSE.md file.

Releasing

Releases are published automatically when a tag is pushed to GitLab.

# Example for version 1.2.3
export RELEASE=v1.2.3

# Create a tag and push the specific tag to trigger the CI pipeline
git tag $RELEASE && git push origin $RELEASE

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • william_occam-0.2.3.1-cp314-cp314-manylinux_2_34_x86_64.whl (2.8 MB): CPython 3.14, manylinux: glibc 2.34+ x86-64
  • william_occam-0.2.3.1-cp313-cp313-manylinux_2_34_x86_64.whl (2.8 MB): CPython 3.13, manylinux: glibc 2.34+ x86-64
  • william_occam-0.2.3.1-cp312-cp312-manylinux_2_34_x86_64.whl (2.8 MB): CPython 3.12, manylinux: glibc 2.34+ x86-64
  • william_occam-0.2.3.1-cp311-cp311-manylinux_2_34_x86_64.whl (2.8 MB): CPython 3.11, manylinux: glibc 2.34+ x86-64

File details

Details for the file william_occam-0.2.3.1-cp314-cp314-manylinux_2_34_x86_64.whl.

File metadata

  • Download URL: william_occam-0.2.3.1-cp314-cp314-manylinux_2_34_x86_64.whl
  • Upload date:
  • Size: 2.8 MB
  • Tags: CPython 3.14, manylinux: glibc 2.34+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.0

File hashes

Hashes for william_occam-0.2.3.1-cp314-cp314-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 e07103a9d54adcc04bf21a240e24b88f8d848843b3f6f913ae5e386bb3d6a66e
MD5 3fe671991afe31c120b66db09b94d755
BLAKE2b-256 8b070f5bc317861a9d04ca41b5be6d84a78c3f1f0bb2b6072641976649542e2e

File details

Details for the file william_occam-0.2.3.1-cp313-cp313-manylinux_2_34_x86_64.whl.

File metadata

  • Download URL: william_occam-0.2.3.1-cp313-cp313-manylinux_2_34_x86_64.whl
  • Upload date:
  • Size: 2.8 MB
  • Tags: CPython 3.13, manylinux: glibc 2.34+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.0

File hashes

Hashes for william_occam-0.2.3.1-cp313-cp313-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 735e69a962d673207b69185b4211fed8db12526441315861ac5ed1a9d075927f
MD5 7635289ffe19f7dc43a22f2eef9f6a84
BLAKE2b-256 ee1fc752403358b7dbbdbb8bac607419b4e17c03e0d8429f18e25d75f2e769c2

File details

Details for the file william_occam-0.2.3.1-cp312-cp312-manylinux_2_34_x86_64.whl.

File metadata

  • Download URL: william_occam-0.2.3.1-cp312-cp312-manylinux_2_34_x86_64.whl
  • Upload date:
  • Size: 2.8 MB
  • Tags: CPython 3.12, manylinux: glibc 2.34+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.0

File hashes

Hashes for william_occam-0.2.3.1-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 07b5d4bdcfde6b373cb27334d82eec9bdaf7416e16099f9930181113004252a5
MD5 cdf7d6510d0fcc9ae771b4d9256e5852
BLAKE2b-256 e08237b72fdc52d23357fca5a3309260f58bdcfecaabe859a58ffa8786b41471

File details

Details for the file william_occam-0.2.3.1-cp311-cp311-manylinux_2_34_x86_64.whl.

File metadata

  • Download URL: william_occam-0.2.3.1-cp311-cp311-manylinux_2_34_x86_64.whl
  • Upload date:
  • Size: 2.8 MB
  • Tags: CPython 3.11, manylinux: glibc 2.34+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.0

File hashes

Hashes for william_occam-0.2.3.1-cp311-cp311-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 425589801a4dfc00798077356ea9d8836d1af1a6d713c0800d2b0640bdedf6fa
MD5 4a70b7d2b50da82a708c26623cbaeee9
BLAKE2b-256 87731ef5a794a23c4da1c926559a46427b3e643327e58d916cb33127e6ac9665