
Python implementation of block decomposition method for approximating algorithmic complexity.

Project description


The Block Decomposition Method (BDM) approximates the algorithmic complexity of a dataset of arbitrary size, that is, the length of the shortest computer program that generates it. This is not trivial, as algorithmic complexity is not a computable quantity in the general case. Still, estimating the algorithmic complexity of a dataset can be very useful, as it points to mechanistic connections between elements of a system, even ones that do not yield any regular statistical patterns that can be captured with more traditional tools based on probability theory and information theory.

Currently 1D and 2D binary arrays are supported, as well as 1D arrays with 4, 5, 6 and 9 discrete symbols.

BDM and the necessary parts of the algorithmic information theory it is based on are described in this article.

See the official documentation for more information.

Installation

Standard installation (stable):

pip install pybdm

Development version installation:

pip install git+https://github.com/sztal/pybdm.git

Local development:

git clone https://github.com/sztal/pybdm
cd pybdm
pip install --editable .

Supported versions

Python 3.5+ is supported. Tests are run on Linux, but Windows and macOS should work as well.

Usage

BDM is implemented in an object-oriented fashion and expects input represented as NumPy arrays of integer type.

BDM objects operate exclusively on integer arrays. Hence, any alphabet must first be mapped to a set of integers ranging from 0 to k-1, where k is the number of symbols (see the sketch below).
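For example, a sequence over an arbitrary alphabet can be encoded with np.unique. This is only a minimal sketch; the alphabet and variable names below are illustrative and not part of the package:

import numpy as np

# Hypothetical 4-symbol sequence over the letters 'a'-'d'
seq = np.array(list("abbacddcabcd"))

# Map symbols to integer codes 0..k-1; with return_inverse=True
# np.unique returns the sorted unique symbols and the code of each element
symbols, X = np.unique(seq, return_inverse=True)

# X is now an integer array suitable for BDM(ndim=1, nsymbols=4)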

Detailed usage examples can be found in the official documentation.

Binary sequences (1D)

import numpy as np
from pybdm import BDM

# Create a dataset (must be of integer type)
X = np.ones((100,), dtype=int)

# Initialize BDM object
# ndim argument specifies dimensionality of BDM
bdm = BDM(ndim=1)

# Compute BDM
bdm.bdm(X)

# BDM objects may also compute standard Shannon entropy in base 2
bdm.ent(X)

Binary matrices (2D)

import numpy as np
from pybdm import BDM

# Create a dataset (must be of integer type)
X = np.ones((100, 100), dtype=int)

# Initialize BDM object
bdm = BDM(ndim=2)

# Compute BDM
bdm.bdm(X)

# BDM objects may also compute standard Shannon entropy in base 2
bdm.ent(X)

Non-binary sequences (1D)

import numpy as np
from pybdm import BDM

# Create a dataset (4 discrete symbols)
np.random.seed(303)
X = np.random.randint(0, 4, (100,))

# Initialize BDM object with 4-symbols alphabet
bdm = BDM(ndim=1, nsymbols=4)

# Compute BDM
bdm.bdm(X)

Parallel processing

PyBDM was designed with parallel processing in mind. Using modern packages for parallelization such as joblib makes it really easy to compute BDM for massive objects.

In this example we slice a 1000x1000 dataset into 200x200 pieces, compute so-called counter objects (the final BDM computation operates on such objects) in parallel in 4 independent processes, and aggregate the results into a single BDM approximation of the algorithmic complexity of the dataset.

Remember that the data has to be sliced correctly during parallelization in order to ensure fully correct BDM computations. That is, all slices except those at the lower and right boundaries have to be decomposable by the selected decomposition algorithm without any boundary leftovers.

import numpy as np
from joblib import Parallel, delayed
from pybdm import BDM
from pybdm.utils import decompose_dataset

# Create a dataset (must be of integer type)
X = np.ones((1000, 1000), dtype=int)

# Initialize BDM object
bdm = BDM(ndim=2)

# Compute counter objects in parallel
counters = Parallel(n_jobs=4)(
    delayed(bdm.decompose_and_count)(d)
    for d in decompose_dataset(X, (200, 200))
)

# Compute BDM
bdm.compute_bdm(*counters)
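As a sanity check (illustrative, assuming the same bdm and X objects as above), the aggregated counter-based result should agree with a direct single-process computation on the full dataset:

# The parallel, counter-based result should match computing BDM directly
assert np.isclose(bdm.compute_bdm(*counters), bdm.bdm(X))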

Perturbation analysis

Besides the main Block Decomposition Method implementation, PyBDM also provides an efficient algorithm for perturbation analysis based on BDM (or standard Shannon entropy).

A perturbation experiment studies changes of BDM / entropy under changes applied to the underlying dataset. This is the main tool for detecting parts of a system that have causal significance, as opposed to parts that are mere noise.

Parts that yield a negative contribution to the overall complexity after a change are likely to be important for the system, since their removal makes it more noisy. On the other hand, parts that yield a positive contribution to the overall complexity after a change are likely to be noise, since they extend the system's description length.

import numpy as np
from pybdm import BDM
from pybdm.algorithms import PerturbationExperiment

# Create a dataset (must be of integer type)
X = np.ones((100, 100), dtype=int)

# Initialize BDM object
bdm = BDM(ndim=2)

# Initialize perturbation experiment object
# (may be run for both bdm or entropy)
perturbation = PerturbationExperiment(bdm, X, metric='bdm')

# Compute BDM change for all data points
delta_bdm = perturbation.run()

# Compute BDM change for selected data points and keep the changes while running
# One array provides the indices of the elements that are to be changed.
idx = np.array([[0, 0], [10, 10]], dtype=int)
# Another array provides the new values to assign.
# Negative values mean that new values will be selected
# randomly from the set of other possible values in the alphabet.
values = np.array([-1, -1], dtype=int)
delta_bdm = perturbation.run(idx, values, keep_changes=True)
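The same experiment can also be driven by Shannon entropy instead of BDM. A minimal sketch, assuming the entropy metric is selected with metric='ent' (mirroring the bdm.ent() method shown earlier):

# Illustrative: perturbation analysis based on Shannon entropy
perturbation_ent = PerturbationExperiment(bdm, X, metric='ent')
delta_ent = perturbation_ent.run()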

Authors & Contact

Documentation

The full documentation is at http://pybdm-docs.rtfd.org.

History

0.1.0 (2019-09-22)

  • First release on PyPI.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pybdm-0.1.0.tar.gz (39.9 MB)

Uploaded Source

Built Distribution

pybdm-0.1.0-py2.py3-none-any.whl (39.9 MB)

Uploaded Python 2 Python 3

File details

Details for the file pybdm-0.1.0.tar.gz.

File metadata

  • Download URL: pybdm-0.1.0.tar.gz
  • Upload date:
  • Size: 39.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.20.1 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.34.0 CPython/3.7.3

File hashes

Hashes for pybdm-0.1.0.tar.gz
Algorithm Hash digest
SHA256 16d4a3b12ddd506bc7d94b41e19da53e4d448f6670384c41e5c24cf2f71b622c
MD5 b15b64aa1a5b077e345195fafcca837b
BLAKE2b-256 ba63cf3fee7f8a86b1f289ded34340fa2c8c2fd0ca6a231dada486002a17495b

See more details on using hashes here.

File details

Details for the file pybdm-0.1.0-py2.py3-none-any.whl.

File metadata

  • Download URL: pybdm-0.1.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 39.9 MB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.20.1 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.34.0 CPython/3.7.3

File hashes

Hashes for pybdm-0.1.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 28cec2b7263f4448ef9d62ae71bf84d1450b83d8a160a69779be0022959ce3c1
MD5 9d181c541f5a56286c93ce3f2e38e58f
BLAKE2b-256 b732fd74c068e07e1b91a732982125bd821af9fb144cd741d16ac0a607564538

See more details on using hashes here.
