pyHordcoin - Python interface for Hordcoin.jl
This is a Python interface for the Julia package Hordcoin.jl, which provides methods for finding the probability distribution with maximal Shannon entropy given fixed marginal distributions or marginal entropies up to a chosen order, and for computing the Connected Information.
Installing
Installation is easy:
- (optional but strongly advised) create and activate a new virtual environment:

```shell
python -m venv hordcoin
source hordcoin/bin/activate
```

- if you only want to use its components for your measures, install the package from PyPI:

```shell
pip install pyHordcoin
```

- if you want to contribute to the development, download the sources and then, from the unpacked folder, run:

```shell
cd /path/to/pyHordcoin
pip install -e .[dev]
```
Running
The first time you import the package it will install the Julia dependencies (and possibly Julia itself), which can take a few minutes. Subsequent imports are much faster, although loading Julia still takes some time. This overhead is repaid during the maximisation of the entropies.
Usage
This package offers an interface to Hordcoin.jl, which implements methods that maximize the Shannon entropy of a probability distribution under marginal distribution or entropic constraints and compute the Connected Information.
The input data must satisfy the following requirements:
- The probability distributions are stored as multidimensional Numpy arrays;
- Probabilities are non-negative and sum up to 1, OR are provided as (non-normalised) counts;
- The maximal order of the fixed marginal distributions has to be in [2, n-1], where n is the number of dimensions of the probability distribution.
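The requirements above can be checked with plain NumPy before calling the library; a minimal sketch (`check_input` is an illustrative helper, not part of the package):

```python
import numpy as np

def check_input(p, max_order):
    """Validate a distribution (or count) array against the requirements above."""
    p = np.asarray(p)
    n = p.ndim
    if np.issubdtype(p.dtype, np.integer):
        # integer input is interpreted as (non-normalised) counts
        assert (p >= 0).all(), "counts must be non-negative"
    else:
        # float input is interpreted as probabilities
        assert (p >= 0).all(), "probabilities must be non-negative"
        assert np.isclose(p.sum(), 1.0), "probabilities must sum to 1"
    assert 2 <= max_order <= n - 1, "order must lie in [2, n-1]"
    return True

counts = np.array([[[1, 2], [3, 4]], [[4, 2], [1, 3]]], dtype=int)
check_input(counts, 2)                 # counts input; order 2 is valid for a 3-D array
check_input(counts / counts.sum(), 2)  # normalised float input also passes
```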
Connected Information
The main function of the package is ConnectedInformation, which uses the maximum entropy with constraints at different orders to compute the Connected Information. It takes as input the probability distribution or the (non-normalised) counts, along with the desired orders of Connected Information and the optimisation method.
When computing multiple Connected Information values for the same probability distribution, it is possible to pass the sizes (desired orders) as an array. This will speed up the process by chaining the computations, thereby reducing the number of maximizations.
If no method is passed, the kind of optimisation is decided by the data type of the input:
- `int` input triggers constraints on the marginal entropy and the `RawPolymatroid` method;
- `float` input triggers constraints on the marginal distributions and the `Ipfp` method.
It is possible to have complete control on the kind of constraints by passing a method explicitly:
- `Gradient`, `Cone` and `Ipfp` trigger constraints on the marginal distributions with both `int` and `float` inputs;
- `RawPolymatroid` and `GPolymatroid` require constraints on the marginal entropy. However, `GPolymatroid` will raise an error if used with `float` inputs, as it needs the counts to compute the correction.
It's possible to pass a precomputed dictionary of entropies to speed up the computation when using entropic constraints. See note below on the structure of this dictionary.
The basic usage of ConnectedInformation is the following:
```python
import pyHordcoin as hc
import numpy as np

counts = np.array([[[1, 2], [3, 4]], [[4, 2], [1, 3]]], dtype=int)
hc.ConnectedInformation(counts, 2)
```
This will maximise the entropy constraining the marginal entropies (up to order 2) and should give a result similar to `{2: 0.09310598013744764}`.
Notably, the following operations all give the same results:
```python
hc.ConnectedInformation(counts, [2])
hc.ConnectedInformation(counts, 2, hc.RawPolymatroid())
frequencies = counts.astype(float) / counts.sum()
hc.ConnectedInformation(frequencies, 2, hc.RawPolymatroid())
```
Alternatively, it's possible to trigger the marginal distribution constraints with these equivalent lines:
```python
hc.ConnectedInformation(frequencies, 2)
hc.ConnectedInformation(counts, 2, hc.Ipfp())
hc.ConnectedInformation(frequencies, [2], hc.Ipfp())
```
Or similar results with:
```python
hc.ConnectedInformation(frequencies, 2, hc.Gradient())
hc.ConnectedInformation(frequencies, 2, hc.Cone())
hc.ConnectedInformation(frequencies, 2, hc.Cone(hc.SCS()))
hc.ConnectedInformation(frequencies, 2, hc.Cone(hc.Mosek()))
```
The last one requires a MOSEK license (an academic licence is easy to obtain at https://www.mosek.com/products/academic-licenses/).
Other useful parameters for the Polymatroid methods are:
- `zhang_yeung`: enables the Zhang-Yeung inequalities, complementing the Shannon inequalities and improving the approximation at higher orders (see paper);
- `optimizer`: to choose between the `SCS` and the `Mosek` optimiser;
- `mle_correction` (only `RawPolymatroid`): enables a rough correction for the finite sample;
- `tolerance` (only `GPolymatroid`): enables a relaxation of the constraints to help convergence (sometimes it fails with the corrected entropies). Note: the CI estimate can become negative due to the relaxed constraints.
Other functions
This interface currently implements a single function to access the entropy maximisation. The function MaximiseEntropy works for both marginal and entropic constraints, selecting the appropriate set of constraints using the same rules as ConnectedInformation (with the exception that fixed-entropy maximisation from a normalised distribution is not allowed). MaximiseEntropy takes as input a probability distribution and the order of marginal distributions to constrain (or the order up to which the marginal entropies must be fixed). The optimiser is an optional parameter whose own parameters (such as the number of iterations) can be specified further. The function returns the maximum entropy and, in the case of fixed marginals, the probability distribution with maximal entropy as an np.ndarray. It's possible to pass a precomputed dictionary of entropies to speed up the computation when using entropic constraints.
The basic usage is the following:
```python
import pyHordcoin as hc
import numpy as np

probability_distribution = np.array([[[1/16, 3/16], [3/16, 1/16]],
                                     [[1/16, 3/16], [3/16, 1/16]]])
marginal_size = 2
hc.MaximiseEntropy(probability_distribution, marginal_size)
```
Running the code with the optional parameter method:
```python
hc.MaximiseEntropy(probability_distribution, marginal_size, method=hc.Gradient(10, hc.SCS()))
```
The package also contains one utility function: DistributionEntropy computes the information entropy of a probability distribution.
Usage of the functions:
```python
hc.DistributionEntropy(probability_distribution)
```
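Assuming DistributionEntropy computes the standard plug-in Shannon entropy in bits (consistent with the notes below), its result can be cross-checked with plain NumPy (`shannon_entropy_bits` is just an illustrative name):

```python
import numpy as np

def shannon_entropy_bits(p):
    """Plug-in Shannon entropy in bits, skipping zero-probability cells."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

probability_distribution = np.array([[[1/16, 3/16], [3/16, 1/16]],
                                     [[1/16, 3/16], [3/16, 1/16]]])
shannon_entropy_bits(probability_distribution)  # ≈ 2.8113 bits
```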
NOTES
- All the entropies are measured in bits. This also applies to the precalculated entropies provided by the user.
- The precalculated entropies must be stored in a dictionary whose keys are tuples of dimensions indexed from 1; these are the dimensions kept in the marginal from which the entropy is calculated. In a 4-D distribution, suppose dimensions 2 and 4 (1-indexed) are summed away (`marginal = distribution.sum((1, 3))` in 0-indexed NumPy axes); then the dictionary will contain `{(1, 3): -np.sum(marginal * np.log2(marginal))}`.
A full example with precalculated entropies may look like this:
```python
import numpy as np
import pyHordcoin as hc
from itertools import combinations

A = np.random.randint(1000, size=[2, 2, 2]).astype(np.float64)
A /= A.sum()

marginal_entropies = {}
for i in range(3):
    for a in combinations(range(3), i):      # axes summed away (0-indexed)
        m = tuple(set(range(3)) - set(a))    # axes kept (0-indexed)
        tmp = A.sum(a)                       # marginal distribution
        k = tuple(b + 1 for b in m)          # dictionary key (1-indexed)
        marginal_entropies[k] = hc.DistributionEntropy(tmp)

hc.ConnectedInformation(
    A, 2, hc.RawPolymatroid(), precalculated_entropies=marginal_entropies
)
```
Recommendations
The most efficient method when computing with fixed marginal distributions is the Cone method with Mosek optimiser. This requires a license to use the MOSEK solver. Without the license, it is possible to use SCS instead, but it is less accurate and slower.
Without a MOSEK license, use the Ipfp method (default). It is accurate and not the slowest. It can also be parametrized with the number of iterations, but this is usually not necessary; the default value is 10.
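For intuition, the Ipfp method is presumably a variant of classical iterative proportional fitting: starting from the uniform distribution, repeatedly rescale so that each fixed marginal is matched. A self-contained sketch for the order-2 marginals of a 3-D distribution (a simplified illustration, not the library's actual implementation):

```python
import numpy as np
from itertools import combinations

def ipfp_order2(p, iterations=10):
    """Fit a maximum-entropy distribution matching all order-2 marginals
    of p, via iterative proportional fitting from the uniform start."""
    p = np.asarray(p, dtype=float)
    n = p.ndim
    q = np.full(p.shape, 1.0 / p.size)  # uniform initial distribution
    for _ in range(iterations):
        for kept in combinations(range(n), 2):
            summed = tuple(ax for ax in range(n) if ax not in kept)
            target = p.sum(axis=summed, keepdims=True)   # fixed marginal
            current = q.sum(axis=summed, keepdims=True)  # current marginal
            # rescale q so its marginal over `kept` matches the target
            q = q * target / np.where(current > 0, current, 1.0)
    return q

p = np.array([[[1/16, 3/16], [3/16, 1/16]],
              [[1/16, 3/16], [3/16, 1/16]]])
q = ipfp_order2(p)  # q matches every pairwise marginal of p
```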
The Gradient method is the slowest and may fail during execution due to limitations of Second Order Cone constraints in solvers.
When computing with fixed entropies and a small number of samples, the recommended method is GPolymatroid with Mosek. When the distribution is sufficiently well sampled, you can use RawPolymatroid to estimate the entropy with the plug-in estimator. More information can be found in the paper.
Reading the documentation
If I didn't upload it somewhere else, from the pyHordcoin folder in the cloned repository run:
```shell
pip install -r docs/requirements.txt
```
to get the right packages.
Now you can read the documentation by running:
```shell
mkdocs serve
```
Then open your browser at http://127.0.0.1:8000/. Note, however, that there isn't anything there you can't already find in the sources.
How to cite
If you use this code for a scientific publication, please cite:
Tani Raffaelli G., Kislinger J., Kroupa T., and Hlinka J., "HORDCOIN: A Software Library for Higher Order Connected Information and Entropic Constraints Approximation"