Calculate the relationship between parameters with known distributions.
Distrel
Distrel (distribution relations) is a package used to approximate the relationship between dependent distributions when only the properties of the distribution are known.
Example use case
Suppose we have 3 parameters with known distributions, such that $a \sim \mathcal{N}(\mu_a, \sigma_a)$, $b \sim \mathcal{N}(\mu_b, \sigma_b)$ and $c \sim \mathcal{N}(\mu_c, \sigma_c)$, and the distributions are related by $c = a \times b$.
Distrel attempts to answer the question: how do we pseudo-randomly sample from $a$ and $b$ such that $c$ follows the correct distribution?
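To see why this question is non-trivial: if $a$ and $b$ are sampled independently, the variance of $c = a \times b$ is completely determined by the moments of $a$ and $b$, leaving no freedom to hit a target $\sigma_c$. A quick numpy check (illustrative values, not taken from distrel):

```python
import numpy as np

rng = np.random.default_rng(0)

mu_a, sigma_a = 2.0, 0.5
mu_b, sigma_b = 3.0, 0.4

a = rng.normal(mu_a, sigma_a, size=1_000_000)
b = rng.normal(mu_b, sigma_b, size=1_000_000)  # sampled independently of a
c = a * b

# For independent a and b:
#   Var(ab) = mu_a^2 sigma_b^2 + mu_b^2 sigma_a^2 + sigma_a^2 sigma_b^2
expected_var = mu_a**2 * sigma_b**2 + mu_b**2 * sigma_a**2 + sigma_a**2 * sigma_b**2
print(np.var(c), expected_var)  # the two agree -- the variance of c is fixed, not free
```

Sampling $b$ conditionally on $a$, as distrel does, is what restores the freedom to match a prescribed distribution for $c$.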
How it works
Distrel is a wrapper around a simple neural network written in pytorch that finds the functions $f_{\mu}(a)$ and $f_{\sigma}(a)$ in:

$b \sim \mathcal{N}(f_{\mu}(a), f_{\sigma}(a))$

such that:

$a \times b \sim \mathcal{N}(\mu_c, \sigma_c)$
That is, Distrel assumes that $a$ and $b$ are weakly related and hence samples $b$ from a normal distribution defined by a particular value of $a$.
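Put differently (an equivalent way to state the setup, not notation from the package): the joint density is factorised as $p(a, b) = p(a)\,p(b \mid a)$ with a Gaussian conditional, and the network is trained so that the implied density of $c = a \times b$,

$p(c) = \int p(a)\, \mathcal{N}\!\left(b;\, f_{\mu}(a),\, f_{\sigma}(a)\right) \delta(c - ab)\, \mathrm{d}a\, \mathrm{d}b,$

matches the target $\mathcal{N}(\mu_c, \sigma_c)$.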
[Diagram of the distrel neural network architecture: input $a$ passes through layer l1, whose output feeds l2 (giving $\mu$) and l3 (giving $\sigma$); the output is sampled from $\mathcal{N}(\mu, \sigma)$.]
Installation
With poetry
poetry add distrel
or with pip
pip install distrel
Usage
To make things interesting, let's consider the relationship $a \times b = c$ but each distribution is a unique non-central t-distribution. The key thing to note is that non-central t-distributions are defined by their degrees of freedom and a non-centrality parameter. We could use any distribution but currently distributions can only be characterised by their mean, variance, skewness and kurtosis - if more properties are required then please do open an issue.
from scipy.stats import nct
import distrel
# Defines the degrees of freedom (df) and non-centrality parameter (nc) for each distribution
a_df, a_nc = 9, 52
b_df, b_nc = 4, 30
c_df, c_nc = 143, 90
The distribution relation network (drn) needs three things:
- A generator function to generate the input data.
- A calculator function to obtain the 3rd distribution from the output of the generator and the output of the neural network.
- The properties of each of the 3 distributions.
# We need to define a generator function that can generate samples of A.
# It must take N (the number of samples) as its only argument.
# It doesn't matter that gen_a doesn't return a torch tensor, this is accounted for internally.
def gen_a(N):
    return nct.rvs(a_df, a_nc, size=N)
# We also need to define a function that calculates $c$ from $a$ and $b$.
# That is, the neural network will take input $a$, generate $b$ and then calculate $c$ from $a$ and
# $b$.
def calc_c(a, b):
    return a * b
# Now we define the distribution relation network
drn = distrel.drn(gen=gen_a, calc=calc_c, seed=42)
# We need to define the distribution properties with the following:
# nct.stats returns mean, var, skew, kurt
a_mean, a_var, a_skew, a_kurt = nct.stats(a_df, a_nc, moments='mvsk')
b_mean, b_var, b_skew, b_kurt = nct.stats(b_df, b_nc, moments='mvsk')
c_mean, c_var, c_skew, c_kurt = nct.stats(c_df, c_nc, moments='mvsk')
drn.set_properties(
    {"mean": a_mean, "var": a_var, "skew": a_skew, "kurt": a_kurt},
    {"mean": b_mean, "var": b_var, "skew": b_skew, "kurt": b_kurt},
    {"mean": c_mean, "var": c_var, "skew": c_skew, "kurt": c_kurt},
)
# We could have alternatively initialised the drn with the following
# drn = distrel.drn(
#     gen=gen_a,
#     calc=calc_c,
#     properties=[
#         {"mean": a_mean, "var": a_var},
#         {"mean": b_mean, "var": b_var},
#         {"mean": c_mean, "var": c_var},
#     ],
# )
# This also shows that we don't /have/ to test for all four metrics.
# Currently, only mean, variance, skewness and kurtosis are supported.
Now the drn is ready to train. A tolerance can be provided to allow early stopping when ALL of the distribution properties are within a percentage tolerance; the tolerance defaults to zero, which means no early stopping occurs. Additionally, the maximum number of epochs can be set (this defaults to 100), and the drn will stop training upon reaching this number regardless of performance.
The training starts with gradient-free optimisation before switching to a gradient-based optimiser. To disable the gradient-free optimiser, set budget to 1.
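The two-stage scheme can be illustrated in miniature (a plain-numpy sketch of the idea, not distrel's actual training loop): a budget of gradient-free evaluations picks a good starting point, which gradient descent then refines.

```python
import numpy as np

def loss(x):
    return (x - 3.0) ** 2  # toy objective with its minimum at x = 3

rng = np.random.default_rng(42)

# Stage 1: gradient-free -- evaluate `budget` random candidates, keep the best
budget = 1000
candidates = rng.uniform(-10, 10, size=budget)
x = candidates[np.argmin(loss(candidates))]

# Stage 2: gradient-based -- simple gradient descent from the best candidate
lr = 1e-1
for _ in range(100):
    grad = 2 * (x - 3.0)  # analytic gradient of the toy objective
    x -= lr * grad

print(x)  # converges to ~3.0
```

The gradient-free stage makes the result robust to poor initialisation; the gradient stage then converges quickly from the neighbourhood it found.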
You can set the number of samples generated by the generator with N.
For reproducible results, you can also set the seed which sets the seed for pytorch, numpy and python's random.
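A seeding helper along those lines might look as follows (a sketch of the idea, not distrel's implementation; the torch branch is guarded so the snippet runs without pytorch installed):

```python
import random

import numpy as np

def set_seed(seed):
    """Seed python's random and numpy (and torch, if available)."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass

set_seed(42)
first = (random.random(), np.random.rand())
set_seed(42)
second = (random.random(), np.random.rand())
print(first == second)  # True: identical draws after reseeding
```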
drn.train returns a boolean: True if training exited early due to convergence, and False otherwise.
converge = drn.train(
    max_epochs=1000,
    tol=1e-2,
    progress_bar=False,
    lr=1e-3,
    budget=1000,
    N=1000,
    seed=42,
)
If you want more fine-grained control over the optimiser, you can do that!
from torch.optim import LBFGS
converge = drn.train(
    max_epochs=max_epochs,
    tol=tol,
    lr=lr,
    progress_bar=True,
    require_closure=True,
    optim=LBFGS,
    optim_kwargs={'line_search_fn': 'strong_wolfe'},
)
print(f"Converged?:\t{converge}")
The optim_kwargs are unpacked when calling the optimiser. Set require_closure to True if, when the optimiser is called internally, you'd like it called as optim.step(closure), or to False if you'd like optim.step().
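The distinction follows the general torch convention: closure-style optimisers such as LBFGS re-evaluate the loss themselves, possibly several times per step (e.g. during a line search), so they must be handed a zero-argument closure. The shape of that pattern, sketched without torch (illustrative only, not distrel's internals):

```python
def make_closure(get_loss):
    """Wrap a loss evaluation in a zero-argument closure, counting calls."""
    calls = {"n": 0}

    def closure():
        calls["n"] += 1
        return get_loss()

    return closure, calls

def closure_style_step(closure, evaluations=3):
    """Mimic a line search: call the closure several times, keep the best loss."""
    return min(closure() for _ in range(evaluations))

closure, calls = make_closure(lambda: 0.5)
best = closure_style_step(closure)
print(best, calls["n"])  # 0.5 3
```

A plain `optim.step()` optimiser, by contrast, assumes gradients were already computed by the caller and evaluates nothing itself.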
# To view the network weights, simply print the drn
print(drn)
Which returns something like:
        _l2_ mu
in _l1_/
       \_l3_ sigma
out ~ N(mu, sigma)
---
l1:
Linear(in_features=1, out_features=1, bias=True)
Bias: -3.3627309799194336
Weight: 0.027783172205090523
l2:
Linear(in_features=1, out_features=1, bias=True)
Bias: 0.14515426754951477
Weight: 3.0655031204223633
l3:
Linear(in_features=1, out_features=1, bias=True)
Bias: -13.486244201660156
Weight: 2.5820298194885254
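Reading the printout: each layer is a 1-in/1-out affine map, so $f_{\mu}$ is l2 composed with l1 (any activations between layers, or a positivity transform on sigma, are not shown in the printout, so this is only the affine skeleton). Evaluating the printed weights by hand:

```python
# Affine skeleton of f_mu from the printed weights (activations, if any, omitted).
def l1(a):
    return 0.027783172205090523 * a - 3.3627309799194336

def f_mu(a):
    return 3.0655031204223633 * l1(a) + 0.14515426754951477

print(f_mu(37))  # roughly -7.01
```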
# To use the drn as a predictor, call the predict method; a seed is optional but recommended
# for reproducible results.
new_a = 37
new_b = drn.predict(new_a, seed=42)
new_c = drn.calc(new_a, new_b)  # Wraps the calc_c function
Contributing
Please feel free to submit an issue for something you'd like to see fixed or implemented. Alternatively, merge requests are always welcome!
Copying
This project is licensed under the GNU GPL version 3; see LICENSE for the full license text.