Using neural networks to extract sufficient statistics from data by maximising the Fisher information

## Project description

Optimising a neural network to maximise the Fisher information provides us with a function able to massively compress data without losing information about parameters of interest. This function can then be used for likelihood-free inference.

The module here provides both the routines for fitting a neural network by maximising the Fisher information as well as a few methods for performing likelihood-free inference and approximate Bayesian computation.

Specifically, the neural network takes some data, , and maps it to a compressed summary, , where can have the same dimensionality as that of the parameter space, rather than the data space, potentially without losing any information. To do so we maximise the Fisher information of the summary statistics provided by the neural network, and in doing so, find a functional form of the optimal compression.

To train the neural network a batch of simulations created at a fiducial parameter value for training (and another for validation). These simulations are compressed by the neural network to obtain some statistic , i.e. the output of the neural network. We can use these to calculate the covariance, , of the compressed summaries. The sensitivity to model parameters uses the derivative of the simulation. This can be provided analytically or numercially using created above the fiducial parameter value and created below the fiducial parameter value The simulations are compressed using the network and used to find mean of the summaries

If the derivative of the simulations with respect to the parameters can be calculated analytically (or via autograd, etc.) then that can be used directly using the chain rule since the derivative of the network outputs with respect to the network input can be calculated easily

We then use and to calculate the Fisher information

Since any linear rescaling of the summaries is also a summary, when maximising the Fisher information we set their scale using

where

is a regularisation term whose strength is dictated by

with as a strength and as a rate parameter which can be determined from a closeness condition on the Frobenius norm of the difference between the covariance (and inverse covariance) from the identity matrix.

When using this code please cite

Tom Charnock, Guilhem Lavaux and Benjamin D. Wandelt

Automatic physical inference with information maximizing neural networks.

Physical Review D. 97 083004 (2018)

doi:10.1103/PhysRevD.97.083004

arXiv:1802.03537

The code in the paper can be downloaded as v1 or v1.1 of the code kept on zenodo:

The code can be installed using:

pip install imnn-tf

or:

git clone https://bitbucket.org/tomcharnock/imnn-tf.git cd imnn-tf python3 setup.py install

## Modules

Available are modules for fitting the IMNN in IMNN and for doing likelihood-free inference in LFI. Examples of how to use these modules are available in the examples directory.

### IMNN

The basic call for the IMNN is

```
imnn = IMNN.IMNN(
n_s, n_d, n_params, n_summaries, θ_fid, δθ, input_shape,
fiducial, derivative, validation_fiducial, validation_derivative,
{model}, {optimiser}, {save}, {load}, {weights}, {directory},
{filename}, {at_once}, {map_fn}, {check_shape}, {verbose},
{dtype}, {itype})
```

where

n_s - number of simulations to calculate covariance of fiducial simulations

n_d - number of simulations to calculate mean of the derivative simulations

n_params - number of parameters in the model

n_summaries - number of summaries to compress the data to

θ_fid - fiducial parameter values

δθ - parameter differences for numerical derivative

input_shape - shape of a single simulation

fiducial - a numpy array of fiducial simulations, generative function of individual fiducial simulations, or list of TFRecord filenames

derivative - a numpy array of derivative simulations, generative function of individual derivative simulations, or list of TFRecord filenames

validation_fiducial - a numpy array of fiducial simulations, generative function of individual fiducial simulations, or list of TFRecord filenames

validation_derivative - a numpy array of derivative simulations, generative function of individual derivative simulations, or list of TFRecord filenames

model - a keras-like model (optional if loading)

optimiser - a keras-like optimiser (optional if loading)

save - boolean describing whether to save or not (need directory and filename if save=True)

load - boolean describing whether to load model or not (need directory, `filename and optionally weights if load=True)

weights - string with name of file of saved weights

directory - string with directory to load or save model

filename - string with filename to load or save model

at_once - number of simulations to process with model at once (should be n_s if memory is large enough)

map_fn - function to preprocess data (None if no preprocessing)

check_shape - boolean describing whether to check shape of simulation on initialisation

verbose - boolean to turn on and off descriptive write out

dtype - TensorFlow float type (default tf.float32)

itype - TensorFlow int type (default tf.int32)

#### Fit

```
imnn.fit(
{n_iterations}, {λ}, {ϵ}, {reset}, {patience}, {min_iterations},
{checkpoint}, {tqdm_notebook}, {weight_file})
```

where

n_iterations - number of iterations to run the fitting for (can be None when using patience)

λ - strength of the regularisation

ϵ - distance of covariance (and inverse covariance) from the identity

reset - boolean describing whether to reset weights and start training from scratch

patience - number of iterations of decreasing Fisher information of the validation set before stopping

min_iterations - number of iterations before early stopping turns on

checkpoint - number of iterations between model saving (default turned off)

tqdm_notebook - True if using a Jupyter notebook and False otherwise #TODO - make automatic (might already be implemented)

weight_file - string with filename to save model weights to

Once trained, statistics are saved in a history dictionary attribute imnn.history

"det_F" - determinant of the Fisher information of the training set

"det_C" - determinant of the covariance matrix of the training set

"det_Cinv" - determinant of the inverse covariance matrix of the training set

"dμ_dθ" - derivative of the mean of the training set summaries

"reg" - value of the regularisation

"r" - value of the dynamic strength of the regularisation

"val_det_F" - determinant of the Fisher information of the validation set

"val_det_C" - determinant of the covariance matrix of the validation set

"val_det_Cinv" - determinant of the inverse covariance matrix of the validation set

"val_dμ_dθ" - derivative of the mean of the validation set summaries

#### Plot

```
imnn.plot(
{regulariser}, {known_det_fisher}, {figsize})
```

where

regulariser - boolean describing whether to plot the regularisation history

known_det_fisher - value of the determinant of the target Fisher information if already known

figsize - tuple with the size of the figure if not default

#### Estimate parameters

Gaussian estimates of the parameter values can be obtained from the network by running

`imnn.get_estimate(input_data)`

where `input_data is data input to the network (shape None + input_shape). Note that if you want to make estimates without initialising the IMNN (once trained), the model can be loaded, along with the saved data during fit. For an IMNN saved with directory="model" and filename=model then an estimator can be made using

```
estimator_parameters = np.load("model/model/estimator.npz")
Finv = estimator_parameters["Finv"]
θ_fid = estimator_parameters["θ_fid"]
dμ_dθ = estimator_parameters["dμ_dθ"]
Cinv = estimator_parameters["Cinv"]
μ = estimator_parameters["μ"]
@tf.function:
def estimator(data):
return tf.add(
θ_fid,
tf.einsum(
"ij,jk,kl,ml->mi",
Finv,
dμ_dθ,
Cinv,
model(data) - μ))
```

or

```
def estimator(data):
return θ_fid + np.einsum(
"ij,jk,kl,ml->mi",
Finv,
dμ_dθ,
Cinv,
model(data) - μ)
```

#### Training and validation data format

The data must have the correct shape. For a single simulation with shape input_shape then a fiducial data array must have a shape of

`fiducial.shape = (n_s,) + input_shape`

The derivatives need to have a shape of

`derivative.shape = (n_d, 2, n_params) + input_shape`

where derivative[:, 0, ...] is the lower part of the numerical derivative and derivative[:, 1, ...] is the upper part of the numerical derivative and derivative[:, :, i, ...] labels the i th parameter.

If the data won’t fit in memory then we can load data via a generative function

```
def fiducial_loader(seed):
yield fiducial[seed], seed
def derivative_loader(seed, derivative, parameter):
yield derivative[seed, derivative, parameter] (seed, derivative, parameter)
```

The function yields a single simulation at for each call labelled with the seed index (seed in range 0 to n_s) for the fiducial loader. The derivative loader yields a single simulation at a given seed, given upper or lower derivative and given parameter index (seed in range 0 to n_d, derivative in range 0 to 1, and parameter in range 0 to n_params). In the above functions, fiducial and derivative are some way of grabbing the data - it could be reading from file or from memory, etc. This has quite a bit of overhead and so it would be preferred to save the data as a TFRecord format. Instructions on how to do this for ingestion by the IMNN is available in the examples/TFRecords.ipynb and examples/IMNN - TFRecords.ipynb tutorials.

#### Network model and optimiser

The IMNN is based on keras-like network and optimisers, so an example could be

```
model = tf.keras.Sequential(
[tf.keras.Input(shape=input_shape),
tf.keras.layers.Dense(128),
LeakyReLU(0.01),
tf.keras.layers.Dense(128),
LeakyReLU(0.01),
tf.keras.layers.Dense(n_summaries),
])
opt = tf.keras.optimizers.Adam()
```

Make sure to choose this network sensibly so that it best pulls the information from the data.

## LFI

The LFI module provides a Gaussian approximation to the posterior, a simple approximation Bayesian computation (ABC) implementation and a population Monte Carlo (PMC). These work with any estimator and not just with the IMNN.

### Gaussian approximation of the posterior

The Gaussian approximation takes the inverse Fisher information as the variance of a Gaussian posterior (as implied by the Cramer-Rao bound) whose mean is at the estimate value.

```
GA = LFI.GaussianApproximation(
target_data, prior, Fisher, get_estimate, {labels})
```

where

target_data - as many pieces of data to be inferred (target_data.shape = (None,) + input_shape)

prior - the prior distribution which can be sampled from and whose probability can be evaluated with an event_shape of at least [1] (suggested to use a TensorFlow Probability distribution)

Fisher - Fisher information matrix (imnn.F or otherwise for non-IMNN)

get_estimate - function providing estimate of the n_params model parameters from the data (imnn.get_estimate or otherwise for non-IMNN)

labels - list of strings for labelling plots

#### Plot Fisher information

The inverse Fisher information can plotted using

`GA.plot_Fisher({figsize})`

#### Gaussian approximation to the likelihood and posterior

The Gaussian approximation to the likelihood (prob) and the posterior (and their logarithms) can be obtained using

```
GA.log_prob({grid}, {gridsize})
GA.prob({grid}, {gridsize})
GA.log_posterior({grid}, {gridsize})
GA.posterior({grid}, {gridsize})
```

where

grid - a set of parameters or an array of parameter or a meshgrid of parameter to evaluate the likelihood or posterior at (if None gridsize takes over)

gridsize - a tuple of length n_params with the size of the meshgrid to make #TODO might crash if GA.prior.low=-np.inf for any parameter or GA.prior.high=np.inf for any parameter. This defaults to 20 for every parameter if grid=None and gridsize is not provided

#### Plotting posterior

The posterior can be plotted using

`GA.plot({grid}, {gridsize}, **kwargs)`

where **kwargs are a variety of matplotlib arguments.

### Approximate Bayesian computation (ABC)

The ABC draws parameter values from the prior and makes simulations at these points. These simulations are then summarised and then the distance between these estimates and the estimate of the target data can be calculated. Estimates within some small ϵ-ball around the target estimate are approximately samples from the posterior. Note that the larger the value of ϵ, the worse the approximation to the posterior.

Note that a simulator of the data is needed. The simulator must be a function

```
def simulator(parameters, seed, simulator_args):
return simulation
```

where seed is a random number generator and simulator_args is a dict of arguments. The seed and simulator_args are only for setting up the simulator - the function used in the ABC (and PMC) call must only take an array of parameters and return an array of simulations made at those parameter values. The function can call external codes, submit jobs on a cluster, etc. as long as the simulations are returned in the same order as the passed parameter array.

The ABC can be initialised using

```
ABC = LFI.ApproximateBayesianComputation(
target_data, prior, Fisher, get_estimate, simulator, {labels})
```

where

target_data - as many pieces of data to be inferred (target_data.shape = (None,) + input_shape)

prior - the prior distribution which can be sampled from and whose probability can be evaluated with an event_shape of at least [1] (suggested to use a TensorFlow Probability distribution)

Fisher - Fisher information matrix (imnn.F or otherwise for non-IMNN)

get_estimate - function providing estimate of the n_params model parameters from the data (imnn.get_estimate or otherwise for non-IMNN)

simulator - function taking array of parameter values and returning simulations made at those values

labels - list of strings for labelling plots

#### Obtaining samples

The ABC can be run using

`ABC(draws, {at_once}, {save_sims})`

or

`ABC.ABC(draws, {at_once}, {save_sims}, {PMC}, {update})`

where

draws - the number of simulations to make (or parameter values to make the simulations if PMC=True)

at_once - boolean describing whether to process (and make) all simulations at once or not

save_sims - string with the filename to save the sims (as a .npy) if provided

PMC - boolean describing whether draws is a number of simulations or draws is an array of parameter values to make simulations at

update - boolean describing whether to update the ABC attributes onces the ABC is run or not

Once this is run the parameters, estimates, differences from the estimate and the target and the distance from the target are found as

ABC.parameters

ABC.estimates

ABC.differences

ABC.distances

#### Acception and rejection of samples

ABC only runs the simulations and calculates the estimate distances but doesn’t do the accept and reject step within the ϵ-ball. This is done using

`ABC.accept_reject(ϵ)`

where

ϵ` - a float describing the radius of the ϵ-ball

Once this is run more attributes are filled

ABC.num_accepted - number of accepted samples

ABC.num_rejected - number of rejected samples

ABC.num_draws - total number of samples done

ABC.accepted_parameters

ABC.accepted_differences

ABC.accepted_estimates

ABC.accepted_distances

ABC.rejected_parameters

ABC.rejected_differences

ABC.rejected_estimates

ABC.rejected_distances

#### Automatic rejection sampler

To get a certain number of draws within a chosen ϵ-ball one can run

```
ABC.get_min_accepted(
ϵ, accepted, {min_draws}, {at_once}, {save_sims}, {tqdm_notebook})
```

where

ϵ - a float describing the radius of the ϵ-ball

accepted - the number of samples to be accepted within the ϵ-ball

min_draws - how many simulations to do at a time iteratively until enough simulations are accepted

at_once - boolean describing whether to process (and make) all simulations at once or not

save_sims - string with the filename to save the sims (as a .npy) if provided

tqdm_notebook - True if using a Jupyter notebook and False otherwise #TODO - make automatic (might already be implemented)

#### Histogrammed posterior

The posterior is approximated by histogramming the accepted samples from the ABC (and acception/rejection) and can be calculated using

```
ABC.posterior(
{bins}, {ranges}, {ϵ}, {draws}, {accepted},
{at_once}, {save_sims}, {tqdm_notebook})
```

where

bins - number of bins in the histogram defining the posterior

ranges - minimum and maximum values for each parameter in the histogram

Optionally any of the parameters for ABC.ABC(...), ABC.accept_reject(...), and/or ABC.get_min_accepted(...) can be passed to ABC.posterior(...) to run the ABC when calling posterior rather than calling the sampling step first.

#### Plot plosterior

The posterior can be plotted using

```
ABC.plot(
{smoothing}, {bins}, {ranges}, {ϵ}, {draws}, {accepted},
{at_once}, {save_sims}, {tqdm_notebook}, **kwargs)
```

where

smoothing - the pixel range of a Gaussian smoothing of the histogram for plotting (smoothing causes inflation of the posterior)

Optionally any of the parameters for ABC.ABC(...), ABC.accept_reject(...), and/or ABC.get_min_accepted(...) can be passed to ABC.plot(...) to run the ABC when making the plot rather than calling the sampling step first. matplotlib parameters can also be passed for the plotting routine.

#### Plot samples

The samples can also be plotted using

```
ABC.scatter_plot(
{axes}, {rejected}, {ϵ}, {draws}, {accepted},
{at_once}, {save_sims}, {tqdm_notebook}, **kwargs)
```

where

axes - either "parameter_estimate", "parameter_parameter", or "estimate_estimate" for plotting the estimates against the parameters, or the parameters against the parameters or the estimates against the estimates (the last two are good for diagnostics such as the completeness of the sampling from the prior and the shape and correlation of the estimation function)

rejected - a number between 0 and 1 describing the fraction of the rejected samples to plot (there are often orders of magnitude more samples rejected and so it makes sense to plot fewer, if they are to be plotted at all)

Optionally any of the parameters for ABC.ABC(...), ABC.accept_reject(...), and/or ABC.get_min_accepted(...) can be passed to ABC.scatter_plot(...) to run the ABC when making the plot rather than calling the sampling step first. matplotlib parameters can also be passed for the plotting routine.

#### Running with saved simulations

If simulations have already been run and we want to perform a simple ABC on them then we can set the simulator to return the saved simulations, saved_simulations -> (?) + input_shape, and pass the corresponding saved parameters, saved_parameters -> (?, n_params)`, used to make the simulation.

```
simulator = lambda _ : saved_simulations
ABC = LFI.ApproximateBayesianComputation(
target_data, prior, Fisher, get_estimate, simulator, {labels})
ABC(draws=saved_parameters, predrawn=True, save_sims=None, {at_once})
ABC.accept_reject(ϵ)
```

### Population Monte Carlo (PMC)

Whilst we can obtain approximate posteriors using ABC, the rejection rate is very high because we sample always from the prior. Population Monte Carlo (PMC) uses statistics of the population of samples to propose new parameter values, so each new simulation is more likely to be accepted. This prevents us needing to define an ϵ parameter to define the acceptance distance. Instead we start with a population from the prior and iteratively move samples inwards. Once it becomes difficult to move the population any more, i.e. the number of attempts to accept a parameter becomes very large, then the distribution is seen to be a stable approximation to the posterior.

The whole module works very similarly to ABC with a few changes in arguments.

```
PMC = LFI.PopulationMonteCarlo(
target_data, prior, Fisher, get_estimate, simulator, {labels})
```

where

target_data - as many pieces of data to be inferred (target_data.shape = (None,) + input_shape)

prior - the prior distribution which can be sampled from and whose probability can be evaluated with an event_shape of at least [1] (suggested to use a TensorFlow Probability distribution)

Fisher - Fisher information matrix (imnn.F or otherwise for non-IMNN)

get_estimate - function providing estimate of the n_params model parameters from the data (imnn.get_estimate or otherwise for non-IMNN)

simulator - function taking array of parameter values and returning simulations made at those values

labels - list of strings for labelling plots

#### Obtaining accepted samples

The PMC can be run by calling

```
PMC(draws, initial_draws, criterion, {percentile},
{at_once}, {save_sims}, {tqdm_notebook})
```

or

```
PMC.PMC(
draws, initial_draws, criterion, {percentile},
{at_once}, {save_sims}, {tqdm_notebook})
```

where

draws - number of samples from the posterior

initial_draws - number of samples from the prior to start the PMC (must be equal to or greater than the number of draws from the posterior

criterion - the stopping condition, the fraction of times samples are accepted in any one iteration of the PMC (when this is small then many samples are not accepted into the population, suggesting a stationary distribution)

percentile - the percentage of points which are considered the in the main sample (making this small moves more samples at once, but with reduced statistics from the population, default set to 75%, it takes longer to run (but may be cheaper in number of simulations) if set to a high value or None)

at_once - boolean describing whether to process (and make) all simulations at once or not

save_sims - string with the filename to save the sims (as a .npy) if provided

tqdm_notebook - True if using a Jupyter notebook and False otherwise #TODO - make automatic (might already be implemented)

#### Histogrammed posterior

The posterior is approximated by histogramming the accepted samples from the PMC and can be calculated using

```
PMC.posterior(
{bins}, {ranges}, {draws}, {initial_draws}, {criterion}, {percentile},
{at_once}, {save_sims}, {tqdm_notebook})
```

where

bins - number of bins in the histogram defining the posterior

ranges - minimum and maximum values for each parameter in the histogram

Optionally any of the parameters for PMC.PMC(...) can be passed to PMC.posterior(...)` to run the PMC when calling posterior rather than calling the sampling step first.

#### Plot posterior

The posterior can be plotted using

```
PMC.plot(
{smoothing}, {bins}, {ranges}, {draws}, {initial_draws}, {criterion},
{percentile}, {at_once}, {save_sims}, {tqdm_notebook}, **kwargs)
```

where

smoothing - the pixel range of a Gaussian smoothing of the histogram for plotting (smoothing causes inflation of the posterior)

Optionally any of the parameters for PMC.PMC(...) can be passed to PMC.plot(...) to run the PMC when making the plot rather than calling the sampling step first. matplotlib parameters can also be passed for the plotting routine.

#### Plot samples

The samples can also be plotted using

```
PMC.scatter_plot(
{axes}, {draws}, {initial_draws}, {criterion}, {percentile},
{at_once}, {save_sims}, {tqdm_notebook}, **kwargs)
```

where

axes - either "parameter_estimate", "parameter_parameter", or "estimate_estimate" for plotting the estimates against the parameters, or the parameters against the parameters or the estimates against the estimates (the last two are good for diagnostics such as the completeness of the sampling from the prior and the shape and correlation of the estimation function)

Optionally any of the parameters for PMC.PMC(...) can be passed to PMC.scatter_plot(...) to run the PMC when making the plot rather than calling the sampling step first. matplotlib parameters can also be passed for the plotting routine.

## TODO

The module is under constant development, and progress can be checked in the dev branch. Current additions to the IMNN include

Put back summary support - Previous versions of the IMNN had the ability to pass arbitrary summaries along with network summaries. This is useful because it can be a suggestion of how much information is gained over other summarising functions (such as the two point statistics, etc.) - Need to accept array, generative function and TFRecords with summaries and split covariance between summaries and network outputs for regularisation

JAX implementation of all routines - This is under private development currently

Docstrings written for LFI

Write unit tests

## Project details

## Release history Release notifications | RSS feed

## Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.