Skip to main content

Monte-Carlo Integration using Neural Networks

Project description

i-flow-logo

pipeline status coverage report

This project implements normalizing flows using TensorFlow for the purpose of multi-dimensional integration and sampling.

Citation

If you use i-flow for your research, please cite:

  • "i-flow: High-dimensional Integration and Sampling with Normalizing Flows",
    By Christina Gao, Joshua Isaacson, Claudius Krause.
    Mach. Learn.: Sci. Technol. DOI
    arXiv:2001.05486.

Table of Contents

[[TOC]]

Importance Sampling

In high dimensional integration, Monte-Carlo (MC) techniques are required in order to obtain convergence of the integral in a efficient manner (See the discussion of curse of dimensionality for details). The naïve MC approach would be to uniformly sample points over the integration domain ($\Omega$) encompassing a volume $V$. The integral $I$ of a function $f(x)$ can then be estimated from evaluating the function at $N$ random points and taking the mean, i.e.

I = \int_\Omega f(x)\ dx \approx \frac{V}{N}\sum_{i=1}^{N} f(x_i) \equiv V \langle f \rangle_x,

and the uncertainty is determined by the standard deviation of the mean, i.e.

\sigma_f \approx V \sqrt{\frac{\langle f^2\rangle_x - \langle f\rangle_x^2}{N-1}}.

To improve upon the uncertainty of the estimated integral, a technique known as importance sampling was developed. The goal of importance sampling is the minimize the standard deviation by sampling points from a distribution that is closer to the function $f(x)$ than the uniform distribution. Given a probability distribution function $g(x)$ with cumulative distribution function $G(x)$, it is possible to transform the integral using the change of variable formula, giving:

I = \int_\Omega \frac{f(x)}{g(x)}\ dG(x) \approx \frac{V}{N}\sum_{i=1}^N
\frac{f(\tilde{x}_i)}{g(\tilde{x}_i)} \equiv V \langle f/g\rangle_G,

where $\tilde{x}_i$ is distributed according to $g(x)$. The standard deviation can be calculated in a similar method. From this, it can easily be seen that if $g(x) \rightarrow f(x)/I$, then the uncertainty from this method would produce an integral estimate with vanishing uncertainty. However, this would require to know the analytic solution to the integral!

Therefore, the goal of importance sampling is to find a distribution $g(x)$ as close to $f(x)$ as possible that is also easy to sample from.

Normalizing Flows

A Normalizing Flow is a bijective transformation from a (usually simple) base distribution to a (usually more complicated) target distribution, i.e. a change of variables transformation. It is therefore able to describe arbitrary, high-dimensional probability disctributions of continuous variables. The transformation is invertible and can therefore be used in two directions: In the one direction, we can infer the probability of a given sample point. In the other direction, we can sample random numbers according to the target distribution. Usually, Normalizing Flows are implemented usitilizing Neural Networks, see [1] for a review. Different architectures exist that are either fast in inference and slow at sampling or the other way around. Since our application for numerical integration with importance sampling requires both directions to be fast, we use the architecture based on Coupling Layers that was proposed in [2].

In a Coupling Layer, the $N$-dimensional is split in to two sets $\vec{x}_a=\{ x_1, \ldots, x_n\}$ and $\vec{x}_b=\{x_{n+1}, \ldots, x_N\}$, where $1 \leq n < N$. One set remains untransformed, but is used as input into a neural network. The other set is transformed based on a set of parameters determined from the output of the neural network. In other words,

x'_a = x_a, \quad\quad\quad\quad\quad\quad\ \  A\in[1,n]\quad\quad\  \\
x'_b = C(x_b; m(\vec{x}_a)),\quad\quad B\in[n+1,N]

with the inverse map given by:

x_a = x'_a,\quad\quad
x_b = C^{-1}(x'_b; m(\vec{x'}_a)) = C^{-1}(x'_b; m(\vec{x}_a)).

This leads to an analytic Jacobian independent on the gradient of the neural network $m(\vec{x}_a)$, and can be calculated as:

\left\lvert\frac{\partial c(\vec{x})}{\partial\vec{x}}\right\rvert^{-1}
= \left\lvert
\begin{pmatrix}
\vec{1} & 0 \\
\frac{\partial C}{\partial m}\frac{\partial m}{\partial \vec{x}_a} & \frac{\partial C}{\partial \vec{x}_b}
\end{pmatrix}
\right\rvert^{-1}
= \left\lvert \frac{\partial C(\vec{x}_b; m(\vec{x}_a))}{\partial \vec{x}_b} \right\rvert^{-1}.

Below is a graphical representation of single coupling layer, plus a permutation on the indices before being passed into a subsequent layer.

Coupling Layers

Currently, there are three well tested and validated coupling layers implemented in the i-flow package. These are:

Linear Spline

The linear spline is based on a piecewise linear function defining the CDF [3]. Given $N$ dimensions and $K$ bins with a width of $w$ the PDF is defined as $q_i(t) = \{Q_{ij} | i \in [1,N], (j-1)w \leq t < jw, j \in [1, K]\}$. The CDF can be calculated by integrating the PDF, giving:

C(x_i^B; Q) = \alpha Q_{ib} + \sum_{k=1}^{b-1} Q_{ik},

where $b$ is the bin in which $x_i^B$ occurs, and $\alpha=\frac{x_i^B-(b-1)w}{w}$. The Jacobian for this calculation is straight forward to calculate and is:

\left\lvert \frac{\partial C}{\partial x_B} \right\rvert = \prod_i Q_{ib}

Quadratic Spline

The quadratic spline is also based on a piecewise quadratic function defining the CDF [2]. Given $N$ dimensions, $K$ bins with widths $W_{ik}$, and $K+1$ vertices with heights $V_{ik}$, the PDF is defined as:

q_i(t) = \begin{cases}
      \frac{V_{i2}-V_{i1}}{W_{i1}}t+V_{i1} & t < W_{i1} \\
      \frac{V_{i3}-V_{i2}}{W_{i2}}\left(t-W_{i1}\right)+V_{i2} & W_{i1} \leq t < W_{i1}+W_{i2} \\
      \quad\vdots & \\
      \frac{V_{i(K+1)}-V_{iK}}{W_{iK}}\left(t-\sum_{k=1}^{K-1}W_{ik}\right)+V_{iK} & \sum_{k=1}^{K-1} W_{ik} \leq t < 1 \\
      \end{cases}

Integrating the above to obtain the CDF gives:

C(x_i^B; W, V) = \frac{\alpha^2}{2}\left(V_{ib+1}-V_{ib}\right) W_{ib} + V_{ib}W_{ib}\alpha + \sum_{k=1}^{b-1}\frac{V_{ik+1} + V_{ik}}{2} W_{ik},

with $b$ the solution to $\sum_{k=1}^{b-1} W_{ik} \leq x_i^B < \sum_{k=1}^b W_{ik}$, and $\alpha = \frac{x_i^B-\sum_{k=1}^{b-1}W_{ik}}{W_{ib}}$ is the relative position of $x_i^B$ in bin $b$.

Rational Quadratic Spline

The rational quadratic spline was originally defined in [4]. Given $N$ dimensions, $K+1$ knot points $\{(x^{(k)}, y^{(k)}) \}_{k=0}^K$ that are monotonically increasing, with $( x^{(0)}, y^{(0)} ) = (0, 0)$ and $( x^{(K)}, y^{(K)} ) = (1, 1)$, and $K+1$ non-negative derivatives $\{ d^{(k)} \}_{k=0}^K$, the CDF can be calculated using the algorithm from [5], and is reproduced below.

First define bin widths $w^{(k)} = x^{(k+1)} - x^{(k)}$ and slopes $s^{k}=\frac{y^{(k+1)}-y^{(k)}}{w^{(k)}}$. Given the point $x$, the bin in which $x$ falls is $k$, and the fractional distance $x$ is between the two corresponding knots is $\xi=\frac{x-x^{(k)}}{w^{(k)}}$. From this, the CDF can be calculated from:

g(x) = y^{(k)} + \frac{\left(y^{(k+1)} - y^{(k)}\right)\left[s^{(k)}\xi^2+d^{(k)}\xi\left(1-\xi\right)\right]}{s^{(k)}+\left[d^{(k+1)}+d^{(k)}-2s^{(k)}\right]\xi\left(1-\xi\right)}.

Futher documentation

Additional documentation can be found at: doc

Installation

The i-flow program can be installed using pip.

pip install iflow

Requirements

In order to run the i-flow program, there are a few additional packages that are required. The packages are:

  1. TensorFlow (version 2.0 or newer)
  2. TensorFlow-Probability (version 0.8 or newer)
  3. Numpy

In order to run the tests provided in the paper and the iflow_test.py file, a few additional packages are required. These are:

  1. Vegas (to compare to the vegas algorithm)
  2. matplotlib (for plotting the results)
  3. absl-py (for command line parsing)

Usage

Running the script iflow_test.py will produce the results of our paper. It also illustrates how to use i-flow.

Line Explanation
0 -- 20 Import required packages, see Requirements.
21 -- 24 Define shortcuts and set precision to float64.
25 -- 44 Define command line flags to control program.
45 -- 325 The class TestFunctions contains the functions to be integrated. This should be replaced by the desired function for integration.
326 -- 353 build() defines the Neural Network that is used in each Coupling Layer. This can be adapted to the problem at hand, for example by adding more layers or making the layers wider.
354 -- 379 binary_masks() creates the masking (i.e. which dimensions are transformed and which are passed through a coupling layer) based on the discussion of section III A of our paper.
380 -- 412 build_iflow() builds the i-flow Integrator object. The most important parts are explained below.
391 Defines which masking is used. Can be replaced by a custom list.
394 -- 397 Defines the main bijector. PiecewiseRationalQuadratic can be replaced by any other coupling function defined in couplings.py.
408 Defines which optimizer is used.
410 Defines which loss function is used. 'exponential' can be replaced by any other loss function defined in divergences.py.
414 -- 442 train_iflow() trains i-flow for a given number of epochs.
443 -- 472 train_iflow_target() trains i-flow until a target precision is reached.
473 -- 497 sample_iflow() samples random numbers according to the trained distribution. Not used in the current setup.
498 -- 528 Defines helper functions used in the presentation of the results.
529 -- 925 Runs the setup defined with the command line flags.

Running

python iflow_test.py -f=Gauss -d=2 -p=5000 -tp=1e-4 -t

gives the result of our first test case, apart from small fluctuations due to a different random seed.

References

  1. George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, and Balaji Lakshminarayanan.
    Normalizing Flows for Probabilistic Modeling and Inference. arXiv:1912.02762 link
  2. Laurent Dinh, David Krueger, and Yoshua Bengio.
    NICE: non-linear independent components estimation. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Workshop Track Proceedings, 2015. arXiv:1410.8516 link
  3. Thomas Müller, Brian Mcwilliams, Fabrice Rousselle, Markus Gross, and Jan Novák.
    Neural Importance Sampling. ACM Trans. Graph., 38(5), October 2019. link
  4. Conor Durkan, Artur Bekasov, Iain Murray, and George Papamakarios.
    Neural Spline Flows. Advances in Neural Information Processing Systems, 7511--7522, 2019. link
  5. J. A. Gregory and R. Delbourgo.
    Piecewise Rational Quadratic Interpolation to Monotonic Data.
    IMA Journal of Numerical Analysis, 2(2):123-130, 04 1982. link

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

iflow-1.0.1-py3-none-any.whl (44.5 kB view details)

Uploaded Python 3

File details

Details for the file iflow-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: iflow-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 44.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/49.3.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.6

File hashes

Hashes for iflow-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 204283f95329a7ccf73ad731158f67d2ba8f8e4533bc9d09c2e505aa9e93b2af
MD5 4586a2de09ffb40c3417623c3363e805
BLAKE2b-256 ed7905653a5a8ed60865a8e1e20b65d30964f5fe1454969fe9d54e4f94de9680

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page