Skip to main content

Monte-Carlo Integration using Neural Networks

Project description

i-flow-logo

pipeline status coverage report

This project implements normalizing flows using TensorFlow for the purpose of multi-dimensional integration and sampling.

Citation

If you use i-flow for your research, please cite:

  • "i-flow: High-dimensional Integration and Sampling with Normalizing Flows",
    By Christina Gao, Joshua Isaaacson, Claudius Krause.
    Mach. Learn.: Sci. Technol. DOI
    arXiv:2001.05486.

Table of Contents

[[TOC]]

Importance Sampling

In high dimensional integration, Monte-Carlo (MC) techniques are required in order to obtain convergence of the integral in a efficient manner (See the discussion of curse of dimensionality for details). The naïve MC approach would be to uniformly sample points over the integration domain ($\Omega$) encompassing a volume $V$. The integral $I$ of a function $f(x)$ can then be estimated from evaluating the function at $N$ random points and taking the mean, i.e.

I = \int_\Omega f(x)\ dx \approx \frac{V}{N}\sum_{i=1}^{N} f(x_i) \equiv V \langle f \rangle_x,

and the uncertainty is determined by the standard deviation of the mean, i.e.

\sigma_f \approx V \sqrt{\frac{\langle f^2\rangle_x - \langle f\rangle_x^2}{N-1}}.

To improve upon the uncertainty of the estimated integral, a technique known as importance sampling was developed. The goal of importance sampling is the minimize the standard deviation by sampling points from a distribution that is closer to the function $f(x)$ than the uniform distribution. Given a probability distribution function $g(x)$ with cumulative distribution function $G(x)$, it is possible to transform the integral using the change of variable formula, giving:

I = \int_\Omega \frac{f(x)}{g(x)}\ dG(x) \approx \frac{V}{N}\sum_{i=1}^N
\frac{f(\tilde{x}_i)}{g(\tilde{x}_i)} \equiv V \langle f/g\rangle_G,

where $\tilde{x}_i$ is distributed according to $g(x)$. The standard deviation can be calculated in a similar method. From this, it can easily be seen that if $g(x) \rightarrow f(x)/I$, then the uncertainty from this method would produce an integral estimate with vanishing uncertainty. However, this would require to know the analytic solution to the integral!

Therefore, the goal of importance sampling is to find a distribution $g(x)$ as close to $f(x)$ as possible that is also easy to sample from.

Normalizing Flows

A Normalizing Flow is a bijective transformation from a (usually simple) base distribution to a (usually more complicated) target distribution, i.e. a change of variables transformation. It is therefore able to describe arbitrary, high-dimensional probability disctributions of continuous variables. The transformation is invertible and can therefore be used in two directions: In the one direction, we can infer the probability of a given sample point. In the other direction, we can sample random numbers according to the target distribution. Usually, Normalizing Flows are implemented usitilizing Neural Networks, see [1] for a review. Different architectures exist that are either fast in inference and slow at sampling or the other way around. Since our application for numerical integration with importance sampling requires both directions to be fast, we use the architecture based on Coupling Layers that was proposed in [2].

In a Coupling Layer, the $N$-dimensional is split in to two sets $\vec{x}_a=\{ x_1, \ldots, x_n\}$ and $\vec{x}_b=\{x_{n+1}, \ldots, x_N\}$, where $1 \leq n < N$. One set remains untransformed, but is used as input into a neural network. The other set is transformed based on a set of parameters determined from the output of the neural network. In other words,

x'_a = x_a, \quad\quad\quad\quad\quad\quad\ \  A\in[1,n]\quad\quad\  \\
x'_b = C(x_b; m(\vec{x}_a)),\quad\quad B\in[n+1,N]

with the inverse map given by:

x_a = x'_a,\quad\quad
x_b = C^{-1}(x'_b; m(\vec{x'}_a)) = C^{-1}(x'_b; m(\vec{x}_a)).

This leads to an analytic Jacobian independent on the gradient of the neural network $m(\vec{x}_a)$, and can be calculated as:

\left\lvert\frac{\partial c(\vec{x})}{\partial\vec{x}}\right\rvert^{-1}
= \left\lvert
\begin{pmatrix}
\vec{1} & 0 \\
\frac{\partial C}{\partial m}\frac{\partial m}{\partial \vec{x}_a} & \frac{\partial C}{\partial \vec{x}_b}
\end{pmatrix}
\right\rvert^{-1}
= \left\lvert \frac{\partial C(\vec{x}_b; m(\vec{x}_a))}{\partial \vec{x}_b} \right\rvert^{-1}.

Below is a graphical representation of single coupling layer, plus a permutation on the indices before being passed into a subsequent layer.

Coupling Layers

Currently, there are three well tested and validated coupling layers implemented in the i-flow package. These are:

Linear Spline

The linear spline is based on a piecewise linear function defining the CDF [3]. Given $N$ dimensions and $K$ bins with a width of $w$ the PDF is defined as $q_i(t) = \{Q_{ij} | i \in [1,N], (j-1)w \leq t < jw, j \in [1, K]\}$. The CDF can be calculated by integrating the PDF, giving:

C(x_i^B; Q) = \alpha Q_{ib} + \sum_{k=1}^{b-1} Q_{ik},

where $b$ is the bin in which $x_i^B$ occurs, and $\alpha=\frac{x_i^B-(b-1)w}{w}$. The Jacobian for this calculation is straight forward to calculate and is:

\left\lvert \frac{\partial C}{\partial x_B} \right\rvert = \prod_i Q_{ib}

Quadratic Spline

The quadratic spline is also based on a piecewise quadratic function defining the CDF [2]. Given $N$ dimensions, $K$ bins with widths $W_{ik}$, and $K+1$ vertices with heights $V_{ik}$, the PDF is defined as:

q_i(t) = \begin{cases}
      \frac{V_{i2}-V_{i1}}{W_{i1}}t+V_{i1} & t < W_{i1} \\
      \frac{V_{i3}-V_{i2}}{W_{i2}}\left(t-W_{i1}\right)+V_{i2} & W_{i1} \leq t < W_{i1}+W_{i2} \\
      \quad\vdots & \\
      \frac{V_{i(K+1)}-V_{iK}}{W_{iK}}\left(t-\sum_{k=1}^{K-1}W_{ik}\right)+V_{iK} & \sum_{k=1}^{K-1} W_{ik} \leq t < 1 \\
      \end{cases}

Integrating the above to obtain the CDF gives:

C(x_i^B; W, V) = \frac{\alpha^2}{2}\left(V_{ib+1}-V_{ib}\right) W_{ib} + V_{ib}W_{ib}\alpha + \sum_{k=1}^{b-1}\frac{V_{ik+1} + V_{ik}}{2} W_{ik},

with $b$ the solution to $\sum_{k=1}^{b-1} W_{ik} \leq x_i^B < \sum_{k=1}^b W_{ik}$, and $\alpha = \frac{x_i^B-\sum_{k=1}^{b-1}W_{ik}}{W_{ib}}$ is the relative position of $x_i^B$ in bin $b$.

Rational Quadratic Spline

The rational quadratic spline was originally defined in [4]. Given $N$ dimensions, $K+1$ knot points $\{(x^{(k)}, y^{(k)}) \}_{k=0}^K$ that are monotonically increasing, with $( x^{(0)}, y^{(0)} ) = (0, 0)$ and $( x^{(K)}, y^{(K)} ) = (1, 1)$, and $K+1$ non-negative derivatives $\{ d^{(k)} \}_{k=0}^K$, the CDF can be calculated using the algorithm from [5], and is reproduced below.

First define bin widths $w^{(k)} = x^{(k+1)} - x^{(k)}$ and slopes $s^{k}=\frac{y^{(k+1)}-y^{(k)}}{w^{(k)}}$. Given the point $x$, the bin in which $x$ falls is $k$, and the fractional distance $x$ is between the two corresponding knots is $\xi=\frac{x-x^{(k)}}{w^{(k)}}$. From this, the CDF can be calculated from:

g(x) = y^{(k)} + \frac{\left(y^{(k+1)} - y^{(k)}\right)\left[s^{(k)}\xi^2+d^{(k)}\xi\left(1-\xi\right)\right]}{s^{(k)}+\left[d^{(k+1)}+d^{(k)}-2s^{(k)}\right]\xi\left(1-\xi\right)}.

Futher documentation

Additional documentation can be found at: doc

Installation

Requirements

In order to run the i-flow program, there are a few additional packages that are required. The packages are:

  1. TensorFlow (version 2.0 or newer)
  2. TensorFlow-Probability (version 0.8 or newer)
  3. Numpy

In order to run the tests provided in the paper and the iflow_test.py file, a few additional pacakges are required. These are:

  1. Vegas (to compare to the vegas algorithm)
  2. matplotlib (for plotting the results)
  3. absl-py (for command line parsing)

Usage

References

  1. George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, and Balaji Lakshminarayanan.
    Normalizing Flows for Probabilistic Modeling and Inference. arXiv:1912.02762 link
  2. Laurent Dinh, David Krueger, and Yoshua Bengio.
    NICE: non-linear independent components estimation. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Workshop Track Proceedings, 2015. arXiv:1410.8516 link
  3. Thomas Müller, Brian Mcwilliams, Fabrice Rousselle, Markus Gross, and Jan Novák.
    Neural Importance Sampling. ACM Trans. Graph., 38(5), October 2019. link
  4. Conor Durkan, Artur Bekasov, Iain Murray, and George Papamakarios.
    Neural Spline Flows. Advances in Neural Information Processing Systems, 7511--7522, 2019. link
  5. J. A. Gregory and R. Delbourgo.
    Piecewise Rational Quadratic Interpolation to Monotonic Data.
    IMA Journal of Numerical Analysis, 2(2):123-130, 04 1982. link

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

iflow-1.0.tar.gz (1.1 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

iflow-1.0-py3-none-any.whl (43.6 kB view details)

Uploaded Python 3

File details

Details for the file iflow-1.0.tar.gz.

File metadata

  • Download URL: iflow-1.0.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/49.3.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.6

File hashes

Hashes for iflow-1.0.tar.gz
Algorithm Hash digest
SHA256 e1cc4927d2a47c33e58dcf61d44f5322be1fb5950fc247b5414bc55ed35d2b80
MD5 5d104d7297173fe4568b86b1fc99b1fe
BLAKE2b-256 41b60c9df8999434cd6ad949c28cabfad9196a0b98c2b720945c1ed5950fde10

See more details on using hashes here.

File details

Details for the file iflow-1.0-py3-none-any.whl.

File metadata

  • Download URL: iflow-1.0-py3-none-any.whl
  • Upload date:
  • Size: 43.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/49.3.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.6

File hashes

Hashes for iflow-1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ca3cc2897baaf8370a88cc826162f9c80738a86e3a86df941ef9049409c1e217
MD5 dae8c4a04bed4359ff546459952513d4
BLAKE2b-256 adb6975658a2f69f697f875dc5daa2796acf12fc074f1620ccfa565597da76b2

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page