Monte-Carlo Integration using Neural Networks
Project description
This project implements normalizing flows using TensorFlow for the purpose of multi-dimensional integration and sampling.
Citation
If you use i-flow for your research, please cite:
- "i-flow: High-dimensional Integration and Sampling with Normalizing Flows",
By Christina Gao, Joshua Isaaacson, Claudius Krause.
Mach. Learn.: Sci. Technol. DOI
arXiv:2001.05486.
Table of Contents
[[TOC]]
Importance Sampling
In high dimensional integration, Monte-Carlo (MC) techniques are required in order to obtain convergence of the integral in
a efficient manner (See the discussion of curse of
dimensionality for details). The naïve MC approach would
be to uniformly sample points over the integration domain ($\Omega$) encompassing a volume $V$. The integral $I$ of a function $f(x)$ can then be estimated from
evaluating the function at $N$ random points and taking the mean, i.e.
I = \int_\Omega f(x)\ dx \approx \frac{V}{N}\sum_{i=1}^{N} f(x_i) \equiv V \langle f \rangle_x,
and the uncertainty is determined by the standard deviation of the mean, i.e.
\sigma_f \approx V \sqrt{\frac{\langle f^2\rangle_x - \langle f\rangle_x^2}{N-1}}.
To improve upon the uncertainty of the estimated integral, a technique known as importance sampling was
developed. The goal of importance sampling is the minimize the standard deviation by sampling points from a
distribution that is closer to the function $f(x)$ than the uniform distribution. Given a probability
distribution function $g(x)$
with cumulative distribution function $G(x)$, it is possible to transform the integral using the change of variable
formula, giving:
I = \int_\Omega \frac{f(x)}{g(x)}\ dG(x) \approx \frac{V}{N}\sum_{i=1}^N
\frac{f(\tilde{x}_i)}{g(\tilde{x}_i)} \equiv V \langle f/g\rangle_G,
where $\tilde{x}_i$ is distributed according to $g(x)$. The standard deviation can be calculated in a similar method.
From this, it can easily be seen that if $g(x) \rightarrow f(x)/I$, then the uncertainty from this method would
produce an integral estimate with vanishing uncertainty. However, this would require to know the analytic solution
to the integral!
Therefore, the goal of importance sampling is to find a distribution $g(x)$ as close to $f(x)$ as possible that is
also easy to sample from.
Normalizing Flows
A Normalizing Flow is a bijective transformation from a (usually simple) base distribution to a
(usually more complicated) target distribution, i.e. a change of variables transformation.
It is therefore able to describe arbitrary, high-dimensional probability disctributions of continuous
variables. The transformation is invertible and can therefore be used in two directions: In the one
direction, we can infer the probability of a given sample point. In the other direction, we can sample
random numbers according to the target distribution. Usually, Normalizing Flows are implemented usitilizing
Neural Networks, see [1] for a review. Different architectures exist that are either fast
in inference and slow at sampling or the other way around. Since our application for numerical integration with
importance sampling requires both directions to be fast, we use the architecture based on Coupling Layers that
was proposed in [2].
In a Coupling Layer, the $N$-dimensional is split in to two sets $\vec{x}_a=\{ x_1, \ldots, x_n\}$ and
$\vec{x}_b=\{x_{n+1}, \ldots, x_N\}$, where $1 \leq n < N$. One set remains untransformed, but is used as
input into a neural network. The other set is transformed based on a set of parameters determined from the output of
the neural network. In other words,
x'_a = x_a, \quad\quad\quad\quad\quad\quad\ \ A\in[1,n]\quad\quad\ \\
x'_b = C(x_b; m(\vec{x}_a)),\quad\quad B\in[n+1,N]
with the inverse map given by:
x_a = x'_a,\quad\quad
x_b = C^{-1}(x'_b; m(\vec{x'}_a)) = C^{-1}(x'_b; m(\vec{x}_a)).
This leads to an analytic Jacobian independent on the gradient of the neural network $m(\vec{x}_a)$, and
can be calculated as:
\left\lvert\frac{\partial c(\vec{x})}{\partial\vec{x}}\right\rvert^{-1}
= \left\lvert
\begin{pmatrix}
\vec{1} & 0 \\
\frac{\partial C}{\partial m}\frac{\partial m}{\partial \vec{x}_a} & \frac{\partial C}{\partial \vec{x}_b}
\end{pmatrix}
\right\rvert^{-1}
= \left\lvert \frac{\partial C(\vec{x}_b; m(\vec{x}_a))}{\partial \vec{x}_b} \right\rvert^{-1}.
Below is a graphical representation of single coupling layer, plus a permutation on the indices before being passed into a subsequent layer.
Coupling Layers
Currently, there are three well tested and validated coupling layers implemented in the i-flow package. These are:
Linear Spline
The linear spline is based on a piecewise linear function defining the CDF [3].
Given $N$ dimensions and $K$ bins with a width of $w$ the PDF is defined as
$q_i(t) = \{Q_{ij} | i \in [1,N], (j-1)w \leq t < jw, j \in [1, K]\}$.
The CDF can be calculated by integrating the PDF, giving:
C(x_i^B; Q) = \alpha Q_{ib} + \sum_{k=1}^{b-1} Q_{ik},
where $b$ is the bin in which $x_i^B$ occurs, and $\alpha=\frac{x_i^B-(b-1)w}{w}$.
The Jacobian for this calculation is straight forward to calculate and is:
\left\lvert \frac{\partial C}{\partial x_B} \right\rvert = \prod_i Q_{ib}
Quadratic Spline
The quadratic spline is also based on a piecewise quadratic function defining the
CDF [2]. Given $N$ dimensions, $K$ bins with widths $W_{ik}$, and $K+1$ vertices
with heights $V_{ik}$, the PDF is defined as:
q_i(t) = \begin{cases}
\frac{V_{i2}-V_{i1}}{W_{i1}}t+V_{i1} & t < W_{i1} \\
\frac{V_{i3}-V_{i2}}{W_{i2}}\left(t-W_{i1}\right)+V_{i2} & W_{i1} \leq t < W_{i1}+W_{i2} \\
\quad\vdots & \\
\frac{V_{i(K+1)}-V_{iK}}{W_{iK}}\left(t-\sum_{k=1}^{K-1}W_{ik}\right)+V_{iK} & \sum_{k=1}^{K-1} W_{ik} \leq t < 1 \\
\end{cases}
Integrating the above to obtain the CDF gives:
C(x_i^B; W, V) = \frac{\alpha^2}{2}\left(V_{ib+1}-V_{ib}\right) W_{ib} + V_{ib}W_{ib}\alpha + \sum_{k=1}^{b-1}\frac{V_{ik+1} + V_{ik}}{2} W_{ik},
with $b$ the solution to $\sum_{k=1}^{b-1} W_{ik} \leq x_i^B < \sum_{k=1}^b W_{ik}$,
and $\alpha = \frac{x_i^B-\sum_{k=1}^{b-1}W_{ik}}{W_{ib}}$ is the relative position of $x_i^B$ in bin $b$.
Rational Quadratic Spline
The rational quadratic spline was originally defined in [4].
Given $N$ dimensions, $K+1$ knot points $\{(x^{(k)}, y^{(k)}) \}_{k=0}^K$ that are monotonically increasing, with
$( x^{(0)}, y^{(0)} ) = (0, 0)$ and $( x^{(K)}, y^{(K)} ) = (1, 1)$, and
$K+1$ non-negative derivatives $\{ d^{(k)} \}_{k=0}^K$, the CDF can be calculated using the algorithm from
[5], and is reproduced below.
First define bin widths $w^{(k)} = x^{(k+1)} - x^{(k)}$ and slopes $s^{k}=\frac{y^{(k+1)}-y^{(k)}}{w^{(k)}}$.
Given the point $x$, the bin in which $x$ falls is $k$, and the fractional distance $x$ is between the two
corresponding knots is $\xi=\frac{x-x^{(k)}}{w^{(k)}}$. From this, the CDF can be calculated from:
g(x) = y^{(k)} + \frac{\left(y^{(k+1)} - y^{(k)}\right)\left[s^{(k)}\xi^2+d^{(k)}\xi\left(1-\xi\right)\right]}{s^{(k)}+\left[d^{(k+1)}+d^{(k)}-2s^{(k)}\right]\xi\left(1-\xi\right)}.
Futher documentation
Additional documentation can be found at: doc
Installation
Requirements
In order to run the i-flow program, there are a few additional packages that are required. The packages are:
- TensorFlow (version 2.0 or newer)
- TensorFlow-Probability (version 0.8 or newer)
- Numpy
In order to run the tests provided in the paper and the iflow_test.py file, a few additional pacakges are required. These are:
- Vegas (to compare to the vegas algorithm)
- matplotlib (for plotting the results)
- absl-py (for command line parsing)
Usage
References
- George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, and Balaji Lakshminarayanan.
Normalizing Flows for Probabilistic Modeling and Inference. arXiv:1912.02762 link - Laurent Dinh, David Krueger, and Yoshua Bengio.
NICE: non-linear independent components estimation. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Workshop Track Proceedings, 2015. arXiv:1410.8516 link - Thomas Müller, Brian Mcwilliams, Fabrice Rousselle, Markus Gross, and Jan Novák.
Neural Importance Sampling. ACM Trans. Graph., 38(5), October 2019. link - Conor Durkan, Artur Bekasov, Iain Murray, and George Papamakarios.
Neural Spline Flows. Advances in Neural Information Processing Systems, 7511--7522, 2019. link - J. A. Gregory and R. Delbourgo.
Piecewise Rational Quadratic Interpolation to Monotonic Data.
IMA Journal of Numerical Analysis, 2(2):123-130, 04 1982. link
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file iflow-1.0.tar.gz.
File metadata
- Download URL: iflow-1.0.tar.gz
- Upload date:
- Size: 1.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/49.3.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.6
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e1cc4927d2a47c33e58dcf61d44f5322be1fb5950fc247b5414bc55ed35d2b80
|
|
| MD5 |
5d104d7297173fe4568b86b1fc99b1fe
|
|
| BLAKE2b-256 |
41b60c9df8999434cd6ad949c28cabfad9196a0b98c2b720945c1ed5950fde10
|
File details
Details for the file iflow-1.0-py3-none-any.whl.
File metadata
- Download URL: iflow-1.0-py3-none-any.whl
- Upload date:
- Size: 43.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/49.3.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.6
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ca3cc2897baaf8370a88cc826162f9c80738a86e3a86df941ef9049409c1e217
|
|
| MD5 |
dae8c4a04bed4359ff546459952513d4
|
|
| BLAKE2b-256 |
adb6975658a2f69f697f875dc5daa2796acf12fc074f1620ccfa565597da76b2
|