
Gaussbock

Fast parallel-iterative cosmological parameter estimation with Bayesian nonparametrics

Gaussbock is a general-purpose tool for parameter estimation with computationally expensive likelihood calculations, developed for high-dimensional cosmological parameter estimation for contemporary missions like the Dark Energy Survey (DES), and with upcoming surveys like the Large Synoptic Survey Telescope (LSST) and ESA's Euclid mission in mind. Current efforts in cosmological parameter estimation often suffer from both the computational cost of approximating distributions in high-dimensional parameter spaces and the widespread need for model tuning. In particular, calculating the likelihoods of parameter proposals through anisotropic simulations of the universe imposes high computational costs, leading to excessive per-experiment runtimes in an era of cheap parallel computing resources.

Making use of parallelization and an iterative approach to Bayesian nonparametrics, the provided method starts with a set of data points drawn from a rough guess of the true posterior. To obtain such a sample, the code either runs an affine-invariant MCMC ensemble as introduced by Goodman and Weare (2010) via emcee or uses an initial set of data points provided by the user. After that, a variational Bayesian non-parametric Gaussian mixture model (GMM), which makes use of an infinite Dirichlet process mixture approximated with a stick-breaking representation (see Blei and Jordan, 2006), is fitted to the data points, and a consistent number of data points is sampled from the model.
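
As an illustration of this fitting step (a minimal sketch, not Gaussbock's internal code), such a Dirichlet process mixture can be fitted with scikit-learn's BayesianGaussianMixture; the sample data and settings below are hypothetical:

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Hypothetical stand-in for an initial rough posterior sample in two dimensions.
rng = np.random.default_rng(0)
initial_points = rng.normal(loc=[0.3, 0.7], scale=0.1, size=(1000, 2))

# Variational Bayesian GMM with a (truncated) Dirichlet process prior,
# i.e. the stick-breaking representation mentioned above.
model = BayesianGaussianMixture(n_components=10,
                                weight_concentration_prior_type='dirichlet_process',
                                covariance_type='full',
                                max_iter=1000)
model.fit(initial_points)

# Draw a consistent number of new points from the fitted model.
new_points, _ = model.sample(n_samples=1000)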

These data points are then re-sampled with probabilities derived from truncated importance sampling, an extension of importance sampling introduced by Ionides (2008) that is less sensitive to the proposal distribution and prevents high-posterior samples from dominating; the required posterior evaluations are spread over the available cores as an embarrassingly parallel problem. The model fitting and importance sampling steps are then repeated iteratively, shifting towards an ever-narrower fit around the true posterior distribution and leading to a significant speed-up compared to other established cosmological parameter estimation approaches. At the end, Gaussbock returns a user-specified number of samples and, if the user wishes, the importance weights and the final model itself, which allows an unlimited number of further data points to be sampled and the importance probabilities to be inspected to ensure a reasonable distribution.
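
A minimal sketch of the re-weighting step is given below; the truncation rule shown (capping raw weights at the mean weight times the square root of the sample size, following Ionides, 2008) is an illustrative assumption rather than a verbatim copy of Gaussbock's implementation, which controls the truncation through its truncation_alpha input. Since each posterior evaluation is independent, this is the part that gets distributed across cores.

import numpy as np

def truncated_importance_probabilities(log_posterior_values, log_proposal_values):
    # Raw importance weights w_i = p(x_i) / q(x_i), computed in log space for stability.
    log_w = log_posterior_values - log_proposal_values
    w = np.exp(log_w - np.max(log_w))
    # Cap the weights at mean(w) * sqrt(N) so that high-posterior points cannot dominate.
    cap = np.mean(w) * np.sqrt(len(w))
    w_truncated = np.minimum(w, cap)
    # Normalize to obtain re-sampling probabilities.
    return w_truncated / np.sum(w_truncated)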

If you use or adapt Gaussbock, please cite the corresponding paper (arXiv:1905.09800).

Installation

Gaussbock can be installed via PyPI, with a single command in the terminal:

pip install gaussbock

Alternatively, the file gaussbock.py can be downloaded from the folder gaussbock in this repository and used locally by placing the file into the working directory for a given project. Installation via the terminal is, however, highly recommended, as the installation process will check the package requirements and automatically update or install any missing dependencies, sparing the user the effort of troubleshooting and installing them manually.

Quickstart guide

Only three inputs are required by Gaussbock: the list of parameter ranges, each given as a tuple of the lower and upper limit for that parameter (parameter_ranges), the handle of a posterior function used to evaluate samples (posterior_evaluation), and the required number of posterior samples in the output (output_samples).
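
A minimal sketch of these three inputs for a hypothetical two-parameter toy problem is shown below; it assumes that posterior_evaluation takes a single parameter vector and returns its log-posterior value, so the toy function here is purely illustrative:

import numpy as np
from gaussbock import gaussbock

# Lower and upper limit for each of the two hypothetical parameters.
parameter_ranges = [(-5.0, 5.0), (-5.0, 5.0)]

# Toy log-posterior: a standard bivariate Gaussian, purely for illustration.
def posterior_evaluation(parameters):
    return -0.5 * np.sum(np.asarray(parameters) ** 2)

samples = gaussbock(parameter_ranges = parameter_ranges,
                    posterior_evaluation = posterior_evaluation,
                    output_samples = 10000)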

Gaussbock offers a variety of optional inputs. An affine-invariant MCMC ensemble can be used to obtain an initial approximation of the posterior, or users can provide their own initial sample (initial_samples). This input takes the form ['automatic', int, int] for the MCMC ensemble, with the two integers referring to the number of walkers and the number of steps per walker, respectively, or ['custom', array-like], with the array-like object providing the initial samples in parameter space. The maximum number of Gaussbock iterations (gaussbock_iterations) can be set, as can the number of samples drawn from the current posterior approximation before each truncated importance sampling step (mixture_samples) and the maximum number of expectation-maximization (EM) iterations used to fit the variational Bayesian GMM (em_iterations).
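
The two accepted forms of initial_samples, with hypothetical numbers, look like this:

import numpy as np

# Built-in affine-invariant MCMC ensemble: 50 walkers with 1000 steps per walker.
initial_samples = ['automatic', 50, 1000]

# Alternatively, a user-provided start sample of shape (number of points, number of parameters).
own_start = np.random.uniform(low = -5.0, high = 5.0, size = (2000, 2))
initial_samples = ['custom', own_start]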

In addition, the start and end of a shrinking convergence threshold for the posterior fitting can be provided in the form [float, float] (tolerance_range), as can the maximum number of Gaussians to be fitted to the samples in each iteration (model_components) and the type of covariance parameters used for the fitting process, from the set {'full', 'tied', 'diag', 'spherical'} (model_covariance). Another optional input is the method used to initialize the model's weights, means and covariances, with 'kmeans' or 'random' as possible values (parameter_init). The amount of information the model fitting should print during runtime can be set to 0, 1 or 2 (model_verbosity).

In order to make use of parallelization, the user can either choose to use MPI pools by setting the corresponding input to True (mpi_parallelization), for example when running on supercomputers or local clusters, or specify a number of processes for multi-core parallelization (processes). Another boolean input is the choice of whether both the importance weights and the final model should be returned (weights_and_model). Since truncated importance sampling is used, the truncation value for importance probability re-weighting can be chosen as a float from [1.0, 3.0] (truncation_alpha). Lower values lead to a more general fit with strong truncation, whereas higher values result in a higher level of retained dominant data points. This input should only be customized if the approximation is problematic and can't be resolved via other inputs.
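
For single-machine runs, multi-core parallelization via processes is the simpler choice; an MPI pool is demonstrated in the full example further below. The call here reuses the hypothetical inputs from the earlier sketch and leaves the truncation value at its default:

output = gaussbock(parameter_ranges = parameter_ranges,
                   posterior_evaluation = posterior_evaluation,
                   output_samples = 10000,
                   processes = 4,
                   truncation_alpha = 2.0)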

The model used for the fitting process can also be selected (model_selection), with 'kde' for kernel density estimation (KDE) being the default for problems in fewer than three dimensions, and 'gmm', the variational Bayesian GMM suitable for higher-dimensional problems, being the default otherwise. Lastly, if KDE is used, the kernel bandwidth can be specified (kde_bandwidth). The required and optional inputs, together with their default values and with optional inputs marked with an asterisk, are listed below, with D denoting the dimensionality of the parameter estimation problem, i.e. the number of parameters.



| Variables | Explanations | Default |
|:---|:---|:---|
| parameter_ranges | The lower and upper limit for each parameter | |
| posterior_evaluation | Evaluation function handle for the posterior | |
| output_samples | Number of posterior samples that are required | |
| * initial_samples | Choice of 'emcee' or a provided start sample | ['automatic', 50, 1000] |
| * gaussbock_iterations | Maximum number of Gaussbock iterations | 10 |
| * convergence_threshold | Threshold for inter-iteration convergence checks | 1e-3 |
| * mixture_samples | Number of samples drawn for importance sampling | 1e5 |
| * em_iterations | Maximum number of EM iterations for the mixture model | 1000 |
| * tolerance_range | The range for the shrinking convergence threshold | [1e-2, 1e-7] |
| * model_components | Maximum number of Gaussians fitted to samples | ceiling((2 / 3) * D) |
| * model_covariance | Type of covariance for the GMM fitting process | 'full' |
| * parameter_init | How to initialize model weights, means and covariances | 'random' |
| * model_verbosity | The amount of information printed during runtime | 1 |
| * mpi_parallelization | Whether to parallelize Gaussbock using an MPI pool | False |
| * processes | Number of processes Gaussbock should parallelize over | 1 |
| * weights_and_model | Whether to return importance weights and the model | False |
| * truncation_alpha | Truncation value for importance probability re-weighting | 2.0 |
| * model_selection | Type of model used for the fitting process | None |
| * kde_bandwidth | Kernel bandwidth used when fitting via KDE | 0.5 |
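
For example, a hypothetical low-dimensional run that selects KDE explicitly with its default bandwidth could look like this, again reusing the toy inputs sketched above:

output = gaussbock(parameter_ranges = parameter_ranges,
                   posterior_evaluation = posterior_evaluation,
                   output_samples = 10000,
                   model_selection = 'kde',
                   kde_bandwidth = 0.5)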



After the installation via PyPI, or using the gaussbock.py file locally, the usage looks like this:

from gaussbock import gaussbock

output = gaussbock(parameter_ranges = your_posterior_ranges,
                   posterior_evaluation = your_posterior_function,
                   output_samples = your_required_samples,
                   mpi_parallelization = True,
                   weights_and_model = True)

samples, weights, model = output

Note that, in the above example, we use two of the optional parameters to tell the tool to parallelize via MPI, for example for use on a supercomputer, and to return the importance weights and the model, for example for weighting the returned samples and for saving the model to draw further samples later. If we did not set the weights-and-model return indicator to True, there would be no need to unpack the output, as it would simply be the list of samples.
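
If the returned model exposes a sample() method, as scikit-learn mixture models do (an assumption for this sketch), additional samples can be drawn later without re-running Gaussbock, and the importance weights can be used directly, for example for a weighted parameter mean:

import numpy as np

# Draw 50,000 additional samples from the returned model (hypothetical count).
more_samples, _ = model.sample(n_samples = 50000)

# Use the importance weights to weight the returned samples, e.g. for a weighted mean.
weighted_mean = np.average(samples, axis = 0, weights = weights)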
