MultiMin: Multivariate Gaussian fitting

Introducing MultiMin

MultiMin is a Python package that provides numerical tools for fitting composed multivariate distributions to data. It is particularly useful for modelling complex multimodal distributions in N dimensions.

These are the main features of MultiMin:

  • Multivariate Fitting: Tools for fitting composed multivariate normal distributions (CMND).

  • Visualization: Density plots and specific visualization utilities.

  • Statistical Analysis: Tools for handling covariance matrices and correlations.

Documentation

Full API documentation is available at https://multimin.readthedocs.io.

Installation

From PyPI

MultiMin is available on PyPI at https://pypi.org/project/multimin/. You can install it with:

pip install -U multimin

From Sources

You can also install from the GitHub repository:

git clone https://github.com/seap-udea/multimin
cd multimin
pip install .

For development, use an editable installation:

cd multimin
pip install -e .

In Google Colab

If you use Google Colab, you can install MultiMin by executing:

!pip install -U multimin

Theoretical Background

The core of MultiMin is the Composed Multivariate Normal Distribution (CMND). The underlying idea is that any multivariate probability density function \(p(\tilde U):\Re^{N}\rightarrow\Re\), where \(\tilde U=(u_1,u_2,u_3,\ldots,u_N)\) is a vector of random variables, can be approximated with arbitrary precision by a normalized linear combination of \(M\) Multivariate Normal Distributions (MND):

\begin{equation*} p(\tilde U) \approx \mathcal{C}_M(\tilde U; \{w_k\}_M, \{\mu_k\}_M, \{\Sigma_k\}_M) \equiv \sum_{i=1}^{M} w_i\;\mathcal{N}(\tilde U; \tilde \mu_i, \Sigma_i) \end{equation*}

where the multivariate normal \(\mathcal{N}(\tilde U; \tilde \mu, \Sigma)\) with mean vector \(\tilde \mu\) and covariance matrix \(\Sigma\) is given by:

\begin{equation*} \mathcal{N}(\tilde U; \tilde \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^{N} \det \Sigma}} \exp\left[-\frac{1}{2}(\tilde U - \tilde \mu)^{\rm T} \Sigma^{-1} (\tilde U - \tilde \mu)\right] \end{equation*}
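As a sanity check on the formula above, here is a minimal sketch that evaluates the multivariate normal density term by term with NumPy and compares it with SciPy's multivariate_normal. The function name mnd_pdf and the numbers are illustrative only; this is not MultiMin's internal implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mnd_pdf(u, mu, Sigma):
    """Evaluate N(u; mu, Sigma) directly from the closed-form expression (illustrative helper)."""
    u = np.asarray(u, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    n = mu.size                                   # dimension N of the state vector
    diff = u - mu
    # Solve Sigma x = diff instead of forming Sigma^{-1} explicitly (more stable)
    maha = diff @ np.linalg.solve(Sigma, diff)    # (u - mu)^T Sigma^{-1} (u - mu)
    norm = np.sqrt((2.0 * np.pi) ** n * np.linalg.det(Sigma))
    return np.exp(-0.5 * maha) / norm

mu = [1.0, 0.5, -0.5]
Sigma = [[1.0, 0.3, 0.0],
         [0.3, 2.0, -0.4],
         [0.0, -0.4, 1.5]]
u = [0.8, 0.1, -0.2]

print(mnd_pdf(u, mu, Sigma))
print(multivariate_normal(mean=mu, cov=Sigma).pdf(u))   # should agree with the line above
```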

The elements of the covariance matrix \(\Sigma\) are defined as \(\Sigma_{ij} = \rho_{ij}\sigma_{i}\sigma_{j}\), where \(\sigma_i\) is the standard deviation of \(u_i\) and \(\rho_{ij}\) is the correlation coefficient between variables \(u_i\) and \(u_j\) (\(-1<\rho_{ij}<1\) for \(i\neq j\), and \(\rho_{ii}=1\)).
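The definition \(\Sigma_{ij} = \rho_{ij}\sigma_i\sigma_j\) is the matrix product \(\Sigma = D\,\rho\,D\), with \(D=\mathrm{diag}(\sigma_1,\ldots,\sigma_N)\) and \(\rho\) the correlation matrix. A minimal sketch; the helper name covariance_from_sigma_rho is hypothetical and not part of MultiMin:

```python
import numpy as np

def covariance_from_sigma_rho(sigma, rho):
    """Build Sigma_ij = rho_ij * sigma_i * sigma_j (hypothetical helper)."""
    sigma = np.asarray(sigma, dtype=float)
    rho = np.asarray(rho, dtype=float)
    D = np.diag(sigma)
    return D @ rho @ D            # equivalent to rho * np.outer(sigma, sigma)

sigma = [1.0, 1.2, 2.3]
rho = np.array([[1.0, 0.3, -0.2],
                [0.3, 1.0, 0.5],
                [-0.2, 0.5, 1.0]])
Sigma = covariance_from_sigma_rho(sigma, rho)
print(Sigma)
# A valid covariance matrix must be symmetric and positive definite
print(np.allclose(Sigma, Sigma.T), np.all(np.linalg.eigvalsh(Sigma) > 0))
```

Note that \(-1<\rho_{ij}<1\) for every pair is necessary but not sufficient: the full correlation matrix must also be positive definite for \(\Sigma\) to be a valid covariance matrix.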

The normalization condition on \(p(\tilde U)\) implies that the set of weights \(\{w_k\}_M\) is also normalized, i.e., \(\sum_i w_i=1\).
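To make the definition of \(\mathcal{C}_M\) concrete, the sketch below evaluates a two-component mixture in two dimensions with SciPy and checks numerically that it integrates to 1 when the weights sum to 1. The function cmnd_pdf and all parameter values are illustrative assumptions, not MultiMin's API.

```python
import numpy as np
from scipy.stats import multivariate_normal

def cmnd_pdf(U, weights, mus, Sigmas):
    """C_M(U) = sum_i w_i N(U; mu_i, Sigma_i), evaluated at one or many points (illustrative)."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * multivariate_normal(mean=m, cov=S).pdf(U)
               for w, m, S in zip(weights, mus, Sigmas))

weights = [0.3, 0.7]
mus = [[-1.0, 0.0], [1.5, 1.0]]
Sigmas = [[[1.0, 0.4], [0.4, 0.8]],
          [[0.5, -0.2], [-0.2, 1.2]]]

# Integrate numerically on a 2D grid that covers essentially all of the probability mass
x = np.linspace(-8, 8, 401)
X, Y = np.meshgrid(x, x)
grid = np.dstack([X, Y])                  # shape (401, 401, 2): one 2D point per grid node
pdf = cmnd_pdf(grid, weights, mus, Sigmas)
dx = x[1] - x[0]
total = pdf.sum() * dx * dx
print(total)                              # close to 1
```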

Fitting procedure

To estimate the CMND parameters that best describe a given dataset, we use maximum-likelihood estimation.

Given a dataset of \(S\) objects with state vectors \(\{\tilde U_i\}_{i=1}^S\), and assuming the data points are independent, the likelihood \(\mathcal{L}\) of the CMND parameters is the product of the probability densities evaluated at each data point:

\begin{equation*} \mathcal{L} = \prod_{i=1}^{S} \mathcal{C}_M(\tilde U_i) \end{equation*}

The goal is to find the set of parameters (weights, means, and covariances) that maximize this likelihood. In practice, it is numerically more stable to minimize the negative normalized log-likelihood:

\begin{equation*} -\frac{\log \mathcal{L}}{S} = -\frac{1}{S} \sum_{i=1}^{S} \log \mathcal{C}_M(\tilde U_i) \end{equation*}
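The sum of logarithms can be computed without forming \(\mathcal{C}_M\) directly: work with \(\log w_k + \log\mathcal{N}\) for each component and combine them with log-sum-exp, which avoids underflow when the densities are tiny. A minimal sketch assuming SciPy; neg_norm_loglike is a hypothetical name and not necessarily how MultiMin computes its objective:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def neg_norm_loglike(data, weights, mus, Sigmas):
    """-(1/S) * sum_i log C_M(U_i), computed in log space (illustrative helper)."""
    data = np.atleast_2d(data)
    # Column k holds log(w_k) + log N(U_i; mu_k, Sigma_k) for every point i
    log_terms = np.column_stack([
        np.log(w) + multivariate_normal(mean=m, cov=S).logpdf(data)
        for w, m, S in zip(weights, mus, Sigmas)
    ])
    # log C_M(U_i) = logsumexp over the components
    return -np.mean(logsumexp(log_terms, axis=1))

rng = np.random.default_rng(0)
weights = [0.5, 0.5]
mus = [[0.0, 0.0], [3.0, 3.0]]
Sigmas = [np.eye(2), np.eye(2)]
data = np.vstack([rng.multivariate_normal(mus[0], Sigmas[0], 500),
                  rng.multivariate_normal(mus[1], Sigmas[1], 500)])

true_val = neg_norm_loglike(data, weights, mus, Sigmas)
# A model with shifted (wrong) means should give a larger value
shifted_val = neg_norm_loglike(data, weights, [[1.0, 1.0], [4.0, 4.0]], Sigmas)
print(true_val, shifted_val)
```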

This approach allows us to fit the distribution without making strong assumptions about the underlying normality of the data, effectively treating the CMND as a series expansion of the true probability density function.

In MultiMin, we use the scipy.optimize.minimize function to find the set of parameters that minimize the negative normalized log-likelihood.
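The sketch below shows the general pattern on a one-dimensional, two-component mixture. MultiMin's internal parameterization is not described here, so we assume a common one: unconstrained parameters are mapped to valid ones (softmax for the weights, so they are positive and sum to 1; exponential for the standard deviations, so they stay positive), which lets scipy.optimize.minimize search without constraints. All names and data are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp, softmax
from scipy.stats import norm

# Synthetic 1D data from two Gaussians
rng = np.random.default_rng(42)
data = np.concatenate([rng.normal(-2.0, 0.5, 600), rng.normal(1.5, 1.0, 400)])

def unpack(theta):
    """Map an unconstrained vector to (weights, means, sigmas) -- assumed parameterization."""
    logits, mus, log_sigmas = theta[:2], theta[2:4], theta[4:6]
    return softmax(logits), mus, np.exp(log_sigmas)

def objective(theta):
    """Negative normalized log-likelihood of the 1D mixture."""
    w, mus, sigmas = unpack(theta)
    # log(w_k) + log N(x_i; mu_k, sigma_k), shape (S, 2)
    log_terms = np.log(w) + norm.logpdf(data[:, None], loc=mus, scale=sigmas)
    return -np.mean(logsumexp(log_terms, axis=1))

theta0 = np.array([0.0, 0.0, -1.0, 1.0, 0.0, 0.0])   # equal weights, means -1 and 1, sigmas 1
result = minimize(objective, theta0)
w, mus, sigmas = unpack(result.x)
print(w.round(2), mus.round(2), sigmas.round(2))
```

Parameterizing this way turns a constrained problem (weights on the simplex, positive widths) into an unconstrained one; in N dimensions the correlations would need a similar treatment to keep each \(\Sigma_k\) positive definite.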

Quickstart

Getting started with MultiMin is straightforward. Import the package:

import multimin as mn

NOTE: If you are working in Google Colab, load the matplotlib backend before producing plots:

%matplotlib inline

Here is a basic example of how to use MultiMin to fit a 3D distribution composed of 2 Multivariate Normals.

1. Define a true distribution

First, we define a distribution from which we will generate synthetic data. We use a Composed Multivariate Normal Distribution (CMND) with 2 Gaussian components (ngauss=2) in 3 dimensions (nvars=3).

import numpy as np
import multimin as mn

# Define parameters for 2 Gaussian components
weights = [0.5, 0.5]
mus = [[1.0, 0.5, -0.5], [1.0, -0.5, +0.5]]
sigmas = [[1, 1.2, 2.3], [0.8, 0.2, 3.3]]
deg = np.pi/180
angles = [
    [10*deg, 30*deg, 20*deg],
    [-20*deg, 0*deg, 30*deg],
]

# Calculate covariance matrices from rotation angles
Sigmas = mn.Stats.calc_covariance_from_rotation(sigmas, angles)

# Create the CMND object
CMND = mn.ComposedMultiVariateNormal(mus=mus, weights=weights, Sigmas=Sigmas)
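One common way to build a covariance matrix from standard deviations and rotation angles is \(\Sigma = R\,\mathrm{diag}(\sigma_1^2,\sigma_2^2,\sigma_3^2)\,R^{\rm T}\), where \(R\) is a rotation matrix. The sketch below uses that construction, but the angle convention (rotations about x, then y, then z, composed as \(R_z R_y R_x\)) and the helper names are assumptions; mn.Stats.calc_covariance_from_rotation may use a different convention, so its output need not match. Whatever the convention, the eigenvalues of \(\Sigma\) are the squared sigmas, which the last line checks.

```python
import numpy as np

def rotation_xyz(ax, ay, az):
    """Hypothetical convention: R = Rz(az) @ Ry(ay) @ Rx(ax), angles in radians."""
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def covariance_from_rotation(sigma, angles):
    """Sigma = R diag(sigma^2) R^T: an axis-aligned Gaussian rotated by R (hypothetical helper)."""
    R = rotation_xyz(*angles)
    return R @ np.diag(np.asarray(sigma, dtype=float) ** 2) @ R.T

deg = np.pi / 180
Sigma = covariance_from_rotation([1, 1.2, 2.3], [10*deg, 30*deg, 20*deg])
print(np.round(Sigma, 3))
# Rotation preserves eigenvalues: they are the squared sigmas 1, 1.44, 5.29
print(np.sort(np.linalg.eigvalsh(Sigma)))
```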

2. Generate sample data

We generate 5000 random samples from this distribution to serve as our “observed” data.

np.random.seed(1)
data = CMND.rvs(5000)
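Sampling from a CMND can be done in two steps: pick a component \(k\) with probability \(w_k\), then draw from \(\mathcal{N}(\tilde\mu_k, \Sigma_k)\). A minimal NumPy sketch of this ancestral sampling, assuming nothing about how CMND.rvs is implemented; sample_cmnd is a hypothetical helper, and for brevity the covariances are axis-aligned (the squared sigmas of step 1, without rotation):

```python
import numpy as np

def sample_cmnd(n, weights, mus, Sigmas, rng=None):
    """Draw n samples: choose a component with probability w_k, then sample that Gaussian."""
    rng = np.random.default_rng(rng)
    weights = np.asarray(weights, dtype=float)
    labels = rng.choice(len(weights), size=n, p=weights)   # component index per sample
    samples = np.empty((n, len(mus[0])))
    for k, (mu, Sigma) in enumerate(zip(mus, Sigmas)):
        idx = np.flatnonzero(labels == k)
        samples[idx] = rng.multivariate_normal(mu, Sigma, size=idx.size)
    return samples, labels

weights = [0.5, 0.5]
mus = [[1.0, 0.5, -0.5], [1.0, -0.5, 0.5]]
Sigmas = [np.diag([1.0, 1.44, 5.29]), np.diag([0.64, 0.04, 10.89])]
data, labels = sample_cmnd(5000, weights, mus, Sigmas, rng=1)
print(data.shape, np.bincount(labels) / len(labels))
```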

3. Visualize the data

We can check the distribution of the generated data using DensityPlot.

import matplotlib.pyplot as plt

# Define a label and plotting range for each property (variable)
properties = dict(
    x=dict(label=r"$x$", range=None),
    y=dict(label=r"$y$", range=None),
    z=dict(label=r"$z$", range=None),
)

# Plot the density plot
G = mn.DensityPlot(properties, figsize=3)
hargs = dict(bins=30, cmap='Spectral_r')
sargs = dict(s=1.2, edgecolor='None', color='r')
hist = G.scatter_plot(data, **sargs)
Data Scatter Plot

4. Initialize the Fitter and Run the Fit

We initialize the FitCMND handler with the expected number of Gaussians (2) and variables (3). We then run the fitting procedure.

# Initialize the fitter
F = mn.FitCMND(ngauss=2, nvars=3)

# Run the fit (using advance=True for better convergence on complex models)
F.fit_data(data, advance=True)

5. Check and Plot Results

Next, we compare the fitted distribution with the data.

# Plot the fit result
G = F.plot_fit(
    props=["x", "y", "z"],
    hargs=dict(bins=30, cmap='YlGn'),
    sargs=dict(s=0.2, edgecolor='None', color='r'),
    figsize=3
)
Fit Result

6. Inspect Parameters and Get Explicit PDF Function

You can tabulate the fitted parameters and obtain an explicit Python function that evaluates the fitted PDF. Below, each step is shown with its output.

Stage 1: Tabulate the fitted CMND

F.cmnd.tabulate(sort_by='weight')

Output (example):

             w  mu_1  mu_2  mu_3  sigma_1  sigma_2  sigma_3  rho_12   rho_13   rho_23
component
1          0.5  1.02  0.52 -0.59     1.07     1.51     2.08   -0.28    0.22   -0.57
2          0.5  1.01 -0.50  0.53     0.79     0.24     3.23    0.56    0.02  -0.02
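The sigma and rho columns of the table follow from each component's covariance matrix by inverting \(\Sigma_{ij} = \rho_{ij}\sigma_i\sigma_j\): \(\sigma_i=\sqrt{\Sigma_{ii}}\) and \(\rho_{ij}=\Sigma_{ij}/(\sigma_i\sigma_j)\). A minimal sketch with an illustrative matrix; the helper name is hypothetical and not part of MultiMin:

```python
import numpy as np

def sigma_rho_from_covariance(Sigma):
    """Recover standard deviations and the correlation matrix from a covariance matrix."""
    Sigma = np.asarray(Sigma, dtype=float)
    sigma = np.sqrt(np.diag(Sigma))            # sigma_i = sqrt(Sigma_ii)
    rho = Sigma / np.outer(sigma, sigma)       # rho_ij = Sigma_ij / (sigma_i sigma_j)
    return sigma, rho

# Illustrative covariance matrix (not taken from the fit above)
Sigma = np.array([[1.0, 0.6, -0.1],
                  [0.6, 4.0, 0.3],
                  [-0.1, 0.3, 0.25]])
sigma, rho = sigma_rho_from_covariance(Sigma)
print(sigma)                                   # sigma_1, sigma_2, sigma_3
print(rho[np.triu_indices(3, k=1)])            # rho_12, rho_13, rho_23
```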

Stage 2: Get the source code and a callable function

code, cmnd = F.cmnd.get_function()

Output (the printed code, which you can copy):

from multimin import nmd

def cmnd(X):

    mu1 = [1.02, 0.52, -0.59]
    Sigma1 = [[1.14, -0.38, 0.36], [-0.38, 2.28, -1.70], [0.36, -1.70, 4.35]]
    n1 = nmd(X, mu1, Sigma1)

    mu2 = [1.01, -0.50, 0.53]
    Sigma2 = [[0.63, 0.11, 0.04], [0.11, 0.06, -0.01], [0.04, -0.01, 10.41]]
    n2 = nmd(X, mu2, Sigma2)

    w1 = 0.52
    w2 = 0.48

    return (
        w1*n1
        + w2*n2
    )
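If you want to evaluate the printed function outside MultiMin, one option is to replace nmd with SciPy's multivariate normal density. This sketch assumes that nmd(X, mu, Sigma) returns \(\mathcal{N}(X; \mu, \Sigma)\) as defined in the Theoretical Background; check the API documentation before relying on it.

```python
import numpy as np
from scipy.stats import multivariate_normal

def nmd(X, mu, Sigma):
    """Stand-in for multimin.nmd (assumed to be the multivariate normal density)."""
    return multivariate_normal(mean=mu, cov=Sigma).pdf(X)

def cmnd(X):
    mu1 = [1.02, 0.52, -0.59]
    Sigma1 = [[1.14, -0.38, 0.36], [-0.38, 2.28, -1.70], [0.36, -1.70, 4.35]]
    n1 = nmd(X, mu1, Sigma1)

    mu2 = [1.01, -0.50, 0.53]
    Sigma2 = [[0.63, 0.11, 0.04], [0.11, 0.06, -0.01], [0.04, -0.01, 10.41]]
    n2 = nmd(X, mu2, Sigma2)

    w1 = 0.52
    w2 = 0.48
    return w1*n1 + w2*n2

val = cmnd([1.0, 0.5, -0.5])
# Evaluate several points at once: input shape (n_points, 3) -> output shape (n_points,)
vals = cmnd(np.array([[1.0, 0.5, -0.5], [1.0, -0.5, 0.5]]))
print(val, vals)
```

Because SciPy's pdf accepts an array of points, the same function can evaluate many points in a single call.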

Stage 3: Evaluate the PDF at a point

cmnd([1.0, 0.5, -0.5])

Output (example; the actual value depends on the fitted parameters):

0.12345678901234567

Stage 4: LaTeX output for papers

You can get the fitted PDF as a LaTeX string (suitable for inclusion in papers) with parameter values and the definition of the normal distribution:

latex_str, _ = F.cmnd.get_function(print_code=False, type='latex', decimals=4)
print(latex_str)
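To illustrate the kind of string shown below, the sketch formats a NumPy array in the same parenthesized LaTeX array style used in the example output. The helper to_latex_array is hypothetical and is not how MultiMin produces its output; it is only useful if you want to typeset other arrays consistently.

```python
import numpy as np

def to_latex_array(M, decimals=4):
    """Format a vector or matrix as a LaTeX array wrapped in parentheses (hypothetical helper)."""
    M = np.atleast_2d(np.asarray(M, dtype=float))
    if M.shape[0] == 1:          # a single row (e.g. a mean vector) is shown as a column
        M = M.T
    cols = "c" * M.shape[1]
    rows = [" & ".join(f"{x:.{decimals}f}" for x in row) for row in M]
    body = r" \\ ".join(rows)    # LaTeX row separator
    return rf"\left( \begin{{array}}{{{cols}}} {body} \end{{array}}\right)"

print(to_latex_array([[1.083742, -0.358056], [-0.358056, 2.311214]], decimals=3))
print(to_latex_array([1.02777, 0.501464, -0.598576], decimals=2))
```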

Example output:

\begin{equation*} f(\mathbf{x}) = w_1 \, \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_1, \mathbf{\Sigma}_1) + w_2 \, \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_2, \mathbf{\Sigma}_2) \end{equation*}

where

\(w_1 = 0.502413\) \(\boldsymbol{\mu}_1 = \left( \begin{array}{c} 1.02777 \\ 0.501464 \\ -0.598576 \end{array}\right)\) \(\mathbf{\Sigma}_1 = \left( \begin{array}{ccc} 1.083742 & -0.358056 & 0.200127 \\ -0.358056 & 2.311214 & -1.74296 \\ 0.200127 & -1.74296 & 4.396044 \end{array}\right)\)

\(w_2 = 0.497587\) \(\boldsymbol{\mu}_2 = \left( \begin{array}{c} 1.003249 \\ -0.504171 \\ 0.456568 \end{array}\right)\) \(\mathbf{\Sigma}_2 = \left( \begin{array}{ccc} 0.641322 & 0.106588 & 0.033927 \\ 0.106588 & 0.05814 & 0.006255 \\ 0.033927 & 0.006255 & 10.440096 \end{array}\right)\)

Here the normal distribution is defined as:

\begin{equation*} \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}, \mathbf{\Sigma}) = \frac{1}{\sqrt{(2\pi)^{{k}} \det \mathbf{\Sigma}}} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top} \mathbf{\Sigma}^{{-1}} (\mathbf{x}-\boldsymbol{\mu})\right] \end{equation*}

A parameter table in LaTeX is also available via F.cmnd.tabulate(sort_by='weight', type='latex').

Citation

The numerical tools and codes provided in this package have been developed and tested over several years of scientific research.

If you use MultiMin in your research, please cite:

@software{multimin2026,
  author = {Zuluaga, Jorge I.},
  title = {MultiMin: Multivariate Gaussian fitting},
  year = {2026},
  url = {https://github.com/seap-udea/multimin}
}

What’s New

For a detailed list of changes and new features, see WHATSNEW.md.

Authors and Licensing

This project is developed by the Solar, Earth and Planetary Physics Group (SEAP) at Universidad de Antioquia, Medellín, Colombia. The main developers are:

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0); see the LICENSE file for details.

Contributing

We welcome contributions! If you’re interested in contributing to MultiMin, please:

  1. Fork the repository

  2. Create a feature branch

  3. Make your changes

  4. Submit a pull request

Please read the CONTRIBUTING.md file for more information.

Download files


Source Distribution

multimin-0.6.5.tar.gz (4.9 MB)

Uploaded Source

Built Distribution


multimin-0.6.5-py3-none-any.whl (4.9 MB)

Uploaded Python 3

File details

Details for the file multimin-0.6.5.tar.gz.

File metadata

  • Download URL: multimin-0.6.5.tar.gz
  • Upload date:
  • Size: 4.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.5

File hashes

Hashes for multimin-0.6.5.tar.gz
Algorithm Hash digest
SHA256 3fd10969e87718f79799f0629751f7faa7bb6142e0471cc6bde44061d9072c54
MD5 4004630b25263b115bbbbc3f5f75dacb
BLAKE2b-256 da85150f77e2015b0044f9c31eb52d59ebf37690d0bbff8fdde294fb5735cc3a


File details

Details for the file multimin-0.6.5-py3-none-any.whl.

File metadata

  • Download URL: multimin-0.6.5-py3-none-any.whl
  • Upload date:
  • Size: 4.9 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.5

File hashes

Hashes for multimin-0.6.5-py3-none-any.whl
Algorithm Hash digest
SHA256 bda1a83ead869db38c92943a88bffd3741e86ee9c99ffb7cc29d714e89b95a9b
MD5 8467d84861e9c86fdf79ec22e3d3575d
BLAKE2b-256 510fc42241de6d0151e2b03ca67db3b96efa9947a7bc6464621587e8f6fb8f97

