A Python library for fitting mixture models using gradient-based inference

Mixture-Models


A one-stop Python library for fitting a wide range of mixture models, such as Gaussian mixtures, Student's-t mixtures, Mixtures of Factor Analyzers, Parsimonious Gaussian mixtures, the MCLUST family, etc.

Why this library

While there are several packages in R and Python that support various kinds of mixture models, each has its own API and syntax. Moreover, in almost all of these libraries, inference proceeds via Expectation-Maximization (EM), a quasi-first-order method, which makes them ill-suited to high-dimensional data.

This library provides a seamless, unified interface for fitting a wide range of mixture models. Unlike many existing packages that rely on EM for inference, our approach leverages Automatic Differentiation tools and gradient-based optimization, which equips it to handle high-dimensional data and second-order optimization routines.
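To make the idea concrete, here is a minimal, self-contained sketch (illustrative only, not this library's internals) of fitting a one-dimensional three-component Gaussian mixture by plain gradient descent on the negative log-likelihood, with autograd supplying the gradients:

# Illustrative sketch of gradient-based mixture inference; the library's
# actual implementation differs.
import autograd.numpy as np
from autograd import grad
from autograd.scipy.special import logsumexp

def neg_log_likelihood(params, data):
    means, log_stds, logits = params
    log_weights = logits - logsumexp(logits)      # mixture weights in log-space
    stds = np.exp(log_stds)                       # positive scales via exp
    # log N(x | mu_k, sigma_k) for every (point, component) pair
    log_pdf = (-0.5 * ((data[:, None] - means) / stds) ** 2
               - np.log(stds) - 0.5 * np.log(2 * np.pi))
    return -np.sum(logsumexp(log_weights + log_pdf, axis=1))

data = np.concatenate([np.random.randn(100) - 3.0, np.random.randn(100) + 3.0])
params = [np.array([-1.0, 0.0, 1.0]), np.zeros(3), np.zeros(3)]
nll_grad = grad(neg_log_likelihood)               # autodiff: d(NLL)/d(params)
for _ in range(500):                              # plain gradient descent
    g = nll_grad(params, data)
    params = [p - 0.001 * gp for p, gp in zip(params, g)]

Because the objective and its gradient come from autodiff rather than hand-derived EM updates, the same recipe extends directly to other model families and to second-order methods.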

Installation and Quick Start

Installation is straightforward:

pip install Mixture-Models
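A quick way to confirm the install succeeded (assuming a standard Python 3 environment) is to import the package:

import Mixture_Models  # should succeed silently if the install worked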

Quick Start

The estimation procedure consists of 3 simple steps:

### Import the library and its dependencies
from Mixture_Models import *
import autograd.numpy.random as npr   # random state for the simulator
import matplotlib.pyplot as plt

### Simulate some dummy data using the built-in function make_pinwheel
data = make_pinwheel(radial_std=0.3, tangential_std=0.05, num_classes=3,
                     num_per_class=100, rate=0.4, rs=npr.RandomState(0))

### Plot the three clusters
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(data[:, 0], data[:, 1], 'k.')
plt.show()

### STEP 1 - Choose a mixture model to fit on your data
my_model = GMM(data)

### STEP 2 - Initialize your model with some parameters
init_params = my_model.init_params(num_components=3, scale=0.5)

### STEP 3 - Learn the parameters using some optimization routine
params_store = my_model.fit(init_params, "Newton-CG")

Once the model has been fitted to the data (a numpy array of shape (num_datapoints, num_dim)), post-hoc analysis can be performed:

for params in params_store:
    print("likelihood", my_model.likelihood(params))
    print("aic, bic", my_model.aic(params), my_model.bic(params))

np.array(my_model.labels(data, params_store[-1]))  ## final predicted labels

Example notebooks are available in the project's GitHub repo.

Supported models and optimization routines

The library currently supports more than 30 mixture models, spread across five model families:

  • Gaussian Mixture Models (GMM)
  • Mixtures of Student's-t distributions
  • Mixtures of Factor Analyzers (MFA)
  • Parsimonious Gaussian Mixture Models (PGMM)
  • The MCLUST family of constrained Gaussian mixtures

The 'Examples' folder in the project repo includes more detailed illustrations of all these models, as well as a README.md for advanced users who want to fit custom mixture models or tinker with the settings of the above procedure.
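Because all families share the same three-step interface, switching models is typically a one-line change. As a hedged illustration (the exact constructor and init_params arguments for each family are documented in the 'Examples' folder; q below, for the number of latent factors, is an assumed argument name):

my_model = MFA(data)                 # Mixture of Factor Analyzers instead of GMM
init_params = my_model.init_params(num_components=3, q=2, scale=0.5)  # q (latent dims) is assumed
params_store = my_model.fit(init_params, "adam")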

Currently, four main gradient-based optimizers are available:

  • "grad_descent": Stochastic Gradient Descent (SGD) with momentum
  • "rms_prop": Root-mean-squared propagation (RMS-Prop)
  • "adam": Adaptive moments (ADAM)
  • "Newton-CG": Newton-Conjugate Gradient (Newton CG)

Details about each optimizer and its optional input parameters are given in the PDF in the 'Examples' folder. The output of the fit method is the list of all points in parameter space that the optimizer traversed during optimization, with the final entry being the fitted solution. The notebook Optimizers_illustration.ipynb in the 'Examples' folder on GitHub walks through the optimizers in detail.
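Since fit returns the whole optimization trace, convergence is easy to inspect. For example, a short sketch plotting the log-likelihood along the traversed trajectory, reusing my_model and params_store from the Quick Start:

import matplotlib.pyplot as plt

lls = [my_model.likelihood(p) for p in params_store]  # likelihood at each iterate
plt.plot(lls)
plt.xlabel("optimization step")
plt.ylabel("log-likelihood")
plt.show()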

Contributing

We welcome contributions to our library. Our code base is highly modularized, making it easy for new contributors to extend its capabilities and add support for additional models. If you are interested in contributing to the library, check out the contribution guide.

If you're unsure where to start, check out our open issues for inspiration on the kinds of problems you could work on. Alternatively, you can open a new issue so we can discuss the best strategy for integrating your work.


If you use this package, please consider citing our research as:

@article{kasa2020model,
  title={Model-based Clustering using Automatic Differentiation: Confronting Misspecification and High-Dimensional Data},
  author={Kasa, Siva Rajesh and Rajan, Vaibhav},
  journal={arXiv preprint arXiv:2007.12786},
  year={2020}
}
