
Mixture modeling algorithms using the Student's t-distribution

Project description

studenttmixture

NOTE: As of version 0.0.2.2, this package uses scikit-learn's KMeans clustering to initialize the component locations. This adds scikit-learn as a dependency (alongside numpy and scipy) but speeds up convergence, because the KMeans cluster centers are a good starting point that the mixture model then refines.
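The KMeans initialization described in the note can be sketched with scikit-learn's `KMeans` as shown below. This is a minimal illustration, not studenttmixture's own code: `initial_locations` is a hypothetical helper, and the package's actual KMeans settings (number of restarts, random seed, and so on) may differ.

```python
import numpy as np
from sklearn.cluster import KMeans


def initial_locations(X, n_components, random_state=0):
    """Hypothetical helper: use KMeans cluster centers as starting component locations."""
    # n_init=10 runs KMeans from 10 seeds and keeps the best clustering
    km = KMeans(n_clusters=n_components, n_init=10, random_state=random_state)
    km.fit(X)
    # Shape (n_components, n_features); EM or variational fitting would refine these
    return km.cluster_centers_
```

The mixture model still has to estimate scales, weights, and degrees of freedom. KMeans only supplies sensible locations so the iterative fit starts close to a good solution.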

Mixtures of multivariate Student's t distributions are widely used for clustering data that may contain outliers, but scipy and scikit-learn do not at present offer classes for fitting Student's t mixture models. This package provides classes for:

  1. Modeling / clustering a dataset using a finite mixture of multivariate Student's t distributions fit via the EM algorithm. You can select the number of components using either prior knowledge or the information criteria calculated by the model (AIC, BIC).
  2. Modeling / clustering a dataset using a mixture of multivariate Student's t distributions fit via the variational mean-field approximation. Depending on the hyperparameters you select, the fitting process may prune unneeded components (driving their mixture weights toward zero), so the number of components you specify acts as an upper bound.
  3. Modeling / clustering a dataset using an infinite mixture of Student's t-distributions (i.e. a Dirichlet process mixture). In practice, this model is fitted using small modifications to the mean-field recipe, so it shares many of the same advantages and limitations.
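The EM approach in item (1) can be sketched with numpy and scipy as shown below. This is a hypothetical illustration, not studenttmixture's implementation or API. `t_logpdf`, `e_step`, `fit_t_mixture`, and `bic` are illustrative names. The degrees of freedom ν (`df`) are assumed fixed and shared across components, although a full implementation may estimate them. Initial locations are supplied by the caller. Besides responsibilities, the E-step computes a per-point weight u = (ν + d) / (ν + δ²), where δ² is the squared Mahalanobis distance to a component. Points far from a component get small u, which is what makes t mixtures less sensitive to outliers than Gaussian mixtures.

```python
import numpy as np
from scipy.special import gammaln, logsumexp


def t_logpdf(X, mean, cov, df):
    """Multivariate Student's t log-density per row of X, plus squared Mahalanobis distances."""
    d = X.shape[1]
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, (X - mean).T)  # whitened residuals, shape (d, n)
    maha = np.sum(z ** 2, axis=0)
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    logp = (gammaln((df + d) / 2) - gammaln(df / 2) - 0.5 * d * np.log(df * np.pi)
            - 0.5 * logdet - 0.5 * (df + d) * np.log1p(maha / df))
    return logp, maha


def e_step(X, weights, means, covs, df):
    """Responsibilities, latent scale weights u, and the total log-likelihood."""
    n, d = X.shape
    K = len(weights)
    joint = np.empty((n, K))
    maha = np.empty((n, K))
    for k in range(K):
        logp, maha[:, k] = t_logpdf(X, means[k], covs[k], df)
        joint[:, k] = np.log(weights[k]) + logp
    norm = logsumexp(joint, axis=1, keepdims=True)
    resp = np.exp(joint - norm)
    u = (df + d) / (df + maha)  # outlying points get u much smaller than 1
    return resp, u, norm.sum()


def fit_t_mixture(X, init_means, df=4.0, n_iter=200, tol=1e-8):
    """EM for a Student's t mixture with fixed, shared degrees of freedom df."""
    d = X.shape[1]
    means = np.array(init_means, dtype=float)
    K = len(means)
    covs = np.array([np.atleast_2d(np.cov(X, rowvar=False))] * K)
    weights = np.full(K, 1.0 / K)
    resp, u, loglik = e_step(X, weights, means, covs, df)
    for _ in range(n_iter):
        # M-step: the u weights shrink the influence of outliers on means and covariances
        weights = resp.mean(axis=0)
        for k in range(K):
            ru = resp[:, k] * u[:, k]
            means[k] = ru @ X / ru.sum()
            diff = X - means[k]
            # small ridge keeps each covariance positive definite
            covs[k] = (ru[:, None] * diff).T @ diff / resp[:, k].sum() + 1e-6 * np.eye(d)
        resp, u, new_loglik = e_step(X, weights, means, covs, df)
        converged = new_loglik - loglik < tol
        loglik = new_loglik
        if converged:
            break
    return weights, means, covs, loglik


def bic(loglik, n, d, K):
    """BIC = -2 log L + p log n; df is fixed here, so it is not counted as a free parameter."""
    n_params = (K - 1) + K * d + K * d * (d + 1) // 2
    return -2.0 * loglik + n_params * np.log(n)
```

To choose the number of components as item (1) describes, fit K = 1, 2, 3, ... and keep the fit with the lowest BIC. AIC is computed the same way, with 2p in place of p log n.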

(1) and (2) are currently available; (3) will be available in version 0.0.3.
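Item (3) refers to a Dirichlet process mixture. Mean-field fitting commonly makes such a model tractable with the truncated stick-breaking construction, sketched below with numpy. This illustrates the construction only, not the package's planned implementation. `stick_breaking_weights`, `alpha` (the concentration parameter), and `truncation` are hypothetical names. Each weight is π_k = v_k ∏_{j<k} (1 − v_j) with v_k ~ Beta(1, α). The last stick is set to 1 so the truncated weights sum to one.

```python
import numpy as np


def stick_breaking_weights(alpha, truncation, rng):
    """Draw truncated stick-breaking mixture weights for a Dirichlet process."""
    v = rng.beta(1.0, alpha, size=truncation)  # fraction broken off each remaining stick
    v[-1] = 1.0  # truncation: the last component takes all remaining mass
    # length of stick remaining before each break: 1, (1 - v_1), (1 - v_1)(1 - v_2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining
```

Because E[v_k] = 1/(1 + α), the expected weights (before truncation) decay geometrically as (1/(1 + α)) (α/(1 + α))^(k−1). A small α concentrates mass on a few components, which is how the effective number of clusters can stay well below the truncation level.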

Unit tests for the package are in the tests folder.

Installation

pip install studenttmixture

Note that starting in version 0.0.2.3, this package contains C extensions and is therefore distributed as a source distribution, which is compiled automatically on install (so a working C compiler is required). This is slightly less convenient, but it provides a large speed increase.

Problems installing source-distribution pip packages that contain C extensions are unusual, but they are occasionally observed on Windows, e.g. an error similar to this:

error: Microsoft Visual C++ 14.0 is required.

In the unlikely event that you encounter this, I recommend the solution described in this StackOverflow thread and the links it contains.

Finally, if for whatever reason you prefer the pure Python version, install version 0.0.2.2:

pip install studenttmixture==0.0.2.2

Training mixture models will be slower, but no compilation is required.

Usage

Background

Upcoming in future versions

Download files


Source Distribution

studenttmixture-0.0.2.3.tar.gz (29.0 kB)

File details

Details for the file studenttmixture-0.0.2.3.tar.gz.

File metadata

  • Download URL: studenttmixture-0.0.2.3.tar.gz
  • Upload date:
  • Size: 29.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.8.5

File hashes

Hashes for studenttmixture-0.0.2.3.tar.gz
  • SHA256: 28fe1bfb51a4b367ac6a48d12dd4f0b1bd01454d53d8ca37f0bcedee5ff3aba2
  • MD5: f9d24caf5695eb926b78c496db0f61b6
  • BLAKE2b-256: 8190dee478e1dc9027f195a95ba86f0671c269040d224de87663bfcc4bec40d5

