ARD EM algorithm with automatic determination of the number of components/clusters

Project description

# ARD EM

An implementation of ARD (Automatic Relevance Determination) EM in Python.

The classical EM algorithm for fitting a mixture of normal distributions cannot determine the number of mixture components. ARD EM determines the number of components automatically, using an approach based on the relevance vector machine. The idea is to start with a deliberately excessive number of mixture components and then identify the relevant ones by maximizing the evidence. Experiments on model problems show that the number of clusters found either coincides with the true number or slightly exceeds it. In addition, the clustering produced by ARD EM is closer to the true one than that of analogous methods based on cross-validation and the minimum description length criterion.

In short, it is an EM algorithm that determines the number of components automatically: a powerful and fast algorithm for Gaussian mixture learning and clustering when the number of components is unknown.

# Implementation

The [GaussianMixtureARD](ard_em.py) class has the same interface as scikit-learn's [GaussianMixture](http://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html#sklearn.mixture.GaussianMixture), with three additional parameters:

```python
init_components="sqrt"  # Initial number of components. sqrt(N) if "sqrt"
alpha_bound=1e3         # Drop all components with weight_reg (alpha) > alpha_bound
weight_bound=1e-3       # Drop all components with weight < weight_bound
```

It does not take the `n_components` parameter.
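The three parameters above can be read as a starting size plus a pruning rule. The following is only a rough sketch of that reading, not the code in `ard_em.py`: the helper names `initial_components` and `prune` are hypothetical, rounding `sqrt(N)` down is an assumption, and so is renormalizing the surviving weights.

```python
import math


def initial_components(n_samples, init_components="sqrt"):
    """Resolve the starting number of components.

    Assumption: "sqrt" means floor(sqrt(N)); the package may round differently.
    """
    if init_components == "sqrt":
        return max(1, math.isqrt(n_samples))
    return int(init_components)


def prune(weights, alphas, alpha_bound=1e3, weight_bound=1e-3):
    """Drop components with alpha > alpha_bound or weight < weight_bound.

    Returns the indices of the kept components and their weights,
    renormalized to sum to 1 (the renormalization is an assumption).
    """
    keep = [
        k
        for k, (w, a) in enumerate(zip(weights, alphas))
        if a <= alpha_bound and w >= weight_bound
    ]
    if not keep:
        raise ValueError("all components were pruned")
    total = sum(weights[k] for k in keep)
    return keep, [weights[k] / total for k in keep]


# Component 2 has a large alpha and component 3 a tiny weight, so both go.
kept, new_weights = prune(
    weights=[0.5, 0.3, 0.1995, 0.0005],
    alphas=[1.0, 2.0, 5e3, 3.0],
)
```

Here `kept` holds the indices of the two surviving components, and `new_weights` is their weights rescaled to sum to 1.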

# Installation

```
pip install git+https://github.com/Leensman/ard-em.git
```

## Example

```python
from ard_em import GaussianMixtureARD

# X is your data: array-like of shape (n_samples, n_features)
gmm = GaussianMixtureARD()
gmm = gmm.fit(X)
print('Bayesian information criterion: ', gmm.bic(X))

# Number of components left after fitting
best_n_components = gmm.n_components
print('Best number of components: ', best_n_components)

# Cluster label for each sample
gmm.predict(X)
```

For more examples, see [GaussianMixture.ipynb](https://github.com/Leensman/ard-em/blob/master/ard-em/examples/Gaussian%20mixture.ipynb).
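The example above expects a data matrix `X` to exist already. As a minimal sketch for trying it out, the snippet below builds a small synthetic 2-D dataset with three well-separated Gaussian blobs using only the standard library. The helper `make_blobs_2d` and its parameters are hypothetical and not part of `ard_em`.

```python
import random


def make_blobs_2d(centers, n_per_cluster=100, spread=0.5, seed=0):
    """Draw n_per_cluster points around each (x, y) center.

    Each coordinate is sampled independently from a normal distribution
    with standard deviation `spread`. Rows are shuffled so that cluster
    membership is not encoded in the row order.
    """
    rng = random.Random(seed)
    points = []
    for cx, cy in centers:
        for _ in range(n_per_cluster):
            points.append([rng.gauss(cx, spread), rng.gauss(cy, spread)])
    rng.shuffle(points)
    return points


# Three clusters, 100 points each: a list of 300 [x, y] rows.
X = make_blobs_2d([(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)])
```

Because the class is described as following the scikit-learn `GaussianMixture` interface, a list of rows like this is expected to work as input. That is an assumption; converting with `numpy.asarray(X)` first is the safer path.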

## Links

[Original paper](http://www.machinelearning.ru/wiki/images/d/dc/Vetrov-ArdEm-JVMMF-2009.pdf)

## Author

Artem Ryzhikov, LAMBDA laboratory, Higher School of Economics, Yandex School of Data Analysis

E-mail: artemryzhikoff@yandex.ru

Linkedin: https://www.linkedin.com/in/artem-ryzhikov-2b6308103/

HSE profile: https://www.hse.ru/org/persons/190912317

Download files

Source Distribution

ard_em-0.1.4.tar.gz (5.7 kB view details)

Uploaded Source

Built Distributions

ard_em-0.1.4-py3.6.egg (8.5 kB view details)

Uploaded Source

ard_em-0.1.4-py3-none-any.whl (7.2 kB view details)

Uploaded Python 3

File details

Details for the file ard_em-0.1.4.tar.gz.

File metadata

  • Download URL: ard_em-0.1.4.tar.gz
  • Upload date:
  • Size: 5.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for ard_em-0.1.4.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 112b9aa51ea3cde5959083d9e5e8dacbfa7e8ed027aadf4d477918d76a574147 |
| MD5 | 65db985dd11393b782c359aac2d78c40 |
| BLAKE2b-256 | 98937b791bbdfae2b26c2e9be3c3d9a4d5ec353c31a00407b25961d580c1a2f9 |

See more details on using hashes here.

File details

Details for the file ard_em-0.1.4-py3.6.egg.

File metadata

  • Download URL: ard_em-0.1.4-py3.6.egg
  • Upload date:
  • Size: 8.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for ard_em-0.1.4-py3.6.egg
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | cbcd34ed4c9de9f586fedfe9fc05ec03f3ffdab68ca61df8bbe2a83878f95da6 |
| MD5 | d7a545b216a6a09ff7152489e926d488 |
| BLAKE2b-256 | ce08ae05fe6829bf9c556cb9f9b6d805d4c04b1fec4ac9acb702c6c00d363049 |

File details

Details for the file ard_em-0.1.4-py3-none-any.whl.

File hashes

Hashes for ard_em-0.1.4-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 021d252b86cfc0617e6fd0e1c35c5b2c57634aa406639b37671cb82486236155 |
| MD5 | e67055ea2387b3f17fb5613fd7681801 |
| BLAKE2b-256 | 2a669586a98c574d945d6631287da9797123e35288c511aac174b97ee456a992 |
