autocluster

autocluster is an automated machine learning (AutoML) toolkit for performing clustering tasks.

Report and presentation slides can be found here and here.

Prerequisites

  • Python 3.5 or above
  • Linux, or Windows via WSL

How to get started?

  1. First, install SMAC and its build dependencies:
  • sudo apt-get install build-essential swig
  • conda install gxx_linux-64 gcc_linux-64 swig
  • pip install smac==0.8.0
  2. Then install autocluster:
  • pip install autocluster

How does it work?

  • autocluster automatically optimizes the configuration of a clustering problem. By configuration, we mean:

    • choice of dimension reduction algorithm
    • choice of clustering model
    • setting of dimension reduction algorithm's hyperparameters
    • setting of clustering model's hyperparameters
  • autocluster provides three approaches to optimizing the configuration, in order of increasing complexity:

    • random optimization
    • Bayesian optimization
    • Bayesian optimization + meta-learning (warmstarting)
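
To make the idea of a configuration concrete, here is a minimal sketch of the simplest approach, random optimization, written directly against sklearn. It does not use autocluster's own API. The candidate algorithms, the hyperparameter ranges, the helper names (`sample_configuration`, `random_search`) and the use of the silhouette score on the original features as the objective are all assumptions made for illustration.

```python
import random

from sklearn.cluster import Birch, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.metrics import silhouette_score


def sample_configuration(rng):
    """Draw one configuration: a reducer, a clusterer, and their hyperparameters.

    The search space here is hypothetical and much smaller than a real one.
    """
    n_components = rng.randint(2, 5)
    n_clusters = rng.randint(2, 8)
    reducer_cls = rng.choice([PCA, TruncatedSVD])
    reducer = reducer_cls(n_components=n_components)
    if rng.choice(["kmeans", "birch"]) == "kmeans":
        clusterer = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    else:
        clusterer = Birch(n_clusters=n_clusters, threshold=rng.uniform(0.1, 1.0))
    return reducer, clusterer


def random_search(X, n_trials=20, seed=0):
    """Evaluate randomly sampled configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_score, best_config = -1.0, None
    for _ in range(n_trials):
        reducer, clusterer = sample_configuration(rng)
        X_reduced = reducer.fit_transform(X)
        labels = clusterer.fit_predict(X_reduced)
        if len(set(labels)) < 2:
            continue  # the silhouette score needs at least two clusters
        # Assumption: score on the original features so that results from
        # different reducers are comparable.
        score = silhouette_score(X, labels)
        if score > best_score:
            best_score, best_config = score, (reducer, clusterer)
    return best_score, best_config


# Small synthetic dataset, for demonstration only.
X, _ = make_blobs(n_samples=300, n_features=10, centers=4, random_state=0)
best_score, best_config = random_search(X)
print(best_score, best_config)
```

Bayesian optimization replaces the blind sampling step with a model that proposes the next configuration based on the scores seen so far, and warmstarting seeds that search with configurations that worked well on similar datasets.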

Algorithms/Models supported

  • List of dimension reduction algorithms in sklearn supported by autocluster's optimizer.
  • List of clustering models in sklearn supported by autocluster's optimizer.

Examples

Examples are available in these notebooks.

Experimental results

  • This dataset comprises 16 Gaussian clusters in 128-dimensional space with N = 1024 points. The optimal configuration obtained by autocluster (SMAC + Warmstarting) consists of a Truncated SVD dimension reduction model + Birch clustering model.
  • This dataset comprises 15 Gaussian clusters in 2-dimensional space with N = 5000 points. The optimal configuration obtained by autocluster (SMAC + Warmstarting) consists of a TSNE dimension reduction model + Agglomerative clustering model.
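
As a rough illustration of what the first configuration looks like in plain sklearn, the sketch below chains Truncated SVD and Birch on a synthetic stand-in with the same shape (16 Gaussian clusters, 128 dimensions, 1024 points). The data is generated with `make_blobs` and is not the dataset used in the experiment, and the hyperparameters (`n_components=16`, `n_clusters=16`) are assumed values, not the ones autocluster found.

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in: 16 Gaussian clusters, 128 dimensions, 1024 points.
X, y_true = make_blobs(
    n_samples=1024, n_features=128, centers=16, random_state=0
)

# Dimension reduction step (n_components is an assumed value).
svd = TruncatedSVD(n_components=16, random_state=0)
X_reduced = svd.fit_transform(X)

# Clustering step (n_clusters is set to the known number of clusters).
birch = Birch(n_clusters=16)
labels = birch.fit_predict(X_reduced)

# Compare the predicted clusters with the generating labels.
print("Adjusted Rand index:", adjusted_rand_score(y_true, labels))
```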

Links

  • Link to PyPI.
  • Great writeup by Martin Krasser on Bayesian Optimization

Disclaimer

The project is experimental and still under development.
