
MetaCluster: An Open-Source Python Library for Metaheuristic-based Clustering Problems


MetaCluster



MetaCluster is the largest open-source library of nature-inspired optimization (metaheuristic) algorithms for clustering problems in Python.

  • Free software: GNU General Public License (GPL) V3 license
  • Provides 3 classes: MetaCluster, MhaKCentersClustering, and MhaKMeansTuner
  • Total nature-inspired metaheuristic optimizers (Metaheuristic Algorithms): > 200 optimizers
  • Total objective functions (as fitness): > 40 objectives
  • Total supported datasets: 48 datasets from scikit-learn, UCI, ELKI, KEEL...
  • Total performance metrics: > 40 metrics
  • Total methods for detecting the K value: >= 10 methods
  • Documentation: https://metacluster.readthedocs.io/en/latest/
  • Python versions: >= 3.7.x
  • Dependencies: numpy, scipy, scikit-learn, pandas, mealpy, permetrics, plotly, kaleido

Citation Request

Please include these citations if you plan to use this library:

@article{VanThieu2023,
  author = {Van Thieu, Nguyen and Oliva, Diego and Pérez-Cisneros, Marco},
  title = {MetaCluster: An open-source Python library for metaheuristic-based clustering problems},
  journal = {SoftwareX},
  year = {2023},
  pages = {101597},
  volume = {24},
  DOI = {10.1016/j.softx.2023.101597},
}

@article{van2023mealpy,
  title={MEALPY: An open-source library for latest meta-heuristic algorithms in Python},
  author={Van Thieu, Nguyen and Mirjalili, Seyedali},
  journal={Journal of Systems Architecture},
  year={2023},
  publisher={Elsevier},
  doi={10.1016/j.sysarc.2023.102871}
}

Installation

$ pip install metacluster

After installation, check the version:

$ python
>>> import metacluster
>>> metacluster.__version__

Examples

We maintain a dedicated GitHub repository of examples: MetaCluster_examples.

Let's walk through some basic examples:

1. First, load a dataset. You can use one of the datasets available in MetaCluster:

# Load available dataset from MetaCluster
from metacluster import get_dataset

# Try an unknown dataset name
get_dataset("unknown")
# When prompted, enter 1 to list all available datasets

data = get_dataset("Arrhythmia")
  • Or you can load your own dataset
import pandas as pd
from metacluster import Data

# load X and y
# NOTE: MetaCluster accepts only NumPy arrays, so use the .values attribute
dataset = pd.read_csv('examples/dataset.csv', index_col=0).values
X, y = dataset[:, 0:-1], dataset[:, -1]
data = Data(X, y, name="my-dataset")
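The pandas snippet above treats the last column as the label and everything before it (after the index column) as features. To sanity-check that split, here is a NumPy-only sketch that reads a small in-memory CSV; the column names and values are made up for illustration, and the sketch assumes the first column is an index, matching index_col=0 above.

```python
import io

import numpy as np

# Hypothetical CSV content: an index column, two feature columns, and the label last
csv_text = """id,f1,f2,label
0,1.0,2.0,0
1,1.5,1.8,0
2,8.0,8.0,1
3,9.0,8.5,1
"""

# Skip the header row, then drop the index column (mirrors index_col=0 in the pandas version)
dataset = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)[:, 1:]

# Features are all columns except the last; the label is the last column
X, y = dataset[:, 0:-1], dataset[:, -1]
print(X.shape, y.shape)
```

The resulting X and y are plain NumPy arrays, which is the form the Data class expects.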

2. Next, scale your features

You should make sure your dataset is scaled and normalized. Pick one of the following scaling methods:

# MinMaxScaler 
data.X, scaler = data.scale(data.X, method="MinMaxScaler", feature_range=(0, 1))

# StandardScaler 
data.X, scaler = data.scale(data.X, method="StandardScaler")

# MaxAbsScaler 
data.X, scaler = data.scale(data.X, method="MaxAbsScaler")

# RobustScaler 
data.X, scaler = data.scale(data.X, method="RobustScaler")

# Normalizer 
data.X, scaler = data.scale(data.X, method="Normalizer", norm="l2")   # "l1" or "l2" or "max"
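The scaler names above match scikit-learn's preprocessing classes, so they presumably follow the same definitions. As a rough illustration of what two of them compute, here is a NumPy-only sketch of min-max scaling to a feature_range and of standardization (zero mean, unit variance per column). The function names are hypothetical and are not part of MetaCluster.

```python
import numpy as np

def min_max_scale(X, feature_range=(0.0, 1.0)):
    """Linearly map each column so its min goes to feature_range[0] and its max to feature_range[1]."""
    lo, hi = feature_range
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns: avoid division by zero
    return lo + (X - col_min) / col_range * (hi - lo)

def standard_scale(X):
    """Center each column at zero and divide by its (population) standard deviation."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns are only centered, not rescaled
    return (X - X.mean(axis=0)) / std

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
print(min_max_scale(X))   # each column becomes 0, 0.5, 1
print(standard_scale(X))  # each column now has mean 0 and standard deviation 1
```

Scaling matters here because clustering objectives are distance-based: without it, a feature with a large numeric range dominates every distance computation.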

3. Next, select the metaheuristic algorithms, their parameters, the list of objectives, and the list of performance metrics

list_optimizer = ["BaseFBIO", "OriginalGWO", "OriginalSMA"]
list_paras = [
    {"name": "FBIO", "epoch": 10, "pop_size": 30},
    {"name": "GWO", "epoch": 10, "pop_size": 30},
    {"name": "SMA", "epoch": 10, "pop_size": 30}
]
list_obj = ["SI", "RSI"]
list_metric = ["BHI", "DBI", "DI", "CHI", "SSEI", "NMIS", "HS", "CS", "VMS", "HGS"]

You can check all supported metaheuristic algorithms at https://github.com/thieu1995/mealpy, and all supported clustering objectives and metrics at https://github.com/thieu1995/permetrics.
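Objectives and metrics are referred to by abbreviation. Assuming "SI" stands for the silhouette index (check the permetrics documentation for the exact definitions), the NumPy sketch below shows what such an objective measures. For each point, a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster; the point's score is (b - a) / max(a, b), and values near 1 indicate compact, well-separated clusters. This is an illustrative reimplementation, not the permetrics code.

```python
import numpy as np

def silhouette_index(X, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance.

    Assumes at least two distinct labels.
    """
    # Pairwise Euclidean distance matrix, shape (n, n)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum() - 1  # other members of the point's own cluster
        if n_same == 0:
            continue  # singleton cluster: score stays 0 by convention
        a = dist[i, same].sum() / n_same  # dist[i, i] is 0, so including i leaves the sum unchanged
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores.mean()

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
print(silhouette_index(X, np.array([0, 0, 1, 1])))  # high: points grouped by proximity
print(silhouette_index(X, np.array([0, 1, 0, 1])))  # negative: points grouped against proximity
```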

If you don't want to read the documentation, you can print all supported options with:

from metacluster import MetaCluster 

# Get all supported methods and print them out
MetaCluster.get_support(name="all")
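The class name MhaKCentersClustering hints at a common way of framing clustering for a metaheuristic: a candidate solution is a flat vector holding the coordinates of K centers, every point is assigned to its nearest center, and an objective scores that assignment. The sketch below illustrates this idea under stated assumptions: it uses the within-cluster sum of squared errors as the objective and a plain random search as a stand-in for an optimizer such as GWO or SMA. It does not reproduce MetaCluster's internals, and all function names are hypothetical.

```python
import numpy as np

def decode(solution, n_clusters, n_features):
    """Reshape a flat candidate vector into a (n_clusters, n_features) array of centers."""
    return solution.reshape(n_clusters, n_features)

def sse_fitness(solution, X, n_clusters):
    """Within-cluster sum of squared errors after assigning each point to its nearest center."""
    centers = decode(solution, n_clusters, X.shape[1])
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # shape (n, K)
    return sq_dist.min(axis=1).sum()

def random_search(X, n_clusters, n_iter=2000, seed=10):
    """Stand-in optimizer: sample centers uniformly in the data's bounding box and keep the best."""
    rng = np.random.default_rng(seed)
    lb = np.tile(X.min(axis=0), n_clusters)  # lower bound for every decision variable
    ub = np.tile(X.max(axis=0), n_clusters)  # upper bound for every decision variable
    best, best_fit = None, np.inf
    for _ in range(n_iter):
        candidate = rng.uniform(lb, ub)
        fit = sse_fitness(candidate, X, n_clusters)
        if fit < best_fit:
            best, best_fit = candidate, fit
    centers = decode(best, n_clusters, X.shape[1])
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
    return centers, labels, best_fit

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
centers, labels, fit = random_search(X, n_clusters=2)
print(np.round(centers, 1), fit)
```

A real metaheuristic replaces the random sampling with guided updates of a population of candidate vectors, but the encoding and fitness evaluation stay the same.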

4. Next, create an instance of the MetaCluster class and run it:

model = MetaCluster(list_optimizer=list_optimizer, list_paras=list_paras, list_obj=list_obj, n_trials=3, seed=10)

model.execute(data=data, cluster_finder="elbow", list_metric=list_metric, save_path="history", verbose=False)

model.save_boxplots()
model.save_convergences()

As you can see, you can define different datasets and run them with the same model. Remember to set a name for your dataset, because the folder that holds your results is named after it. More examples can be found here
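The cluster_finder="elbow" argument presumably selects the number of clusters K with the elbow method. The idea: cluster the data for increasing K, record the within-cluster SSE (which keeps decreasing as K grows), and choose the K where the curve bends most sharply. The NumPy sketch below makes two assumptions: it uses Lloyd's k-means with k-means++ seeding to produce the SSE curve, and it locates the elbow as the point farthest from the straight line joining the first and last points of the normalized curve, a common heuristic. MetaCluster's own implementation may differ, and the function names are hypothetical.

```python
import numpy as np

def nearest(X, centers):
    """Index of the closest center for every point."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)

def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: each new center is sampled with probability proportional to squared distance."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

def kmeans_sse(X, k, n_init=5, n_iter=50, seed=0):
    """Best within-cluster sum of squared errors over several Lloyd's k-means runs."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_init):
        centers = kmeans_pp_init(X, k, rng)
        for _ in range(n_iter):
            labels = nearest(X, centers)
            # Move each center to the mean of its points; keep it in place if its cluster is empty
            centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        labels = nearest(X, centers)
        best = min(best, ((X - centers[labels]) ** 2).sum())
    return best

def elbow_k(X, k_max=8):
    """Return the K whose (K, SSE) point lies farthest from the chord joining the curve's endpoints."""
    ks = np.arange(1, k_max + 1)
    sse = np.array([kmeans_sse(X, k) for k in ks])
    # Normalize both axes to [0, 1] so the distances are comparable
    x = (ks - ks[0]) / (ks[-1] - ks[0])
    y = (sse - sse.min()) / (sse.max() - sse.min())
    # Perpendicular distance to the line through (0, y[0]) and (1, y[-1])
    slope = y[-1] - y[0]
    dist = np.abs(slope * x - y + y[0]) / np.hypot(1.0, slope)
    return int(ks[dist.argmax()])

rng = np.random.default_rng(1)
blobs = [rng.normal(center, 0.3, (30, 2)) for center in [(0, 0), (5, 5), (0, 5)]]
print(elbow_k(np.vstack(blobs)))  # the elbow should fall at K=3 for these well-separated blobs
```

Normalizing both axes before measuring distances keeps the choice independent of the absolute SSE scale, which varies widely between datasets.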

Support


Supported links

1. https://jtemporal.com/kmeans-and-elbow-method/
2. https://medium.com/@masarudheena/4-best-ways-to-find-optimal-number-of-clusters-for-clustering-with-python-code-706199fa957c
3. https://github.com/minddrummer/gap/blob/master/gap/gap.py
4. https://www.tandfonline.com/doi/pdf/10.1080/03610927408827101
5. https://doi.org/10.1016/j.engappai.2018.03.013
6. https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Clustering-Dimensionality-Reduction/Clustering_metrics.ipynb
7. https://elki-project.github.io/
8. https://sci2s.ugr.es/keel/index.php
9. https://archive.ics.uci.edu/datasets
10. https://python-charts.com/distribution/box-plot-plotly/
11. https://plotly.com/python/box-plots/#box-plot-styling-mean--standard-deviation

