Automatic Model Selection Using Cluster Indices

CIAMS - Clustering Indices based Automatic classification Model Selection

The code for CIAMS is packaged under the name AutoMS.

AutoMS (Automatic Model Selection Using Cluster Indices) is a machine learning model recommendation and dataset classifiability assessment toolkit.

Find the documentation at https://automs.readthedocs.io/.

Overview

AutoMS estimates the maximum achievable f1-scores of various classifier models for a given binary classification dataset. These estimates help you make informed choices about which classifier models to try on the dataset, and indicate what to expect from each of them. AutoMS also predicts the classification complexity of the dataset, which characterizes how easily the dataset can be classified.

AutoMS extracts clustering-based metafeatures from the dataset and uses fitted classification and regression models to predict the classification complexity and estimate the maximum achievable f1-scores corresponding to various classifier models for the dataset.

Note: In all discussions pertaining to AutoMS, f1-score refers to a variant of the weighted average f1-score for binary datasets from the class imbalance learning literature. It weights the f1-scores of the two classes in inverse proportion to their proportions in the dataset, using the class imbalance ratio R, which is the ratio of the number of samples in the majority class to the number of samples in the minority class.
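
As a rough illustration of this weighting, the sketch below computes such a score from the two per-class f1-scores and the class counts. It assumes the weights are normalized to sum to 1, which gives the majority class a weight of 1/(1+R) and the minority class a weight of R/(1+R). The function name `imbalance_weighted_f1` and this exact normalization are assumptions made for illustration and are not taken from the AutoMS source.

```python
def imbalance_weighted_f1(f1_majority, f1_minority, n_majority, n_minority):
    """Weight per-class f1-scores inversely to the class proportions.

    Illustrative only: assumes the weights are normalized to sum to 1.
    """
    if n_majority <= 0 or n_minority <= 0:
        raise ValueError("both classes need at least one sample")
    # R: class imbalance ratio (majority count over minority count).
    r = n_majority / n_minority
    # Majority weight 1/(1+R), minority weight R/(1+R).
    return (f1_majority + r * f1_minority) / (1.0 + r)
```

Under this assumption, a 3:1 imbalance makes the minority class contribute three quarters of the score, so weak minority-class performance is not masked by the majority class.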

Installing AutoMS

We recommend installing automs into a virtual environment.

$ sudo pip install virtualenv
$ virtualenv --python=python3.6 automs-venv
$ source automs-venv/bin/activate
$ pip install automs

Tip: If you encounter errors while installing AutoMS, install the python3.6-dev system package (which contains the header files and static library for Python), and then try installing automs again.

$ sudo apt-get install python3.6-dev
$ pip install automs

Configuring AutoMS

The defaults with which automs runs can be set using the AutoMS Configuration Wizard:

$ automs-config

The configured defaults can be overridden for each invocation of automs by supplying the appropriate arguments to the command-line or Python interface.

Running AutoMS on a dataset

Step 1: Downloading the dataset

Download a binary classification dataset of your choice (in csv, libsvm or arff format) from the web. This illustration uses the Connectionist Bench (Sonar, Mines vs. Rocks) Data Set. Download the dataset in csv format with:

$ wget https://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data 

Change the current working directory to the directory into which the dataset was downloaded. Rename the dataset file to have a '.csv' extension.

$ mv sonar.all-data sonar.csv

Note: AutoMS infers the data format of a dataset file from its filename extension. Therefore, you must rename the dataset file to have a filename extension that corresponds to its data format. Supported filename extensions (and data formats) are '.csv', '.libsvm' and '.arff'.
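
To make the extension rule concrete, here is a minimal sketch of how a format could be inferred from a filename. It is not the AutoMS implementation; the helper name `infer_data_format` and the mapping `SUPPORTED_FORMATS` are hypothetical and only mirror the three extensions listed in the note.

```python
from pathlib import Path

# Extensions listed in the note above, mapped to the data format they indicate.
SUPPORTED_FORMATS = {".csv": "csv", ".libsvm": "libsvm", ".arff": "arff"}


def infer_data_format(dataset_path):
    """Return the data format implied by the filename extension (illustrative)."""
    suffix = Path(dataset_path).suffix
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported extension {suffix!r}; rename the file to end in "
            "'.csv', '.libsvm' or '.arff'"
        )
    return SUPPORTED_FORMATS[suffix]
```

With this sketch, `sonar.csv` maps to the csv format, while the originally downloaded name `sonar.all-data` is rejected, which is why the rename in Step 1 is needed.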

Step 2: Creating the dataset configuration file

The configuration file for the dataset encodes information about the structure of the dataset file.

Create a configuration file for the dataset in the same directory as the dataset file. Its filename must be the dataset filename suffixed with '.config.py' (in this case, sonar.csv.config.py).

$ echo -e "from automs.config import CsvConfig\nconfig = CsvConfig()" > sonar.csv.config.py
$ cat sonar.csv.config.py

For examples of configuration file content for a variety of dataset files, refer to the examples section in the documentation.

Note: For the dataset file sonar.csv, the content of the dataset configuration file sonar.csv.config.py is:

from automs.config import CsvConfig
config = CsvConfig()

Since the dataset file in this case matches the default values of the arguments to the CsvConfig class, no arguments are passed explicitly when creating the config object. For other dataset files, you may need to override some of the default argument values of the configuration class for your data format when creating the config object.

For information about the dataset configuration classes for the various data formats and the arguments they accept, refer to the API documentation of the Dataset Configuration Classes.
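
The naming rule from this step can also be scripted. The following sketch writes the same two-line default configuration next to a csv dataset file; the helper name `write_default_csv_config` is hypothetical, and the file content is copied from the note above rather than generated by AutoMS.

```python
from pathlib import Path

# Content of the default csv configuration shown in the note above.
DEFAULT_CSV_CONFIG = "from automs.config import CsvConfig\nconfig = CsvConfig()\n"


def write_default_csv_config(dataset_path):
    """Create '<dataset filename>.config.py' beside the dataset file."""
    dataset = Path(dataset_path)
    # Append the suffix to the full filename: sonar.csv -> sonar.csv.config.py
    config_path = dataset.with_name(dataset.name + ".config.py")
    config_path.write_text(DEFAULT_CSV_CONFIG)
    return config_path
```

Calling `write_default_csv_config("sonar.csv")` would produce sonar.csv.config.py in the current directory. Datasets that need non-default arguments still require editing the file by hand.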

Step 3: Predicting Classification Complexity and Estimating F1 scores for the dataset

Command-line Interface

$ automs sonar.csv --oneshot --truef1 --result sonar_results

For more information about the oneshot and sub-sampling approaches, refer to "What are the oneshot and sub-sampling approaches?" and "When should I use the oneshot and sub-sampling approaches?" in the FAQ section of the documentation.

The predicted classification complexity, estimated f1-score and true f1-score results for the dataset should be available in the sonar_results file after the completion of execution of the program.

$ cat sonar_results

Note: The predicted classification complexity is a boolean value that indicates whether the dataset can be classified with an f1-score > 0.6 by any of the classification methods. True indicates that the dataset is hard to classify (no method is expected to exceed that score), and False indicates that the dataset is easy to classify.

The estimated f1-scores corresponding to various classifier models should help identify the candidate top performing classification methods for the dataset, and help reduce the search space of classification algorithms to be experimented on the dataset.
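
One way to use the estimates for narrowing the search space is to rank the classifiers and keep only the top few. The sketch below assumes the estimated scores are available as a mapping from classifier name to score; that shape, the helper name `top_candidates`, and the example values are assumptions for illustration and are not AutoMS output.

```python
def top_candidates(estimated_f1_scores, k=3):
    """Return the k classifier names with the highest estimated f1-scores."""
    ranked = sorted(
        estimated_f1_scores.items(), key=lambda item: item[1], reverse=True
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical scores, for illustration only; not produced by AutoMS.
example_scores = {
    "classifier_a": 0.71,
    "classifier_b": 0.83,
    "classifier_c": 0.64,
    "classifier_d": 0.79,
}
print(top_candidates(example_scores, k=2))  # ['classifier_b', 'classifier_d']
```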

For more information about the AutoMS command-line interface and the arguments it accepts, refer to the API documentation for the AutoMS command-line interface.

$ automs --help
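
If the command-line interface needs to be driven from a script, the invocation shown above can be assembled programmatically. This is a minimal sketch using only the flags that appear in this section (`--oneshot`, `--truef1`, `--result`); the helper names `build_automs_command` and `run_automs` are hypothetical, and what happens when a flag is omitted depends on the configured defaults.

```python
import subprocess


def build_automs_command(dataset, result_file, oneshot=True, truef1=True):
    """Assemble the automs argument list used in the example above."""
    command = ["automs", dataset]
    if oneshot:
        command.append("--oneshot")
    if truef1:
        command.append("--truef1")
    command += ["--result", result_file]
    return command


def run_automs(dataset, result_file, **options):
    """Run automs (must be installed and on PATH) and return the result text."""
    subprocess.run(build_automs_command(dataset, result_file, **options), check=True)
    with open(result_file) as handle:
        return handle.read()
```

Passing the arguments as a list avoids shell quoting issues when dataset paths contain spaces.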

Python Interface

>>> from automs.automs import automs
>>> is_hard_to_classify, estimated_f1_scores, true_f1_scores = automs('sonar.csv', oneshot=True, return_true_f1s=True)
>>> print(f"IS HARD TO CLASSIFY = {is_hard_to_classify}")
>>> print(f"Estimated F1-scores = {estimated_f1_scores}")
>>> print(f"True F1-scores = {true_f1_scores}")

For more information about the AutoMS Python interface and the arguments it accepts, refer to the API documentation for the AutoMS Python interface.

>>> from automs.automs import automs
>>> help(automs)

Tip: Inspect the configured (or specified) warehouse sub-directory corresponding to the last run of AutoMS for result files results.xlsx, predicted_classification_complexity, estimated_f1_scores and true_f1_scores, and the intermediate data subsample files in its bags/ sub-directory.

$ ls <Path to configured AutoMS warehouse>
$ cd <Path to configured AutoMS warehouse>/sonar.csv/
$ tail -n +1 predicted_classification_complexity estimated_f1_scores true_f1_scores
$ xdg-open results.xlsx
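
The same inspection can be done from Python. The sketch below collects the three plain-text result files named in the tip from a given run directory; it assumes they are readable as text (as the `tail` command above suggests) and skips any that are absent. The helper name `read_warehouse_results` is hypothetical, and the warehouse path must be supplied by the caller.

```python
from pathlib import Path

# Result file names listed in the tip above.
RESULT_FILES = (
    "predicted_classification_complexity",
    "estimated_f1_scores",
    "true_f1_scores",
)


def read_warehouse_results(run_directory):
    """Return {file name: text content} for the result files that exist."""
    run_directory = Path(run_directory)
    results = {}
    for name in RESULT_FILES:
        path = run_directory / name
        if path.is_file():
            results[name] = path.read_text()
    return results
```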

Documentation

The AutoMS documentation is hosted at https://automs.readthedocs.io/.

Authors

Acknowledgments
