Automatic Model Selection Using Cluster Indices
CIAMS - Clustering Indices based Automatic classification Model Selection
The code for CIAMS is packaged under the name AutoMS.
AutoMS (Automatic Model Selection Using Cluster Indices) is a machine learning model recommendation and dataset classifiability assessment toolkit.
Find the documentation at https://automs.readthedocs.io/.
Table of Contents
- Overview
- Installing AutoMS
- Configuring AutoMS
- Running AutoMS on a dataset
- Documentation
- Authors
- Acknowledgments
Overview
AutoMS estimates the maximum achievable f1-scores corresponding to various classifier models for a given binary classification dataset. These estimated scores help you make informed choices about which classifier models to try on the dataset, and indicate what to expect from each of them. AutoMS also predicts the classification complexity of the dataset, which characterizes the ease with which the dataset can be classified.
AutoMS extracts clustering-based metafeatures from the dataset and uses fitted classification and regression models to predict the classification complexity and estimate the maximum achievable f1-scores corresponding to various classifier models for the dataset.
Note: In all discussions pertaining to AutoMS, f1-score refers to a variant of the weighted average f1-score for binary datasets, taken from the class imbalance learning literature, that weights the f1-score of each class inversely proportionally to that class's proportion in the dataset. In this weighting, R is the class imbalance ratio: the number of samples in the majority class divided by the number of samples in the minority class.
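The weighting described in the note can be made concrete with a small sketch. It is derived only from the verbal description (class weights inversely proportional to class proportions), so the function name and the exact closed form are assumptions rather than a formula confirmed by AutoMS.

```python
def imbalance_weighted_f1(f1_majority, f1_minority, n_majority, n_minority):
    """Weighted-average f1 with class weights inversely proportional to
    class proportions (assumed form, derived from the note's description)."""
    if n_majority <= 0 or n_minority <= 0:
        raise ValueError("both classes must contain at least one sample")
    # Class imbalance ratio R = majority count / minority count.
    r = n_majority / n_minority
    # Weights proportional to 1/proportion normalize to
    # R / (1 + R) for the minority class and 1 / (1 + R) for the majority class.
    return (r * f1_minority + f1_majority) / (1.0 + r)


if __name__ == "__main__":
    # Hypothetical per-class scores on a 3:1 imbalanced dataset.
    print(imbalance_weighted_f1(0.9, 0.5, n_majority=300, n_minority=100))
```

Under this form, a balanced dataset (R = 1) reduces to the plain mean of the two per-class f1-scores, while a heavily imbalanced dataset is dominated by the minority-class f1-score.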
Installing AutoMS
We recommend installing automs into a virtual environment.
$ sudo pip install virtualenv
$ virtualenv --python=python3.6 automs-venv
$ source automs-venv/bin/activate
$ pip install automs
Tip: If you encounter errors while installing AutoMS, install the python3.6-dev system package (which contains the header files and static library for Python), and then attempt installing automs again.
$ sudo apt-get install python3.6-dev
$ pip install automs
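After installation, one way to confirm that the package is visible to the active virtual environment is to ask Python's import machinery for it. This is a generic standard-library check, not an AutoMS feature; the helper name is chosen here for illustration.

```python
import importlib.util


def is_package_installed(name):
    """Return True if a top-level package called `name` can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Prints False if automs is not installed in the active environment.
    print("automs installed:", is_package_installed("automs"))
```

Run it with the virtual environment activated; a False result usually means pip installed into a different interpreter.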
Configuring AutoMS
The default configuration with which automs runs can be set using the AutoMS Configuration Wizard:
$ automs-config
The configured defaults can be overridden for each invocation of automs by supplying appropriate arguments to the command-line or Python interface.
Running AutoMS on a dataset
Step 1: Downloading the dataset
Download a binary classification dataset of your choice (in csv, libsvm or arff format) from the web. In this illustration, we use the Connectionist Bench (Sonar, Mines vs. Rocks) Data Set. Download the dataset in csv format with:
$ wget https://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data
Change the current working directory to the directory into which the dataset was downloaded. Rename the dataset file to have a '.csv' extension.
$ mv sonar.all-data sonar.csv
Note: AutoMS infers the data format of a dataset file from its filename extension. Therefore, you must rename the dataset file to have a filename extension that corresponds to its data format. Supported filename extensions (and data formats) are '.csv', '.libsvm' and '.arff'.
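The extension rule in the note can be illustrated with a short sketch. It is not AutoMS's actual implementation; the function name and the exact matching behaviour (case-sensitive, last extension only) are assumptions made for illustration.

```python
import os

# Extensions listed in the note, mapped to the data format they denote.
SUPPORTED_FORMATS = {".csv": "csv", ".libsvm": "libsvm", ".arff": "arff"}


def infer_data_format(filename):
    """Infer the data format of a dataset file from its filename extension."""
    _, extension = os.path.splitext(filename)
    if extension not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported extension {extension!r}; rename the file to end in "
            f"one of {sorted(SUPPORTED_FORMATS)}"
        )
    return SUPPORTED_FORMATS[extension]


if __name__ == "__main__":
    print(infer_data_format("sonar.csv"))
```

With this rule, the downloaded name sonar.all-data has the extension '.all-data' and is rejected, which is why the file is renamed to sonar.csv first.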
Step 2: Creating the dataset configuration file
The configuration file for the dataset encodes information about the structure of the dataset file.
Create a dataset configuration file for the dataset in the same directory as the dataset file, with the same filename as the dataset file suffixed with a '.config.py' extension (i.e., in this case, sonar.csv.config.py).
$ echo -e "from automs.config import CsvConfig\nconfig = CsvConfig()" > sonar.csv.config.py
$ cat sonar.csv.config.py
For examples of configuration file content for a variety of dataset files, refer to the examples section in the documentation.
Note: For the dataset file sonar.csv, the contents of the dataset configuration file sonar.csv.config.py are:
from automs.config import CsvConfig
config = CsvConfig()
Since the dataset file in this case is aligned with the default values of the arguments to the CsvConfig class, no arguments are explicitly passed to CsvConfig when creating the config object. However, you may need to override some of the default argument values of your data-format-specific dataset configuration class when creating the config object, to suit your dataset file.
For information about the dataset configuration classes corresponding to the various data formats and the arguments they accept, refer to API documentation of Dataset Configuration Classes.
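The naming convention from Step 2 can also be sketched in Python, as an alternative to the echo command. The two-line file body is the one shown above; the helper names are hypothetical and not part of AutoMS.

```python
from pathlib import Path

# Default configuration body for a csv dataset, as shown in Step 2.
CONFIG_BODY = "from automs.config import CsvConfig\nconfig = CsvConfig()\n"


def config_path_for(dataset_path):
    """Return the config path: the dataset filename suffixed with '.config.py'."""
    dataset_path = Path(dataset_path)
    return dataset_path.with_name(dataset_path.name + ".config.py")


def write_default_csv_config(dataset_path):
    """Write the default csv configuration file next to the dataset file."""
    path = config_path_for(dataset_path)
    path.write_text(CONFIG_BODY)
    return path


if __name__ == "__main__":
    print(config_path_for("sonar.csv"))
```

Keeping the path derivation in one function makes it easy to check that the configuration file ends up in the same directory as the dataset.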
Step 3: Predicting Classification Complexity and Estimating F1 scores for the dataset
Command-line Interface
$ automs sonar.csv --oneshot --truef1 --result sonar_results
For more information about the oneshot and sub-sampling approaches, refer to "What are the oneshot and sub-sampling approaches?" and "When should I use the oneshot and sub-sampling approaches?" in the FAQ section of the documentation.
The predicted classification complexity, estimated f1-score and true f1-score results for the dataset should be available in the sonar_results file after the program finishes executing.
$ cat sonar_results
Note: The predicted classification complexity is a boolean value that indicates whether the dataset can be classified with an f1-score > 0.6 using any of the classification methods. True indicates that the dataset is hard to classify (no method is expected to exceed 0.6) and False indicates that the dataset is easy to classify.
The estimated f1-scores corresponding to various classifier models should help identify the candidate top-performing classification methods for the dataset, and help reduce the search space of classification algorithms to experiment with on the dataset.
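Narrowing the search space from the estimated f1-scores can be sketched as a simple ranking. The sketch assumes the estimates are available as a mapping from classifier name to score; the layout of the real AutoMS output is not specified here, and the classifier names and values in the example are invented placeholders.

```python
def shortlist_classifiers(estimated_f1_scores, top_k=3):
    """Return the top_k (classifier, score) pairs, highest estimate first.

    Assumes `estimated_f1_scores` maps classifier names to estimated f1-scores.
    """
    ranked = sorted(
        estimated_f1_scores.items(), key=lambda item: item[1], reverse=True
    )
    return ranked[:top_k]


if __name__ == "__main__":
    # Placeholder values for illustration only, not real AutoMS output.
    example = {"classifier_a": 0.55, "classifier_b": 0.81, "classifier_c": 0.74}
    for name, score in shortlist_classifiers(example, top_k=2):
        print(name, score)
```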
For more information about the AutoMS command line interface and the arguments it accepts, refer to API Documentation for AutoMS command line interface.
$ automs --help
Python Interface
>>> from automs.automs import automs
>>> is_hard_to_classify, estimated_f1_scores, true_f1_scores = automs('sonar.csv', oneshot=True, return_true_f1s=True)
>>> print(f"IS HARD TO CLASSIFY = {is_hard_to_classify}")
>>> print(f"Estimated F1-scores = {estimated_f1_scores}")
>>> print(f"True F1-scores = {true_f1_scores}")
For more information about the AutoMS python interface and the arguments it accepts, refer to API Documentation for AutoMS python interface.
>>> from automs.automs import automs
>>> help(automs)
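When true f1-scores are requested alongside the estimates, a natural follow-up is to see how far apart they are. The sketch below assumes both results are mappings from classifier name to score, which is an assumption about the return types and not something stated above; the helper name is hypothetical.

```python
def estimation_errors(estimated_f1_scores, true_f1_scores):
    """Absolute gap between estimated and true f1-score per classifier.

    Only classifiers present in both mappings are compared.
    """
    common = estimated_f1_scores.keys() & true_f1_scores.keys()
    return {
        name: abs(estimated_f1_scores[name] - true_f1_scores[name])
        for name in sorted(common)
    }


if __name__ == "__main__":
    # Placeholder values for illustration only, not real AutoMS output.
    print(estimation_errors({"classifier_a": 0.7}, {"classifier_a": 0.6}))
```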
Tip: Inspect the configured (or specified) warehouse sub-directory corresponding to the last run of AutoMS for the result files results.xlsx, predicted_classification_complexity, estimated_f1_scores and true_f1_scores, and for the intermediate data subsample files in its bags/ sub-directory.
$ ls <Path to configured AutoMS warehouse>
$ cd <Path to configured AutoMS warehouse>/sonar.csv/
$ tail -n +1 predicted_classification_complexity estimated_f1_scores true_f1_scores
$ xdg-open results.xlsx
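The plain-text result files named in the tip can also be collected from Python. This sketch only reads whatever text the files contain and makes no assumption about their internal format; the function name is hypothetical, and the directory you pass is the warehouse sub-directory for your dataset.

```python
from pathlib import Path

# Plain-text result files named in the tip (results.xlsx is left out
# because it is a spreadsheet, not plain text).
RESULT_FILES = (
    "predicted_classification_complexity",
    "estimated_f1_scores",
    "true_f1_scores",
)


def read_result_files(run_dir):
    """Return {filename: text} for each result file present in run_dir."""
    run_dir = Path(run_dir)
    results = {}
    for name in RESULT_FILES:
        path = run_dir / name
        if path.is_file():  # skip files the run did not produce
            results[name] = path.read_text()
    return results
```

Files that a run did not produce (for example true_f1_scores when true f1-scores were not requested) are simply absent from the returned mapping.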
Documentation
The AutoMS documentation is hosted at https://automs.readthedocs.io/.
Authors
- Sudarsun Santhiappan, IIT Madras & BUDDI.AI
- Nitin Shravan, BUDDI.AI
Acknowledgments
- Mukesh Reghu, BUDDI.AI
- Jeshuren Chelladurai, IIT Madras & BUDDI.AI