
General Model Selection Module


Chart 1: Basic GMS Workflow

Brief Description

General Model Selection Module (hereafter GMS-Module) is a simple yet neat model selection tool that helps machine learning developers find the most efficient model or pipeline for their specific task. This project also earned me 5 additional points for the IT General State Exam (ЕГЭ по информатике) 😌

The user only needs to pass:

  • Models AND/OR pipelines of their choice
  • Metrics for evaluation
  • A pivot, if one metric is more important than the others
  • Data to train and evaluate on

The module automatically runs the evaluations, stores them, and gives a verbose description of each model's performance!

Installation

To install GMSModule, make sure that python3 and pip are installed. Then run pip install gms or pip3 install gms in a terminal:

pip3 install gms

How to use?

  1. Make sure that all the variables are prepared to be used by GMS:

    • mode: A string naming your ML task: 'regression' OR 'classification'
    • include: A list of model objects of your choice: [LinearRegression(), SVR()]
    • metrics: A list of strings naming the metrics to evaluate on:
      • classification = ['accuracy', 'f1-score', 'precision', 'recall', 'roc-auc']
      • regression = ['mae', 'mape', 'mse', 'rmse', 'r2-score']
    • data: A list of your data to train/validate on: [X_train, X_test, y_train, y_test]
    • pivot (optional): A string naming one of the metrics provided: 'accuracy' (the pivot is the metric that is most important for evaluation)
  2. Import GMSModule into your project:

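from gms.GMSModule import GMSModule

  3. Create a GMSModule object with your data:

# scikit-learn models used in this example
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

GMSPipe = GMSModule(mode="classification",
	pivot='f1-score',
	metrics=['accuracy', 'f1-score'],
	include=[LogisticRegression(), RandomForestClassifier()],
	data=[X_train, X_test, y_train, y_test])

  4. Use any of the methods provided:

# best_model() returns the best model together with its predictions
best_model, _ = GMSPipe.best_model()
print(best_model)
RandomForestClassifier()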
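
To make the role of the pivot more concrete, here is a minimal sketch of how a "best" model could be picked from a table of scores. It is not GMS's internal code: the helper name `pick_by_pivot`, the `LOWER_IS_BETTER` set, and the score values are all assumptions made for illustration. The sketch assumes that error metrics such as 'mae' or 'rmse' should be minimized while all other metrics should be maximized.

```python
# Illustrative only: this is NOT the GMS implementation.
# Assumption: error metrics are minimized, every other metric is maximized.
LOWER_IS_BETTER = {"mae", "mape", "mse", "rmse"}


def pick_by_pivot(scores, pivot):
    """Return the name of the model with the best value for the pivot metric.

    `scores` maps a model name to a {metric name: value} dictionary.
    """
    if not scores:
        raise ValueError("no models were evaluated")
    missing = [name for name, s in scores.items() if pivot not in s]
    if missing:
        raise KeyError(f"pivot {pivot!r} was not computed for: {missing}")
    chooser = min if pivot in LOWER_IS_BETTER else max
    return chooser(scores, key=lambda name: scores[name][pivot])


# Hypothetical scores, made up for this example.
classification_scores = {
    "LogisticRegression": {"accuracy": 0.91, "f1-score": 0.84},
    "RandomForestClassifier": {"accuracy": 0.89, "f1-score": 0.88},
}

print(pick_by_pivot(classification_scores, "f1-score"))
print(pick_by_pivot(classification_scores, "accuracy"))
```

With these made-up numbers the choice flips depending on the pivot, which is exactly why the pivot has to be stated explicitly when the metrics disagree.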

Why this module?

Every machine learning developer, especially after extensive data analysis, has to pick the most accurate model. Some engineers already know which model will fit, either because the task is simple or because the right model is evident.

But others may struggle with a BLIND choice between dozens, if not HUNDREDS, of ML models and pipelines that they have built. That's where GMS Module can help!

The user doesn't have to build a custom function that evaluates each model one by one on their metrics. The user just passes in the models, names the metrics of their choice, and voila!
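
For comparison, the kind of hand-written evaluation function mentioned above might look roughly like the sketch below. Everything in it is hypothetical: the two stand-in models only imitate the scikit-learn fit/predict interface so that the example runs without dependencies, and the metrics are simplified binary versions written by hand. Real estimators and real metric functions could be passed in the same way.

```python
# Stand-in models with a scikit-learn-like fit/predict interface (hypothetical).
class AlwaysPositive:
    def fit(self, X, y):
        return self

    def predict(self, X):
        return [1 for _ in X]


class ThresholdClassifier:
    """Predicts 1 when the single feature exceeds the mean training feature."""

    def fit(self, X, y):
        self.threshold_ = sum(row[0] for row in X) / len(X)
        return self

    def predict(self, X):
        return [1 if row[0] > self.threshold_ else 0 for row in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred):
    # Binary F1 for the positive class (label 1).
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def evaluate_all(models, X_train, X_test, y_train, y_test):
    """Fit every model, predict on the test split, and collect the metrics."""
    results = {}
    for model in models:
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        results[type(model).__name__] = {
            "accuracy": accuracy(y_test, preds),
            "f1-score": f1(y_test, preds),
        }
    return results


# Tiny toy data set, made up for this example.
X_train = [[0.1], [0.2], [0.8], [0.9]]
y_train = [0, 0, 1, 1]
X_test = [[0.15], [0.3], [0.7], [0.95]]
y_test = [0, 0, 1, 1]

results = evaluate_all(
    [AlwaysPositive(), ThresholdClassifier()], X_train, X_test, y_train, y_test
)
for name, scores in results.items():
    print(name, scores)
```

Every new metric or model means touching this function again, which is the repetitive work the module is meant to take over.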

Then the user can look at GMSModule.description() to get verbose information about the models' evaluations and see which models are better than the others.

Users can also get their data into variables for further usage, like in this example:

import pandas as pd

# Get the predictions of the best model in the list
# (GMSPipe is the GMSModule object created earlier)
_, preds = GMSPipe.best_model()

# DataFrame data: one id per prediction
data = {
	'id': range(len(preds)),
	'value': preds
}

# Create a DataFrame and save it as a CSV file
df = pd.DataFrame(data)
df.to_csv('submit.csv', index=False)

Project History

This project was created as a fun side project for experimenting with scikit-learn tools. It helped me become more focused on programming overall and taught me how to write my own PyPI module for others to use!

The idea was born on 16.10.2023, and the first draft of the project was so inefficient that I had to rewrite almost everything.

The module used to re-evaluate every model each time I tried to get the evaluations. The evaluations used 'if-statements', which looked hideous and unprofessional.

With the fifth version, done on 20.10.2023, everything changed. The re-evaluation problem was fixed, the module could catch the most obvious exceptions caused by the user, and the 'if-statements' were replaced with neat dictionaries.
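
The two changes described here, dictionary lookups instead of 'if-statements' and no repeated re-evaluation, can be sketched as follows. This is a guess at the general pattern, not the actual GMS source: the `CachedEvaluator` class, the `METRICS` table, and the `runs` counter are hypothetical names introduced only for this illustration.

```python
import math


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(mse(y_true, y_pred))


# One lookup table instead of `if name == "mae": ... elif name == "mse": ...`.
METRICS = {"mae": mae, "mse": mse, "rmse": rmse}


class CachedEvaluator:
    """Hypothetical helper: computes the requested metrics once and reuses them."""

    def __init__(self, metric_names, y_true, y_pred):
        # Catch the most obvious user error early: an unknown metric name.
        unknown = [m for m in metric_names if m not in METRICS]
        if unknown:
            raise ValueError(f"unknown metrics: {unknown}")
        self.metric_names = list(metric_names)
        self.y_true = y_true
        self.y_pred = y_pred
        self._cache = None
        self.runs = 0  # counts real evaluations, for demonstration only

    def evaluations(self):
        # Evaluate only on the first call; later calls return the stored result.
        if self._cache is None:
            self.runs += 1
            self._cache = {
                name: METRICS[name](self.y_true, self.y_pred)
                for name in self.metric_names
            }
        return self._cache


evaluator = CachedEvaluator(
    ["mae", "rmse"], y_true=[3.0, 5.0, 7.0], y_pred=[2.0, 5.0, 9.0]
)
print(evaluator.evaluations())
print(evaluator.evaluations())  # served from the cache, no second evaluation
print(evaluator.runs)
```

Adding a metric then only requires one new entry in the table, and asking for the evaluations twice costs nothing extra.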

As of 22.10.2023, I am creating the first version of this Markdown (README.md) file. The project is polished. All I need to do is:

  • Create a module file (*.py)
  • Write a bunch of documentation: this doc in Russian, code run-through, basic use-cases and much more!
  • Get a license
  • Post this module on PyPI

21:54. 22.10.2023 I've posted my project to PyPI. Everything seems to work fine.

23:28. 07.11.2023 I've created a new 0.3.0 version that fixes some bugs I've encountered. Now the module has fewer bugs. New feature added: GMSModule.to_df() :)

11:02. 09.11.2023 New version 0.4.0. Now you can get the predictions of each model provided! Most of the comments were removed because they were unnecessary.

Quick Message

I WON'T UPDATE FILES FOR PYPI FOR THIS REPO because those commits:

  1. Are unnecessary
  2. Break auto merge for git

So please, don't look at files that are not related to gms-module code itself!

TO DO:

  • Create a Markdown file for Usage description and examples

Fixed:

  • Fixed issue with pivot = None error
  • Fixed issue with non-binary classification support
  • Added: GMSModule.get_predictions() function. Now you can evaluate each model provided!

My Socials
