
This package streamlines machine learning model training, evaluation, and customization with support for regression and classification tasks, automated workflow, diverse model integration, EDA, serialization, and parameter tuning.

from fluid_domain.art import KMEngine

License: MIT

class KMEngine(data, column, test_ratio=0.2)


KMEngine (Karthik's Model Engine) is a tool for streamlining machine learning model training, evaluation, and customization. It supports both regression and classification tasks and offers an automated workflow that covers data preprocessing, feature selection, exploratory data analysis (EDA), model training, and metric evaluation. The class integrates popular machine learning models from libraries such as Scikit-Learn, XGBoost, and LightGBM, and lets users customize model parameters so they can build, assess, and refine models for their own datasets and tasks. The class does not currently remove outliers; to remove them beforehand, use O_sieve or IsolationForest.
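Since outlier removal is left to the user, the following is a minimal sketch of one way to filter rows with scikit-learn's IsolationForest before handing the dataframe to KMEngine. The helper name `drop_outliers`, the `contamination` value, and the toy data are assumptions made for illustration; they are not part of KMEngine.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest


def drop_outliers(df, contamination=0.05, random_state=0):
    """Return df without the rows IsolationForest flags as outliers.

    Hypothetical helper, not part of KMEngine. Only numeric columns are scored.
    """
    numeric = df.select_dtypes(include="number")
    # IsolationForest rejects NaN, so fill with column medians for scoring only.
    numeric = numeric.fillna(numeric.median())
    detector = IsolationForest(contamination=contamination, random_state=random_state)
    labels = detector.fit_predict(numeric)  # 1 = inlier, -1 = outlier
    return df[labels == 1]


# Toy data (illustrative only): 200 normal points plus one obvious outlier.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=200), "y": rng.normal(size=200)})
df.loc[0, ["x", "y"]] = [50.0, 50.0]

clean = drop_outliers(df)
print(len(df), len(clean))
```

In practice the target column would usually be excluded from the scored columns, and `contamination` should reflect how many outliers are actually expected in the data.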


Features

  • Supports both regression and classification tasks
  • Integrates a wide range of machine learning models from popular libraries such as Scikit-Learn, XGBoost, LightGBM, and more
  • Performs automated data preprocessing including missing value imputation, scaling, and one-hot encoding
  • Allows for exploratory data analysis (EDA) through YData Profiling
  • Provides various evaluation metrics including accuracy, F1-score, precision, recall, R-squared, adjusted R-squared, mean absolute error (MAE), and root mean squared error (RMSE)
  • Supports custom parameter tuning for models
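The preprocessing steps listed above (imputation, scaling, one-hot encoding) can be sketched with plain scikit-learn as follows. This is not KMEngine's internal code; the column names, imputation strategies, and toy values are assumptions chosen only to show the general pattern.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame with missing values in numeric and categorical columns (illustrative).
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 58.0],
    "fare": [7.25, 71.28, 8.05, np.nan],
    "sex": ["male", "female", np.nan, "female"],
})
numeric_cols = ["age", "fare"]
categorical_cols = ["sex"]

# Numeric columns: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical columns: fill gaps with the most frequent value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

X = preprocess.fit_transform(df)
print(X.shape)  # two scaled numeric columns plus one column per category
```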

Parameters:

  • data: dataframe

    • The data on which the evaluation of models should occur.
  • column: str

    • Target column, the dependent variable.
  • test_ratio: float, default=0.2

    • Fraction of the data held out for testing. The default of 0.2 gives the conventional 80/20 train/test split.
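To make the effect of test_ratio concrete, here is a small sketch of a 0.2 split using scikit-learn's train_test_split. It assumes KMEngine splits the data in a comparable way, which this page does not confirm; the toy dataframe is illustrative.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Ten toy rows; 'target' plays the role of the `column` argument.
df = pd.DataFrame({"feature": range(10), "target": [0, 1] * 5})
X = df.drop(columns="target")
y = df["target"]

# test_size=0.2 mirrors test_ratio=0.2: 20% of rows are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(len(X_train), len(X_test))  # 8 2
```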

Installation

pip install KMEngine

Usage

EDA (Exploratory Data Analysis)

import pandas as pd
from fluid_domain.art import KMEngine
# Reading a dataset using pandas.
df=pd.read_csv('tested.csv')
engine=KMEngine(df,'Survived')
eda=engine.EDA()
# EDA done. Check your working directory for the html file. 
# Produces an html file with the name 'KME_data_report.html' in the working directory.

Classification

import pandas as pd
from fluid_domain.art import KMEngine
# Reading a dataset using pandas.
df=pd.read_csv('tested.csv')
engine=KMEngine(df,'Survived')
result=engine.super_learning()
print(result)

# Engine Summoned.
# Loaded Data Successfully with 418 rows and 12 columns.
# Building models and Training them. This might take a while...
# Engine Encountered Discrete Data. Hence, proceeding with Classification

# Writing models to respective keys: ['Logistic Regression', 'Random Forest Classifier', 'Decision Tree Classifier', 'Xtreme Gradient Boosting Classifier', 'Stochastic Gradient Descent Classifier', 'Gradient Boosting Classifier', 'Adaptive Boost Classifier', 'Light Gradient Boosting Classifier', 'Extra Trees Classifier', 'Support Vector Classification', 'K Nearest Neighbors Classifier', 'Ridge Classifier', 'MLP Classifier', 'Quadratic Discriminant Analysis', 'Linear Discriminant Analysis', 'Naive Bayes Classifier']

# Currently Running : Logistic Regression
# Currently Running : Random Forest Classifier
# Currently Running : Decision Tree Classifier
# Currently Running : Xtreme Gradient Boosting Classifier
# Currently Running : Stochastic Gradient Descent Classifier
# Currently Running : Gradient Boosting Classifier
# Currently Running : Adaptive Boost Classifier
# Currently Running : Light Gradient Boosting Classifier
# Currently Running : Extra Trees Classifier
# Currently Running : Support Vector Classification
# Currently Running : K Nearest Neighbors Classifier
# Currently Running : Ridge Classifier
# Currently Running : MLP Classifier
# Currently Running : Quadratic Discriminant Analysis
# Currently Running : Linear Discriminant Analysis
# Currently Running : Naive Bayes Classifier

# All models evaluations:
# +----------------------------------------+----------+----------+-----------+--------+---------+
# |                 Model                  | Accuracy | F1-Score | Precision | Recall | ROC AUC |
# +----------------------------------------+----------+----------+-----------+--------+---------+
# |          Logistic Regression           |   1.0    |   1.0    |    1.0    |  1.0   |   1.0   |
# |        Random Forest Classifier        |   1.0    |   1.0    |    1.0    |  1.0   |   1.0   |
# |        Decision Tree Classifier        |   1.0    |   1.0    |    1.0    |  1.0   |   1.0   |
# |  Xtreme Gradient Boosting Classifier   |   1.0    |   1.0    |    1.0    |  1.0   |   1.0   |
# | Stochastic Gradient Descent Classifier |   1.0    |   1.0    |    1.0    |  1.0   |   1.0   |
# |      Gradient Boosting Classifier      |   1.0    |   1.0    |    1.0    |  1.0   |   1.0   |
# |       Adaptive Boost Classifier        |   1.0    |   1.0    |    1.0    |  1.0   |   1.0   |
# |   Light Gradient Boosting Classifier   |   1.0    |   1.0    |    1.0    |  1.0   |   1.0   |
# |         Extra Trees Classifier         |   1.0    |   1.0    |    1.0    |  1.0   |   1.0   |
# |     Support Vector Classification      |   0.98   |   0.97   |    0.97   |  0.97  |   0.97  |
# |     K Nearest Neighbors Classifier     |   0.99   |   0.98   |    0.97   |  1.0   |   0.99  |
# |            Ridge Classifier            |   1.0    |   1.0    |    1.0    |  1.0   |   1.0   |
# |             MLP Classifier             |   1.0    |   1.0    |    1.0    |  1.0   |   1.0   |
# |    Quadratic Discriminant Analysis     |   1.0    |   1.0    |    1.0    |  1.0   |   1.0   |
# |      Linear Discriminant Analysis      |   0.63   |   0.37   |    0.45   |  0.31  |   0.56  |
# |         Naive Bayes Classifier         |   1.0    |   1.0    |    1.0    |  1.0   |   1.0   |
# +----------------------------------------+----------+----------+-----------+--------+---------+
# Time Eaten :3.7687642574310303 secs
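The columns of the evaluation table correspond to standard classification metrics. The sketch below computes them with scikit-learn on a small hand-written set of labels. Whether KMEngine computes ROC AUC from hard labels or from predicted probabilities is not stated here, so the use of hard labels is an assumption.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hand-written labels (illustrative): 3 TP, 3 TN, 1 FP, 1 FN.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

metrics = {
    "Accuracy": accuracy_score(y_true, y_pred),
    "F1-Score": f1_score(y_true, y_pred),
    "Precision": precision_score(y_true, y_pred),
    "Recall": recall_score(y_true, y_pred),
    # Assumption: AUC from hard labels; probabilities would generally differ.
    "ROC AUC": roc_auc_score(y_true, y_pred),
}
for name, value in metrics.items():
    print(f"{name}: {round(value, 2)}")
```

With these labels every metric evaluates to 0.75, which makes it easy to check each one by hand against the confusion counts.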

Regression

import pandas as pd
from fluid_domain.art import KMEngine
# Reading a dataset using pandas.
df=pd.read_csv('co2.csv')
engine=KMEngine(df,'CO2 Emissions(g/km)')
result=engine.super_learning()
print(result)

# Engine Summoned.
# Loaded Data Successfully with 7385 rows and 12 columns.
# Building models and Training them. This might take a while...
# Engine Encountered Continuous Data. Hence, proceeding with Regression

# Writing models to respective keys: ['Linear Regression', 'Random Forest Regression', 'Light Gradient Boosting Regressor', 'Xtreme Gradient Boosting', 'Decison Tree Regressor', 'Gradient Boosting Regressor', 'Adaptive Boosting Regressor', 'Stochastic Gradient Descent Regressor', 'Support Vector Regression', 'Extra Trees Regressor', 'Ridge Regression', 'Gamma Regressor', 'Huber Regressor', 'Poisson Regressor', 'Lasso Regressor', 'Elastic Net Regressor', 'K Nearest Neighbors Regressor', 'MLP Regressor']

# Currently Running : Linear Regression
# Currently Running : Random Forest Regression
# Currently Running : Light Gradient Boosting Regressor
# Currently Running : Xtreme Gradient Boosting
# Currently Running : Decison Tree Regressor
# Currently Running : Gradient Boosting Regressor
# Currently Running : Adaptive Boosting Regressor
# Currently Running : Stochastic Gradient Descent Regressor
# Currently Running : Support Vector Regression
# Currently Running : Extra Trees Regressor
# Currently Running : Ridge Regression
# Currently Running : Gamma Regressor
# Currently Running : Huber Regressor
# Currently Running : Poisson Regressor
# Currently Running : Lasso Regressor
# Currently Running : Elastic Net Regressor
# Currently Running : K Nearest Neighbors Regressor
# Currently Running : MLP Regressor

# All models evaluations:
# +---------------------------------------+----------------+-------------------+------------+------------+
# |                 Model                 |    R2 Score    | Adjusted R2 Score |    MAE     |    RMSE    |
# +---------------------------------------+----------------+-------------------+------------+------------+
# |           Linear Regression           |      0.89      |        0.89       |   12.25    |   19.71    |
# |        Random Forest Regression       |      0.99      |        0.99       |    2.57    |    6.3     |
# |   Light Gradient Boosting Regressor   |      0.97      |        0.97       |    4.46    |   10.14    |
# |        Xtreme Gradient Boosting       |      0.99      |        0.99       |    2.91    |    6.95    |
# |         Decison Tree Regressor        |      0.98      |        0.98       |    2.48    |    8.39    |
# |      Gradient Boosting Regressor      |      0.97      |        0.97       |    5.53    |   10.53    |
# |      Adaptive Boosting Regressor      |      0.88      |        0.88       |   15.38    |   20.93    |
# | Stochastic Gradient Descent Regressor | -1490880552.75 |   -1493104842.2   | 1606127.27 | 2247205.34 |
# |       Support Vector Regression       |      0.89      |        0.89       |    9.52    |   19.32    |
# |         Extra Trees Regressor         |      0.99      |        0.99       |    2.16    |    6.18    |
# |            Ridge Regression           |      0.89      |        0.89       |   11.74    |   18.88    |
# |            Gamma Regressor            |      0.89      |        0.89       |   12.05    |   20.65    |
# |            Huber Regressor            |      0.82      |        0.82       |    8.17    |   25.38    |
# |           Poisson Regressor           |      0.92      |        0.92       |   10.35    |    17.1    |
# |            Lasso Regressor            |      0.9       |        0.9        |   12.04    |   18.94    |
# |         Elastic Net Regressor         |      0.89      |        0.89       |   11.73    |   19.41    |
# |     K Nearest Neighbors Regressor     |      0.98      |        0.98       |    3.49    |    8.15    |
# |             MLP Regressor             |      0.91      |        0.91       |    9.01    |   17.89    |
# +---------------------------------------+----------------+-------------------+------------+------------+
# Time Eaten :23.698055028915405 secs
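For reference, the regression columns can be computed as in the sketch below, which uses the textbook definitions, including adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1) for n samples and p features. The function name `regression_metrics` is hypothetical, and KMEngine's exact implementation may differ.

```python
import numpy as np


def regression_metrics(y_true, y_pred, n_features):
    """Textbook R2, adjusted R2, MAE, and RMSE (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = len(y_true)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    mae = np.mean(np.abs(y_true - y_pred))
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return {"r2": r2, "adj_r2": adj_r2, "mae": mae, "rmse": rmse}


# Four toy observations and a single feature (illustrative values).
m = regression_metrics([3, 5, 7, 9], [2.5, 5, 7.5, 9], n_features=1)
print(m)
```

R2 becomes negative whenever the predictions fit worse than always predicting the mean of the target, which is what the large negative value in the Stochastic Gradient Descent Regressor row indicates.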

Setting custom parameters

import pandas as pd
from fluid_domain.art import KMEngine
# Reading a dataset using pandas.
df=pd.read_csv('tested.csv')
engine=KMEngine(df,'Survived')
custom_model = engine.set_custom_params('Random Forest Classifier', n_estimators=500, max_depth=10, random_state=666)
print(custom_model)

# Engine Summoned.
# Loaded Data Successfully with 418 rows and 12 columns.
# Custom parameters applied for Random Forest Classifier:
# {'bootstrap': True, 'ccp_alpha': 0.0, 'class_weight': None, 'criterion': 'gini', 'max_depth': 10, 'max_features': 'sqrt', 'max_leaf_nodes': None, 'max_samples': None, 'min_impurity_decrease': 0.0, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'n_estimators': 500, 'n_jobs': None, 'oob_score': False, 'random_state': 666, 'verbose': 0, 'warm_start': False}
# Currently Running : Random Forest Classifier
# Classification Metrics for RandomForestClassifier(max_depth=10, n_estimators=500, random_state=666)
# Accuracy: 1.0
# F1_Score: 1.0
# Precision: 1.0
# Recall: 1.0
# ROC AUC: 1.0

Model saving (General Model)

import pandas as pd
from fluid_domain.art import KMEngine
# Reading a dataset using pandas.
df=pd.read_csv('tested.csv')
engine=KMEngine(df,'Survived')
engine.model_save('Random Forest Classifier')

# This will save the specified model as a pickle file in your working directory.

Model saving (Custom Model)

Setting custom parameters overwrites the existing entry for that model in the model dictionary (to conserve space), so saving the model afterwards saves the customized version.

import pandas as pd
from fluid_domain.art import KMEngine
# Reading a dataset using pandas.
df=pd.read_csv('tested.csv')
engine=KMEngine(df,'Survived')
custom_model = engine.set_custom_params('Random Forest Classifier', n_estimators=500, max_depth=10, random_state=666)
print(custom_model)
engine.model_save('Random Forest Classifier')

# This will save the updated model as a pickle file in your working directory.

Reusing the saved model

import pandas as pd
import pickle
# Load the pickled model; the context manager closes the file afterwards.
with open('Random Forest Regression.pkl','rb') as f:
    saved_model=pickle.load(f)
new_data = {
    'Make': ['ACURA','ACURA'],
    'Model': ['MDX 4WD','ILX'],
    'Vehicle Class': ['SUV - SMALL','COMPACT'],
    'Engine Size(L)': [3.5,5],
    'Cylinders': [6,12],
    'Transmission': ['AS6','AM7'],
    'Fuel Type': ['Z','D'],
    'Fuel Consumption City (L/100 km)': [11.2,13.5],
    'Fuel Consumption Hwy (L/100 km)': [10.0, 15.9],
    'Fuel Consumption Comb (L/100 km)': [25.36, 35.8],
    'Fuel Consumption Comb (mpg)': [40, 60]
}

# Convert the dictionary to a pandas DataFrame
new_data_df = pd.DataFrame(new_data)

# Use the saved model to make predictions on the new data
ypred = saved_model.predict(new_data_df)

print(ypred)
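Predicting on raw rows with string columns, as above, only works if the pickled object also carries the preprocessing. Whether KMEngine's saved files do is not stated here. As a sketch under that assumption, a scikit-learn Pipeline that bundles encoding and model survives a pickle round trip and accepts raw rows afterwards. The training values below are made up for illustration, and only two of the feature columns are used.

```python
import pickle

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up training rows (illustrative values only).
train = pd.DataFrame({
    "Fuel Type": ["Z", "D", "Z", "X"],
    "Engine Size(L)": [2.0, 3.5, 2.4, 1.6],
    "CO2 Emissions(g/km)": [196, 255, 221, 170],
})
X = train.drop(columns="CO2 Emissions(g/km)")
y = train["CO2 Emissions(g/km)"]

# One-hot encode the categorical column and pass numeric columns through.
preprocess = ColumnTransformer(
    [("encode", OneHotEncoder(handle_unknown="ignore"), ["Fuel Type"])],
    remainder="passthrough",
)
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", RandomForestRegressor(n_estimators=10, random_state=0)),
])
pipeline.fit(X, y)

# Round trip through pickle (in memory here; pickle.dump/load work the same on files).
restored = pickle.loads(pickle.dumps(pipeline))

new_rows = pd.DataFrame({"Fuel Type": ["D"], "Engine Size(L)": [3.0]})
prediction = restored.predict(new_rows)
print(prediction)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.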
