SmplML / SimpleML: Simplified Machine Learning for Classification and Regression

SmplML is a user-friendly Python module for streamlined machine learning classification and regression. It offers intuitive functionality for data preprocessing, model training, and evaluation. Ideal for beginners and experts alike, SmplML simplifies ML tasks, enabling you to gain valuable insights from your data with ease.

Features

  • Data preprocessing: Easily handle encoding categorical variables and data partitioning.
  • Model training: Train various classification and regression models with just a few lines of code.
  • Model evaluation: Evaluate model performance using common metrics.
  • scikit-learn compatibility: Works with scikit-learn models and with other models that follow the sklearn-like interface.
  • Multi-model training: Train several models in one run to compare them during experimentation.

Installation

You can install SmplML using pip:

pip install SmplML

Usage

The TrainModel class is designed to handle both classification and regression tasks. It determines the task type based on the target parameter. If the target has a float data type, the class automatically redirects the procedures to regression; otherwise, it assumes a classification task.
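The dtype rule described above can be sketched in a few lines. This is only an illustration of the stated behavior, not SmplML's actual code: the helper name `infer_task` and the small `demo` frame are made up for this example.

```python
import pandas as pd
from pandas.api.types import is_float_dtype


def infer_task(df: pd.DataFrame, target: str) -> str:
    """Return 'regression' for a float target column, else 'classification'.

    Hypothetical helper that mirrors the rule described in the text.
    """
    return "regression" if is_float_dtype(df[target]) else "classification"


# Small stand-in frame with one string column and one float column
demo = pd.DataFrame({
    "sex": ["Male", "Female", "Female"],
    "bill_length_mm": [39.1, 39.5, 40.3],
})

print(infer_task(demo, "sex"))
print(infer_task(demo, "bill_length_mm"))
```

If the rule works as described, an integer-coded numeric target would be treated as classification, so casting such a column to float first is a reasonable precaution when regression is intended.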

Data Preparation

Data preparation, such as splitting the data and converting categorical data into numerical data, runs automatically when you call the fit() method.

import seaborn as sns
import pandas as pd
from smpl_ml.smpl_ml import TrainModel
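For readers who want to see what this kind of preparation involves, here is a rough manual equivalent using only pandas. It is a sketch under assumptions: the function name `prepare`, the one-hot encoding via `pd.get_dummies`, the random row split, and the default `test_size=0.3` are all choices made for this example, and SmplML may encode and split differently internally. The `demo` values are made up.

```python
import pandas as pd


def prepare(df, target, features, test_size=0.3, seed=0):
    """One-hot encode categorical features and split rows into train/test.

    Hypothetical helper; assumes the frame has a unique index.
    """
    # Numeric columns pass through unchanged; string columns become indicators
    X = pd.get_dummies(df[list(features)])
    y = df[target]
    # Draw the test rows at random, keep the remaining rows for training
    test_idx = X.sample(frac=test_size, random_state=seed).index
    train_idx = X.index.difference(test_idx)
    return X.loc[train_idx], X.loc[test_idx], y.loc[train_idx], y.loc[test_idx]


# Made-up stand-in data shaped like a slice of the penguins dataset
demo = pd.DataFrame({
    "island": ["Torgersen", "Biscoe", "Dream", "Biscoe", "Dream",
               "Torgersen", "Biscoe", "Dream", "Torgersen", "Biscoe"],
    "body_mass_g": [3750.0, 3800.0, 3250.0, 3450.0, 3650.0,
                    3625.0, 4675.0, 3475.0, 4250.0, 3300.0],
    "sex": ["Male", "Female", "Female", "Female", "Male",
            "Female", "Male", "Female", "Male", "Female"],
})

X_train, X_test, y_train, y_test = prepare(demo, "sex", ["island", "body_mass_g"])
print(X_train.shape, X_test.shape)
print(sorted(X_train.columns))
```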

Classification Task

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
df = sns.load_dataset('penguins')
df.head()
  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex
0  Adelie  Torgersen            39.1           18.7              181.0       3750.0    Male
1  Adelie  Torgersen            39.5           17.4              186.0       3800.0  Female
2  Adelie  Torgersen            40.3           18.0              195.0       3250.0  Female
3  Adelie  Torgersen             NaN            NaN                NaN          NaN     NaN
4  Adelie  Torgersen            36.7           19.3              193.0       3450.0  Female
clf_target = 'sex'
clf_features = df.columns.drop(clf_target)

print(f"Class: {clf_target}")
print(f"Features: {clf_features}")
Class: sex
Features: Index(['species', 'island', 'bill_length_mm', 'bill_depth_mm',
       'flipper_length_mm', 'body_mass_g'],
      dtype='object')

Single Classification Model Training

# Initialize TrainModel object
clf_trainer = TrainModel(df.dropna(), target=clf_target, features=clf_features, models=LogisticRegression(C=0.01, max_iter=10_000))

# Fit the object
clf_trainer.fit()
        Recall  Specificity  Precision  F1-Score  Accuracy
Male      0.85         0.82       0.83      0.84      0.84
Female    0.82         0.85       0.84      0.83      0.84

The dataframe displayed when calling the fit() method contains the training results. This output can be suppressed by setting verbose=False.
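The columns in this table are standard confusion-matrix metrics. As a reference for how each one is defined, the following sketch computes them per class in plain Python. The function name `binary_metrics` and the label lists are made up for illustration, and SmplML may compute its numbers differently, for example through scikit-learn.

```python
def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred, positive):
    """Compute the table's metrics, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    recall = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    return {
        "Recall": recall,
        "Specificity": _ratio(tn, tn + fp),
        "Precision": precision,
        "F1-Score": _ratio(2 * precision * recall, precision + recall),
        "Accuracy": _ratio(tp + tn, len(pairs)),
    }


# Made-up labels, not taken from the penguins results
y_true = ["Male", "Male", "Male", "Female", "Female", "Female", "Male", "Female"]
y_pred = ["Male", "Female", "Male", "Male", "Male", "Female", "Male", "Female"]

male = binary_metrics(y_true, y_pred, positive="Male")
female = binary_metrics(y_true, y_pred, positive="Female")
print(male)
print(female)
```

With only two classes, the recall of one class equals the specificity of the other, which is why the Recall and Specificity columns mirror each other between the Male and Female rows.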

# Evaluate the model
clf_trainer.evaluate()
        Recall  Specificity  Precision  F1-Score  Accuracy
Male      0.73         0.86       0.83      0.78       0.8
Female    0.86         0.73       0.77      0.81       0.8

The dataframe displayed when calling the evaluate() method contains the testing results. This output can be suppressed by setting verbose=False.

# Access the fitted model
clf_trainer.fitted_models_dict
{'LogisticRegression': LogisticRegression(C=0.01, max_iter=10000)}

Multiple Classification Model Training

# Define the candidate models
clfs = [LogisticRegression(), DecisionTreeClassifier(), RandomForestClassifier(), SVC(), KNeighborsClassifier()]

# Initialize TrainModel object with all candidates
clf_trainer = TrainModel(df.dropna(), target=clf_target, features=clf_features, models=clfs, test_size=0.4)

# Fit the models (training output suppressed)
clf_trainer.fit(verbose=False)

# Evaluate the models
clf_trainer.evaluate(verbose=True)
        Recall  Specificity  Precision  F1-Score  Accuracy
Male      0.76         0.81       0.82      0.79      0.78
Female    0.81         0.76       0.75      0.78      0.78

        Recall  Specificity  Precision  F1-Score  Accuracy
Male      0.86         0.83       0.85      0.85      0.84
Female    0.83         0.86       0.84      0.83      0.84

        Recall  Specificity  Precision  F1-Score  Accuracy
Male      0.84         0.86       0.87      0.85      0.85
Female    0.86         0.84       0.83      0.84      0.85

        Recall  Specificity  Precision  F1-Score  Accuracy
Male      0.49         0.73       0.67      0.57       0.6
Female    0.73         0.49       0.56      0.63       0.6

        Recall  Specificity  Precision  F1-Score  Accuracy
Male      0.74         0.78       0.79      0.76      0.76
Female    0.78         0.74       0.73      0.75      0.76

Results

clf_trainer.results_df
                    Model  Accuracy
0  RandomForestClassifier      0.85
1  DecisionTreeClassifier      0.84
2      LogisticRegression      0.78
3    KNeighborsClassifier      0.76
4                     SVC      0.60
clf_trainer.fitted_models_dict
{'LogisticRegression': LogisticRegression(),
 'DecisionTreeClassifier': DecisionTreeClassifier(),
 'RandomForestClassifier': RandomForestClassifier(),
 'SVC': SVC(),
 'KNeighborsClassifier': KNeighborsClassifier()}

Accuracy results and the fitted models can be accessed through the results_df and fitted_models_dict attributes.
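One way to combine the two attributes is to look up the top-scoring model by name. The sketch below uses stand-ins rather than a real trainer: `results_df` is rebuilt by hand from the table above, and `fitted_models_dict` holds placeholder strings where the real attribute would hold fitted estimators. With an actual TrainModel object, the same two lines at the end would read from `clf_trainer.results_df` and `clf_trainer.fitted_models_dict`.

```python
import pandas as pd

# Stand-in for clf_trainer.results_df, copied from the table above
results_df = pd.DataFrame({
    "Model": ["RandomForestClassifier", "DecisionTreeClassifier",
              "LogisticRegression", "KNeighborsClassifier", "SVC"],
    "Accuracy": [0.85, 0.84, 0.78, 0.76, 0.60],
})

# Stand-in for clf_trainer.fitted_models_dict (placeholders, not estimators)
fitted_models_dict = {name: f"<fitted {name}>" for name in results_df["Model"]}

# Sort explicitly instead of relying on the row order of the results table
best_name = results_df.sort_values("Accuracy", ascending=False).iloc[0]["Model"]
best_model = fitted_models_dict[best_name]
print(best_name)
```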

Regression Task

from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.ensemble import GradientBoostingRegressor
df = sns.load_dataset('penguins')
df.head()
  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex
0  Adelie  Torgersen            39.1           18.7              181.0       3750.0    Male
1  Adelie  Torgersen            39.5           17.4              186.0       3800.0  Female
2  Adelie  Torgersen            40.3           18.0              195.0       3250.0  Female
3  Adelie  Torgersen             NaN            NaN                NaN          NaN     NaN
4  Adelie  Torgersen            36.7           19.3              193.0       3450.0  Female
reg_target = 'bill_length_mm'
reg_features = df.columns.drop(reg_target)

print(f"Target: {reg_target}")
print(f"Features: {reg_features}")
Target: bill_length_mm
Features: Index(['species', 'island', 'bill_depth_mm', 'flipper_length_mm',
       'body_mass_g', 'sex'],
      dtype='object')

Single Regression Model Training

# Initialize TrainModel object
reg_trainer = TrainModel(df.dropna(), 
                         target=reg_target, 
                         features=reg_features,
                         models=LinearRegression())

# Fit the object
reg_trainer.fit(verbose=False)
# Evaluate the model
reg_trainer.evaluate()
         MSE  RMSE   MAE  R-squared
Metrics  6.3  2.51  1.91       0.81
# Access the model
reg_trainer.fitted_models_dict
{'LinearRegression': LinearRegression()}
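The four regression metrics reported by evaluate() have standard definitions, sketched below in plain Python. The function name `regression_metrics` and the sample values are made up for illustration, and SmplML may compute its numbers through scikit-learn rather than like this.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R-squared from paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": mae,
        "R-squared": 1 - ss_res / ss_tot,
    }


# Made-up values, not taken from the penguins results
y_true = [39.0, 40.0, 41.0, 44.0]
y_pred = [38.0, 41.0, 41.0, 42.0]

metrics = regression_metrics(y_true, y_pred)
print({name: round(value, 2) for name, value in metrics.items()})
```

RMSE is the square root of MSE, which matches the table above: the square root of 6.3 is about 2.51.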

Multiple Regression Model Training

# Define the candidate models
regs = [LinearRegression(), DecisionTreeRegressor(), RandomForestRegressor(), SVR(), GradientBoostingRegressor()]

# Initialize TrainModel object with all candidates
reg_trainer = TrainModel(df.dropna(), target=reg_target, features=reg_features, models=regs, test_size=0.4)

# Fit the models (training output suppressed)
reg_trainer.fit(verbose=False)

# Evaluate the models (per-model output suppressed)
reg_trainer.evaluate(verbose=False)

Results

reg_trainer.results_df
                       Model    MSE  RMSE   MAE  R-squared
0      RandomForestRegressor   5.74  2.40  1.87       0.81
1  GradientBoostingRegressor   6.58  2.57  1.94       0.79
2      DecisionTreeRegressor   6.98  2.64  2.06       0.77
3           LinearRegression   7.63  2.76  2.11       0.75
4                        SVR  21.51  4.64  3.63       0.31
reg_trainer.fitted_models_dict
{'LinearRegression': LinearRegression(),
 'DecisionTreeRegressor': DecisionTreeRegressor(),
 'RandomForestRegressor': RandomForestRegressor(),
 'SVR': SVR(),
 'GradientBoostingRegressor': GradientBoostingRegressor()}

Change Log

1.0.6 (06/13/2023)

  • Added regression
  • Modified docstrings
  • Added pre-defined function
  • Fixed local issues
  • Added training feature for training multiple models.
