
Machine Learning Python Library


A Simple Yet Powerful Machine Learning Python Library

Install

pip install machlearn

Example 1: k-Nearest Neighbors

from machlearn import kNN
kNN.demo("iris")

Selected Output:

This demo uses a public dataset of Fisher's Iris, which has a total of 150 samples from three species of Iris ('setosa', 'versicolor', 'virginica').
The goal is to use 'the length and the width of the sepals and petals, in centimeters', to predict which species of Iris the sample belongs to.

Using a grid search and a kNN classifier, the best hyperparameters were found to be:
   Step1: scaler: StandardScaler(with_mean=True, with_std=True);
   Step2: classifier: kNN_classifier(n_neighbors=12, weights='uniform', p=2.00, metric='minkowski').

Figures: Iris dataset; kNN confusion matrix for the Iris data.
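A minimal stdlib sketch of what the best pipeline above does at prediction time. Each feature is standardized with the training mean and (population) standard deviation, as StandardScaler does. The predicted class is then a uniform-weight majority vote among the nearest neighbours under the Minkowski distance with p=2 (Euclidean). This is not machlearn's implementation: the function names (`standardize`, `minkowski`, `knn_predict`) and the toy data are hypothetical. The toy data has only six points, so it uses n_neighbors=3 instead of 12.

```python
import math
from collections import Counter


def standardize(train, test):
    """Scale columns by the training mean and population std (like StandardScaler)."""
    cols = list(zip(*train))
    means = [sum(c) / len(c) for c in cols]
    # A zero std would divide by zero, so such columns are left unscaled (std -> 1.0)
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

    return scale(train), scale(test)


def minkowski(a, b, p=2.0):
    """Minkowski distance; p=2 is the Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(X_train, y_train, x, n_neighbors=12, p=2.0):
    """'uniform' weights: each of the k nearest training points casts one vote."""
    order = sorted(range(len(X_train)), key=lambda i: minkowski(X_train[i], x, p))
    votes = Counter(y_train[i] for i in order[:n_neighbors])
    return votes.most_common(1)[0][0]


# Hypothetical toy data with two well-separated clusters
X_train = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y_train = ["a", "a", "a", "b", "b", "b"]
X_tr, X_te = standardize(X_train, [[1.5, 1.5], [8.5, 8.5]])
print([knn_predict(X_tr, y_train, x, n_neighbors=3) for x in X_te])  # ['a', 'b']
```

The scaler is fitted on the training rows only and then applied to the test rows, so no information from the test set leaks into the scaling.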


Example 2: Naive Bayes

from machlearn import naive_bayes as nb
nb.demo(dataset="SMS_spam")

Selected Output:

This demo uses a public dataset of SMS spam, which has a total of 5,574 messages: 4,827 ham (legitimate) and 747 spam.
The goal is to use 'term frequency in message' to predict whether the message is ham (class=0) or spam (class=1).

Using a grid search and a multinomial naive Bayes classifier, the best hyperparameters were found to be:
   Step1: Tokenizing text: CountVectorizer(analyzer = <_lemmas>, ngram_range = (1, 1));
   Step2: Transforming from occurrences to frequency: TfidfTransformer(use_idf = True).
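Step 2 turns raw term counts into TF-IDF weights. The stdlib sketch below assumes scikit-learn's default TfidfTransformer settings: smooth_idf=True, so idf = ln((1 + n) / (1 + df)) + 1, followed by L2 normalization of each document vector. The lemmatizing analyzer from Step 1 is replaced by pre-tokenized word lists. The function name `tfidf` and the messages are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (raw count x smoothed idf, L2-normalized)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        weights = {t: c * idf[t] for t, c in Counter(doc).items()}
        # Guard against an empty document (norm 0)
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


# Hypothetical pre-tokenized messages
docs = [["win", "prize", "prize"], ["win", "call"], ["call", "me", "later"]]
for vec in tfidf(docs):
    print({t: round(w, 3) for t, w in vec.items()})
```

A term that is frequent in one message but rare across messages (here "prize") ends up with a larger weight than a term shared by many messages (here "win").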

The top 3 terms with the highest probability of a message being spam (the classification is binary: spam or ham):
   "claim": 81.28%
   "prize": 80.24%
   "won": 76.29%

Application example:
   - Message: "URGENT! We are trying to contact U. Todays draw shows that you have won a 2000 prize GUARANTEED. Call 090 5809 4507 from a landline. Claim 3030. Valid 12hrs only."
   - Probability of spam (class=1): 95.85%
   - Classification: spam

Figures: SMS spam text example; naive Bayes confusion matrix; naive Bayes ROC curve; naive Bayes PR curve.
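The spam probability in the application example comes from Bayes' rule. Below is a from-scratch sketch of a multinomial naive Bayes classifier with Laplace smoothing (alpha = 1) on raw word counts. Note that the demo itself feeds TF-IDF features to its classifier. The toy corpus and the function names are hypothetical.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Class priors and Laplace-smoothed P(word | class) from tokenized documents."""
    classes = sorted(set(labels))
    vocab = {w for d in docs for w in d}
    priors, likelihoods = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        priors[c] = len(class_docs) / len(docs)
        counts = Counter(w for d in class_docs for w in d)
        total = sum(counts.values())
        likelihoods[c] = {w: (counts[w] + alpha) / (total + alpha * len(vocab)) for w in vocab}
    return priors, likelihoods


def predict_proba(priors, likelihoods, doc):
    """P(class | doc); words outside the training vocabulary are ignored."""
    log_scores = {
        c: math.log(priors[c]) + sum(math.log(likelihoods[c][w]) for w in doc if w in likelihoods[c])
        for c in priors
    }
    m = max(log_scores.values())  # subtract the max before exp for numerical stability
    exp = {c: math.exp(s - m) for c, s in log_scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}


docs = [["claim", "prize", "now"], ["won", "prize", "call"],
        ["see", "you", "tonight"], ["call", "you", "later"]]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = ham
priors, likelihoods = train_multinomial_nb(docs, labels)
print(predict_proba(priors, likelihoods, ["claim", "your", "prize"])[1])  # P(spam) = 6/7, about 0.857
```

Working in log space avoids underflow when a message contains many words, because a product of many small probabilities would otherwise round to zero.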


Example 3: Decision Boundary Comparison (Classification with Two Features)

from machlearn import kNN
kNN.demo("Social_Network_Ads")

from machlearn import naive_bayes as nb
nb.demo("Social_Network_Ads")

from machlearn import SVM
SVM.demo("Social_Network_Ads")

from machlearn import decision_tree as DT
DT.demo("Social_Network_Ads", classifier_func = "DT")

from machlearn import logistic_regression as log_reg
log_reg.demo("Social_Network_Ads")

from machlearn import neural_network as NN
NN.demo("Social_Network_Ads")

from machlearn import ensemble
ensemble.demo("Social_Network_Ads")

Figures: decision boundaries on the testing set for kNN, Gaussian naive Bayes, SVM, decision tree, logistic regression, neural network (MLP), random forest classifier (RFC), and gradient boosting machine (GBM).


Example 4: Imbalanced Data

from machlearn import imbalanced_data
imbalanced_data.demo()

Summary of output:

To mitigate the problems associated with class imbalance, the majority class (y=0) is downsampled to match the size of the minority class (y=1).
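A minimal sketch of random undersampling, assuming binary labels with 0 as the majority class and 1 as the minority class. The function name and the data are hypothetical, and the demo may select rows differently.

```python
import random
from collections import Counter


def downsample_majority(X, y, majority=0, minority=1, seed=0):
    """Randomly drop majority-class rows (without replacement) until both classes are the same size."""
    rng = random.Random(seed)
    majority_idx = [i for i, label in enumerate(y) if label == majority]
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    keep = rng.sample(majority_idx, len(minority_idx)) + minority_idx
    rng.shuffle(keep)  # avoid having all minority rows at the end
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical extremely imbalanced data: 95 negatives, 5 positives
X = [[i] for i in range(100)]
y = [0] * 95 + [1] * 5
X_bal, y_bal = downsample_majority(X, y)
print(sorted(Counter(y_bal).items()))  # [(0, 5), (1, 5)]
```

Every minority row is kept, while most of the majority rows are discarded. This is the price of balancing by downsampling rather than by oversampling the minority class.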

These metrics are insensitive to class imbalance:
- Area under the ROC curve
- Geometric mean (of recall and specificity)
- Matthews correlation coefficient
- Recall (TPR)
- Specificity (1 - FPR)

These metrics are sensitive to class imbalance:
- Area under the PR curve
- Accuracy
- F1 score
- Precision
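Most of the metrics listed above are computed from the four cells of a binary confusion matrix. The sketch below keeps the classifier's per-class behaviour fixed (recall 0.8, specificity 0.9) and multiplies the number of negatives by 10. Recall, specificity and their geometric mean stay the same, whereas precision, accuracy and F1 shift with the class ratio. The function name and counts are hypothetical.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Common metrics computed from the cells of a binary confusion matrix."""
    recall = tp / (tp + fn)            # TPR
    specificity = tn / (tn + fp)       # 1 - FPR
    precision = tp / (tp + fp)
    return {
        "recall": recall,
        "specificity": specificity,
        "g_mean": math.sqrt(recall * specificity),
        "precision": precision,
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Same per-class behaviour, but 10x more negatives in the second case
base = binary_metrics(tp=40, fn=10, fp=10, tn=90)
more_negatives = binary_metrics(tp=40, fn=10, fp=100, tn=900)
for name in base:
    print(f"{name:12s} {base[name]:.3f} -> {more_negatives[name]:.3f}")
```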

Figures, each shown for (a) extremely imbalanced data and (b) the majority class downsampled to match the minority class: bar chart, confusion matrix, ROC curve, PR curve.


Example 5: Regularization

from machlearn import linear_regression as linreg
linreg.demo_regularization()

Summary of output:

Issues: (a) high multicollinearity and (b) too many features, both of which lead to overfitting and poor generalization.
- After L2 regularization (Ridge regression): reduced variance of the coefficient estimates [more robust/stable estimates], plus a higher R-squared and lower RMSE on the testing set [better generalization]
- After L1 regularization (Lasso regression): coefficient estimates shrink to 0 for relatively unimportant features [a simpler model], plus a higher R-squared and lower RMSE on the testing set [better generalization]
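The different effects of the two penalties are easiest to see with a single centered feature and no intercept, where both estimates have closed forms. This is a textbook sketch, not machlearn's Ridge_regression() or Lasso_regression(). It assumes the objectives (1/2)||y - x*b||^2 + (lam/2)*b^2 for ridge and (1/2)||y - x*b||^2 + lam*|b| for lasso; the helper names and data are hypothetical. Ridge shrinks b toward 0 without reaching it, while lasso's soft-thresholding sets b exactly to 0 once lam >= |x.y|.

```python
def ols(x, y):
    """Least-squares slope without intercept: x.y / x.x."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def ridge(x, y, lam):
    # Minimizes (1/2)||y - x*b||^2 + (lam/2)*b^2  ->  b = x.y / (x.x + lam)
    xy = sum(a * b for a, b in zip(x, y))
    xx = sum(a * a for a in x)
    return xy / (xx + lam)


def lasso(x, y, lam):
    # Minimizes (1/2)||y - x*b||^2 + lam*|b|  ->  b = soft_threshold(x.y, lam) / x.x
    xy = sum(a * b for a, b in zip(x, y))
    xx = sum(a * a for a in x)
    shrunk = max(abs(xy) - lam, 0.0)
    return (shrunk if xy > 0 else -shrunk) / xx


x = [-1.0, 0.0, 1.0]   # hypothetical centered feature
y = [-2.0, 0.0, 2.0]
print(ols(x, y))       # 2.0
for lam in (2.0, 5.0):
    print(lam, ridge(x, y, lam), lasso(x, y, lam))
```

With lam = 5, ridge still returns a nonzero slope (4/7), whereas lasso returns 0.0. This is the mechanism behind the "simpler model" noted above.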

Example 6: Gradient Descent

from machlearn import gradient_descent as GD
GD.demo("Gender")

Summary of output:

This example uses batch gradient descent (BGD) with the logistic regression cost function and a learning rate of 0.00025, with Male (coded 1/0) as the target.
- Theta estimates of [const, Height (inch), Weight (lbs)]: [0.69254314, -0.49262002, 0.19834042]
- Accuracy of prediction: 0.913

Figures: descriptive statistics (pairplot of the Gender data); batch gradient descent training loss vs. epoch.
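A minimal from-scratch sketch of batch gradient descent for logistic regression. Each epoch uses all rows to compute the gradient of the average cross-entropy loss, then takes one step of size `lr`. The function names, the one-feature toy data and the learning rate are hypothetical; the demo above uses height and weight with a learning rate of 0.00025. A constant column is prepended, so theta[0] plays the role of `const`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bgd_logistic(X, y, lr=0.1, epochs=500):
    """Return (theta, loss_history); theta[0] is the intercept."""
    rows = [[1.0] + list(r) for r in X]  # prepend the constant term
    n, theta, history = len(rows), [0.0] * len(rows[0]), []
    for _ in range(epochs):
        probs = [sigmoid(sum(t * v for t, v in zip(theta, r))) for r in rows]
        # Average cross-entropy loss; probabilities are clipped to avoid log(0)
        clipped = [min(max(p, 1e-12), 1 - 1e-12) for p in probs]
        history.append(-sum(yi * math.log(p) + (1 - yi) * math.log(1 - p)
                            for yi, p in zip(y, clipped)) / n)
        # Gradient of the average loss over the whole batch: X^T (p - y) / n
        grad = [sum((p - yi) * r[j] for p, yi, r in zip(probs, y, rows)) / n
                for j in range(len(theta))]
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta, history


def predict(theta, X):
    """Class 1 when the predicted probability is at least 0.5."""
    return [int(sigmoid(theta[0] + sum(t * v for t, v in zip(theta[1:], r))) >= 0.5) for r in X]


X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
theta, history = bgd_logistic(X, y)
print(history[0], history[-1])  # the loss starts at ln 2 (theta = 0) and decreases
print(predict(theta, X))        # [0, 0, 1, 1]
```

Plotting `history` against the epoch index gives a training-loss curve of the same kind as the one in the figures above.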


Example 7: Decision Tree

from machlearn import decision_tree as DT
DT.demo()
DT.demo_from_scratch(question_type="regression") # dataset='boston'
DT.demo_from_scratch(question_type="classification") # dataset='Social_Network_Ads', X=not scaled, criterion=entropy, max_depth=2

Summary of output:

- DT.demo_from_scratch(question_type="regression") uses decision_tree_regressor_from_scratch()
- DT.demo_from_scratch(question_type="classification") provides results essentially identical to the tree graph below.

Figure: decision tree for Social_Network_Ads (X not scaled, criterion = entropy, max_depth = 2).
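With criterion=entropy, a classification tree picks each split by information gain: the parent node's entropy minus the size-weighted entropy of its two children. The stdlib sketch below handles one numeric feature; its names and data are hypothetical. It only illustrates the split criterion, not the full recursive tree or max_depth handling.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(x, y):
    """Try midpoints between consecutive distinct values; return (threshold, information_gain).

    The threshold is None if no split has a positive gain.
    """
    parent = entropy(y)
    best = (None, 0.0)
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        child = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        if parent - child > best[1]:
            best = (t, parent - child)
    return best


x = [1, 2, 3, 10, 11, 12]
y = [0, 0, 0, 1, 1, 1]
print(best_split(x, y))  # (6.5, 1.0): both children are pure
```

A tree builder applies this search to every feature at every node and recurses on the two children until a stopping rule such as max_depth is reached.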


Example 8: Ensemble Methods

from machlearn import ensemble
ensemble.demo()
ensemble.demo("Social_Network_Ads")
ensemble.demo("boston")

Summary of output:

- These demos call the following functions, which are developed from scratch to show their inner workings:
* random_forest_classifier_from_scratch();
* adaptive_boosting_classifier_from_scratch();
* gradient_boosting_regressor_from_scratch() (see training history plot below): R_squared = 0.753, RMSE = 4.419

Figure: training loss history of the gradient boosting regressor on the boston dataset.
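The training-history plot reflects how gradient boosting for regression works. The model starts from the mean of y, then repeatedly fits a small tree to the current residuals (the negative gradient of the squared-error loss) and adds a shrunken copy of it to the model. This is a minimal sketch with one feature and depth-1 trees (stumps). The names, data and hyperparameters are hypothetical and are not those of gradient_boosting_regressor_from_scratch().

```python
def fit_stump(x, r):
    """Depth-1 regression tree: the threshold whose two leaf means best fit r (min SSE).

    Assumes x has at least two distinct values.
    """
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm


def gradient_boosting(x, y, n_estimators=50, learning_rate=0.1):
    """Return (predict function, training-MSE history)."""
    f0 = sum(y) / len(y)              # initial model: the mean of y
    preds = [f0] * len(y)
    stumps, history = [], []
    for _ in range(n_estimators):
        # Residuals = negative gradient of the loss (1/2)(y - f)^2 with respect to f
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [pi + learning_rate * stump(xi) for pi, xi in zip(preds, x)]
        history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y))

    def predict(xi):
        return f0 + learning_rate * sum(s(xi) for s in stumps)

    return predict, history


x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
predict, history = gradient_boosting(x, y)
print(history[0], history[-1])
print(predict(1), predict(6))
```

The learning rate (shrinkage) makes each tree correct only part of the remaining error. Shrinkage is why the loss curve decreases gradually instead of dropping to its minimum after the first tree.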


module: model_evaluation

- plot_ROC_and_PR_curves(): plots both the ROC and the precision-recall curves, along with statistics
- plot_ROC_curve(): plots the ROC (Receiver Operating Characteristic) curve, along with statistics
- plot_PR_curve(): plots the precision-recall curve, along with statistics
- plot_confusion_matrix(): plots the confusion matrix, along with key statistics, and returns accuracy
- demo_CV(): provides a demo of cross-validation in this module
- demo(): provides a demo of the major functions in this module
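ROC statistics commonly include the area under the ROC curve (AUC). The stdlib sketch below uses the rank interpretation of AUC: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. The function name is hypothetical. The pairwise loop is O(P x N), which is fine for illustration but slow on large datasets.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative), ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Three of the four positive/negative pairs are ranked correctly here (only 0.35 vs 0.4 is not), hence 0.75.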


module: datasets

- public_dataset(): returns a public dataset as specified (e.g., iris, SMS_spam, Social_Network_Ads)


module: kNN

- demo(): provides a demo of selected functions in this module


module: naive_bayes

- naive_bayes_Gaussian(): for continuous features X
- naive_bayes_multinomial(): for independent discrete features X with 3+ levels (e.g., term frequency in a document)
- naive_bayes_Bernoulli(): for independent binary features X (e.g., whether or not a word occurs in a document)
- demo(): provides a demo of selected functions in this module
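The following is a minimal stdlib sketch of the Gaussian variant used for continuous features. Each class stores its prior plus a per-feature mean and variance. Prediction picks the class with the largest log prior plus summed Gaussian log-densities, treating features as conditionally independent given the class. The names and data are hypothetical, and the tiny variance floor is an added assumption to avoid division by zero.

```python
import math


def fit_gaussian_nb(X, y):
    """Per class: (prior, per-feature means, per-feature variances)."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(col) for col in cols]
        # Small variance floor avoids division by zero for constant features
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(cols, means)]
        model[c] = (len(rows) / len(X), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Class with the highest log prior + sum of Gaussian log-likelihoods."""
    def log_joint(c):
        prior, means, variances = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
    return max(model, key=log_joint)


X = [[1.0], [1.2], [0.8], [5.0], [5.2], [4.8]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1]), predict_gaussian_nb(model, [4.9]))  # 0 1
```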


module: SVM

- demo(): provides a demo of selected functions in this module


module: decision_tree

- decision_tree_regressor_from_scratch(): decision tree regressor developed from scratch
- decision_tree_classifier_from_scratch(): decision tree classifier developed from scratch
- demo_from_scratch(): provides a demo of the from-scratch functions in this module
- decision_tree_regressor(): decision tree regressor
- decision_tree_classifier(): decision tree classifier
- demo(): provides a demo of selected functions in this module


module: neural_network

- multi_layer_perceptron_classifier(): multi-layer perceptron (MLP) classifier
- rnn(): recurrent neural network (RNN)
- demo(): provides a demo of selected functions in this module


module: logistic_regression

- LogisticReg_sklearn(): logistic regression using scikit-learn
- LogisticReg_statsmodels(): logistic regression using statsmodels
- demo(): provides a demo of selected functions in this module


module: linear_regression

- Lasso_regression(): Lasso (L1-regularized) regression
- Ridge_regression(): Ridge (L2-regularized) regression
- demo_regularization(): provides a demo of L1 (Lasso) and L2 (Ridge) regularization
- Linear_regression_normal_equation(): linear regression solved with the normal equation
- Linear_regression(): linear regression
- demo(): provides a demo of selected functions in this module
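The normal equation is the closed-form least-squares solution theta = (X^T X)^(-1) X^T y. When the design matrix has a constant column and a single feature, the 2x2 system can be solved by hand, as in the stdlib sketch below. The function name and data are hypothetical. With many features, a linear solver (e.g., numpy.linalg.solve or numpy.linalg.lstsq) is the usual choice rather than an explicit inverse.

```python
def normal_equation_simple(x, y):
    """Solve (X^T X) theta = X^T y for X = [[1, x_i]]; return (intercept, slope)."""
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    # X^T X = [[n, sx], [sx, sxx]]; its inverse is [[sxx, -sx], [-sx, n]] / det
    det = n * sxx - sx * sx
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope


print(normal_equation_simple([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```

The determinant is zero when all x values are identical, which is the one-feature case of a singular X^T X.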


module: DSA

- demo(): provides a demo of selected functions in this module


module: imbalanced_data

- demo(): provides a demo of selected functions in this module


module: decomposition

- demo(): provides a demo of selected functions in this module


module: gradient_descent

- logistic_regression_BGD_classifier(): class for a logistic regression classifier trained by batch gradient descent
- batch_gradient_descent(): class implementing batch gradient descent
- demo(): provides a demo of selected functions in this module


module: ensemble

- gradient_boosting_regressor_from_scratch(): gradient boosting regressor developed from scratch
- adaptive_boosting_classifier_from_scratch(): adaptive boosting classifier developed from scratch
- random_forest_classifier_from_scratch(): random forest classifier developed from scratch
- bagging_classifier_from_scratch(): bagging classifier developed from scratch
- gradient_boosting_classifier(): gradient boosting classifier
- adaptive_boosting_classifier(): adaptive boosting classifier
- random_forest_classifier(): random forest classifier
- bagging_classifier(): bagging classifier
- voting_classifier(): voting classifier
- demo(): provides a demo of selected functions in this module


Download files


Source distribution: machlearn-1.2.14.tar.gz (71.4 MB)
Built distribution: machlearn-1.2.14-py3-none-any.whl (71.5 MB, Python 3)
