Machine Learning Python Library
Project description
A Simple Yet Powerful Machine Learning Python Library
Install
pip install machlearn
Example 1: k-Nearest Neighbors
from machlearn import kNN
kNN.demo("iris")
Selected Output:
This demo uses a public dataset of Fisher's Iris, which has a total of 150 samples from three species of Iris ('setosa', 'versicolor', 'virginica').
The goal is to use 'the length and the width of the sepals and petals, in centimeters' to predict which species of Iris the sample belongs to.
Using a grid search and a kNN classifier, the best hyperparameters were found as follows:
Step1: scaler: StandardScaler(with_mean=True, with_std=True);
Step2: classifier: kNN_classifier(n_neighbors=12, weights='uniform', p=2.00, metric='minkowski').
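The two steps reported above (standardize the features, then take a uniform-weight vote among the nearest neighbors under the Minkowski distance) can be sketched in plain Python. This is a minimal illustration, not machlearn's implementation: the helper names (`fit_standardizer`, `transform`, `minkowski`, `knn_predict`) and the toy data are hypothetical, and the grid search itself is omitted.

```python
import math
from collections import Counter

def fit_standardizer(X):
    """Column means and population standard deviations (zero-variance columns get scale 1.0)."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0 for c, m in zip(cols, means)]
    return means, stds

def transform(X, means, stds):
    """Z-score every row: (value - column mean) / column std."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

def minkowski(a, b, p=2.0):
    """Minkowski distance; p=2 is the Euclidean distance."""
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)

def knn_predict(X_train, y_train, x, n_neighbors=3, p=2.0):
    """Uniform-weight majority vote among the n_neighbors nearest training rows."""
    neighbors = sorted(zip(X_train, y_train), key=lambda pair: minkowski(pair[0], x, p))[:n_neighbors]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Hypothetical toy data (two features on very different scales), not the Iris dataset.
X = [[1.0, 10.0], [1.2, 11.0], [3.0, 30.0], [3.2, 29.0]]
y = ["a", "a", "b", "b"]
means, stds = fit_standardizer(X)
X_scaled = transform(X, means, stds)
query = transform([[1.1, 12.0]], means, stds)[0]
print(knn_predict(X_scaled, y, query))  # a
```

Scaling matters here because without it the second feature, whose values are about ten times larger, would dominate the distance.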
Example 2: Naive Bayes
from machlearn import naive_bayes as nb
nb.demo(dataset="SMS_spam")
Selected Output:
This demo uses a public dataset of SMS spam, which has a total of 5574 messages: 4827 ham (legitimate) and 747 spam.
The goal is to use 'term frequency in message' to predict whether the message is ham (class=0) or spam (class=1).
Using a grid search and a multinomial naive Bayes classifier, the best hyperparameters were found as follows:
Step1: Tokenizing text: CountVectorizer(analyzer = <_lemmas>, ngram_range = (1, 1));
Step2: Transforming from occurrences to frequency: TfidfTransformer(use_idf = True).
The top 3 terms with the highest probability of a message being spam (the classification is either spam or ham):
"claim": 81.28%
"prize": 80.24%
"won": 76.29%
Application example:
- Message: "URGENT! We are trying to contact U. Todays draw shows that you have won a 2000 prize GUARANTEED. Call 090 5809 4507 from a landline. Claim 3030. Valid 12hrs only."
- Probability of spam (class=1): 95.85%
- Classification: spam
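The spam probability above comes from Bayes' rule applied to per-class term likelihoods. The sketch below assumes a multinomial naive Bayes on raw word counts with Laplace smoothing (`alpha=1.0`); the demo itself uses lemmatized tf-idf features, so its numbers will differ. The function names and the four toy messages are hypothetical.

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit class priors and Laplace-smoothed word likelihoods from tokenized documents."""
    classes = sorted(set(labels))
    vocab = {w for doc in docs for w in doc}
    priors, likelihoods = {}, {}
    for c in classes:
        class_docs = [d for d, label in zip(docs, labels) if label == c]
        priors[c] = len(class_docs) / len(docs)
        counts = Counter(w for d in class_docs for w in d)
        total = sum(counts.values())
        likelihoods[c] = {w: (counts[w] + alpha) / (total + alpha * len(vocab)) for w in vocab}
    return priors, likelihoods

def predict_proba(doc, priors, likelihoods):
    """Posterior P(class | doc); every occurrence of a known word counts, unseen words are ignored."""
    log_scores = {}
    for c in priors:
        log_scores[c] = math.log(priors[c]) + sum(
            math.log(likelihoods[c][w]) for w in doc if w in likelihoods[c])
    # Normalize in log space (subtract the max before exponentiating) for numerical stability.
    m = max(log_scores.values())
    exp_scores = {c: math.exp(s - m) for c, s in log_scores.items()}
    z = sum(exp_scores.values())
    return {c: s / z for c, s in exp_scores.items()}

# Hypothetical toy corpus: class 1 = spam, class 0 = ham.
docs = [["win", "prize", "now"], ["claim", "prize"], ["see", "you", "tomorrow"], ["lunch", "tomorrow"]]
labels = [1, 1, 0, 0]
priors, likelihoods = train_multinomial_nb(docs, labels)
proba = predict_proba(["claim", "your", "prize"], priors, likelihoods)
print(round(proba[1], 3))  # 0.857
```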
Example 3: Decision Boundary Comparison (Classification with Two Features)
from machlearn import kNN
kNN.demo("Social_Network_Ads")
from machlearn import naive_bayes as nb
nb.demo("Social_Network_Ads")
from machlearn import SVM
SVM.demo("Social_Network_Ads")
from machlearn import decision_tree as DT
DT.demo("Social_Network_Ads", classifier_func = "DT")
from machlearn import logistic_regression as log_reg
log_reg.demo("Social_Network_Ads")
from machlearn import neural_network as NN
NN.demo("Social_Network_Ads")
from machlearn import ensemble
ensemble.demo("Social_Network_Ads")
Example 4: Imbalanced Data
from machlearn import imbalanced_data
imbalanced_data.demo()
Summary of output:
To mitigate the problems associated with class imbalance, the majority class (y=0) is downsampled to match the size of the minority class (y=1).
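A minimal sketch of random downsampling, assuming binary labels 0 (majority) and 1 (minority) held in a plain list; `downsample_majority` and the toy data are hypothetical and not part of machlearn's documented API.

```python
import random

def downsample_majority(X, y, majority=0, minority=1, seed=0):
    """Randomly keep only as many majority rows as there are minority rows (sampling without replacement)."""
    rng = random.Random(seed)
    majority_idx = [i for i, label in enumerate(y) if label == majority]
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    # rng.sample raises ValueError if the "majority" class is actually the smaller one.
    keep = sorted(rng.sample(majority_idx, len(minority_idx)) + minority_idx)
    return [X[i] for i in keep], [y[i] for i in keep]

# Hypothetical 90:10 imbalanced toy data.
X = [[i] for i in range(100)]
y = [0] * 90 + [1] * 10
X_bal, y_bal = downsample_majority(X, y)
print(y_bal.count(0), y_bal.count(1))  # 10 10
```

Downsampling discards data, so it trades some information in the majority class for a balanced training set.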
These metrics are insensitive to class imbalance:
- Area Under ROC curve
- Geometric mean
- Matthews Correlation Coefficient
- Recall, TPR
- Specificity, 1-FPR
These metrics are sensitive to class imbalance:
- Area Under PR curve
- Accuracy
- F1 score
- Precision
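The metrics above can be computed directly from confusion-matrix counts. In this sketch (hypothetical function name and counts), the second call has the same recall and specificity but ten times as many negatives, so recall, specificity and G-mean stay unchanged while precision and F1 drop. It illustrates only that one kind of shift.

```python
import math

def imbalance_metrics(tp, fn, tn, fp):
    """Binary classification metrics from confusion-matrix counts (class 1 = positive)."""
    recall = tp / (tp + fn)          # TPR
    specificity = tn / (tn + fp)     # 1 - FPR
    precision = tp / (tp + fp)
    return {
        "recall": recall,
        "specificity": specificity,
        "g_mean": math.sqrt(recall * specificity),
        "mcc": (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Same per-class error rates, but ten times as many negatives in the second case.
balanced = imbalance_metrics(tp=40, fn=10, tn=45, fp=5)
imbalanced = imbalance_metrics(tp=40, fn=10, tn=450, fp=50)
for name in ("recall", "specificity", "g_mean", "precision", "f1"):
    print(name, round(balanced[name], 3), round(imbalanced[name], 3))
```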
Figures: Extreme Imbalanced Data; Majority Downsampled to Match Minority Class
Example 5: Regularization
from machlearn import linear_regression as linreg
linreg.demo_regularization()
Summary of output:
Issues: (a) high multicollinearity and (b) too many features; these lead to overfitting and poor generalization.
- After L2 regularization (Ridge regression): lower variance in the coefficient estimates [more robust/stable estimates], and higher R-squared and lower RMSE on the testing set [better generalization].
- After L1 regularization (Lasso regression): coefficient estimates shrunk to exactly 0 for relatively unimportant features [a simpler model], and higher R-squared and lower RMSE on the testing set [better generalization].
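A minimal sketch of the two penalties, assuming centered data with no intercept and fitted by cyclic coordinate descent (our choice of solver, not necessarily machlearn's). The ridge update divides by `x_j·x_j + lam`, shrinking every coefficient; the lasso update applies soft-thresholding, which sets small coefficients exactly to 0. The function names and toy data are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward 0 by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def coordinate_descent(X, y, lam, penalty="l2", n_iter=100):
    """Cyclic coordinate descent for 0.5*||y - Xb||^2 plus a penalty on b.

    penalty="l2" (ridge): 0.5*lam*||b||^2;  penalty="l1" (lasso): lam*||b||_1.
    Assumes the columns of X and y are centered, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove every feature's contribution except feature j's.
            r = [y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j) for i in range(n)]
            xr = sum(X[i][j] * r[i] for i in range(n))
            xx = sum(X[i][j] ** 2 for i in range(n))
            if penalty == "l1":
                b[j] = soft_threshold(xr, lam) / xx
            else:
                b[j] = xr / (xx + lam)
    return b

# Hypothetical centered toy data: y is exactly 2*x1 + 0.1*x2, and x2 is a weak feature.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [1.0, -1.0, 0.0, -1.0, 1.0]
y = [-3.9, -2.1, 0.0, 1.9, 4.1]
X = [list(row) for row in zip(x1, x2)]
print(coordinate_descent(X, y, lam=0.0))                # ordinary least squares
print(coordinate_descent(X, y, lam=1.0, penalty="l2"))  # ridge: both coefficients shrink
print(coordinate_descent(X, y, lam=1.0, penalty="l1"))  # lasso: the weak coefficient becomes 0
```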
Example 6: Gradient Descent
from machlearn import gradient_descent as GD
GD.demo("Gender")
Summary of output:
This example uses batch gradient descent (BGD) with the logistic regression cost function and a learning rate of 0.00025; the target is Male, coded as 1 or 0.
- Theta estimates of [const, Height (inch), Weight (lbs)]: [0.69254314, -0.49262002, 0.19834042]
- Accuracy of prediction: 0.913
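Batch gradient descent for logistic regression updates all coefficients in each epoch using the gradient averaged over every training row. A pure-Python sketch with hypothetical function names and toy data; the learning rate of 0.5 suits this tiny example only and is not the demo's setting.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_bgd(X, y, learning_rate=0.5, epochs=1000):
    """Batch gradient descent on the mean cross-entropy loss; each row of X starts with 1 for the constant."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    history = []
    for _ in range(epochs):
        preds = [sigmoid(sum(t * v for t, v in zip(theta, row))) for row in X]
        loss = -sum(yi * math.log(p) + (1 - yi) * math.log(1 - p) for yi, p in zip(y, preds)) / m
        history.append(loss)
        # Gradient of the mean cross-entropy: (1/m) * X^T (preds - y), using all m rows per step.
        grad = [sum((p - yi) * row[j] for p, yi, row in zip(preds, y, X)) / m for j in range(n)]
        theta = [t - learning_rate * g for t, g in zip(theta, grad)]
    return theta, history

# Hypothetical toy data: rows are [const, feature].
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
theta, history = logistic_bgd(X, y)
predictions = [1 if sigmoid(sum(t * v for t, v in zip(theta, row))) >= 0.5 else 0 for row in X]
print(predictions)  # [0, 0, 1, 1]
```

Plotting `history` against the epoch index gives a training-loss curve of the kind referenced below.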
Figures: Descriptive statistics; Batch Gradient Descent Training Loss vs. Epoch
Example 7: Decision Tree
from machlearn import decision_tree as DT
DT.demo()
DT.demo_from_scratch(question_type="regression") # dataset='boston'
DT.demo_from_scratch(question_type="classification") # dataset='Social_Network_Ads', X=not scaled, criterion=entropy, max_depth=2
Summary of output:
- DT.demo_from_scratch(question_type="regression") uses decision_tree_regressor_from_scratch()
- DT.demo_from_scratch(question_type="classification") provides results essentially identical to the tree graph below.
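The classification demo grows a tree with `criterion=entropy`; the core step is choosing the split that maximizes information gain. This sketch (hypothetical names) handles one numeric feature at the root only; a depth-2 tree would repeat the search on each child node over all features.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(x, y):
    """Return (threshold, information_gain) for the best split x <= threshold; (None, 0.0) if nothing helps."""
    parent = entropy(y)
    values = sorted(set(x))
    best = (None, 0.0)
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2  # midpoint between consecutive distinct values
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        children = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        gain = parent - children
        if gain > best[1]:
            best = (threshold, gain)
    return best

print(best_split([1, 2, 3, 4], [0, 0, 1, 1]))  # (2.5, 1.0)
```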
Example 8: Ensemble Methods
from machlearn import ensemble
ensemble.demo()
ensemble.demo("Social_Network_Ads")
ensemble.demo("boston")
Summary of output:
- These demos call the following functions, developed from scratch, and show how they work internally:
* random_forest_classifier_from_scratch();
* adaptive_boosting_classifier_from_scratch();
* gradient_boosting_regressor_from_scratch() (see training history plot below): R_squared = 0.753, RMSE = 4.419
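Gradient boosting with squared loss fits each new weak learner to the current residuals, which are the negative gradient of that loss. A minimal sketch with one-feature regression stumps, assuming a learning rate of 0.1 and 50 rounds; the names and toy data are hypothetical, and this is not machlearn's `gradient_boosting_regressor_from_scratch()`.

```python
def fit_stump(x, r):
    """Least-squares regression stump on one feature: (threshold, left_value, right_value).

    Needs at least two distinct x values; otherwise it returns None.
    """
    values = sorted(set(x))
    best, best_sse = None, float("inf")
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if sse < best_sse:
            best, best_sse = (t, lm, rm), sse
    return best

def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Start from the mean of y, then repeatedly fit a stump to the residuals and add a damped copy."""
    f0 = sum(y) / len(y)
    stumps = []
    pred = [f0] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lv, rv = fit_stump(x, residuals)
        stumps.append((t, lv, rv))
        pred = [pi + learning_rate * (lv if xi <= t else rv) for xi, pi in zip(x, pred)]

    def predict(x_new):
        return f0 + sum(learning_rate * (lv if x_new <= t else rv) for t, lv, rv in stumps)
    return predict

# Hypothetical toy data with a step at x = 3.5.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
predict = gradient_boost(x, y)
print(round(predict(2), 2), round(predict(5), 2))  # 1.01 4.99
```

Recording the training error after each round yields a training-history curve of the kind mentioned above.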
module: model_evaluation
function | description
---|---
plot_ROC_and_PR_curves() | plots both the ROC and the precision-recall curves, along with statistics
plot_ROC_curve() | plots the ROC (Receiver Operating Characteristic) curve, along with statistics
plot_PR_curve() | plots the precision-recall curve, along with statistics
plot_confusion_matrix() | plots the confusion matrix, along with key statistics, and returns accuracy
demo_CV() | provides a demo of cross-validation in this module
demo() | provides a demo of the major functions in this module
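An ROC curve traces (FPR, TPR) as the decision threshold moves, and its area equals the probability that a randomly chosen positive is scored above a randomly chosen negative. A stdlib-only sketch of both computations, with hypothetical helper names (not machlearn's API):

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs from thresholding at each distinct score, strictest threshold first."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, y_true) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, y_true) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

def roc_auc_pairwise(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(trapezoid_area(roc_points(y_true, scores)))  # 0.75
print(roc_auc_pairwise(y_true, scores))            # 0.75
```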
module: datasets
function | description
---|---
public_dataset() | returns a public dataset as specified (e.g., iris, SMS_spam, Social_Network_Ads)
module: kNN
function | description
---|---
demo() | provides a demo of selected functions in this module
module: naive_bayes
class/function | description
---|---
naive_bayes_Gaussian() | when X are continuous variables
naive_bayes_multinomial() | when X are independent discrete variables with 3+ levels (e.g., term frequency in the document)
naive_bayes_Bernoulli() | when X are independent binary variables (e.g., whether a word occurs in a document or not)
demo() | provides a demo of selected functions in this module
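The three variants differ mainly in the feature representation and the per-feature likelihood they assume. A minimal sketch of the inputs for the multinomial and Bernoulli variants and of the Gaussian log-likelihood for one continuous feature; the helper names and toy tokens are hypothetical.

```python
import math
from collections import Counter

def count_features(tokens, vocabulary):
    """Term counts per vocabulary word (input for multinomial naive Bayes)."""
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]

def binary_features(tokens, vocabulary):
    """1 if the word occurs at least once, else 0 (input for Bernoulli naive Bayes)."""
    present = set(tokens)
    return [1 if w in present else 0 for w in vocabulary]

def gaussian_log_likelihood(x, mean, var):
    """log N(x; mean, var) for one continuous feature (Gaussian naive Bayes)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

vocabulary = ["claim", "prize", "won"]
tokens = ["won", "prize", "claim", "prize"]
print(count_features(tokens, vocabulary))   # [1, 2, 1]
print(binary_features(tokens, vocabulary))  # [1, 1, 1]
```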
module: SVM
function | description
---|---
demo() | provides a demo of selected functions in this module
module: decision_tree
class/function | description
---|---
decision_tree_regressor_from_scratch() | decision tree regressor developed from scratch
decision_tree_classifier_from_scratch() | decision tree classifier developed from scratch
demo_from_scratch() | provides a demo of selected functions in this module
decision_tree_regressor() | decision tree regressor
decision_tree_classifier() | decision tree classifier
demo() | provides a demo of selected functions in this module
module: neural_network
function | description
---|---
multi_layer_perceptron_classifier() | multi-layer perceptron (MLP) classifier
rnn() | recurrent neural network
demo() | provides a demo of selected functions in this module
module: logistic_regression
function | description
---|---
LogisticReg_sklearn() | solutions using sklearn
LogisticReg_statsmodels() | solutions using statsmodels
demo() | provides a demo of selected functions in this module
module: linear_regression
function | description
---|---
Lasso_regression() | lasso (L1-regularized) regression
Ridge_regression() | ridge (L2-regularized) regression
demo_regularization() | provides a demo of selected functions in this module
Linear_regression_normal_equation() | linear regression solved with the normal equation
Linear_regression() | linear regression
demo() | provides a demo of selected functions in this module
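For a single feature with an intercept, the normal equation theta = (X^T X)^(-1) X^T y reduces to inverting a 2x2 matrix. A minimal sketch (the function name is hypothetical and not the signature of `Linear_regression_normal_equation()`):

```python
def normal_equation_simple(x, y):
    """Solve theta = (X^T X)^(-1) X^T y for X = [1, x], returning (intercept, slope)."""
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    # X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy]; invert the 2x2 matrix explicitly.
    det = n * sxx - sx * sx
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope

print(normal_equation_simple([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```

With many features, solving the linear system (rather than forming an explicit inverse) is the usual, numerically safer choice.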
module: DSA
function | description
---|---
demo() | provides a demo of selected functions in this module
module: imbalanced_data
function | description
---|---
demo() | provides a demo of selected functions in this module
module: decomposition
function | description
---|---
demo() | provides a demo of selected functions in this module
module: gradient_descent
class/function | description
---|---
logistic_regression_BGD_classifier() | logistic regression classifier trained with batch gradient descent (class)
batch_gradient_descent() | batch gradient descent (class)
demo() | provides a demo of selected functions in this module
module: ensemble
class/function | description
---|---
gradient_boosting_regressor_from_scratch() | gradient boosting regressor developed from scratch
adaptive_boosting_classifier_from_scratch() | adaptive boosting classifier developed from scratch
random_forest_classifier_from_scratch() | random forest classifier developed from scratch
bagging_classifier_from_scratch() | bagging classifier developed from scratch
gradient_boosting_classifier() | gradient boosting classifier
adaptive_boosting_classifier() | adaptive boosting classifier
random_forest_classifier() | random forest classifier
bagging_classifier() | bagging classifier
voting_classifier() | voting classifier
demo() | provides a demo of selected functions in this module
Download files
Source Distribution
Built Distribution
File details
Details for the file machlearn-1.2.14.tar.gz.
File metadata
- Download URL: machlearn-1.2.14.tar.gz
- Size: 71.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 206d762490da1da5f5821daae29f8bdb2459e7e06a3ab2908db45440e1363242
MD5 | 734afa9028cbe8c7dc5a2a5eaed5f312
BLAKE2b-256 | f2845b8bb55036fab79a971fcd62f1ee9897e49f003fd6ac2b6c1cf3027ec5bb
File details
Details for the file machlearn-1.2.14-py3-none-any.whl.
File metadata
- Download URL: machlearn-1.2.14-py3-none-any.whl
- Size: 71.5 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6894d5248b99e0805d978e0fabc39def13c892fd38c7d5cb3113cb4d2c9c2710
MD5 | 19eb2ef2f0bfe132790e4f9a98791a76
BLAKE2b-256 | 4297663668f07660cbbb1430513c993da2034dfaa20fff187964adf4f031ea32