Soldai utilities for machine learning and text processing
sutil
This repository contains a set of tools for machine learning and natural language processing tasks, including classes for quick experimentation with different classification models.
Dataset
This class loads CSV-style datasets in which all the features are comma separated and the class label is in the last column. It includes functions to normalize the features, add a bias term, save the data to a file and load it back. It also includes functions to split the data into train, validation and test datasets.
from sutil.base.Dataset import Dataset

datafile = './sutil/datasets/ex2data1.txt'
d = Dataset.fromDataFile(datafile, ',')
print(d.size)
sample = d.sample(0.3)
print(sample.size)
sample.save("modelo_01")
train, validation, test = d.split(train = 0.8, validation = 0.2)
print(train.size)
print(validation.size)
print(test.size)
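To make the operations described above concrete, here is a minimal standard-library sketch of feature normalization, adding a bias term, and a three-way split. It is not the sutil implementation: the function names `normalize_columns`, `add_bias` and `split_rows`, the `seed` argument, and the toy feature rows are all assumptions made up for illustration.

```python
import random


def normalize_columns(rows):
    """Scale each feature column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = []
    for col, mean in zip(columns, means):
        variance = sum((value - mean) ** 2 for value in col) / len(col)
        stds.append(variance ** 0.5 or 1.0)  # fall back to 1.0 for constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def add_bias(rows):
    """Prepend a constant 1.0 to every feature row."""
    return [[1.0] + list(row) for row in rows]


def split_rows(rows, train=0.8, validation=0.1, seed=0):
    """Shuffle the rows and cut them into train, validation and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_validation = int(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_validation],
            shuffled[n_train + n_validation:])


# Made-up toy rows with two features each (not taken from ex2data1.txt).
features = [[34.6, 78.0], [30.3, 43.9], [35.8, 72.9], [60.2, 86.3], [79.0, 75.3],
            [45.1, 56.3], [61.1, 96.5], [75.0, 46.6], [76.1, 87.4], [84.4, 43.5]]
prepared = add_bias(normalize_columns(features))
train_part, validation_part, test_part = split_rows(prepared)
print(len(train_part), len(validation_part), len(test_part))
```

Whatever remains after the train and validation fractions becomes the test part, so the three pieces always cover every row exactly once.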
Regularized Logistic Regression
You can also include your own models. One example is the Regularized Logistic Regression, implemented manually with numpy and included in the sutil.models package.
import numpy as np
from sutil.base.Dataset import Dataset
from sutil.models.RegularizedLogisticRegression import RegularizedLogisticRegression

datafile = './sutil/datasets/ex2data1.txt'
d = Dataset.fromDataFile(datafile, ',')
d.xlabel = 'Exam 1 score'
d.ylabel = 'Exam 2 score'
d.legend = ['Admitted', 'Not admitted']
iterations = 400
print('Plotting data with + indicating (y = 1) examples and o indicating (y = 0) examples.\n')
d.plotData()
theta = np.zeros((d.n + 1, 1))
lr = RegularizedLogisticRegression(theta, 0.03, 0, train=1)
lr.trainModel(d)
lr.score(d.X, d.y)
lr.roc.plot()
lr.roc.zoom((0, 0.4),(0.5, 1.0))
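For readers unfamiliar with the technique, the following is a rough pure-Python sketch of L2-regularized logistic regression trained by batch gradient descent. It is not the code in sutil.models: the function names, the parameter names `alpha` (learning rate) and `lam` (regularization strength), and the toy data are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(theta, row):
    """Probability of class 1 for one row that already starts with the bias 1.0."""
    return sigmoid(sum(t * x for t, x in zip(theta, row)))


def train_logistic(X, y, alpha=0.1, lam=0.0, iterations=400):
    """Batch gradient descent on the L2-regularized logistic loss."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        # Predictions are computed once per iteration, so all weights
        # are updated from the same snapshot of theta.
        predictions = [predict_probability(theta, row) for row in X]
        for j in range(n):
            gradient = sum((p - label) * row[j]
                           for p, label, row in zip(predictions, y, X)) / m
            if j > 0:  # the bias weight is conventionally left unregularized
                gradient += (lam / m) * theta[j]
            theta[j] -= alpha * gradient
    return theta


# Made-up one-feature toy problem; the first column is the bias term.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
plain = train_logistic(X, y)
shrunk = train_logistic(X, y, lam=1.0)
print(plain, shrunk)
```

With a positive `lam` the feature weight ends up smaller than in the unregularized run, which is the shrinkage effect regularization is meant to provide.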
Sklearn model
You can also embed sklearn models in a wrapper class in order to run experiments with the different models implemented in sklearn. In the same style, you can create tensorflow, keras or pytorch models by inheriting from the sutil.models.Model class and implementing the trainModel and predict methods.
import numpy as np
from sutil.base.Dataset import Dataset
from sutil.models.SklearnModel import SklearnModel
from sklearn.linear_model import LogisticRegression

datafile = './sutil/datasets/ex2data1.txt'
d = Dataset.fromDataFile(datafile, ',')
ms = LogisticRegression()
m = SklearnModel('Sklearn Logistic', ms)
m.trainModel(d)
m.score(d.X, d.y)
m.roc.plot()
m.roc.zoom((0, 0.4),(0.5, 1.0))
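The subclassing pattern mentioned above might look roughly like the sketch below. Because the real base class is not shown here, the `Model` class in this snippet is a stand-in that assumes only the two documented methods, `trainModel` and `predict`; `MajorityClassModel` and the toy dataset object are hypothetical and exist purely for illustration.

```python
from collections import Counter
from types import SimpleNamespace


class Model:
    """Stand-in for the library base class; only the two documented methods are assumed."""

    def trainModel(self, data):
        raise NotImplementedError

    def predict(self, X):
        raise NotImplementedError


class MajorityClassModel(Model):
    """Hypothetical baseline that always predicts the most frequent training label."""

    def __init__(self):
        self.majority = None

    def trainModel(self, data):
        # The dataset is assumed to expose labels as `y`, as in the examples above.
        self.majority = Counter(data.y).most_common(1)[0][0]

    def predict(self, X):
        return [self.majority for _ in X]


# Tiny stand-in for a dataset object with X and y attributes.
toy = SimpleNamespace(X=[[0.1], [0.4], [0.8]], y=[1, 0, 1])
baseline = MajorityClassModel()
baseline.trainModel(toy)
print(baseline.predict([[0.5], [0.9]]))  # prints [1, 1]
```

A wrapper around a tensorflow, keras or pytorch network would follow the same shape, with the framework's fit and inference calls placed inside the two methods.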
Neural Network Classifier
This class lets you perform classification using a neural network (a multilayer perceptron classifier). It wraps the sklearn MLPClassifier and implements a method to search over different activations, solvers and hidden layer structures. You can pass your own arguments to initialize the network as you want.
from sutil.base.Dataset import Dataset
from sutil.neuralnet.NeuralNetworkClassifier import NeuralNetworkClassifier

datafile = './sutil/datasets/ex2data1.txt'
d = Dataset.fromDataFile(datafile, ',')
d.normalizeFeatures()
sample = d.sample(examples = 30)
nn = NeuralNetworkClassifier((d.n, len(d.labels)))
nn.searchParameters(sample)
nn.trainModel(d)
nn.score(d.X, d.y)
nn.roc.plot()
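A search over activations, solvers and hidden layer structures can be pictured as an exhaustive grid search, sketched below with the standard library only. This is an assumption about the general idea, not a description of how searchParameters works internally. The activation and solver names are the values MLPClassifier accepts; the hidden layer sizes, the `search_parameters` function and the `toy_evaluate` scoring function are made up for illustration.

```python
from itertools import product

ACTIVATIONS = ("identity", "logistic", "tanh", "relu")
SOLVERS = ("lbfgs", "sgd", "adam")
HIDDEN_LAYERS = ((5,), (10,), (5, 5))  # illustrative sizes, not the library's defaults


def search_parameters(evaluate):
    """Try every combination and return the best (score, params) pair."""
    best = None
    for activation, solver, hidden in product(ACTIVATIONS, SOLVERS, HIDDEN_LAYERS):
        params = {"activation": activation,
                  "solver": solver,
                  "hidden_layer_sizes": hidden}
        score = evaluate(params)
        # Strict comparison keeps the first combination among ties.
        if best is None or score > best[0]:
            best = (score, params)
    return best


def toy_evaluate(params):
    """Placeholder score; a real one would fit a network and measure accuracy."""
    return len(params["hidden_layer_sizes"]) + (params["activation"] == "relu")


best_score, best_params = search_parameters(toy_evaluate)
print(best_score, best_params)
```

In practice the `evaluate` callback would train `MLPClassifier(**params)` on the sample and return a validation score, which is why searching on a small sample first keeps the cost manageable.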
Experiment
The Experiment class lets you split the data and test different models against it, so that their performance can be compared automatically.
import numpy as np
from sutil.base.Dataset import Dataset
from sklearn.linear_model import LogisticRegression
from sutil.base.Experiment import Experiment
from sutil.models.SklearnModel import SklearnModel
from sutil.models.RegularizedLogisticRegression import RegularizedLogisticRegression
from sutil.neuralnet.NeuralNetworkClassifier import NeuralNetworkClassifier

# Load the data
datafile = './sutil/datasets/ex2data1.txt'
d = Dataset.fromDataFile(datafile, ',')
d.normalizeFeatures()
print("Size of the dataset... ")
print(d.size)
sample = d.sample(0.3)
print("Size of the sample... ")
print(sample.size)

# Create the models
theta = np.zeros((d.n + 1, 1))
lr = RegularizedLogisticRegression(theta, 0.03, 0)
m = SklearnModel('Sklearn Logistic', LogisticRegression())

# Look for the best parameters using a sample
nn = NeuralNetworkClassifier((d.n, len(d.labels)))
nn.searchParameters(sample)
input("Press enter to continue...")

# Create the experiment
experiment = Experiment(d, None, 0.8, 0.2)
experiment.addModel(lr, name = 'Sutil Logistic Regression')
experiment.addModel(m, name = 'Sklearn Logistic Regression')
experiment.addModel(nn, name = 'Sutil Neural Network')

# Run the experiment
experiment.run(plot = True)
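The idea behind such an experiment, one shared split followed by a score for every registered model, can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the Experiment class itself: `run_experiment`, the two toy fitting functions and the toy data are all hypothetical.

```python
import random


def run_experiment(rows, labels, models, train=0.8, seed=0):
    """Split once, fit every model on the same training part, report test accuracy."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * train)
    train_idx, test_idx = indices[:cut], indices[cut:]
    train_rows = [rows[i] for i in train_idx]
    train_labels = [labels[i] for i in train_idx]
    results = {}
    for name, fit in models.items():
        predict = fit(train_rows, train_labels)
        hits = sum(predict(rows[i]) == labels[i] for i in test_idx)
        results[name] = hits / len(test_idx)
    return results


def fit_always_one(X, y):
    """Trivial baseline: ignores the data and always predicts class 1."""
    return lambda row: 1


def fit_threshold(X, y):
    """Threshold halfway between the class means of the first feature.

    Assumes both classes appear in the training part.
    """
    mean0 = sum(r[0] for r, label in zip(X, y) if label == 0) / y.count(0)
    mean1 = sum(r[0] for r, label in zip(X, y) if label == 1) / y.count(1)
    cut = (mean0 + mean1) / 2
    return lambda row: int(row[0] > cut)


# Made-up, well-separated one-feature data.
rows = [[0], [1], [2], [3], [4], [10], [11], [12], [13], [14]]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
results = run_experiment(rows, labels,
                         {"always one": fit_always_one, "threshold": fit_threshold})
print(results)
```

Using the same split for every model is what makes the resulting scores comparable.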
Hashes for soldai-utils-maxsob86-0.0.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | ec2aa5e52cb71dc46396e68ded6f3f31be9f216739f193376ca4f4a2a2f18d40
MD5 | 468f9b6bee4ea69737e8aa16babcd22e
BLAKE2-256 | ee4e671736dcf933320ffef3d939f36078aaecf8c28e5007ceca41266b94df4d
Hashes for soldai_utils_maxsob86-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | ccc5b5ad7925503acf37f5f12d0942cad84e5fda2c61606f5435785827d65674
MD5 | b8598705d096ceba671f3281aeaa3b8a
BLAKE2-256 | c2cc098d040a5369c4814de91f9b9094c81cb3cce591e6e6dc1efc6a93ea1c7e