a machine learning tool that allows you to train, test and use models without writing code
igel
A machine learning tool that allows you to train/fit, test and use models without writing code
Free software: MIT license
Documentation: https://igel.readthedocs.io.
Motivation & Goal
The goal of the project is to provide machine learning for everyone, both technical and non-technical users.
I sometimes needed a tool that I could use to quickly create a machine learning prototype, whether to build a proof of concept or a fast draft model to prove a point. I often found myself stuck writing boilerplate code and/or thinking too much about how to start.
Therefore, I decided to create igel. Hopefully, it will make it easier for technical and non-technical users to build machine learning models.
Intro
igel is built on top of scikit-learn. It provides a simple way to use machine learning without writing a single line of code.
All you need is a yaml file in which you describe what you are trying to do. That’s it!
Installation
The easiest way is to install igel using pip:
$ pip install igel
Check the docs for other ways to install igel, including from source.
Overview
The main goal of igel is to provide you with a way to train/fit, evaluate and use models without writing code. Instead, you describe what you want to do in a simple yaml file.
You provide the configuration in the yaml file as key-value pairs. Here is an overview of all supported configurations (for now):
# dataset operations
dataset:
    split:  # split options
        test_size: 0.2  # 0.2 means 20% for the test data, so 80% are automatically for training
        shuffle: True   # whether to shuffle the data before/while splitting
        stratify: None  # If not None, data is split in a stratified fashion, using this as the class labels.
    preprocess:  # preprocessing options
        missing_values: mean    # other possible values: [drop, median, most_frequent, constant] check the docs for more
        encoding:
            type: oneHotEncoding  # other possible values: [labelEncoding]
        scale:  # scaling options
            method: standard    # standardization will scale values to have a 0 mean and 1 standard deviation | you can also try minmax
            target: inputs  # scale inputs. | other possible values: [outputs, all] # if you choose all then all values in the dataset will be scaled
# model definition
model:
    type: classification    # type of the problem you want to solve. | possible values: [regression, classification]
    algorithm: random forest    # which algorithm you want to use. | type igel algorithms in the Terminal to know more
# target you want to predict
target:
    - put the target you want to predict here
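To make the dataset options above more concrete, here is a minimal sketch in plain Python of what `missing_values: mean`, `method: standard` and `test_size: 0.2` with `shuffle: True` mean conceptually. This is not igel's implementation (igel is built on scikit-learn and relies on it for the real work); the function names `impute_mean`, `standardize` and `split_rows`, and the `seed` parameter, are hypothetical and exist only for this illustration.

```python
import math
import random
import statistics


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [value for value in column if value is not None]
    mean = statistics.fmean(observed)
    return [mean if value is None else value for value in column]


def standardize(column):
    """Scale values to zero mean and unit (population) standard deviation.

    Assumes the column is not constant; a constant column would divide by zero.
    """
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(value - mean) / std for value in column]


def split_rows(rows, test_size=0.2, shuffle=True, seed=0):
    """Split rows into (train, test); the test part gets ceil(test_size * n) rows."""
    rows = list(rows)
    if shuffle:
        # A fixed seed (an assumption of this sketch) keeps the split reproducible.
        random.Random(seed).shuffle(rows)
    n_test = math.ceil(len(rows) * test_size)
    return rows[n_test:], rows[:n_test]


column = impute_mean([1.0, None, 3.0])   # the gap is filled with the mean of 1.0 and 3.0
scaled = standardize(column)             # now centered on 0 with unit spread
train, test = split_rows(range(10))      # 8 rows for training, 2 for testing
```

With igel, none of this has to be written by hand: the yaml keys select the behavior.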
Quick Start
The first step is to provide a yaml file:
# model definition
model:
    # in the type field, you can write the type of problem you want to solve. Whether regression or classification
    # Then, provide the algorithm you want to use on the data. Here I'm using the random forest algorithm
    type: classification
    algorithm: random forest
# target you want to predict
# Here, as an example, I'm using the famous indians-diabetes dataset, where I want to predict whether someone has diabetes or not.
# Depending on your data, you need to provide the target(s) you want to predict here
target:
    - sick
In the example above, I’m using random forest to classify whether someone has diabetes or not, depending on some features in the dataset (I used the indian-diabetes dataset).
Run this command in the terminal, providing the path to your dataset and the path to the yaml file:
$ igel fit --data_path 'path_to_your_csv_dataset.csv' --yaml_file 'path_to_your_yaml_file.yaml'
# or shorter
$ igel fit -dp 'path_to_your_csv_dataset.csv' -yml 'path_to_your_yaml_file.yaml'
You can run this command to get instructions on how to use igel:
$ igel --help
# or just
$ igel -h
That’s it. Your “trained” model can now be found in the model_results folder (automatically created for you in your current working directory). Furthermore, a description can be found in the description.json file inside the model_results folder.
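If you want to look at the description from a script, a minimal sketch could look like the following. The helper name `load_description` is hypothetical, and the sketch makes no assumption about which keys igel writes into description.json; it only parses the file and hands back whatever is there.

```python
import json
from pathlib import Path


def load_description(results_dir="model_results"):
    """Parse description.json from the results folder and return its content."""
    path = Path(results_dir) / "description.json"
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

After a successful `igel fit` run in the current directory, `print(json.dumps(load_description(), indent=2))` would pretty-print the stored description.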
E2E Example
A complete end-to-end solution is provided in this section to demonstrate the capabilities of igel. As explained previously, you need to create a yaml configuration file. Here is an end-to-end example for predicting whether someone has diabetes or not using the decision tree algorithm. The dataset can be found in the examples folder.
Fit/Train a model:
model:
    type: classification
    algorithm: decision tree
target:
    - sick
$ igel fit -dp path_to_the_dataset -yml path_to_the_yaml_file
That’s it, igel will now fit the model for you and save it in a model_results folder in your current directory.
Evaluate the model:
Evaluate the pre-fitted model. Igel will load the pre-fitted model from the model_results directory and evaluate it for you. You just need to run the evaluate command and provide the path to your evaluation data.
$ igel evaluate -dp path_to_the_evaluation_dataset
That’s it! Igel will evaluate the model and store statistics/results in an evaluation.json file inside the model_results folder.
Predict:
Use the pre-fitted model to predict on new data. This is done automatically by igel; you just need to provide the path to the data you want predictions for.
$ igel predict -dp path_to_the_new_dataset
That’s it! Igel will use the pre-fitted model to make predictions and save them in a predictions.csv file inside the model_results folder.
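The three commands above can also be chained from a script. The following is a minimal sketch, assuming igel is installed and available on your PATH. It uses only the subcommands and short flags shown in this README (`fit`, `evaluate`, `predict`, `-dp`, `-yml`); the helper names `build_igel_command` and `run_igel` are hypothetical.

```python
import subprocess


def build_igel_command(action, data_path, yaml_file=None):
    """Return the argument list for one igel call."""
    if action not in ("fit", "evaluate", "predict"):
        raise ValueError(f"unsupported action: {action!r}")
    if action == "fit" and yaml_file is None:
        raise ValueError("fit needs the path to a yaml file")
    command = ["igel", action, "-dp", str(data_path)]
    if yaml_file is not None:
        command += ["-yml", str(yaml_file)]
    return command


def run_igel(action, data_path, yaml_file=None):
    """Run igel and raise CalledProcessError if it exits with a non-zero status."""
    return subprocess.run(build_igel_command(action, data_path, yaml_file), check=True)
```

A full run would then be `run_igel("fit", train_csv, config_yaml)`, followed by `run_igel("evaluate", eval_csv)` and `run_igel("predict", new_csv)`, where the three CSV paths and the yaml path are placeholders for your own files.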
Advanced Usage
You can also carry out preprocessing or other operations by describing them in the yaml file. Here is an example where the data is split into 80% for training and 20% for validation/testing. The data is also shuffled while splitting.
Furthermore, the data is preprocessed by replacing missing values with the mean (you can also use median, mode, etc.). Check the docs for more information.
# dataset operations
dataset:
    split:
        test_size: 0.2
        shuffle: True
        stratify: None
    preprocess:
        missing_values: mean
# model definition
model:
    type: classification
    algorithm: random forest
# target you want to predict
target:
    - sick
Then, you can fit the model by running the igel command as shown in the other examples:
$ igel fit -dp path_to_the_dataset -yml path_to_the_yaml_file
For evaluation
$ igel evaluate -dp path_to_the_evaluation_dataset
For production
$ igel predict -dp path_to_the_new_dataset
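Since a typo in the yaml file is easy to make, it can help to check a configuration before running igel. Below is a minimal sketch that validates a configuration, represented as a nested Python dict, against the option values listed in the overview above. The allowed values are copied from the comments in that overview only, so igel itself may accept more; the names `ALLOWED`, `lookup` and `check_config` are hypothetical and not part of igel.

```python
# Option values taken from the comments in the configuration overview.
ALLOWED = {
    ("dataset", "preprocess", "missing_values"): {"mean", "drop", "median", "most_frequent", "constant"},
    ("dataset", "preprocess", "encoding", "type"): {"oneHotEncoding", "labelEncoding"},
    ("dataset", "preprocess", "scale", "method"): {"standard", "minmax"},
    ("dataset", "preprocess", "scale", "target"): {"inputs", "outputs", "all"},
    ("model", "type"): {"regression", "classification"},
}


def lookup(config, path):
    """Follow a key path through nested dicts; return None if any key is missing."""
    node = config
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def check_config(config):
    """Return a list of problems; an empty list means no listed option was violated."""
    problems = []
    for path, allowed in ALLOWED.items():
        value = lookup(config, path)
        if value is not None and value not in allowed:
            problems.append(f"{'.'.join(path)}: {value!r} is not one of {sorted(allowed)}")
    return problems


# The advanced configuration from this section, written as a dict.
advanced_config = {
    "dataset": {
        "split": {"test_size": 0.2, "shuffle": True, "stratify": None},
        "preprocess": {"missing_values": "mean"},
    },
    "model": {"type": "classification", "algorithm": "random forest"},
    "target": ["sick"],
}
print(check_config(advanced_config))
```

Options that are absent from the configuration are skipped rather than reported, which matches the examples here, where most keys are optional.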
Examples
Check the examples folder, where you will find the indian-diabetes data and a yaml file example. You can clone the repo or just download the dataset then try running the example afterwards.
Contributions
Contributions are always welcome. Make sure you read the guidelines first.
History
0.1.5 (2020-09-10)
implemented encoding and scaling methods
0.1.4 (2020-09-08)
support for all sklearn models
0.1.3 (2020-09-07)
implemented basic dataset operations
0.0.1 (2020-09-05)
stable release with an end to end pipeline
0.0.6 (2020-09-01)
Added validation on arguments and provided an example
0.0.5 (2020-08-31)
Added logging and changed file keyword to yaml_file
0.0.3 (2020-08-30)
First functional package
0.0.1 (2020-08-27)
First release on PyPI.