
Machine learning package for fast model generation and comparison


modelcreator - AutoML package

This package contains a Machine which is meant to do the learning for you. It can automatically create a fitting predictive model for the given data.

Sample output
Testing:  Gradient Boosting Classifier
[########################################] | 100% Completed |  3.9s
Score: 0.9667

Testing:  Ada Boost Classifier
[########################################] | 100% Completed |  1.3s
Score: 0.9600

Testing:  Random Forest Classifier
[########################################] | 100% Completed |  5.0s
Score: 0.9600

Testing:  Balanced Random Forest Classifier
[########################################] | 100% Completed |  3.5s
Score: 0.9600

Testing:  SVC
[########################################] | 100% Completed |  1.2s
Score: 0.9667

Chosen model:  Gradient Boosting Classifier 0.9667

Params:
        min_samples_split: 2
        n_estimators: 100

Results saved to  output.csv

Table of Contents

  1. Installation
  2. Usage
  3. Saving the model
  4. Parameters
  5. Development

Installation

To use the package run:

pip install modelcreator

Usage

The input may be either a path to a csv file or a pandas DataFrame object.

CSV path input

The library assumes that the last column of the training dataset contains the expected results. Both the training dataset and the dataset to predict on must be provided as csv files.

If the results column contains text, the Machine will do its best to learn to classify the data correctly. If it contains numbers, regression will be performed instead.
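The rule above can be illustrated with a minimal sketch. The helper name infer_task is hypothetical and this is not the package's own implementation; it only assumes that a column counts as numeric when every value parses as a float.

```python
def infer_task(labels):
    """Return 'regression' if every label parses as a number, else 'classification'.

    Hypothetical helper for illustration; not part of modelcreator.
    """
    for value in labels:
        try:
            float(value)
        except (TypeError, ValueError):
            # At least one label is text, so treat the column as classes
            return "classification"
    return "regression"


print(infer_task(["setosa", "versicolor", "virginica"]))
print(infer_task(["3.5", "4.1", "2.0"]))
```

Under this assumption a column holding only 0 and 1 would be treated as numeric, which is why numeric class labels need to be converted to text first.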

If the file contains a header row, pass header_in_csv=True to the method.
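To make the last-column convention and the header flag concrete, here is a small sketch using only the standard library. The function split_features_and_labels is a hypothetical name, and it reads csv text rather than a file path so that the example is self-contained; the package itself may read files differently.

```python
import csv
import io


def split_features_and_labels(csv_text, header_in_csv=False):
    """Split csv text into (header, features, labels), labels being the last column.

    Hypothetical helper for illustration; not part of modelcreator.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    # Only drop the first row when the caller says it is a header
    header = rows.pop(0) if header_in_csv else None
    features = [row[:-1] for row in rows]
    labels = [row[-1] for row in rows]
    return header, features, labels


sample = "sepal_length,sepal_width,species\n5.1,3.5,setosa\n6.2,2.9,versicolor\n"
header, X, y = split_features_and_labels(sample, header_in_csv=True)
print(header)
print(X)
print(y)
```

Without header_in_csv=True the first row would be treated as data, so the column names would end up among the features and labels.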

Example 1 Iris
from modelcreator import Machine

# Create automl machine instance
machine = Machine()

# Train machine learning model
machine.learn('example-data/iris.csv')

# Predict the outcomes
machine.predict('example-data/iris-pred.csv', 'output.csv')

This example is also available in the example.py file. Consider trying it on your own.

Pandas input

What if the results column is not the last one in the csv? Rewriting the whole file just to swap columns is inconvenient. For this case the Machine provides the learnFromDf and predictFromDf methods, where Df stands for the pandas DataFrame. With these methods you read and prepare the file yourself.

Example 2 Titanic
from modelcreator import Machine
import pandas as pd

# Create DataFrame object from file
train = pd.read_csv("train.csv")

# Get features columns from DataFrame
X_train = train.drop(['Survived'], axis=1)

# And labels (results) column
y_train = train["Survived"].astype(str)

# Create the instance of Machine
machine = Machine()

# Train machine learning model
machine.learnFromDf(X_train, y_train, computation_level='advanced')

# Show parameters of the model
machine.showParams()

# Load test set from file
X_test = pd.read_csv("test.csv")

# Predict the labels
results = machine.predictFromDf(X_test)

# Save results to a new file
results.to_csv("results.csv")

Simple? That's right! Note that we used astype(str) so that the labels are treated as classes, not numbers. The Titanic dataset used in the example above stores 0 and 1 in the "Survived" column to indicate whether a person survived the disaster, so without the conversion the Machine would perform regression.

Saving the model

If you do not want to re-train on the whole dataset every time you need a prediction, you can save the state of the Machine to a file and restore it later.

# Save Machine with a trained model to "machine.pkl"
machine.saveMachine('machine.pkl')

# Create a new machine based on a schema file
machine2 = Machine('machine.pkl')
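The save and restore cycle can be pictured with the sketch below. It assumes the saved state is a pickle file, which the .pkl extension suggests but this document does not confirm. TinyMachine is a hypothetical stand-in, not the package's Machine class.

```python
import os
import pickle
import tempfile


class TinyMachine:
    """Hypothetical stand-in for a trained Machine; not the package's class."""

    def __init__(self, schema=None):
        self.model_name = None
        self.params = {}
        if schema is not None:
            # Recreate the state from a previously saved file
            with open(schema, "rb") as fh:
                state = pickle.load(fh)
            self.model_name = state["model_name"]
            self.params = state["params"]

    def save(self, output_file_name="machine.pkl"):
        # Store only plain data so loading does not depend on the class definition
        state = {"model_name": self.model_name, "params": self.params}
        with open(output_file_name, "wb") as fh:
            pickle.dump(state, fh)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "machine.pkl")
    trained = TinyMachine()
    trained.model_name = "Gradient Boosting Classifier"
    trained.params = {"min_samples_split": 2, "n_estimators": 100}
    trained.save(path)
    restored = TinyMachine(path)

print(restored.model_name, restored.params)
```

If the file really is a pickle, load it only from sources you trust, because unpickling can execute arbitrary code.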

Parameters

The Machine can be customized according to the use case. The available parameters are listed below for each method:

Machine
| Param | Type | Default | Description |
| --- | --- | --- | --- |
| schema | None or str | None | Path to a saved, pre-trained Machine instance. When given, the Machine is recreated from that file. |
learn
| Param | Type | Default | Description |
| --- | --- | --- | --- |
| dataset_file | str | | Path to a csv file which contains the training dataset. |
| header_in_csv | bool | False | Whether the csv file contains headers in the first row. |
| metrics | None, str or Callable | 'accuracy' or 'neg_root_mean_squared_error' | Metric used for scoring estimators. Many popular scoring functions are supported (such as f1, roc_auc, neg_mean_gamma_deviance). A custom scoring function can be passed as a callable. |
| verbose | bool | True | Whether to print learning logs. |
| cv | int | 3 | Number of cross-validation subsets. Higher values may increase computation time. |
| computation_level | str | 'medium' | One of 'basic', 'medium' or 'advanced'. Higher computation levels test more models and parameters. |
learnFromDf
| Param | Type | Default | Description |
| --- | --- | --- | --- |
| X | pandas.DataFrame | | DataFrame containing the feature columns. |
| y | pandas.Series | | Label column of the training data. |
| metrics | None, str or Callable | 'accuracy' or 'neg_root_mean_squared_error' | Metric used for scoring estimators. Many popular scoring functions are supported (such as f1, roc_auc, neg_mean_gamma_deviance). A custom scoring function can be passed as a callable. |
| verbose | bool | True | Whether to print learning logs. |
| cv | int | 3 | Number of cross-validation subsets. Higher values may increase computation time. |
| computation_level | str | 'medium' | One of 'basic', 'medium' or 'advanced'. Higher computation levels test more models and parameters. |
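The metrics parameter accepts a callable as well as a name. The sketch below assumes the callable follows the scikit-learn scorer convention, taking (estimator, X, y) and returning a float where higher is better; whether modelcreator forwards the callable unchanged is an assumption. Both balanced_accuracy_scorer and the AlwaysSurvived stub are hypothetical names used only for illustration.

```python
def balanced_accuracy_scorer(estimator, X, y):
    """Mean per-class recall; higher is better.

    Hypothetical custom scorer using the (estimator, X, y) convention.
    """
    predictions = estimator.predict(X)
    hits, totals = {}, {}
    for truth, guess in zip(y, predictions):
        totals[truth] = totals.get(truth, 0) + 1
        hits[truth] = hits.get(truth, 0) + (truth == guess)
    # Average the recall of each class so rare classes count equally
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


class AlwaysSurvived:
    """Stub estimator used only to exercise the scorer."""

    def predict(self, X):
        return ["1"] * len(X)


X = [[22], [38], [26], [35]]
y = ["0", "1", "1", "1"]
score = balanced_accuracy_scorer(AlwaysSurvived(), X, y)
print(score)
```

Plain accuracy would reward this stub with 0.75 on the same data, which shows why a custom metric can matter for imbalanced labels.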
predict
| Param | Type | Default | Description |
| --- | --- | --- | --- |
| features_file | str | | Path to the csv file with the features to generate predictions for. |
| header_in_csv | bool | False | Whether the csv file contains headers in the first row. |
| output_file | str | 'output.csv' | Path to the output csv file where the predictions will be saved. |
| verbose | bool | True | Whether to print logs. |
predictFromDf
| Param | Type | Default | Description |
| --- | --- | --- | --- |
| X_predictions | pandas.DataFrame | | Feature columns to generate predictions for. |
| output_file | str | None | Optional path to an output csv file. The method always returns a pandas.Series with the results; if this value is not None, the results are also saved to that file. |
| verbose | bool | True | Whether to print logs. |
saveMachine
| Param | Type | Default | Description |
| --- | --- | --- | --- |
| output_file_name | str | 'machine.pkl' | Path where the Machine instance will be saved. |
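To see why a higher cv value increases computation time, consider how the data is split. The sketch below is a plain, unshuffled k-fold split written for illustration; k_fold_indices is a hypothetical helper, and the package's actual splitting strategy (for example shuffling or stratification) is not documented here.

```python
def k_fold_indices(n_samples, cv=3):
    """Return a list of (train_indices, test_indices) pairs, one per fold.

    Hypothetical helper for illustration; not part of modelcreator.
    """
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // cv + (1 if i < n_samples % cv else 0) for i in range(cv)]
    folds, start = [], 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        folds.append((train_idx, test_idx))
        start += size
    return folds


for train_idx, test_idx in k_fold_indices(7, cv=3):
    print(train_idx, test_idx)
```

Every candidate model is fitted once per fold, so the training work grows roughly in proportion to cv.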

Development

Have a feature idea or just want to help? Take a look at the issues tab!
