Training time estimator for scikit-learn algorithms


scitime

Training time estimation for scikit-learn algorithms. The method is explained in this article.

Currently supporting:

RandomForestRegressor
SVC
KMeans
RandomForestClassifier

Environment setup

Python version: 3.7

Package dependencies:

scikit-learn (~=0.24.1)
pandas (~=1.1.5)
joblib (~=1.0.1)
psutil (~=5.8.0)
scipy (~=1.5.4)

Install scitime

❱ pip install scitime
or 
❱ conda install -c conda-forge scitime

Usage

How to compute a runtime estimation

Example for RandomForestRegressor

from sklearn.ensemble import RandomForestRegressor
import numpy as np
import time

from scitime import Estimator

# example for rf regressor
estimator = Estimator(meta_algo='RF', verbose=3)
rf = RandomForestRegressor()

X, y = np.random.rand(100000, 10), np.random.rand(100000, 1)
# run the estimation
estimation, lower_bound, upper_bound = estimator.time(rf, X, y)

# compare to the actual training time
start_time = time.time()
# ravel() gives y the 1-D shape scikit-learn expects for a single target
rf.fit(X, y.ravel())
elapsed_time = time.time() - start_time
print("elapsed time: {:.2f} s".format(elapsed_time))
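Timing a single fit with time.time() works, but time.perf_counter() is the clock Python recommends for measuring short intervals because it is monotonic and high-resolution. A minimal sketch of a reusable timing helper; `measure_fit_seconds` is a hypothetical name, not part of scitime, and it only assumes the model exposes a scikit-learn style `fit` method:

```python
import time


def measure_fit_seconds(model, X, y=None):
    """Fit `model` once and return the wall-clock training time in seconds.

    Hypothetical helper (not a scitime function). Uses time.perf_counter(),
    a monotonic clock, so system clock adjustments cannot skew the result.
    Pass y=None for unsupervised estimators such as KMeans.
    """
    start = time.perf_counter()
    if y is None:
        model.fit(X)
    else:
        model.fit(X, y)
    return time.perf_counter() - start
```

For the example above, `elapsed_time = measure_fit_seconds(rf, X, y.ravel())` would replace the three timing lines; for KMeans, omit `y`.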

Example for KMeans

from sklearn.cluster import KMeans
import numpy as np
import time

from scitime import Estimator

# example for kmeans clustering
estimator = Estimator(meta_algo='RF', verbose=3)
km = KMeans()

X = np.random.rand(100000, 10)
# run the estimation (y is omitted for unsupervised algorithms)
estimation, lower_bound, upper_bound = estimator.time(km, X)

# compare to the actual training time
start_time = time.time()
km.fit(X)
elapsed_time = time.time() - start_time
print("elapsed time: {:.2f} s".format(elapsed_time))
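To judge an estimate, it helps to look at both the relative error of the point estimate and whether the measured time landed inside the returned interval. A small sketch, assuming `estimation`, `lower_bound`, `upper_bound` and the measured time are all expressed in the same unit; `compare_to_estimate` is a hypothetical helper, not a scitime function:

```python
def compare_to_estimate(estimation, lower_bound, upper_bound, elapsed):
    """Summarise how a predicted training time compares with a measured one.

    Hypothetical helper (not part of scitime). Returns the signed relative
    error of the point estimate (positive means the estimate was too high)
    and whether the measured time falls inside [lower_bound, upper_bound].
    """
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return {
        "relative_error": (estimation - elapsed) / elapsed,
        "within_interval": lower_bound <= elapsed <= upper_bound,
    }
```

After either example, `compare_to_estimate(estimation, lower_bound, upper_bound, elapsed_time)` gives both numbers at once.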

The Estimator class arguments:

meta_algo: The meta-model used to predict the training time, either 'RF' (random forest) or 'NN' (neural network)
verbose: Controls the amount of log output (0, 1, 2 or 3)
confidence: Confidence level of the returned interval (defaults to 95%)
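The confidence argument sets the width of the interval returned alongside the estimate. One common way to get such an interval from a random-forest meta-model is to take percentiles of the individual trees' predictions. The sketch below illustrates that idea in plain Python; it is an assumption for illustration, not necessarily how scitime computes its bounds, `interval_from_ensemble` is a hypothetical name, and confidence is given here as a fraction (0.95 for 95%).

```python
import math
import statistics


def _percentile(sorted_values, q):
    """Linearly interpolated percentile of a sorted list, with q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def interval_from_ensemble(predictions, confidence=0.95):
    """Return (estimate, lower_bound, upper_bound) from per-member predictions.

    Hypothetical sketch: `predictions` could be one runtime prediction per tree
    of a random-forest meta-model. The estimate is their mean; the bounds are
    the central `confidence` percentile range of the predictions.
    """
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    values = sorted(predictions)
    alpha = 1 - confidence
    estimate = statistics.mean(values)
    lower = _percentile(values, alpha / 2)
    upper = _percentile(values, 1 - alpha / 2)
    return estimate, lower, upper
```

A higher confidence pushes the percentiles further into the tails, so the interval widens as confidence approaches 1.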

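Parameters of the estimator.time function:

algo: The scikit-learn estimator whose training time you want to predict
X: np.array of training inputs
y: np.array of training outputs (set to None for unsupervised algorithms such as KMeans)

It returns the estimated training time together with the lower and upper bounds of the confidence interval.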

--- FOR TESTERS / CONTRIBUTORS ---

Local Testing

Inside a virtualenv (with pytest>=3.2.1):

(env)$ python -m pytest

How to use _data.py to generate data / fit models?

$ python _data.py --help

usage: _data.py [-h] [--drop_rate DROP_RATE] [--meta_algo {RF,NN}]
                [--verbose VERBOSE]
                [--algo {RandomForestRegressor,RandomForestClassifier,SVC,KMeans}]
                [--generate_data] [--fit FIT] [--save]

Gather & Persist Data of model training runtimes

optional arguments:
  -h, --help            show this help message and exit
  --drop_rate DROP_RATE
                        drop rate of number of data generated (from all param
                        combinations taken from _config.json). Default is
                        0.999
  --meta_algo {RF,NN}   meta algo used to fit the meta model (NN or RF) -
                        default is RF
  --verbose VERBOSE     verbose mode (0, 1, 2 or 3)
  --algo {RandomForestRegressor,RandomForestClassifier,SVC,KMeans}
                        algo to train data on
  --generate_data       do you want to generate & write data in a dedicated
                        csv?
  --fit FIT             do you want to fit the model? If so indicate the csv
                        name
  --save                (only used for model fit) do you want to save /
                        overwrite the meta model from this fit?

(_data.py uses _model.py behind the scenes)
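The --drop_rate option controls how much of the full parameter grid is actually run. Assuming it is the fraction of combinations that is skipped (as the help text suggests), the default of 0.999 keeps only about one combination in a thousand. The sketch below shows that sampling scheme; `sample_param_grid`, the grid and its parameter names are hypothetical, not taken from _config.json, and scitime's own sampling may differ.

```python
import itertools
import random


def sample_param_grid(grid, drop_rate, seed=None):
    """Enumerate every combination in `grid`, keeping each with probability 1 - drop_rate.

    Hypothetical sketch of a drop-rate sampler (not scitime's code).
    grid: dict mapping a parameter name to the list of values to try.
    Returns a list of dicts, one per kept combination.
    """
    if not 0 <= drop_rate <= 1:
        raise ValueError("drop_rate must be in [0, 1]")
    rng = random.Random(seed)  # seeded for reproducible subsets
    names = list(grid)
    kept = []
    for values in itertools.product(*(grid[name] for name in names)):
        # rng.random() is uniform on [0, 1), so this keeps a combination
        # with probability 1 - drop_rate
        if rng.random() >= drop_rate:
            kept.append(dict(zip(names, values)))
    return kept
```

Dropping combinations at random rather than truncating the grid keeps the sample spread across all parameter values, which matters when the meta-model has to generalise to unseen settings.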

How to run _model.py?

After pulling the master branch (git pull origin master) and setting up the environment (described above), run ipython and enter:

from scitime._model import Model

# example of data generation for rf regressor
trainer = Model(drop_rate=0.99999, verbose=3, algo='RandomForestRegressor')
inputs, outputs, _ = trainer._generate_data()

# then fitting the meta model
meta_algo = trainer.model_fit(generate_data=False, inputs=inputs, outputs=outputs)
# this should not locally overwrite the pickle file located at scitime/models/{your_model}
# if you want to save the model, set the argument save_model to True
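scitime fits an RF or NN meta-model on the generated (inputs, outputs) pairs. To illustrate the underlying idea without scitime, the sketch below swaps in a much simpler technique: a least-squares fit of runtime ~ a * num_rows**b in log-log space, using the number of rows as the only feature. It is not what model_fit does; the function names, the single feature and the power-law form are all assumptions.

```python
import math


def fit_power_law(num_rows, runtimes):
    """Least-squares fit of runtime ~ a * num_rows**b in log-log space.

    Hypothetical stand-in for a meta-model (scitime itself uses RF or NN).
    Returns (a, b).
    """
    if len(num_rows) != len(runtimes) or len(num_rows) < 2:
        raise ValueError("need at least two (num_rows, runtime) pairs")
    xs = [math.log(n) for n in num_rows]
    ys = [math.log(t) for t in runtimes]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("num_rows must contain at least two distinct values")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx  # slope in log-log space = growth exponent
    a = math.exp(y_mean - b * x_mean)  # intercept mapped back to linear scale
    return a, b


def predict_runtime(a, b, num_rows):
    """Predicted runtime for a dataset with `num_rows` rows."""
    return a * num_rows ** b
```

A single power law captures only how runtime grows with data size; a tree-based or neural meta-model can additionally pick up interactions with algorithm parameters and hardware.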

Release history

0.1.1: Mar 27, 2021 (this version)
0.1.0: Mar 1, 2021
0.0.2: May 6, 2019
0.0.1: Feb 7, 2019

Download files


Source Distribution

scitime-0.1.0.tar.gz (15.5 kB)

Uploaded Mar 1, 2021 Source

File details

Details for the file scitime-0.1.0.tar.gz.

File metadata

Download URL: scitime-0.1.0.tar.gz
Upload date: Mar 1, 2021
Size: 15.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.3.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.19.1 CPython/3.7.3

File hashes

Hashes for scitime-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`e846f360cd185902cc7b81a4f88921ec522d8c891bf04df316d7878aa4e9e504`
MD5	`af9719137954190d4811b2c2f1d8139e`
BLAKE2b-256	`30f4c3639829316ab3819453dac4ae5dcc8770d544c35496c738415bdc7420e4`

