Training time estimator for scikit-learn algorithms
scitime
Training time estimation for scikit-learn algorithms. Method explained in this article
Currently supporting:
- RandomForestRegressor
- SVC
- KMeans
- RandomForestClassifier
Environment setup
Python version: 3.7
Package dependencies:
- scikit-learn (~=0.24.1)
- pandas (~=1.1.5)
- joblib (~=1.0.1)
- psutil (~=5.8.0)
- scipy (~=1.5.4)
Install scitime
❱ pip install scitime
or
❱ conda install -c conda-forge scitime
Usage
How to compute a runtime estimation
- Example for RandomForestRegressor
from sklearn.ensemble import RandomForestRegressor
import numpy as np
import time
from scitime import RuntimeEstimator
# example for rf regressor
estimator = RuntimeEstimator(meta_algo='RF', verbose=3)
rf = RandomForestRegressor()
X,y = np.random.rand(100000,10),np.random.rand(100000,1)
# run the estimation
estimation, lower_bound, upper_bound = estimator.time(rf, X, y)
# compare to the actual training time
start_time = time.time()
rf.fit(X,y)
elapsed_time = time.time() - start_time
print("elapsed time: {:.2f}".format(elapsed_time))
- Example for KMeans
from sklearn.cluster import KMeans
import numpy as np
import time
from scitime import RuntimeEstimator
# example for kmeans clustering
estimator = RuntimeEstimator(meta_algo='RF', verbose=3)
km = KMeans()
X = np.random.rand(100000,10)
# run the estimation
estimation, lower_bound, upper_bound = estimator.time(km, X)
# compare to the actual training time
start_time = time.time()
km.fit(X)
elapsed_time = time.time() - start_time
print("elapsed time: {:.2f}".format(elapsed_time))
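Both examples measure the actual training time with the same start/stop pattern. As a minimal sketch, that pattern can be factored into a small context manager. The name `timed` is hypothetical and not part of scitime, and the workload below is only a stand-in for `rf.fit(X, y)` or `km.fit(X)`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label="elapsed time"):
    """Measure the wall-clock duration of the enclosed block (hypothetical helper)."""
    result = {"seconds": None}
    start = time.perf_counter()
    try:
        yield result
    finally:
        # Record and print the duration even if the block raises.
        result["seconds"] = time.perf_counter() - start
        print("{}: {:.2f}s".format(label, result["seconds"]))


# Stand-in workload; replace with rf.fit(X, y) or km.fit(X) in practice.
with timed("dummy workload") as t:
    total = sum(i * i for i in range(100000))
```

`time.perf_counter()` is used here instead of `time.time()` because it is a monotonic, high-resolution clock intended for measuring short durations.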
The RuntimeEstimator class arguments:
- meta_algo: the meta algorithm used to predict the runtime, either 'RF' or 'NN'
- verbose: controls the amount of log output (0, 1, 2 or 3)
- confidence: confidence level of the returned interval (defaults to 95%)
Parameters of the estimator.time function:
- algo: the scikit-learn algorithm whose runtime the user wants to predict
- X: np.array of inputs to train on
- y: np.array of outputs to train on (set to None for unsupervised algorithms)
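Once `estimator.time` has returned its three values and the real fit has been timed, the two can be compared. The sketch below assumes all values are durations in seconds, which is consistent with the examples above comparing them to `time.time()` differences. The helper name `compare_to_estimate` is hypothetical, and the numbers are made up for illustration, not measured results.

```python
def compare_to_estimate(estimation, lower_bound, upper_bound, actual):
    """Summarise how a measured fit time relates to the predicted interval.

    All four arguments are assumed to be durations in seconds.
    """
    error = actual - estimation
    return {
        "inside_interval": lower_bound <= actual <= upper_bound,
        "error": error,
        # Guard against division by zero for a degenerate estimate.
        "relative_error": error / estimation if estimation else None,
    }


# Illustrative values only: an estimate of 10s with interval [8s, 13s],
# compared to a fit that actually took 12s.
report = compare_to_estimate(10.0, 8.0, 13.0, actual=12.0)
print(report)
```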
--- FOR TESTERS / CONTRIBUTORS ---
Local Testing
Inside virtualenv (with pytest>=3.2.1):
(env)$ python -m pytest
How to use _data.py to generate data / fit models?
$ python _data.py --help
usage: _data.py [-h] [--drop_rate DROP_RATE] [--meta_algo {RF,NN}]
                [--verbose VERBOSE]
                [--algo {RandomForestRegressor,RandomForestClassifier,SVC,KMeans}]
                [--generate_data] [--fit FIT] [--save]

Gather & Persist Data of model training runtimes

optional arguments:
  -h, --help            show this help message and exit
  --drop_rate DROP_RATE
                        drop rate of number of data generated (from all param
                        combinations taken from _config.json). Default is
                        0.999
  --meta_algo {RF,NN}   meta algo used to fit the meta model (NN or RF) -
                        default is RF
  --verbose VERBOSE     verbose mode (0, 1, 2 or 3)
  --algo {RandomForestRegressor,RandomForestClassifier,SVC,KMeans}
                        algo to train data on
  --generate_data       do you want to generate & write data in a dedicated
                        csv?
  --fit FIT             do you want to fit the model? If so indicate the csv
                        name
  --save                (only used for model fit) do you want to save /
                        overwrite the meta model from this fit?
(_data.py uses _model.py behind the scenes)
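To see how the flags in the help text combine, here is a reconstruction of the command-line interface using `argparse`. It is inferred from the help output only and is not the actual code in `_data.py`. The argument types (`float` for `--drop_rate`, `int` for `--verbose`) and the defaults for `--verbose`, `--algo` and `--fit` are assumptions.

```python
import argparse

ALGOS = ["RandomForestRegressor", "RandomForestClassifier", "SVC", "KMeans"]


def build_parser():
    """Rebuild a parser matching the help text (a sketch, not the real one)."""
    parser = argparse.ArgumentParser(
        prog="_data.py",
        description="Gather & Persist Data of model training runtimes",
    )
    # Defaults of 0.999 and RF are stated in the help text.
    parser.add_argument("--drop_rate", type=float, default=0.999)
    parser.add_argument("--meta_algo", choices=["RF", "NN"], default="RF")
    # The defaults below are assumptions; the help text does not state them.
    parser.add_argument("--verbose", type=int, default=0)
    parser.add_argument("--algo", choices=ALGOS, default=None)
    parser.add_argument("--generate_data", action="store_true")
    parser.add_argument("--fit", default=None)
    parser.add_argument("--save", action="store_true")
    return parser


# Equivalent to: python _data.py --generate_data --algo KMeans --drop_rate 0.99 --verbose 3
args = build_parser().parse_args(
    ["--generate_data", "--algo", "KMeans", "--drop_rate", "0.99", "--verbose", "3"]
)
print(args)
```

In this reading, `--generate_data` and `--save` are plain switches, while `--fit` takes the name of the csv to fit on.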
How to run _model.py?
After pulling the master branch (git pull origin master) and setting the environment (described above), run ipython and:
from scitime._model import RuntimeModelBuilder
# example of data generation for rf regressor
trainer = RuntimeModelBuilder(drop_rate=0.99999, verbose=3, algo='RandomForestRegressor')
inputs, outputs, _ = trainer._generate_data()
# then fitting the meta model
meta_algo = trainer.model_fit(generate_data=False, inputs=inputs, outputs=outputs)
# this should not locally overwrite the pickle file located at scitime/models/{your_model}
# if you want to save the model, set the argument save_model to True
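The `drop_rate` argument is described above as the drop rate applied to all parameter combinations taken from `_config.json`. One plausible reading, sketched below, is that each combination is skipped independently with probability `drop_rate`. This is an assumption about the behaviour, not the actual implementation in `_model.py`. The function name and the grid keys are hypothetical and do not reflect the real contents of `_config.json`.

```python
import itertools
import random


def sample_param_combinations(grid, drop_rate, seed=None):
    """Enumerate every combination in `grid`, keeping each one with
    probability (1 - drop_rate). Hypothetical helper for illustration."""
    rng = random.Random(seed)
    names = sorted(grid)
    kept = []
    for values in itertools.product(*(grid[name] for name in names)):
        # rng.random() is in [0, 1), so drop_rate=0 keeps all and drop_rate=1 keeps none.
        if rng.random() >= drop_rate:
            kept.append(dict(zip(names, values)))
    return kept


# Hypothetical grid: 3 * 2 * 2 = 12 combinations in total.
grid = {
    "num_rows": [100, 1000, 10000],
    "num_features": [10, 50],
    "n_estimators": [10, 100],
}
everything = sample_param_combinations(grid, drop_rate=0.0)
subset = sample_param_combinations(grid, drop_rate=0.5, seed=0)
print(len(everything), len(subset))
```

Under this reading, a value as high as 0.99999 would keep only a tiny fraction of a very large grid, which would keep a local data generation run short.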