Project description
Benchmarkit
Benchmark and analyze functions' time execution and results over the course of development.
Features
- No boilerplate code
- Saves history and additional info
- Saves function output and parameters, so it can also be used to benchmark data science tasks
- Easy to analyze results
Motivation
- I need to benchmark the execution time of my functions
- I don't want to memorize and write boilerplate code
- I want to compare results with previous runs from before changes were introduced
- I don't want to manually write down results somewhere
- I want to know the exact commits of my runs from months ago
- I want to benchmark the accuracy, precision, and recall of my models and keep track of hyperparameters
Usage
Benchmark execution times
Put the @benchmark decorator over a function containing the code that should be timed:

```python
from benchmarkit import benchmark, benchmark_run

N = 10000
seq_list = list(range(N))
seq_set = set(range(N))

SAVE_PATH = '/tmp/benchmark_time.jsonl'


@benchmark(num_iters=100, save_params=True, save_output=False)
def search_in_list(num_items=N):
    return num_items - 1 in seq_list


@benchmark(num_iters=100, save_params=True, save_output=False)
def search_in_set(num_items=N):
    return num_items - 1 in seq_set
```
- `num_iters` - how many times to repeat the benchmarked function. Default `1`
- `save_params` - save the parameters passed to the benchmarked function in the file with benchmark results. In the example above, `num_items` will be saved. Default `False`
- `save_output` - save the benchmarked function's output. The function should return a dict `{'name': value}`. Default `False`. See the example below of how to benchmark model results.
Run the benchmark:

```python
benchmark_results = benchmark_run(
    [search_in_list, search_in_set],
    SAVE_PATH,
    comment='initial benchmark search',
    rows_limit=10,
    extra_fields=['num_items'],
    metric='mean_time',
    bigger_is_better=False,
)
```
- `functions` - a function or list of functions with the `benchmark` decorator
- `save_file` - path to the file where results are saved
- `comment` - comment to save alongside the results
- `rows_limit` - limit on table rows in console output. Default `10`
- `extra_fields` - extra fields to include in the console output
- `metric` - metric used for comparison. Default `mean_time`
- `bigger_is_better` - whether a bigger value of the metric indicates a better result. Should be `False` for time benchmarks and `True` for model accuracy. Default `False`
Prints a table to the terminal and returns a list of dictionaries with the data from the last run.
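For reference, a minimal sketch of inspecting the returned value (continuing the snippet above); the exact keys in each dictionary are not spelled out here, so the loop simply prints whatever was stored:

```python
# Minimal sketch: inspect the per-function results returned by benchmark_run.
# The precise dictionary keys are an assumption left to runtime inspection.
for row in benchmark_results:
    print(row)
```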
Change `N=1000000` and rerun.
The same can be run from the command line:

```bash
benchmark_run test_data/time/benchmark_functions.py --save_dir /tmp/ --comment "million items" --extra_fields num_items
```
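The file passed to the CLI is presumably an ordinary Python module containing `@benchmark`-decorated functions like the ones defined above; a hypothetical sketch of `test_data/time/benchmark_functions.py`, assuming the CLI imports the module and picks up the decorated functions with no `__main__` block required:

```python
# Hypothetical contents of test_data/time/benchmark_functions.py
# (assumes the CLI collects @benchmark-decorated functions from the module).
from benchmarkit import benchmark

N = 10000
seq_list = list(range(N))


@benchmark(num_iters=100, save_params=True)
def search_in_list(num_items=N):
    return num_items - 1 in seq_list
```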
Benchmark model results
```python
from benchmarkit import benchmark, benchmark_run
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

MODEL_BENCHMARK_SAVE_FILE = '/tmp/benchmark_model.jsonl'

x, y = load_iris(return_X_y=True)


@benchmark(save_params=True, save_output=True)
def log_regression(C=1.0, fit_intercept=True):
    clf = LogisticRegression(
        random_state=0,
        solver='lbfgs',
        multi_class='multinomial',
        C=C,
        fit_intercept=fit_intercept,
    )
    clf.fit(x, y)
    score = clf.score(x, y)
    return {'score': score}


model_benchmark_results = benchmark_run(
    log_regression,
    MODEL_BENCHMARK_SAVE_FILE,
    comment='baseline model',
    extra_fields=['C', 'fit_intercept'],
    metric='score',
    bigger_is_better=True,
)
```
Change the hyperparameter to `C=0.5` and rerun to compare against the baseline.
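One way to do this, continuing the snippet above and assuming `benchmark_run` calls the function with its default arguments (as the `num_items` example suggests), is to change the default value of `C` and run the benchmark again:

```python
# Sketch: rerun the same benchmark with stronger regularization.
# Continues the example above (same imports, x, y, and save file) and
# assumes benchmark_run invokes log_regression with its default arguments,
# so changing the default of C changes the benchmarked configuration.
@benchmark(save_params=True, save_output=True)
def log_regression(C=0.5, fit_intercept=True):
    clf = LogisticRegression(
        random_state=0,
        solver='lbfgs',
        multi_class='multinomial',
        C=C,
        fit_intercept=fit_intercept,
    )
    clf.fit(x, y)
    return {'score': clf.score(x, y)}


benchmark_run(
    log_regression,
    MODEL_BENCHMARK_SAVE_FILE,
    comment='stronger regularization',
    extra_fields=['C', 'fit_intercept'],
    metric='score',
    bigger_is_better=True,
)
```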
The same can be run from the command line:

```bash
benchmark_run file_with_benchmark.py --save_dir /tmp/ --comment "stronger regularization" --extra_fields C fit_intercept --metric score --bigger_is_better
```
Analyze results from the file
```python
from benchmarkit import benchmark_analyze

SAVE_PATH = '/tmp/benchmark_time.jsonl'

benchmark_df = benchmark_analyze(
    SAVE_PATH,
    func_name=None,
    rows_limit=10,
    metric='mean_time',
    bigger_is_better=False,
    extra_fields=['num_items'],
)
```
- `input_path` - path to a `.jsonl` file or a directory of `.jsonl` files with benchmark results
- `func_name` - display statistics for a particular function. If `None`, all functions stored in the file are displayed. Default `None`
- `rows_limit` - limit on table rows in console output. Default `10`
- `metric` - metric used for comparison. Default `mean_time`
- `bigger_is_better` - whether a bigger value of the metric indicates a better result. Should be `False` for time benchmarks and `True` for model accuracy. Default `False`
- `extra_fields` - extra fields to include in the console output
Prints a table to the terminal and returns a pandas `DataFrame`.
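A minimal sketch of post-processing the returned DataFrame, continuing the snippet above; the column names are assumptions rather than documented guarantees, so they are checked before use:

```python
# Inspect the history of benchmark runs loaded by benchmark_analyze.
# Column names are an assumption; print them first to see what is stored.
print(benchmark_df.columns.tolist())
print(benchmark_df.tail(10))

# For example, if the metric column is named 'mean_time' (assumption),
# the runs can be sorted to find the fastest one:
if 'mean_time' in benchmark_df.columns:
    print(benchmark_df.sort_values('mean_time').head(1))
```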
The same can be run from the command line:

```bash
benchmark_analyze /tmp/benchmark_time.jsonl --extra_fields num_items
```
Project details
Release history
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file benchmarkit-0.0.2.tar.gz.
File metadata
- Download URL: benchmarkit-0.0.2.tar.gz
- Upload date:
- Size: 7.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.6.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | fc588cb7582babcbec4d670c35c475dae844696f9f0231bad740b5561cc4e7b9
MD5 | 456a8ef5ef8a5dd510d824c5972f9910
BLAKE2b-256 | 6a1c11402b289ef81d54f8824b8bd6e503d4eb8dca746c0f34f05ee3597a14fb
File details
Details for the file benchmarkit-0.0.2-py3-none-any.whl.
File metadata
- Download URL: benchmarkit-0.0.2-py3-none-any.whl
- Upload date:
- Size: 9.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.6.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 33f27ec2b926d788b3fd7fb8dab35e8d687b14251b3cefc195680fbec94597d7
MD5 | 84dd4a97db59c14c06df31d02ab7f2e5
BLAKE2b-256 | 5dfe0bbde1c725e8cc822d650d265aee2400f08b959dc490e43e68657ef8ea53