

Bmark

Python benchmarking tool.

It's mostly used for benchmarking FeatherStore, but can be used to benchmark any Python code.

Installation

Bmark is available on PyPI:

python -m pip install bmark-py

Basic usage

First, let's set up a class to be benchmarked. All methods except run() are optional.

import os
import bmark
import pandas as pd

read_bench = bmark.Benchmark()

@read_bench()  # Remember the parentheses
class read_csv(bmark.Benched):

    def __init__(self, shape, engine):
        self.name = f'pd.read_csv(engine={engine})'
        self.rows, self.cols = shape
        self._path = '_benchmarks'
        self.file_path = os.path.join(self._path, 'table.csv')
        self.engine = engine
        super().__init__()

    def run(self):
        """Code to be benchmarked"""
        pd.read_csv(self.file_path, engine=self.engine)

    def setup(self):
        """Setup for the entire duration of the timer"""
        data = {f'c{i}': range(self.rows) for i in range(self.cols)}
        self.df = pd.DataFrame(data)
        if not os.path.exists(self._path):
            os.makedirs(self._path)

    def teardown(self):
        """Final teardown after all the timings are done"""
        os.rmdir(self._path)

    def __enter__(self):
        """Called before each loop in the timer"""
        self.df.to_csv(self.file_path)
        return self  # Important

    def __exit__(self, *args):
        """Called after each loop in the timer"""
        os.remove(self.file_path)

We initialize a benchmark with bmark.Benchmark(). We can then register classes to be benchmarked by using the Benchmark object as a decorator (as shown above).
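To see how a decorator-based registry like this can work, here is a minimal sketch (a simplified illustration, not bmark's actual implementation; the names Registry, items, and Task are hypothetical):

import functools

class Registry:
    """Sketch of a class registry used as a decorator factory."""

    def __init__(self):
        self.items = []

    def __call__(self):
        # Using the object with parentheses returns the actual decorator,
        # which is why @read_bench() needs them.
        def decorator(cls):
            original_init = cls.__init__

            @functools.wraps(original_init)
            def tracking_init(instance, *args, **kwargs):
                original_init(instance, *args, **kwargs)
                # Every new instance registers itself for benchmarking.
                self.items.append(instance)

            cls.__init__ = tracking_init
            return cls
        return decorator


registry = Registry()

@registry()  # parentheses required: __call__ returns the decorator
class Task:
    def __init__(self, name):
        self.name = name

Task('a')
Task('b')
print([t.name for t in registry.items])  # prints ['a', 'b']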

Each time we initialize a registered class it'll get added as an item to be benchmarked:

shape = (100_000, 10)
read_csv(shape, engine='c')
read_csv(shape, engine='python')
read_csv(shape, engine='pyarrow')

header = f'Read CSV benchmark {shape}'
read_bench.run(header, r=5, n=5, sort=True)

>>                 Read CSV benchmark (100000, 10)
 Name                         Hits     Best    Worst  Comparison
─────────────────────────────┼──────┼─────────┼─────────┼────────────
 pd.read_csv(engine=pyarrow)    25  16.2 ms  20.5 ms       1.00x
 pd.read_csv(engine=c)          25  74.2 ms   106 ms       4.58x
 pd.read_csv(engine=python)     25   803 ms   862 ms      49.54x

Runtimes: total 46.5 s, benchmark 23.4 s, other 23.2 s
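The Hits column equals r × n: with r=5 repeats of n=5 loops each, every item runs 25 times. The runtime summary suggests that the per-loop __enter__/__exit__ work is excluded from the timed region (it shows up as "other"). A timing loop of that shape might look like the following sketch (illustrative only, not bmark's actual code; time_item and SleepTask are hypothetical names):

import time

def time_item(item, r=5, n=5):
    """Sketch: r repeats of n timed loops; returns (best, worst) repeat."""
    results = []
    for _ in range(r):
        elapsed = 0.0
        for _ in range(n):
            with item:  # __enter__/__exit__ run outside the timed region
                start = time.perf_counter()
                item.run()
                elapsed += time.perf_counter() - start
        results.append(elapsed)
    return min(results), max(results)


class SleepTask:
    """Hypothetical demo item with the same hooks as bmark.Benched."""
    def __enter__(self):
        return self
    def __exit__(self, *args):
        pass
    def run(self):
        time.sleep(0.001)


best, worst = time_item(SleepTask(), r=2, n=3)

With r=2 and n=3 the item runs 6 times, and best/worst are the fastest and slowest of the two repeats.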

We can also populate a benchmark by passing the objects we want benchmarked as a sequence directly into the bmark.Benchmark constructor:

shape = (500_000, 20)
items = (
    read_csv(shape, engine='c'),
    read_csv(shape, engine='python'),
    read_csv(shape, engine='pyarrow')
)
header = f'Read CSV benchmark {shape}'
bmark.Benchmark(items).run(header, r=1, n=5, sort=True)

>>           Read CSV benchmark (500000, 20)
 Name                         Hits    Time  Comparison
─────────────────────────────┼──────┼────────┼────────────
 pd.read_csv(engine=pyarrow)     5  112 ms       1.00x
 pd.read_csv(engine=c)           5  655 ms       5.85x
 pd.read_csv(engine=python)      5  7.49 s      66.98x

Runtimes: total 1min 21s, benchmark 41.3 s, other 40.2 s

For a more in-depth example, see the FeatherStore benchmarking suite.
