
PyBenchTool

A Comparative Benchmarking Framework for Python

PyBenchTool is a benchmarking framework for systematic comparison of multiple function implementations across varying inputs. It records full per-iteration runtime distributions together with disk I/O counters, system metadata, and HPC environment variables, and provides built-in inferential statistics (Welch ANOVA, Games-Howell post-hoc) for rigorous evaluation of performance differences.

The framework targets workloads where run-to-run variance carries information — for instance, I/O-bound compression benchmarks on shared cluster nodes — rather than micro-benchmarks where sub-microsecond precision is the primary concern.

Scope and Positioning

| Capability | timeit / pyperf | perfplot | PyBenchTool |
|---|---|---|---|
| Multiple kernels × multiple inputs | Manual loop | Built-in | Built-in |
| Randomised execution order | No | No | Yes (seed is logged) |
| Full per-iteration distributions | pyperf: yes | No (reports minimum) | Yes |
| Disk I/O tracking per iteration | No | No | Yes |
| SLURM / HPC metadata capture | No | No | Yes (40+ fields) |
| Inferential statistics (ANOVA) | No | No | Welch ANOVA + Games-Howell |
| Cold start (skip warmup) | No | No | Yes |
| Subprocess isolation | pyperf: yes | No | Yes (loky/cloudpickle) |
| Three-clock timing (wall/CPU/thread) | pyperf: wall + CPU | No | Yes, plus derived I/O metrics |
| Warmup calibration / CPU pinning | pyperf: yes | No | No (single-pass warmup) |

PyBenchTool is appropriate when comparing multiple implementations across varying inputs and the analysis requires full runtime distributions, disk I/O measurements, or hypothesis testing — particularly in HPC environments where SLURM metadata and environment reproducibility are relevant.

pyperf is the better choice for low-noise measurement of a single function, where automatic warmup calibration, outlier detection, CPU pinning, and system tuning are needed to minimise OS-level variance (e.g. micro-benchmarks).

perfplot is sufficient for visual scaling comparisons when per-run distributions and statistical testing are not required.

Features

  • Controlled benchmarking — separate setup and cleanup phases, excluded from timing, allow data preparation and post-measurement metadata collection
  • Subprocess isolation — each iteration runs in a fresh subprocess by default (via loky/cloudpickle), providing clean memory, disk I/O counters, and GC state per iteration; in-process mode is available for low-overhead micro-benchmarks (see the sketch after this list)
  • Three-clock timing — wall-clock (perf_counter_ns), CPU (process_time_ns), and thread (thread_time_ns) timers with derived metrics (I/O wait, I/O fraction, thread-pool parallelism)
  • Metadata capture — CPU architecture, RAM, OS, SLURM environment, library versions, disk I/O counters, and GC state (40+ fields per row)
  • Randomised execution — the full kernel × input × n_runs matrix is shuffled to prevent systematic ordering effects; the seed is logged for reproducibility
  • Statistical analysis — Welch ANOVA and Games-Howell post-hoc tests account for heterogeneous variances
  • Visualisation — bar plots and box plots via matplotlib and seaborn
  • Full distributions — every iteration is stored individually for distribution-level analysis
  • Cold start mode — skips the warmup run for workloads where the warmup itself is prohibitively expensive
  • HPC integration — automatic SLURM variable capture; optional OS page-cache clearing for I/O benchmarks
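
The isolation and cold-start options above can be combined when subprocess startup cost matters. The following is a minimal sketch, not the documented API: it assumes that isolation and cold_start are keyword arguments of bench() and that each kernel receives one element of input_ as its argument, so consult the API reference for the exact signatures.

from pybenchtool import BenchTool

def sum_builtin(xs):
    return sum(xs)

def sum_loop(xs):
    total = 0
    for x in xs:
        total += x
    return total

bt = BenchTool(name="Summation micro-benchmark", verbose=False)

results = bt.bench(
    kernel=[sum_builtin, sum_loop],
    input_=[list(range(1_000)), list(range(100_000))],
    n_runs=50,
    isolation="inprocess",  # avoids the ~200-500 ms per-iteration subprocess startup (assumed keyword placement)
    cold_start=True,        # skips the single warmup pass (assumed keyword placement)
)

In-process mode trades per-iteration isolation (fresh memory and I/O counters) for lower overhead, matching the guidance in the Known Limitations section below.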

Known Limitations

  • Single-pass warmup — one warmup iteration per kernel (or none with cold_start=True); no automatic calibration to determine when measurements have stabilised (unlike pyperf)
  • Subprocess startup overhead — the default isolation="subprocess" mode spawns a fresh subprocess per iteration via loky (~200–500 ms overhead). For sub-millisecond kernels, use isolation="inprocess"
  • No outlier detection — outliers from OS scheduling are retained; the statistical tests tolerate moderate outliers but no automatic flagging is performed
  • No CPU pinning — on SLURM clusters the scheduler controls affinity; no built-in pinning for bare-metal benchmarks

Installation

pip install pybenchtool

For development:

pip install -e ".[full]"

Quick Start

from pybenchtool import BenchTool

bt = BenchTool(
    name="Codec Comparison",
    description="blosc2 vs zstd on varying array sizes.",
    verbose=True,
    version_key_libraries=["numpy", "blosc2"],
)

results = bt.bench(
    setup=[prepare_data],
    kernel=[compress_blosc2, compress_zstd],
    cleanup=[measure_ratio],
    input_=[small_array, medium_array, large_array],
    n_runs=10,
    show_progress=True,
)

bt.results2csv(".")
print(bt.summary())
bt.boxplot()
bt.runtime_htest()
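
The arrays and the prepare_data, compress_blosc2, compress_zstd, and measure_ratio callables above are user-supplied. A minimal sketch of what they might look like is shown below; the single-argument signature (each callable receiving the current input element) is an assumption for illustration, not the documented calling convention.

import numpy as np
import blosc2
import zstandard

# Hypothetical inputs: compressible byte patterns of increasing size.
small_array = np.tile(np.arange(256, dtype=np.uint8), 4_000)      # ~1 MB
medium_array = np.tile(np.arange(256, dtype=np.uint8), 40_000)    # ~10 MB
large_array = np.tile(np.arange(256, dtype=np.uint8), 400_000)    # ~100 MB

def prepare_data(arr):
    # Setup phase (excluded from timing): ensure the buffer is contiguous.
    return np.ascontiguousarray(arr)

def compress_blosc2(arr):
    # Kernel under test: blosc2 compression of the raw buffer.
    return blosc2.compress(arr.tobytes())

def compress_zstd(arr):
    # Kernel under test: zstd compression of the raw buffer.
    return zstandard.ZstdCompressor().compress(arr.tobytes())

def measure_ratio(arr):
    # Cleanup phase (excluded from timing): record the compression ratio.
    compressed = zstandard.ZstdCompressor().compress(arr.tobytes())
    return arr.nbytes / len(compressed)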

For a complete walkthrough, see the Quick Start guide.

Documentation

Full documentation is available at https://fschwar4.github.io/PyBenchTool/.

To build locally:

pip install -r docs/requirements.txt
cd docs && make html

Building a Distribution

python -m pip install build
python -m build

Artefacts are placed in the dist/ directory.

Repository Structure

PyBenchTool/
├── pybenchtool/
│   ├── __init__.py      # Package entry point, version
│   ├── _facade.py       # BenchTool facade class (composes the modules below)
│   ├── _metadata.py     # System metadata collection, get_conda_version()
│   ├── _runner.py       # Benchmark orchestration (bench, _run_iteration)
│   ├── _io.py           # CSV I/O (results2csv, load)
│   ├── _analysis.py     # Summary statistics, hypothesis testing
│   ├── _plotting.py     # Bar plots, box plots, unit conversion
│   └── _utils.py        # Shared helpers
├── tests/               # Pytest test suite
├── notebooks/           # Example Jupyter notebooks
├── docs/                # Sphinx documentation source
├── scripts/
│   └── conda_env_export.sh
├── CHANGELOG.md
├── LICENSE
├── README.md
└── pyproject.toml

