# PyBenchTool

*A Comparative Benchmarking Framework for Python*
PyBenchTool is a benchmarking framework for systematic comparison of multiple function implementations across varying inputs. It records full per-iteration runtime distributions together with disk I/O counters, system metadata, and HPC environment variables, and provides built-in inferential statistics (Welch ANOVA, Games-Howell post-hoc) for rigorous evaluation of performance differences.
The framework targets workloads where run-to-run variance carries information — for instance, I/O-bound compression benchmarks on shared cluster nodes — rather than micro-benchmarks where sub-microsecond precision is the primary concern.
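A word on the statistical layer: Welch ANOVA does not assume equal variances across groups, which suits benchmark data where one implementation may be far noisier than another, and Games-Howell identifies which specific pairs differ. For intuition, here is a standalone sketch of the same two tests applied by hand with the pingouin library to synthetic runtimes. Within PyBenchTool these tests run via `runtime_htest()` (see Quick Start); whether pingouin is used internally is not stated here, and the data below is made up for illustration.

```python
import numpy as np
import pandas as pd
import pingouin as pg

# Synthetic per-iteration runtimes for three kernels with unequal variances
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "kernel": ["blosc2"] * 30 + ["zstd"] * 30 + ["lz4"] * 30,
    "runtime_s": np.concatenate([
        rng.normal(1.00, 0.05, 30),
        rng.normal(1.10, 0.20, 30),   # similar mean, much larger variance
        rng.normal(0.90, 0.10, 30),
    ]),
})

# Welch ANOVA: tests for any mean difference without assuming equal variances
print(pg.welch_anova(data=df, dv="runtime_s", between="kernel"))

# Games-Howell post-hoc: identifies which kernel pairs differ significantly
print(pg.pairwise_gameshowell(data=df, dv="runtime_s", between="kernel"))
```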
## Scope and Positioning
| Capability | timeit / pyperf | perfplot | PyBenchTool |
|---|---|---|---|
| Multiple kernels × multiple inputs | Manual loop | Built-in | Built-in |
| Randomised execution order | No | No | Yes (seed is logged) |
| Full per-iteration distributions | pyperf: yes | No (reports minimum) | Yes |
| Disk I/O tracking per iteration | No | No | Yes |
| SLURM / HPC metadata capture | No | No | Yes (40+ fields) |
| Inferential statistics (ANOVA) | No | No | Welch ANOVA + Games-Howell |
| Cold start (skip warmup) | No | No | Yes |
| Subprocess isolation | pyperf: yes | No | Yes (loky/cloudpickle) |
| Three-clock timing (wall/CPU/thread) | pyperf: wall+CPU | No | Yes + derived I/O metrics |
| Warmup calibration / CPU pinning | pyperf: yes | No | No (single-pass warmup) |
PyBenchTool is appropriate when comparing multiple implementations across varying inputs and the analysis requires full runtime distributions, disk I/O measurements, or hypothesis testing — particularly in HPC environments where SLURM metadata and environment reproducibility are relevant.
pyperf is the better choice for low-noise measurement of a single function, where automatic warmup calibration, outlier detection, CPU pinning, and system tuning are needed to minimise OS-level variance (e.g. micro-benchmarks).
perfplot is sufficient for visual scaling comparisons when per-run distributions and statistical testing are not required.
## Features
- Controlled benchmarking — separate setup and cleanup phases, excluded from timing, allow data preparation and post-measurement metadata collection
- Subprocess isolation — each iteration runs in a fresh subprocess by default (via loky/cloudpickle), providing clean memory, disk I/O counters, and GC state per iteration; in-process mode available for low-overhead micro-benchmarks
- Three-clock timing — wall-clock (`perf_counter_ns`), CPU (`process_time_ns`), and thread (`thread_time_ns`) timers with derived metrics (I/O wait, I/O fraction, thread-pool parallelism); see the sketch after this list
- Metadata capture — CPU architecture, RAM, OS, SLURM environment, library versions, disk I/O counters, and GC state (40+ fields per row)
- Randomised execution — the full kernel × input × n_runs matrix is shuffled to prevent systematic ordering effects; the seed is logged for reproducibility
- Statistical analysis — Welch ANOVA and Games-Howell post-hoc tests account for heterogeneous variances
- Visualisation — bar plots and box plots via matplotlib and seaborn
- Full distributions — every iteration is stored individually for distribution-level analysis
- Cold start mode — skips the warmup run for workloads where the warmup itself is prohibitively expensive
- HPC integration — automatic SLURM variable capture; optional OS page-cache clearing for I/O benchmarks
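To make the derived metrics concrete, here is a minimal sketch of how I/O wait, I/O fraction, and parallelism can be computed from the three standard-library clocks. The formulas are a plausible reading of the metric names above, not PyBenchTool's verified internals.

```python
import time

def timed(fn, *args):
    """Run fn once, returning (wall, process_cpu, thread_cpu) durations in ns."""
    w0, c0, t0 = time.perf_counter_ns(), time.process_time_ns(), time.thread_time_ns()
    fn(*args)
    w1, c1, t1 = time.perf_counter_ns(), time.process_time_ns(), time.thread_time_ns()
    return w1 - w0, c1 - c0, t1 - t0

wall, cpu, thread = timed(sorted, list(range(1_000_000)))

# Plausible derivations of the metric names above (assumptions):
io_wait = max(wall - cpu, 0)    # wall time not spent on-CPU: blocking I/O, sleeps, scheduling
io_fraction = io_wait / wall    # share of the iteration spent off-CPU
parallelism = cpu / thread if thread else float("nan")
# process CPU time counts every thread, thread CPU time only the caller,
# so a ratio well above 1 indicates work done in pool threads
```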
## Known Limitations
- Single-pass warmup — one warmup iteration per kernel (or none with `cold_start=True`); no automatic calibration to determine when measurements have stabilised (unlike pyperf)
- Subprocess startup overhead — the default `isolation="subprocess"` mode spawns a fresh subprocess per iteration via loky (~200–500 ms overhead). For sub-millisecond kernels, use `isolation="inprocess"` (see the sketch after this list)
- No outlier detection — outliers from OS scheduling are retained; the statistical tests tolerate moderate outliers but no automatic flagging is performed
- No CPU pinning — on SLURM clusters the scheduler controls affinity; no built-in pinning for bare-metal benchmarks
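To illustrate the isolation trade-off, a hedged sketch of both modes. The values `"subprocess"`/`"inprocess"` and `cold_start=True` come from the notes above, but where those keywords are accepted (constructor vs. `bench()`) and how kernels receive their input are assumptions here; consult the documentation for the actual signatures.

```python
from pybenchtool import BenchTool

def tiny_kernel(xs):       # placeholder sub-millisecond kernel
    return sum(xs)

def io_kernel(path):       # placeholder I/O-bound kernel
    with open(path, "rb") as f:
        return len(f.read())

bt = BenchTool(name="Isolation trade-off")

# Sub-millisecond work: a fresh subprocess (~200-500 ms) would dominate the
# measurement, so run in-process.
bt.bench(kernel=[tiny_kernel], input_=[list(range(100))],
         n_runs=100, isolation="inprocess")

# I/O-heavy work: keep subprocess isolation for clean per-iteration I/O
# counters, and skip the warmup run if one execution is already expensive.
bt.bench(kernel=[io_kernel], input_=["/tmp/big.bin"],
         n_runs=10, isolation="subprocess", cold_start=True)
```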
## Installation
pip install pybenchtool
For development:
pip install -e ".[full]"
## Quick Start
from pybenchtool import BenchTool
bt = BenchTool(
name="Codec Comparison",
description="blosc2 vs zstd on varying array sizes.",
verbose=True,
version_key_libraries=["numpy", "blosc2"],
)
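# prepare_data, compress_blosc2, compress_zstd, measure_ratio and the three
# *_array inputs are user-defined (see the Quick Start guide for full code).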
results = bt.bench(
setup=[prepare_data],
kernel=[compress_blosc2, compress_zstd],
cleanup=[measure_ratio],
input_=[small_array, medium_array, large_array],
n_runs=10,
show_progress=True,
)
bt.results2csv(".")   # write full per-iteration results to CSV
print(bt.summary())   # aggregate statistics per kernel and input
bt.boxplot()          # distribution plot of per-iteration runtimes
bt.runtime_htest()    # Welch ANOVA + Games-Howell post-hoc
For a complete walkthrough, see the Quick Start guide.
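Because every iteration is stored individually, the CSV written by `results2csv` can also be analysed outside the framework. A minimal sketch with pandas; the file name and the column names `kernel` and `wall_time_ns` are illustrative assumptions, so inspect the export for the real 40+ field schema.

```python
import pandas as pd

df = pd.read_csv("codec_comparison_results.csv")        # hypothetical file name
print(df.columns.tolist())                              # discover the actual schema
print(df.groupby("kernel")["wall_time_ns"].describe())  # per-kernel distributions
```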
## Documentation
Full documentation is available at https://fschwar4.github.io/PyBenchTool/.
To build locally:
pip install -r docs/requirements.txt
cd docs && make html
## Building a Distribution
python -m pip install build
python -m build
Artefacts are placed in the dist/ directory.
## Repository Structure
PyBenchTool/
├── pybenchtool/
│ ├── __init__.py # Package entry point, version
│ ├── _facade.py # BenchTool facade class (composes the modules below)
│ ├── _metadata.py # System metadata collection, get_conda_version()
│ ├── _runner.py # Benchmark orchestration (bench, _run_iteration)
│ ├── _io.py # CSV I/O (results2csv, load)
│ ├── _analysis.py # Summary statistics, hypothesis testing
│ ├── _plotting.py # Bar plots, box plots, unit conversion
│ └── _utils.py # Shared helpers
├── tests/ # Pytest test suite
├── notebooks/ # Example Jupyter notebooks
├── docs/ # Sphinx documentation source
├── scripts/
│ └── conda_env_export.sh
├── CHANGELOG.md
├── LICENSE
├── README.md
└── pyproject.toml