Scenario-based polyglot database benchmark platform

BenchForge

Research-grade, scenario-based database benchmark platform for DB researchers and engineers.

Compare different DB access stacks (driver, ORM, language) under identical workloads, with statistical rigor suitable for academic publication (VLDB, SIGMOD, OSDI) and for professional engineering evaluation.

Why BenchForge?

Many database benchmark scripts are one-off and ad hoc, and their results cannot be reproduced or trusted. BenchForge addresses this by providing:

  • Statistical rigor: multi-iteration experiments with bootstrap confidence intervals, not single-run "eyeball" comparisons
  • Reproducibility: seed control, full environment capture, setup/teardown isolation, and versioned result schemas
  • Publication quality: reports designed for academic papers, with ECDF plots, CI error bars, booktabs tables, and a colorblind-safe palette
  • Apples-to-apples comparison: run the exact same workload against different drivers, ORMs, or languages, then compare the results with bench compare

BenchForge is not a distributed load generator, a database provisioning tool, or a replacement for TPC benchmarks. It is a focused tool for comparing database access stacks under controlled conditions.

Key Features

  • Multi-iteration experiments with seed control for reproducibility
  • HDR histogram (in-house, zero-dep) for O(1) latency recording with configurable precision
  • Cross-iteration statistics: mean, stdev, CV, 95% CI (bootstrap)
  • Time-series collection in 1-second windows: throughput, errors, latency quantiles
  • Publication-quality HTML reports: paper theme (Crimson Pro + Source Sans 3, booktabs tables), ECDF plots, CI error bars, time-series charts, Okabe-Ito colorblind-safe palette
  • Environment capture: CPU, memory, OS, Python version, DB server config
  • Setup/teardown queries per iteration for run isolation
  • Warmup phase excluded from measurement
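The cross-iteration statistics listed above (mean, stdev, CV, 95% bootstrap CI) can be sketched with the standard library alone. This is a minimal percentile-bootstrap sketch under stated assumptions: `cross_iteration_stats`, the 10,000-resample default, and the fixed resampling seed are illustrative choices, not BenchForge's aggregator API (the project lists numpy for this step, and its exact procedure may differ).

```python
import random
import statistics


def cross_iteration_stats(values, n_resamples=10_000, confidence=0.95, seed=0):
    """Summarize one per-iteration metric (e.g. p50 latency of each iteration).

    Hypothetical helper: returns mean, sample stdev, coefficient of variation,
    and a percentile-bootstrap confidence interval for the mean.
    """
    if len(values) < 2:
        raise ValueError("need at least two iterations")
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)  # sample (n - 1) standard deviation
    cv = stdev / mean if mean else float("nan")

    # Resample iterations with replacement; a seeded RNG keeps the CI reproducible.
    rng = random.Random(seed)
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    alpha = (1.0 - confidence) / 2.0
    lo_idx = round(alpha * n_resamples)
    hi_idx = min(round((1.0 - alpha) * n_resamples) - 1, n_resamples - 1)
    return {"mean": mean, "stdev": stdev, "cv": cv,
            "ci": (boot_means[lo_idx], boot_means[hi_idx])}


# Example: p50 latency (ms) from five iterations of one target.
summary = cross_iteration_stats([12.1, 11.8, 12.4, 12.0, 11.9])
print(summary["mean"], summary["ci"])
```

With only a handful of iterations the bootstrap interval is itself coarse, which is one reason to raise the iteration count for headline results.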

Installation

From PyPI (recommended)

pip install benchforge

From source (development)

git clone https://github.com/yeongseon/benchforge.git
cd benchforge
pip install -e ".[dev]"

With pipx (isolated install)

pipx install benchforge

Dependencies

BenchForge requires Python 3.10+ and includes the following dependencies:

Package            Purpose
psycopg[binary]    PostgreSQL driver (psycopg3)
sqlalchemy         SQLAlchemy Core/ORM driver
pydantic           Scenario schema validation
pyyaml             YAML scenario loading
typer + rich       CLI interface
jinja2 + plotly    HTML report generation
numpy              Bootstrap CI computation

Quick Start

# 1. Start PostgreSQL (from the repository root)
docker compose up -d

# 2. Install BenchForge in editable mode from the source checkout
pip install -e ".[dev]"

# 3. Run a benchmark (5 iterations, seed=42)
bench run scenarios/basic.yaml -v

# 4. Override iterations/seed from CLI
bench run scenarios/basic.yaml -n 10 --seed 123

# 5. Compare two runs
bench compare reports/run1.json reports/run2.json

# 6. Generate HTML report
bench report reports/run1.json

For a detailed walkthrough, see docs/quickstart.md.

Example Scenarios

BenchForge ships with ready-to-use example scenarios in examples/:

Scenario                  File                           Description
OLTP Point Lookups        oltp_point_lookups.yaml        Single-row SELECT by PK; measures point-query latency and driver overhead
Analytical Aggregation    analytical_aggregation.yaml    GROUP BY over 500K rows; full-table scans, aggregation, OLAP-style queries
Connection Pool Stress    connection_pool_stress.yaml    32-worker concurrency stress; connection overhead and latency degradation
Mixed Read/Write          mixed_read_write.yaml          Banking-style OLTP; interleaved SELECTs, UPDATEs, and INSERTs
Index vs Seq Scan         index_scan_vs_seq_scan.yaml    Selectivity impact on the query planner; index scan vs. sequential scan paths

Run any example:

bench run examples/oltp_point_lookups.yaml -v
bench run examples/mixed_read_write.yaml -n 3 --seed 7

Scenario Format

name: basic-select
description: "Basic point SELECT benchmark: psycopg vs SQLAlchemy"

setup:
  queries:
    - "CREATE TABLE IF NOT EXISTS users (id SERIAL PRIMARY KEY, name VARCHAR(100))"
    - "INSERT INTO users (name) SELECT 'user_' || i FROM generate_series(1, 1000) AS i ON CONFLICT DO NOTHING"

teardown:
  queries:
    - "TRUNCATE TABLE users"

steps:
  - name: point-select
    query: "SELECT * FROM users WHERE id = %(id)s"
    params:
      id: "random_int(1, 1000)"

load:
  concurrency: 4
  duration: 10
  warmup:
    duration: 3

experiment:
  iterations: 5
  seed: 42
  pause_between: 2.0

targets:
  - name: psycopg-raw
    stack_id: python+psycopg
    driver: psycopg
    dsn: "postgresql://postgres:postgres@localhost:5432/benchflow"
  - name: sqlalchemy-core
    stack_id: python+sqlalchemy
    driver: sqlalchemy
    dsn: "postgresql+psycopg://postgres:postgres@localhost:5432/benchflow"

For the complete DSL specification, see docs/scenario-reference.md.
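The `random_int(1, 1000)` expression under `params` implies a parameter generator driven by the experiment seed. The sketch below shows one way such expressions could be resolved with a seeded `random.Random`; the regex grammar, `make_param_generator`, and the per-thread seeding scheme (`seed + thread index`) are hypothetical, not BenchForge's API.

```python
import random
import re

# Hypothetical grammar: only "random_int(low, high)" is recognized; any other
# value is passed through unchanged as a literal parameter.
_RANDOM_INT = re.compile(r"^\s*random_int\(\s*(-?\d+)\s*,\s*(-?\d+)\s*\)\s*$")


def make_param_generator(spec, rng):
    """Build a zero-argument callable that yields a fresh params dict per call."""
    generators = {}
    for name, expr in spec.items():
        match = _RANDOM_INT.match(expr) if isinstance(expr, str) else None
        if match:
            low, high = int(match.group(1)), int(match.group(2))
            if low > high:
                raise ValueError(f"{name}: empty range {expr!r}")
            # randint is inclusive on both ends, so 1..1000 covers every seeded id.
            generators[name] = lambda low=low, high=high: rng.randint(low, high)
        else:
            generators[name] = lambda value=expr: value
    return lambda: {name: gen() for name, gen in generators.items()}


# One RNG per worker thread, derived from experiment.seed (assumed: seed + thread index).
rng = random.Random(42 + 0)
next_params = make_param_generator({"id": "random_int(1, 1000)"}, rng)
print(next_params())
```

Giving each thread its own generator keeps every thread's parameter sequence reproducible; a single shared generator would hand out values in an order that depends on thread scheduling.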

Architecture

Controller (Python Core)
  +-- Scenario Engine       YAML DSL -> Pydantic models + ExperimentConfig
  +-- Threaded Runner       barrier-sync, perf_counter_ns, GC control, multi-iteration
  +-- HDR Histogram         O(1) record, log-bucket, mergeable across threads
  +-- Metrics Aggregator    histogram percentiles, bootstrap CI, cross-iteration stats
  +-- Report Generator      publication-quality HTML (paper + dark themes)

Workers (per-thread lifecycle)
  +-- PsycopgWorker         raw psycopg3, one connection per thread
  +-- SQLAlchemyWorker      SQLAlchemy Core, shared engine, param translation
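The HDR Histogram entry above (O(1) record, log buckets, mergeable across threads) can be illustrated with a simplified log-linear histogram. This is a minimal sketch, not the contents of `metrics/histogram.py`: the class name, the `precision_bits` parameter, and the choice to report bucket upper bounds are assumptions.

```python
import math
from collections import Counter


class LogBucketHistogram:
    """Log-linear latency histogram (a simplified HDR-style sketch).

    Values below 2**precision_bits are counted exactly; larger values fall into
    2**(precision_bits - 1) sub-buckets per power of two, so a bucket's upper
    bound overstates a value by less than 2**-(precision_bits - 1) relative error.
    """

    def __init__(self, precision_bits=7):
        self.bits = precision_bits
        self.sub = 1 << precision_bits      # exact range is [0, sub)
        self.half = self.sub >> 1           # sub-buckets per power of two above it
        self.counts = Counter()             # sparse: bucket index -> count
        self.total = 0

    def _index(self, value):
        if value < self.sub:
            return value
        shift = value.bit_length() - self.bits
        mantissa = value >> shift           # lies in [half, sub)
        return self.sub + (shift - 1) * self.half + (mantissa - self.half)

    def _upper_bound(self, index):
        if index < self.sub:
            return index
        k = index - self.sub
        shift = k // self.half + 1
        mantissa = k % self.half + self.half
        return ((mantissa + 1) << shift) - 1

    def record(self, value_ns):
        """O(1): one bit_length, one shift, one counter increment."""
        if value_ns < 0:
            raise ValueError("latency must be non-negative")
        self.counts[self._index(int(value_ns))] += 1
        self.total += 1

    def merge(self, other):
        """Combine per-thread histograms by adding bucket counts."""
        if other.bits != self.bits:
            raise ValueError("precision mismatch")
        self.counts.update(other.counts)
        self.total += other.total

    def percentile(self, q):
        """Upper bound of the bucket holding the q-th percentile (0 < q <= 100)."""
        if self.total == 0:
            raise ValueError("empty histogram")
        target = max(1, math.ceil(q / 100.0 * self.total))
        seen = 0
        for index in sorted(self.counts):
            seen += self.counts[index]
            if seen >= target:
                return self._upper_bound(index)
        raise AssertionError("unreachable: counts sum to total")
```

In this sketch each worker thread would own one histogram and the controller would merge them after the run, so the recording hot path needs no lock.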

For a detailed architecture walkthrough, see docs/architecture.md.
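The Threaded Runner line in the architecture names three techniques: a barrier so all threads start together, `time.perf_counter_ns()` timing, and GC control, with the warmup phase excluded from measurement. The function below is a minimal sketch of how those pieces can fit together, assuming a worker's query call can be passed as a plain callable; `run_threads` and its parameters are hypothetical and not BenchForge's `runner.py`.

```python
import gc
import threading
import time


def run_threads(operation, concurrency=4, warmup_s=1.0, duration_s=5.0):
    """Run `operation` on `concurrency` threads; return post-warmup latencies in ns.

    Hypothetical helper: `operation` stands in for one worker's query call.
    """
    barrier = threading.Barrier(concurrency + 1)   # workers + the controlling thread
    per_thread = [[] for _ in range(concurrency)]  # one list per thread, no lock needed
    window = {}

    def worker(slot):
        barrier.wait()                             # all threads start together
        measure_from, stop_at = window["measure_from"], window["stop_at"]
        samples = per_thread[slot]
        while True:
            start = time.perf_counter_ns()
            if start >= stop_at:
                break
            operation()
            elapsed = time.perf_counter_ns() - start
            if start >= measure_from:              # warmup samples are discarded
                samples.append(elapsed)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(concurrency)]
    for t in threads:
        t.start()                                  # each blocks at the barrier

    gc_was_enabled = gc.isenabled()
    gc.collect()                                   # start from a clean heap
    gc.disable()                                   # avoid cyclic-GC pauses inside timed calls
    try:
        now = time.perf_counter_ns()
        window["measure_from"] = now + int(warmup_s * 1e9)
        window["stop_at"] = window["measure_from"] + int(duration_s * 1e9)
        barrier.wait()                             # release the workers
        for t in threads:
            t.join()
    finally:
        if gc_was_enabled:
            gc.enable()
    return [ns for samples in per_thread for ns in samples]


# Example with a stand-in for a 1 ms query.
latencies = run_threads(lambda: time.sleep(0.001), concurrency=2, warmup_s=0.1, duration_s=0.3)
print(len(latencies), min(latencies))
```

Because of CPython's GIL, this pattern suits I/O-bound calls such as database round trips, where the driver releases the GIL while waiting. Note that `gc.disable()` is process-wide, so very long runs may need periodic collection between iterations.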

Project Structure

benchforge/
  benchflow/
    core/
      runner/runner.py          # Multi-iteration threaded benchmark execution
      scenario/schema.py        # Pydantic scenario models + ExperimentConfig
      scenario/loader.py        # YAML loading
      metrics/aggregator.py     # Latency stats, bootstrap CI, cross-iteration aggregation
      metrics/histogram.py      # HDR-style log-bucket histogram
      report/html.py            # Publication-quality HTML report generator
      result.py                 # Versioned result JSON schema (v2)
    cli/main.py                 # Typer CLI (run/compare/report)
    workers/
      protocol.py               # Worker ABC + registry
      python/
        psycopg_worker.py
        sqlalchemy_worker.py
  scenarios/basic.yaml
  examples/                     # Ready-to-use benchmark scenarios
  docs/                         # Comprehensive documentation
  tests/

CLI Reference

bench run <scenario.yaml> [OPTIONS]
  -o, --output          Output JSON path
  -n, --iterations      Override iteration count
  --seed                Override random seed
  --capture-db-info     Capture DB server config via introspect()
  -v, --verbose         Enable verbose logging

bench compare <baseline.json> <contender.json> [OPTIONS]
  -o, --output          Output comparison JSON

bench report <result.json> [OPTIONS]
  -o, --output          Output HTML path

CLI Stability Note: The bench run, bench compare, and bench report commands are considered stable as of v0.1.0. Subcommand names and core flags (-o, -n, --seed, -v) follow semantic versioning: breaking changes will occur only in major versions.
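bench compare takes a baseline and a contender result. One common way to summarize such a comparison, sketched below, is the relative change of a metric plus a check for whether the two bootstrap confidence intervals overlap. This is an assumption about what a comparison could report, not a description of BenchForge's comparison JSON; `MetricSummary` and `compare_metric` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSummary:
    """Cross-iteration summary of one metric (e.g. p99 latency in ms) for one run."""
    mean: float
    ci_low: float
    ci_high: float


def compare_metric(baseline: MetricSummary, contender: MetricSummary) -> dict:
    """Percent change of contender vs. baseline, plus whether their CIs overlap."""
    if baseline.mean == 0:
        raise ValueError("baseline mean is zero; percent change undefined")
    change_pct = (contender.mean - baseline.mean) / baseline.mean * 100.0
    cis_overlap = (contender.ci_low <= baseline.ci_high
                   and baseline.ci_low <= contender.ci_high)
    return {"change_pct": change_pct, "cis_overlap": cis_overlap}


baseline = MetricSummary(mean=10.0, ci_low=9.5, ci_high=10.5)
contender = MetricSummary(mean=12.0, ci_low=11.6, ci_high=12.4)
print(compare_metric(baseline, contender))
```

Non-overlapping 95% intervals are strong evidence of a real difference, but overlapping intervals do not by themselves show that two stacks perform the same; a test on the difference itself is more sensitive.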

Documentation

Document              Description
Quick Start           Install, run, report, and compare, step by step
Concepts              Scenarios, steps, targets, workers, iterations, result schema
Methodology           Clock sources, HDR histograms, bootstrap CI, time-series
Reproducibility       Pre-, during-, and post-benchmark checklists; pitfalls
Scenario Reference    Complete DSL specification with every field documented
Architecture          System overview, components, execution flow, extension points

Contributing

See CONTRIBUTING.md for development setup, testing, code style, and guidelines for adding scenarios and workers.

Citing BenchForge

If you use BenchForge in your research, please cite it:

@software{choe2026benchflow,
  title  = {BenchForge: Research-Grade Database Benchmark Platform},
  author = {Choe, Yeongseon},
  year   = {2026},
  url    = {https://github.com/yeongseon/benchforge},
}

See CITATION.cff for machine-readable citation metadata.

License

MIT

Download files

Source Distribution

benchforge-0.1.0.tar.gz (48.9 kB)

Built Distribution

benchforge-0.1.0-py3-none-any.whl (44.1 kB)

File details

Details for the file benchforge-0.1.0.tar.gz.

File metadata

  • Download URL: benchforge-0.1.0.tar.gz
  • Size: 48.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for benchforge-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        015f168961dc4df9e192dd92f81cb283fdd81e6a3165d8e2c16d407283d18e10
MD5           575441afc938000a95aacd2b268fe9b2
BLAKE2b-256   507caa5b50846c57856ba42971ffbb11684dc72ad4cd082e378b67764fd19cf4

Provenance

The following attestation bundles were made for benchforge-0.1.0.tar.gz:

Publisher: publish-pypi.yml on yeongseon/benchforge

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file benchforge-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: benchforge-0.1.0-py3-none-any.whl
  • Size: 44.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for benchforge-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        6b00fce3b8f27fcdd0f75dc4ba3101ce59d6c8a40b372e8f0e321706efffea6a
MD5           a270799be75cc8e83013e155ad81c659
BLAKE2b-256   f8bcb28640440b8f340f6c49f67b2b24d7b0ba42a15850e94bf2a53b45f8c5de

Provenance

The following attestation bundles were made for benchforge-0.1.0-py3-none-any.whl:

Publisher: publish-pypi.yml on yeongseon/benchforge

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
