

Project description


Benchmark Experiment Host Manager (Bexhoma)

Orchestrating Cloud-Native Benchmarking Experiments with Kubernetes

This Python tool helps manage benchmark experiments of Database Management Systems (DBMS) in a Kubernetes-based High-Performance Computing (HPC) cluster environment. It enables users to configure hardware/software setups for easily repeating tests over varying configurations.

It serves as the orchestrator [2] for distributed parallel benchmarking experiments in a Kubernetes cloud. It has been tested on Amazon Web Services, Google Cloud, Microsoft Azure, IBM Cloud, Oracle Cloud, and Minikube installations, running with Citus Data (Hyperscale), Clickhouse, CockroachDB, Exasol, IBM DB2, MariaDB, MariaDB Columnstore, MemSQL (SingleStore), MonetDB, MySQL, OmniSci (HEAVY.AI), Oracle DB, PostgreSQL, SQL Server, SAP HANA, TimescaleDB, Vertica, and YugabyteDB.

Benchmarks included are YCSB, TPC-H and TPC-C (HammerDB and Benchbase version).

The basic workflow is [1,2]: start a containerized version of the DBMS, install monitoring software, import data, run benchmarks, and shut everything down with a single command. A more advanced workflow is: plan a sequence of such experiments, run the plan as a batch, and join the results for comparison.
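The "plan a sequence of experiments" step amounts to enumerating a cross product of configuration options. The sketch below is purely illustrative (hypothetical names and values, not the bexhoma API):

```python
# Illustrative sketch only (hypothetical names, not the bexhoma API):
# planning a batch of experiments over varying configurations, so each
# run can later be executed and its results joined for comparison.
import itertools

dbms_list = ["PostgreSQL", "MySQL"]  # assumed example systems
scale_factors = [1, 10]              # assumed example data sizes
client_threads = [16, 64]            # assumed example driver settings

plan = [
    {"dbms": dbms, "sf": sf, "threads": t}
    for dbms, sf, t in itertools.product(dbms_list, scale_factors, client_threads)
]
print(len(plan))  # 2 * 2 * 2 = 8 experiment configurations in the batch
```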

It is also possible to scale out the drivers for generating and loading data and for benchmarking, to simulate cloud-native environments as in [4]. See example results as presented in A Cloud-Native Adoption of Classical DBMS Performance Benchmarks and Tools and how they are generated.

See the homepage and the documentation.

If you encounter any issues, please report them to our GitHub issue tracker.

Installation

  1. Download the repository: https://github.com/Beuth-Erdelt/Benchmark-Experiment-Host-Manager
  2. Install the package: pip install bexhoma
  3. Make sure you have a working kubectl installed.
    • (Also make sure you have access to a running Kubernetes cluster - for example Minikube)
    • (Also make sure you can create PVs via PVCs and dynamic provisioning)
  4. Adjust the configuration:
    1. Copy k8s-cluster.config to cluster.config
    2. Set the name of the context, the namespace, and the name of the cluster in that file
    3. Make sure the resultfolder is set to a folder that exists on your local filesystem
  5. Other components, like the shared data and result directories, the message queue, and the evaluator, are installed automatically when you start an experiment. You may want to adjust their configuration beforehand.
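Step 4.3 can be verified up front. The following sketch (the path is a made-up example, not a bexhoma default) creates the result folder if it does not yet exist:

```python
# Hypothetical pre-flight check (not part of bexhoma): make sure the
# resultfolder referenced in cluster.config exists on the local filesystem.
import os

resultfolder = os.path.expanduser("~/bexhoma-results")  # assumed example path
os.makedirs(resultfolder, exist_ok=True)  # create the folder if it is missing
print(os.path.isdir(resultfolder))  # True
```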

Quickstart

YCSB

  1. Run python ycsb.py -ms 1 -tr -sf 1 --workload a -dbms PostgreSQL -tb 16384 -nlp 1 -nlt 64 -nlf 4 -nbp 1,8 -nbt 64 -nbf 2,3 run.
    This installs PostgreSQL and runs YCSB workload A with a varying target. The driver is monolithic with 64 threads. The experiment runs a second time with the driver scaled out to 8 instances, each having 8 threads.
  2. You can watch status using bexperiments status while running.
  3. After benchmarking has finished, you will see a summary.
    For further inspection, run bexperiments dashboard to connect to a dashboard. You can open the dashboard in a browser at http://localhost:8050. Alternatively, you can open a Jupyter notebook at http://localhost:8888.
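The flags in step 1 encode the scale-out described above. As a quick arithmetic check (plain Python, independent of bexhoma), 64 total client threads split over 8 driver instances gives 8 threads per instance:

```python
# -nbp 1,8 with -nbt 64: first one monolithic driver, then 8 instances
# sharing the same total of 64 benchmarking threads.
total_threads = 64
for instances in (1, 8):
    print(instances, total_threads // instances)
# 1 instance -> 64 threads, 8 instances -> 8 threads each
```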

See more details at https://bexhoma.readthedocs.io/en/latest/Example-YCSB.html

HammerDB's TPC-C

  1. Run python hammerdb.py -ms 1 -tr -sf 16 -sd 5 -dbms PostgreSQL -nlt 16 -nbp 1,2 -nbt 16 run.
    This installs PostgreSQL and runs HammerDB's TPC-C with 16 warehouses. The driver is monolithic with 16 threads. The experiment runs a second time with the driver scaled out to 2 instances, each having 8 threads.
  2. You can watch status using bexperiments status while running.
  3. After benchmarking has finished, you will see a summary.
    For further inspection, run bexperiments dashboard to connect to a dashboard. You can open the dashboard in a browser at http://localhost:8050. Alternatively, you can open a Jupyter notebook at http://localhost:8888.

See more details at https://bexhoma.readthedocs.io/en/latest/Example-HammerDB.html

Benchbase's TPC-C

  1. Run python benchbase.py -ms 1 -tr -sf 16 -sd 5 -dbms PostgreSQL -nbp 1,2 -nbt 16 -nbf 16 -tb 1024 run.
    This installs PostgreSQL and runs Benchbase's TPC-C with 16 warehouses. The driver is monolithic with 16 threads. The experiment runs a second time with the driver scaled out to 2 instances, each having 8 threads.
  2. You can watch status using bexperiments status while running.
  3. After benchmarking has finished, you will see a summary.
    For further inspection, run bexperiments dashboard to connect to a dashboard. You can open the dashboard in a browser at http://localhost:8050. Alternatively, you can open a Jupyter notebook at http://localhost:8888.

See more details at https://bexhoma.readthedocs.io/en/latest/Example-Benchbase.html

TPC-H

  1. Run python tpch.py -ms 1 -dbms PostgreSQL run.
    This installs PostgreSQL and runs TPC-H at scale factor 1. The driver is monolithic.
  2. You can watch status using bexperiments status while running.
  3. After benchmarking has finished, you will see a summary.
    For further inspection, run bexperiments dashboard to connect to a dashboard. You can open a Jupyter notebook at http://localhost:8888.

See more details at https://bexhoma.readthedocs.io/en/latest/Example-TPC-H.html

More Information

For full power, use this tool as an orchestrator, as in [2]. It also starts a monitoring container using Prometheus and a metrics collector container using cAdvisor. For analytical use cases, the Python package dbmsbenchmarker [3] is used as query executor and evaluator, as in [1,2]. For transactional use cases, HammerDB's TPC-C, Benchbase's TPC-C, and YCSB are used as drivers for generating and loading data and for running the workload, as in [4].

See the images folder for more details.

Contributing, Bug Reports

If you have any questions or find a bug, please report them to our GitHub issue tracker. In any bug report, please let us know:

  • Which operating system and hardware (32 bit or 64 bit) you are using
  • Python version
  • Bexhoma version (or git commit/date)
  • Traceback that occurs (the full error message)

We are always looking for people interested in helping with code development, documentation writing, technical administration, and whatever else comes up. If you wish to contribute, please first read the contribution section or visit the documentation.

References

If you use Bexhoma in work contributing to a scientific publication, we kindly ask that you cite our application note [2] or [1]:

[1] A Framework for Supporting Repetition and Evaluation in the Process of Cloud-Based DBMS Performance Benchmarking

Erdelt P.K. (2021) A Framework for Supporting Repetition and Evaluation in the Process of Cloud-Based DBMS Performance Benchmarking. In: Nambiar R., Poess M. (eds) Performance Evaluation and Benchmarking. TPCTC 2020. Lecture Notes in Computer Science, vol 12752. Springer, Cham. https://doi.org/10.1007/978-3-030-84924-5_6

[2] Orchestrating DBMS Benchmarking in the Cloud with Kubernetes

Erdelt P.K. (2022) Orchestrating DBMS Benchmarking in the Cloud with Kubernetes. In: Nambiar R., Poess M. (eds) Performance Evaluation and Benchmarking. TPCTC 2021. Lecture Notes in Computer Science, vol 13169. Springer, Cham. https://doi.org/10.1007/978-3-030-94437-7_6

[3] DBMS-Benchmarker: Benchmark and Evaluate DBMS in Python

Erdelt P.K., Jestel J. (2022). DBMS-Benchmarker: Benchmark and Evaluate DBMS in Python. Journal of Open Source Software, 7(79), 4628 https://doi.org/10.21105/joss.04628

[4] A Cloud-Native Adoption of Classical DBMS Performance Benchmarks and Tools

Erdelt P.K. (2024) https://doi.org/10.1007/978-3-031-68031-1_9

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bexhoma-0.7.2.tar.gz (99.0 kB)

Uploaded Source

Built Distribution

bexhoma-0.7.2-py3-none-any.whl (103.5 kB)

Uploaded Python 3

File details

Details for the file bexhoma-0.7.2.tar.gz.

File metadata

  • Download URL: bexhoma-0.7.2.tar.gz
  • Upload date:
  • Size: 99.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for bexhoma-0.7.2.tar.gz:

  • SHA256: 8f9b02febe112a0edee1f979847468e47add720d5ad9abaf40a9d890441b4156
  • MD5: f8290f34bb7dc9771470d17f35f47212
  • BLAKE2b-256: a749fb8a216530c195641f17e8160e789d4c7966a228d0d1f6e103e84f8be913


File details

Details for the file bexhoma-0.7.2-py3-none-any.whl.

File metadata

  • Download URL: bexhoma-0.7.2-py3-none-any.whl
  • Upload date:
  • Size: 103.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for bexhoma-0.7.2-py3-none-any.whl:

  • SHA256: c1f12909988071577173cfc409841d9257a200e2a33568bfab71af5227c72d87
  • MD5: d35ab076e82247e9670eec7d993c2070
  • BLAKE2b-256: 8e650bce8e0a59b8d8eef01847b915d9f2a264df4491a793ac2bdefd7f1e66d2

