Database drift benchmarking for researchers, DB vendors, and new users: generate, validate, and run data/workload drift with CLI or MCP.
DriftBench
DriftBench is a toolkit for generating and replaying data drift and workload drift with DriftSpec.
Who uses DriftBench:
- Researcher — design reproducible drift experiments and ablations.
- Database Vendor / Performance Team — run drift regression checks across targets before release.
- New User — start from validated examples and get first outputs quickly.
Version history: CHANGELOG · Production site: driftbench.com
Install
pip install -U driftbench-db
Or from source:
git clone https://github.com/Liuguanli/DriftBench.git
cd DriftBench
pip install -e .
Verify:
driftbench --help
Benchmark Adapters (driftbench.data)
Nine adapters write data files and SQL query workloads to disk without requiring you to install external tools yourself (with TPC-H mode="generate", dbgen is downloaded and built automatically on first use).
| Adapter | Workload type | Data format | Tables | Queries |
|---|---|---|---|---|
| tpch | OLAP | .tbl (pipe-delimited) | 8 | 22 SQL via qgen |
| tpcds | OLAP / Decision support | .dat (pipe-delimited) | 5 synthetic | 99 query IDs |
| tpcc | OLTP | .csv | 9 | 5 transaction types |
| tpcc_skew | OLTP + hotspot | .csv + weight manifest | 9 | 5 transaction types |
| job | OLAP / join-order | .csv | 11 (IMDB-like) | 20 SQL templates |
| ycsb | Key-value | .csv | 1 | 6 workload mixes (A–F) |
| dsb | Decision support | .csv | 3 star-schema | 3 SQL templates |
| pgbench | TPC-B (OLTP) | .csv | 4 | 3 workloads |
| benchbase | Multi-benchmark | XML + shell script | via live DB | 10 benchmarks |
Generate data and queries
from pathlib import Path
from driftbench.data.tpch import data as tpch_data, queries as tpch_queries
from driftbench.data.tpcds import data as tpcds_data, queries as tpcds_queries
from driftbench.data.tpcc import data as tpcc_data, queries as tpcc_queries
from driftbench.data.tpcc_skew import data as tpcc_skew_data, queries as tpcc_skew_queries
from driftbench.data.job import data as job_data, queries as job_queries
from driftbench.data.ycsb import data as ycsb_data, queries as ycsb_queries
from driftbench.data.dsb import data as dsb_data, queries as dsb_queries
from driftbench.data.pgbench import data as pgbench_data, queries as pgbench_queries
from driftbench.data.benchbase import data as bb_data, queries as bb_queries
out = Path("./artifacts")
# TPC-H — auto-builds dbgen on first use; converts .tbl to .csv with .as_csv()
tpch_data(scale_factor=1, mode="generate").generate(output_dir=out)
tpch_queries(query_ids=[1, 3, 5], queries_per_template=2).generate(output_dir=out)
# TPC-DS — synthetic .dat files; converts to .csv with .as_csv()
tpcds_data(scale_factor=10).generate(output_dir=out)
tpcds_queries().generate(output_dir=out)
# TPC-C — scale_factor = number of warehouses
tpcc_data(scale_factor=4).generate(output_dir=out)
tpcc_queries().generate(output_dir=out)
# TPC-C Skew — Zipf hot-warehouse access distribution
tpcc_skew_data(scale_factor=10, hot_warehouse_fraction=0.2, skew_factor=0.99).generate(output_dir=out)
tpcc_skew_queries(scale_factor=10, hot_warehouse_fraction=0.2).generate(output_dir=out)
# JOB, YCSB, DSB, pgbench
job_data(scale_factor=1).generate(output_dir=out)
ycsb_data(scale_factor=1).generate(output_dir=out)
ycsb_queries(workload="B").generate(output_dir=out)
dsb_data(scale_factor=10).generate(output_dir=out)
pgbench_data(scale_factor=1).generate(output_dir=out)
pgbench_queries(workload="tpcb").generate(output_dir=out)
# BenchBase — generates XML configs + shell scripts for a live database
bb_data(benchmark="tpcc", scale_factor=10).generate(output_dir=out)
bb_queries(benchmark="tpcc", terminals=8, duration=120).generate(output_dir=out)
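The tpcc_skew adapter is described above as using a Zipf hot-warehouse access distribution. How DriftBench computes it is not shown in this README, so the following is only a standalone sketch of the general idea. The helper names `zipf_weights` and `sample_warehouses` are hypothetical, and reading `skew_factor` as the Zipf exponent is an assumption. `hot_warehouse_fraction` is not modeled here.

```python
import random


def zipf_weights(num_warehouses: int, skew_factor: float) -> list[float]:
    """Normalized Zipf weights: rank r gets weight proportional to 1 / r**skew_factor.

    Assumption: skew_factor plays the role of the Zipf exponent. This is an
    illustration, not DriftBench's own weighting code.
    """
    if num_warehouses < 1:
        raise ValueError("num_warehouses must be >= 1")
    raw = [1.0 / (rank ** skew_factor) for rank in range(1, num_warehouses + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_warehouses(num_warehouses: int, skew_factor: float, n: int, seed: int = 0) -> list[int]:
    """Draw n warehouse IDs (1-based) according to the Zipf weights."""
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    ids = list(range(1, num_warehouses + 1))
    return rng.choices(ids, weights=zipf_weights(num_warehouses, skew_factor), k=n)
```

With an exponent of 0 every warehouse gets the same weight. As the exponent grows, accesses concentrate on the lowest-ranked warehouses.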
Output layout
artifacts/
tpch/data/sf_1/tables/ tpch/queries/
tpcds/data/ tpcds/queries/
tpcc/data/ tpcc/queries/
tpcc_skew/data/ tpcc_skew/queries/
job/data/ job/queries/
ycsb/data/ ycsb/queries/
dsb/data/ dsb/queries/
pgbench/data/ pgbench/queries/
benchbase/tpcc/data/ benchbase/tpcc/queries/
Each folder contains a *_manifest.json listing the generated files.
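To inspect what was generated, one option is to collect every manifest under the output directory. This is a minimal sketch using only the standard library. `load_manifests` is a hypothetical helper, and since the manifest schema is not documented here, the JSON is returned as parsed without interpreting its fields.

```python
import json
from pathlib import Path


def load_manifests(root):
    """Map each *_manifest.json path found under root to its parsed JSON content."""
    manifests = {}
    for path in sorted(Path(root).rglob("*_manifest.json")):
        with path.open(encoding="utf-8") as fh:
            manifests[path] = json.load(fh)
    return manifests


if __name__ == "__main__":
    # Print the location of every manifest under ./artifacts (if any exist).
    for manifest_path in load_manifests("./artifacts"):
        print(manifest_path)
```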
GenerationResult
generate() returns a GenerationResult:
result = tpch_data(scale_factor=1, mode="generate").generate(output_dir=out)
result.files # list of generated file paths
result.metadata # path to the manifest JSON
# Convert pipe-delimited .tbl / .dat to standard CSV (both kept on disk)
csv_result = result.as_csv()
A second call reuses the existing files automatically. Pass force=True to regenerate.
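For readers curious what a pipe-delimited to CSV conversion involves, here is a rough standalone sketch. It is not the code behind `.as_csv()`. `tbl_to_csv` is a hypothetical helper, and it assumes dbgen-style rows that end with a trailing `|`. The source file is left in place, matching the "both kept on disk" behavior described above.

```python
import csv
from pathlib import Path


def tbl_to_csv(src, dst, delimiter="|"):
    """Convert a pipe-delimited file to standard CSV; return the number of rows written.

    Assumption: each row ends with a trailing delimiter (as dbgen .tbl files do),
    so an empty final field is dropped.
    """
    rows = 0
    with Path(src).open(newline="", encoding="utf-8") as fin, \
         Path(dst).open("w", newline="", encoding="utf-8") as fout:
        # QUOTE_NONE: treat quote characters in the input as ordinary data.
        reader = csv.reader(fin, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        writer = csv.writer(fout)  # quotes fields containing commas as needed
        for fields in reader:
            if not fields:
                continue  # skip blank lines
            if fields[-1] == "":
                fields = fields[:-1]  # drop the artifact of the trailing delimiter
            writer.writerow(fields)
            rows += 1
    return rows
```

Note that a file whose last column is legitimately empty and has no trailing delimiter would lose that column under this assumption.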
CLI Quickstart
# Validate a DriftSpec
python -m driftbench.cli validate-spec driftspec/examples/demo_data_single.yaml --json
# Dry-run (preview execution plan)
python -m driftbench.cli dry-run driftspec/examples/demo_data_single.yaml --json
# Execute
python -m driftbench.cli run-yaml driftspec/examples/demo_data_single.yaml
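The three steps above can also be chained from a script. The sketch below only uses the subcommands and the --json flag shown in this section. `build_commands` and `run_pipeline` are hypothetical helpers, and stopping on a non-zero exit code assumes the CLI signals failure that way, which this README does not state.

```python
import subprocess
import sys


def build_commands(spec_path):
    """Return the validate, dry-run, and run commands for one DriftSpec file."""
    base = [sys.executable, "-m", "driftbench.cli"]
    return [
        base + ["validate-spec", spec_path, "--json"],
        base + ["dry-run", spec_path, "--json"],
        base + ["run-yaml", spec_path],
    ]


def run_pipeline(spec_path, runner=subprocess.run):
    """Run the steps in order and stop at the first non-zero exit code.

    Assumption: a failed step returns a non-zero exit code.
    Returns the list of exit codes for the steps that were executed.
    """
    codes = []
    for cmd in build_commands(spec_path):
        completed = runner(cmd)
        codes.append(completed.returncode)
        if completed.returncode != 0:
            break
    return codes


if __name__ == "__main__":
    print(run_pipeline("driftspec/examples/demo_data_single.yaml"))
```

The `runner` parameter exists so the sequence can be exercised without DriftBench installed, for example by passing a stub in a test.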
Python API
from driftbench import run_spec, trace_to_spec
run_spec("driftspec/examples/demo_data_single.yaml")
trace_to_spec("driftspec/trace_inputs/trace_data_mock.csv", "driftspec/generated/from_trace.yaml")
MCP Server
python3 -m driftbench_mcp.server
Core workflow via MCP: trace_to_spec → validate_spec → run_spec → list_outputs
Testing
python -m unittest discover -s test -p 'test_*.py' -v
License
MIT — see LICENSE.