Database drift benchmarking for researchers, DB vendors, and new users: generate, validate, and run data/workload drift with CLI or MCP.

DriftBench

DriftBench is a toolkit for generating and replaying data drift and workload drift with DriftSpec.

Who uses DriftBench:

Researcher — design reproducible drift experiments and ablations.
Database Vendor / Performance Team — run drift regression checks across targets before release.
New User — start from validated examples and get first outputs quickly.

Version history: CHANGELOG · Production site: driftbench.com

Install

pip install -U driftbench-db

Or from source:

git clone https://github.com/Liuguanli/DriftBench.git
cd DriftBench
pip install -e .

Verify:

driftbench --help
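
The same check can be scripted from Python. This is a minimal sketch that assumes only the distribution name `driftbench-db` from the install command above; `installed_version` is a hypothetical helper, not part of DriftBench.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("driftbench-db")
print(found if found else "driftbench-db is not installed")
```

Because the releases listed for this package are pre-releases, printing the version is a quick way to confirm which beta is on the path.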

Benchmark Adapters (`driftbench.data`)

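Nine adapters generate real data files and SQL query workloads without external tools that you have to install by hand. TPC-H with mode="generate" downloads and builds dbgen automatically on first use, and benchbase emits XML configs and shell scripts to be run against a live database.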

Adapter	Workload type	Data format	Tables	Queries
`tpch`	OLAP	`.tbl` (pipe-delimited)	8	22 SQL via qgen
`tpcds`	OLAP / Decision support	`.dat` (pipe-delimited)	5 synthetic	99 query IDs
`tpcc`	OLTP	`.csv`	9	5 transaction types
`tpcc_skew`	OLTP + hotspot	`.csv` + weight manifest	9	5 transaction types
`job`	OLAP / join-order	`.csv`	11 (IMDB-like)	20 SQL templates
`ycsb`	Key-value	`.csv`	1	6 workload mixes (A–F)
`dsb`	Decision support	`.csv`	3 star-schema	3 SQL templates
`pgbench`	TPC-B (OLTP)	`.csv`	4	3 workloads
`benchbase`	Multi-benchmark	XML + shell script	via live DB	10 benchmarks

Generate data and queries

from pathlib import Path
from driftbench.data.tpch import data as tpch_data, queries as tpch_queries
from driftbench.data.tpcds import data as tpcds_data, queries as tpcds_queries
from driftbench.data.tpcc import data as tpcc_data, queries as tpcc_queries
from driftbench.data.tpcc_skew import data as tpcc_skew_data, queries as tpcc_skew_queries
from driftbench.data.job import data as job_data, queries as job_queries
from driftbench.data.ycsb import data as ycsb_data, queries as ycsb_queries
from driftbench.data.dsb import data as dsb_data, queries as dsb_queries
from driftbench.data.pgbench import data as pgbench_data, queries as pgbench_queries
from driftbench.data.benchbase import data as bb_data, queries as bb_queries

out = Path("./artifacts")

# TPC-H — auto-builds dbgen on first use; converts .tbl to .csv with .as_csv()
tpch_data(scale_factor=1, mode="generate").generate(output_dir=out)
tpch_queries(query_ids=[1, 3, 5], queries_per_template=2).generate(output_dir=out)

# TPC-DS — synthetic .dat files; converts to .csv with .as_csv()
tpcds_data(scale_factor=10).generate(output_dir=out)
tpcds_queries().generate(output_dir=out)

# TPC-C — scale_factor = number of warehouses
tpcc_data(scale_factor=4).generate(output_dir=out)
tpcc_queries().generate(output_dir=out)

# TPC-C Skew — Zipf hot-warehouse access distribution
tpcc_skew_data(scale_factor=10, hot_warehouse_fraction=0.2, skew_factor=0.99).generate(output_dir=out)
tpcc_skew_queries(scale_factor=10, hot_warehouse_fraction=0.2).generate(output_dir=out)

# JOB, YCSB, DSB, pgbench
job_data(scale_factor=1).generate(output_dir=out)
ycsb_data(scale_factor=1).generate(output_dir=out)
ycsb_queries(workload="B").generate(output_dir=out)
dsb_data(scale_factor=10).generate(output_dir=out)
pgbench_data(scale_factor=1).generate(output_dir=out)
pgbench_queries(workload="tpcb").generate(output_dir=out)

# BenchBase — generates XML configs + shell scripts for a live database
bb_data(benchmark="tpcc", scale_factor=10).generate(output_dir=out)
bb_queries(benchmark="tpcc", terminals=8, duration=120).generate(output_dir=out)
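
To make the TPC-C Skew parameters above more concrete, here is a small sketch of Zipf-weighted warehouse selection. It is not DriftBench's implementation: the functions `zipf_weights` and `sample_warehouses` are hypothetical, and reading `skew_factor` as the Zipf exponent and `hot_warehouse_fraction` as "the top share of warehouses by access count" is an assumption.

```python
import random
from collections import Counter


def zipf_weights(n_items: int, skew: float) -> list[float]:
    """Normalized Zipf weights: rank r gets weight proportional to 1 / r**skew."""
    raw = [1.0 / (rank ** skew) for rank in range(1, n_items + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_warehouses(n_warehouses: int, skew: float, n_samples: int, seed: int) -> Counter:
    """Draw 1-based warehouse IDs under a Zipf distribution with a fixed seed."""
    rng = random.Random(seed)
    ids = list(range(1, n_warehouses + 1))
    picks = rng.choices(ids, weights=zipf_weights(n_warehouses, skew), k=n_samples)
    return Counter(picks)


counts = sample_warehouses(n_warehouses=10, skew=0.99, n_samples=10_000, seed=42)
# Assumed reading of hot_warehouse_fraction=0.2: the 2 busiest of 10 warehouses.
hot = sorted(counts, key=counts.get, reverse=True)[:2]
print("hottest warehouses:", hot)
```

With an exponent close to 1, the first few warehouses absorb a large share of the accesses, which is the hotspot behaviour the adapter name suggests.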

Output layout

artifacts/
  tpch/data/sf_1/tables/   tpch/queries/
  tpcds/data/              tpcds/queries/
  tpcc/data/               tpcc/queries/
  tpcc_skew/data/          tpcc_skew/queries/
  job/data/                job/queries/
  ycsb/data/               ycsb/queries/
  dsb/data/                dsb/queries/
  pgbench/data/            pgbench/queries/
  benchbase/tpcc/data/     benchbase/tpcc/queries/

Each folder contains a *_manifest.json listing the generated files.
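
The manifests can be collected with the standard library alone. The sketch below assumes only what is stated above (files named `*_manifest.json` somewhere under the output directory); the manifest schema is not documented here, so the code reports just the top-level shape. `load_manifests` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_manifests(root: Path) -> dict[Path, object]:
    """Map every *_manifest.json under root to its parsed JSON content."""
    manifests: dict[Path, object] = {}
    if not root.is_dir():
        return manifests
    for path in sorted(root.rglob("*_manifest.json")):
        with path.open(encoding="utf-8") as handle:
            manifests[path] = json.load(handle)
    return manifests


for path, content in load_manifests(Path("./artifacts")).items():
    # Only the top-level keys (or the JSON type) are printed, since the schema is unknown.
    summary = sorted(content) if isinstance(content, dict) else type(content).__name__
    print(path, summary)
```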

GenerationResult

generate() returns a GenerationResult:

result = tpch_data(scale_factor=1, mode="generate").generate(output_dir=out)
result.files      # list of generated file paths
result.metadata   # path to the manifest JSON

# Convert pipe-delimited .tbl / .dat to standard CSV (both kept on disk).
# Known TPC-H (8 tables) and TPC-DS (5 synthetic tables) get a proper
# header row, so the CSV is self-describing and usable directly by .drift().
csv_result = result.as_csv()
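
For readers who want to see what such a conversion involves, here is a standalone sketch. It is not the code behind .as_csv(); `tbl_to_csv` is a hypothetical helper, and the header must be supplied by the caller. It relies on the fact that dbgen ends every record with a trailing pipe.

```python
import csv
from pathlib import Path


def tbl_to_csv(tbl_path: Path, csv_path: Path, header: list[str]) -> int:
    """Convert a pipe-delimited file to CSV with a header row; return the row count."""
    rows = 0
    with tbl_path.open(newline="", encoding="utf-8") as src, \
         csv_path.open("w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        for line in src:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            # dbgen-style records end with a trailing "|"; drop it before splitting.
            if line.endswith("|"):
                line = line[:-1]
            writer.writerow(line.split("|"))
            rows += 1
    return rows
```

Writing through `csv.writer` rather than replacing pipes with commas matters because free-text columns may themselves contain commas, which then need quoting.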

A second call reuses the existing files automatically; pass force=True to regenerate them.

Applying drift to benchmark data

GenerationResult exposes .drift() and .drift_multi() to apply data drift directly — no manual schema extraction or generator setup needed.

Single-table drift:

from driftbench.data.tpch import TPCHData

result = TPCHData(scale_factor=1, source_dir="path/to/tbls").generate().as_csv()

# Inject outliers into lineitem.l_quantity
drifted = result.drift("lineitem", "outlier_injection", column="l_quantity", n=500)

# Skew the price/discount distribution
drifted = result.drift("lineitem", "value_skew",
                       columns=["l_extendedprice", "l_discount"], skewness=2)

By default, drift() writes the drifted CSV to <output_dir>/<table>_<drift_type>.csv; pass output_path= to override the location. It returns a new GenerationResult that points at the drifted file.
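
As an illustration of what an outlier-injection drift does to a CSV, the sketch below rewrites one numeric column in a fixed number of randomly chosen rows. This is not DriftBench's algorithm: `inject_outliers`, its `factor` parameter, and the choice to multiply values are all assumptions made for the example.

```python
import csv
import random
from pathlib import Path


def inject_outliers(src: Path, dst: Path, column: str, n: int,
                    factor: float = 10.0, seed: int = 0) -> list[int]:
    """Multiply `column` by `factor` in n randomly chosen rows; return their indices."""
    with src.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames or []
        rows = list(reader)

    rng = random.Random(seed)  # a fixed seed keeps the choice of rows repeatable
    chosen = sorted(rng.sample(range(len(rows)), k=min(n, len(rows))))
    for index in chosen:
        rows[index][column] = str(float(rows[index][column]) * factor)

    with dst.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return chosen
```

Following the naming convention described above, a call such as `inject_outliers(Path("lineitem.csv"), Path("lineitem_outlier_injection.csv"), "l_quantity", 500)` leaves the source file untouched and writes the drifted copy beside it.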

Every .drift() call also emits a reproducible DriftSpec YAML (<output_stem>.driftspec.yaml) next to the CSV. The YAML is kept out of result.files but recorded under the manifest's driftspec key. Running it through driftbench.spec.core.run_all regenerates byte-identical output, so a drift created from Python can be shared or automated as a spec without rework. The function-call path (fast, imperative) and the spec path (declarative, version-controllable, reproducible) use the same engine and produce identical results for the same seed and parameters.
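
The byte-identical guarantee rests on seeding: the same seed and parameters must yield the same bytes. The toy example below shows the idea with a seeded random generator and a SHA-256 digest. `drifted_bytes` is a made-up stand-in for a drift operator, not DriftBench code.

```python
import hashlib
import random


def drifted_bytes(values: list[float], skewness: float, seed: int) -> bytes:
    """Toy drift: scale each value by a seeded random factor and serialize as CSV text."""
    rng = random.Random(seed)
    lines = ["value"]
    for value in values:
        factor = 1.0 + skewness * rng.random()
        lines.append(repr(value * factor))
    return ("\n".join(lines) + "\n").encode("utf-8")


def digest(payload: bytes) -> str:
    """Hex SHA-256 digest, used here to compare two outputs byte for byte."""
    return hashlib.sha256(payload).hexdigest()


values = [1.0, 2.0, 3.0, 4.0]
first = digest(drifted_bytes(values, skewness=2.0, seed=7))
second = digest(drifted_bytes(values, skewness=2.0, seed=7))
other = digest(drifted_bytes(values, skewness=2.0, seed=8))
print("same seed matches:", first == second)
print("different seed matches:", first == other)
```

Comparing digests in this way is also a practical check when confirming that a regenerated drift output matches the original file.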

Multi-table drift:

# FK relationships for tpch / job are wired automatically
drifted = result.drift_multi([
    {"op": "skew_column", "target": "lineitem", "column": "l_quantity",
     "fraction": 0.2, "skewness": 2},
    {"op": "delete_keys", "target": "orders", "key_column": "o_orderkey",
     "fraction": 0.05,
     "propagate": [{"relationship": "lineitem_orders", "policy": "drop"}]},
])

Pass relationships=[] or a custom list to override the built-in FK maps. Supported benchmarks with auto-wiring: tpch, job. tpcc and tpcc_skew require explicit relationship definitions because their joins use composite keys.
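
The delete_keys operation with a "drop" propagation policy can be pictured as a cascading delete. The sketch below works on in-memory rows and is not DriftBench's implementation: `delete_keys_with_drop` is hypothetical, and it assumes a single-column key with unique parent values.

```python
import random


def delete_keys_with_drop(parents: list[dict], children: list[dict],
                          parent_key: str, child_fk: str,
                          fraction: float, seed: int = 0):
    """Delete a fraction of parent rows, then drop child rows that reference them."""
    rng = random.Random(seed)
    n_delete = round(len(parents) * fraction)
    doomed = set(rng.sample([row[parent_key] for row in parents], k=n_delete))

    kept_parents = [row for row in parents if row[parent_key] not in doomed]
    # "drop" policy: remove orphaned children so no dangling foreign keys remain.
    kept_children = [row for row in children if row[child_fk] not in doomed]
    return kept_parents, kept_children, doomed


orders = [{"o_orderkey": k} for k in range(1, 21)]
lineitem = [{"l_orderkey": k, "l_linenumber": n} for k in range(1, 21) for n in (1, 2, 3)]
kept_orders, kept_lineitem, removed = delete_keys_with_drop(
    orders, lineitem, "o_orderkey", "l_orderkey", fraction=0.05)
print(len(kept_orders), len(kept_lineitem))  # prints: 19 57
```

A composite key, as used by the TPC-C joins mentioned above, would need a tuple of columns in place of the single key column, which is one reason explicit relationship definitions are required there.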

Ready-to-run DriftSpec YAML examples for five of the adapters are in driftspec/examples/:

tpch_lineitem_drift.yaml
tpcc_drift.yaml
job_drift.yaml
ycsb_drift.yaml
pgbench_drift.yaml

CLI Quickstart

# Validate a DriftSpec
python -m driftbench.cli validate-spec driftspec/examples/demo_data_single.yaml --json

# Dry-run (preview execution plan)
python -m driftbench.cli dry-run driftspec/examples/demo_data_single.yaml --json

# Execute
python -m driftbench.cli run-yaml driftspec/examples/demo_data_single.yaml
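
The three commands can be chained from a script. The sketch below uses only the subcommands and the --json flag shown above. It assumes, without confirmation from the documentation, that the CLI exits with a non-zero status on failure; `cli_command` and `validate_then_run` are hypothetical helpers.

```python
import subprocess
import sys


def cli_command(subcommand: str, spec_path: str, json_output: bool = False) -> list[str]:
    """Build the argv for one of the CLI invocations shown above."""
    argv = [sys.executable, "-m", "driftbench.cli", subcommand, spec_path]
    if json_output:
        argv.append("--json")
    return argv


def validate_then_run(spec_path: str) -> int:
    """Run validate-spec, dry-run, then run-yaml; stop at the first failing step."""
    steps = [
        cli_command("validate-spec", spec_path, json_output=True),
        cli_command("dry-run", spec_path, json_output=True),
        cli_command("run-yaml", spec_path),
    ]
    for argv in steps:
        completed = subprocess.run(argv)
        # Assumption: a non-zero exit status signals failure.
        if completed.returncode != 0:
            return completed.returncode
    return 0
```

Using `sys.executable` instead of a bare `python` keeps the subprocess in the same interpreter and virtual environment where the package was installed.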

Python API

from driftbench import run_spec, trace_to_spec

run_spec("driftspec/examples/demo_data_single.yaml")
trace_to_spec("driftspec/trace_inputs/trace_data_mock.csv", "driftspec/generated/from_trace.yaml")

MCP Server

python3 -m driftbench_mcp.server

Core workflow via MCP: trace_to_spec → validate_spec → run_spec → list_outputs

Testing

python -m unittest discover -s test -p 'test_*.py' -v

License

MIT — see LICENSE.

Release history

0.1.0b8 (pre-release, this version): May 20, 2026
0.1.0b7.post1 (pre-release): May 15, 2026
0.1.0b7 (pre-release): May 15, 2026
0.1.0b6 (pre-release): May 14, 2026
0.1.0b5 (pre-release): May 11, 2026
0.1.0b4 (pre-release): May 10, 2026
0.1.0b3 (pre-release): May 9, 2026
0.1.0b2 (pre-release): May 9, 2026
0.1.0b1 (pre-release): May 9, 2026

Download files

Source Distribution

driftbench_db-0.1.0b8.tar.gz (192.4 kB), uploaded May 20, 2026 (Source)

Built Distribution

driftbench_db-0.1.0b8-py3-none-any.whl (205.1 kB), uploaded May 20, 2026 (Python 3)

File details

Details for the file driftbench_db-0.1.0b8.tar.gz.

File metadata

Download URL: driftbench_db-0.1.0b8.tar.gz
Upload date: May 20, 2026
Size: 192.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for driftbench_db-0.1.0b8.tar.gz
Algorithm	Hash digest
SHA256	`bb0c880940667fccb932f57345a4adbbefb63981b1bc710733ca58d8f9400b25`
MD5	`210cae29dc47edc95c94cec448057fdb`
BLAKE2b-256	`75cc107d4015ed10d9ff8eda7bd3a87c3279e325dab44bee1a128ce1546270a0`

File details

Details for the file driftbench_db-0.1.0b8-py3-none-any.whl.

File metadata

Download URL: driftbench_db-0.1.0b8-py3-none-any.whl
Upload date: May 20, 2026
Size: 205.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for driftbench_db-0.1.0b8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8dfd5ba52f4fbe96832ee409967d74de55854f400901c9543e22bdb02d3c8547`
MD5	`703db862ea54dcc198288ff40d55b2bf`
BLAKE2b-256	`f3300efc7b55d5f911717e3f7d143b2a4fa17bb02f09890cc7d1f8960d1c29f8`
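
A downloaded file can be checked against the SHA256 digests listed above with the standard library. This is a generic sketch: `sha256_of` and `matches` are hypothetical helper names, and the file is read in chunks so that large archives do not need to fit in memory.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with the published one, ignoring letter case."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For the wheel listed above, the call would be `matches(Path("driftbench_db-0.1.0b8-py3-none-any.whl"), "8dfd5ba52f4fbe96832ee409967d74de55854f400901c9543e22bdb02d3c8547")`, run from the directory that holds the download.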
