Drift-aware database benchmarking — generate, share, and replay data and workload drift via DriftSpec.
DriftBench
DriftBench is a toolkit for generating and replaying data drift and workload drift with DriftSpec.
This README is intentionally focused on how to use the latest DriftBench.
Web Frontend
- Production site: driftbench.com
- Frontend source repo: driftbench-web
Install (Latest)
From PyPI (recommended)
python3 -m pip install -U driftbench-db
From source (latest main)
git clone https://github.com/Liuguanli/DriftBench.git
cd DriftBench
python3 -m pip install -e .
Verify installation
driftbench --help
driftbench-service --help
driftbench-mcp --help
CLI Quickstart
Most users can follow this flow:
# 1) Validate a DriftSpec
python -m driftbench.cli validate-spec driftspec/examples/demo_data_single.yaml --json
# 2) Preview execution plan
python -m driftbench.cli dry-run driftspec/examples/demo_data_single.yaml --json
# 3) Execute
python -m driftbench.cli run-yaml driftspec/examples/demo_data_single.yaml
# 4) Inspect outputs
python -m driftbench.cli list-outputs --root output --glob "**/*" --limit 30 --json
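The four steps above can also be scripted. The following is a minimal sketch, assuming the subcommands and flags behave exactly as shown in the quickstart. The helper names `quickstart_commands` and `run_quickstart` are illustrative and not part of DriftBench, and stopping at the first failing step is a design choice of this sketch.

```python
import subprocess
import sys


def quickstart_commands(spec_path, output_root="output"):
    """Return the four quickstart invocations as argv lists (hypothetical helper)."""
    cli = [sys.executable, "-m", "driftbench.cli"]
    return [
        cli + ["validate-spec", spec_path, "--json"],
        cli + ["dry-run", spec_path, "--json"],
        cli + ["run-yaml", spec_path],
        cli + ["list-outputs", "--root", output_root, "--glob", "**/*",
               "--limit", "30", "--json"],
    ]


def run_quickstart(spec_path):
    """Run each step in order; check=True raises on the first non-zero exit."""
    for argv in quickstart_commands(spec_path):
        subprocess.run(argv, check=True)
```

With DriftBench installed, calling run_quickstart("driftspec/examples/demo_data_single.yaml") should reproduce the manual flow. Passing argv lists rather than a shell string keeps the "**/*" glob from being expanded by the shell.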
Trace to DriftSpec
python -m driftbench.cli trace-to-spec \
driftspec/trace_inputs/trace_data_mock.csv \
driftspec/generated/from_trace.yaml \
--trace-type data
MCP Quickstart
Start MCP server (stdio):
python3 -m driftbench_mcp.server
Client config template:
docs/mcp_config_example.json
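Several MCP clients read a JSON file with an `mcpServers` map that tells them how to launch a stdio server. The sketch below builds such an entry for the start command shown above. It assumes that common layout; the server label "driftbench" and the helper name `stdio_server_config` are arbitrary, and docs/mcp_config_example.json remains the project's own template and may differ.

```python
import json


def stdio_server_config(name="driftbench"):
    """Build a client config entry that launches the MCP server over stdio.

    Assumption: the client uses the common `mcpServers` layout. Check
    docs/mcp_config_example.json for the project's own template.
    """
    return {
        "mcpServers": {
            name: {
                "command": "python3",
                "args": ["-m", "driftbench_mcp.server"],
            }
        }
    }


# Print the entry so it can be pasted into a client config file.
print(json.dumps(stdio_server_config(), indent=2))
```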
Minimal MCP guide:
docs/p0_mcp_server_minimal.md
Core MCP workflow:
trace_to_spec -> validate_spec -> run_spec -> list_outputs
Spec sharing tools:
save_spec, list_public_specs, import_spec_and_run
MCP Chat Demo (Codex / Claude Code)
After MCP is configured, the best pattern is to give your assistant a case type plus the change you want to simulate.
Case A: Data Drift (data changes)
Use when you care about data size/distribution changes (scaling, skew, outliers, updates).
[Prompt: Data Drift]
Read docs/p0_integration_quickstart.md.
I want a DATA drift case on <my dataset path>.
Goal: <e.g., scale 2x + stronger skew on column amount>.
Please use MCP tools to:
1) build a DriftSpec (or trace_to_spec if needed),
2) validate it,
3) run it,
4) list outputs.
Then summarize what data files were generated and what changed.
Case B: Workload Drift (query changes)
Use when you care about query behavior changes (predicate distribution, selectivity, structure, payload).
[Prompt: Workload Drift]
I want a WORKLOAD drift case.
Query goal: <e.g., predicates shift from uniform to city-focused, selectivity from 10% to 60%>.
Please create/run a spec via MCP and report:
- generated workload files,
- how query distribution/selectivity changed,
- suggested next workload variant.
Temporal Overlay (applied on top of Case A or B)
Temporal drift is usually an overlay, not a standalone base case. Use it to add time evolution (uniform / periodic / trend / long-tail) on top of data drift or workload drift.
[Prompt: Temporal Overlay]
Take my <DATA or WORKLOAD> drift case and add TEMPORAL pattern <uniform|periodic|trend|long_tail>.
Please run the MCP workflow and summarize:
1) generated spec path,
2) output artifacts,
3) expected temporal behavior in plain language,
4) how temporal behavior changes the base (data/workload) case.
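To build intuition for the four pattern names, the sketch below turns each one into a list of per-step weights (for example, the share of rows or queries issued at each time step). This is one plausible reading of the names and not DriftBench's implementation; the function `temporal_weights` and its `period` and `tail_exponent` parameters are hypothetical.

```python
import math


def temporal_weights(pattern, steps, period=4, tail_exponent=1.0):
    """Return per-step weights that sum to 1.0 for an illustrative pattern."""
    if steps <= 0:
        raise ValueError("steps must be positive")
    if pattern == "uniform":
        # Same load at every step.
        raw = [1.0] * steps
    elif pattern == "periodic":
        # Sine wave shifted upward so every weight stays positive.
        raw = [1.5 + math.sin(2 * math.pi * t / period) for t in range(steps)]
    elif pattern == "trend":
        # Load grows linearly over time.
        raw = [float(t + 1) for t in range(steps)]
    elif pattern == "long_tail":
        # Heavy load early, then a slowly decaying tail.
        raw = [1.0 / (t + 1) ** tail_exponent for t in range(steps)]
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    total = sum(raw)
    return [w / total for w in raw]
```

For instance, temporal_weights("trend", 4) yields the weights 0.1, 0.2, 0.3, 0.4, so the last step carries four times the load of the first.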
What users should expect
- The assistant executes MCP tools in order (trace_to_spec/build_spec -> validate_spec -> run_spec -> list_outputs).
- You get concrete artifact paths (generated YAML + output files).
- You get a short interpretation of what changed for your selected case (data/query), plus temporal overlay effects when requested.
- You usually get one or two suggested next iterations for deeper benchmarking.
Python API (Stable Entry Points)
Use top-level APIs instead of internal modules:
from driftbench import run_spec, trace_to_spec, get_schema_extractor
run_spec("driftspec/examples/demo_data_single.yaml")
trace_to_spec("driftspec/trace_inputs/trace_data_mock.csv", "driftspec/generated/from_trace.yaml")
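The two calls above can be chained so that a trace is converted and the resulting spec is executed in one step. The sketch below assumes that trace_to_spec writes the spec to the given output path and that run_spec accepts that path, as the snippet suggests. The README does not document return values, so the helper `trace_then_run` (a hypothetical name) simply passes through whatever run_spec returns. The two functions are injected as arguments so the helper can be tried with stand-ins.

```python
from pathlib import Path


def trace_then_run(trace_csv, spec_out, trace_to_spec, run_spec):
    """Convert a trace to a DriftSpec, then execute the generated spec.

    `trace_to_spec` and `run_spec` are passed in; in real use, import them
    from the top-level `driftbench` package.
    """
    spec_path = Path(spec_out)
    # Make sure the target directory exists before the spec is written.
    spec_path.parent.mkdir(parents=True, exist_ok=True)
    trace_to_spec(str(trace_csv), str(spec_path))
    # Assumption: a successful conversion leaves a file at spec_out.
    if not spec_path.exists():
        raise FileNotFoundError(f"expected a generated spec at {spec_path}")
    return run_spec(str(spec_path))


# Real use (requires driftbench to be installed):
#   from driftbench import run_spec, trace_to_spec
#   trace_then_run("driftspec/trace_inputs/trace_data_mock.csv",
#                  "driftspec/generated/from_trace.yaml",
#                  trace_to_spec, run_spec)
```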
Benchmark Objects (driftbench.data.xxx)
Use benchmark-specific objects to generate artifacts into a user-chosen directory.
1) Choose an output directory
output_dir is required. DriftBench will write files only under this directory.
2) Generate data and queries
from pathlib import Path
from driftbench.data.tpch import data as tpch_data, queries as tpch_queries
from driftbench.data.ycsb import data as ycsb_data, queries as ycsb_queries
from driftbench.data.tpcds import data as tpcds_data, queries as tpcds_queries
from driftbench.data.dsb import data as dsb_data, queries as dsb_queries
out = Path("./artifacts")
tpch_data(scale_factor=1).generate(output_dir=out)
tpch_queries(query_ids=[1, 3, 5], queries_per_template=2, mode="qgen").generate(output_dir=out)
# For very large scale factors, generate a server-side execution plan only.
tpch_data(scale_factor=1000, mode="plan").generate(output_dir=out)
ycsb_data(scale_factor=1).generate(output_dir=out)
ycsb_queries(workload="B").generate(output_dir=out)
tpcds_data(scale_factor=10).generate(output_dir=out)
tpcds_queries().generate(output_dir=out)
dsb_data(scale_factor=10).generate(output_dir=out)
dsb_queries().generate(output_dir=out)
3) Find generated files
Artifacts are written to:
<output_dir>/
tpch/
data/
queries/
ycsb/
data/
queries/
tpcds/
data/
queries/
dsb/
data/
queries/
Each generation creates a manifest (*_manifest.json) in its folder.
Use the files field in the manifest to see exactly which files were generated.
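A small script can collect every manifest under the output directory and read its files field. This is a minimal sketch, assuming each manifest is a JSON object whose `files` entry is an array; the exact manifest schema is not shown in this README, and the helper name `list_manifest_files` is illustrative.

```python
import json
from pathlib import Path


def list_manifest_files(output_dir):
    """Map each *_manifest.json under output_dir to the entries in its `files` field."""
    found = {}
    for manifest in sorted(Path(output_dir).rglob("*_manifest.json")):
        with manifest.open(encoding="utf-8") as fh:
            payload = json.load(fh)
        # Assumption: `files` is a JSON array; use an empty list if it is absent.
        found[str(manifest)] = list(payload.get("files", []))
    return found
```

Calling list_manifest_files("./artifacts") after a generation run returns a dictionary keyed by manifest path, which is convenient for logging or for checking that an expected file was produced.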
4) Programmatic path retrieval
generate() returns a GenerationResult with:
- result.files: generated file paths
- result.metadata: manifest path
This is the recommended way to chain into downstream benchmarking scripts.
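As an illustration of chaining, the sketch below groups the paths in result.files by file extension so that a downstream script can hand data files and query files to different loaders. It relies only on the two attributes listed above. The helper `group_by_suffix` is hypothetical, and the demo object uses placeholder paths and a stand-in type, not real DriftBench output.

```python
from collections import defaultdict
from pathlib import Path
from types import SimpleNamespace


def group_by_suffix(result):
    """Group result.files by file extension for downstream loaders."""
    groups = defaultdict(list)
    for path in result.files:
        groups[Path(path).suffix].append(Path(path))
    return dict(groups)


# Stand-in with the two documented attributes. A real GenerationResult
# comes from generate(); these paths are placeholders.
demo = SimpleNamespace(
    files=["artifacts/example/data/part-0.csv", "artifacts/example/queries/q1.sql"],
    metadata="artifacts/example/data/example_manifest.json",
)

for suffix, paths in group_by_suffix(demo).items():
    print(suffix, [str(p) for p in paths])
```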
Where to find examples
- Example specs: driftspec/examples/
- Trace inputs: driftspec/trace_inputs/
- Integration tests with runnable fixtures: test/fixtures/specs/
Core docs
- API boundary: docs/p0_api_boundary_freeze.md
- CLI/MCP command matrix: docs/p0_mcp_command_matrix.md
- Integration quickstart: docs/p0_integration_quickstart.md
- MCP examples script: docs/p0_mcp_examples.sh
- Release branch/tag policy: docs/release_branch_policy.md
Testing
Run all tests:
python3 -m unittest discover -s test -p 'test_*.py' -v
License
MIT (see LICENSE).