Generate reproducible benchmark datasets with controlled data and workload drift (TPC-H, TPC-DS, YCSB, DSB) — CLI and MCP, no external tools required.

Maintainers

guanli

Project description

DriftBench

DriftBench generates benchmark datasets where data and queries change in controlled ways — simulating the distribution shifts, skew, and workload changes that real database systems encounter over time. You give it a DriftSpec (a YAML file describing what should change and how much), and it produces data files and SQL workloads ready for benchmarking.

Works via CLI (driftbench-db) or MCP (Claude / Codex assistant). Supports TPC-H, TPC-DS, YCSB, and DSB out of the box — no external data-generation tools required.

Who this is for:

Researcher — reproduce drift scenarios, run ablations, compare estimators under shift.
Database Vendor / Performance Team — run drift regression checks across benchmark targets.
New User — start from a working example and see output in under 5 minutes.

Start Here (5-minute path)

pip install -U driftbench-db
driftbench-db validate-spec driftspec/examples/demo_data_single.yaml --json
driftbench-db dry-run driftspec/examples/demo_data_single.yaml --json
driftbench-db run-yaml driftspec/examples/demo_data_single.yaml
driftbench-db list-outputs --root output --glob "**/*" --limit 20 --json

What you get: a folder under output/ containing generated data files, a SQL workload, and a manifest (*_manifest.json) listing every artifact path.

Stuck? See Troubleshooting below.

Version-by-version updates and service coverage: CHANGELOG.md
Production site: driftbench.com
Frontend source: driftbench-web

Quick Paths by Role

Researcher

pip install -U driftbench-db
driftbench-db validate-spec driftspec/examples/demo_data_single.yaml --json
driftbench-db run-yaml driftspec/examples/demo_data_single.yaml

→ Outputs drift datasets + workload files ready for estimator evaluation.

Database Vendor / Performance Team

pip install -U driftbench-db
driftbench-db orchestrate \
  --spec driftspec/examples/demo_data_single.yaml \
  --targets driftspec/examples/adapters/benchmark_targets_mvp.yaml \
  --manifest-out output/orchestrate_manifest.json --json
driftbench-db list-outputs --root output --glob "**/*" --limit 30 --json

→ Runs one DriftSpec across multiple benchmark targets; outputs per-target manifests.

New User

pip install -U driftbench-db
driftbench-db --help
driftbench-db validate-spec driftspec/examples/demo_data_single.yaml --json

→ Validates the example spec and shows you what a passing spec looks like before running anything.

Install (Latest)

From PyPI (recommended)

python3 -m pip install -U driftbench-db

From source (latest `main`)

git clone https://github.com/Liuguanli/DriftBench.git
cd DriftBench
python3 -m pip install -e .

Verify installation

driftbench --help
driftbench-service --help
driftbench-mcp --help

CLI Quickstart

This flow covers most use cases:

# 1) Validate a DriftSpec
driftbench-db validate-spec driftspec/examples/demo_data_single.yaml --json

# 2) Preview execution plan
driftbench-db dry-run driftspec/examples/demo_data_single.yaml --json

# 3) Execute
driftbench-db run-yaml driftspec/examples/demo_data_single.yaml

# 4) Inspect outputs
driftbench-db list-outputs --root output --glob "**/*" --limit 30 --json
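The same four steps can be driven from a script. The following is a minimal sketch, not part of DriftBench: the helper names `quickstart_commands` and `run_quickstart` are hypothetical, and it assumes that `driftbench-db` is on PATH and exits with a non-zero code when a step fails. Only subcommands and flags shown above are used.

```python
import subprocess
from typing import List


def quickstart_commands(spec_path: str, output_root: str = "output") -> List[List[str]]:
    """Return the four quickstart steps as argv lists, in execution order."""
    return [
        ["driftbench-db", "validate-spec", spec_path, "--json"],
        ["driftbench-db", "dry-run", spec_path, "--json"],
        ["driftbench-db", "run-yaml", spec_path],
        ["driftbench-db", "list-outputs", "--root", output_root,
         "--glob", "**/*", "--limit", "30", "--json"],
    ]


def run_quickstart(spec_path: str) -> None:
    """Run each step in order; check=True raises on the first non-zero exit code."""
    for argv in quickstart_commands(spec_path):
        subprocess.run(argv, check=True)


# Usage (requires an installed driftbench-db):
# run_quickstart("driftspec/examples/demo_data_single.yaml")
```

Passing argv lists instead of a shell string keeps the glob pattern `**/*` from being expanded by the shell.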

Trace to DriftSpec

driftbench-db trace-to-spec \
  driftspec/trace_inputs/trace_data_mock.csv \
  driftspec/generated/from_trace.yaml \
  --trace-type data

Orchestrate Across Benchmark Targets (MVP)

Run one DriftSpec across multiple benchmark targets defined in a benchmark-target YAML file (the examples below use benchmark_targets_mvp.yaml).

driftbench-db orchestrate \
  --spec driftspec/examples/demo_data_single.yaml \
  --targets driftspec/examples/adapters/benchmark_targets_mvp.yaml \
  --manifest-out output/orchestrate_manifest.json \
  --json

Execute setup/run commands for each target:

driftbench-db orchestrate \
  --spec driftspec/examples/demo_data_single.yaml \
  --targets driftspec/examples/adapters/benchmark_targets_mvp.yaml \
  --manifest-out output/orchestrate_manifest.json \
  --execute \
  --json

Bootstrap Dataset (download/copy + checksum + schema extract)

Bootstrap from preset, local path, or URL:

driftbench-db bootstrap dataset \
  --source census_original \
  --output-dir output/bootstrap/datasets \
  --json

With checksum verification:

driftbench-db bootstrap dataset \
  --source /path/to/my_dataset.csv \
  --output-dir output/bootstrap/datasets \
  --checksum sha256:<hex> \
  --json
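To fill in the `sha256:<hex>` placeholder, the digest of a local file can be computed with Python's standard `hashlib`. This is a minimal sketch; the helper name `sha256_checksum_arg` is hypothetical and not part of DriftBench.

```python
import hashlib
from pathlib import Path


def sha256_checksum_arg(path, chunk_size=1 << 20):
    """Return 'sha256:<hex>' for the file at `path`, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return f"sha256:{digest.hexdigest()}"


# Usage: print(sha256_checksum_arg("/path/to/my_dataset.csv"))
```

Reading in chunks keeps memory use flat for large dataset files.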

MCP Quickstart

Start MCP server (stdio):

python3 -m driftbench_mcp.server

Client config template:

docs/mcp_config_example.json

Minimal MCP guide:

docs/p0_mcp_server_minimal.md

Core MCP workflow:

trace_to_spec
validate_spec
run_spec
list_outputs

Spec sharing tools:

save_spec
list_public_specs
import_spec_and_run

MCP Chat Demo (Codex / Claude Code)

After MCP is configured, the most effective pattern is to tell your assistant the case type and the change you want to simulate.

Case A: Data Drift (data changes)

Use when you care about data size/distribution changes (scaling, skew, outliers, updates).

[Prompt: Data Drift]
Read docs/p0_integration_quickstart.md.
I want a DATA drift case on <my dataset path>.
Goal: <e.g., scale 2x + stronger skew on column amount>.
Please use MCP tools to:
1) build a DriftSpec (or trace_to_spec if needed),
2) validate it,
3) run it,
4) list outputs.
Then summarize what data files were generated and what changed.
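To make "scale 2x + stronger skew" concrete, here is a small standalone illustration of what such a change means for one column. It is not DriftBench's generator: the Zipf-like weighting, the function names, and the parameter values are assumptions chosen only for illustration.

```python
import random


def zipf_weights(n_values, skew):
    """Normalized Zipf-like weights: the value at rank k gets weight 1 / k**skew."""
    raw = [1.0 / (rank ** skew) for rank in range(1, n_values + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_column(n_rows, n_values, skew, seed=0):
    """Draw n_rows values from range(n_values) under the given skew (seeded)."""
    rng = random.Random(seed)
    return rng.choices(range(n_values), weights=zipf_weights(n_values, skew), k=n_rows)


base = sample_column(1_000, n_values=10, skew=0.5)
drifted = sample_column(2_000, n_values=10, skew=1.5)  # 2x rows, stronger skew
```

A larger `skew` concentrates more rows on the first few values, which is the kind of shift that tends to stress cardinality estimators.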

Case B: Workload Drift (query changes)

Use when you care about query behavior changes (predicate distribution, selectivity, structure, payload).

[Prompt: Workload Drift]
I want a WORKLOAD drift case.
Query goal: <e.g., predicates shift from uniform to city-focused, selectivity from 10% to 60%>.
Please create/run a spec via MCP and report:
- generated workload files,
- how query distribution/selectivity changed,
- suggested next workload variant.
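The selectivity shift in the prompt above (10% to 60%) can be pictured with a standalone sketch. It assumes an integer column whose values are spread uniformly over `[0, domain_size)`, so a range covering a fraction of the domain selects roughly that fraction of rows. The table name `orders` and both function names are hypothetical, and this is not how DriftBench itself builds workloads.

```python
import random


def drifting_ranges(domain_size, start_sel, end_sel, n_queries, seed=0):
    """Return (lo, hi) bounds whose width moves linearly from start_sel to end_sel."""
    rng = random.Random(seed)
    ranges = []
    for i in range(n_queries):
        frac = i / (n_queries - 1) if n_queries > 1 else 0.0
        selectivity = start_sel + frac * (end_sel - start_sel)
        # Clamp the width so every range stays inside the domain.
        width = min(domain_size, max(1, round(selectivity * domain_size)))
        lo = rng.randrange(0, domain_size - width + 1)
        ranges.append((lo, lo + width - 1))
    return ranges


def to_sql(table, column, bounds):
    """Format one inclusive range predicate as a COUNT(*) query."""
    lo, hi = bounds
    return f"SELECT COUNT(*) FROM {table} WHERE {column} BETWEEN {lo} AND {hi};"


ranges = drifting_ranges(domain_size=1000, start_sel=0.10, end_sel=0.60, n_queries=6)
workload = [to_sql("orders", "amount", bounds) for bounds in ranges]
```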

Temporal Overlay (applied on top of Case A or B)

Temporal drift is usually an overlay, not a standalone base case. Use it to add time evolution (uniform / periodic / trend / long-tail) on top of data drift or workload drift.

[Prompt: Temporal Overlay]
Take my <DATA or WORKLOAD> drift case and add TEMPORAL pattern <uniform|periodic|trend|long_tail>.
Please run the MCP workflow and summarize:
1) generated spec path,
2) output artifacts,
3) expected temporal behavior in plain language,
4) how temporal behavior changes the base (data/workload) case.
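For intuition, the four pattern names can be read as different ways of distributing activity across time steps. The sketch below is an assumption about what such shapes commonly look like, not DriftBench's definitions: the function name, the `period` parameter, and the exact formulas are hypothetical.

```python
import math


def temporal_weights(pattern, steps, period=4):
    """Return per-step weights (summing to 1) for an illustrative temporal pattern."""
    if pattern == "uniform":
        raw = [1.0] * steps                       # same activity at every step
    elif pattern == "periodic":
        raw = [1.0 + 0.5 * math.sin(2 * math.pi * t / period) for t in range(steps)]
    elif pattern == "trend":
        raw = [float(t + 1) for t in range(steps)]  # steadily increasing activity
    elif pattern == "long_tail":
        raw = [1.0 / (t + 1) for t in range(steps)]  # early burst, slow decay
    else:
        raise ValueError(f"unknown pattern: {pattern!r}")
    total = sum(raw)
    return [w / total for w in raw]


# Example: share of a workload assigned to each of 8 time steps.
shares = temporal_weights("trend", steps=8)
```

Multiplying these shares by a total row or query count gives a per-step budget, which is one way an overlay can sit on top of a data or workload drift case.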

What users should expect

The assistant executes MCP tools in order (trace_to_spec/build_spec -> validate_spec -> run_spec -> list_outputs).
You get concrete artifact paths (generated YAML + output files).
You get a short interpretation of what changed for your selected case (data/query), plus temporal overlay effects when requested.
You usually get one or two suggested next iterations for deeper benchmarking.

Python API (Stable Entry Points)

Use top-level APIs instead of internal modules:

from driftbench import run_spec, trace_to_spec, get_schema_extractor

run_spec("driftspec/examples/demo_data_single.yaml")
trace_to_spec("driftspec/trace_inputs/trace_data_mock.csv", "driftspec/generated/from_trace.yaml")

Benchmark Objects (`driftbench.data.xxx`)

Use benchmark-specific objects to generate artifacts into a user-chosen directory.

1) Choose an output directory

output_dir is required. DriftBench will write files only under this directory.

2) Generate data and queries

from pathlib import Path
from driftbench.data.tpch import data as tpch_data, queries as tpch_queries
from driftbench.data.ycsb import data as ycsb_data, queries as ycsb_queries
from driftbench.data.tpcds import data as tpcds_data, queries as tpcds_queries
from driftbench.data.dsb import data as dsb_data, queries as dsb_queries

out = Path("./artifacts")

tpch_data(scale_factor=1).generate(output_dir=out)
tpch_queries(query_ids=[1, 3, 5], queries_per_template=2, mode="qgen").generate(output_dir=out)
tpch_queries().generate(output_dir=out)  # all query ids

# For very large scale factors, generate a server-side execution plan only.
tpch_data(scale_factor=1000, mode="plan").generate(output_dir=out)

ycsb_data(scale_factor=1).generate(output_dir=out)
ycsb_queries(workload="B").generate(output_dir=out)

tpcds_data(scale_factor=10).generate(output_dir=out)            # any scale (synthetic local generation)
tpcds_queries().generate(output_dir=out)                        # all query ids (1..99)
tpcds_queries(query_ids=[1, 5, 42]).generate(output_dir=out)    # selected query ids

dsb_data(scale_factor=10).generate(output_dir=out)
dsb_queries().generate(output_dir=out)

The default mode for tpch_data(scale_factor=...) is auto, which proceeds in this order:

try local .tbl source (if available);
if missing and scale_factor == 1, try built-in Python download path;
otherwise fall back to integrated synthetic generation.

This means users can call the Python API directly without manually running external download commands.
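The order described for auto mode can be written as a small decision function. This is only an illustration of the three rules as stated: the function name and the returned labels are hypothetical, and what happens when a download attempt fails is not specified here, so it is not modeled.

```python
def resolve_tpch_source(has_local_tbl: bool, scale_factor: float) -> str:
    """Mirror the documented auto-mode order for choosing a TPC-H data source."""
    if has_local_tbl:
        return "local_tbl"    # rule 1: use a local .tbl source when available
    if scale_factor == 1:
        return "download"     # rule 2: built-in download path, only for scale factor 1
    return "synthetic"        # rule 3: integrated synthetic generation


# Example: no local files and scale factor 10 resolves to synthetic generation.
choice = resolve_tpch_source(has_local_tbl=False, scale_factor=10)
```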

3) Find generated files

Artifacts are written to:

<output_dir>/
  tpch/
    data/
    queries/
  ycsb/
    data/
    queries/
  tpcds/
    data/
    queries/
  dsb/
    data/
    queries/

Each generation creates a manifest (*_manifest.json) in its folder.
Use the manifest's files field to see exactly which files were generated.
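A short script can collect every manifest under an output directory. This is a minimal sketch that assumes the `files` field is a JSON array of paths; the rest of the manifest schema is not described here, so nothing else is read. The helper name `list_manifest_files` is hypothetical.

```python
import json
from pathlib import Path


def list_manifest_files(output_dir):
    """Map each *_manifest.json path under output_dir to its 'files' entries."""
    found = {}
    for manifest in sorted(Path(output_dir).rglob("*_manifest.json")):
        payload = json.loads(manifest.read_text(encoding="utf-8"))
        # Assumption: 'files' is a list; a missing key is treated as empty.
        found[str(manifest)] = list(payload.get("files", []))
    return found


# Usage:
# for manifest_path, files in list_manifest_files("./artifacts").items():
#     print(manifest_path, len(files))
```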

4) Programmatic path retrieval

generate() returns a GenerationResult with:

result.files: generated file paths
result.metadata: manifest path

This is the recommended way to chain into downstream benchmarking scripts.
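As an example of chaining, the sketch below groups generated paths by file extension so a downstream script can pick out data files or SQL files. It relies only on the `files` attribute named above and assumes its entries are path-like; the helper name `group_by_suffix` is hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def group_by_suffix(result):
    """Group result.files by file extension, e.g. '.sql' or '.tbl'."""
    groups = defaultdict(list)
    for item in result.files:
        path = Path(item)
        groups[path.suffix or "<none>"].append(path)
    return dict(groups)


# Usage, with any generator from the section above:
# result = tpch_queries(query_ids=[1, 3, 5]).generate(output_dir=out)
# sql_files = group_by_suffix(result).get(".sql", [])
```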

Where to find examples

Example specs: driftspec/examples/
Trace inputs: driftspec/trace_inputs/
Integration tests with runnable fixtures: test/fixtures/specs/

Core docs

API boundary: docs/p0_api_boundary_freeze.md
CLI/MCP command matrix: docs/p0_mcp_command_matrix.md
Integration quickstart: docs/p0_integration_quickstart.md
MCP examples script: docs/p0_mcp_examples.sh
Release branch/tag policy: docs/release_branch_policy.md

Troubleshooting

Symptom	Likely cause	Fix
`command not found: driftbench-db`	Entry point not on PATH	Run `pip install -U driftbench-db` again; check your venv is active
`[VALIDATION ERROR] Spec root must be a YAML mapping`	YAML file is a list or scalar, not a mapping	Open the spec file and ensure the top level is `type: ...` / `variables: ...`
`[VALIDATION ERROR] Invalid 'type': expected mapping`	`type:` field is a plain string, not a nested object	Use `type: {family: ..., category: ..., subtype: ...}`
`[VALIDATION ERROR] No such file or directory`	Wrong spec path	Check the path with `ls driftspec/examples/` and retry
`Missing 'type' in spec`	Spec file is empty or missing the `type` key	Add `type:` block; see `driftspec/examples/demo_data_single.yaml` for reference
Output folder is empty after `run-yaml`	Spec has no enabled variables	Ensure at least one variable in `variables:` is not commented out

For anything not listed here, run with --json to get a machine-readable error, then check docs/p0_known_issues.md.
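Several of the symptoms above come down to the shape of the parsed spec. The following pre-flight check is a rough sketch, not DriftBench's validator: it takes an already-parsed YAML document, reuses the message wording from the table for the first three cases, and adds one hypothetical message for an empty `variables` block.

```python
def check_spec_shape(doc):
    """Return a list of shape problems for a parsed spec; empty means none found."""
    if not isinstance(doc, dict):
        return ["Spec root must be a YAML mapping"]
    problems = []
    if "type" not in doc:
        problems.append("Missing 'type' in spec")
    elif not isinstance(doc["type"], dict):
        problems.append("Invalid 'type': expected mapping")
    if not doc.get("variables"):
        # Hypothetical wording; the table only describes this as an empty output folder.
        problems.append("No enabled variables")
    return problems


# Example: a plain-string 'type' is flagged, matching the third row of the table.
issues = check_spec_shape({"type": "data", "variables": {"scale": 2}})
```

This only checks structure; `driftbench-db validate-spec` remains the authoritative check.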

Testing

Run all tests:

python3 -m unittest discover -s test -p 'test_*.py' -v

License

MIT (see LICENSE).

Release history

0.1.0b7.post1 (pre-release): May 15, 2026
0.1.0b7 (pre-release): May 15, 2026
0.1.0b6 (pre-release, this version): May 14, 2026
0.1.0b5 (pre-release): May 11, 2026
0.1.0b4 (pre-release): May 10, 2026
0.1.0b3 (pre-release): May 9, 2026
0.1.0b2 (pre-release): May 9, 2026
0.1.0b1 (pre-release): May 9, 2026

Download files

Source Distribution

driftbench_db-0.1.0b6.tar.gz (169.2 kB)

Uploaded May 14, 2026 Source

Built Distribution

driftbench_db-0.1.0b6-py3-none-any.whl (182.9 kB)

Uploaded May 14, 2026 Python 3

File details

Details for the file driftbench_db-0.1.0b6.tar.gz.

File metadata

Download URL: driftbench_db-0.1.0b6.tar.gz
Upload date: May 14, 2026
Size: 169.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for driftbench_db-0.1.0b6.tar.gz
Algorithm	Hash digest
SHA256	`24cf2f9034534a9ec517151b3d024d32cc49fc7ebfb02b77992e74b747007eef`
MD5	`3e810ff323fd36dbd73ca23ce3d684ec`
BLAKE2b-256	`5778a7eb03e5b152afe6f1b0c1ffae36ed33d544b2da577ac0f8bd29a658705d`

Provenance

The following attestation bundles were made for driftbench_db-0.1.0b6.tar.gz:

Publisher: publish.yml on Liuguanli/DriftBench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: driftbench_db-0.1.0b6.tar.gz
- Subject digest: 24cf2f9034534a9ec517151b3d024d32cc49fc7ebfb02b77992e74b747007eef
- Sigstore transparency entry: 1536614982
- Sigstore integration time: May 14, 2026
Source repository:
- Permalink: Liuguanli/DriftBench@897719076ee19c6d3cccde68d419ab23c11b2db6
- Branch / Tag: refs/tags/v0.1.0b6
- Owner: https://github.com/Liuguanli
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@897719076ee19c6d3cccde68d419ab23c11b2db6
- Trigger Event: push

File details

Details for the file driftbench_db-0.1.0b6-py3-none-any.whl.

File metadata

Download URL: driftbench_db-0.1.0b6-py3-none-any.whl
Upload date: May 14, 2026
Size: 182.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for driftbench_db-0.1.0b6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ff5f8f4547e3467ba5cb35f06f59e6530b1df922feabb86f515c35e3bd95def9`
MD5	`51f55023178bb193942898a3652e18e7`
BLAKE2b-256	`d8297b4835249cf1fc74a66ee549fd1fb79567fe11ce6d341b8a6a5c5a267b97`

Provenance

The following attestation bundles were made for driftbench_db-0.1.0b6-py3-none-any.whl:

Publisher: publish.yml on Liuguanli/DriftBench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: driftbench_db-0.1.0b6-py3-none-any.whl
- Subject digest: ff5f8f4547e3467ba5cb35f06f59e6530b1df922feabb86f515c35e3bd95def9
- Sigstore transparency entry: 1536615142
- Sigstore integration time: May 14, 2026
Source repository:
- Permalink: Liuguanli/DriftBench@897719076ee19c6d3cccde68d419ab23c11b2db6
- Branch / Tag: refs/tags/v0.1.0b6
- Owner: https://github.com/Liuguanli
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@897719076ee19c6d3cccde68d419ab23c11b2db6
- Trigger Event: push

driftbench-db 0.1.0b6
