Python CLI for TPC-H data generator

Project description

TPC-H Data Generator CLI

tpchgen-cli is a high-performance, parallel TPC-H data generator command-line tool.

This tool is more than 10x faster than the next fastest TPC-H generator we know of (DuckDB) at scale factor 10 and above; see Performance below. On a 2023 Mac laptop with an M3 Max chip, it easily generates data faster than it can be written to SSD. See BENCHMARKS.md for more details on performance and benchmarking.

  • See the tpchgen README.md for project details
  • Watch this awesome demo by @alamb to see tpchgen-cli in action
  • Read the companion blog post on the DataFusion blog to learn about the project's history
  • Try it yourself by following the instructions below

Install via pip

pip install tpchgen-cli

Install via uv

uv tool install tpchgen-cli 

Install via Rust

Install Rust and compile

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
RUSTFLAGS='-C target-cpu=native' cargo install tpchgen-cli

Examples

# Scale Factor 10, all tables, in Apache Parquet format in the current directory
# (3.6GB, 8 files, 60M lineitem rows, in 5 seconds on a modern laptop)
tpchgen-cli parquet -s 10

# Scale Factor 10, all tables, in `tbl` (CSV-like) format in the `sf10` directory
# (10GB, 8 files, 60M lineitem rows)
# Note: `tpchgen-cli tbl` also works explicitly
tpchgen-cli -s 10 --output-dir sf10

# Scale Factor 1000, lineitem table only, in Apache Parquet format in the `sf1000` directory,
# 20 part(itions), 100MB row groups
# (220GB, 20 files, 6B lineitem rows, 3.5 minutes on a modern laptop)
tpchgen-cli parquet -s 1000 --tables lineitem --parts 20 --row-group-bytes=100000000 --output-dir sf1000

# Scale Factor 10, partitions 2 and 3 of 10, lineitem and orders tables, in the `partitioned` directory
#
# partitioned/
# ├── lineitem
# │   ├── lineitem.2.tbl
# │   └── lineitem.3.tbl
# └── orders
#     ├── orders.2.tbl
#     └── orders.3.tbl
#
for PART in $(seq 2 3); do
  tpchgen-cli --tables lineitem,orders --scale-factor=10 --output-dir partitioned --parts 10 --part "$PART"
done
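If you prefer to script partitioned generation from Python, the shell loop above can be expressed with the standard library's `subprocess` module. This is a minimal sketch: the helper names `part_command` and `generate_parts` are hypothetical, it uses only the flags shown in the example, and the range check assumes part numbers run from 1 to `--parts`.

```python
import subprocess


def part_command(part: int, parts: int = 10, scale_factor: int = 10,
                 tables: str = "lineitem,orders",
                 output_dir: str = "partitioned") -> list[str]:
    """Build the tpchgen-cli argument list for one partition (hypothetical helper)."""
    # Assumption: partitions are numbered 1..parts, as "2 and 3 of 10" suggests.
    if not 1 <= part <= parts:
        raise ValueError(f"part must be between 1 and {parts}, got {part}")
    return [
        "tpchgen-cli",
        "--tables", tables,
        f"--scale-factor={scale_factor}",
        "--output-dir", output_dir,
        "--parts", str(parts),
        "--part", str(part),
    ]


def generate_parts(parts_to_run, **options) -> None:
    """Run tpchgen-cli once per partition; check=True stops at the first failure."""
    for part in parts_to_run:
        subprocess.run(part_command(part, **options), check=True)
```

Calling `generate_parts(range(2, 4))` runs partitions 2 and 3, matching the loop above. With `check=True`, a run that exits with a non-zero status raises `subprocess.CalledProcessError` instead of silently continuing.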

By default, tpchgen-cli shows a per-table progress bar on stderr while data is generated. Pass --no-progress to disable it. The progress bar is also disabled automatically when --quiet is set, when --stdout is used, or when stderr is not a terminal (for example, in CI logs).
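The conditions under which the progress bar appears can be summarized as a single predicate. This Python function is only a model of the rule described above, not tpchgen-cli's actual implementation, and its name and parameters are hypothetical.

```python
def progress_enabled(no_progress: bool, quiet: bool, use_stdout: bool,
                     stderr_is_tty: bool) -> bool:
    """Model of the documented rule: the bar is shown only if nothing disables it.

    Parameters mirror --no-progress, --quiet, --stdout, and whether stderr is
    a terminal. Hypothetical helper, not tpchgen-cli code.
    """
    if no_progress:            # explicit opt-out
        return False
    if quiet or use_stdout:    # --quiet or --stdout also turn it off
        return False
    return stderr_is_tty       # False when stderr is redirected, e.g. to a CI log
```

For example, `progress_enabled(False, False, False, sys.stderr.isatty())` tells you whether a plain invocation whose stderr is the same as the current process's would show the bar.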

Performance

Times to create TPC-H tables in Parquet format using tpchgen-cli and DuckDB at various scale factors (minutes:seconds):

Scale Factor   tpchgen-cli   DuckDB      DuckDB (proprietary)
1              0:02.24       0:12.29     0:10.68
10             0:09.97       1:46.80     1:41.14
100            1:14.22       17:48.27    16:40.88
1000           10:26.26      N/A (OOM)   N/A (OOM)

  • DuckDB (proprietary) is the time required to create TPC-H data in DuckDB's proprietary database format instead of Parquet.
  • Creating Scale Factor 1000 data in DuckDB required 647 GB of memory, which is why the table reports N/A (OOM) instead of a time.
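As a quick check of the speedup claim, the following Python sketch converts the times in the Performance table to seconds and divides each DuckDB time by the tpchgen-cli time. The numbers are copied from the table; the names `TIMES`, `to_seconds`, and `speedups` are illustrative only. Scale Factor 1000 is left out because DuckDB has no time for it.

```python
# Times copied from the Performance table: (scale factor, tpchgen-cli,
# DuckDB, DuckDB (proprietary)). SF 1000 is omitted because DuckDB ran out of memory.
TIMES = [
    (1, "0:02.24", "0:12.29", "0:10.68"),
    (10, "0:09.97", "1:46.80", "1:41.14"),
    (100, "1:14.22", "17:48.27", "16:40.88"),
]


def to_seconds(t: str) -> float:
    """Convert a 'minutes:seconds' string such as '17:48.27' to seconds."""
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + float(seconds)


def speedups() -> list[tuple[int, float, float]]:
    """Return (scale factor, x faster than DuckDB, x faster than DuckDB proprietary)."""
    result = []
    for sf, tpchgen, duckdb, duckdb_native in TIMES:
        base = to_seconds(tpchgen)
        result.append((sf, to_seconds(duckdb) / base, to_seconds(duckdb_native) / base))
    return result


for sf, vs_duckdb, vs_native in speedups():
    print(f"SF {sf:>3}: {vs_duckdb:4.1f}x vs DuckDB, {vs_native:4.1f}x vs DuckDB (proprietary)")
```

Running it prints ratios of about 5.5x, 10.7x, and 14.4x against DuckDB's Parquet output at scale factors 1, 10, and 100, so the "more than 10x" figure holds from scale factor 10 upward.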

Deprecation Notice

--format, --parquet-compression, and --parquet-row-group-bytes are deprecated as of v3.x and will be removed in v4.0.0. Use subcommands instead:

# Before
tpchgen-cli --format=parquet --parquet-compression='ZSTD(1)' -s 10

# After (ZSTD(1) is quoted because unquoted parentheses are a shell syntax error)
tpchgen-cli parquet --compression='ZSTD(1)' -s 10
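Scripts that still build old-style command lines can be migrated mechanically. The Python sketch below is hypothetical (`migrate_args` is not part of tpchgen-cli); it handles only the `--flag=value` spelling, leaves space-separated values untouched, and assumes each `--format` value has a subcommand of the same name, which this page confirms only for `parquet` and `tbl`. The `--row-group-bytes` name is taken from the `parquet` example above.

```python
# Deprecated flag -> replacement flag under the subcommand (hypothetical sketch).
RENAMED = {
    "--parquet-compression": "--compression",
    "--parquet-row-group-bytes": "--row-group-bytes",
}


def migrate_args(argv: list[str]) -> list[str]:
    """Rewrite deprecated v3.x arguments (without the program name) to subcommand form.

    Only the --flag=value spelling is recognized; anything else is copied as-is.
    """
    subcommand = None
    rest = []
    for arg in argv:
        name, sep, value = arg.partition("=")
        if name == "--format" and sep:
            subcommand = value  # assumes a subcommand with the same name exists
        elif name in RENAMED and sep:
            rest.append(f"{RENAMED[name]}={value}")
        else:
            rest.append(arg)
    # The subcommand must come first, as in `tpchgen-cli parquet --compression=...`.
    return ([subcommand] if subcommand else []) + rest
```

For example, `migrate_args(["--format=parquet", "--parquet-compression=ZSTD(1)", "-s", "10"])` returns `["parquet", "--compression=ZSTD(1)", "-s", "10"]`, matching the "After" command. Because the arguments stay a Python list, passing them to `subprocess.run` needs no shell quoting for `ZSTD(1)`.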

Download files

Download the file for your platform.

Source Distribution

  • tpchgen_cli-3.0.0.tar.gz (3.4 MB): Source

Built Distributions

  • tpchgen_cli-3.0.0-py3-none-win_amd64.whl (3.8 MB): Python 3, Windows x86-64
  • tpchgen_cli-3.0.0-py3-none-win32.whl (3.5 MB): Python 3, Windows x86
  • tpchgen_cli-3.0.0-py3-none-musllinux_1_2_x86_64.whl (3.9 MB): Python 3, musllinux: musl 1.2+ x86-64
  • tpchgen_cli-3.0.0-py3-none-musllinux_1_2_i686.whl (3.8 MB): Python 3, musllinux: musl 1.2+ i686
  • tpchgen_cli-3.0.0-py3-none-musllinux_1_2_armv7l.whl (3.9 MB): Python 3, musllinux: musl 1.2+ ARMv7l
  • tpchgen_cli-3.0.0-py3-none-musllinux_1_2_aarch64.whl (3.7 MB): Python 3, musllinux: musl 1.2+ ARM64
  • tpchgen_cli-3.0.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.9 MB): Python 3, manylinux: glibc 2.17+ x86-64
  • tpchgen_cli-3.0.0-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (4.0 MB): Python 3, manylinux: glibc 2.17+ ppc64le
  • tpchgen_cli-3.0.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl (3.9 MB): Python 3, manylinux: glibc 2.17+ i686
  • tpchgen_cli-3.0.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (3.8 MB): Python 3, manylinux: glibc 2.17+ ARMv7l
  • tpchgen_cli-3.0.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.8 MB): Python 3, manylinux: glibc 2.17+ ARM64
  • tpchgen_cli-3.0.0-py3-none-macosx_11_0_arm64.whl (3.5 MB): Python 3, macOS 11.0+ ARM64
  • tpchgen_cli-3.0.0-py3-none-macosx_10_12_x86_64.whl (3.7 MB): Python 3, macOS 10.12+ x86-64

File details

Details for the file tpchgen_cli-3.0.0.tar.gz.

File metadata

  • Download URL: tpchgen_cli-3.0.0.tar.gz
  • Upload date:
  • Size: 3.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.14.1

File hashes

Hashes for tpchgen_cli-3.0.0.tar.gz
Algorithm Hash digest
SHA256 1892f1e466ae0fee051129e1cf182d7d74d637b81d09f4081e4015b0e661aa0a
MD5 2393620c01ff28e3f3d97c7484183f8b
BLAKE2b-256 487a812de426fe6286c185d57330f9220d5735427542cf3bc455933e948d2757

See more details on using hashes here.
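To check a downloaded file against the SHA256 digests listed on this page, you can hash it locally with Python's standard `hashlib` module. This is a minimal sketch; `sha256_of` and `verify` are illustrative names, and the file is read in 1 MiB chunks so memory use stays small even for large files.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, streaming it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel keeps reading until f.read returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare against a published digest, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, `verify(Path("tpchgen_cli-3.0.0.tar.gz"), "1892f1e466ae0fee051129e1cf182d7d74d637b81d09f4081e4015b0e661aa0a")` should return `True` for an intact download of the source distribution. pip can also enforce digests at install time through its hash-checking mode (`--require-hashes`).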

File details

Details for the file tpchgen_cli-3.0.0-py3-none-win_amd64.whl.

File hashes

Hashes for tpchgen_cli-3.0.0-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 e3f276a75ead873f025120357baf4042f976a43070e2b3b63d147641664e9491
MD5 de82cb278d92b0a501240e02a78b4941
BLAKE2b-256 e72b12e1eed96b22e687f5fbaebff77371350f2af5ed1fda56bf2436bf712fc7

File details

Details for the file tpchgen_cli-3.0.0-py3-none-win32.whl.

File metadata

  • Download URL: tpchgen_cli-3.0.0-py3-none-win32.whl
  • Upload date:
  • Size: 3.5 MB
  • Tags: Python 3, Windows x86
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.14.1

File hashes

Hashes for tpchgen_cli-3.0.0-py3-none-win32.whl
Algorithm Hash digest
SHA256 ebff7beaaf3ac0f7fedadb559477ff5786e5c3c8a1153d467cdc4adb5df17ef5
MD5 b2cf9d13362ec9aa340f05451fdb57e2
BLAKE2b-256 0953f734e84e134c0f98cffa2a08a0f5adf44da8cb7e66d871fc11768c29eee5

File details

Details for the file tpchgen_cli-3.0.0-py3-none-musllinux_1_2_x86_64.whl.

File hashes

Hashes for tpchgen_cli-3.0.0-py3-none-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 0ff4951691b388b195129b7b847222fa598960e63739a5d2feac182609bbf8a7
MD5 cf7f48856a4cfb43b538ca7b9aac1940
BLAKE2b-256 d6673c1400e1265ba1f0da1a66d1d1f428ef83d6e43f9b39f45618beb512eda5

File details

Details for the file tpchgen_cli-3.0.0-py3-none-musllinux_1_2_i686.whl.

File hashes

Hashes for tpchgen_cli-3.0.0-py3-none-musllinux_1_2_i686.whl
Algorithm Hash digest
SHA256 c57f4c491640e74510332946a94bd35ba24cc265dd7dd2690ee8b9f8eb8541a4
MD5 b727fc81ce89f0bfd92c437a6ec81a3d
BLAKE2b-256 d594859b1a8c908e86a2eae8431aaddd6835f5902f626af82ac63865b0e13b65

File details

Details for the file tpchgen_cli-3.0.0-py3-none-musllinux_1_2_armv7l.whl.

File hashes

Hashes for tpchgen_cli-3.0.0-py3-none-musllinux_1_2_armv7l.whl
Algorithm Hash digest
SHA256 3714e43439e6d0144e6c202aee9db46044ef07167823f693153e84c23d00f9ea
MD5 903c3cf7c1c905a238a74c463a3c402c
BLAKE2b-256 9f4582599b7965bc74953934a01087d3e6b12ee3c7e43f7e3d1b29ab2fd45a5b

File details

Details for the file tpchgen_cli-3.0.0-py3-none-musllinux_1_2_aarch64.whl.

File hashes

Hashes for tpchgen_cli-3.0.0-py3-none-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 26147d5ad081541651599abf5a7c050fa9d5771f811a66821cb92cd0a101a071
MD5 e7be7fc30512b6421ba7e88d84bdb9c1
BLAKE2b-256 8d20963cceefa7e86a93f141a633b3c10de13d5aa9224fc21f8a3c4daa9e3cfe

File details

Details for the file tpchgen_cli-3.0.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for tpchgen_cli-3.0.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 8eaed46bae8974e00214a6459b7927921966353c21cf53cc8efdfa9fd99bb2dc
MD5 9deb2f816394f4be19170cc8f521e036
BLAKE2b-256 23dbe5e5d550a2da3b4952db5e79d01ba7948e9feba58067cc8ebfd2c4232e39

File details

Details for the file tpchgen_cli-3.0.0-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File hashes

Hashes for tpchgen_cli-3.0.0-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm Hash digest
SHA256 395de548fb1a5e656d6e68ef935bdedb091f0112bff72636b19295d33be7a0b9
MD5 39a18aa9f6e40e6c46264ae1b78dca24
BLAKE2b-256 42ef87e4b15a1370bb9585758befdd0ec0e760fa3f5fec8f9bd7fa7084176a31

File details

Details for the file tpchgen_cli-3.0.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for tpchgen_cli-3.0.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 f68c4e5422d26ecccb218dde6bb0b2289da8e373a3a0f3c6f7850e2e309aee22
MD5 06291f754490ef54f09f0327eb4d7ef7
BLAKE2b-256 d1d9c3c623fd42ab6f3a1700797a2ce393ee929e3f1fc286f23ca2265246896f

File details

Details for the file tpchgen_cli-3.0.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl.

File hashes

Hashes for tpchgen_cli-3.0.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
Algorithm Hash digest
SHA256 234b086888b69260cfb958546fa1826a27d9480172723023812240f252471bc5
MD5 4a437ea344d2044a94f27a16ed08f035
BLAKE2b-256 094a14aae217065e67c4340f8342b1ba86b9de78f2af85fbde3b0f19c617139f

File details

Details for the file tpchgen_cli-3.0.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for tpchgen_cli-3.0.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 788445305b7632845da277322691f3f4d377e2426aac4103c68115b120c7a645
MD5 7239725a396d618ce1e96a74f176ff5d
BLAKE2b-256 fab92d78d82fbc20a0aa921470df3171737117dde15a369f4b18560e94b5b9c4

File details

Details for the file tpchgen_cli-3.0.0-py3-none-macosx_11_0_arm64.whl.

File hashes

Hashes for tpchgen_cli-3.0.0-py3-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 0f6603c82f096241c96a851778d4633797ee732bd1e33e443c8142ec60015b17
MD5 d4988e21f8a9c9e5a54ecb31c65f78a1
BLAKE2b-256 f7b6e31e5796f50f39f9a3e407bbcaabcedccc80672bf5b372531bcebe270a58

File details

Details for the file tpchgen_cli-3.0.0-py3-none-macosx_10_12_x86_64.whl.

File hashes

Hashes for tpchgen_cli-3.0.0-py3-none-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 c388166979b7132a3073047149704d78e5d112c5760ca2d289b395ea1ef16389
MD5 34aa30ecc06e42c454953351c4d9e24d
BLAKE2b-256 c9e743f1b87c6bfb7ac8c42a01857a7ad2cbf1f8865bc9cb8909d0c9e4bdda6c
