Python CLI for TPC-H data generator

Project description

TPC-H Data Generator CLI

tpchgen-cli is a high-performance, parallel TPC-H data generator command-line tool.

This tool is more than 10x faster than the next fastest TPC-H generator we know of (DuckDB) at scale factors 10 and 100. On a 2023 Mac M3 Max laptop, it easily generates data faster than it can be written to SSD. See BENCHMARKS.md for more details on performance and benchmarking.

  • See the tpchgen README.md for project details
  • Watch this awesome demo by @alamb to see tpchgen-cli in action
  • Read the companion blog post on the DataFusion blog to learn about the project's history
  • Try it yourself by following the instructions below

Install via pip

pip install tpchgen-cli

Install via Rust

Install Rust, then compile tpchgen-cli optimized for the local CPU

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
RUSTFLAGS='-C target-cpu=native' cargo install tpchgen-cli

Examples

# Scale Factor 10, all tables, in Apache Parquet format in the current directory
# (3.6GB, 8 files, 60M lineitem rows, in 5 seconds on a modern laptop)
tpchgen-cli -s 10 --format=parquet

# Scale Factor 10, all tables, in `tbl` (CSV-like) format in the `sf10` directory
# (10GB, 8 files, 60M lineitem rows)
tpchgen-cli -s 10 --output-dir sf10

# Scale Factor 1000, lineitem table, in Apache Parquet format in the `sf1000` directory,
# 20 partitions, 100MB row groups
# (220GB, 20 files, 6B lineitem rows, 3.5 minutes on a modern laptop)
tpchgen-cli -s 1000 --tables lineitem --parts 20 --format=parquet --parquet-row-group-bytes=100000000 --output-dir sf1000

# Scale Factor 10, partitions 2 and 3 of 10 for the lineitem and orders tables,
# in the `partitioned` directory
#
# partitioned/
# ├── lineitem
# │   ├── lineitem.2.tbl
# │   └── lineitem.3.tbl
# └── orders
#     ├── orders.2.tbl
#     └── orders.3.tbl
#
for PART in $(seq 2 3); do
  tpchgen-cli --tables lineitem,orders --scale-factor=10 --output-dir partitioned --parts 10 --part "$PART"
done
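The --parts and --part options in the example above can also be used to split generation across several machines. The following is a minimal POSIX shell sketch of that idea, assuming every machine runs the same script and takes parts round-robin. NODE_INDEX (0-based) and NODE_COUNT are hypothetical variables you set on each machine; they are not tpchgen-cli options, and only flags from the example above are passed to tpchgen-cli.

```shell
#!/bin/sh
# Sketch: split tpchgen-cli partitions round-robin across machines.
# NODE_INDEX (0-based) and NODE_COUNT are hypothetical per-machine variables,
# not tpchgen-cli options.
set -eu

PARTS=10
NODE_INDEX="${NODE_INDEX:-0}"
NODE_COUNT="${NODE_COUNT:-1}"

# Print the 1-based part numbers assigned to this node, one per line.
parts_for_node() {
  part=1
  while [ "$part" -le "$PARTS" ]; do
    if [ $(( (part - 1) % NODE_COUNT )) -eq "$NODE_INDEX" ]; then
      echo "$part"
    fi
    part=$((part + 1))
  done
}

for PART in $(parts_for_node); do
  if command -v tpchgen-cli >/dev/null 2>&1; then
    tpchgen-cli --tables lineitem,orders --scale-factor=10 \
      --output-dir partitioned --parts "$PARTS" --part "$PART"
  else
    # tpchgen-cli is not on PATH: show the command instead of running it.
    echo "tpchgen-cli --tables lineitem,orders --scale-factor=10" \
      "--output-dir partitioned --parts $PARTS --part $PART"
  fi
done
```

For example, with NODE_COUNT=3 the machine with NODE_INDEX=1 generates parts 2, 5 and 8. Gathering each machine's partitioned/ directory onto shared storage is left out of this sketch.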

Performance

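Elapsed time (minutes:seconds) by scale factor:

Scale Factor | tpchgen-cli | DuckDB    | DuckDB (proprietary)
-------------|-------------|-----------|---------------------
1            | 0:02.24     | 0:12.29   | 0:10.68
10           | 0:09.97     | 1:46.80   | 1:41.14
100          | 1:14.22     | 17:48.27  | 16:40.88
1000         | 10:26.26    | N/A (OOM) | N/A (OOM)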
  • DuckDB (proprietary) is the time required to create TPC-H data in DuckDB's proprietary database format.
  • Creating Scale Factor 1000 data in DuckDB required 647 GB of memory, which is why the table shows N/A (OOM) for DuckDB at that scale.

Times to create TPC-H tables in Parquet format (except the DuckDB (proprietary) column) using tpchgen-cli and DuckDB at various scale factors.
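The speedups behind these numbers can be checked by converting each m:ss.xx time to seconds and dividing. A minimal POSIX shell sketch using awk; `to_seconds` and `speedup` are hypothetical helper names, not part of tpchgen-cli.

```shell
#!/bin/sh
# Convert an "m:ss.xx" time from the Performance table into seconds.
to_seconds() {
  echo "$1" | awk -F: '{ printf "%.2f\n", $1 * 60 + $2 }'
}

# Print how many times faster the first time is than the second,
# rounded to one decimal place.
speedup() {
  fast=$(to_seconds "$1")
  slow=$(to_seconds "$2")
  awk -v f="$fast" -v s="$slow" 'BEGIN { printf "%.1f\n", s / f }'
}

# tpchgen-cli vs. DuckDB (Parquet) at scale factors 1, 10 and 100
speedup 0:02.24 0:12.29
speedup 0:09.97 1:46.80
speedup 1:14.22 17:48.27
```

It prints 5.5, 10.7 and 14.4: the ratio exceeds 10x at scale factors 10 and 100, and is about 5.5x at scale factor 1. The DuckDB (proprietary) column can be compared the same way.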

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • tpchgen_cli-2.0.1.tar.gz (3.4 MB)

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

  • tpchgen_cli-2.0.1-py3-none-win_amd64.whl (4.7 MB): Python 3, Windows x86-64
  • tpchgen_cli-2.0.1-py3-none-win32.whl (4.3 MB): Python 3, Windows x86
  • tpchgen_cli-2.0.1-py3-none-musllinux_1_2_x86_64.whl (5.6 MB): Python 3, musllinux: musl 1.2+ x86-64
  • tpchgen_cli-2.0.1-py3-none-musllinux_1_2_i686.whl (5.7 MB): Python 3, musllinux: musl 1.2+ i686
  • tpchgen_cli-2.0.1-py3-none-musllinux_1_2_armv7l.whl (5.5 MB): Python 3, musllinux: musl 1.2+ ARMv7l
  • tpchgen_cli-2.0.1-py3-none-musllinux_1_2_aarch64.whl (5.1 MB): Python 3, musllinux: musl 1.2+ ARM64
  • tpchgen_cli-2.0.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.5 MB): Python 3, manylinux: glibc 2.17+ x86-64
  • tpchgen_cli-2.0.1-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (6.8 MB): Python 3, manylinux: glibc 2.17+ ppc64le
  • tpchgen_cli-2.0.1-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl (6.0 MB): Python 3, manylinux: glibc 2.17+ i686
  • tpchgen_cli-2.0.1-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (5.5 MB): Python 3, manylinux: glibc 2.17+ ARMv7l
  • tpchgen_cli-2.0.1-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (5.2 MB): Python 3, manylinux: glibc 2.17+ ARM64
  • tpchgen_cli-2.0.1-py3-none-macosx_11_0_arm64.whl (4.7 MB): Python 3, macOS 11.0+ ARM64
  • tpchgen_cli-2.0.1-py3-none-macosx_10_12_x86_64.whl (5.2 MB): Python 3, macOS 10.12+ x86-64

File details

Details for the file tpchgen_cli-2.0.1.tar.gz.

File metadata

  • Download URL: tpchgen_cli-2.0.1.tar.gz
  • Upload date:
  • Size: 3.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.9.4

File hashes

Hashes for tpchgen_cli-2.0.1.tar.gz
Algorithm Hash digest
SHA256 b272cca794d39f59857ba90dab9b1aab87a53fb3584cfbe54eee03d77eaa5cd8
MD5 365e6330ad605882382d1e4efa5782af
BLAKE2b-256 e8a605389d558764579826c796133b16e8b65ca6550f1d435d1b523ea8a7b927

See more details on using hashes here.
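As one way to use these digests, here is a minimal POSIX shell sketch that checks a downloaded file against its published SHA256 value. It assumes `sha256sum` (GNU coreutils) or `shasum` (shipped with macOS) is installed; `sha256_of` and `verify_sha256` are hypothetical helper names.

```shell
#!/bin/sh
# Print the hex SHA256 digest of a file, using sha256sum (GNU coreutils)
# or, where that is unavailable (e.g. macOS), shasum -a 256.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{ print $1 }'
  else
    shasum -a 256 "$1" | awk '{ print $1 }'
  fi
}

# verify_sha256 FILE EXPECTED_HEX: exit status 0 on a match, 1 otherwise.
verify_sha256() {
  actual=$(sha256_of "$1")
  if [ "$actual" = "$2" ]; then
    echo "OK: $1"
  else
    echo "MISMATCH: $1 (got $actual)" >&2
    return 1
  fi
}

# Check the source distribution if it has been downloaded to the current directory.
if [ -f tpchgen_cli-2.0.1.tar.gz ]; then
  verify_sha256 tpchgen_cli-2.0.1.tar.gz \
    b272cca794d39f59857ba90dab9b1aab87a53fb3584cfbe54eee03d77eaa5cd8
fi
```

The same call works for any wheel below; substitute its file name and SHA256 digest.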

File details

Details for the file tpchgen_cli-2.0.1-py3-none-win_amd64.whl.

File hashes

Hashes for tpchgen_cli-2.0.1-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 86f0904b4c4051ea503735aab7dc30b9868bb1a7afb4ac873a8ff1d51fca55cf
MD5 94166a1bfce074172595bb25c6e345cd
BLAKE2b-256 c6c3aec299aed1b2318043c7fc76441bed0bacb3f37ed7bd75220d1f30d9319f

File details

Details for the file tpchgen_cli-2.0.1-py3-none-win32.whl.

File metadata

  • Download URL: tpchgen_cli-2.0.1-py3-none-win32.whl
  • Upload date:
  • Size: 4.3 MB
  • Tags: Python 3, Windows x86
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.9.4

File hashes

Hashes for tpchgen_cli-2.0.1-py3-none-win32.whl
Algorithm Hash digest
SHA256 3e7c2a31818e4ae175fcdb78bbe7307fb9274ed6733020404f0f72d446bda5a7
MD5 917652de43867c4b049219562a46f588
BLAKE2b-256 f364a430a83d2f52bc6cc34120f0b88d0377370bc4c13c21d9c06525519c6506

File details

Details for the file tpchgen_cli-2.0.1-py3-none-musllinux_1_2_x86_64.whl.

File hashes

Hashes for tpchgen_cli-2.0.1-py3-none-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 faf3a1b240f0405553c15c7a1b95c98c151b2fde6ccd704b93afadcde87c4b93
MD5 44382a22ef5b539dfca971481deb261b
BLAKE2b-256 6a94cbb6ff63c090b38252d061f10cc6288bf0a2cda6f40cf914cc56733c192b

File details

Details for the file tpchgen_cli-2.0.1-py3-none-musllinux_1_2_i686.whl.

File hashes

Hashes for tpchgen_cli-2.0.1-py3-none-musllinux_1_2_i686.whl
Algorithm Hash digest
SHA256 d322711e2310d053dd73580e9eb56a696de7b051aa934720e9248058c0b4334f
MD5 4c900ef92f5df1e46ab2144062351e76
BLAKE2b-256 ac46ff7567a72f8545c55234d1abeb819ee972b26b291336fd5722695dba37b7

File details

Details for the file tpchgen_cli-2.0.1-py3-none-musllinux_1_2_armv7l.whl.

File hashes

Hashes for tpchgen_cli-2.0.1-py3-none-musllinux_1_2_armv7l.whl
Algorithm Hash digest
SHA256 12dca37d8a46f12c4f7115cfc02e979968090d8c4aed732d02d32ee151a5fc6d
MD5 9245c820028b38b72c6597ff630ba0ee
BLAKE2b-256 f3c6d4be691d709dadc22369e0286738c2d1ae85d2d15e8e36a3c276c8751a6b

File details

Details for the file tpchgen_cli-2.0.1-py3-none-musllinux_1_2_aarch64.whl.

File hashes

Hashes for tpchgen_cli-2.0.1-py3-none-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 157480e8276c527f0a07a1382391c9701cfcaf37c43193efde142fc72bdef01c
MD5 e24d4c2d1d9e4728515e5b2c94232b65
BLAKE2b-256 636cd2f1f2a20d1b585e9da163f019952910e9ef0d1dc2581f77152d0e4a2a9f

File details

Details for the file tpchgen_cli-2.0.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for tpchgen_cli-2.0.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 96c678e4c69076d12d6aac5bced3c394c55602726c74e27e0c1059b54befc0e4
MD5 dae7aac642e4ffce4fa314834bc51c66
BLAKE2b-256 5ad9a564bd118e32495a80764abf8b583abefe1e97fa9e879312e56de83e2d7e

File details

Details for the file tpchgen_cli-2.0.1-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File hashes

Hashes for tpchgen_cli-2.0.1-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm Hash digest
SHA256 78049d7dd186b94b2a62ffcdb8b9071af540e767fc12631406577065b79afb0e
MD5 bb3af5df51a9341961d4391a03e7e05a
BLAKE2b-256 dcc7eb0242bb688148ca946c8911eb3b53402997aa16a89973107d4444f865d2

File details

Details for the file tpchgen_cli-2.0.1-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for tpchgen_cli-2.0.1-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 d55368fef6e39e3631d3c40ede3f9eff22aceee9edecfaa7422e2c04f775d875
MD5 e009de9c3f63a942346eaa442b50b00c
BLAKE2b-256 14094a11f20dfaeec430553c04608f600b526e3dff0cd735bb2b19d8b65eba93

File details

Details for the file tpchgen_cli-2.0.1-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl.

File hashes

Hashes for tpchgen_cli-2.0.1-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
Algorithm Hash digest
SHA256 145b7b7149c41c78495c52352b58a322b1246c8ccd663f442360de61d2499f6f
MD5 2082dea544a624f0b56b3eaf51241489
BLAKE2b-256 875d6c3bb128fbcfed26ee6ed6837a1f169c241a2afdd4b5fc3d4e08a6093843

File details

Details for the file tpchgen_cli-2.0.1-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for tpchgen_cli-2.0.1-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 433f6c21d821e0a5caa1d9f074b97861ef6dadaa2391e525d98a7ed07be7d9cd
MD5 49638858464ebd9289b9a93d0a983640
BLAKE2b-256 75646a041782ae8fd0bbea3d01d6858c2ec201308db2a56668575853ac2f295f

File details

Details for the file tpchgen_cli-2.0.1-py3-none-macosx_11_0_arm64.whl.

File hashes

Hashes for tpchgen_cli-2.0.1-py3-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 425f8627e82c16766beb1d26d6ac1de2ab085effdea403b99ed00fe02852eac3
MD5 4e64a43b3818fbc52a1fb6618cc5f1db
BLAKE2b-256 4a02ab3ccc93e69ddbf3b59209efb7ab0fbe104a9a07ddf4cc5ccd882d549483

File details

Details for the file tpchgen_cli-2.0.1-py3-none-macosx_10_12_x86_64.whl.

File hashes

Hashes for tpchgen_cli-2.0.1-py3-none-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 b27c9494f7220b3ab51b9b8c02708eca53e05f0cd42ec4ae3e955624ad77d306
MD5 f00297b80801f15b3af947bd2b5b1763
BLAKE2b-256 56faef2befbd74d5e64eab35b882f6f0ba9da6772e448b2d35c6b7602a72a376
