Python CLI for TPC-H data generator

TPC-H Data Generator CLI

tpchgen-cli is a high-performance, parallel TPC-H data generator command-line tool.

This tool is more than 10x faster than the next fastest TPC-H generator we know of (DuckDB). On a 2023 Mac M3 Max laptop, it easily generates data faster than it can be written to SSD. See BENCHMARKS.md for more details on performance and benchmarking.

  • See the tpchgen README.md for project details
  • Watch this awesome demo by @alamb to see tpchgen-cli in action
  • Read the companion blog post on the DataFusion blog to learn about the project's history
  • Try it yourself by following the instructions below

Install via pip

pip install tpchgen-cli

Install via Rust

Install Rust and compile

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
RUSTFLAGS='-C target-cpu=native' cargo install tpchgen-cli

Examples

# Scale Factor 10, all tables, in Apache Parquet format in the current directory
# (3.6GB, 8 files, 60M lineitem rows, in 5 seconds on a modern laptop)
tpchgen-cli -s 10 --format=parquet

# Scale Factor 10, all tables, in `tbl` (CSV-like) format in the `sf10` directory
# (10GB, 8 files, 60M lineitem rows)
tpchgen-cli -s 10 --output-dir sf10

# Scale Factor 1000, lineitem table, in Apache Parquet format in the sf1000 directory,
# 20 parts (partitions), 100MB row groups
# (220GB, 20 files, 6B lineitem rows, 3.5 minutes on a modern laptop)
tpchgen-cli -s 1000 --tables lineitem --parts 20 --format=parquet --parquet-row-group-bytes=100000000 --output-dir sf1000

# Scale Factor 10, partitions 2 and 3 of 10, in the `partitioned` directory
#
# partitioned/
# ├── lineitem
# │   ├── lineitem.2.tbl
# │   └── lineitem.3.tbl
# └── orders
#     ├── orders.2.tbl
#     └── orders.3.tbl
#
for PART in $(seq 2 3); do
  tpchgen-cli --tables lineitem,orders --scale-factor=10 --output-dir partitioned --parts 10 --part "$PART"
done
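
The partition loop above can be wrapped in a small helper that prints one command per partition, which makes it easy to review the commands before running them. This is only a sketch: the flags are the ones used in the example above, while the function name `part_commands`, its argument order, and the dry-run design are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Print one tpchgen-cli command per partition in the range [first, last].
# part_commands is a hypothetical helper, not part of tpchgen-cli.
part_commands() {
  local first=$1 last=$2 total=$3 out_dir=$4
  local part
  for part in $(seq "$first" "$last"); do
    # Same flags as the example above; only the partition number changes.
    echo "tpchgen-cli --tables lineitem,orders --scale-factor=10 --output-dir $out_dir --parts $total --part $part"
  done
}

# Dry run: show the commands for partitions 2 and 3 of 10.
part_commands 2 3 10 partitioned
```

To run the commands one after another instead of only printing them, the output can be piped into a shell, for example `part_commands 2 3 10 partitioned | sh`.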

Performance

Scale Factor   tpchgen-cli   DuckDB      DuckDB (proprietary)
1              0:02.24       0:12.29     0:10.68
10             0:09.97       1:46.80     1:41.14
100            1:14.22       17:48.27    16:40.88
1000           10:26.26      N/A (OOM)   N/A (OOM)

  • DuckDB (proprietary) is the time required to create TPC-H data using DuckDB's proprietary format
  • Creating Scale Factor 1000 data in DuckDB required 647 GB of memory, which is why no DuckDB times are reported for that scale factor in the table above.

Times (minutes:seconds) to create TPC-H tables in Parquet format using tpchgen-cli and DuckDB for various scale factors.
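
The speedup implied by the table can be worked out by converting each time to seconds and dividing. The sketch below does that with `awk`; the helper names `to_seconds` and `speedup` are assumptions made for illustration, and the input times are copied from the tpchgen-cli and DuckDB (Parquet) columns.

```shell
#!/usr/bin/env bash
# Convert a "minutes:seconds" time such as 1:46.80 into seconds.
to_seconds() {
  awk -v t="$1" 'BEGIN { split(t, p, ":"); printf "%.2f\n", p[1] * 60 + p[2] }'
}

# Print how many times faster the first time is than the second.
speedup() {
  awk -v fast="$(to_seconds "$1")" -v slow="$(to_seconds "$2")" \
    'BEGIN { printf "%.1f\n", slow / fast }'
}

# tpchgen-cli time, then DuckDB (Parquet) time, from the table.
speedup 0:02.24 0:12.29    # scale factor 1
speedup 0:09.97 1:46.80    # scale factor 10
speedup 1:14.22 17:48.27   # scale factor 100
```

With the figures from the table this prints 5.5, 10.7, and 14.4, so the gap between the two tools widens as the scale factor grows.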

Download files

Download the file for your platform.

Source Distribution

  • tpchgen_cli-2.0.2.tar.gz (3.4 MB): Source

Built Distributions

  • tpchgen_cli-2.0.2-py3-none-win_amd64.whl (5.2 MB): Python 3, Windows x86-64
  • tpchgen_cli-2.0.2-py3-none-win32.whl (4.6 MB): Python 3, Windows x86
  • tpchgen_cli-2.0.2-py3-none-musllinux_1_2_x86_64.whl (6.1 MB): Python 3, musllinux: musl 1.2+ x86-64
  • tpchgen_cli-2.0.2-py3-none-musllinux_1_2_i686.whl (6.2 MB): Python 3, musllinux: musl 1.2+ i686
  • tpchgen_cli-2.0.2-py3-none-musllinux_1_2_armv7l.whl (6.0 MB): Python 3, musllinux: musl 1.2+ ARMv7l
  • tpchgen_cli-2.0.2-py3-none-musllinux_1_2_aarch64.whl (5.6 MB): Python 3, musllinux: musl 1.2+ ARM64
  • tpchgen_cli-2.0.2-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.0 MB): Python 3, manylinux: glibc 2.17+ x86-64
  • tpchgen_cli-2.0.2-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (7.5 MB): Python 3, manylinux: glibc 2.17+ ppc64le
  • tpchgen_cli-2.0.2-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl (6.5 MB): Python 3, manylinux: glibc 2.17+ i686
  • tpchgen_cli-2.0.2-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (6.0 MB): Python 3, manylinux: glibc 2.17+ ARMv7l
  • tpchgen_cli-2.0.2-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (5.8 MB): Python 3, manylinux: glibc 2.17+ ARM64
  • tpchgen_cli-2.0.2-py3-none-macosx_11_0_arm64.whl (5.3 MB): Python 3, macOS 11.0+ ARM64
  • tpchgen_cli-2.0.2-py3-none-macosx_10_12_x86_64.whl (5.6 MB): Python 3, macOS 10.12+ x86-64

File details

Details for the file tpchgen_cli-2.0.2.tar.gz.

File metadata

  • Download URL: tpchgen_cli-2.0.2.tar.gz
  • Upload date:
  • Size: 3.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.10.2

File hashes

Hashes for tpchgen_cli-2.0.2.tar.gz
Algorithm Hash digest
SHA256 7181ffc929df1d813ce0d6e3f6db436bf615aab65fbe99138beb3090c59cd0e2
MD5 ad133272306098e96275e3dde59d4560
BLAKE2b-256 291181338d253c93ee580d9e3abe100a4f6c639d5ddc6608bca525b55a038b76

See more details on using hashes here.
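
A downloaded file can be checked against the SHA256 digest listed above before it is installed. The sketch below is one way to do that; the helper names `sha256_of` and `verify_sha256` are assumptions made for illustration, and it relies on either `sha256sum` (common on Linux) or `shasum` (common on macOS) being available.

```shell
#!/usr/bin/env bash
# Print the SHA-256 digest of a file using whichever tool is installed.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{ print $1 }'
  else
    shasum -a 256 "$1" | awk '{ print $1 }'
  fi
}

# Succeed only if the file's digest equals the expected value.
verify_sha256() {
  [ "$(sha256_of "$1")" = "$2" ]
}

# Example: check the source distribution if it is in the current directory.
file=tpchgen_cli-2.0.2.tar.gz
expected=7181ffc929df1d813ce0d6e3f6db436bf615aab65fbe99138beb3090c59cd0e2
if [ -f "$file" ]; then
  if verify_sha256 "$file" "$expected"; then
    echo "$file: digest matches"
  else
    echo "$file: digest MISMATCH" >&2
  fi
fi
```

The same function works for any of the wheels by substituting the file name and the SHA256 value from the matching "File hashes" section.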

File details

Details for the file tpchgen_cli-2.0.2-py3-none-win_amd64.whl.

File hashes

Hashes for tpchgen_cli-2.0.2-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 a532917f167d63fc9223f1f34887bbb80c3f0024ae78c4a2c24923beb680068b
MD5 4991260db6f0ba45a222a2757e279bf9
BLAKE2b-256 f7569225072e2b5a1f12a1a87a3d9dbdb6696cccbe313aa84ac404a304dfa7d2

File details

Details for the file tpchgen_cli-2.0.2-py3-none-win32.whl.

File metadata

  • Download URL: tpchgen_cli-2.0.2-py3-none-win32.whl
  • Upload date:
  • Size: 4.6 MB
  • Tags: Python 3, Windows x86
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.10.2

File hashes

Hashes for tpchgen_cli-2.0.2-py3-none-win32.whl
Algorithm Hash digest
SHA256 2735a81abcdd2f95301960e8522707e8a380c342c95112928065dd531f6a1bb9
MD5 cfa33a74bbf6d6e6336a6321f0d3fb54
BLAKE2b-256 dff786c5528e01f0a99450588b568f84079eb75bead9331d4ca0ade2407348be

File details

Details for the file tpchgen_cli-2.0.2-py3-none-musllinux_1_2_x86_64.whl.

File hashes

Hashes for tpchgen_cli-2.0.2-py3-none-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 fc14ce31de7560955bb0880ed46d23e2419afcf66494a775ff50edeed4388378
MD5 40b049fec09f49acb432e5affaef9ab8
BLAKE2b-256 089fa87904ae9382559218b03c6be6f162fe4384bcffd6d636dbd334d17e5162

File details

Details for the file tpchgen_cli-2.0.2-py3-none-musllinux_1_2_i686.whl.

File hashes

Hashes for tpchgen_cli-2.0.2-py3-none-musllinux_1_2_i686.whl
Algorithm Hash digest
SHA256 c04caaa18b8eb224f5de88ee773e21afbf4f9723f71fdc77e7fb88a8fb3b1f69
MD5 ecb0f2cc2a7129206c4d5dc15b9792bc
BLAKE2b-256 f84dca26c4dc4dd0401eb8626baaff0637ee973a41505c7db9fb8a11261cb620

File details

Details for the file tpchgen_cli-2.0.2-py3-none-musllinux_1_2_armv7l.whl.

File hashes

Hashes for tpchgen_cli-2.0.2-py3-none-musllinux_1_2_armv7l.whl
Algorithm Hash digest
SHA256 8a3758afd1c5e5379e86cd1e8c9629aa85fd4293532e769b01466da9e682b79c
MD5 313b1443e888fab4c68f9c4b7e00e126
BLAKE2b-256 a7aa4e97047c5b69bcb280ee7b1f4f9b2116f15a590332a9a8fcfc54f1f1c429

File details

Details for the file tpchgen_cli-2.0.2-py3-none-musllinux_1_2_aarch64.whl.

File hashes

Hashes for tpchgen_cli-2.0.2-py3-none-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 ea303d454bf44e0aae366e893cd3f393d8f89c1a94ddfc4ae79b7fd05ecb5d6c
MD5 b3dd8a21981a12aa235a40837b651e16
BLAKE2b-256 1d86a3d8faa707fad1a0736a39b2d007105fd5b351e3076871f6391a90b71488

File details

Details for the file tpchgen_cli-2.0.2-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for tpchgen_cli-2.0.2-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 a1394361d426f35d3a448ca99948f74e333cf85dce98abb101e617273f0eba9e
MD5 cb0817fa9124b0abbb8380becda74fc2
BLAKE2b-256 a56dd5dc960f5e7763869e33dd904797e012c5c84609ba8eefa34c525f7e806c

File details

Details for the file tpchgen_cli-2.0.2-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File hashes

Hashes for tpchgen_cli-2.0.2-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm Hash digest
SHA256 28f5f8ee3c8eab1779bde8d763ad8cd852229217a377c5c33d271b1a5e9bdf36
MD5 d5125dec40299a869a11d57809790561
BLAKE2b-256 e233e9bdb3ba82e1d160e531850a058f718188a565b40c2edf5bdb070392ebd4

File details

Details for the file tpchgen_cli-2.0.2-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for tpchgen_cli-2.0.2-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 36fa88bb80b278364305f3047a2dfd95c3130bbb22ad9af4d1fcbbe8dc8bcfee
MD5 5087cc469c9a2a7af29cd27431cc93b3
BLAKE2b-256 ebb5f2b34d41f2f8584d9bf715555072fdd3813d2e8bb8981ead149fa79fa7e0

File details

Details for the file tpchgen_cli-2.0.2-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl.

File hashes

Hashes for tpchgen_cli-2.0.2-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
Algorithm Hash digest
SHA256 d71adedecb83cea4665d68ff3b75b797481f499324f6c22fa52b7033b26f598d
MD5 ea03c6005ff77c88fd2b9b2c3908c2d5
BLAKE2b-256 cd533e69e029a9fcec4d7cbcbe16cc7e5b4bcf1d518490286dd4932376a9046f

File details

Details for the file tpchgen_cli-2.0.2-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for tpchgen_cli-2.0.2-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 685dbb3b39ae764150048c5dd5ffb76314841b620cead30a5edd50d2d6ca7a6b
MD5 1cdd1c85b19e5b229500348048d0f143
BLAKE2b-256 482cdc9ce52f8e9f6d5919ed8427308a58038dd97d9ef433e76f65516a45c2f7

File details

Details for the file tpchgen_cli-2.0.2-py3-none-macosx_11_0_arm64.whl.

File hashes

Hashes for tpchgen_cli-2.0.2-py3-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 f371ae82538aa7060db9d979051b211f19152ee8f82ab6d71dd3432ed17debbf
MD5 0bc672e6bd9d9ec19a8df4d8ef0987a7
BLAKE2b-256 90c8c87acf47b1397f7eea3643fc7924ad4abd7f4e4e5fe947d9b2eaab81a34c

File details

Details for the file tpchgen_cli-2.0.2-py3-none-macosx_10_12_x86_64.whl.

File hashes

Hashes for tpchgen_cli-2.0.2-py3-none-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 e61320f6177dd69f17250bb581f167b9548fd8d94c19f570e69191be75cc01c3
MD5 dcce4509ea7d25dac853378bbee7e53b
BLAKE2b-256 9988d9b6189e2a857c3e7106887c61dc5d6e9d2d324293c54b55fad9675d1913
