
IO Bench is a library designed to benchmark the performance of standard flat file formats and partitioning schemes.

IOBench Quick Start Guide

Generating Sample Data

To generate sample data, initialize the IOBench object with the path to the source CSV file and call the generate_sample method:

from io_bench import IOBench

bench = IOBench(source_file='./data/source_100K.csv', runs=20, parsers=['avro', 'parquet_polars', 'parquet_arrow', 'parquet_fast', 'feather', 'feather_arrow'])
bench.generate_sample(records=100000) # default value

NOTE: How source_file is interpreted depends on context. If you pass the desired name for a sample file and then call generate_sample, that file is created. Otherwise, source_file must be a valid path to an existing file.
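If you would rather supply your own source file than call generate_sample, the sketch below writes a small CSV using only the standard library. It is not IOBench's generator: the helper name write_sample_csv, the example values, and the three-column schema are assumptions (only the column names Region, Country and Total Cost appear in this guide), and this guide does not confirm which schemas IOBench accepts.

```python
import csv
import random
from pathlib import Path

# Hypothetical schema: only these three column names appear in this guide.
COLUMNS = ["Region", "Country", "Total Cost"]
# Made-up values, purely for illustration.
PLACES = [("Europe", "France"), ("Asia", "Japan"), ("Africa", "Kenya")]


def write_sample_csv(path, records=100_000, seed=0):
    """Write a header plus `records` random rows to `path` and return the path."""
    rng = random.Random(seed)  # seeded so repeated calls produce the same file
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(COLUMNS)
        for _ in range(records):
            region, country = rng.choice(PLACES)
            writer.writerow([region, country, round(rng.uniform(10, 5000), 2)])
    return path
```

Calling write_sample_csv('./data/source_100K.csv') would produce a file whose path could then be passed as source_file, under the assumptions above.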

Converting Data to Partitioned Formats

Convert the generated CSV data to partitioned formats (Avro, Parquet, Feather). If rows is not defined, the data is partitioned automatically using the default chunk sizes.

bench.partition(rows={'avro': 500000, 'parquet': 3000000, 'feather': 1600000})
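To get a feel for how many partition files a given setting might produce, here is a rough sketch of the arithmetic. It assumes each value in rows is the maximum number of rows per partition file, which this guide does not state explicitly; partition_bounds is a hypothetical helper, not part of IOBench.

```python
import math


def partition_bounds(total_rows, rows_per_partition):
    """Return (start, stop) row ranges, each at most rows_per_partition long."""
    if rows_per_partition <= 0:
        raise ValueError("rows_per_partition must be positive")
    count = math.ceil(total_rows / rows_per_partition)
    return [
        (i * rows_per_partition, min((i + 1) * rows_per_partition, total_rows))
        for i in range(count)
    ]


# Same mapping as in the partition call above.
rows = {"avro": 500000, "parquet": 3000000, "feather": 1600000}

# Number of partitions per format for a hypothetical 4,000,000-row source.
partition_counts = {
    fmt: len(partition_bounds(4_000_000, size)) for fmt, size in rows.items()
}
```

Under that assumption, a 4,000,000-row source would yield 8 Avro, 2 Parquet and 3 Feather partitions, while a 100,000-row source fits in a single partition for every format.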

Running Benchmarks

NOTE: Partitioning is stateful per bench object. If partition is not called manually, it is called automatically on the first run only, provided a valid source file exists.

Without Column Selection

Run benchmarks without column selection:

benchmarks_no_select = bench.run(suffix='_no_select')
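For intuition about what a runs value such as 20 implies, the following is a generic repeated-timing loop built on the standard library. It is only an illustration of the idea: time_runs and the shape of its result are assumptions and say nothing about how IOBench measures or stores its results.

```python
import statistics
import time


def time_runs(func, runs=20):
    """Call func() `runs` times and summarise the wall-clock timings in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "mean": statistics.mean(timings),
        "min": min(timings),
        "max": max(timings),
    }


# Example workload standing in for "read a file with some parser".
summary = time_runs(lambda: sum(range(10_000)), runs=5)
```

Repeating the measurement and reporting a summary rather than a single timing reduces the influence of one-off effects such as a cold file cache.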

With Column Selection

Run benchmarks with column selection:

columns = ['Region', 'Country', 'Total Cost']
benchmarks_column_select = bench.run(columns=columns, suffix='_column_select')
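Column selection means reading only a subset of fields. A minimal standard-library sketch of the idea on CSV input follows; read_columns is a hypothetical helper, the Units column in the sample is made up, and none of this reflects how IOBench's parsers implement selection.

```python
import csv
import io


def read_columns(fh, columns=None):
    """Read CSV rows from fh, keeping only `columns` (all columns if None)."""
    reader = csv.DictReader(fh)
    if columns is None:
        return list(reader)
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not in file: {missing}")
    return [{c: row[c] for c in columns} for row in reader]


# Tiny in-memory sample; "Units" is a made-up extra column.
sample = "Region,Country,Total Cost,Units\nEurope,France,10.5,3\n"
selected = read_columns(io.StringIO(sample), ["Region", "Country", "Total Cost"])
```

Note that a CSV reader like this still parses every field before discarding the unwanted ones. Columnar formats such as Parquet and Feather can skip unselected columns when reading, which is one reason to benchmark both with and without column selection.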

Generating Reports

Combine results and generate the final report:

all_benchmarks = benchmarks_no_select + benchmarks_column_select
bench.report(all_benchmarks, report_dir='./result')

Full Example

Here is a full example of using IOBench:

from io_bench import IOBench

def main() -> None:
    # Initialize the IOBench object with runs and parsers
    bench = IOBench(source_file='./data/source_100K.csv', runs=20, parsers=['avro', 'parquet_polars'])

    # Generate sample data - (optional)
    bench.generate_sample()

    # Convert the source file to partitioned formats - (optional)
    bench.partition(rows={'avro': 500000, 'parquet': 3000000, 'feather': 1600000})

    # Run benchmarks without column selection
    benchmarks_no_select = bench.run(suffix='_no_select')

    # Run benchmarks with column selection
    columns = ['Region', 'Country', 'Total Cost']
    benchmarks_column_select = bench.run(columns=columns, suffix='_column_select')

    # Combine results and generate the final report
    all_benchmarks = benchmarks_no_select + benchmarks_column_select
    bench.report(all_benchmarks, report_dir='./result')

if __name__ == "__main__":
    main()


Source Distribution

io_bench-0.1.0.tar.gz (14.4 kB view details)

Uploaded Source

Built Distribution

io_bench-0.1.0-py3-none-any.whl (11.5 kB view details)

Uploaded Python 3

File details

Details for the file io_bench-0.1.0.tar.gz.

File metadata

  • Download URL: io_bench-0.1.0.tar.gz
  • Upload date:
  • Size: 14.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.0 CPython/3.12.5

File hashes

Hashes for io_bench-0.1.0.tar.gz
Algorithm Hash digest
SHA256 7d603a4385b09a001a784f9e7a7eb6e5ede9bc144c92640ce8ebd5e199e0d55b
MD5 8a2e0c72f5014eca756f92fd9cdeced3
BLAKE2b-256 996a767e68dce50e8bd0c458a7356d732257ddcd14aef4f6b62b9bf39a98836c


File details

Details for the file io_bench-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: io_bench-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 11.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.0 CPython/3.12.5

File hashes

Hashes for io_bench-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f1ccf1c3e7e8d13619846aeb90ad8abc3c69cf196039cef1c10747661e75c647
MD5 1dd454c83640345a1a08995c87d2555f
BLAKE2b-256 5e3adf0a5f7ad190f0cdb3e75cb445bb863211b0373c29d7870ca2cd22eed26a

