A query predictor pipeline and web service for predicting the resource usage of Presto queries

presto-query-predictor

presto-query-predictor is a Python module that brings machine learning techniques to the Presto ecosystem. It contains a machine learning pipeline for model training and evaluation, and a query predictor web service that predicts the CPU and memory usage of Presto queries.

Installation

After cloning the GitHub repository, run:

pip3 install -e .  # Installs the presto-query-predictor package locally
pip3 install -r requirements.txt  # Installs dependencies

Alternatively, install the package from PyPI:

pip3 install presto-query-predictor

We recommend installing the package in a Python virtual environment instead of installing it globally.
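
As a minimal sketch of the recommendation above, a virtual environment can also be created from Python with the standard-library venv module, which is the API behind `python3 -m venv`. The helper name create_virtualenv and the directory name are hypothetical and not part of the package.

```python
import venv
from pathlib import Path


def create_virtualenv(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment at env_dir and return its path."""
    path = Path(env_dir)
    # EnvBuilder is the standard-library class used by `python3 -m venv`.
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(path)
    return path


# Example call (".venv" is an arbitrary, hypothetical directory name):
# create_virtualenv(".venv")
```

On a POSIX shell, the environment would then be activated with `source .venv/bin/activate` before running the pip3 commands above.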

Examples

The query_predictor/ folder contains the core of the package. The examples/ folder contains some examples, including

  • load_data.py - An example to load the embedded fake TPC-H-based dataset.
  • transform.py - An example to transform datasets for further training.
  • train.py - An example to train CPU and memory models.
  • tune.py - An example to tune classification algorithms.
  • app.py - An example to create a query predictor web service.

Training

A simple way to get a feel for CPU and memory model training is to run the examples in the examples/ folder.

cd examples
python3 transform.py
python3 train.py

The presto-query-predictor package can only be executed in a Python 3 environment. It does not support Python 2.

Afterward, the trained vectorizers and models should be generated in the models/ folder, including

models/
    vec-cpu.bin
    vec-memory.bin
    model-cpu.bin
    model-memory.bin
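
To confirm that training produced the files listed above, a small check along these lines may help. It is only a sketch: the file names come from the listing, while the helper name missing_artifacts and the default models directory are assumptions, and the on-disk format of the .bin files is not inspected.

```python
from pathlib import Path

# File names taken from the listing above.
EXPECTED_ARTIFACTS = (
    "vec-cpu.bin",
    "vec-memory.bin",
    "model-cpu.bin",
    "model-memory.bin",
)


def missing_artifacts(models_dir: str = "models") -> list:
    """Return the expected artifact names that are absent from models_dir."""
    root = Path(models_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]
```

An empty list from missing_artifacts() means all four files are present; any returned names point to a training step that did not complete.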

By default, the vectorizers use the TF-IDF algorithm, and the models are XGBoost classifiers. The training dataset is a fake dataset based on the TPC-H benchmark and contains only 22 samples.
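
To illustrate what TF-IDF vectorization does to a SQL string, here is a textbook variant written with the standard library only. It is a sketch under stated assumptions, not the package's implementation: the tokenizer, the unsmoothed tf * log(N / df) weighting, and the two sample queries are all hypothetical, and the classification step is not shown.

```python
import math
import re
from collections import Counter


def tokenize(query: str) -> list:
    """Lowercase a SQL string and split it into word-like tokens."""
    return re.findall(r"[a-z_][a-z0-9_]*", query.lower())


def tfidf(queries: list) -> list:
    """Return one {token: weight} dict per query using tf * log(N / df)."""
    docs = [tokenize(q) for q in queries]
    n_docs = len(docs)
    # Document frequency: the number of queries in which each token appears.
    df = Counter(token for doc in docs for token in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty queries
        vectors.append({
            token: (count / total) * math.log(n_docs / df[token])
            for token, count in counts.items()
        })
    return vectors


# Hypothetical queries, not taken from the package's dataset.
queries = [
    "SELECT count(*) FROM lineitem",
    "SELECT * FROM orders WHERE o_totalprice > 1000",
]
vectors = tfidf(queries)
```

In this variant, tokens shared by every query (such as select and from) get a weight of zero, while tokens that distinguish a query (such as a table name) get a positive weight. A classifier can then be trained on these weights.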

Serving

To start the web service, run

python3 app.py

A Flask web application should then be available at http://0.0.0.0:8000/. Its web UI provides a form where you can enter a query to get a resource prediction.
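
A quick way to check that the service started is to request its root page. The sketch below uses only the standard library. It assumes the root path serves the web UI with a success status, and it connects to 127.0.0.1 because 0.0.0.0 is a listen-on-all-interfaces address rather than a host to connect to. The helper name service_is_up is hypothetical.

```python
import urllib.error
import urllib.request


def service_is_up(url: str = "http://127.0.0.1:8000/", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET on url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling service_is_up() after `python3 app.py` should return True once the application is listening; False suggests the server is not running or is bound to a different port.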

Source Distribution

presto-query-predictor-0.1.4.tar.gz (373.4 kB view details)

Uploaded Source

Built Distribution

presto_query_predictor-0.1.4-py3.7.egg (99.7 kB view details)

Uploaded Source

File details

Details for the file presto-query-predictor-0.1.4.tar.gz.

File metadata

  • Download URL: presto-query-predictor-0.1.4.tar.gz
  • Upload date:
  • Size: 373.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.7.4

File hashes

Hashes for presto-query-predictor-0.1.4.tar.gz
Algorithm    Hash digest
SHA256       77ce0a335736346cbffded502e810f064b9a6ca4cbda2d60fa17dc2d40e8b376
MD5          f618fff3d778f8aa812dfffde18d7215
BLAKE2b-256  0baa77408eeb9db67d9a7980348612a8bce897fbaea78118b8c96d9bc0b3c038

File details

Details for the file presto_query_predictor-0.1.4-py3.7.egg.

File metadata

  • Download URL: presto_query_predictor-0.1.4-py3.7.egg
  • Upload date:
  • Size: 99.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.7.4

File hashes

Hashes for presto_query_predictor-0.1.4-py3.7.egg
Algorithm    Hash digest
SHA256       ebef32b2530cf5a2d95bc8e22bddb471d8450f5f3020396900f2c4b2515e6948
MD5          98b0b4693f2477b09d04474f70ce50cd
BLAKE2b-256  112bc8877392486dfdff39027ab57628c4d53ad40526c00ae7f0d82c18cf0591
