A query predictor pipeline and service to predict resource usages of Presto queries
presto-query-predictor
presto-query-predictor is a Python module that brings machine learning techniques to the Presto ecosystem. It contains a machine learning pipeline for model training and evaluation, and a query predictor web service that predicts the CPU and memory usage of Presto queries.
Installation
After cloning the GitHub repository,
pip3 install -e . # Installs the presto-query-predictor package locally
pip3 install -r requirements.txt # Installs dependencies
Alternatively, install the package from PyPI:
pip3 install presto-query-predictor
We recommend installing the package in a Python virtual environment instead of installing it globally.
Examples
The query_predictor/ folder contains the core of the package. We have prepared some examples in the example/ folder, including:
- load_data.py: an example to load the embedded fake TPC-H-based dataset.
- transform.py: an example to transform datasets for further training.
- train.py: an example to train CPU and memory models.
- tune.py: an example to tune classification algorithms.
- app.py: an example to create a query predictor web service.
Training
A simple way to get a sense of how the CPU and memory models are trained is to run the examples in the example/ folder.
cd examples
python3 transform.py
python3 train.py
The presto-query-predictor package can only be executed in a Python 3 environment. It does not support Python 2.
Afterward, the trained vectorizers and models should be generated in the models/ folder:
models/
vec-cpu.bin
vec-memory.bin
model-cpu.bin
model-memory.bin
By default, the vectorizers are trained with the TF-IDF algorithm, and the models are XGBoost classifiers. The training data is a fake dataset based on the TPC-H benchmark, with only 22 samples.
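The two-stage idea above (a TF-IDF vectorizer followed by a classifier) can be sketched without any third-party dependencies. Because the models are classifiers, continuous CPU (or memory) measurements presumably have to be mapped to discrete classes first; the bucket thresholds, the tokenizer, and the toy queries and CPU times below are hypothetical and are not taken from the package. In place of XGBoost, the sketch swaps in a simple nearest-centroid classifier based on cosine similarity so that it runs with the standard library only. It is an illustration of the technique, not the package's implementation.

```python
import math
import re
from collections import Counter

# Hypothetical CPU buckets: (exclusive upper bound in CPU seconds, label).
CPU_BUCKETS = [(60.0, "low"), (600.0, "medium"), (math.inf, "high")]


def to_label(cpu_seconds, buckets=CPU_BUCKETS):
    """Map a continuous CPU time to a discrete class label."""
    for upper, label in buckets:
        if cpu_seconds < upper:
            return label
    raise ValueError("the last bucket must have an upper bound of math.inf")


def tokenize(sql):
    """Lowercase a SQL string and keep identifier-like tokens (a simplistic tokenizer)."""
    return re.findall(r"[a-z_][a-z0-9_]*", sql.lower())


def fit_idf(token_lists):
    """Smoothed IDF: log((1 + n) / (1 + df)) + 1, like scikit-learn's default."""
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))  # document frequency counts each term once per query
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def normalize(vec):
    """Scale a sparse vector (a dict of term -> weight) to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def tfidf(tokens, idf):
    """Raw term counts weighted by IDF; terms unseen during fitting are dropped."""
    counts = Counter(tokens)
    return normalize({t: c * idf[t] for t, c in counts.items() if t in idf})


def dot(a, b):
    """Dot product of two sparse vectors."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


class NearestCentroid:
    """Stand-in for XGBoost: predict the label whose centroid is most cosine-similar."""

    def fit(self, vectors, labels):
        sums = {}
        for vec, label in zip(vectors, labels):
            acc = sums.setdefault(label, Counter())
            for t, v in vec.items():
                acc[t] += v
        # Unit-normalized sums, so dot() with a unit query vector is cosine similarity.
        self.centroids = {label: normalize(acc) for label, acc in sums.items()}
        return self

    def predict(self, vec):
        return max(self.centroids, key=lambda label: dot(vec, self.centroids[label]))


# Toy, made-up training data; not the package's TPC-H-based dataset.
train_sql = [
    "SELECT count(*) FROM nation",
    "SELECT n_name FROM nation WHERE n_regionkey = 1",
    "SELECT l_orderkey, sum(l_extendedprice) FROM lineitem JOIN orders "
    "ON l_orderkey = o_orderkey GROUP BY l_orderkey",
    "SELECT o_orderpriority, count(*) FROM orders JOIN lineitem "
    "ON o_orderkey = l_orderkey GROUP BY o_orderpriority",
]
cpu_seconds = [1.5, 2.0, 900.0, 700.0]

train_tokens = [tokenize(q) for q in train_sql]
idf = fit_idf(train_tokens)
model = NearestCentroid().fit(
    [tfidf(t, idf) for t in train_tokens],
    [to_label(s) for s in cpu_seconds],
)

print(model.predict(tfidf(tokenize("SELECT count(*) FROM nation WHERE n_regionkey = 2"), idf)))
# → low
```

Keeping the vectorizer and the classifier as separate objects mirrors the separate vec-*.bin and model-*.bin files: the IDF table learned during training must be reused unchanged when vectorizing new queries at prediction time.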
Serving
After running
python3 app.py
a Flask web application should be served at http://0.0.0.0:8000/. It provides a web UI where you can fill in a form with a query to get its predicted resource usage.
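app.py builds its service with Flask. As a rough, standard-library-only sketch of what such a prediction service does (load the models once, accept a query, return predicted classes), the code below exposes a hypothetical JSON endpoint `/predict`. The endpoint path, the `query`/`cpu`/`memory` fields, and the keyword-based `predict` stand-in are assumptions for illustration only; they are not the package's API, and the real application serves an HTML form instead.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen


def predict(sql):
    """Hypothetical placeholder for the trained CPU and memory models."""
    heavy = "join" in sql.lower()
    return {"cpu": "high" if heavy else "low", "memory": "high" if heavy else "low"}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            sql = payload["query"]
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected a JSON body with a 'query' field")
            return
        body = json.dumps(predict(sql)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def serve(host="0.0.0.0", port=8000):
    """Start the sketch server in a background thread and return it."""
    server = HTTPServer((host, port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_query(url, sql):
    """Send a query to the sketch service and return the decoded JSON response."""
    data = json.dumps({"query": sql}).encode("utf-8")
    request = Request(url, data=data, headers={"Content-Type": "application/json"})
    with urlopen(request) as response:
        return json.loads(response.read())
```

For example, after `server = serve()`, calling `post_query("http://localhost:8000/predict", "SELECT 1")` returns a dict of predicted classes; `server.shutdown()` stops the background thread.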
Download files
Hashes for presto-query-predictor-0.1.4.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 77ce0a335736346cbffded502e810f064b9a6ca4cbda2d60fa17dc2d40e8b376
MD5 | f618fff3d778f8aa812dfffde18d7215
BLAKE2b-256 | 0baa77408eeb9db67d9a7980348612a8bce897fbaea78118b8c96d9bc0b3c038
Hashes for presto_query_predictor-0.1.4-py3.7.egg
Algorithm | Hash digest
---|---
SHA256 | ebef32b2530cf5a2d95bc8e22bddb471d8450f5f3020396900f2c4b2515e6948
MD5 | 98b0b4693f2477b09d04474f70ce50cd
BLAKE2b-256 | 112bc8877392486dfdff39027ab57628c4d53ad40526c00ae7f0d82c18cf0591