A query predictor pipeline and service to predict the resource usage of Presto queries
presto-query-predictor is a Python module that brings machine learning techniques to the Presto ecosystem. It contains a machine learning pipeline for model training and evaluation, and a query predictor web service that predicts the CPU and memory usage of Presto queries.
After cloning the GitHub repository,
    pip3 install -e .                 # Installs the presto-query-predictor package locally
    pip3 install -r requirements.txt  # Installs dependencies
An alternative way is to install the package from PyPI,
pip3 install presto-query-predictor
We recommend installing the package in a Python virtual environment instead of installing it globally.
The query_predictor/ folder contains the core of the package. We have prepared some examples in the examples/ folder, including
load_data.py - An example to load the embedded fake TPCH-based dataset.
transform.py - An example to transform datasets for further training.
train.py - An example to train CPU and memory models.
tune.py - An example to tune classification algorithms.
app.py - An example to create a query predictor web service.
A simple way to get a sense of CPU and memory model training is to run the examples in the examples/ folder:
    cd examples
    python3 transform.py
    python3 train.py
The presto-query-predictor package can only be executed in a Python 3 environment. It does not support Python 2.
Afterward, the trained models should be generated in the models/ folder, including
    models/
        vec-cpu.bin
        vec-memory.bin
        model-cpu.bin
        model-memory.bin
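To confirm that training produced every artifact, a small check along the following lines may help. This is only a sketch: the four file names come from the listing above, while the helper name `missing_model_files` and the default `models` path are assumptions made for illustration and are not part of the package.

```python
from pathlib import Path

# File names as listed above; the default directory name is an assumption.
EXPECTED_MODEL_FILES = (
    "vec-cpu.bin",
    "vec-memory.bin",
    "model-cpu.bin",
    "model-memory.bin",
)


def missing_model_files(models_dir="models"):
    """Return the expected artifact names that are absent from models_dir."""
    root = Path(models_dir)
    return [name for name in EXPECTED_MODEL_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All model files present.")
```

Run it from the directory that contains the models folder, or pass the folder path explicitly.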
By default, the vectorizers use the TF-IDF algorithm and the models are XGBoost classifiers. The training dataset is a fake dataset based on the TPC-H benchmark with only 22 samples, so the resulting models are suitable for demonstration only.
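To give a feel for what a TF-IDF vectorizer does with SQL text before a classifier sees it, here is a minimal standard-library sketch. It is not the package's implementation: the function names, the tokenizer, and the sample queries are all hypothetical, and the smoothed IDF formula is one common variant chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(query):
    """Lowercase a SQL string and split it into identifier-like tokens."""
    return re.findall(r"[a-z_][a-z0-9_]*", query.lower())


def fit_idf(queries):
    """Compute a smoothed inverse document frequency for every token seen."""
    n_docs = len(queries)
    doc_freq = Counter()
    for query in queries:
        doc_freq.update(set(tokenize(query)))
    # Smoothed IDF: ln((1 + N) / (1 + df)) + 1, so no weight is ever zero.
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1 for tok, df in doc_freq.items()}


def transform(query, idf):
    """Turn one query into an L2-normalised sparse TF-IDF vector (a dict)."""
    counts = Counter(tok for tok in tokenize(query) if tok in idf)
    weights = {tok: tf * idf[tok] for tok, tf in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {tok: w / norm for tok, w in weights.items()} if norm else {}


# Hypothetical queries for illustration only; not taken from the bundled dataset.
queries = [
    "SELECT count(*) FROM lineitem",
    "SELECT o_orderkey FROM orders WHERE o_totalprice > 1000",
    "SELECT l_orderkey FROM lineitem JOIN orders ON l_orderkey = o_orderkey",
]
idf = fit_idf(queries)
vector = transform("SELECT count(*) FROM orders", idf)
```

Tokens that appear in every query, such as `select` and `from`, receive the lowest weight, while rarer tokens such as `count` dominate the vector. A classifier trained on such vectors would then predict a CPU or memory class for each query.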