A query predictor pipeline and service to predict the resource usage of Presto queries
presto-query-predictor
presto-query-predictor is a Python module that brings machine learning techniques to the Presto ecosystem. It contains a machine learning pipeline for model training and evaluation, and a query predictor web service that predicts the CPU and memory usage of Presto queries.
Installation
After cloning the GitHub repository,
pip3 install -e . # Installs the presto-query-predictor package locally
pip3 install -r requirements.txt # Installs dependencies
Alternatively, install the package from PyPI,
pip3 install presto-query-predictor
We recommend installing the package in a Python virtual environment instead of installing it globally.
Examples
The query_predictor/ folder contains the core of the package. We have prepared some examples in the example/ folder, including
- load_data.py - An example to load the embedded fake TPCH-based dataset.
- transform.py - An example to transform datasets for further training.
- train.py - An example to train CPU and memory models.
- tune.py - An example to tune classification algorithms.
- app.py - An example to create a query predictor web service.
Training
A simple way to get a sense of the CPU and memory model training is to run the examples in the example/ folder.
cd examples
python3 transform.py
python3 train.py
The presto-query-predictor package can only be executed in a Python 3 environment. It does not support Python 2.
Afterward, the trained models should be generated in the models folder, including
models/
    vec-cpu.bin
    vec-memory.bin
    model-cpu.bin
    model-memory.bin
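To confirm that training produced everything listed above, a small check like the following can help. It is only a sketch: the file names come from the listing, while the `missing_artifacts` helper and the default `models` path are assumptions made for illustration and are not part of the package.

```python
from pathlib import Path

# File names taken from the listing above.
EXPECTED_ARTIFACTS = (
    "vec-cpu.bin",
    "vec-memory.bin",
    "model-cpu.bin",
    "model-memory.bin",
)


def missing_artifacts(models_dir="models"):
    """Return the expected artifact names that are absent from models_dir.

    Hypothetical helper, not provided by presto-query-predictor.
    """
    root = Path(models_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts()
    print("all artifacts present" if not missing else f"missing: {missing}")
```

Run it from the directory that contains the models folder. An empty result means all four files were written.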
By default, the vectorizers are built with the TF-IDF algorithm, and the models are XGBoost classifiers. The training dataset is a fake dataset based on the TPC-H benchmark and contains only 22 samples, so the resulting models are suitable for demonstration only.
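To illustrate what the vectorizer step does, here is a minimal pure-Python sketch of TF-IDF weighting applied to SQL text. It is not the package's vectorizer: the tokenizer, the smoothed IDF formula, and the function names `tokenize`, `fit_idf`, and `transform` are all assumptions chosen for illustration, and the classifier step is left out.

```python
import math
import re
from collections import Counter


def tokenize(query):
    """Lowercase a SQL string and split it into identifier-like tokens."""
    return re.findall(r"[a-z_][a-z0-9_]*", query.lower())


def fit_idf(queries):
    """Compute a smoothed inverse document frequency for every token."""
    n_docs = len(queries)
    doc_freq = Counter()
    for query in queries:
        # Count each token once per query, however often it repeats.
        doc_freq.update(set(tokenize(query)))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, so no weight is ever zero.
    return {
        tok: math.log((1 + n_docs) / (1 + df)) + 1
        for tok, df in doc_freq.items()
    }


def transform(query, idf):
    """Map one query to an L2-normalized sparse TF-IDF vector (a dict)."""
    counts = Counter(tok for tok in tokenize(query) if tok in idf)
    weights = {tok: count * idf[tok] for tok, count in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {tok: w / norm for tok, w in weights.items()} if norm else {}


# Made-up queries in the style of TPC-H, used only for this illustration.
queries = [
    "SELECT count(*) FROM lineitem",
    "SELECT sum(l_quantity) FROM lineitem GROUP BY l_returnflag",
    "SELECT o_orderkey FROM orders WHERE o_totalprice > 1000",
]
idf = fit_idf(queries)
vector = transform(queries[1], idf)
```

Tokens that appear in every query, such as `select`, receive the lowest weight, while rarer tokens such as `sum` receive higher weights. A classifier is then trained on vectors of this kind.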
Serving
After running
python3 app.py
a Flask web application should be available at http://0.0.0.0:8000/. The application has a web UI with a form where you can enter a query to predict its resource usage.
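If you would rather call the service from a script than use the web form, a request could be prepared roughly as follows. This is a sketch under loud assumptions: the `/predict` route and the `{"query": ...}` JSON payload are hypothetical placeholders, not documented endpoints of this package, so check app.py for the actual routes and request format before using it.

```python
import json
from urllib import request

BASE_URL = "http://0.0.0.0:8000"  # address of the Flask application
PREDICT_PATH = "/predict"  # hypothetical route; check app.py for the real one


def build_request(query, base_url=BASE_URL, path=PREDICT_PATH):
    """Build (without sending) a JSON POST request carrying one query."""
    # Hypothetical payload shape: a JSON object with a single "query" field.
    body = json.dumps({"query": query}).encode("utf-8")
    return request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("SELECT count(*) FROM lineitem")
# To send it against a running service: request.urlopen(req).read()
```

Building the request separately from sending it makes it easy to inspect the URL and body first, and to adjust the route once the real one is known.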