
predictit


Library/framework for making predictions. Choose the best of 20 models (ARIMA, regressions, LSTM...) from libraries such as statsmodels, scikit-learn, and TensorFlow, plus some custom models. There are hundreds of customizable options (none are required, of course) as well as some Config presets.

The library includes model hyperparameter optimization as well as option-variable optimization. That means it can find the optimal preprocessing (smoothing, dropping non-correlated columns, standardization) and, on top of that, the optimal inner model parameters such as the number of neuron layers.

Output

The most common outputs are an interactive plotly graph, a numpy array of results, or deployment to a database.

Plot of results

Table of results

The return type of the main predict function depends on configuration.py. It can return the best prediction as an array or all predictions as a dataframe. An interactive HTML plot is also created.

Official repo and documentation links

Repo on GitHub

Official readthedocs documentation

Installation

Requires Python >= 3.6 (Python 2 is not supported). Install with

pip install predictit

Sometimes you can have issues installing some libraries from requirements (e.g. numpy, because of missing BLAS / LAPACK). Two libraries - Tensorflow and pyodbc - are not in requirements, because they are not necessary but can be troublesome to install. If a library does not install with pip, check which one fails, install it manually (Stack Overflow helps), and repeat...
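If you do want Tensorflow or pyodbc, install them separately, e.g.:

pip install tensorflow
pip install pyodbc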

How to

The software can be used in three ways: as a Python library, with command-line arguments, or as normal Python scripts. The main function is predict in the main.py script. There is also a predict_multiple_columns function if you want to predict more at once (columns or time frequencies), and a compare_models function that tells you which models are best. compare_models evaluates the error criterion on out-of-sample test data (use as much as possible), so its errors are more reliable than those from predict (for example, decision trees just memorize inputs from the learning set, so their error in predict is 0, while in compare_models it is accurate). You can then use only the good models in the predict function.

You can set up the prediction with Config. It is capitalized because it is a class.

Simple example of using predictit as a Python library with function arguments

import predictit
import numpy as np
import pandas as pd

predictions_1 = predictit.main.predict(data=np.random.randn(100, 2), predicted_column=1, predicts=3, return_type='best')

# There are only two positional arguments (because there are more than a hundred
# configurable values): data and predicted_column. So you can also use

mydata = pd.DataFrame(np.random.randn(100, 2), columns=['a', 'b'])
predictions_1_positional = predictit.main.predict(mydata, 'b')

Simple example of using predictit as a Python library and editing Config

import predictit
from predictit.configuration import Config

# You can edit Config in two ways
Config.data = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv'  # You can use a local path as well, e.g. "/home/name/mycsv.csv"
Config.predicted_column = 'Temp'  # You can use a column index as well
Config.datetime_column = 'Date'  # Will be used for resampling and result plot description
Config.freq = "D"  # One day - one value
Config.resample_function = "mean"  # If there are more values in one day, aggregate with the mean
Config.return_type = 'detailed_dictionary'
Config.debug = 0  # Ignore warnings

# Or
Config.update({
    'predicts': 14,  # Number of predicted values
    'default_n_steps_in': 12  # Value of recursive inputs in model (do not use too high - slower and worse predictions)
})

predictions_2 = predictit.main.predict()

Simple example of using main.py as a script

Open configuration.py (the only script you need to edit, and it's very simple) and do the setup - mainly used_function and data (or data_source and path). Then just run main.py.

Simple example of using command line arguments

Run the code below in a terminal in the predictit folder. Use python main.py --help for more info on the parameters.

python main.py --used_function predict --data 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv' --predicted_column "'Temp'"

Explore Config

Type Config. and then, if it does not pop up automatically, press Ctrl + Space to see all possible values. To see what an option means, type for example Config.return_type, then hover over it while holding Ctrl. This reveals the comment that describes the option (at least in VS Code).

To see all the possible values in configuration.py, use

predictit.configuration.print_config()

Example of compare_models function

import predictit
import numpy as np
from predictit.configuration import Config

my_data_array = np.random.randn(2000, 4)  # Define your data here

# You can compare models on various parts of the same data, or on different datasets
# (check the configuration for how to insert a dictionary with data names)
Config.update({
    'data_all': (my_data_array[-2000:], my_data_array[-1500:], my_data_array[-1000:])
})

compared_models = predictit.main.compare_models()

Example of predict_multiple_columns function

import predictit
import pandas as pd
from predictit.configuration import Config

Config.data = pd.read_csv("https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv")

# Define a list of columns, or '*' to predict all of the columns
Config.predicted_columns = ['*']

multiple_columns_prediction = predictit.main.predict_multiple_columns()

Example of Config variable optimization

Config.update({
    'data': "https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv",
    'predicted_column': 'Temp',
    'return_type': 'all_dataframe',
    'optimization': 1,
    'optimization_variable': 'default_n_steps_in',
    'optimization_values': [4, 8, 10],
    'plot_all_optimized_models': 1,
    'print_table': 2,  # Print detailed table
    'used_models': ['AR (Autoregression)', 'Conjugate gradient', 'Sklearn regression']
})

predictions_optimized_config = predictit.main.predict()

Hyperparameters tuning

To optimize hyperparameters, just set optimizeit: 1 and define the model parameter limits. How to use it is described in the comments in Config.py. It is not a grid brute force; a heuristic method based on interval halving is used, but it can still be time consuming. It is recommended to tune only the parameters that are worth it, or to tune them in parts.
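The exact limit keys are documented in the configuration comments. The snippet below is not predictit code, just a minimal self-contained sketch of the interval-halving idea on a single numeric parameter (halving_search and the toy loss are made up for illustration):

def halving_search(loss, low, high, iterations=10):
    """Keep halving the interval towards the side with the smaller loss."""
    for _ in range(iterations):
        mid = (low + high) / 2
        # Probe the middle of each half and keep the more promising half
        if loss((low + mid) / 2) < loss((mid + high) / 2):
            high = mid
        else:
            low = mid
    return (low + high) / 2

# Toy example: the minimum of a convex loss is found without a full grid
best = halving_search(lambda x: (x - 3.2) ** 2, low=0, high=10)
print(round(best, 2))  # ~3.2

Unlike a grid search, each iteration halves the search interval, so a handful of loss evaluations narrows the parameter down quickly (assuming the loss is roughly unimodal in that parameter).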

GUI

It is possible to use a basic GUI, but only with a CSV data source. Just run gui_start.py if you have downloaded the software, or call predictit.gui_start.run_gui() if you installed it via PyPI.
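For example, when installed from PyPI:

import predictit

predictit.gui_start.run_gui()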

Screenshot of such a GUI


A better GUI with fully customizable settings will be shipped next year.

Feature derivation

It is possible to add new data derived from the original, such as a running Fourier transform maximum, the product of two columns, or a rolling standard deviation; a sketch of such features is below.
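predictit builds such features internally from Config; the pandas snippet below is only a hand-rolled illustration of these derived features on made-up data (all column names are invented):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 2), columns=['a', 'b'])

# Multiplication of two columns as a new feature
df['a_times_b'] = df['a'] * df['b']

# Rolling standard deviation over a 10-sample window
df['a_rolling_std'] = df['a'].rolling(window=10).std()

# Running maximum of the Fourier transform magnitude over a sliding window
df['a_fft_max'] = df['a'].rolling(window=16).apply(
    lambda w: np.abs(np.fft.rfft(w)).max(), raw=True)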

Categorical embeddings

It is also possible to use string values in predictions. For the Config value 'embedding' you can choose 'label', where every unique string is assigned a unique number, or 'one-hot', which creates a new column for every unique string (can be time consuming).
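Inside predictit this is controlled purely by the Config value; the pandas snippet below just illustrates what the two modes do, on made-up data:

import pandas as pd

df = pd.DataFrame({'city': ['Praha', 'Brno', 'Praha', 'Ostrava']})

# 'label': every unique string is assigned a unique number
labels = df['city'].astype('category').cat.codes

# 'one-hot': a new column for every unique string
one_hot = pd.get_dummies(df['city'])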

Feature extraction

Under development right now :[

Data preprocessing, plotting and other Functions

You can of course use any of the library's functions separately for your own needs. mydatapreprocessing, mylogging and myplotting are my other projects, which are used heavily. An example is below.

from mydatapreprocessing import load_data, data_consolidation, preprocess_data
from myplotting import plot
from predictit.analyze import analyze_column

data = "https://blockchain.info/unconfirmed-transactions?format=json"

# Load data from file or URL
data_loaded = load_data(data, request_datatype_suffix=".json", predicted_table='txs', data_orientation="index")

# Transform various data into a defined format (pandas dataframe): convert to numeric if possible,
# keep only numeric data, and resample if configured. It returns a dataframe
data_consolidated = data_consolidation(
    data_loaded, predicted_column="weight", remove_nans_threshold=0.9, remove_nans_or_replace='interpolate')

# The predicted column is at index 0 after consolidation
analyze_column(data_consolidated.iloc[:, 0])

# Preprocess data. It returns not only the preprocessed data, but also the last undifferenced value
# and the scaler for inverse transformation, so unpack the extra values with _
data_preprocessed, _, _ = preprocess_data(data_consolidated, remove_outliers=True, smoothit=False,
                                          correlation_threshold=False, data_transform=False, standardizeit='standardize')

# Plot inserted data
plot(data_preprocessed)

Example of using the library as a pro, with deeper Config editing

import predictit
from predictit.configuration import Config

Config.update({
    'data': r'https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv',  # Full CSV path with suffix
    'predicted_column': 'Temp',  # Column name that we want to predict

    'predicts': 7,  # Number of predicted values - 7 by default
    'print_number_of_models': 6,  # Visualize the 6 best models
    'repeatit': 50,  # Repeat the calculation on shifted data to evaluate the error criterion
    'other_columns': 0,  # Whether to use other columns or not
    'debug': 1,  # Whether to print details and warnings

    # Choose the models that will be computed - remove this key to use all models
    'used_models': [
        "AR (Autoregression)",
        "ARIMA (Autoregression integrated moving average)",
        "Autoregressive Linear neural unit",
        "Conjugate gradient",
        "Sklearn regression",
        "Bayes ridge regression one step",
        "Decision tree regression",
    ],

    # Define parameters of models

    'models_parameters': {

        "AR (Autoregression)": {'used_model': 'ar', 'method': 'cmle', 'ic': 'aic', 'trend': 'nc', 'solver': 'lbfgs'},
        "ARIMA (Autoregression integrated moving average)": {'used_model': 'arima', 'p': 6, 'd': 0, 'q': 0},

        "Autoregressive Linear neural unit": {'mi_multiple': 1, 'mi_linspace': (1e-5, 1e-4, 3), 'epochs': 10, 'w_predict': 0, 'minormit': 0},
        "Conjugate gradient": {'epochs': 80},

        "Bayes ridge regression": {'regressor': 'bayesianridge', 'n_iter': 300, 'alpha_1': 1.e-6, 'alpha_2': 1.e-6, 'lambda_1': 1.e-6, 'lambda_2': 1.e-6},
    }
})

predictions_configured = predictit.main.predict()
