
Project description

PipelineTS


[中文文档 (Chinese documentation)]

A one-stop time series analysis tool supporting data preprocessing, feature engineering, model training, model evaluation, and model prediction. Built on spinesTS and darts.

Installation

# if you don't want to use the prophet model, run this
python -m pip install PipelineTS[core]

# if you want to use all models, run this
python -m pip install PipelineTS[all]
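
To verify the installation, you can query the installed distribution's version with the standard library (a minimal check; `importlib.metadata` is available from Python 3.8):

# check the installed version (standard library only)
from importlib.metadata import version
print(version('PipelineTS'))  # e.g. '0.3.12'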

Quick Start [notebook]

List all available models

from PipelineTS.dataset import LoadWebSales

init_data = LoadWebSales()[['date', 'type_a']]

valid_data = init_data.iloc[-30:, :]
data = init_data.iloc[:-30, :]
accelerator = 'auto'  # specify the computing device

from PipelineTS.pipeline import ModelPipeline

# list all models
ModelPipeline.list_all_available_models()
[output]:
['prophet',
 'auto_arima',
 'catboost',
 'lightgbm',
 'xgboost',
 'wide_gbrt',
 'd_linear',
 'n_linear',
 'n_beats',
 'n_hits',
 'tcn',
 'tft',
 'gau',
 'stacking_rnn',
 'time2vec',
 'multi_output_model',
 'multi_step_model',
 'transformer',
 'random_forest',
 'tide']
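
These names are the values accepted by `PipelineConfigs` and by the `include_models` / `exclude_models` arguments of `ModelPipeline` shown below.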

Training

from sklearn.metrics import mean_absolute_error

pipeline = ModelPipeline(
    time_col='date',
    target_col='type_a',
    lags=30,
    random_state=42,
    metric=mean_absolute_error,
    metric_less_is_better=True,
    accelerator=accelerator,  # Supported values for accelerator: `auto`, `cpu`, `tpu`, `cuda`, `mps`.
)

# train all models
pipeline.fit(data, valid_data=valid_data)

# use the best model to predict the next 30 data points
res = pipeline.predict(30)
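
Since `valid_data` holds the 30 rows immediately after the training window, the forecast can be scored against it. A minimal sketch, assuming `res` is a pandas DataFrame containing the same `type_a` column as the input (as the plotting examples below suggest):

from sklearn.metrics import mean_absolute_error

# score the out-of-sample forecast against the held-out rows
mae = mean_absolute_error(valid_data['type_a'].values, res['type_a'].values)
print(f'hold-out MAE of the best model: {mae:.4f}')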

Training and prediction of a single model

Predicting without specifying a series [notebook]

Code
from PipelineTS.dataset import LoadMessagesSentDataSets
import pandas as pd

# ------------------- Data Preprocessing -------------------
# convert the time column to datetime; here the date column is named 'date'
time_col = 'date'
target_col = 'ta'
lags = 60  # look-back window size; the data is split into sequences of length `lags` for training
n = 40  # how many steps to predict; in this case, how many days

# you can also load data with pandas
# init_data = pd.read_csv('/path/to/your/data.csv')
init_data = LoadMessagesSentDataSets()[[time_col, target_col]]

init_data[time_col] = pd.to_datetime(init_data[time_col], format='%Y-%m-%d')

# split training and validation sets
valid_data = init_data.iloc[-n:, :]
data = init_data.iloc[:-n, :]
print("data shape: ", data.shape, ", valid data shape: ", valid_data.shape)
data.tail(5)

# data visualization
from PipelineTS.plot import plot_data_period
plot_data_period(
    data.iloc[-300:, :], 
    valid_data, 
    time_col=time_col, 
    target_col=target_col, 
    labels=['Train data', 'Valid data']
)

# train and predict
from PipelineTS.nn_model import TiDEModel
tide = TiDEModel(
    time_col=time_col, target_col=target_col, lags=lags, random_state=42, 
    quantile=0.9, enable_progress_bar=False, enable_model_summary=False
)
tide.fit(data)
tide.predict(n)
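
The `lags` parameter acts as a sliding look-back window. The toy sketch below (an illustration of the idea only, not PipelineTS internals) shows how such a window turns a series into supervised training pairs:

import numpy as np

series = np.arange(10)  # toy series: 0..9
lags = 3                # look-back window size

# each row of X is a window of `lags` past values; y is the value that follows it
X = np.stack([series[i:i + lags] for i in range(len(series) - lags)])
y = series[lags:]
print(X[:2])  # [[0 1 2] [1 2 3]]
print(y[:2])  # [3 4]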

Predicting with a specified series [notebook]

This example is identical to the previous one except for the final call, which passes `data=valid_data` so that the forecast continues from the supplied series rather than from the end of the training data.

Code
from PipelineTS.dataset import LoadMessagesSentDataSets
import pandas as pd

# ------------------- Data Preprocessing -------------------
# convert the time column to datetime; here the date column is named 'date'
time_col = 'date'
target_col = 'ta'
lags = 60  # look-back window size; the data is split into sequences of length `lags` for training
n = 40  # how many steps to predict; in this case, how many days

# you can also load data with pandas
# init_data = pd.read_csv('/path/to/your/data.csv')
init_data = LoadMessagesSentDataSets()[[time_col, target_col]]

init_data[time_col] = pd.to_datetime(init_data[time_col], format='%Y-%m-%d')

# split training and validation sets
valid_data = init_data.iloc[-n:, :]
data = init_data.iloc[:-n, :]
print("data shape: ", data.shape, ", valid data shape: ", valid_data.shape)
data.tail(5)

# data visualization
from PipelineTS.plot import plot_data_period
plot_data_period(
    data.iloc[-300:, :], 
    valid_data, 
    time_col=time_col, 
    target_col=target_col, 
    labels=['Train data', 'Valid data']
)

# train and predict
from PipelineTS.nn_model import TiDEModel
tide = TiDEModel(
    time_col=time_col, target_col=target_col, lags=lags, random_state=42, 
    quantile=0.9, enable_progress_bar=False, enable_model_summary=False
)
tide.fit(data)
tide.predict(n, data=valid_data)

ModelPipeline Module

# If you need to configure the model
from xgboost import XGBRegressor
from catboost import CatBoostRegressor
from PipelineTS.pipeline import ModelPipeline, PipelineConfigs

# If you want to try multiple configurations of a model at once, for comparison or tuning, use `PipelineConfigs`.
# It customizes the models returned by `ModelPipeline.list_all_available_models()`.
# The first element of each tuple is the model name, which must appear in the list returned by `ModelPipeline.list_all_available_models()`.
# To give the model a custom name, pass it as the second element (a string);
# otherwise, the second element is a dict, which may contain the keys 'init_configs', 'fit_configs', and 'predict_configs', in any combination.
# Missing keys are filled with default parameters.
# 'init_configs' holds the model's initialization parameters, 'fit_configs' the training parameters,
# and 'predict_configs' the prediction parameters.

pipeline_configs = PipelineConfigs([
    ('lightgbm', 'lightgbm_linear_tree', {'init_configs': {'verbose': -1, 'linear_tree': True}}),
    ('multi_output_model', {'init_configs': {'verbose': -1}}),
    ('multi_step_model', {'init_configs': {'verbose': -1}}),
    ('multi_output_model', {
        'init_configs': {'estimator': XGBRegressor, 'random_state': 42, 'kwargs': {'verbosity': 0}}
    }
     ),
    ('multi_output_model', {
        'init_configs': {'estimator': CatBoostRegressor, 'random_state': 42, 'verbose': False}
    }
     ),
])
[output]:
  model_name          model_name_after_rename  model_configs
0 lightgbm            lightgbm_linear_tree     {'init_configs': {'verbose': -1, 'linear_tree': True}, 'fit_configs': {}, 'predict_configs': {}}
1 multi_output_model  multi_output_model_1     {'init_configs': {'verbose': -1}, 'fit_configs': {}, 'predict_configs': {}}
2 multi_output_model  multi_output_model_2     {'init_configs': {'estimator': <class 'xgboost.sklearn.XGBRegressor'>, 'random_state': 42, 'kwargs': {'verbosity': 0}}, 'fit_configs': {}, 'predict_configs': {}}
3 multi_output_model  multi_output_model_3     {'init_configs': {'estimator': <class 'catboost.core.CatBoostRegressor'>, 'random_state': 42, 'verbose': False}, 'fit_configs': {}, 'predict_configs': {}}
4 multi_step_model    multi_step_model_1       {'init_configs': {'verbose': -1}, 'fit_configs': {}, 'predict_configs': {}}
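
The resulting `pipeline_configs` object is passed to `ModelPipeline` through its `configs` argument, as shown in the next section.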

Non-Interval Forecasting [notebook]

from sklearn.metrics import mean_absolute_error

from PipelineTS.pipeline import ModelPipeline

pipeline = ModelPipeline(
    time_col=time_col, 
    target_col=target_col, 
    lags=lags, 
    random_state=42, 
    metric=mean_absolute_error, 
    metric_less_is_better=True,
    configs=pipeline_configs,
    include_init_config_model=False,
    scaler=False,  # False for MinMaxScaler, True for StandardScaler, None for no scaling
    # include_models=['d_linear', 'random_forest', 'n_linear', 'n_beats'],  # specifying the model used
    # exclude_models=['catboost', 'tcn', 'transformer'],  # exclude specified models
    # Note that `include_models` and `exclude_models` cannot be specified simultaneously.
    accelerator=accelerator,
    # You can pass `modelname__init_param` keyword arguments directly to configure individual models in ModelPipeline.
    # Note that it is a double underscore.
    # If such a parameter duplicates a ModelPipeline keyword parameter, the ModelPipeline keyword parameter is ignored.
    d_linear__lags=50,
    n_linear__random_state=1024,
    n_beats__num_blocks=3,
    random_forest__n_estimators=200,
    n_hits__accelerator='cpu',  # the mps backend raises an error for the n_hits model on macOS, so cpu is used instead
    tft__accelerator='cpu',  # same issue for tft; with a cuda backend you can ignore these two settings
)

pipeline.fit(data, valid_data)

Get the model parameters in ModelPipeline

# Get all configurations for the specified model; defaults to the best model
pipeline.get_model_all_configs(model_name='wide_gbrt')

Plotting the forecast results

# use the best model to predict the next n data points
prediction = pipeline.predict(n, model_name=None)  # `model_name` selects a specific fitted model from the pipeline; None uses the best one

plot_data_period(init_data.iloc[-100:, :], prediction, 
                 time_col=time_col, target_col=target_col)


Interval Forecasting [notebook]

from sklearn.metrics import mean_absolute_error

from PipelineTS.pipeline import ModelPipeline

pipeline = ModelPipeline(
    time_col=time_col,
    target_col=target_col,
    lags=lags,
    random_state=42,
    metric=mean_absolute_error,
    metric_less_is_better=True,
    configs=pipeline_configs,
    include_init_config_model=False,
    scaler=False,
    with_quantile_prediction=True,  # enable quantile (interval) prediction
    accelerator=accelerator,
    # include_models=['wide_gbrt'],  # specify the models to use
    n_hits__accelerator='cpu',
    tft__accelerator='cpu',
)

pipeline.fit(data, valid_data)

Plotting the forecast results

# use the best model to predict the next n data points
prediction = pipeline.predict(n, model_name=None)  # `model_name` selects a specific fitted model from the pipeline; None uses the best one

plot_data_period(init_data.iloc[-100:, :], prediction, 
                 time_col=time_col, target_col=target_col)
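
With `with_quantile_prediction=True`, the returned frame carries interval bounds alongside the point forecast. The exact column names can vary between versions, so inspect them before further processing (plain pandas, nothing version-specific):

# list the prediction columns to locate the interval bounds
print(prediction.columns.tolist())
print(prediction.head())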


Model and ModelPipeline saving and loading

from PipelineTS.io import load_model, save_model

# save
save_model(path='/path/to/save/your/fitted_model_or_pipeline.zip', model=pipeline)
# load
pipeline = load_model('/path/to/save/your/fitted_model_or_pipeline.zip')
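
A quick sanity check after reloading (assuming the pipeline was fitted before `save_model` was called):

# the reloaded pipeline should predict without refitting
print(pipeline.predict(n).head())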

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pipelinets-0.3.12.tar.gz (83.1 kB)

Uploaded Source

Built Distribution

PipelineTS-0.3.12-py3-none-any.whl (119.0 kB)

Uploaded Python 3

File details

Details for the file pipelinets-0.3.12.tar.gz.

File metadata

  • Download URL: pipelinets-0.3.12.tar.gz
  • Upload date:
  • Size: 83.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for pipelinets-0.3.12.tar.gz
Algorithm    Hash digest
SHA256       fd5e189e2580df0d6d8adca66f1065641fb2c72a5e050faebbfea5e0d45b6f61
MD5          7dec783a35bde8794e2b0a93acfb2ec7
BLAKE2b-256  ca31dd3265ea9d4d1c3835ae0e05da7b1b48e1efe88326264d73b5784d7281ed


File details

Details for the file PipelineTS-0.3.12-py3-none-any.whl.

File metadata

  • Download URL: PipelineTS-0.3.12-py3-none-any.whl
  • Upload date:
  • Size: 119.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for PipelineTS-0.3.12-py3-none-any.whl
Algorithm    Hash digest
SHA256       0c5fc790aaa233efa89fb42cf09ca522aac3e1f135c666b0fe2e162cebd58238
MD5          5ec00c0941a83724cd552c312d33f0f8
BLAKE2b-256  73933833e6cc53a80d10074ccc475bf43f7a91b51c7072e85e5a0eda288b46f6

