A general-purpose step-shifting algorithm for tabular data, based on scikit-learn's BaseEstimator.

StepShifter3 🛠️

A general-purpose Python package for time series analysis of tabular data

StepShifter3 is a Python package designed to facilitate time series analysis of tabular data. It is developed and maintained by the Peace Research Institute Oslo (PRIO) as part of the VIEWS project.

📚 Table of Contents

  1. 🛠 Installation
  2. 📝 Usage
  3. 🤝 Contributing
  4. 🐞 Common bugs
  5. 🔖 License
  6. ❓ FAQs
  7. 🙏 Credits
  8. 📚 References

🛠 Installation

To install StepShifter3, you have two options:

  1. Using pip: 📦

    pip install StepShifter3

  2. From GitHub: 🐱‍💻

    git clone https://github.com/YourUsername/StepShifter3.git
    cd StepShifter3
    python setup.py install

🚨 Recommended Branch: stable

For a more stable experience, we recommend using the stable branch rather than the main branch. The stable branch contains well-tested, production-ready code, while the main branch may contain work-in-progress or experimental features that could be unstable.

How to switch to the stable branch:

Using Git CLI:

  • For a fresh installation from GitHub, clone the stable branch directly:
    git clone -b stable https://github.com/YourUsername/StepShifter3.git

  • If you've already cloned the repository and are on the main branch, switch to stable with:
    git checkout stable

Using GitHub Web Interface:

  • If you're downloading the code from the GitHub web interface, switch to the stable branch using the branch dropdown before downloading.
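
After installing, it can be useful to confirm that the package is visible to the Python interpreter you intend to use. The snippet below is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of StepShifter3, and it assumes the distribution is registered under the same name used in the pip command.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "StepShifter3" is the distribution name from the pip command (assumed here).
    print(installed_version("StepShifter3"))
```

If this prints None, the package was probably installed into a different environment than the one running the script.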
    

📝 Usage

The StepShifter class is the main class of the package. It works with any model that inherits from the scikit-learn BaseEstimator class.
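
To make the idea of step-shifting concrete, here is a minimal sketch of the general technique: one clone of a scikit-learn estimator is fitted per step s, pairing the features at time t with the target at time t + s. This is an illustration under stated assumptions, not the package's implementation; the function fit_step_models, the toy column names x and y, and the unit_id index level are all hypothetical.

```python
import pandas as pd
from sklearn.base import clone
from sklearn.linear_model import LinearRegression


def fit_step_models(df, target, features, steps, base_model, unit_level):
    """Fit one clone of base_model per step s in 1..steps.

    df must have a MultiIndex that includes unit_level, with rows in
    ascending time order within each unit.
    """
    models = {}
    for s in range(1, steps + 1):
        # Shift the target within each unit so that row t carries y at t + s.
        y_shifted = df.groupby(level=unit_level)[target].shift(-s)
        mask = y_shifted.notna()  # the last s rows of each unit have no target
        model = clone(base_model)
        model.fit(df.loc[mask, features], y_shifted[mask])
        models[s] = model
    return models


# Toy panel: 20 months for 3 units, with y = 2 * x exactly.
index = pd.MultiIndex.from_product(
    [range(1, 21), ["a", "b", "c"]], names=["month_id", "unit_id"]
)
toy = pd.DataFrame(index=index)
toy["x"] = toy.index.get_level_values("month_id").to_numpy(dtype=float)
toy["y"] = 2.0 * toy["x"]

models = fit_step_models(toy, "y", ["x"], steps=3,
                         base_model=LinearRegression(), unit_level="unit_id")
```

Because the toy target is an exact linear function of time, the model for step s learns to add 2 * s to the current value, which makes the effect of the shift easy to inspect.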

Basic Usage with XGBRegressor and dummy data from synthetic data generator

# DaskClientManager is assumed to be importable from the package root
from StepShifter3 import StepShifter, SyntheticDataGenerator, DaskClientManager
import xgboost.dask  # provides xgboost.dask.DaskXGBRegressor, used below

# Generate a pandas MultiIndex DataFrame with dummy data (indexes: month_id, country_id)
df_synthetic_small = SyntheticDataGenerator("loa", n_time=516, n_prio_grid_size=50, n_country_size=242, n_features=15, use_dask=True).generate_dataframe()


# Parameters for the XGBoost regressor that will be passed to StepShifter
params_xgb_reg = {
        'objective': 'reg:squarederror',
        'n_estimators': 80,
        'max_depth': 3,
        'learning_rate': 0.1,
        'gamma': 0,
        'min_child_weight': 1,
        'subsample': 1,
        'eval_metric': 'rmse',
    }

# Establish a connection through DaskClientManager
dask_client = DaskClientManager(is_local=True, n_workers=8, threads_per_worker=1, memory_limit="4.5GB", remote_addresses=None, asynchronous=False)

stepshifter_config_regression = {
    "target_column": "ln_ged_sb_dep",                # The target column in your training dataset
    "ID_columns": ["month_id", "priogrid_id"],       # The ID columns in your training dataset
    "time_column": "month_id",                       # The time column in your training dataset
    "run_name": "my_first_run",                      # The name of the run in mlflow; should be changed every time a new model type is run
    "experiment_name": "ensemble_models",            # The name of the experiment in mlflow
    "mlflow_tracking_uri": "http://127.0.0.1:5000",  # The URI of the mlflow server; if not set, the default is localhost:5000 (127.0.0.1:5000)
    "S": 36,                                         # Number of steps ahead to predict
    "metrics_report": True,                          # Not used at the moment
    "fit_params": {},                                # Parameters to be passed to the fit method of the model
    "dask_client": dask_client,                      # Dask client
    "is_dask": True,                                 # Set to True if using Dask dataframes
}

# Initialize stepshifter class
stepshifter = StepShifter(xgboost.dask.DaskXGBRegressor(**params_xgb_reg), stepshifter_config_regression)

# Which part of the data should be validated
validation_range = [1, 516]

X, y, is_dask = stepshifter.validate_and_filter_data(df_synthetic_small, validation_range)

# Fit the model
tau_e_0 = 121
tau_e_t = 316
stepshifter.fit(X, y, tau_e_0, tau_e_t)

# Get predictions
X_pred = ...
tau_start = ...
tau_end = ...
stepshifter.predict(X_pred,tau_start,tau_end)
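
The example above restricts the data to a range of time steps before fitting. As a rough illustration of what such a restriction looks like on a pandas MultiIndex DataFrame, the sketch below keeps only the rows whose time level falls inside a closed interval. The helper filter_time_range is hypothetical and is not the package's validate_and_filter_data; treating the range as inclusive at both ends is an assumption.

```python
import pandas as pd


def filter_time_range(df, time_level, first, last):
    """Keep rows whose time_level index value lies in the closed interval [first, last]."""
    values = df.index.get_level_values(time_level)
    return df[(values >= first) & (values <= last)]


# Toy frame indexed like the example above: month_id by priogrid_id.
index = pd.MultiIndex.from_product(
    [range(1, 6), [10, 11]], names=["month_id", "priogrid_id"]
)
toy = pd.DataFrame({"feature": range(len(index))}, index=index)

# Keep months 2 through 4 only.
subset = filter_time_range(toy, "month_id", 2, 4)
```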

🤝 Contributing

Contributions are welcome! To contribute:

  1. Open an issue describing the feature you want to add or the bug you want to fix.
  2. Create your Feature Branch (git checkout -b <issuenumber>-<your-feature-name>)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin <issuenumber>-<your-feature-name>)
  5. Open a Pull Request

🐞 Common bugs

Using the wrong predict() function

An easy-to-make mistake is to use the wrong predict() function. Make sure to use the StepShifter predict() function by running predict() on the StepShifter object and not on the trained models.

Correct (calls predict() on the StepShifter object): stepshifter.predict(X, tau_start, tau_end)

Incorrect (calls predict() on a single trained model): stepshifter.models[<some_number_between_1_and_S>].predict(X, tau_start, tau_end)
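
The sketch below illustrates why the second form goes wrong, assuming the per-step models are plain scikit-learn-style estimators (the source does not confirm how StepShifter stores them). A standard estimator's predict() accepts only the feature matrix, so passing a time window raises a TypeError. Other estimators may instead interpret the extra positional arguments as unrelated options and return results without any warning. The helper call_with_time_window is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# A plain estimator standing in for one of the per-step models.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel()
model = LinearRegression().fit(X, y)


def call_with_time_window(estimator, X, tau_start, tau_end):
    """Attempt the three-argument predict() call on a plain estimator and report the outcome."""
    try:
        estimator.predict(X, tau_start, tau_end)
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "no error"


result = call_with_time_window(model, X, 1, 5)
print(result)
```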

📚 References

  1. Hegre et al.: Partitioning and time-shifting in VIEWS, fatalities002

🔖 License

Distributed under the MIT License. See LICENSE for more information.
