Explores time information to train a robust random forest
time-robust-forest
A proof-of-concept model that uses timestamp information to train a random forest with better out-of-distribution generalization.
Installation
pip install -U time-robust-forest
How to use it
A classifier and a regressor are available under time_robust_forest.models. Both follow the sklearn interface, so you can fit and use a model quickly:
from time_robust_forest.models import TimeForestClassifier
features = ["x_1", "x_2"]
time_column = "periods"
target = "y"
model = TimeForestClassifier(time_column=time_column)
model.fit(training_data[features + [time_column]], training_data[target])
predictions = model.predict_proba(test_data[features])[:, 1]
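The snippet above assumes that `training_data` and `test_data` already exist as pandas DataFrames. One way to build toy inputs with the same column names is sketched below. The synthetic data, the helper name `make_toy_data`, and the year-based train/test split are assumptions made for illustration and are not part of the package.

```python
import numpy as np
import pandas as pd


def make_toy_data(n_rows=600, seed=0):
    """Build a toy binary-classification frame with a discrete period column."""
    rng = np.random.default_rng(seed)
    frame = pd.DataFrame(
        {
            "x_1": rng.normal(size=n_rows),
            "x_2": rng.normal(size=n_rows),
            # Six discrete periods (years 2015 to 2020), sampled uniformly.
            "periods": rng.integers(2015, 2021, size=n_rows),
        }
    )
    # The label depends on x_1 plus noise; x_2 carries no signal.
    noise = rng.normal(scale=0.5, size=n_rows)
    frame["y"] = (frame["x_1"] + noise > 0).astype(int)
    return frame


data = make_toy_data()
# Hold out the most recent periods to mimic an out-of-distribution test set.
training_data = data[data["periods"] <= 2018]
test_data = data[data["periods"] > 2018]
```

Splitting by period rather than at random keeps the test set strictly later in time than the training set, which is the setting the model targets.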
Only a few arguments differ from a traditional Random Forest:
- time_column: the column from the input dataframe containing the time periods the model will iterate over to find the best splits (default: "period")
- min_sample_periods: the minimum number of examples from every period that the model must keep when it splits a node.
- period_criterion: how the performance in each period is aggregated. Options: "avg" (the average) or "max" (the maximum, i.e. the worst case). (default: "avg")
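To make the last two arguments concrete, here is a minimal sketch of the ideas they describe. It is not the package's internal code: the function names are hypothetical, the per-period values are assumed to be losses (higher is worse), and the reading of min_sample_periods as a constraint on both child nodes is an assumption.

```python
import numpy as np


def split_keeps_min_samples(periods, left_mask, min_sample_periods):
    """Return True if both children keep enough rows from every period.

    Assumption: the constraint applies to each child node separately.
    """
    periods = np.asarray(periods)
    left_mask = np.asarray(left_mask, dtype=bool)
    for period in np.unique(periods):
        in_period = periods == period
        n_left = int(np.sum(in_period & left_mask))
        n_right = int(np.sum(in_period & ~left_mask))
        if min(n_left, n_right) < min_sample_periods:
            return False
    return True


def aggregate_period_scores(scores_by_period, period_criterion="avg"):
    """Combine one loss value per period into a single number."""
    values = np.asarray(list(scores_by_period.values()), dtype=float)
    if period_criterion == "avg":
        return float(values.mean())
    if period_criterion == "max":
        # The largest loss across periods, i.e. the worst case.
        return float(values.max())
    raise ValueError(f"Unknown period_criterion: {period_criterion!r}")


# Hypothetical per-period losses for one candidate split.
losses = {2018: 0.20, 2019: 0.25, 2020: 0.45}
average_loss = aggregate_period_scores(losses, "avg")
worst_case_loss = aggregate_period_scores(losses, "max")
```

With these made-up numbers the average is about 0.30 while the worst case is 0.45, so "max" penalizes a split that works well in most periods but poorly in one.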
To use the environment-wise optimization:
from sklearn.metrics import make_scorer, roc_auc_score
from time_robust_forest.hyper_opt import env_wise_hyper_opt

# Candidate values for each hyperparameter
params_grid = {
    "n_estimators": [30, 60, 120],
    "max_depth": [5, 10],
    "min_impurity_decrease": [1e-1, 1e-3, 0],
    "min_sample_periods": [5, 10, 30],
    "period_criterion": ["max", "avg"],
}
model = TimeForestClassifier(time_column=time_column)
opt_param = env_wise_hyper_opt(
    training_data[features + [time_column]],
    training_data[target],
    model,
    time_column,
    params_grid,
    cv=5,
    # needs_proba is deprecated in recent scikit-learn releases,
    # which use response_method instead
    scorer=make_scorer(roc_auc_score, needs_proba=True),
)
Make sure you have a good choice for the time column
Don't simply use a raw timestamp column from the dataset. Discretize it first and make sure every period contains a reasonable number of data points. For example, use the year if you have three or more years of data. How you discretize the timestamp is itself a modeling choice you can optimize.
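A minimal sketch of that preparation step with pandas follows. The helper name `add_period_column`, the column name `timestamp`, and the `min_rows` threshold of 30 are hypothetical choices for illustration; pick the granularity and threshold that suit your data.

```python
import pandas as pd


def add_period_column(frame, timestamp_column, period_column="periods", min_rows=30):
    """Add a yearly period column and check that no period is too sparse."""
    out = frame.copy()
    # Discretize the raw timestamp into calendar years.
    out[period_column] = pd.to_datetime(out[timestamp_column]).dt.year
    counts = out[period_column].value_counts().sort_index()
    sparse = counts[counts < min_rows]
    if not sparse.empty:
        raise ValueError(
            f"Periods with fewer than {min_rows} rows: {sparse.to_dict()}"
        )
    return out


# Toy example: three years of daily records.
events = pd.DataFrame(
    {"timestamp": pd.date_range("2018-01-01", periods=1095, freq="D")}
)
events = add_period_column(events, "timestamp")
```

Swapping `.dt.year` for a coarser or finer grouping (for example quarters) changes the number of periods, which is the modeling choice mentioned above.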
License
This project is licensed under the terms of the BSD-3 license. See LICENSE for more details.
Citation
@misc{time-robust-forest,
author = {Moneda, Luis},
title = {Time Robust Forest model},
year = {2021},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/lgmoneda/time-robust-forest}}
}