Repairing tool for time series with weekly seasonality
smana: repairing tool for time series with weekly seasonality
What is it?
smana is a Python package for restoring missing values in time series with a weekly pattern.
Main Features
- Missing values restoring for time series with weekly seasonal pattern
- Any time series with sub-daily resolution is supported
- Handling of calendar information on public holidays (if provided by the user)
Dependencies
How it works
This package arose from the need to restore energy time series, which usually show weekly seasonality and often correlate with public holidays as well. The implementation, however, assumes only that the time series has a weekly pattern, so the tool can be used to repair data of any kind that shows this seasonality.
The core of the algorithm is based on STL decomposition ("Seasonal and Trend decomposition using Loess"), a robust method for decomposing time series into trend, seasonal and remainder components, implemented in the statsmodels module.
The main method of this package, smana.repair(), aims to restore sequences of missing data (represented as numpy.NaN) by locally approximating the trend and the seasonal components of the time series. To estimate the seasonality, the algorithm tries to identify a sequence of at least 14 consecutive days of valid data; if no such sequence exists, linear interpolation or lookup table strategies are applied iteratively (using a ranking criterion on the sequences of missing values) until a 14-day sequence appears.
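The search for 14 consecutive days of valid data can be pictured with a short pandas sketch. The helper `longest_valid_run` is hypothetical and is not part of smana; it assumes a regularly sampled series with a DatetimeIndex and treats a day as valid when none of its samples is NaN.

```python
import numpy as np
import pandas as pd

def longest_valid_run(series, min_days=14):
    """Return (first_day, last_day) of the longest run of gap-free days,
    or None if that run is shorter than min_days.

    Hypothetical helper for illustration; assumes every calendar day in the
    covered range appears in the index.
    """
    # One boolean per calendar day: True when the day has no NaN sample.
    day_ok = series.notna().groupby(series.index.normalize()).all()
    best_len, best_end, run = 0, None, 0
    for day, ok in day_ok.items():
        run = run + 1 if ok else 0
        if run > best_len:
            best_len, best_end = run, day
    if best_len < min_days:
        return None
    return best_end - pd.Timedelta(days=best_len - 1), best_end

# Thirty days of hourly data with a single missing sample on 10 January.
index = pd.date_range("2024-01-01", periods=30 * 24, freq=pd.Timedelta(hours=1))
series = pd.Series(1.0, index=index)
series.loc[pd.Timestamp("2024-01-10 05:00")] = np.nan
run = longest_valid_run(series)  # days 11 to 30 form the longest valid run
```

When the helper returns None, the text above says the package falls back to interpolation or lookup-table strategies until a long enough run exists; that fallback is not reproduced here.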
In addition, this tool can handle calendar information on public holidays. This feature is useful only if the time series correlates with these specific days, in particular if its daily pattern on public holidays resembles that of the standard week holiday; it is therefore recommended to use this feature only when this assumption holds.
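The calendar information is passed as a column of 0 (working day) and 1 (holiday, including the standard week holiday) values, as described in the Documentation section. The following sketch shows one way such a column could be built; `build_holiday_flags` and the holiday date used are illustrative assumptions, not part of smana.

```python
import pandas as pd

def build_holiday_flags(index, public_holidays, week_holiday_int=6):
    """Return a 0/1 Series: 1 on the week holiday and on public holidays.

    Hypothetical helper; week_holiday_int follows the 0 (Monday) to
    6 (Sunday) convention, which matches pandas' dayofweek.
    """
    is_week_holiday = index.dayofweek == week_holiday_int
    is_public_holiday = index.normalize().isin(pd.to_datetime(list(public_holidays)))
    flags = (is_week_holiday | is_public_holiday).astype(int)
    return pd.Series(flags, index=index, name="holiday")

# One week of hourly timestamps starting on Monday 1 January 2024.
index = pd.date_range("2024-01-01", periods=7 * 24, freq=pd.Timedelta(hours=1))
flags = build_holiday_flags(index, public_holidays=["2024-01-01"])
```

Here both the Monday public holiday and the Sunday are flagged, so two of the seven days carry the value 1.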
How to get it
The source code is currently hosted on GitHub at: https://github.com/ToBe-Analytics/smana
Binary installers for the latest released version are available at the Python Package Index (PyPI).
# PyPI
pip install -i https://pypi.org/simple/ smana
The list of changes to smana between each release can be found here. For full details, see the commit logs at https://github.com/ToBe-Analytics/smana.
Documentation
The package provides the following main method, which implements the whole procedure described:
smana.repair(input_df, scan_column, datetime_column=None, trend_approx_days=7, nonnegative_constraint=False, holidays_stl=False, week_holiday_int=6, holidays_column=None, inplace=False)
This function restores missing values (numpy.NaN) of the time series scan_column in input_df dataframe, with datetime_column as timestamps column, by a process based on the STL decomposition.
Optionally, setting holidays_stl to True, it is possible to apply a similar strategy to repair missing data related to public holidays (this procedure is based on week holiday data).
Parameters
- input_df: pandas.DataFrame
  Input dataframe that contains the time series to be repaired, the datetime series and, optionally, the column with public holidays information.
- scan_column: str
  Label of the numeric column of input_df to be restored. Missing values must be represented as numpy.NaN.
- datetime_column: str, default None
  Label of the datetime column of input_df; timezone-aware and naive datetimes are supported. If unspecified, input_df.index is used.
- trend_approx_days: int, default 7
  Number of days to consider for trend estimation; higher values lead to approximations over longer periods. Integers less than 7 will be replaced by the default value. It is not necessary to modify this parameter.
- nonnegative_constraint: bool, default False
  Set to True to check and repair negative restored values.
- holidays_stl: bool, default False
  Apply a specific strategy for restoring missing values related to public holidays.
- week_holiday_int: int, default 6
  Index corresponding to the week holiday, from 0 (Monday) to 6 (Sunday). This argument is considered only if holidays_stl is set to True.
- holidays_column: str, default None
  Label of the column that contains holidays information; for each row in input_df, the only allowed values are 0 (working day) or 1 (holiday, including the standard week holiday). This argument is considered only if holidays_stl is set to True.
- inplace: bool, default False
  If False, return a copy. Otherwise, perform the operation in place and return None.
Returns
- pandas.DataFrame or None
  DataFrame restored or None if inplace is set to True.
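As a usage sketch, the snippet below builds a made-up hourly dataframe with a one-day gap and wraps a call that uses the keyword arguments documented above. The column names `timestamp` and `load`, the synthetic data and both helper functions are assumptions for illustration; the call assumes the function is importable as `smana.repair`, as the signature suggests.

```python
import numpy as np
import pandas as pd

def make_example_frame():
    """Build a made-up hourly series with a weekly pattern and a one-day gap."""
    ts = pd.date_range("2024-01-01", periods=5 * 7 * 24, freq=pd.Timedelta(hours=1))
    hour_of_week = (ts.dayofweek * 24 + ts.hour).to_numpy()
    values = 50 + 10 * np.sin(2 * np.pi * hour_of_week / (7 * 24))
    df = pd.DataFrame({"timestamp": ts, "load": values})
    # Mark one full day (Wednesday 24 January) as missing.
    gap = (df["timestamp"] >= "2024-01-24") & (df["timestamp"] < "2024-01-25")
    df.loc[gap, "load"] = np.nan
    return df

def repair_example(df):
    """Call smana.repair with the documented arguments (requires smana)."""
    import smana  # imported lazily so the data-building part runs without it
    return smana.repair(
        df,
        scan_column="load",
        datetime_column="timestamp",
        nonnegative_constraint=True,
        inplace=False,
    )

df = make_example_frame()
```

With smana installed, `repaired = repair_example(df)` should return a new dataframe, since `inplace=False` is documented to return a copy and leave `df` untouched.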
License
Contributing to smana
All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome. A detailed overview on how to contribute can be found in the contributing guide. As contributors and maintainers to this project, you are expected to abide by our code of conduct. More information can be found at: Contributor Code of Conduct