timeSeries-processing

Library for processing time series datasets.

This library is part of my bachelor thesis; check out the other works as well.

Description

Only daily time series are supported, which means that every date in a dataset represents a single day.

Purpose

This library is a tool for time series modeling. In particular, it is an auxiliary utility for building machine learning models for time series forecasting.

Its main application is to enrich a given time series dataset with useful time-related features. In other words, it lets the user extract and build important time-related explanatory features.

These new features are derived from a specific, already existing feature of the dataset by selecting, grouping and processing the days that are related to the ones in the given dataset. As a result, each new feature indicates the behaviour of the specified feature on other, related days.

For example, given a time series dataset and a specified feature, it is possible to add new features holding the values of that feature on the previous days. Each new feature contains the value of the specified feature on one particular previous day.
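The idea of "previous days" features can be illustrated with plain pandas. The following is only a minimal sketch, not the library's implementation: the helper add_lagged_features, the generated column names and the sample data are all hypothetical.

```python
import pandas as pd


def add_lagged_features(df: pd.DataFrame, col_name: str, k: int) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with k extra columns,
    each holding the value of col_name on one of the k previous days."""
    out = df.copy()  # the input DataFrame is left untouched
    for lag in range(1, k + 1):
        # shift(..., freq="D") moves the dates themselves, so values are
        # matched by calendar day rather than by row position.
        shifted = df[col_name].shift(lag, freq="D")
        out[f"{col_name}_{lag}_days_before"] = shifted.reindex(df.index)
    return out


# Made-up daily dataset with a single feature named "col".
dates = pd.date_range("2020-01-01", periods=5, freq="D")
ts_df = pd.DataFrame({"col": [10.0, 11.0, 12.0, 13.0, 14.0]}, index=dates)

lagged = add_lagged_features(ts_df, "col", k=2)
print(lagged)
```

Days without enough history (here, the first two) get missing values in the new columns; how the library itself treats such days is not stated here.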

The interfaces of the library are simple and intuitive, yet rich, so the user can customize the time series operations in a powerful and flexible way.

Functionalities

There are three groups of functionalities.

The first group manipulates dates (i.e. days). It offers several operations; for example, one of them splits a collection of days by a given criterion, which can be year, month or season. These functionalities mainly serve as auxiliary utilities for the other two groups.

The second group plots time series values. The user can specify several options to change the visualization and the way the values are divided. This is particularly useful for spotting time-related patterns, such as seasonal behaviours.
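To make the splitting operation concrete, here is a small pandas sketch of dividing a collection of days by year, month or season. It is an illustration under assumptions, not the library's API: the function split_days is hypothetical, and the season encoding (meteorological seasons, 0 = winter) is chosen only for the example.

```python
import pandas as pd


def split_days(days: pd.DatetimeIndex, criterion: str) -> dict:
    """Hypothetical helper: group days by "year", "month" or "season"."""
    if criterion == "year":
        keys = days.year
    elif criterion == "month":
        keys = days.month
    elif criterion == "season":
        # Assumed encoding: 0 = winter (Dec-Feb), 1 = spring (Mar-May),
        # 2 = summer (Jun-Aug), 3 = autumn (Sep-Nov).
        keys = (days.month % 12) // 3
    else:
        raise ValueError(f"unknown criterion: {criterion!r}")
    # One DatetimeIndex per distinct key, in ascending key order.
    return {int(key): days[keys == key] for key in sorted(set(keys))}


days = pd.date_range("2019-12-30", "2020-01-02", freq="D")
by_year = split_days(days, "year")
by_month = split_days(days, "month")
by_season = split_days(days, "season")
print(by_year)
```

In this example the four days straddle a year boundary, so they fall into two years and two months but a single season.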

The third group of functionalities is the most important. These are the processing functionalities, i.e. the ones which actually process the time series datasets. As described above, the main purpose of these functionalities is to extract and build interesting time-related explanatory features.

Implementation details

This library is built on top of the pandas library and uses the pandas built-in data types:

  • The dates are represented with the pd.Timestamp type.
  • Vectors of dates are represented with the pd.DatetimeIndex type.
  • The time series datasets are represented as pd.DataFrame indexed by dates (i.e. the index is a pd.DatetimeIndex).

In addition, several pandas utilities and methods are used.
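As a quick illustration of these three representations, the snippet below builds a tiny daily dataset with standard pandas calls. The column name "col" and the values are made up for the example.

```python
import pandas as pd

# A single day.
day = pd.Timestamp("2021-03-15")

# A vector of consecutive days (a pd.DatetimeIndex).
days = pd.date_range("2021-03-15", periods=3, freq="D")

# A time series dataset: a DataFrame indexed by days.
ts_df = pd.DataFrame({"col": [1.0, 2.0, 3.0]}, index=days)

# Rows can be looked up directly by day.
value = ts_df.loc[day, "col"]
print(type(days).__name__, type(ts_df.index).__name__, value)
```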

Each processing functionality of timeSeries-processing returns a new dataset containing the given dataset plus the newly extracted features; the given dataset is not modified. In addition, each processing functionality returns two NumPy arrays: the first is X, which contains the explanatory features of the returned dataset; the second is y, which contains the response feature of the returned dataset. In other words, each of these functionalities automatically splits the resulting dataset into the features used to make the predictions and the feature which is the target of the prediction. This makes it easy to build and evaluate different machine learning models in a compact way.

Finally, the time series plotting is built on top of the Matplotlib library.
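The split into X and y can be pictured with the following pandas/NumPy sketch. It is an assumption-laden illustration rather than the library's code: the helper split_features_target is hypothetical, the response feature is assumed to be the original column "col", and dropping the rows with missing values is a choice made only for this example.

```python
import numpy as np
import pandas as pd


def split_features_target(df: pd.DataFrame, target_col: str):
    """Hypothetical helper: return (X, y) as NumPy arrays, where y is
    target_col and X is every other column of df."""
    X = df.drop(columns=[target_col]).to_numpy()
    y = df[target_col].to_numpy()
    return X, y


dates = pd.date_range("2020-01-01", periods=4, freq="D")
ts_df = pd.DataFrame({"col": [1.0, 2.0, 3.0, 4.0]}, index=dates)

# Build a new dataset with one extra feature (the previous day's value);
# assign() returns a new DataFrame, so ts_df itself is not modified.
ts_df_new = ts_df.assign(col_prev=ts_df["col"].shift(1)).dropna()

X, y = split_features_target(ts_df_new, "col")
print(X.shape, y.shape)
```

Arrays in this form can be passed directly to estimators that follow the scikit-learn fit(X, y) convention.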

Installation

Use the package manager pip to install timeSeries-processing.

pip install timeSeries-processing

Main usage

import timeSeries_processing as tsp

# Add, to the time series DataFrame ts_df, features containing values of the specified column "col" but related to the 7
# previous days.
ts_df_new, X, y = tsp.add_k_previous_days(ts_df, col_name="col", k=7)

# Add, to the time series DataFrame ts_df, statistics computed on the other given time series DataFrame ts_df_last_year, but
# with respect to the days of the previous year.
ts_df_new, X, y = tsp.add_k_years_ago_statistics(ts_df, ts_df_last_year, k=1)

# Add, to the time series DataFrame ts_df, statistics computed on the other given time series DataFrame ts_df_curr_year, with
# respect to the preceding days of the same year.
ts_df_new, X, y = tsp.add_current_year_statistics(ts_df, ts_df_curr_year)

# Add, to the time series DataFrame ts_df, statistics computed on the other given time series DataFrame ts_df_3_years_ago,
# but with respect to the days of up to 3 years ago.
ts_df_new, X, y = tsp.add_upTo_k_years_ago_statistics(ts_df, ts_df_3_years_ago, k=3)

References

  • matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
  • pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
  • scikit-learn (sklearn) is a library for machine learning in Python.

License

MIT
