
A codebook solution for time series data compression and feature extraction considering rebound effect

Project description

fdcodepy

Introduction

  • FD_codepy is an open-source Python package for extracting features from time series in an interpretable manner and using them for compression.

  • The key idea was proposed specifically for metered data in the energy sector, but it can also be applied to smart sensors and edge computing.

  • Inspired by the Codebook method, it breaks time series data down into its constituent parts: the unique sub-patterns, called Codewords, and the indices of those Codewords, called representations. This allows for efficient compression and analysis.

  • Compared with resampling data to a lower resolution, this lossy compression method requires similar data storage and transmission bandwidth while preserving high-frequency information and accumulative/average metered values.

  • For a high-level overview of the problem, see our article published by The Conversation.

  • The FD_codepy source code is on GitHub: https://github.com/abc123yuanrui/FD_codepy/

    • An example is provided as notebook: FD_codepy\examples\Codebook_processing_for_energy_porfile.ipynb
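
The codeword/representation idea described above can be illustrated with a minimal, stand-alone sketch. It is not the package's implementation: the function names `build_codebook` and `reconstruct`, the greedy first-match rule, and the use of plain Euclidean distance are all assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_codebook(series, window, threshold):
    """Hypothetical helper: split `series` into windows and index them.

    A window reuses the first stored codeword that lies within
    `threshold`; otherwise the window itself becomes a new codeword.
    """
    codewords, representations = [], []
    for start in range(0, len(series) - window + 1, window):
        segment = series[start:start + window]
        for index, word in enumerate(codewords):
            if euclidean(segment, word) <= threshold:
                representations.append(index)
                break
        else:  # no existing codeword was close enough
            codewords.append(segment)
            representations.append(len(codewords) - 1)
    return codewords, representations

def reconstruct(codewords, representations):
    """Rebuild a (lossy) series by concatenating the indexed codewords."""
    return [value for index in representations for value in codewords[index]]

# Five windows of length 4; the last one is a slightly perturbed repeat.
series = [0, 1, 2, 1] * 3 + [5, 5, 5, 5] + [0, 1, 2, 1.1]
codewords, representations = build_codebook(series, window=4, threshold=0.5)
recovered = reconstruct(codewords, representations)

print(representations)   # [0, 0, 0, 1, 0]
print(len(codewords))    # 2
```

Only the two codewords and the five indices are needed to rebuild the series; the perturbed last window is recovered as its matching codeword, which is where the loss occurs.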

Key method for time series compression

  • Codebook (methods.Code_book): the key class for decomposing long energy time series into unique partitions (codewords) and representations. See the examples for details.
    • It takes a time series, a window size, and a distance metric type as inputs.
    • The four built-in distance methods are:
      • Euclidean Distance (default or 'euclidean')
      • Dynamic Time Warping ('DTW')
      • Wasserstein Distance ('Wasserstein')
      • Flexibility Distance ('flexibilityD')
    • The pre_processing method normalises the data; the normalised series is stored as the attribute normalized_arr and the scaler as the attribute scaler_average.
    • The get_distance_matrix method performs a statistical analysis that computes the distance matrix for a long time series (assuming historical data are available). It returns the matrix and the quantile results used to set a similarity threshold (alternatively, the threshold can be set to an empirical value).
    • desolve_time_series_thre processes the time series into codewords and representations and returns them.
    • post_processing reconstructs the time series from the codewords and representations; the result is stored as the attribute recovered_series.
  • Flexibility distance: a novel distance metric that measures the similarity between time series while taking into account both temporal and amplitude distance, as well as the rebound effect in the data.
    • Code_book.flex_distance is a static function that returns the flexibility distance (FD) between two given time series.
      • The default usage is fd = Code_book.flex_distance(series_a, series_b), which computes fd with the default settings.
      • Get additional routing information with fd, row_index, col_index = Code_book.flex_distance(series_a, series_b, route = True). This provides the reshaping strategy from series_a to series_b.
      • Users can also customise the weight matrices with fd = Code_book.flex_distance(series_a, series_b, weighted_matrix_a, weighted_matrix_b, route = False)
    • Given any two time series, users can generate a sample dashboard with the default settings as follows:
      • from fdcodepy.utils.helpers import distance_method_routing_analysis
      • distance_method_routing_analysis(series_a, series_b, methods.Code_book, report = True)
      • The report is generated in the working directory; users can change this by modifying the export_dir variable of the function distance_method_routing_analysis.
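
The distance-matrix step and two of the listed metrics can be sketched without the package. The following is a hedged illustration only: `distance_matrix` and `threshold_candidates` are hypothetical names, the DTW shown is the textbook dynamic-programming form with an absolute-difference cost, and the package's own metrics (in particular the flexibility distance) are not reproduced here.

```python
import math
import statistics

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Textbook dynamic time warping with absolute-difference step cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat b[j-1]
                                    cost[i][j - 1],      # repeat a[i-1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def distance_matrix(windows, metric):
    """Symmetric matrix of pairwise distances between windows."""
    size = len(windows)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            matrix[i][j] = matrix[j][i] = metric(windows[i], windows[j])
    return matrix

def threshold_candidates(matrix, n=4):
    """Quantile cut points of the off-diagonal distances (assumed usage)."""
    size = len(matrix)
    upper = [matrix[i][j] for i in range(size) for j in range(i + 1, size)]
    return statistics.quantiles(upper, n=n)

windows = [[0, 1, 2, 1], [0, 1, 2, 1], [1, 2, 1, 0], [5, 5, 5, 5]]
matrix = distance_matrix(windows, euclidean)
quantiles = threshold_candidates(matrix)

print(matrix[0][2])                 # 2.0
print(dtw(windows[0], windows[2]))  # 2.0
print(len(quantiles))               # 3
```

A low quantile gives a strict threshold (more codewords, less loss), while a high quantile merges more windows into the same codeword.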

Installation

  • Install using pip: pip install fdcodepy

Usage

  • Import the package: import fdcodepy
  • Step-by-step example: Codebook processing for a given hourly time series with a window size of 24, which determines the compression ratio (the same as resampling the data from hourly to daily)
    • import numpy as np
    • from fdcodepy import methods
    • sample_series = np.random.uniform(0, 30, 365*24)
    • series_codebook = methods.Code_book(sample_series, 24, 'flexibilityD')
    • series_codebook.pre_processing()
    • distance_matrix, quantiles = series_codebook.get_distance_matrix()
    • codewords, representations = series_codebook.desolve_time_series_thre(quantiles[0])
    • series_codebook.post_processing()
    • series_codebook.recovered_series is the reconstructed data, computed from the representations, which have a length of only len(representations), compared with the original series of length len(sample_series)
  • Only the representations need to be communicated to the data center; their length equals the size of the downsampled data, in this case 365
  • Use the Codebook processing result to generate a report
    • from fdcodepy.utils.helpers import code_book_processing_analysis
    • code_book_processing_analysis(series_codebook, time_index, report = True, export_dir = '.')
  • Compute the flexibility distance between two time series (with default settings).
    • from fdcodepy import methods
    • methods.Code_book.flex_distance(time_series_1, time_series_2)
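
The size comparison in the step-by-step example can be checked with simple arithmetic. The sketch below also shows one way a per-window average scaler can preserve average metered values. It assumes, based only on the attribute name scaler_average, that each window is divided by its own mean; the package may normalise differently, and `split_windows` and `normalise_by_average` are hypothetical helpers.

```python
import math
import statistics

WINDOW = 24  # hours per window, as in the step-by-step example

def split_windows(series, window):
    """Cut the series into consecutive, non-overlapping windows."""
    return [series[i:i + window]
            for i in range(0, len(series) - window + 1, window)]

def normalise_by_average(segment):
    """Return (shape, average), where shape is the window divided by its mean.

    Assumption: windows with a zero mean are returned unscaled.
    """
    average = statistics.fmean(segment)
    if average == 0:
        return list(segment), average
    return [value / average for value in segment], average

# Synthetic hourly profile for one year: daily sine shape plus a weekly offset.
hourly = [10 + 5 * math.sin(2 * math.pi * h / 24) + (h // 24) % 7
          for h in range(365 * 24)]

windows = split_windows(hourly, WINDOW)
daily_means = [statistics.fmean(w) for w in windows]  # plain daily resampling

# Scaling the normalised shape back by its average restores the daily mean.
shape, average = normalise_by_average(windows[0])
rebuilt = [value * average for value in shape]

print(len(hourly), len(windows), len(hourly) // len(windows))  # 8760 365 24
print(math.isclose(statistics.fmean(rebuilt), daily_means[0]))  # True
```

With a window of 24, one index per day is sent instead of 24 hourly readings, which is the same count as the daily-resampled series.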

Example analysis report

Figure 1

The figures can be zoomed in to check details.

Figure 2

Reference

  • Yuan, R., Pourmousavi, S. A., Soong, W. L., Black, A. J., Liisberg, J. A. R., & Lemos-Vinasco, J. (2024). Unleashing the benefits of smart grids by overcoming the challenges associated with low-resolution data. Cell Reports Physical Science, 5(2), 101830. https://doi.org/10.1016/j.xcrp.2024.101830
  • Yuan, R., Pourmousavi, S. A., Soong, W. L., Black, A. J., Liisberg, J. A. R., & Lemos-Vinasco, J. (2023). A new time series similarity measure and its smart grid applications. https://arxiv.org/abs/2310.12399

