A codebook solution for time series data compression and feature extraction considering rebound effect
fdcodepy
Introduction
- FD_codepy is an open-source Python package for extracting features from time series in an interpretable manner and using them for compression.
- The key idea was proposed specifically for metered data in the energy sector, but it can also be used with smart sensors and edge computing.
- Inspired by the codebook method, it breaks a time series down into its constituent parts: the unique sub-patterns, called codewords, and the indices of those codewords, called representations. This allows efficient compression and analysis.
- Compared with resampling the data to a lower resolution, this lossy compression method needs similar data storage and transmission bandwidth while preserving high-frequency information and the accumulated/average metered values.
- Get a high-level idea of the problem from our article published by The Conversation.
- The FD_codepy source code is on GitHub: https://github.com/abc123yuanrui/FD_codepy/
- An example is provided as a notebook: FD_codepy\examples\Codebook_processing_for_energy_porfile.ipynb
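To make the split into codewords and representations concrete, here is a minimal pure-Python sketch of a greedy codebook built with a Euclidean threshold. It illustrates the general idea only and is not the fdcodepy implementation: the function names `build_codebook` and `reconstruct`, the greedy matching rule, and the toy data are all assumptions made for this example.

```python
import math


def build_codebook(series, window, threshold):
    """Split `series` into windows and reuse a codeword when one is close enough.

    Hypothetical helper: a greedy rule with Euclidean distance, for illustration.
    """
    if len(series) % window != 0:
        raise ValueError("series length must be a multiple of the window size")
    codewords = []        # unique sub-patterns
    representations = []  # one codeword index per window
    for start in range(0, len(series), window):
        segment = series[start:start + window]
        # Look for the closest codeword found so far.
        best_index, best_distance = None, math.inf
        for index, codeword in enumerate(codewords):
            distance = math.dist(segment, codeword)
            if distance < best_distance:
                best_index, best_distance = index, distance
        if best_index is not None and best_distance <= threshold:
            representations.append(best_index)
        else:
            # No codeword is similar enough, so this window becomes a new one.
            codewords.append(list(segment))
            representations.append(len(codewords) - 1)
    return codewords, representations


def reconstruct(codewords, representations):
    """Rebuild a (lossy) series by concatenating the referenced codewords."""
    recovered = []
    for index in representations:
        recovered.extend(codewords[index])
    return recovered


day_a = [0.0, 1.0, 4.0, 1.0]
day_b = [3.0, 3.0, 0.0, 0.0]
noisy_day_a = [0.1, 1.0, 3.9, 1.1]  # close to day_a, so it reuses that codeword
series = day_a + day_b + noisy_day_a + day_b

codewords, representations = build_codebook(series, window=4, threshold=0.5)
recovered = reconstruct(codewords, representations)
print(len(codewords), representations)  # 2 [0, 1, 0, 1]
```

In this toy case, 16 samples are described by 4 indices plus 2 stored codewords, and the noisy third window is reconstructed as `day_a`, which is where the loss comes from.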
Key method for time series compression
- Codebook: the key class for decomposing a long energy time series into unique partitions (codewords) and representations. Check the examples for details.
  - It takes a time series, a window size, and a distance metric type as inputs.
  - The four built-in distance metrics are:
    - Euclidean distance (the default, or `'euclidean'`)
    - Dynamic Time Warping (`'DTW'`)
    - Wasserstein distance (`'Wasserstein'`)
    - Flexibility distance (`'flexibilityD'`)
  - The `pre_processing` method normalises the data. The normalised series is stored as the attribute `normalized_arr` and the scaler as the attribute `scaler_average`.
  - The `get_distance_matrix` method is a statistical analysis that computes the distance matrix for a long time series (assuming historical data are available). It returns the matrix and the quantile results, which can be used to set a similarity threshold (otherwise, the threshold can be set to an empirical value).
  - `desolve_time_series_thre` processes the time series into `codewords` and `representations` and returns them.
  - `post_processing` reconstructs the time series from the codewords and representations. The result is stored as the attribute `recovered_series`.
- Flexibility distance: a novel distance metric that measures the similarity between time series while taking into account both temporal and amplitude distance, as well as the rebound effect in the data.
  - `Code_book.flex_distance` is a static method that returns the flexibility distance (FD) between two given time series.
    - The default usage is `fd = Code_book.flex_distance(series_a, series_b)`, which computes FD with the default settings.
    - Get additional routing information with `fd, row_index, col_index = Code_book.flex_distance(series_a, series_b, route = True)`. It provides the reshaping strategy from series a to series b.
    - Users can also customise the weighted matrices with `fd = Code_book.flex_distance(series_a, series_b, weighted_matix_a, weighted_matix_b, route = False)`.
  - Given any two time series, users can generate a sample dashboard with the default settings as follows:

        from fdcodepy import methods
        from fdcodepy.utils.helpers import distance_method_routing_analysis
        distance_method_routing_analysis(series_a, series_b, methods.Code_book, report = True)

    - The report is generated in the working directory. Users can change this by modifying `export_dir` in the function `distance_method_routing_analysis`.
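The threshold-selection step described for `get_distance_matrix` (compute pairwise distances between windows, then read a similarity threshold off their quantiles) can be sketched in pure Python. This is an assumption-laden illustration rather than the package's code: the helper names `window_distance_matrix` and `distance_quantiles` are hypothetical, plain Euclidean distance stands in for whichever metric is selected, and which quantiles the package actually returns is not stated here.

```python
import math
import statistics


def window_distance_matrix(series, window):
    """Pairwise Euclidean distances between non-overlapping windows (hypothetical helper)."""
    segments = [series[i:i + window]
                for i in range(0, len(series) - window + 1, window)]
    n = len(segments)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            distance = math.dist(segments[i], segments[j])
            matrix[i][j] = matrix[j][i] = distance  # the matrix is symmetric
    return matrix


def distance_quantiles(matrix, n=4):
    """Cut points of the off-diagonal distances; n=4 gives quartiles."""
    size = len(matrix)
    upper = [matrix[i][j] for i in range(size) for j in range(i + 1, size)]
    return statistics.quantiles(upper, n=n)


series = [0.0, 0.0, 3.0, 4.0, 0.0, 0.0, 6.0, 8.0]
matrix = window_distance_matrix(series, window=2)
quantiles = distance_quantiles(matrix)

# A low quantile gives a strict threshold (more codewords, less loss);
# a higher one merges more windows into the same codeword.
threshold = quantiles[0]
```

Picking the threshold from the empirical distance distribution ties it to the data at hand, which is why historical data are needed for this step.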
Installation
- Install using pip:
pip install fdcodepy
Usage
- Import the package:

      import fdcodepy

- Step-by-step example: Codebook processing for a given hourly time series with a window size of 24, which sets the compression ratio (the same as resampling the data from hourly to daily):

      import numpy as np
      from fdcodepy import methods

      # One year of synthetic hourly data
      sample_series = np.random.uniform(0, 30, 365*24)
      series_codebook = methods.Code_book(sample_series, 24, 'flexibilityD')
      series_codebook.pre_processing()
      distance_matrix, quantiles = series_codebook.get_distance_matrix()
      codewords, representations = series_codebook.desolve_time_series_thre(quantiles[0])
      series_codebook.post_processing()

  - `series_codebook.recovered_series` is the reconstructed data, computed from representations of length `len(representations)` only, compared with the original series of length `len(sample_series)`.
  - The representations are the data that need to be communicated to the data center. Their length equals the size of the downsampled data, in this case 365.
- Use the Codebook processing result to generate a report:

      from fdcodepy.utils.helpers import code_book_processing_analysis
      code_book_processing_analysis(series_codebook, time_index, report = True, export_dir = '.')

- Use `Code_book.flex_distance` to compute the flexibility distance between two time series (with default settings):

      from fdcodepy import methods
      methods.Code_book.flex_distance(time_series_1, time_series_2)
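Because the compression is lossy, it can be useful to compare the recovered series with the original. The sketch below uses two hypothetical helpers, `rmse` and `window_totals`, that are not part of fdcodepy. It assumes both series are plain sequences of equal length, and the toy numbers are invented to show that the pattern inside a window can differ while the per-window totals still match.

```python
import math


def rmse(original, recovered):
    """Root-mean-square error between two equal-length series."""
    if len(original) != len(recovered):
        raise ValueError("series must have the same length")
    squared = sum((a - b) ** 2 for a, b in zip(original, recovered))
    return math.sqrt(squared / len(original))


def window_totals(series, window):
    """Accumulated value of each non-overlapping window (e.g. daily energy)."""
    return [sum(series[i:i + window]) for i in range(0, len(series), window)]


original = [1.0, 2.0, 3.0, 4.0]
recovered = [2.0, 1.0, 3.0, 4.0]  # first window has a different shape, same sum

error = rmse(original, recovered)
totals_original = window_totals(original, 2)
totals_recovered = window_totals(recovered, 2)
print(totals_original == totals_recovered)  # True
```

With the package, the same checks could be applied to `sample_series` and `series_codebook.recovered_series` with a window of 24, assuming the recovered series has the same length as the input.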
Example analysis report
The figures can be zoomed in to check the details.
Reference
- Yuan, R., Pourmousavi, S. A., Soong, W. L., Black, A. J., Liisberg, J. A. R., & Lemos-Vinasco, J. (2024). Unleashing the benefits of smart grids by overcoming the challenges associated with low-resolution data. Cell Reports Physical Science, 5(2), 101830. https://doi.org/10.1016/j.xcrp.2024.101830
- Yuan, R., Pourmousavi, S. A., Soong, W. L., Black, A. J., Liisberg, J. A. R., & Lemos-Vinasco, J. (2023). A new time series similarity measure and its smart grid applications. arXiv:2310.12399. https://arxiv.org/abs/2310.12399