timeseriesqualitycheck
timeseriesqualitycheck is a Python package designed to assess the quality of time-series data. It provides a straightforward way to evaluate the integrity and cleanliness of time-series datasets by analyzing their Time Pattern Cohesion Score (TPCS) and Signal-to-Noise Ratio (SNR).
Installation
To install timeseriesqualitycheck, simply use pip:
pip install timeseriesqualitycheck
check_quality Function
- check_quality requires a signal input in pandas DataFrame format, with at least two columns:
  - y: column where the values of the signal are stored.
  - ds: column where the date information is stored.
- The END_OF_TIME parameter is useful when we have extra information about the time period for the signal. For example, the signal may have values until May but should also include values for April. The END_OF_TIME parameter helps determine possible missing values.
- The MAX_LEN_MONTHS parameter works in a similar way to the END_OF_TIME parameter. However, its purpose is to gauge the existence of missing values from the beginning of the defined data gathering period.
The function returns a dictionary of values:
cleaning_score_data_dict = {
    "TPC_features": TPC_features,
    "TPC_score": TPC_score,
    "SNR_features": SNR_features,
    "SNR_score": SNR_score,
    "cleaning_score_weights": cleaning_score_weights,
    "cleaning_score": cleaning_score,
}
You can access any value you need; the final output key is "cleaning_score".
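As a minimal sketch of reading that dictionary, the helper below picks out the three score keys listed above. The helper name summarize_report is hypothetical and not part of the package, and the values in example_report are placeholders used only to make the snippet runnable, not real output of check_quality.

```python
def summarize_report(report):
    """Return only the headline scores from a check_quality-style dictionary."""
    wanted = ("TPC_score", "SNR_score", "cleaning_score")
    return {key: report[key] for key in wanted}


# Placeholder dictionary with the documented keys; every value is made up.
example_report = {
    "TPC_features": {},
    "TPC_score": 0.0,
    "SNR_features": {},
    "SNR_score": 0.0,
    "cleaning_score_weights": {},
    "cleaning_score": 0.0,
}

summary = summarize_report(example_report)
print(summary["cleaning_score"])
```

With a real report, replace example_report with the value returned by check_quality.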
Usage
import pandas as pd
from timeseriesqualitycheck import check_quality

END_OF_TIME = pd.to_datetime("2021-05-01")
MAX_LEN_MONTHS = 13

# Monthly timestamps; notice that "2021-02-01" is missing.
list_of_timestamps = [
    "2020-05-01", "2020-06-01", "2020-07-01", "2020-08-01",
    "2020-09-01", "2020-10-01", "2020-11-01", "2020-12-01",
    "2021-01-01", "2021-03-01", "2021-04-01", "2021-05-01",
]
list_of_timestamps = [pd.to_datetime(e) for e in list_of_timestamps]

# Notice that we have an outlier value (600).
signal_values_for_timestamps = [20, 30, 40, 50, 600, 70, 80, 70, 60, 50, 40, 30]

# Named `data` rather than `dict` so the built-in is not shadowed.
data = {"ds": list_of_timestamps, "y": signal_values_for_timestamps}
df = pd.DataFrame(data)

quality_report = check_quality(df, MAX_LEN_MONTHS, END_OF_TIME)
print(quality_report)
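To see which months a signal like the one above lacks, independently of the package, the following sketch lists the expected month starts and compares them with the ds column. It assumes the expected window is MAX_LEN_MONTHS consecutive month starts ending at END_OF_TIME; that reading is an assumption, the helper names are hypothetical, and this is not how check_quality is known to work internally.

```python
import pandas as pd


def expected_month_starts(end_of_time, max_len_months):
    """Month-start timestamps of the assumed window ending at end_of_time."""
    return pd.date_range(end=end_of_time, periods=max_len_months, freq="MS")


def missing_month_starts(df, end_of_time, max_len_months):
    """Expected month starts that have no matching row in df['ds']."""
    expected = expected_month_starts(end_of_time, max_len_months)
    observed = pd.DatetimeIndex(pd.to_datetime(df["ds"]))
    return expected.difference(observed)


# Same twelve timestamps as the usage example: February 2021 is absent.
timestamps = [
    "2020-05-01", "2020-06-01", "2020-07-01", "2020-08-01",
    "2020-09-01", "2020-10-01", "2020-11-01", "2020-12-01",
    "2021-01-01", "2021-03-01", "2021-04-01", "2021-05-01",
]
values = [20, 30, 40, 50, 600, 70, 80, 70, 60, 50, 40, 30]
example_df = pd.DataFrame({"ds": pd.to_datetime(timestamps), "y": values})

missing = missing_month_starts(example_df, pd.Timestamp("2021-05-01"), 13)
print(list(missing))
```

Under that assumption the only missing month start is 2021-02-01, which matches the gap flagged in the example.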
Description
The check_quality function evaluates the quality of a time-series signal. It analyzes the signal for pattern consistency, contiguity, and noise levels to produce a comprehensive quality score.
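For intuition about the noise component, here is a sketch of one common textbook definition of signal-to-noise ratio: the absolute mean divided by the population standard deviation. The package may compute its SNR features and score differently; the function name simple_snr is hypothetical and is not part of timeseriesqualitycheck.

```python
import numpy as np


def simple_snr(values):
    """Absolute mean over population standard deviation (one textbook SNR)."""
    arr = np.asarray(values, dtype=float)
    noise = arr.std()  # ddof=0, i.e. population standard deviation
    if noise == 0:
        # A constant signal has no measurable noise under this definition.
        return float("inf")
    return float(abs(arr.mean()) / noise)


# Mean is 5 and population standard deviation is 2, so the ratio is 2.5.
print(simple_snr([2, 4, 4, 4, 5, 5, 7, 9]))
```

A single large outlier inflates the standard deviation and so lowers this ratio, which is why outliers such as the 600 in the usage example matter for a noise-based score.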
Syntax
timeseriesqualitycheck.check_quality(signal_df, MAX_LEN_MONTHS, END_OF_TIME, tpcs_limit_for_snr_calculationt=3.5)
Parameters
- signal_df (pd.DataFrame): A pandas DataFrame containing the time-series data with 'y' and 'ds' columns.
- MAX_LEN_MONTHS (int): The maximum length of the time series in months.
- END_OF_TIME (datetime): The end date for the time series data.
- snr_limit (float, optional): The threshold for the signal-to-noise ratio. Default is 3.5. The syntax line above names this argument tpcs_limit_for_snr_calculationt.
Returns
- dict: A dictionary containing the cleanliness score, TPC and SNR features, and detailed scores.
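Before calling check_quality, it can help to confirm that the input matches the parameter description. The sketch below checks for the 'y' and 'ds' columns and for a datetime dtype in 'ds'. The helper name validate_signal_df is hypothetical, and the datetime requirement is an assumption based on the usage example, which converts timestamps with pd.to_datetime.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def validate_signal_df(signal_df):
    """Raise early if signal_df lacks the documented columns or date dtype."""
    missing = {"y", "ds"} - set(signal_df.columns)
    if missing:
        raise ValueError(f"signal_df is missing columns: {sorted(missing)}")
    if not is_datetime64_any_dtype(signal_df["ds"]):
        # Assumption: 'ds' should already hold datetimes, as in the example.
        raise TypeError("column 'ds' must be datetime; try pd.to_datetime")
    return signal_df


good_df = pd.DataFrame(
    {"ds": pd.to_datetime(["2021-04-01", "2021-05-01"]), "y": [40, 30]}
)
validate_signal_df(good_df)  # passes silently and returns the DataFrame
```

The validated DataFrame can then be passed to check_quality together with MAX_LEN_MONTHS and END_OF_TIME.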
Contributing
Contributions to timeseriesqualitycheck are welcome. Please ensure that your code adheres to the project's coding standards and includes appropriate tests.
License
This project is licensed under the MIT License.