Python Package for automated multivariate Time Series imputation

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

TimeWeaver: Automated time series imputation

TimeWeaver is a Python library for analyzing multivariate time series, aimed at environmental data recorded from machine processes. Such datasets are often incomplete because of sensor errors, and TimeWeaver fills the gaps with imputation techniques tailored to each feature. Tailoring matters because features behave differently: a power consumption curve, for example, does not follow the same pattern as a temperature curve.

TimeWeaver provides graphics and analyses that help data scientists and analysts select the right imputation method for a specific data problem. A customizable Preprocessor model is also in development; it is intended to integrate the best-performing imputation methods into existing data processing pipelines for automated data preparation.

Disclaimer

The currently implemented methods are based on functions provided by NumPy / SciPy. The package logo was generated by ChatGPT 4.0 on 09.03.2024. The project is still in the early stages of development!

Quickstart

The following example uses the Beijing PM2.5 Data Set to show the functionality of the library.

from timeweaver.timeweaver import TimeWeaver
from timeweaver.datasets import DataSets

dataframe = DataSets.PRSA()
interpolator = TimeWeaver(dataframe[0:1000], tracking_column="No")
interpolator.evaluate()
print(interpolator.get_best(optimized_selection=True))

[('year', 'akima'), ('month', 'akima'), ('day', 'akima'), ('hour', 'akima'), ('pm2.5', 'akima'), ('DEWP', 'from_derivatives'), ('TEMP', 'akima'), ('PRES', 'akima'), ('Iws', 'akima'), ('Is', 'from_derivatives'), ('Ir', 'akima')]

Structure

Functionalities

import pandas as pd
from timeweaver import TimeWeaver

dataframe = pd.read_csv("./src/data/PRSA/PRSA_data_2010.1.1-2014.12.31.csv")
dataframe

No	year	month	day	hour	pm2.5	DEWP	TEMP	PRES	cbwd	Iws
1	2010	1	1	0	nan	-21	-11	1021	NW	1.79
2	2010	1	1	1	nan	-21	-12	1020	NW	4.92
3	2010	1	1	2	nan	-21	-11	1019	NW	6.71
4	2010	1	1	3	nan	-21	-14	1019	NW	9.84
5	2010	1	1	4	nan	-20	-12	1018	NW	12.97

[Figure: TimeWeaver PM25, filtered for the relevant columns "pm2.5", "DEWP", "TEMP", "PRES" and "Iws"]

Initialize the TimeWeaver object by passing the dataframe and the tracking column (time or index):

interpolator = TimeWeaver(dataframe[0:1000], tracking_column="No")

.get_summary() provides a general overview of the data. In this case it shows that the feature "pm2.5" contains NaNs and that "cbwd" consists entirely of non-numeric values.

interpolator.get_summary()

Column	Data Type	Total Numeric Cells	Total Non-Numeric Cells	Total NaNs	Total Zero Values
No	int64	1000	0	0	0
year	int64	1000	0	0	0
month	int64	1000	0	0	0
day	int64	1000	0	0	0
hour	int64	1000	0	0	42
pm2.5	float64	909	0	91	0
DEWP	int64	1000	0	0	0
TEMP	float64	1000	0	0	48
PRES	float64	1000	0	0	0
cbwd	object	0	1000	0	0
Iws	float64	1000	0	0	0
Is	int64	1000	0	0	958
Ir	int64	1000	0	0	1000
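
To make the columns of this summary concrete, the following is a minimal, dependency-free sketch of how such per-column counts could be derived. It is not TimeWeaver's implementation: the helper name `summarize_column` and the classification rules (NaNs counted separately from numeric cells, zeros counted as numeric, strings counted as non-numeric) are assumptions chosen to match the table above.

```python
import math
from numbers import Number


def summarize_column(values):
    """Count numeric, non-numeric, NaN and zero cells in one column (hypothetical helper)."""
    summary = {"numeric": 0, "non_numeric": 0, "nans": 0, "zeros": 0}
    for value in values:
        if isinstance(value, float) and math.isnan(value):
            summary["nans"] += 1          # missing value, counted on its own
        elif isinstance(value, Number) and not isinstance(value, bool):
            summary["numeric"] += 1
            if value == 0:
                summary["zeros"] += 1     # zeros stay numeric but are also tallied
        else:
            summary["non_numeric"] += 1   # e.g. wind-direction strings such as "NW"
    return summary


# Toy columns, not taken from the data set.
pm25 = [float("nan"), float("nan"), 129.0, 148.0, 0.0]
cbwd = ["NW", "NW", "SE", "cv", "NE"]

print(summarize_column(pm25))
print(summarize_column(cbwd))
```

A column whose non-numeric count equals its length, like "cbwd" here, is one that numeric interpolation methods cannot handle.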

.get_summary(full_summary=True) provides a more detailed overview.

interpolator.get_summary(full_summary=True)

Column	Data Type	Total Numeric Cells	Total Non-Numeric Cells	Total NaNs	Total Zero Values	Unique Values Count	Most Frequent Value	Minimum Value	Maximum Value	Mean	Median
No	int64	1000	0	0	0	1000	1	1	1000	500.5	500.5
year	int64	1000	0	0	0	1	2010	2010	2010	2010	2010
month	int64	1000	0	0	0	2	1	1	2	1.256	1
day	int64	1000	0	0	0	31	1	1	31	13.4	11
hour	int64	1000	0	0	42	24	0	0	23	11.436	11
pm2.5	float64	909	0	91	0	257	27.0	6	485	88.363	61
DEWP	int64	1000	0	0	0	26	-19	-27	-2	-16.269	-17
TEMP	float64	1000	0	0	48	28	-5.0	-19	8	-5.483	-5
PRES	float64	1000	0	0	0	28	1027.0	1012	1039	1027.82	1028
cbwd	object	0	1000	0	0	4	NW	nan	nan	nan	nan
Iws	float64	1000	0	0	0	398	0.89	0.45	299.06	34.1421	9.84
Is	int64	1000	0	0	958	26	0	0	27	0.414	0
Ir	int64	1000	0	0	1000	1	0	0	0	0	0

.get_summary_characters() provides insights into the individual characters that occur within the data.

interpolator.get_summary_characters()

Column	Data Type	Total Non-Numeric	Total Numeric	n	a	.	-	N	W	c	v	E	S
No	int64	0	2893	0	0	0	0	0	0	0	0	0	0
year	int64	0	4000	0	0	0	0	0	0	0	0	0	0
month	int64	0	1000	0	0	0	0	0	0	0	0	0	0
day	int64	0	1568	0	0	0	0	0	0	0	0	0	0
hour	int64	0	1580	0	0	0	0	0	0	0	0	0	0
pm2.5	float64	1182	2982	182	91	909	0	0	0	0	0	0	0
DEWP	int64	1000	1845	0	0	0	1000	0	0	0	0	0	0
TEMP	float64	1821	2246	0	0	1000	821	0	0	0	0	0	0
PRES	float64	1000	5000	0	0	1000	0	0	0	0	0	0	0
cbwd	object	2000	0	0	0	0	0	690	528	155	155	317	155
Iws	float64	1000	3532	0	0	1000	0	0	0	0	0	0	0
Is	int64	0	1017	0	0	0	0	0	0	0	0	0	0
Ir	int64	0	1000	0	0	0	0	0	0	0	0	0	0
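
The character counts above are consistent with tallying every character in the string form of each cell: the 91 NaNs in "pm2.5" account for 182 "n" and 91 "a" characters. The sketch below illustrates that idea with the standard library only. The helper names `count_characters` and `split_totals` are hypothetical, and whether TimeWeaver works this way internally is an assumption.

```python
from collections import Counter


def count_characters(values):
    """Tally every character in the string form of each cell (hypothetical helper)."""
    counts = Counter()
    for value in values:
        counts.update(str(value))   # str(float("nan")) is "nan"
    return counts


def split_totals(counts):
    """Split a character tally into digit and non-digit totals."""
    numeric = sum(n for ch, n in counts.items() if ch.isdigit())
    return {"numeric": numeric, "non_numeric": sum(counts.values()) - numeric}


# Toy column, not taken from the data set.
pm25 = [float("nan"), 129.0, 148.0]
counts = count_characters(pm25)

print(counts["n"], counts["a"], counts["."])
print(split_totals(counts))
```

Unexpected characters in a numeric column (letters, stray separators) are a quick hint that values were stored as text or that missing entries are encoded inconsistently.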

.evaluate() is the core function of TimeWeaver and tests the different methods on the dataframe.

interpolator.evaluate()

➤ Evaluation complete. ✔
➤ Evaluated number of methods: 21 ✔
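
The text does not state how each method is scored. A common way to test an imputation method is to hide values that are actually known, impute them, and measure how far the imputed values are from the originals. The sketch below shows that idea for linear interpolation. The function names, the choice of mean absolute error, and the hidden positions are all assumptions for illustration, not TimeWeaver's implementation.

```python
import math


def linear_fill(values):
    """Fill NaN gaps by straight-line interpolation between the nearest known neighbours.

    Leading and trailing NaNs have no neighbour on one side and are left untouched.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if not math.isnan(v)]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


def masked_error(values, hidden_positions, fill):
    """Hide known values, impute them with `fill`, and return the mean absolute error."""
    masked = list(values)
    for i in hidden_positions:
        masked[i] = float("nan")
    imputed = fill(masked)
    errors = [abs(imputed[i] - values[i]) for i in hidden_positions]
    return sum(errors) / len(errors)


# Toy temperature series; positions 1 and 3 are hidden and then re-estimated.
temperature = [-11.0, -12.0, -11.0, -14.0, -12.0, -10.0]
print(masked_error(temperature, hidden_positions=[1, 3], fill=linear_fill))  # 1.75
```

Repeating this for every method and every column would yield a table of errors per column and method, from which the lowest-error method can be chosen.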

Once the evaluation is done, several methods are available to retrieve the analysis results and gain more insight into the imputation quality.

results_df = interpolator.get_evaluation_dataframe()
results_df

	linear	nearest	zero	slinear	quadratic
year	0	0	0	0	0
month	0	0	0	0	0.0001
day	0.045	0.07	0.06	0.045	0.0741
hour	1.08	2.54	2.44	1.08	1.7227
pm2.5	10.412	13.1011	12.3483	10.412	10.6714
DEWP	0.6783	0.88	0.86	0.6783	0.7843
TEMP	0.72	1.03	1.17	0.72	0.8513
PRES	0.395	0.55	0.53	0.395	0.4227
cbwd	nan	nan	nan	nan	nan
Iws	2.972	6.2409	5.6463	2.972	4.3372
Is	0.025	0.09	0.09	0.025	0.0352
Ir	0	0	0	0	0

results_df = interpolator.get_method_success_dataframe()
results_df

	linear	nearest	zero	slinear	quadratic
year	1	1	1	1	1
month	1	1	1	1	1
day	1	1	1	1	1
hour	1	1	1	1	1
pm2.5	1	1	1	1	1
DEWP	1	1	1	1	1
TEMP	1	1	1	1	1
PRES	1	1	1	1	1
cbwd	0	0	0	0	0
Iws	1	1	1	1	1
Is	1	1	1	1	1
Ir	1	1	1	1	1

results_df = interpolator.get_best(optimized_selection=False)
results_df

{'year': ['linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'polynomial_order_1', 'polynomial_order_2', 'polynomial_order_3', 'polynomial_order_5', 'polynomial_order_7', 'polynomial_order_9', 'piecewise_polynomial', 'spline_order_1', 'spline_order_2', 'spline_order_3', 'spline_order_4', 'spline_order_5', 'akima', 'cubicspline', 'from_derivatives'], 'month': ['linear', 'nearest', 'zero', 'slinear', 'polynomial_order_1', 'piecewise_polynomial', 'akima', 'from_derivatives'], 'day': ['linear', 'slinear', 'polynomial_order_1', 'piecewise_polynomial', 'akima', 'from_derivatives'], 'hour': ['linear', 'slinear', 'polynomial_order_1', 'piecewise_polynomial', 'akima', 'from_derivatives'], 'pm2.5': ['akima'], 'DEWP': ['linear', 'slinear', 'polynomial_order_1', 'piecewise_polynomial', 'from_derivatives'], 'TEMP': ['akima'], 'PRES': ['akima'], 'cbwd': [], 'Iws': ['akima'], 'Is': ['linear', 'slinear', 'polynomial_order_1', 'piecewise_polynomial', 'from_derivatives'], 'Ir': ['linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'polynomial_order_1', 'polynomial_order_2', 'polynomial_order_3', 'polynomial_order_5', 'polynomial_order_7', 'polynomial_order_9', 'piecewise_polynomial', 'spline_order_1', 'spline_order_2', 'spline_order_3', 'spline_order_4', 'spline_order_5', 'akima', 'cubicspline', 'from_derivatives']}
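
The lists above contain every method that ties for the best result in a column, and a column with no successful method gets an empty list. The sketch below reproduces that selection rule on a small error table, using the "day" and "cbwd" rows of the evaluation dataframe restricted to the five methods shown there. The helper `best_methods` is hypothetical, and treating the lowest error as "best" with exact equality for ties is an assumption.

```python
import math


def best_methods(errors_by_column):
    """Return, per column, every method that ties for the lowest error (hypothetical helper)."""
    best = {}
    for column, errors in errors_by_column.items():
        valid = {m: e for m, e in errors.items() if not math.isnan(e)}
        if not valid:
            best[column] = []            # no method succeeded, e.g. a text column
            continue
        lowest = min(valid.values())
        best[column] = [m for m, e in valid.items() if e == lowest]
    return best


nan = float("nan")
errors = {
    "day": {"linear": 0.045, "nearest": 0.07, "zero": 0.06, "slinear": 0.045, "quadratic": 0.0741},
    "cbwd": {"linear": nan, "nearest": nan, "zero": nan, "slinear": nan, "quadratic": nan},
}
print(best_methods(errors))  # {'day': ['linear', 'slinear'], 'cbwd': []}
```

With real floating-point errors, a small tolerance (for example via `math.isclose`) may be a safer tie test than exact equality.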

results_df = interpolator.get_best(optimized_selection=True)
results_df
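
Comparing the quickstart output with the candidate lists from `optimized_selection=False` suggests that `optimized_selection=True` reduces each list to a single method and favors methods shared by many columns ("akima" is chosen wherever it is a candidate). The exact rule is not documented here. The sketch below shows one way to get similar behavior, a greedy choice of the most widely usable method. The function `pick_shared_methods` and its alphabetical tie-breaking are assumptions, not the library's algorithm.

```python
from collections import Counter


def pick_shared_methods(candidates):
    """Greedily assign one method per column, preferring methods usable by many columns."""
    # Columns without any candidate (e.g. text columns) are skipped.
    remaining = {col: set(methods) for col, methods in candidates.items() if methods}
    chosen = {}
    while remaining:
        # Count how many still-unassigned columns each method could serve.
        coverage = Counter(m for methods in remaining.values() for m in methods)
        # Highest coverage wins; ties are broken alphabetically (an arbitrary choice).
        method = min(coverage, key=lambda m: (-coverage[m], m))
        for col in [c for c, methods in remaining.items() if method in methods]:
            chosen[col] = method
            del remaining[col]
    return chosen


# Toy candidate lists in the shape returned by get_best(optimized_selection=False).
candidates = {
    "pm2.5": ["akima"],
    "TEMP": ["akima"],
    "day": ["linear", "slinear", "akima"],
    "DEWP": ["linear", "slinear"],
    "cbwd": [],
}
print(pick_shared_methods(candidates))
```

Here "akima" covers three columns and is assigned first; "DEWP" then falls back to "linear". The quickstart output shows 'from_derivatives' for "DEWP", so the library evidently breaks such ties differently.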

interpolator.get_rate_analysis()

Release history

0.1.7.31

Mar 30, 2024

0.1.7.30

Mar 30, 2024

0.1.7.29

Mar 30, 2024

0.1.7.28

Mar 30, 2024

0.1.7.27

Mar 18, 2024

0.1.7.26

Mar 18, 2024

0.1.7.25

Mar 16, 2024

This version

0.1.7.24

Mar 15, 2024

Download files

Source Distribution

TimeWeaver-0.1.7.24.tar.gz (518.7 kB)

Uploaded Mar 15, 2024 Source

File details

Details for the file TimeWeaver-0.1.7.24.tar.gz.

File metadata

Download URL: TimeWeaver-0.1.7.24.tar.gz
Upload date: Mar 15, 2024
Size: 518.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.8.18

File hashes

Hashes for TimeWeaver-0.1.7.24.tar.gz
Algorithm	Hash digest
SHA256	`4a4b40d20725fac678bce5e988fbb76f0d2e4a06c69f89503475f7afb706e4ad`
MD5	`37d75e63ac2da8440fbe752cc6e33d12`
BLAKE2b-256	`596168a02b2e73aba3981257d73b082f85b6804aeb6bdb02a4c64ef0d6981b11`
