A Python library for pre-processing ubiquitous aggregated self-tracking data

Development Status
- 3 - Alpha
Operating System
- OS Independent
Programming Language
- Python :: 3.8

Project description

UBIWEAR

A Python library for pre-processing ubiquitous aggregated self-tracking data.

What is this library about

This library grew out of our work with in-the-wild data from the "MyHeart Counts" study [1].

Through extensive experimentation with these real-world data, we derived a set of prescriptive guidelines for pre-processing aggregated data gathered from wearable devices.

We hope UBIWEAR serves as a starting point for the research community in the largely unexplored domain of physical activity prediction, and that it promotes a standardized approach to pre-processing data from wearables and self-tracking devices.

When to use this library

To the best of our knowledge, at the time this library was written there were no established techniques for handling time-series data from self-tracking devices.

UBIWEAR offers pre-processing methods for univariate time-series problems, with slight modifications tailored to wearables data.

It handles univariate aggregated time-series data and processes it into a structure suitable for predictive modeling.

Usage of UBIWEAR

Install the library

Create virtual environment

$ python3 -m venv venv
$ source venv/bin/activate

Upgrade pip

$ python -m pip install --upgrade pip

Install UBIWEAR

$ pip install ubiwear

Load your data

The input to UBIWEAR is always a pandas DataFrame with a DatetimeIndex and a column named value, of type float or int, that holds the recorded observations of your time series.

For illustration, an example of such data is included in the assets/ directory in .csv format.

import pandas as pd

df = pd.read_csv('assets/df-wearable-time-series-example.csv', index_col='startTime', parse_dates=True)

The df must have the same format as the example:

                     value
startTime                 
2015-08-07 05:37:31   59.0
2015-08-07 05:43:31  139.0
2015-08-07 07:06:16  245.0
2015-08-07 07:11:18  148.0
2015-08-07 07:15:49   43.0
                    ...
2015-08-25 04:52:35   18.0
2015-08-25 05:03:11   15.0
2015-08-25 05:04:51   44.0
2015-08-25 05:06:13   80.0
2015-08-25 05:41:19  112.0
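
Before handing a frame to the library, it can help to check that it matches the format described above. The following is a minimal sketch using only pandas; the helper name check_ubiwear_input is hypothetical and is not part of UBIWEAR.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_ubiwear_input(df: pd.DataFrame) -> None:
    """Raise ValueError if df does not match the expected input format.

    Hypothetical helper for illustration; it is not part of UBIWEAR.
    """
    if not isinstance(df.index, pd.DatetimeIndex):
        raise ValueError("index must be a pandas DatetimeIndex")
    if "value" not in df.columns:
        raise ValueError("a column named 'value' is required")
    if not is_numeric_dtype(df["value"]):
        raise ValueError("'value' must be of type float or int")


# Small hand-made frame in the same shape as the example data
example = pd.DataFrame(
    {"value": [59.0, 139.0, 245.0]},
    index=pd.DatetimeIndex(
        ["2015-08-07 05:37:31", "2015-08-07 05:43:31", "2015-08-07 07:06:16"],
        name="startTime",
    ),
)
check_ubiwear_input(example)  # passes silently for a well-formed frame
```

A frame loaded without index_col and parse_dates would fail the first check, which is the most common way to end up with the wrong index type.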

Clean and process the data

Import the Processor class. Its purpose is to pre-process aggregated time-series data from wearables.

The methods of the class are designed to be chained.

It also offers a "magic" method, process, that runs the data through a pre-defined, suggested pipeline that works especially well for physical activity data.

from ubiwear.processor import Processor

ubiwear_processor = Processor(df=df)

# Call the magic method
df = ubiwear_processor.process(granularity='1H', q=0.05, impute_start=8, impute_end=24)

The df has the following format:

                          value  dayofweek_sin  ...  hour_sin      hour_cos
startTime                                       ...                        
2015-08-07 05:00:00  198.000000      -0.433884  ...  0.965926  2.588190e-01
2015-08-07 06:00:00    0.000000      -0.433884  ...  1.000000  6.123234e-17
2015-08-07 07:00:00  467.000000      -0.433884  ...  0.965926 -2.588190e-01
2015-08-07 08:00:00  544.333333      -0.433884  ...  0.866025 -5.000000e-01
2015-08-07 09:00:00  621.666667      -0.433884  ...  0.707107 -7.071068e-01
                         ...            ...  ...       ...           ...
2015-08-25 01:00:00    0.000000       0.781831  ...  0.258819  9.659258e-01
2015-08-25 02:00:00   82.000000       0.781831  ...  0.500000  8.660254e-01
2015-08-25 03:00:00    0.000000       0.781831  ...  0.707107  7.071068e-01
2015-08-25 04:00:00    0.000000       0.781831  ...  0.866025  5.000000e-01
2015-08-25 05:00:00   95.000000       0.781831  ...  0.965926  2.588190e-01
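
The hour_sin, hour_cos and dayofweek_sin values shown above are consistent with the standard sine/cosine encoding of a periodic feature, sin(2π·x/period) and cos(2π·x/period), with a period of 24 for the hour and 7 for the day of the week (Monday = 0). The sketch below reproduces those values with plain pandas and NumPy; the helper name and the dayofweek_cos column are assumptions, and this is not UBIWEAR's own implementation.

```python
import numpy as np
import pandas as pd


def add_cyclical_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with sine/cosine encodings of hour and day of week.

    Hypothetical helper for illustration; not UBIWEAR's implementation.
    """
    out = df.copy()
    hour = out.index.hour.to_numpy()
    dayofweek = out.index.dayofweek.to_numpy()  # Monday=0 ... Sunday=6
    out["hour_sin"] = np.sin(2 * np.pi * hour / 24)
    out["hour_cos"] = np.cos(2 * np.pi * hour / 24)
    out["dayofweek_sin"] = np.sin(2 * np.pi * dayofweek / 7)
    out["dayofweek_cos"] = np.cos(2 * np.pi * dayofweek / 7)
    return out


idx = pd.DatetimeIndex(["2015-08-07 05:00:00", "2015-08-25 05:00:00"], name="startTime")
features = add_cyclical_features(pd.DataFrame({"value": [198.0, 95.0]}, index=idx))
print(features.round(6))
```

Encoding both sine and cosine keeps 23:00 and 00:00 close together in feature space, which a raw hour number would not.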

What has happened?

- Removed duplicate observations recorded at the same timestamp.
- Removed NaN/NaT records.
- Removed outlier values using the quantiles method.
- Resampled the data to a uniform granularity (hourly in this example).
- Imputed missing values during active hours (08:00 - 24:00), using an approach specific to wearables data.
- Enriched the feature space with date features, converted into their cyclical transformation.

Each of the above methods can also be called individually, so you can select those that fit your problem.
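
To make the first four steps concrete, here is a rough plain-pandas approximation. It is a sketch under stated assumptions, not UBIWEAR's code: the helper name basic_clean is hypothetical, outliers are assumed to be values outside the [q, 1 - q] quantile range, and resampling is assumed to sum the values in each bin (which agrees with the sample output, where 59.0 + 139.0 = 198.0 at 05:00).

```python
import numpy as np
import pandas as pd


def basic_clean(df: pd.DataFrame, q: float = 0.05, granularity: str = "60min") -> pd.DataFrame:
    """Plain-pandas approximation of the cleaning and resampling steps.

    Assumptions (not confirmed by UBIWEAR): outliers are values outside the
    [q, 1 - q] quantile range, and resampling sums the values in each bin.
    """
    out = df.dropna(subset=["value"])                # drop NaN observations
    out = out[out.index.notna()]                     # drop NaT timestamps
    out = out[~out.index.duplicated(keep="first")]   # one observation per timestamp
    low, high = out["value"].quantile([q, 1 - q])
    out = out[out["value"].between(low, high)]       # keep values inside the quantile band
    return out.resample(granularity).sum()           # unify to a fixed granularity


idx = pd.DatetimeIndex(
    [
        "2015-08-07 05:37:31",
        "2015-08-07 05:43:31",
        "2015-08-07 05:43:31",  # duplicate timestamp
        "2015-08-07 07:06:16",
        "2015-08-07 07:11:18",
    ],
    name="startTime",
)
raw = pd.DataFrame({"value": [59.0, 139.0, 139.0, np.nan, 148.0]}, index=idx)
hourly = basic_clean(raw, q=0.0)  # q=0.0 keeps every value in this tiny example
print(hourly)
```

Hours without any observation come out as 0 here; the wearables-specific imputation on active hours is not reproduced in this sketch.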

You can also implement your own methods in the Processor class and call them in your own pre-processing pipeline in a chaining manner.

For example:

from ubiwear.processor import Processor

ubiwear_processor = Processor(df=df)

# Parentheses allow chaining across lines and keep the comments valid
(
    ubiwear_processor
    .remove_nan()
    .remove_duplicate_values_at_same_timestamp()
    .add_date_features()
    # ...
    # .your_own_method()
)

# Get the processed data
df = ubiwear_processor.df
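
Chaining works when every method returns the object itself. The toy class below illustrates the pattern in isolation; MiniProcessor and its clip_negative step are hypothetical and only mimic the style of UBIWEAR's Processor.

```python
import pandas as pd


class MiniProcessor:
    """Toy stand-in that mimics the chaining style; not UBIWEAR's Processor."""

    def __init__(self, df: pd.DataFrame):
        self.df = df.copy()

    def remove_nan(self) -> "MiniProcessor":
        self.df = self.df.dropna()
        return self  # returning self is what makes chaining possible

    def clip_negative(self) -> "MiniProcessor":
        # Hypothetical custom step: negative values are set to zero
        self.df = self.df.assign(value=self.df["value"].clip(lower=0))
        return self


idx = pd.date_range("2015-08-07 05:00", periods=3, freq="60min", name="startTime")
toy = pd.DataFrame({"value": [12.0, None, -3.0]}, index=idx)
cleaned = MiniProcessor(toy).remove_nan().clip_negative().df
print(cleaned)
```

A custom method added to a processor in this style only needs to update self.df and return self to fit into the same chain.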

Re-frame the problem from time-series to a supervised dataset

Use the Window class, which provides two main functionalities for transforming a time-series problem into a supervised dataset ready to be used by machine learning algorithms:

- A sliding window, which transforms a time-series problem into a supervised one
- Our novel aggregated tumbling window

from ubiwear.window import Window

# Transform from time-series to supervised dataset for ML
window = Window(n_in=2 * 24)
dataset = window.sliding_window(data=df)

# OR aggregated tumbling window
# dataset = window.tumbling_window(data=df, freq='1D')

The dataset has the following format:

                     var1(t-48)  var2(t-48)  ...  var11(t)  var1(t)
startTime                                    ...                   
2015-08-09 05:00:00       198.0   -0.433884  ...  0.258819      0.0
2015-08-10 05:00:00         0.0   -0.974928  ...  0.258819      0.0
                                                    ...
2015-08-11 05:00:00         0.0   -0.781831  ...  0.258819      0.0
2015-08-22 05:00:00         0.0    0.433884  ...  0.258819      0.0
2015-08-23 05:00:00         0.0   -0.433884  ...  0.258819   4562.0
2015-08-24 05:00:00         0.0   -0.974928  ...  0.258819   1861.5
2015-08-25 05:00:00       450.0   -0.781831  ...  0.258819    177.0
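
The general idea behind a sliding window can be sketched with pandas shift: each row receives the previous n_in observations as inputs and the current value as the target. This is a generic illustration, not UBIWEAR's implementation; the helper name and the column naming are assumptions that only loosely follow the output above.

```python
import pandas as pd


def sliding_window_sketch(data: pd.DataFrame, n_in: int, target: str = "value") -> pd.DataFrame:
    """Build lagged copies of every column plus the target at time t.

    Hypothetical helper for illustration; not UBIWEAR's implementation.
    """
    frames = []
    for lag in range(n_in, 0, -1):
        lagged = data.shift(lag)
        lagged.columns = [f"{col}(t-{lag})" for col in data.columns]
        frames.append(lagged)
    frames.append(data[[target]].rename(columns={target: f"{target}(t)"}))
    # The first n_in rows have incomplete history and are dropped
    return pd.concat(frames, axis=1).dropna()


idx = pd.date_range("2015-08-07 05:00", periods=5, freq="60min", name="startTime")
series = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)
supervised = sliding_window_sketch(series, n_in=2)
print(supervised)
```

With five observations and n_in=2, three supervised rows remain, each pairing two past values with the value to predict.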

Convert dataset for ML

Dataset is a class that provides sub-datasets for training ML models. It takes as input the dataset created by UBIWEAR's Window class.

from ubiwear.dataset import Dataset

ubiwear_dataset = Dataset(dataset=dataset)

# Get train/test sub-datasets
x_train, x_test, y_train, y_test = ubiwear_dataset.get_train_test(train_ratio=0.75)

# OR train/validation/test sub-datasets
x_train, x_val, x_test, y_train, y_val, y_test = ubiwear_dataset.get_train_val_test(train_ratio=0.75, val_ratio=0.2)
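
For time-series data, a split is usually made in time order so that the model is never trained on observations that come after the test period. The sketch below shows such a split with plain pandas. Whether UBIWEAR's Dataset splits this way is an assumption, as is treating the last column as the target; the helper name chronological_split is hypothetical.

```python
import pandas as pd


def chronological_split(dataset: pd.DataFrame, train_ratio: float = 0.75):
    """Split rows in time order; the last column is treated as the target.

    Assumptions: no shuffling, and the target is the final column.
    Not UBIWEAR's implementation.
    """
    cut = int(len(dataset) * train_ratio)
    x, y = dataset.iloc[:, :-1], dataset.iloc[:, -1]
    return x.iloc[:cut], x.iloc[cut:], y.iloc[:cut], y.iloc[cut:]


# Tiny stand-in for a supervised dataset: one input column, one target column
toy_dataset = pd.DataFrame({"var1(t-1)": range(8), "var1(t)": range(1, 9)})
x_tr, x_te, y_tr, y_te = chronological_split(toy_dataset, train_ratio=0.75)
print(len(x_tr), len(x_te))
```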

Apply your favorite ML or DL model

You now have your clean, pre-processed X's and y's, ready for your ML problem!

You can fit your favorite model and record its performance with your favorite regression metrics.
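
As a reference point before trying a real model, it can be useful to score a naive baseline. The sketch below computes MAE and RMSE with NumPy for a baseline that always predicts the training mean; the numbers are made up for illustration and do not come from the example data.

```python
import numpy as np


def mae(y_true, y_pred) -> float:
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def rmse(y_true, y_pred) -> float:
    """Root mean squared error."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


# Made-up targets for illustration only
y_train_toy = np.array([100.0, 200.0, 300.0])
y_test_toy = np.array([150.0, 250.0])

# Naive baseline: always predict the mean of the training targets
baseline_pred = np.full_like(y_test_toy, y_train_toy.mean())
print(mae(y_test_toy, baseline_pred), rmse(y_test_toy, baseline_pred))
```

Any model worth keeping should beat this baseline on the same test split.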

Literature

[1] Hershman, Steven G., et al. "Physical activity, sleep and cardiovascular health data for 50,000 individuals from the MyHeart Counts Study." Scientific Data 6.1 (2019): 1-10.


Release history

- 0.0.16 (yanked), Jan 7, 2022. Reason this release was yanked: Wrong versioning
- 0.0.2 (this version), Jan 7, 2022
- 0.0.1, Jan 7, 2022

Download files

Source Distribution

ubiwear-0.0.2.tar.gz (11.5 kB)

Uploaded Jan 7, 2022 Source

Built Distribution

ubiwear-0.0.2-py3-none-any.whl (9.9 kB)

Uploaded Jan 7, 2022 Python 3

File details

Details for the file ubiwear-0.0.2.tar.gz.

File metadata

Download URL: ubiwear-0.0.2.tar.gz
Upload date: Jan 7, 2022
Size: 11.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.7.1 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for ubiwear-0.0.2.tar.gz
Algorithm	Hash digest
SHA256	`7ab298a12ba9694cba1a90388d3adbd28a2fa597a99cb81c28b5559780d32c7b`
MD5	`abaad32730c98132d29a3e44437a1610`
BLAKE2b-256	`4a241f8fc09656e75fca860192f4be465f60c2b6d0233e408cbd63006248cccd`

File details

Details for the file ubiwear-0.0.2-py3-none-any.whl.

File metadata

Download URL: ubiwear-0.0.2-py3-none-any.whl
Upload date: Jan 7, 2022
Size: 9.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.7.1 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for ubiwear-0.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5cf28046c00aecaa672950b92ef5355453a519b31ad1aac2adac1ad354172837`
MD5	`b8a2278f73113755f40770e8559800fe`
BLAKE2b-256	`def3fbfe30641fb8d05befbdfd61ca1860e29047441ad1f455cd4155e8e49179`
