Tools to Transform a Time Series into Features and Targets, a.k.a. Supervised Learning
Project description
ts2ml
Install
```sh
pip install ts2ml
```
How to use
```python
import pandas as pd
from ts2ml.core import add_missing_slots
from ts2ml.core import transform_ts_data_into_features_and_target

df = pd.DataFrame({
    'pickup_hour': ['2022-01-01 00:00:00', '2022-01-01 01:00:00', '2022-01-01 03:00:00',
                    '2022-01-01 01:00:00', '2022-01-01 02:00:00', '2022-01-01 05:00:00'],
    'pickup_location_id': [1, 1, 1, 2, 2, 2],
    'rides': [2, 3, 1, 1, 2, 1]
})
df
```
| | pickup_hour | pickup_location_id | rides |
|---|---|---|---|
| 0 | 2022-01-01 00:00:00 | 1 | 2 |
| 1 | 2022-01-01 01:00:00 | 1 | 3 |
| 2 | 2022-01-01 03:00:00 | 1 | 1 |
| 3 | 2022-01-01 01:00:00 | 2 | 1 |
| 4 | 2022-01-01 02:00:00 | 2 | 2 |
| 5 | 2022-01-01 05:00:00 | 2 | 1 |
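Location 1 is missing hours 02:00, 04:00 and 05:00, and location 2 is missing 00:00, 03:00 and 04:00. Before filling them, the gaps can be spotted with plain pandas — a sketch independent of ts2ml, reusing the column names above:

```python
import pandas as pd

df = pd.DataFrame({
    'pickup_hour': pd.to_datetime([
        '2022-01-01 00:00:00', '2022-01-01 01:00:00', '2022-01-01 03:00:00',
        '2022-01-01 01:00:00', '2022-01-01 02:00:00', '2022-01-01 05:00:00']),
    'pickup_location_id': [1, 1, 1, 2, 2, 2],
    'rides': [2, 3, 1, 1, 2, 1],
})

# Full hourly range spanning the observed data
full_range = pd.date_range(df['pickup_hour'].min(), df['pickup_hour'].max(), freq='H')

# Hours absent for each location
missing = {
    loc: full_range.difference(pd.DatetimeIndex(g['pickup_hour']))
    for loc, g in df.groupby('pickup_location_id')
}
print(len(missing[1]), len(missing[2]))  # 3 3
```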
Let’s fill the missing slots with zeros
```python
df = add_missing_slots(df, datetime_col='pickup_hour', entity_col='pickup_location_id', value_col='rides', freq='H')
df
```
| | pickup_hour | pickup_location_id | rides |
|---|---|---|---|
| 0 | 2022-01-01 00:00:00 | 1 | 2 |
| 1 | 2022-01-01 01:00:00 | 1 | 3 |
| 2 | 2022-01-01 02:00:00 | 1 | 0 |
| 3 | 2022-01-01 03:00:00 | 1 | 1 |
| 4 | 2022-01-01 04:00:00 | 1 | 0 |
| 5 | 2022-01-01 05:00:00 | 1 | 0 |
| 6 | 2022-01-01 00:00:00 | 2 | 0 |
| 7 | 2022-01-01 01:00:00 | 2 | 1 |
| 8 | 2022-01-01 02:00:00 | 2 | 2 |
| 9 | 2022-01-01 03:00:00 | 2 | 0 |
| 10 | 2022-01-01 04:00:00 | 2 | 0 |
| 11 | 2022-01-01 05:00:00 | 2 | 1 |
Now, let’s build features and targets to predict the number of rides in the next hour for each location, using the number of rides in the previous 3 hours:
```python
features, targets = transform_ts_data_into_features_and_target(
    df,
    n_features=3,
    datetime_col='pickup_hour',
    entity_col='pickup_location_id',
    value_col='rides',
    n_targets=1,
    step_size=1,
    step_name='hour'
)
```
```python
features
```
| | rides_previous_3_hour | rides_previous_2_hour | rides_previous_1_hour | pickup_hour | pickup_location_id |
|---|---|---|---|---|---|
| 0 | 2.0 | 3.0 | 0.0 | 2022-01-01 03:00:00 | 1 |
| 1 | 3.0 | 0.0 | 1.0 | 2022-01-01 04:00:00 | 1 |
| 2 | 0.0 | 1.0 | 2.0 | 2022-01-01 03:00:00 | 2 |
| 3 | 1.0 | 2.0 | 0.0 | 2022-01-01 04:00:00 | 2 |
```python
targets
```
| | target_rides_next_hour |
|---|---|
| 0 | 1.0 |
| 1 | 0.0 |
| 2 | 0.0 |
| 3 | 0.0 |
```python
Xy_df = pd.concat([features, targets], axis=1)
Xy_df
```
| | rides_previous_3_hour | rides_previous_2_hour | rides_previous_1_hour | pickup_hour | pickup_location_id | target_rides_next_hour |
|---|---|---|---|---|---|---|
| 0 | 2.0 | 3.0 | 0.0 | 2022-01-01 03:00:00 | 1 | 1.0 |
| 1 | 3.0 | 0.0 | 1.0 | 2022-01-01 04:00:00 | 1 | 0.0 |
| 2 | 0.0 | 1.0 | 2.0 | 2022-01-01 03:00:00 | 2 | 0.0 |
| 3 | 1.0 | 2.0 | 0.0 | 2022-01-01 04:00:00 | 2 | 0.0 |
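Because features and targets are aligned row-wise, the output drops straight into any tabular estimator. A minimal sketch, assuming scikit-learn is available and reproducing the small `Xy_df` shown above literally (the timestamp and id columns are dropped before fitting):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# The concatenated features/targets table from above, entered by hand
Xy_df = pd.DataFrame({
    'rides_previous_3_hour': [2.0, 3.0, 0.0, 1.0],
    'rides_previous_2_hour': [3.0, 0.0, 1.0, 2.0],
    'rides_previous_1_hour': [0.0, 1.0, 2.0, 0.0],
    'pickup_hour': pd.to_datetime(['2022-01-01 03:00:00', '2022-01-01 04:00:00',
                                   '2022-01-01 03:00:00', '2022-01-01 04:00:00']),
    'pickup_location_id': [1, 1, 2, 2],
    'target_rides_next_hour': [1.0, 0.0, 0.0, 0.0],
})

# Lag columns are the model inputs; the target column is y
X = Xy_df.drop(columns=['pickup_hour', 'pickup_location_id', 'target_rides_next_hour'])
y = Xy_df['target_rides_next_hour']

model = LinearRegression().fit(X, y)
preds = model.predict(X)
print(preds.shape)  # (4,)
```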
Another Example
Monthly spaced time series
```python
import pandas as pd
import numpy as np

# Generate timestamp index with monthly frequency
date_rng = pd.date_range(start='1/1/2020', end='12/1/2022', freq='MS')

# Create list of city codes
cities = ['FOR', 'SP', 'RJ']

# Create dataframe with random sales data for each city in each month
df = pd.DataFrame({
    'date': date_rng,
    'city': np.repeat(cities, len(date_rng)//len(cities)),
    'sales': np.random.randint(1000, 5000, size=len(date_rng))
})
df
```
| | date | city | sales |
|---|---|---|---|
| 0 | 2020-01-01 | FOR | 4944 |
| 1 | 2020-02-01 | FOR | 3435 |
| 2 | 2020-03-01 | FOR | 4543 |
| 3 | 2020-04-01 | FOR | 3879 |
| 4 | 2020-05-01 | FOR | 2601 |
| 5 | 2020-06-01 | FOR | 2922 |
| 6 | 2020-07-01 | FOR | 4542 |
| 7 | 2020-08-01 | FOR | 1338 |
| 8 | 2020-09-01 | FOR | 2938 |
| 9 | 2020-10-01 | FOR | 2695 |
| 10 | 2020-11-01 | FOR | 4065 |
| 11 | 2020-12-01 | FOR | 3864 |
| 12 | 2021-01-01 | SP | 2652 |
| 13 | 2021-02-01 | SP | 2137 |
| 14 | 2021-03-01 | SP | 2663 |
| 15 | 2021-04-01 | SP | 1168 |
| 16 | 2021-05-01 | SP | 4523 |
| 17 | 2021-06-01 | SP | 4135 |
| 18 | 2021-07-01 | SP | 3566 |
| 19 | 2021-08-01 | SP | 2121 |
| 20 | 2021-09-01 | SP | 1070 |
| 21 | 2021-10-01 | SP | 1624 |
| 22 | 2021-11-01 | SP | 3034 |
| 23 | 2021-12-01 | SP | 4063 |
| 24 | 2022-01-01 | RJ | 2297 |
| 25 | 2022-02-01 | RJ | 3430 |
| 26 | 2022-03-01 | RJ | 2903 |
| 27 | 2022-04-01 | RJ | 4197 |
| 28 | 2022-05-01 | RJ | 4141 |
| 29 | 2022-06-01 | RJ | 2899 |
| 30 | 2022-07-01 | RJ | 4529 |
| 31 | 2022-08-01 | RJ | 3612 |
| 32 | 2022-09-01 | RJ | 1856 |
| 33 | 2022-10-01 | RJ | 4804 |
| 34 | 2022-11-01 | RJ | 1764 |
| 35 | 2022-12-01 | RJ | 4425 |
FOR only has data for 2020, SP only for 2021, and RJ only for 2022. Let’s also drop some rows at random to simulate additional missing slots within those years:
```python
# Generate random indices to drop
drop_indices = np.random.choice(df.index, size=int(len(df)*0.2), replace=False)

# Drop selected rows from dataframe
df = df.drop(drop_indices)
df.reset_index(drop=True, inplace=True)
df
```
| | date | city | sales |
|---|---|---|---|
| 0 | 2020-01-01 | FOR | 4944 |
| 1 | 2020-02-01 | FOR | 3435 |
| 2 | 2020-03-01 | FOR | 4543 |
| 3 | 2020-04-01 | FOR | 3879 |
| 4 | 2020-05-01 | FOR | 2601 |
| 5 | 2020-06-01 | FOR | 2922 |
| 6 | 2020-07-01 | FOR | 4542 |
| 7 | 2020-08-01 | FOR | 1338 |
| 8 | 2020-09-01 | FOR | 2938 |
| 9 | 2020-11-01 | FOR | 4065 |
| 10 | 2020-12-01 | FOR | 3864 |
| 11 | 2021-01-01 | SP | 2652 |
| 12 | 2021-02-01 | SP | 2137 |
| 13 | 2021-03-01 | SP | 2663 |
| 14 | 2021-07-01 | SP | 3566 |
| 15 | 2021-08-01 | SP | 2121 |
| 16 | 2021-10-01 | SP | 1624 |
| 17 | 2021-11-01 | SP | 3034 |
| 18 | 2021-12-01 | SP | 4063 |
| 19 | 2022-01-01 | RJ | 2297 |
| 20 | 2022-02-01 | RJ | 3430 |
| 21 | 2022-03-01 | RJ | 2903 |
| 22 | 2022-04-01 | RJ | 4197 |
| 23 | 2022-05-01 | RJ | 4141 |
| 24 | 2022-06-01 | RJ | 2899 |
| 25 | 2022-09-01 | RJ | 1856 |
| 26 | 2022-10-01 | RJ | 4804 |
| 27 | 2022-11-01 | RJ | 1764 |
| 28 | 2022-12-01 | RJ | 4425 |
Now let’s fill the missing slots; the function will complete them with zeros:
```python
df_full = add_missing_slots(df, datetime_col='date', entity_col='city', value_col='sales', freq='MS')
df_full
```
| | date | city | sales |
|---|---|---|---|
| 0 | 2020-01-01 | FOR | 4944 |
| 1 | 2020-02-01 | FOR | 3435 |
| 2 | 2020-03-01 | FOR | 4543 |
| 3 | 2020-04-01 | FOR | 3879 |
| 4 | 2020-05-01 | FOR | 2601 |
| ... | ... | ... | ... |
| 103 | 2022-08-01 | RJ | 0 |
| 104 | 2022-09-01 | RJ | 1856 |
| 105 | 2022-10-01 | RJ | 4804 |
| 106 | 2022-11-01 | RJ | 1764 |
| 107 | 2022-12-01 | RJ | 4425 |
108 rows × 3 columns
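The behavior of `add_missing_slots` can be approximated in plain pandas with a per-entity reindex — a rough sketch of the idea on toy data, not the library’s actual implementation:

```python
import pandas as pd

# Toy monthly data with a gap for each city
df = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-01', '2020-03-01', '2020-01-01', '2020-02-01']),
    'city': ['FOR', 'FOR', 'SP', 'SP'],
    'sales': [100, 300, 50, 60],
})

# Full monthly range spanning the observed data
full_range = pd.date_range(df['date'].min(), df['date'].max(), freq='MS')

# Reindex each city onto the full range, filling gaps with zero
df_full = (
    df.set_index('date')
      .groupby('city')['sales']
      .apply(lambda s: s.reindex(full_range, fill_value=0))
      .rename_axis(['city', 'date'])
      .reset_index()
)
print(len(df_full))  # 2 cities x 3 months = 6 rows
```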
Let’s build a dataset for training a machine learning model to predict next month’s sales for each city, based on the sales of the previous 3 months.
```python
features, targets = transform_ts_data_into_features_and_target(
    df_full,
    n_features=3,
    datetime_col='date',
    entity_col='city',
    value_col='sales',
    n_targets=1,
    step_size=1,
    step_name='month'
)
```
```python
pd.concat([features, targets], axis=1)
```
| | sales_previous_3_month | sales_previous_2_month | sales_previous_1_month | date | city | target_sales_next_month |
|---|---|---|---|---|---|---|
| 0 | 4944.0 | 3435.0 | 4543.0 | 2020-04-01 | FOR | 3879.0 |
| 1 | 3435.0 | 4543.0 | 3879.0 | 2020-05-01 | FOR | 2601.0 |
| 2 | 4543.0 | 3879.0 | 2601.0 | 2020-06-01 | FOR | 2922.0 |
| 3 | 3879.0 | 2601.0 | 2922.0 | 2020-07-01 | FOR | 4542.0 |
| 4 | 2601.0 | 2922.0 | 4542.0 | 2020-08-01 | FOR | 1338.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 91 | 4197.0 | 4141.0 | 2899.0 | 2022-07-01 | RJ | 0.0 |
| 92 | 4141.0 | 2899.0 | 0.0 | 2022-08-01 | RJ | 0.0 |
| 93 | 2899.0 | 0.0 | 0.0 | 2022-09-01 | RJ | 1856.0 |
| 94 | 0.0 | 0.0 | 1856.0 | 2022-10-01 | RJ | 4804.0 |
| 95 | 0.0 | 1856.0 | 4804.0 | 2022-11-01 | RJ | 1764.0 |
96 rows × 6 columns
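Since each row’s target lies in that row’s future, the rows should be split chronologically rather than randomly before model fitting. A hedged sketch of a cutoff-date split on a toy frame in the shape of the table above (values illustrative, not the real output):

```python
import pandas as pd

# Toy feature/target frame shaped like the concatenated output above
Xy_df = pd.DataFrame({
    'sales_previous_1_month': [4543.0, 3879.0, 2899.0, 0.0],
    'date': pd.to_datetime(['2020-04-01', '2020-05-01', '2022-07-01', '2022-08-01']),
    'city': ['FOR', 'FOR', 'RJ', 'RJ'],
    'target_sales_next_month': [3879.0, 2601.0, 0.0, 0.0],
})

# Split on a cutoff date so training rows never see future targets
cutoff = pd.Timestamp('2022-01-01')
train = Xy_df[Xy_df['date'] < cutoff]
test = Xy_df[Xy_df['date'] >= cutoff]
print(len(train), len(test))  # 2 2
```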
Embedding in Scikit-Learn Pipelines
```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

add_missing_slots_transformer = FunctionTransformer(
    add_missing_slots,
    kw_args={
        'datetime_col': 'date',
        'entity_col': 'city',
        'value_col': 'sales',
        'freq': 'MS'
    }
)

transform_ts_data_into_features_and_target_transformer = FunctionTransformer(
    transform_ts_data_into_features_and_target,
    kw_args={
        'n_features': 3,
        'datetime_col': 'date',
        'entity_col': 'city',
        'value_col': 'sales',
        'n_targets': 1,
        'step_size': 1,
        'step_name': 'month',
        'concat_Xy': True
    }
)

ts_data_to_features_and_target_pipeline = make_pipeline(
    add_missing_slots_transformer,
    transform_ts_data_into_features_and_target_transformer
)
ts_data_to_features_and_target_pipeline
```
```
Pipeline(steps=[('functiontransformer-1',
                 FunctionTransformer(func=<function add_missing_slots at 0x11f8f49d0>,
                                     kw_args={'datetime_col': 'date',
                                              'entity_col': 'city',
                                              'freq': 'MS',
                                              'value_col': 'sales'})),
                ('functiontransformer-2',
                 FunctionTransformer(func=<function transform_ts_data_into_features_and_target at 0x11f925ca0>,
                                     kw_args={'concat_Xy': True,
                                              'datetime_col': 'date',
                                              'entity_col': 'city',
                                              'n_features': 3, 'n_targets': 1,
                                              'step_name': 'month',
                                              'step_size': 1,
                                              'value_col': 'sales'}))])
```
```python
Xy_df = ts_data_to_features_and_target_pipeline.fit_transform(df)
Xy_df
```
| | sales_previous_3_month | sales_previous_2_month | sales_previous_1_month | date | city | target_sales_next_month |
|---|---|---|---|---|---|---|
| 0 | 4944.0 | 3435.0 | 4543.0 | 2020-04-01 | FOR | 3879.0 |
| 1 | 3435.0 | 4543.0 | 3879.0 | 2020-05-01 | FOR | 2601.0 |
| 2 | 4543.0 | 3879.0 | 2601.0 | 2020-06-01 | FOR | 2922.0 |
| 3 | 3879.0 | 2601.0 | 2922.0 | 2020-07-01 | FOR | 4542.0 |
| 4 | 2601.0 | 2922.0 | 4542.0 | 2020-08-01 | FOR | 1338.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 91 | 4197.0 | 4141.0 | 2899.0 | 2022-07-01 | RJ | 0.0 |
| 92 | 4141.0 | 2899.0 | 0.0 | 2022-08-01 | RJ | 0.0 |
| 93 | 2899.0 | 0.0 | 0.0 | 2022-09-01 | RJ | 1856.0 |
| 94 | 0.0 | 0.0 | 1856.0 | 2022-10-01 | RJ | 4804.0 |
| 95 | 0.0 | 1856.0 | 4804.0 | 2022-11-01 | RJ | 1764.0 |

96 rows × 6 columns
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file ts2ml-1.0.1.tar.gz.
File metadata
- Download URL: ts2ml-1.0.1.tar.gz
- Upload date:
- Size: 17.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `e31eee6d3911b723b4f3d123fef10192e6ad721a3ff5bd8093716d8b151f7d14` |
| MD5 | `f5463cd0a2186f9b0e3d55f846423255` |
| BLAKE2b-256 | `56c046aa04962b31d0eb5282df70a68fbde9a29a3b5f9bf893ddf9d96f3c75a2` |
File details
Details for the file ts2ml-1.0.1-py3-none-any.whl.
File metadata
- Download URL: ts2ml-1.0.1-py3-none-any.whl
- Upload date:
- Size: 12.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `90738dfa76b54b8abaed3b21d2ca9cae8e7c86859fc902868bfcffd6f409a3aa` |
| MD5 | `e0c80ae71009a1fd69649df4523e452f` |
| BLAKE2b-256 | `83a0290fa925a32633aaa7f2f080770dac0d04eeb63722cec89fc5b127c48845` |