Tools to Transform a Time Series into Features and Target a.k.a Supervised Learning
Project description
ts2ml
Install
pip install ts2ml
How to use
import pandas as pd
from ts2ml.core import add_missing_slots
df = pd.DataFrame({
'pickup_hour': ['2022-01-01 00:00:00', '2022-01-01 01:00:00', '2022-01-01 03:00:00', '2022-01-01 01:00:00', '2022-01-01 02:00:00', '2022-01-01 05:00:00'],
'pickup_location_id': [1, 1, 1, 2, 2, 2],
'rides': [2, 3, 1, 1, 2, 1]
})
df
| | pickup_hour | pickup_location_id | rides |
|---|---|---|---|
| 0 | 2022-01-01 00:00:00 | 1 | 2 |
| 1 | 2022-01-01 01:00:00 | 1 | 3 |
| 2 | 2022-01-01 03:00:00 | 1 | 1 |
| 3 | 2022-01-01 01:00:00 | 2 | 1 |
| 4 | 2022-01-01 02:00:00 | 2 | 2 |
| 5 | 2022-01-01 05:00:00 | 2 | 1 |
add_missing_slots(df, datetime_col='pickup_hour', entity_col='pickup_location_id', value_col='rides', freq='H')
100%|██████████| 2/2 [00:00<00:00, 352.17it/s]
| | pickup_hour | pickup_location_id | rides |
|---|---|---|---|
| 0 | 2022-01-01 00:00:00 | 1 | 2 |
| 1 | 2022-01-01 01:00:00 | 1 | 3 |
| 2 | 2022-01-01 02:00:00 | 1 | 0 |
| 3 | 2022-01-01 03:00:00 | 1 | 1 |
| 4 | 2022-01-01 04:00:00 | 1 | 0 |
| 5 | 2022-01-01 05:00:00 | 1 | 0 |
| 6 | 2022-01-01 00:00:00 | 2 | 0 |
| 7 | 2022-01-01 01:00:00 | 2 | 1 |
| 8 | 2022-01-01 02:00:00 | 2 | 2 |
| 9 | 2022-01-01 03:00:00 | 2 | 0 |
| 10 | 2022-01-01 04:00:00 | 2 | 0 |
| 11 | 2022-01-01 05:00:00 | 2 | 1 |
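To make the behaviour in the table above concrete, here is a minimal pandas sketch that reproduces the same output. It is not ts2ml's implementation: the function name `fill_missing_slots_sketch` is hypothetical, and the approach (reindexing against the full entity × time grid) is only an assumption about one way to get this result. It also assumes at most one row per (entity, timestamp) pair.

```python
import pandas as pd


def fill_missing_slots_sketch(df, datetime_col, entity_col, value_col, freq):
    """Hypothetical helper for illustration only; not part of ts2ml."""
    out = df.copy()
    out[datetime_col] = pd.to_datetime(out[datetime_col])

    # One slot per step between the earliest and latest timestamp
    # seen across ALL entities (this matches the output shown above).
    full_range = pd.date_range(
        out[datetime_col].min(), out[datetime_col].max(), freq=freq
    )
    entities = sorted(out[entity_col].unique())

    # Full grid of (entity, slot) pairs.
    full_index = pd.MultiIndex.from_product(
        [entities, full_range], names=[entity_col, datetime_col]
    )

    # Reindex onto the grid; pairs absent from the data get value 0.
    filled = (
        out.set_index([entity_col, datetime_col])[value_col]
        .reindex(full_index, fill_value=0)
        .reset_index()
    )
    return filled[[datetime_col, entity_col, value_col]]


rides_df = pd.DataFrame({
    'pickup_hour': ['2022-01-01 00:00:00', '2022-01-01 01:00:00', '2022-01-01 03:00:00',
                    '2022-01-01 01:00:00', '2022-01-01 02:00:00', '2022-01-01 05:00:00'],
    'pickup_location_id': [1, 1, 1, 2, 2, 2],
    'rides': [2, 3, 1, 1, 2, 1],
})

# An offset object is used instead of a string alias because the hourly
# alias spelling differs between pandas versions.
filled = fill_missing_slots_sketch(
    rides_df, 'pickup_hour', 'pickup_location_id', 'rides', freq=pd.offsets.Hour()
)
```

Note that both locations are padded to the same range (00:00 to 05:00), even though location 1 has no observation after 03:00 and location 2 has none before 01:00.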
Another Example
Monthly spaced time series
import pandas as pd
import numpy as np
# Generate timestamp index with monthly frequency
date_rng = pd.date_range(start='1/1/2020', end='12/1/2022', freq='MS')
# Create list of city codes
cities = ['FOR', 'SP', 'RJ']
# Create dataframe with random sales data for each city on each month
df = pd.DataFrame({
'timestamp': date_rng,
'city': np.repeat(cities, len(date_rng)//len(cities)),
'sales': np.random.randint(1000, 5000, size=len(date_rng))
})
df
| | timestamp | city | sales |
|---|---|---|---|
| 0 | 2020-01-01 | FOR | 4216 |
| 1 | 2020-02-01 | FOR | 4309 |
| 2 | 2020-03-01 | FOR | 3639 |
| 3 | 2020-04-01 | FOR | 3685 |
| 4 | 2020-05-01 | FOR | 4481 |
| 5 | 2020-06-01 | FOR | 4133 |
| 6 | 2020-07-01 | FOR | 3504 |
| 7 | 2020-08-01 | FOR | 3957 |
| 8 | 2020-09-01 | FOR | 2781 |
| 9 | 2020-10-01 | FOR | 2996 |
| 10 | 2020-11-01 | FOR | 3963 |
| 11 | 2020-12-01 | FOR | 2381 |
| 12 | 2021-01-01 | SP | 1489 |
| 13 | 2021-02-01 | SP | 3863 |
| 14 | 2021-03-01 | SP | 4005 |
| 15 | 2021-04-01 | SP | 3612 |
| 16 | 2021-05-01 | SP | 4823 |
| 17 | 2021-06-01 | SP | 1687 |
| 18 | 2021-07-01 | SP | 3688 |
| 19 | 2021-08-01 | SP | 1729 |
| 20 | 2021-09-01 | SP | 1496 |
| 21 | 2021-10-01 | SP | 2460 |
| 22 | 2021-11-01 | SP | 1448 |
| 23 | 2021-12-01 | SP | 3174 |
| 24 | 2022-01-01 | RJ | 1201 |
| 25 | 2022-02-01 | RJ | 3210 |
| 26 | 2022-03-01 | RJ | 4580 |
| 27 | 2022-04-01 | RJ | 1318 |
| 28 | 2022-05-01 | RJ | 4607 |
| 29 | 2022-06-01 | RJ | 1565 |
| 30 | 2022-07-01 | RJ | 2935 |
| 31 | 2022-08-01 | RJ | 3924 |
| 32 | 2022-09-01 | RJ | 1577 |
| 33 | 2022-10-01 | RJ | 4395 |
| 34 | 2022-11-01 | RJ | 1867 |
| 35 | 2022-12-01 | RJ | 2739 |
df.groupby('city').agg({'timestamp': ['min', 'max']})
| city | timestamp (min) | timestamp (max) |
|---|---|---|
| FOR | 2020-01-01 | 2020-12-01 |
| RJ | 2022-01-01 | 2022-12-01 |
| SP | 2021-01-01 | 2021-12-01 |
The city FOR only has data for 2020, SP only for 2021, and RJ only for 2022. Let's also simulate additional missing slots by dropping random rows.
# Generate random indices to drop
drop_indices = np.random.choice(df.index, size=int(len(df)*0.2), replace=False)
# Drop selected rows from dataframe
df = df.drop(drop_indices)
df.reset_index(drop=True, inplace=True)
df
| | timestamp | city | sales |
|---|---|---|---|
| 0 | 2020-01-01 | FOR | 4216 |
| 1 | 2020-03-01 | FOR | 3639 |
| 2 | 2020-05-01 | FOR | 4481 |
| 3 | 2020-06-01 | FOR | 4133 |
| 4 | 2020-07-01 | FOR | 3504 |
| 5 | 2020-08-01 | FOR | 3957 |
| 6 | 2020-09-01 | FOR | 2781 |
| 7 | 2020-10-01 | FOR | 2996 |
| 8 | 2020-11-01 | FOR | 3963 |
| 9 | 2020-12-01 | FOR | 2381 |
| 10 | 2021-01-01 | SP | 1489 |
| 11 | 2021-02-01 | SP | 3863 |
| 12 | 2021-07-01 | SP | 3688 |
| 13 | 2021-08-01 | SP | 1729 |
| 14 | 2021-10-01 | SP | 2460 |
| 15 | 2022-01-01 | RJ | 1201 |
| 16 | 2022-03-01 | RJ | 4580 |
| 17 | 2022-04-01 | RJ | 1318 |
| 18 | 2022-05-01 | RJ | 4607 |
| 19 | 2022-07-01 | RJ | 2935 |
| 20 | 2022-08-01 | RJ | 3924 |
| 21 | 2022-09-01 | RJ | 1577 |
| 22 | 2022-11-01 | RJ | 1867 |
| 23 | 2022-12-01 | RJ | 2739 |
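Before filling, it can help to see which slots are actually missing for each entity. The following is a small sketch using plain pandas on a toy frame (the name `df_gaps` and its values are made up for illustration); it compares each city's timestamps against the full monthly range spanned by the whole dataset.

```python
import pandas as pd

# Toy data, invented for this illustration.
df_gaps = pd.DataFrame({
    'timestamp': pd.to_datetime(['2020-01-01', '2020-03-01', '2020-04-01', '2020-02-01']),
    'city': ['FOR', 'FOR', 'FOR', 'SP'],
    'sales': [10, 20, 30, 40],
})

# Every month-start between the global min and max timestamp.
full_range = pd.date_range(
    df_gaps['timestamp'].min(), df_gaps['timestamp'].max(), freq='MS'
)

# For each city, the slots in the full range that have no row.
missing = {
    city: full_range.difference(pd.DatetimeIndex(group['timestamp']))
    for city, group in df_gaps.groupby('city')
}
missing_counts = {city: len(slots) for city, slots in missing.items()}
```

Here `missing['FOR']` holds only February 2020, while `missing['SP']` holds January, March and April 2020.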
Now let's fill the missing slots. add_missing_slots inserts a row with a zero value for every (timestamp, city) slot that has no data:
df_full = add_missing_slots(df, datetime_col='timestamp', entity_col='city', value_col='sales', freq='MS')
df_full
100%|██████████| 3/3 [00:00<00:00, 844.15it/s]
| | timestamp | city | sales |
|---|---|---|---|
| 0 | 2020-01-01 | FOR | 4216 |
| 1 | 2020-02-01 | FOR | 0 |
| 2 | 2020-03-01 | FOR | 3639 |
| 3 | 2020-04-01 | FOR | 0 |
| 4 | 2020-05-01 | FOR | 4481 |
| ... | ... | ... | ... |
| 103 | 2022-08-01 | RJ | 3924 |
| 104 | 2022-09-01 | RJ | 1577 |
| 105 | 2022-10-01 | RJ | 0 |
| 106 | 2022-11-01 | RJ | 1867 |
| 107 | 2022-12-01 | RJ | 2739 |
108 rows × 3 columns
df_full.groupby('city').agg({'timestamp': ['min', 'max']})
| city | timestamp (min) | timestamp (max) |
|---|---|---|
| FOR | 2020-01-01 | 2022-12-01 |
| RJ | 2020-01-01 | 2022-12-01 |
| SP | 2020-01-01 | 2022-12-01 |
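The 108 rows reported for df_full follow from the grid being complete: every city now covers every month from 2020-01-01 to 2022-12-01. A quick arithmetic check, written as a sketch with plain pandas (the variable names are illustrative):

```python
import pandas as pd

# Month-start slots between the first and last timestamp in the data.
n_slots = len(pd.date_range('2020-01-01', '2022-12-01', freq='MS'))

# Three cities: FOR, SP, RJ.
n_cities = 3

# A complete grid has one row per (city, slot) pair.
expected_rows = n_slots * n_cities
```

With 36 monthly slots and 3 cities, `expected_rows` is 108, matching the shape shown above.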
Download files
File details
Details for the file ts2ml-0.0.3.tar.gz.
File metadata
- Download URL: ts2ml-0.0.3.tar.gz
- Upload date:
- Size: 10.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f79308388bc8e0f799de2521a916524b5110e9a3336c6434fff0763fe65e5850 |
| MD5 | eeb9ca7d4ab9a47b937abd6da42d5cbf |
| BLAKE2b-256 | cc1d64b9d0f36027eafe80d3f5fbff7088b43b1e58e3e69de8c1bb0910f7a4c4 |
File details
Details for the file ts2ml-0.0.3-py3-none-any.whl.
File metadata
- Download URL: ts2ml-0.0.3-py3-none-any.whl
- Upload date:
- Size: 9.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0248a9951f38a6d0cd037f6967fac6ab0c4d5c5ce0a79ba55b0e098a6ff03f02 |
| MD5 | 99f974d46c7e124d3f0fb775360bbcc8 |
| BLAKE2b-256 | 7fa9e7f0a5ff960a64379b0258dac6b143518e852952f31462c3ec69ebe2784c |