Tools to Transform a Time Series into Features and Target, a.k.a. Supervised Learning

Project description

ts2ml

Install

pip install ts2ml

How to use

import pandas as pd
from ts2ml.core import add_missing_slots
df = pd.DataFrame({
    'pickup_hour': ['2022-01-01 00:00:00', '2022-01-01 01:00:00', '2022-01-01 03:00:00', '2022-01-01 01:00:00', '2022-01-01 02:00:00', '2022-01-01 05:00:00'],
    'pickup_location_id': [1, 1, 1, 2, 2, 2],
    'rides': [2, 3, 1, 1, 2, 1]
})
df
pickup_hour pickup_location_id rides
0 2022-01-01 00:00:00 1 2
1 2022-01-01 01:00:00 1 3
2 2022-01-01 03:00:00 1 1
3 2022-01-01 01:00:00 2 1
4 2022-01-01 02:00:00 2 2
5 2022-01-01 05:00:00 2 1
add_missing_slots(df, datetime_col='pickup_hour', entity_col='pickup_location_id', value_col='rides', freq='H')
100%|██████████| 2/2 [00:00<00:00, 352.17it/s]
pickup_hour pickup_location_id rides
0 2022-01-01 00:00:00 1 2
1 2022-01-01 01:00:00 1 3
2 2022-01-01 02:00:00 1 0
3 2022-01-01 03:00:00 1 1
4 2022-01-01 04:00:00 1 0
5 2022-01-01 05:00:00 1 0
6 2022-01-01 00:00:00 2 0
7 2022-01-01 01:00:00 2 1
8 2022-01-01 02:00:00 2 2
9 2022-01-01 03:00:00 2 0
10 2022-01-01 04:00:00 2 0
11 2022-01-01 05:00:00 2 1
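
To make the transformation concrete, the same result can be reproduced with plain pandas. The sketch below is not ts2ml's implementation; `fill_missing_slots` is a hypothetical helper that assumes each (entity, timestamp) pair appears at most once and that every entity should cover the global minimum-to-maximum range, which is what the output above suggests. The frequency is written as '60min' here only because that alias is accepted across pandas versions.

```python
import pandas as pd

def fill_missing_slots(df, datetime_col, entity_col, value_col, freq):
    """Hypothetical pandas-only helper (not part of ts2ml)."""
    df = df.copy()
    df[datetime_col] = pd.to_datetime(df[datetime_col])
    # All slots between the earliest and latest timestamp in the whole dataset
    full_range = pd.date_range(df[datetime_col].min(), df[datetime_col].max(), freq=freq)
    # Every (entity, slot) combination that should exist
    full_index = pd.MultiIndex.from_product(
        [df[entity_col].unique(), full_range], names=[entity_col, datetime_col]
    )
    # Reindex against the full grid; absent combinations get the value 0
    filled = (
        df.set_index([entity_col, datetime_col])[value_col]
        .reindex(full_index, fill_value=0)
        .reset_index()
    )
    return filled[[datetime_col, entity_col, value_col]]

rides = pd.DataFrame({
    'pickup_hour': ['2022-01-01 00:00:00', '2022-01-01 01:00:00', '2022-01-01 03:00:00',
                    '2022-01-01 01:00:00', '2022-01-01 02:00:00', '2022-01-01 05:00:00'],
    'pickup_location_id': [1, 1, 1, 2, 2, 2],
    'rides': [2, 3, 1, 1, 2, 1],
})
result = fill_missing_slots(rides, 'pickup_hour', 'pickup_location_id', 'rides', freq='60min')
print(result)
```

The printed frame has 12 rows (2 locations × 6 hourly slots), with zeros in the slots that had no rides.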

Another Example

Monthly-spaced time series

import pandas as pd
import numpy as np
from ts2ml.core import add_missing_slots

# Generate timestamp index with monthly frequency
date_rng = pd.date_range(start='1/1/2020', end='12/1/2022', freq='MS')

# Create list of city codes
cities = ['FOR', 'SP', 'RJ']

# Create dataframe with random sales data for each city on each month
df = pd.DataFrame({
    'timestamp': date_rng,
    'city': np.repeat(cities, len(date_rng)//len(cities)),
    'sales': np.random.randint(1000, 5000, size=len(date_rng))
})
df
timestamp city sales
0 2020-01-01 FOR 4216
1 2020-02-01 FOR 4309
2 2020-03-01 FOR 3639
3 2020-04-01 FOR 3685
4 2020-05-01 FOR 4481
5 2020-06-01 FOR 4133
6 2020-07-01 FOR 3504
7 2020-08-01 FOR 3957
8 2020-09-01 FOR 2781
9 2020-10-01 FOR 2996
10 2020-11-01 FOR 3963
11 2020-12-01 FOR 2381
12 2021-01-01 SP 1489
13 2021-02-01 SP 3863
14 2021-03-01 SP 4005
15 2021-04-01 SP 3612
16 2021-05-01 SP 4823
17 2021-06-01 SP 1687
18 2021-07-01 SP 3688
19 2021-08-01 SP 1729
20 2021-09-01 SP 1496
21 2021-10-01 SP 2460
22 2021-11-01 SP 1448
23 2021-12-01 SP 3174
24 2022-01-01 RJ 1201
25 2022-02-01 RJ 3210
26 2022-03-01 RJ 4580
27 2022-04-01 RJ 1318
28 2022-05-01 RJ 4607
29 2022-06-01 RJ 1565
30 2022-07-01 RJ 2935
31 2022-08-01 RJ 3924
32 2022-09-01 RJ 1577
33 2022-10-01 RJ 4395
34 2022-11-01 RJ 1867
35 2022-12-01 RJ 2739
df.groupby('city').agg({'timestamp': ['min', 'max']})
      timestamp
            min         max
city
FOR  2020-01-01  2020-12-01
RJ   2022-01-01  2022-12-01
SP   2021-01-01  2021-12-01

Each city has data for a single year only: FOR for 2020, SP for 2021, and RJ for 2022. Let's also simulate additional missing slots by dropping random rows.

# Generate random indices to drop
drop_indices = np.random.choice(df.index, size=int(len(df)*0.2), replace=False)

# Drop selected rows from dataframe
df = df.drop(drop_indices)
df.reset_index(drop=True, inplace=True)
df
timestamp city sales
0 2020-01-01 FOR 4216
1 2020-03-01 FOR 3639
2 2020-05-01 FOR 4481
3 2020-06-01 FOR 4133
4 2020-07-01 FOR 3504
5 2020-08-01 FOR 3957
6 2020-09-01 FOR 2781
7 2020-10-01 FOR 2996
8 2020-11-01 FOR 3963
9 2020-12-01 FOR 2381
10 2021-01-01 SP 1489
11 2021-02-01 SP 3863
12 2021-07-01 SP 3688
13 2021-08-01 SP 1729
14 2021-10-01 SP 2460
15 2022-01-01 RJ 1201
16 2022-03-01 RJ 4580
17 2022-04-01 RJ 1318
18 2022-05-01 RJ 4607
19 2022-07-01 RJ 2935
20 2022-08-01 RJ 3924
21 2022-09-01 RJ 1577
22 2022-11-01 RJ 1867
23 2022-12-01 RJ 2739
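
Before filling, it can help to count how many slots each entity is missing. The snippet below is a small pandas-only sketch; `df_demo` is a hypothetical stand-in that copies a few rows from the table above, and the expected range is assumed to run from the earliest to the latest timestamp in the whole dataset.

```python
import pandas as pd

# Hypothetical stand-in: a few rows copied from the table above
df_demo = pd.DataFrame({
    'timestamp': pd.to_datetime(['2020-01-01', '2020-03-01', '2020-05-01',
                                 '2021-01-01', '2021-02-01']),
    'city': ['FOR', 'FOR', 'FOR', 'SP', 'SP'],
    'sales': [4216, 3639, 4481, 1489, 3863],
})

# All month-start slots between the global min and max timestamps
expected = pd.date_range(df_demo['timestamp'].min(), df_demo['timestamp'].max(), freq='MS')

# Distinct months actually present per city, and how many are missing
present = df_demo.groupby('city')['timestamp'].nunique()
missing = len(expected) - present
print(missing)
```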

Now let's fill the missing slots. The function adds a row with a zero value for every month in which a city has no record, across the full date range of the dataset:

df_full = add_missing_slots(df, datetime_col='timestamp', entity_col='city', value_col='sales', freq='MS')
df_full
100%|██████████| 3/3 [00:00<00:00, 844.15it/s]
timestamp city sales
0 2020-01-01 FOR 4216
1 2020-02-01 FOR 0
2 2020-03-01 FOR 3639
3 2020-04-01 FOR 0
4 2020-05-01 FOR 4481
... ... ... ...
103 2022-08-01 RJ 3924
104 2022-09-01 RJ 1577
105 2022-10-01 RJ 0
106 2022-11-01 RJ 1867
107 2022-12-01 RJ 2739

108 rows × 3 columns

df_full.groupby('city').agg({'timestamp': ['min', 'max']})
      timestamp
            min         max
city
FOR  2020-01-01  2022-12-01
RJ   2020-01-01  2022-12-01
SP   2020-01-01  2022-12-01
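
Once every entity has a complete, regularly spaced series, it can be turned into features and a target for supervised learning. The sketch below uses only pandas and does not rely on any ts2ml function (this page does not show that part of the API); the toy frame `ts`, the window size `n_lags`, and the `sales_lag_*` column names are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data: two cities with five complete monthly slots each
ts = pd.DataFrame({
    'timestamp': pd.date_range('2020-01-01', periods=5, freq='MS').tolist() * 2,
    'city': ['FOR'] * 5 + ['SP'] * 5,
    'sales': [10, 20, 30, 40, 50, 1, 2, 3, 4, 5],
})

n_lags = 2  # assumed number of past values used as features

# Lags must be computed within each city, in time order
ts = ts.sort_values(['city', 'timestamp'])
for lag in range(1, n_lags + 1):
    ts[f'sales_lag_{lag}'] = ts.groupby('city')['sales'].shift(lag)

# Rows without a full history have NaN lags and are dropped
supervised = ts.dropna().reset_index(drop=True)

# Features ordered from oldest to most recent lag; target is the current value
features = supervised[[f'sales_lag_{lag}' for lag in range(n_lags, 0, -1)]]
target = supervised['sales']
print(features.assign(target=target))
```

Grouping by city before shifting keeps one city's history from leaking into another's features, which is why the gaps need to be filled first: a shift by one row only means "one month ago" when no months are missing.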
