Tools to Transform a Time Series into Features and a Target, a.k.a. Supervised Learning

ts2ml

Install

pip install ts2ml

How to use

import pandas as pd
from ts2ml.core import add_missing_slots

df = pd.DataFrame({
    'pickup_hour': ['2022-01-01 00:00:00', '2022-01-01 01:00:00', '2022-01-01 03:00:00', '2022-01-01 01:00:00', '2022-01-01 02:00:00', '2022-01-01 05:00:00'],
    'pickup_location_id': [1, 1, 1, 2, 2, 2],
    'rides': [2, 3, 1, 1, 2, 1]
})
df

	pickup_hour	pickup_location_id	rides
0	2022-01-01 00:00:00	1	2
1	2022-01-01 01:00:00	1	3
2	2022-01-01 03:00:00	1	1
3	2022-01-01 01:00:00	2	1
4	2022-01-01 02:00:00	2	2
5	2022-01-01 05:00:00	2	1

add_missing_slots(df, datetime_col='pickup_hour', entity_col='pickup_location_id', value_col='rides', freq='H')

100%|██████████| 2/2 [00:00<00:00, 352.17it/s]

	pickup_hour	pickup_location_id	rides
0	2022-01-01 00:00:00	1	2
1	2022-01-01 01:00:00	1	3
2	2022-01-01 02:00:00	1	0
3	2022-01-01 03:00:00	1	1
4	2022-01-01 04:00:00	1	0
5	2022-01-01 05:00:00	1	0
6	2022-01-01 00:00:00	2	0
7	2022-01-01 01:00:00	2	1
8	2022-01-01 02:00:00	2	2
9	2022-01-01 03:00:00	2	0
10	2022-01-01 04:00:00	2	0
11	2022-01-01 05:00:00	2	1
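To make the behaviour above concrete, here is a minimal pandas-only sketch that reproduces the same output for this example. It is not the ts2ml implementation: the function name `fill_missing_slots_sketch` is hypothetical, and it assumes that every entity is expanded to the global minimum and maximum timestamp and that each (timestamp, entity) pair appears at most once.

```python
import pandas as pd

def fill_missing_slots_sketch(df, datetime_col, entity_col, value_col, freq):
    """Illustrative equivalent of add_missing_slots; NOT the ts2ml code."""
    out = df.copy()
    out[datetime_col] = pd.to_datetime(out[datetime_col])
    # Assumption: all entities share one range, global min to global max.
    full_range = pd.date_range(out[datetime_col].min(),
                               out[datetime_col].max(), freq=freq)
    pieces = []
    for entity, group in out.groupby(entity_col):
        # Reindex this entity's values on the full range; absent slots become 0.
        series = group.set_index(datetime_col)[value_col]
        series = series.reindex(full_range, fill_value=0)
        pieces.append(pd.DataFrame({
            datetime_col: full_range,
            entity_col: entity,
            value_col: series.to_numpy(),
        }))
    return pd.concat(pieces, ignore_index=True)

rides_df = pd.DataFrame({
    'pickup_hour': ['2022-01-01 00:00:00', '2022-01-01 01:00:00', '2022-01-01 03:00:00',
                    '2022-01-01 01:00:00', '2022-01-01 02:00:00', '2022-01-01 05:00:00'],
    'pickup_location_id': [1, 1, 1, 2, 2, 2],
    'rides': [2, 3, 1, 1, 2, 1],
})
filled = fill_missing_slots_sketch(rides_df, 'pickup_hour', 'pickup_location_id',
                                   'rides', freq='h')
print(filled)
```

Each location ends up with six hourly rows (00:00 through 05:00), matching the table above.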

Another Example

Monthly spaced time series

import pandas as pd
import numpy as np

# Generate timestamp index with monthly frequency
date_rng = pd.date_range(start='1/1/2020', end='12/1/2022', freq='MS')

# Create list of city codes
cities = ['FOR', 'SP', 'RJ']

# Create dataframe with random sales data; each city covers one distinct year
df = pd.DataFrame({
    'timestamp': date_rng,
    'city': np.repeat(cities, len(date_rng)//len(cities)),
    'sales': np.random.randint(1000, 5000, size=len(date_rng))
})
df

	timestamp	city	sales
0	2020-01-01	FOR	4216
1	2020-02-01	FOR	4309
2	2020-03-01	FOR	3639
3	2020-04-01	FOR	3685
4	2020-05-01	FOR	4481
5	2020-06-01	FOR	4133
6	2020-07-01	FOR	3504
7	2020-08-01	FOR	3957
8	2020-09-01	FOR	2781
9	2020-10-01	FOR	2996
10	2020-11-01	FOR	3963
11	2020-12-01	FOR	2381
12	2021-01-01	SP	1489
13	2021-02-01	SP	3863
14	2021-03-01	SP	4005
15	2021-04-01	SP	3612
16	2021-05-01	SP	4823
17	2021-06-01	SP	1687
18	2021-07-01	SP	3688
19	2021-08-01	SP	1729
20	2021-09-01	SP	1496
21	2021-10-01	SP	2460
22	2021-11-01	SP	1448
23	2021-12-01	SP	3174
24	2022-01-01	RJ	1201
25	2022-02-01	RJ	3210
26	2022-03-01	RJ	4580
27	2022-04-01	RJ	1318
28	2022-05-01	RJ	4607
29	2022-06-01	RJ	1565
30	2022-07-01	RJ	2935
31	2022-08-01	RJ	3924
32	2022-09-01	RJ	1577
33	2022-10-01	RJ	4395
34	2022-11-01	RJ	1867
35	2022-12-01	RJ	2739

df.groupby('city').agg({'timestamp': ['min', 'max']})

	timestamp
	min	max
city
FOR	2020-01-01	2020-12-01
RJ	2022-01-01	2022-12-01
SP	2021-01-01	2021-12-01

The city FOR only has data for 2020, SP only for 2021, and RJ only for 2022. Let's also simulate additional missing slots within each year.

# Generate random indices to drop
drop_indices = np.random.choice(df.index, size=int(len(df)*0.2), replace=False)

# Drop selected rows from dataframe
df = df.drop(drop_indices)
df.reset_index(drop=True, inplace=True)
df
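The sales values and the dropped rows above are random, so a rerun will not reproduce the same tables. A minimal sketch of a reproducible variant follows, assuming NumPy's `default_rng` with an arbitrary seed of 42; the name `df_dropped` is only for illustration, and the resulting numbers will differ from those shown in this description.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # arbitrary seed, chosen only for repeatability

date_rng = pd.date_range(start='2020-01-01', end='2022-12-01', freq='MS')
cities = ['FOR', 'SP', 'RJ']

df = pd.DataFrame({
    'timestamp': date_rng,
    'city': np.repeat(cities, len(date_rng) // len(cities)),
    'sales': rng.integers(1000, 5000, size=len(date_rng)),
})

# Drop 20% of the rows (rounded down) without replacement
drop_indices = rng.choice(df.index, size=int(len(df) * 0.2), replace=False)
df_dropped = df.drop(drop_indices).reset_index(drop=True)
print(len(df), len(df_dropped))
```

With 36 monthly rows, `int(36 * 0.2)` is 7, so 29 rows remain after the drop.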

	timestamp	city	sales
0	2020-01-01	FOR	4216
1	2020-03-01	FOR	3639
2	2020-05-01	FOR	4481
3	2020-06-01	FOR	4133
4	2020-07-01	FOR	3504
5	2020-08-01	FOR	3957
6	2020-09-01	FOR	2781
7	2020-10-01	FOR	2996
8	2020-11-01	FOR	3963
9	2020-12-01	FOR	2381
10	2021-01-01	SP	1489
11	2021-02-01	SP	3863
12	2021-07-01	SP	3688
13	2021-08-01	SP	1729
14	2021-10-01	SP	2460
15	2022-01-01	RJ	1201
16	2022-03-01	RJ	4580
17	2022-04-01	RJ	1318
18	2022-05-01	RJ	4607
19	2022-07-01	RJ	2935
20	2022-08-01	RJ	3924
21	2022-09-01	RJ	1577
22	2022-11-01	RJ	1867
23	2022-12-01	RJ	2739

Now let's fill the missing slots. The function adds a row with a zero value for every month that is missing for a city:

df_full = add_missing_slots(df, datetime_col='timestamp', entity_col='city', value_col='sales', freq='MS')
df_full

100%|██████████| 3/3 [00:00<00:00, 844.15it/s]

	timestamp	city	sales
0	2020-01-01	FOR	4216
1	2020-02-01	FOR	0
2	2020-03-01	FOR	3639
3	2020-04-01	FOR	0
4	2020-05-01	FOR	4481
...	...	...	...
103	2022-08-01	RJ	3924
104	2022-09-01	RJ	1577
105	2022-10-01	RJ	0
106	2022-11-01	RJ	1867
107	2022-12-01	RJ	2739

108 rows × 3 columns

df_full.groupby('city').agg({'timestamp': ['min', 'max']})

	timestamp
	min	max
city
FOR	2020-01-01	2022-12-01
RJ	2020-01-01	2022-12-01
SP	2020-01-01	2022-12-01
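Beyond comparing the minimum and maximum timestamps, one can check that every entity has exactly one row per slot (here 3 cities × 36 months = 108 rows). The helper below is a sketch of such a check; `is_complete_grid` is a hypothetical name and is not part of ts2ml.

```python
import pandas as pd

def is_complete_grid(df, datetime_col, entity_col, freq):
    """Return True if every entity has exactly one row per slot in the
    range from the global min to the global max timestamp."""
    ts = pd.to_datetime(df[datetime_col])
    expected = pd.date_range(ts.min(), ts.max(), freq=freq)
    for _, group_ts in ts.groupby(df[entity_col]):
        # Same number of rows and same set of timestamps as the full range
        if len(group_ts) != len(expected) or set(group_ts) != set(expected):
            return False
    return True

complete = pd.DataFrame({
    'timestamp': ['2020-01-01', '2020-02-01', '2020-01-01', '2020-02-01'],
    'city': ['A', 'A', 'B', 'B'],
    'sales': [10, 0, 0, 7],
})
incomplete = complete.iloc[:3]  # city B is missing 2020-02-01

print(is_complete_grid(complete, 'timestamp', 'city', freq='MS'))
print(is_complete_grid(incomplete, 'timestamp', 'city', freq='MS'))
```

Calling it as `is_complete_grid(df_full, 'timestamp', 'city', freq='MS')` would apply the same check to the filled frame above.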

Release history

1.0.1 (Jun 3, 2023)

0.0.3 (May 31, 2023, this version)

Download files

Source Distribution

ts2ml-0.0.3.tar.gz (10.0 kB view details)

Uploaded May 31, 2023 Source

Built Distribution

ts2ml-0.0.3-py3-none-any.whl (9.1 kB view details)

Uploaded May 31, 2023 Python 3

File details

Details for the file ts2ml-0.0.3.tar.gz.

File metadata

Download URL: ts2ml-0.0.3.tar.gz
Upload date: May 31, 2023
Size: 10.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for ts2ml-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`f79308388bc8e0f799de2521a916524b5110e9a3336c6434fff0763fe65e5850`
MD5	`eeb9ca7d4ab9a47b937abd6da42d5cbf`
BLAKE2b-256	`cc1d64b9d0f36027eafe80d3f5fbff7088b43b1e58e3e69de8c1bb0910f7a4c4`

File details

Details for the file ts2ml-0.0.3-py3-none-any.whl.

File metadata

Download URL: ts2ml-0.0.3-py3-none-any.whl
Upload date: May 31, 2023
Size: 9.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for ts2ml-0.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0248a9951f38a6d0cd037f6967fac6ab0c4d5c5ce0a79ba55b0e098a6ff03f02`
MD5	`99f974d46c7e124d3f0fb775360bbcc8`
BLAKE2b-256	`7fa9e7f0a5ff960a64379b0258dac6b143518e852952f31462c3ec69ebe2784c`
