A package linking symbolic representation with sklearn for time series prediction

Project description

slearn

Symbolic representations of time series have proved useful in motif discovery, clustering, classification, forecasting, anomaly detection, and related tasks. Symbolic representation methods not only reduce the dimensionality of a time series but also speed up downstream tasks. Applying machine learning algorithms to symbols rather than to the raw time series, however, remains a practical challenge. To support research on symbolic representations, we developed this Python library to simplify the use of machine learning algorithms on symbolic data.

Now let's get started!

Install the slearn package with pip:

$ pip install slearn

Supported classifiers	Parameter call
Multi-layer Perceptron	'MLPClassifier'
K-Nearest Neighbors	'KNeighborsClassifier'
Gaussian Naive Bayes	'GaussianNB'
Decision Tree	'DecisionTreeClassifier'
Support Vector Classification	'SVC'
Radial-basis Function Kernel	'RBF'
Logistic Regression	'LogisticRegression'
Quadratic Discriminant Analysis	'QuadraticDiscriminantAnalysis'
AdaBoost classifier	'AdaBoostClassifier'
Random Forest	'RandomForestClassifier'
LightGBM	'LGBM'

Symbolic machine learning prediction

Import the package

>>> from slearn import symbolicML

We can forecast any symbolic sequence with one of the scikit-learn classifiers listed above.

>>> string = 'aaaabbbccd'
>>> sbml = symbolicML(classifier_name="MLPClassifier", ws=3, random_seed=0, verbose=0)
>>> x, y = sbml._encoding(string)
>>> pred = sbml.forecasting(x, y, step=5, hidden_layer_sizes=(10,10), learning_rate_init=0.1)
>>> print(pred)
['d', 'b', 'a', 'b', 'b'] # the prediction
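
To make the role of the window size `ws` and of `step` concrete, here is a minimal pure-Python sketch of sliding-window encoding followed by a rolling forecast. It is an illustration under assumptions, not the library's implementation: `encode_windows`, `NearestWindow`, and `forecast` are hypothetical names, and the nearest-window classifier is only a stand-in for a scikit-learn estimator, so its predictions will differ from the MLP output shown above.

```python
def encode_windows(string, ws):
    """Map symbols to integers and cut the sequence into sliding windows."""
    alphabet = sorted(set(string))
    to_int = {s: i for i, s in enumerate(alphabet)}
    codes = [to_int[s] for s in string]
    # Each window of ws symbols is a feature row; the symbol after it is the label.
    x = [codes[i:i + ws] for i in range(len(codes) - ws)]
    y = [codes[i + ws] for i in range(len(codes) - ws)]
    return x, y, alphabet


class NearestWindow:
    """Hypothetical stand-in classifier with a scikit-learn-like fit/predict interface."""

    def fit(self, x, y):
        self.x_, self.y_ = list(x), list(y)
        return self

    def predict(self, rows):
        def mismatches(a, b):
            return sum(u != v for u, v in zip(a, b))
        # For each row, return the label of the first stored window
        # that has the fewest mismatching positions.
        return [self.y_[min(range(len(self.x_)),
                            key=lambda i: mismatches(self.x_[i], row))]
                for row in rows]


def forecast(model, x, y, alphabet, step):
    """Predict `step` symbols, feeding each prediction back into the window."""
    model.fit(x, y)
    window = x[-1][1:] + [y[-1]]  # the last ws symbols of the sequence
    out = []
    for _ in range(step):
        nxt = model.predict([window])[0]
        out.append(alphabet[nxt])
        window = window[1:] + [nxt]  # slide the window forward by one symbol
    return out


x, y, alphabet = encode_windows('aaaabbbccd', ws=3)
pred = forecast(NearestWindow(), x, y, alphabet, step=5)
print(pred)
```

Any object exposing `fit` and `predict` could replace `NearestWindow` in this sketch, which is the property that lets a scikit-learn classifier be selected by name.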

You can also pass the classifier parameters as a dictionary:

>>> string = 'aaaabbbccd'
>>> sbml = symbolicML(classifier_name="MLPClassifier", ws=3, random_seed=0, verbose=0)
>>> x, y = sbml._encoding(string)
>>> params = {'hidden_layer_sizes':(10,10), 'activation':'relu', 'learning_rate_init':0.1}
>>> pred = sbml.forecasting(x, y, step=5, **params)
>>> print(pred)
['d', 'b', 'a', 'b', 'b'] # the prediction

The parameters for the chosen classifier are the same as in scikit-learn, so make sure that every parameter you pass exists for that scikit-learn classifier.
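
One way to catch a misspelled parameter before training is to compare the dictionary keys with the constructor signature of the classifier class. The sketch below uses only the standard library; `unknown_params` and `ToyClassifier` are hypothetical names introduced for illustration, and `ToyClassifier` merely imitates a few `MLPClassifier` arguments so that the example runs without scikit-learn.

```python
import inspect


def unknown_params(estimator_cls, params):
    """Return the keys in `params` that the class constructor does not accept."""
    accepted = inspect.signature(estimator_cls.__init__).parameters
    return sorted(k for k in params if k not in accepted)


class ToyClassifier:
    """Hypothetical stand-in; with scikit-learn installed, pass e.g. MLPClassifier."""

    def __init__(self, hidden_layer_sizes=(100,), activation='relu',
                 learning_rate_init=0.001):
        self.hidden_layer_sizes = hidden_layer_sizes
        self.activation = activation
        self.learning_rate_init = learning_rate_init


# 'hidden_layers' is a deliberate typo for 'hidden_layer_sizes'.
params = {'hidden_layers': (10, 10), 'learning_rate_init': 0.1}
bad = unknown_params(ToyClassifier, params)
print(bad)  # ['hidden_layers']
```

For a real scikit-learn estimator, calling `get_params()` on an instance returns the same set of accepted names.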

Prediction with symbolic representation

Load libraries.

>>> import pandas as pd
>>> import numpy as np
>>> import seaborn as sns
>>> import matplotlib.pyplot as plt
>>> from slearn import *

>>> time_series = pd.read_csv("Amazon.csv") # load the required dataset, here we use Amazon stock daily close price.
>>> ts = time_series.Close.values

Set the number of symbols you would like to predict.

>>> step = 50

You can choose any of the available classifiers and a symbolic representation method (SAX and ABBA are currently supported) for prediction. As before, the parameters of the chosen classifier are the same as in scikit-learn. We usually use the ABBA symbolic representation, since it achieves better forecasting results than SAX.
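
For readers unfamiliar with symbolic representations, the following is a minimal sketch of the SAX idea in pure Python: z-normalise the series, average it over equal-length segments, and map each average to a letter using breakpoints that split the standard normal distribution into equiprobable regions. The function name `sax` and its arguments are assumptions made for this illustration; it is not the implementation used by slearn, and ABBA works differently.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def sax(series, n_segments, alphabet='abcd'):
    """Minimal SAX sketch: z-normalise, average per segment, then discretise."""
    if len(series) % n_segments:
        raise ValueError('series length must be a multiple of n_segments')
    mu, sigma = fmean(series), pstdev(series)
    # A constant series has zero spread; map it to all zeros.
    z = [(v - mu) / sigma if sigma else 0.0 for v in series]
    # Piecewise aggregate approximation: one mean per equal-length segment.
    size = len(z) // n_segments
    paa = [fmean(z[i:i + size]) for i in range(0, len(z), size)]
    # Breakpoints split the standard normal into len(alphabet) equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return ''.join(alphabet[bisect_right(breakpoints, v)] for v in paa)


word = sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4)
print(word)  # abcd
```

A steadily increasing series of eight points is reduced to the four-letter word `abcd`, which is the kind of string the classifiers then learn from.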

Use the Gaussian Naive Bayes method:

>>> sl = slearn(series=ts, method='fABBA',
...             ws=3, step=step,
...             tol=0.01, alpha=0.2,
...             form='numeric', classifier_name="GaussianNB",
...             random_seed=1, verbose=1)
>>> sklearn_params = {'var_smoothing':0.001}
>>> abba_nb_pred = sl.predict(**sklearn_params)

Use a neural network (multi-layer perceptron):

>>> sl = slearn(series=ts, method='fABBA',
...             ws=3, step=step,
...             tol=0.01, alpha=0.2,
...             form='numeric', classifier_name="MLPClassifier",
...             random_seed=1, verbose=1)
>>> sklearn_params = {'hidden_layer_sizes':(20,80), 'learning_rate_init':0.1}
>>> abba_nn_pred = sl.predict(**sklearn_params)

We can plot the predictions and compare the results:

>>> min_len = np.min([len(abba_nb_pred), len(abba_nn_pred)])
>>> sns.set_theme(style="whitegrid")
>>> plt.figure(figsize=(25, 9))
>>> sns.set(font_scale=2, style="whitegrid")
>>> sns.lineplot(x=np.arange(0, len(ts)), y= ts, color='c', linewidth=6, label='Time series')
>>> sns.lineplot(x=np.arange(len(ts), len(ts)+min_len), y=abba_nb_pred[:min_len], color='tomato', linewidth=6, label='Prediction (ABBA - GaussianNB)')
>>> sns.lineplot(x=np.arange(len(ts), len(ts)+min_len), y=abba_nn_pred[:min_len], color='darkgreen', linewidth=6, label='Prediction (ABBA - MLPClassifier)')
>>> plt.tight_layout()
>>> plt.tick_params(axis='both', labelsize=25)
>>> plt.show()

Flexible symbolic sequence generator

The slearn library also contains functions for generating strings of tunable complexity, using the LZW compression method as a basis for approximating Kolmogorov complexity.

>>> from slearn import *
>>> df_strings = LZWStringLibrary(symbols=3, complexity=[3, 9])
>>> df_strings

Processing: 2 of 2

	nr_symbols	LZW_complexity	length	string
0	3	3	3	BCA
1	3	9	12	ABCBBCBBABCC
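
As a rough illustration of an LZW-based complexity measure, the sketch below counts the codes that a textbook LZW encoder emits for a string. This is an assumption about how the measure could be defined; `lzw_complexity` is a hypothetical helper, not a function of the library, and the library's own calculation may differ in detail. Under this definition, the two strings shown above give 3 and 9, matching the `LZW_complexity` column.

```python
def lzw_complexity(string):
    """Count the codes a textbook LZW encoder emits for `string`."""
    dictionary = set(string)  # start from the single symbols
    codes = 0
    phrase = ''
    for symbol in string:
        if phrase + symbol in dictionary:
            phrase += symbol  # keep extending the known phrase
        else:
            codes += 1  # emit the code for `phrase`
            dictionary.add(phrase + symbol)  # learn the new, longer phrase
            phrase = symbol
    return codes + (1 if phrase else 0)  # flush the final phrase


print(lzw_complexity('BCA'))           # 3
print(lzw_complexity('ABCBBCBBABCC'))  # 9
```

Repetitive strings reuse dictionary phrases and so emit fewer codes per symbol, which is why the count serves as a computable proxy for Kolmogorov complexity.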

>>> df_iters = pd.DataFrame()
>>> for i, string in enumerate(df_strings['string']):
...     kwargs = df_strings.iloc[i,:-1].to_dict()
...     seed_string = df_strings.iloc[i,-1]
...     df_iter = RNN_Iteration(seed_string, iterations=2, architecture='LSTM', **kwargs)
...     df_iter.loc[:, kwargs.keys()] = kwargs.values()
...     df_iters = pd.concat([df_iters, df_iter])  # DataFrame.append was removed in pandas 2.0
>>> df_iter.reset_index(drop=True, inplace=True)

...

>>> df_iters.reset_index(drop=True, inplace=True)
>>> df_iters

	jw	dl	total_epochs	seq_test	seq_forecast	total_time	nr_symbols	LZW_complexity	length
0	1.000000	1.0	12	ABCABCABCA	ABCABCABCA	2.685486	3	3	3
1	1.000000	1.0	14	ABCABCABCA	ABCABCABCA	2.436733	3	3	3
2	0.657143	0.5	36	CBBCBBABCC	AABCABCABC	3.352712	3	9	12
3	0.704762	0.4	36	CBBCBBABCC	ABCBABBBBB	3.811584	3	9	12

Software Contributors

Roberto Cahuantzi
Xinye Chen 
Stefan Güttel

Equal contributions, ordered by the last name.

slearn 0.0.8
