
Automated machine learning framework for time series analysis



Instead of using complex and resource-demanding deep learning techniques, which could be considered state-of-the-art solutions, we propose using a combination of feature extractors with an ensemble of lightweight models obtained by the algorithmic kernel of the AutoML framework FEDOT.

The application fields of the framework are the following:

  • Classification (time series or image)

For this purpose we introduce four feature generators: quantile, wavelet, recurrence, and topological.

Once the feature generation process is complete, you can apply FEDOT’s evolutionary algorithm to find the best model for the classification task.

  • Anomaly detection (time series or image)

  • Change point detection (only time series)

  • Object detection (only image)
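
The classification workflow described above first turns every series into a fixed-length feature vector and only then searches for a model. The following is a minimal sketch of that first step, assuming a simple set of descriptive statistics; it is not the framework's quantile generator, and the function names quantile_features and build_feature_table are hypothetical.

```python
import statistics
from typing import Dict, List


def quantile_features(series: List[float]) -> Dict[str, float]:
    """Summarise one time series with a few descriptive statistics."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(series, n=4)
    return {
        "mean": statistics.fmean(series),
        "std": statistics.pstdev(series),
        "min": min(series),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(series),
    }


def build_feature_table(dataset: List[List[float]]) -> List[List[float]]:
    """Turn a list of series into a feature matrix, one row per series."""
    return [list(quantile_features(series).values()) for series in dataset]


# Two toy series of different lengths map to rows of the same width.
toy_dataset = [[1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 10.0, 12.0, 11.0, 9.0, 10.0]]
feature_table = build_feature_table(toy_dataset)
print(feature_table)
```

Because every row has the same width regardless of the series length, a table like this can be passed to any tabular classifier.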

Usage

FEDOT.Industrial provides a high-level API that allows you to use its capabilities in a simple way.

Classification

To conduct time series classification, set the experiment configuration and create an instance of the FedotIndustrial class:

from core.api.main import FedotIndustrial

# Name of the dataset used in the experiment (the same one that is loaded below)
dataset_name = 'ItalyPowerDemand'

industrial = FedotIndustrial(task='ts_classification',
                             dataset=dataset_name,
                             strategy='statistical',
                             use_cache=True,
                             timeout=1,
                             n_jobs=2,
                             window_sizes='auto',
                             logging_level=20,
                             output_folder=None)

You can then load the data and run the experiment:

train_data, test_data, _ = industrial.reader.read(dataset_name='ItalyPowerDemand')

model = industrial.fit(train_features=train_data[0], train_target=train_data[1])
labels = industrial.predict(test_features=test_data[0])
metric = industrial.get_metrics(target=test_data[1], metric_names=['f1', 'roc_auc'])

The config contains the following parameters:

  • task - type of task to be solved (ts_classification)

  • dataset - name of the data set for the experiment

  • strategy - the way to solve the problem: a specific feature generator or the fedot_preset mode

  • use_cache - a flag to use caching of extracted features

  • timeout - maximum amount of time to compile a pipeline for the classification

  • n_jobs - number of processes for parallel execution

  • window_sizes - window sizes for window generators

  • logging_level - logging level (in Python's logging module, 20 corresponds to INFO)

  • output_folder - path to folder to save results
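
The parameters above can be kept in a plain dictionary and checked before they are passed on. The sketch below uses the parameter names and example values from the constructor call shown earlier; the helper make_config is hypothetical and is not part of the framework.

```python
from typing import Any, Dict

# Parameter names and example values are taken from the constructor call above.
DEFAULT_CONFIG: Dict[str, Any] = {
    "task": "ts_classification",
    "dataset": "ItalyPowerDemand",
    "strategy": "statistical",
    "use_cache": True,
    "timeout": 1,
    "n_jobs": 2,
    "window_sizes": "auto",
    "logging_level": 20,
    "output_folder": None,
}


def make_config(**overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge overrides into the defaults, rejecting unknown keys."""
    unknown = set(overrides) - set(DEFAULT_CONFIG)
    if unknown:
        raise KeyError(f"Unknown configuration parameter(s): {sorted(unknown)}")
    return {**DEFAULT_CONFIG, **overrides}


config = make_config(strategy="wavelet", n_jobs=4)
print(config)
# The resulting dictionary could then be unpacked as FedotIndustrial(**config).
```

Rejecting unknown keys early catches misspelled parameter names before a long experiment starts.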

Datasets for classification should be stored in the data directory, split into train and test sets saved as .tsv files. The folder inside the data directory must have the same name as the dataset used in the experiment. If the data is not found locally, the DataLoader class tries to load it from the UCR archive.
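
The local lookup can be pictured with the sketch below. It rests on two assumptions that the text above does not confirm: that the files are named <name>_TRAIN.tsv and <name>_TEST.tsv, as in the UCR archive, and that the class label is stored in the first column. The functions find_local_dataset and read_tsv are hypothetical and do not reproduce the DataLoader class.

```python
import csv
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple


def find_local_dataset(data_dir: Path, name: str) -> Optional[Tuple[Path, Path]]:
    """Return (train, test) paths if both .tsv files exist, otherwise None.

    Assumed layout: <data_dir>/<name>/<name>_TRAIN.tsv and <name>_TEST.tsv.
    """
    folder = data_dir / name
    train = folder / f"{name}_TRAIN.tsv"
    test = folder / f"{name}_TEST.tsv"
    return (train, test) if train.is_file() and test.is_file() else None


def read_tsv(path: Path) -> Tuple[List[List[float]], List[str]]:
    """Read one split, assuming the class label is in the first column."""
    features: List[List[float]] = []
    labels: List[str] = []
    with path.open(newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row:
                continue  # skip empty lines
            labels.append(row[0])
            features.append([float(value) for value in row[1:]])
    return features, labels


# Demonstration on a temporary directory with a tiny made-up dataset.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "Toy"
    folder.mkdir()
    (folder / "Toy_TRAIN.tsv").write_text("1\t0.1\t0.2\n2\t0.3\t0.4\n")
    (folder / "Toy_TEST.tsv").write_text("1\t0.5\t0.6\n")
    paths = find_local_dataset(Path(tmp), "Toy")
    missing = find_local_dataset(Path(tmp), "Absent")
    train_features, train_labels = read_tsv(paths[0])

print(train_labels, train_features, missing)
```

A None result is the point at which a loader would fall back to a remote source such as the UCR archive.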

The feature generators that can be specified in the configuration are quantile, wavelet, recurrence, and topological.

It is also possible to ensemble several feature generators. To do so, set the strategy field of the config to a string that starts with ensemble: and lists the feature generators, for example:

'ensemble: topological wavelet quantile'
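
How such a string might be split into individual generator names is sketched below. The parser parse_strategy is hypothetical; the source does not describe how the framework itself interprets the value.

```python
from typing import List

ENSEMBLE_PREFIX = "ensemble:"


def parse_strategy(strategy: str) -> List[str]:
    """Return the generator names encoded in a strategy string."""
    text = strategy.strip()
    if not text.startswith(ENSEMBLE_PREFIX):
        # A plain value names a single generator or mode.
        return [text]
    names = text[len(ENSEMBLE_PREFIX):].split()
    if not names:
        raise ValueError("An ensemble strategy must name at least one generator")
    return names


generators = parse_strategy("ensemble: topological wavelet quantile")
single = parse_strategy("statistical")
print(generators, single)
```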

Feature caching

To speed up the experiment, you can cache the features produced by the feature generators. If the use_cache flag in the config is True, every feature space generated during the experiment is cached in the corresponding folder.

When the same feature space is requested again, its hash is recalculated and the features are loaded from the cache, which is much faster than generating them from scratch.
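
The idea of hash-based feature caching can be sketched as follows. This is an illustration only: the way the framework builds its hash and stores the features is not described here, so the key composition (generator name, parameters, and data), the pickle storage, and the names cache_key and cached_features are all assumptions.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, List

Series = List[List[float]]


def cache_key(generator_name: str, params: Dict[str, Any], data: Series) -> str:
    """Hash everything that determines the resulting feature space."""
    payload = json.dumps(
        {"generator": generator_name, "params": params, "data": data},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_features(cache_dir: Path, generator_name: str, params: Dict[str, Any],
                    data: Series, extract: Callable[[Series], Series]) -> Series:
    """Load features from the cache if present, otherwise compute and store them."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(generator_name, params, data)}.pkl"
    if path.exists():
        with path.open("rb") as handle:
            return pickle.load(handle)
    features = extract(data)
    with path.open("wb") as handle:
        pickle.dump(features, handle)
    return features


calls = {"count": 0}


def row_means(data: Series) -> Series:
    """Toy extractor that counts how often it is actually executed."""
    calls["count"] += 1
    return [[sum(row) / len(row)] for row in data]


with tempfile.TemporaryDirectory() as tmp:
    data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    first = cached_features(Path(tmp), "mean", {}, data, row_means)
    second = cached_features(Path(tmp), "mean", {}, data, row_means)

print(first, second, calls["count"])
```

The second call finds the file written by the first one, so the extractor runs only once.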

Stay tuned!

Project structure

The latest stable release of FEDOT.Industrial is in the main branch.

The repository includes the following directories:

  • The api folder contains the main interface classes and scripts

  • Package core contains the main classes and scripts

  • Package examples includes several how-to examples that show how the framework works

  • All unit and integration tests are in the test directory

  • The sources of the documentation are in docs

Current R&D and future plans

– Implement feature space caching for feature generators (DONE)

– Development of model containerization module

– Development of meta-knowledge storage for data obtained from the experiments

– Research on time series clustering

Documentation

Comprehensive documentation is available on Read the Docs.

Supported by

The study is supported by the Research Center Strong Artificial Intelligence in Industry of ITMO University as part of the plan of the center’s program: Development of AutoML framework for industrial tasks.

Citation

Here we will provide a list of citations for the project as soon as the articles are published.

@article{REVIN2023110483,
title = {Automated machine learning approach for time series classification pipelines using evolutionary optimisation},
journal = {Knowledge-Based Systems},
pages = {110483},
year = {2023},
issn = {0950-7051},
doi = {https://doi.org/10.1016/j.knosys.2023.110483},
url = {https://www.sciencedirect.com/science/article/pii/S0950705123002332},
author = {Ilia Revin and Vadim A. Potemkin and Nikita R. Balabanov and Nikolay O. Nikitin}
}
