Automated machine learning framework for time series analysis
Instead of using complex and resource-demanding deep learning techniques, which could be considered state-of-the-art solutions, we propose using a combination of feature extractors with an ensemble of lightweight models obtained by the algorithmic kernel of the AutoML framework FEDOT.
The application fields of the framework are the following:
Classification (time series or image)
For this purpose we introduce four feature generators: quantile, wavelet, recurrence and topological.
Once the feature generation process is complete, you can apply FEDOT’s evolutionary algorithm to find the best model for the classification task.
Anomaly detection (time series or image)
Change point detection (only time series)
Object detection (only image)
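The core idea above (turn each series into a feature vector, then fit lightweight models on those features) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not FEDOT.Industrial's implementation: `extract_features` and `NearestCentroid` are hypothetical stand-ins chosen for brevity, whereas the framework itself uses its own feature generators and FEDOT's evolutionary search to build model ensembles.

```python
import statistics
from typing import Dict, List, Sequence


def extract_features(series: Sequence[float]) -> List[float]:
    """Hypothetical feature generator: map a raw series to a fixed-length vector of statistics."""
    return [
        statistics.fmean(series),
        statistics.pstdev(series),
        min(series),
        max(series),
    ]


class NearestCentroid:
    """Hypothetical lightweight model: predict the class whose mean feature vector is closest."""

    def fit(self, X: List[List[float]], y: List[str]) -> "NearestCentroid":
        groups: Dict[str, List[List[float]]] = {}
        for features, label in zip(X, y):
            groups.setdefault(label, []).append(features)
        # Average every feature over the samples of each class
        self.centroids = {
            label: [statistics.fmean(column) for column in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X: List[List[float]]) -> List[str]:
        def sq_dist(a: List[float], b: List[float]) -> float:
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [min(self.centroids, key=lambda c: sq_dist(x, self.centroids[c])) for x in X]


# Toy data: two "low" series and two "high" series
train_series = [[0.0, 0.1, 0.0, 0.2], [0.1, 0.0, 0.1, 0.0], [5.0, 6.0, 5.5, 6.5], [6.0, 5.0, 6.0, 5.5]]
train_labels = ["low", "low", "high", "high"]

model = NearestCentroid().fit([extract_features(s) for s in train_series], train_labels)
predictions = model.predict([extract_features([0.05, 0.1, 0.0, 0.1]), extract_features([5.5, 6.0, 6.2, 5.8])])
```

Separating feature extraction from the model is what keeps the second stage cheap: any classifier that accepts fixed-length vectors can be plugged in after the generator.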
Usage
FEDOT.Industrial provides a high-level API that allows you to use its capabilities in a simple way.
Classification
To conduct time series classification, create an instance of the FedotIndustrial class and pass the experiment configuration to it as keyword arguments:
from core.api.main import FedotIndustrial

dataset_name = 'ItalyPowerDemand'  # name of the dataset to experiment with
industrial = FedotIndustrial(task='ts_classification',
                             dataset=dataset_name,
                             strategy='statistical',
                             use_cache=True,
                             timeout=1,
                             n_jobs=2,
                             window_sizes='auto',
                             logging_level=20,
                             output_folder=None)
You can then load the data and run the experiment:
# Read the train and test splits; each is indexed as (features, target)
train_data, test_data, _ = industrial.reader.read(dataset_name='ItalyPowerDemand')
# Build a pipeline on the training set
model = industrial.fit(train_features=train_data[0], train_target=train_data[1])
# Predict test labels and evaluate them
labels = industrial.predict(test_features=test_data[0])
metric = industrial.get_metrics(target=test_data[1], metric_names=['f1', 'roc_auc'])
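To make the 'f1' metric requested above concrete, here is a minimal sketch of binary F1 (the harmonic mean of precision and recall for one positive class). It is an illustration only; the averaging scheme and positive class that get_metrics actually uses are not stated here, and `f1_score` below is a hypothetical helper, not part of FEDOT.Industrial.

```python
from typing import Hashable, Sequence


def f1_score(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable) -> float:
    """Binary F1 for the `positive` class: 2 * precision * recall / (precision + recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correctly flagged positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed positives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


score = f1_score(y_true=[1, 1, 2, 2, 1], y_pred=[1, 2, 2, 2, 1], positive=1)
```

In this toy case there are 2 true positives, no false positives and 1 missed positive, so precision is 1, recall is 2/3 and F1 is 0.8.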
The configuration accepts the following parameters:
task - type of task to be solved (ts_classification)
dataset - name of the dataset used in the experiment
strategy - how the problem is solved: a specific feature generator or the fedot_preset mode
use_cache - whether to cache extracted features
timeout - maximum time allotted to composing a classification pipeline
n_jobs - number of processes for parallel execution
window_sizes - window sizes for window-based feature generators
logging_level - logging level (20 corresponds to INFO in Python's logging module)
output_folder - path to the folder where results are saved
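The same parameters can be kept in a plain dictionary and unpacked into the constructor shown above. The sketch below assumes nothing beyond those keyword names; `check_config` and the set of required keys are hypothetical, added only to show how a configuration could be sanity-checked before launching an experiment.

```python
import logging

# Dataset name used in the examples above
dataset_name = "ItalyPowerDemand"

# Experiment configuration; keys mirror the parameters listed above
config = {
    "task": "ts_classification",
    "dataset": dataset_name,
    "strategy": "statistical",
    "use_cache": True,
    "timeout": 1,
    "n_jobs": 2,
    "window_sizes": "auto",
    "logging_level": logging.INFO,  # numerically equal to 20
    "output_folder": None,
}

# Hypothetical: which keys must be present (an assumption for illustration only)
REQUIRED_KEYS = {"task", "dataset", "strategy"}


def check_config(cfg: dict) -> list:
    """Return human-readable problems found in the config (empty list if none)."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(cfg))]
    if not isinstance(cfg.get("use_cache", False), bool):
        problems.append("use_cache must be a bool")
    timeout = cfg.get("timeout")
    if timeout is not None and not (isinstance(timeout, (int, float)) and timeout > 0):
        problems.append("timeout must be a positive number")
    return problems


# With the library installed, the dictionary could then be unpacked:
# industrial = FedotIndustrial(**config)
```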
Datasets for classification should be stored in the data directory, split into train and test files with the .tsv extension. The name of the dataset folder inside data must match the dataset name used in the experiment. If the data is not found locally, the DataLoader class tries to download it from the UCR archive.
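The lookup described above can be sketched as follows. Two assumptions are worth stating loudly: the `<name>_TRAIN.tsv` / `<name>_TEST.tsv` file names are borrowed from the UCR archive's convention and are not confirmed here as FEDOT.Industrial's exact layout, and each row is read as in UCR files (class label in the first column, series values after it). `dataset_paths`, `read_ucr_tsv` and `is_available_locally` are hypothetical helpers.

```python
import csv
import tempfile
from pathlib import Path
from typing import List, Tuple


def dataset_paths(data_dir: Path, name: str) -> Tuple[Path, Path]:
    """Expected train/test locations (UCR-style naming, assumed)."""
    folder = data_dir / name
    return folder / f"{name}_TRAIN.tsv", folder / f"{name}_TEST.tsv"


def read_ucr_tsv(path: Path) -> Tuple[List[List[float]], List[str]]:
    """Read a UCR-style .tsv file: first column is the label, the rest are series values."""
    features, target = [], []
    with path.open(newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            if not row:
                continue  # skip blank lines
            target.append(row[0])
            features.append([float(v) for v in row[1:]])
    return features, target


def is_available_locally(data_dir: Path, name: str) -> bool:
    """True if both splits exist; otherwise a loader would need to fetch them remotely."""
    return all(p.is_file() for p in dataset_paths(data_dir, name))


# Demo on a throwaway directory with a tiny hypothetical dataset called "Toy"
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    train_path, test_path = dataset_paths(data_dir, "Toy")
    train_path.parent.mkdir(parents=True)
    train_path.write_text("1\t0.1\t0.2\t0.3\n2\t1.0\t0.9\t0.8\n")
    local_before = is_available_locally(data_dir, "Toy")  # False: the TEST split is missing
    test_path.write_text("1\t0.0\t0.1\t0.2\n")
    local_after = is_available_locally(data_dir, "Toy")  # True: both splits present
    X_train, y_train = read_ucr_tsv(train_path)
```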
The feature generators that can be specified in the configuration are quantile, wavelet, recurrence and topological.
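As a rough idea of what a window-based generator does, the sketch below splits a series into non-overlapping windows and describes each window by its quartile cut points. This is an assumption-laden illustration, not the library's quantile generator: the non-overlapping windows, the choice of quartiles and the name `window_quantile_features` are all hypothetical, and the real generator may compute other statistics and choose window sizes automatically.

```python
import statistics
from typing import List, Sequence


def window_quantile_features(series: Sequence[float], window_size: int, n_quantiles: int = 4) -> List[float]:
    """Concatenate the quantile cut points of each full, non-overlapping window."""
    if window_size < 2:
        raise ValueError("window_size must be at least 2")
    features: List[float] = []
    # A trailing partial window (shorter than window_size) is ignored
    for start in range(0, len(series) - window_size + 1, window_size):
        window = series[start:start + window_size]
        # statistics.quantiles returns n_quantiles - 1 cut points
        features.extend(statistics.quantiles(window, n=n_quantiles, method="inclusive"))
    return features


features = window_quantile_features(list(range(1, 11)), window_size=4)
```

Every series of the same length yields a vector of the same length, which is what lets a standard classifier consume the result.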
Several feature generators can also be combined into an ensemble. To do this, set the strategy field of the config to a string that lists the generators, for example:
'ensemble: topological wavelet quantile'
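The ensemble string is an "ensemble:" prefix followed by space-separated generator names. A minimal sketch of how such a value could be split into individual generators is shown below; `parse_strategy` is a hypothetical helper for illustration, not the parser FEDOT.Industrial uses internally.

```python
from typing import List


def parse_strategy(strategy: str) -> List[str]:
    """Return the generator names requested by a strategy string."""
    prefix = "ensemble:"
    if strategy.startswith(prefix):
        names = strategy[len(prefix):].split()  # split on any whitespace
        if not names:
            raise ValueError("an ensemble strategy must list at least one generator")
        return names
    # A single generator (or a mode such as fedot_preset) is passed through unchanged
    return [strategy.strip()]


generators = parse_strategy("ensemble: topological wavelet quantile")
```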
Feature caching
To speed up the experiment, you can cache the features produced by the feature generators. If the use_cache flag in the config is True, every feature space generated during the experiment is cached in the corresponding folder.
When the same feature space is requested again, its hash is recomputed and the matching feature space is loaded from the cache, which is much faster than generating it from scratch.
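The hash-then-load idea can be sketched with the standard library. What exactly FEDOT.Industrial hashes, and how it stores cached features, is not described here; in this sketch the key is assumed to be a SHA-256 of the input data plus the generator name and parameters, features are stored as JSON, and `FeatureCache` is a hypothetical class.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, List, Sequence


class FeatureCache:
    """Disk cache keyed by a hash of the data and generator settings (illustrative only)."""

    def __init__(self, folder: Path) -> None:
        self.folder = folder
        self.folder.mkdir(parents=True, exist_ok=True)
        self.hits = 0

    @staticmethod
    def key(data: Sequence[Sequence[float]], generator: str, params: dict) -> str:
        # Deterministic serialisation: identical requests always produce the same hash
        payload = json.dumps({"data": data, "generator": generator, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_compute(self, data, generator: str, params: dict,
                       compute: Callable[[], List[List[float]]]) -> List[List[float]]:
        path = self.folder / f"{self.key(data, generator, params)}.json"
        if path.exists():
            self.hits += 1  # cache hit: load instead of regenerating
            return json.loads(path.read_text())
        features = compute()
        path.write_text(json.dumps(features))
        return features


calls = []


def slow_generator() -> List[List[float]]:
    calls.append(1)  # records how often features are actually generated
    return [[sum(s), max(s)] for s in series]


series = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
with tempfile.TemporaryDirectory() as tmp:
    cache = FeatureCache(Path(tmp))
    first = cache.get_or_compute(series, "quantile", {"window": 3}, slow_generator)
    second = cache.get_or_compute(series, "quantile", {"window": 3}, slow_generator)
    hits = cache.hits
```

Including the generator parameters in the key matters: changing, say, the window size must produce a different hash, otherwise a stale feature space would be served.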
Stay tuned!
Project structure
The latest stable release of FEDOT.Industrial is in the main branch.
The repository includes the following directories:
The api folder contains the main interface classes and scripts
Package core contains the main classes and scripts
Package examples includes several usage examples that show how to get started with the framework
All unit and integration tests are in the test directory
The sources of the documentation are in docs
Current R&D and future plans
– Implementation of feature space caching for feature generators (DONE)
– Development of a model containerization module
– Development of meta-knowledge storage for data obtained from the experiments
– Research on time series clustering
Documentation
Comprehensive documentation is available at readthedocs.
Supported by
The study is supported by the Research Center Strong Artificial Intelligence in Industry of ITMO University as part of the plan of the center’s program: Development of AutoML framework for industrial tasks.
Citation
Citations for the project are listed below; more will be added as further articles are published.
@article{REVIN2023110483,
  title = {Automated machine learning approach for time series classification pipelines using evolutionary optimisation},
  journal = {Knowledge-Based Systems},
  pages = {110483},
  year = {2023},
  issn = {0950-7051},
  doi = {10.1016/j.knosys.2023.110483},
  url = {https://www.sciencedirect.com/science/article/pii/S0950705123002332},
  author = {Ilia Revin and Vadim A. Potemkin and Nikita R. Balabanov and Nikolay O. Nikitin}
}
Hashes for fedot_ind-0.0.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 40099febc816f43cfe58164b7e5aede8515d901be32de427806561979a28702b
MD5 | 8e47476f09024a5ddd013680dbfbf2a9
BLAKE2b-256 | 948a40e45b7064dd52ad913cb2ec131034ab3d5879316b8169a2a06e18c7aa16