ADLStream is a novel asynchronous dual-pipeline deep learning framework for data stream mining

ADLStream

Asynchronous dual-pipeline deep learning framework for online data stream mining.

ADLStream is a novel asynchronous dual-pipeline deep learning framework for data stream mining. The system has two separate layers, one for training and one for testing, that run simultaneously so that it can deliver quick predictions while updating the model frequently. This dual-layer architecture alleviates the computational cost of complex deep learning models, such as convolutional neural networks, in the data streaming context, where speed is essential.
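The idea of two layers working at the same time can be illustrated with a toy sketch. It uses Python threads and a trivial "model" (a running mean) purely for illustration; the names SharedModel and run_dual_pipeline are hypothetical, and nothing here reflects how ADLStream implements its layers internally.

```python
import queue
import threading


class SharedModel:
    """Toy model: predicts the mean of the values it has been trained on."""

    def __init__(self):
        self._lock = threading.Lock()
        self._total = 0.0
        self._count = 0

    def train(self, value):
        with self._lock:
            self._total += value
            self._count += 1

    def predict(self):
        with self._lock:
            return self._total / self._count if self._count else 0.0


def run_dual_pipeline(values):
    model = SharedModel()
    train_queue = queue.Queue()
    predictions = []

    def training_layer():
        # Runs in the background and updates the model as instances arrive.
        while True:
            value = train_queue.get()
            if value is None:  # sentinel: the stream has finished
                break
            model.train(value)

    trainer = threading.Thread(target=training_layer)
    trainer.start()

    # Prediction layer: answer immediately with whatever model is available,
    # then hand the instance over to the training layer.
    for value in values:
        predictions.append(model.predict())
        train_queue.put(value)

    train_queue.put(None)
    trainer.join()
    return predictions, model.predict()


predictions, final_estimate = run_dual_pipeline([1.0, 2.0, 3.0, 4.0])
```

The prediction loop never waits for training to finish, which is the property the dual-layer design is after: predictions stay fast while the model keeps improving in the background.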

The complete documentation and API reference of ADLStream can be found at adlstream.readthedocs.io.

Installation Guide

GPU support

Ideally, ADLStream should be run on a computer with two GPUs. However, this is not compulsory, and ADLStream can also be run on a CPU.

ADLStream uses TensorFlow. If you are interested in running ADLStream on a GPU, the tensorflow>=2.1.0 GPU specifications are required.

If you don't want to use a GPU, skip to Installing ADLStream.

Hardware requirements

A computer with at least 2 NVIDIA® GPU cards with CUDA® Compute Capability 3.5 or higher.

Software requirements

The following NVIDIA® software must be installed on your system:

NVIDIA® GPU drivers: CUDA 10.1 requires 418.x or higher.
CUDA® Toolkit: TensorFlow supports CUDA 10.1 (TensorFlow >= 2.1.0).
CUPTI, which ships with the CUDA Toolkit.
cuDNN SDK (>= 7.6).
(Optional) TensorRT 6.0, to improve latency and throughput for inference on some models.

Installing ADLStream

You can install ADLStream and its dependencies from PyPI with:

pip install ADLStream

We strongly recommend that you install ADLStream in a dedicated virtualenv, to avoid conflicting with your system packages.

To use ADLStream:

import ADLStream

Getting Started

These instructions explain how to use the ADLStream framework with a simple example.

In this example we will use an LSTM model for time series forecasting in streaming.

1. Create the stream

First of all, we need to create the stream. Stream objects can be created using the classes from ADLStream.data.stream. We can choose different options depending on the source of our stream (a CSV file, a Kafka cluster, etc.).

In this example, we will use the FakeStream, which implements a sine wave.

import ADLStream

stream = ADLStream.data.stream.FakeStream(
    num_features=6, stream_length=1000, stream_period=100
)

More precisely, this stream will return a maximum of 1000 instances. The stream sends one message every 100 milliseconds (0.1 seconds).
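To make the behaviour concrete, here is a minimal standalone sketch of a stream with the same shape. The function fake_sine_stream is hypothetical and is not part of ADLStream; in particular, the phase offset per feature is an assumption made only so that the features differ.

```python
import math
import time


def fake_sine_stream(num_features=6, stream_length=1000, stream_period=100):
    """Yield stream_length messages, pausing stream_period milliseconds between them."""
    for i in range(stream_length):
        if i > 0 and stream_period > 0:
            time.sleep(stream_period / 1000.0)
        # Assumption: every feature is the same sine wave with a different phase.
        yield [math.sin(0.1 * i + j) for j in range(num_features)]


# A period of 0 disables the pause so the example finishes instantly.
for message in fake_sine_stream(num_features=2, stream_length=3, stream_period=0):
    print(message)
```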

2. Create the stream generator.

Once we have our source stream, we need to create our stream generator. A StreamGenerator is an object that preprocesses the stream and converts the messages into the input (x) and target (y) data of the deep learning model. There are different options to choose from under ADLStream.data and, if needed, we can create a custom StreamGenerator by inheriting from BaseStreamGenerator.

As our problem is time series forecasting, we will use the MovingWindowStreamGenerator, which performs the moving-window preprocessing method.

stream_generator = ADLStream.data.MovingWindowStreamGenerator(
    stream=stream, past_history=12, forecasting_horizon=3, shift=1
)

In this example, past_history is set to 12 and forecasting_horizon to 3, so the model receives the 12 most recent elements as input and predicts the next 3.
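The moving-window method itself can be sketched in a few lines of plain Python. The function moving_windows is hypothetical and works on a finished list rather than a live stream; it also assumes that shift is the step between the starts of consecutive windows, which may not match the exact meaning of the parameter in ADLStream.

```python
def moving_windows(series, past_history, forecasting_horizon, shift=1):
    """Split a sequence into (x, y) pairs of past values and future targets."""
    windows = []
    last_start = len(series) - past_history - forecasting_horizon
    for start in range(0, last_start + 1, shift):
        split = start + past_history
        x = series[start:split]                        # model input
        y = series[split:split + forecasting_horizon]  # values to predict
        windows.append((x, y))
    return windows


series = list(range(20))
pairs = moving_windows(series, past_history=12, forecasting_horizon=3, shift=1)
first_x, first_y = pairs[0]
print(first_x, first_y)
```

With 20 elements, a history of 12 and a horizon of 3, the sketch produces 6 windows; the first one uses elements 0 to 11 as input and 12 to 14 as target.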

3. Configure the evaluation process.

In order to evaluate the performance of the model, we need to create a validator object. There are different alternatives for data-stream validation, and some of the most common ones can be found under ADLStream.evaluation. Furthermore, custom evaluators can be easily implemented by inheriting from BaseEvaluator.

In this case, we are going to create a PrequentialEvaluator, which uses a decaying factor to implement the idea that more recent examples are more important.

evaluator = ADLStream.evaluation.PrequentialEvaluator(
    chunk_size=10,
    metric="MAE",
    fadding_factor=0.98,
    results_file="ADLStream.csv",
    dataset_name="Fake Data",
    show_plot=True,
    plot_file="test.jpg",
)

As can be seen, we are using the mean absolute error (MAE) metric. Other options can be found in ADLStream.evaluation.metrics. The evaluator will save the progress of the error metric in results_file, and will also plot the progress and save the image in plot_file.
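As a rough illustration of how a decaying factor weights recent examples, the sketch below computes a faded prequential MAE using the common fading-factor formulation (a faded sum of errors divided by a faded count). The function faded_prequential_mae is hypothetical, it updates per instance rather than per chunk, and the PrequentialEvaluator in ADLStream may compute its metric differently.

```python
def faded_prequential_mae(y_true, y_pred, fading_factor=0.98):
    """Return the faded mean absolute error after each instance."""
    faded_sum = 0.0    # decayed sum of absolute errors
    faded_count = 0.0  # decayed number of instances seen
    history = []
    for target, prediction in zip(y_true, y_pred):
        error = abs(target - prediction)
        # Older errors are multiplied by the factor once more at every step,
        # so their influence shrinks geometrically.
        faded_sum = error + fading_factor * faded_sum
        faded_count = 1.0 + fading_factor * faded_count
        history.append(faded_sum / faded_count)
    return history


history = faded_prequential_mae([1.0, 0.0], [0.0, 0.0], fading_factor=0.5)
print(history)
```

A factor of 1 gives every instance the same weight (the plain running MAE), while smaller factors make the metric react faster to recent changes in the stream.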

4. Configure model and create ADLStream

Finally we will create our ADLStream object specifying the model to use.

The required model arguments are the architecture, the loss and the optimizer. In addition, we can provide a dict with the model parameters to customize its architecture. All the available model architectures and their parameters can be found in ADLStream.models.

For the example we are using a deep learning model with 3 stacked LSTM layers of 16, 32 and 64 units followed by a fully connected block of two layers with 16 and 8 neurons.

model_architecture = "lstm"
model_loss = "mae"
model_optimizer = "adam"
model_parameters = {
    "recurrent_units": [16, 32, 64],
    "recurrent_dropout": 0,
    "return_sequences": False,
    "dense_layers": [16, 8],
    "dense_dropout": 0,
}

adls = ADLStream.ADLStream(
    stream_generator=stream_generator,
    evaluator=evaluator,
    batch_size=60,
    num_batches_fed=20,
    model_architecture=model_architecture,
    model_loss=model_loss,
    model_optimizer=model_optimizer,
    model_parameters=model_parameters,
    log_file="ADLStream.log",
)
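To get a feel for the size of this architecture, the following sketch counts its trainable parameters with the standard formulas for LSTM and fully connected layers (four gates per LSTM unit, each with input weights, recurrent weights and a bias). It assumes the model receives the 6 features of the example stream as input and leaves out the final output layer, whose size is not stated here; the helper names are hypothetical.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias vector.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # One weight per input-unit pair plus one bias per unit.
    return input_dim * units + units


def count_parameters(num_features, recurrent_units, dense_layers):
    total = 0
    input_dim = num_features
    for units in recurrent_units:
        total += lstm_params(input_dim, units)
        input_dim = units  # the next layer consumes this layer's output
    for units in dense_layers:
        total += dense_params(input_dim, units)
        input_dim = units
    return total


total = count_parameters(
    num_features=6, recurrent_units=[16, 32, 64], dense_layers=[16, 8]
)
print(total)  # 33752
```

Most of the parameters sit in the last LSTM layer (24832 of them), which gives an idea of why training such a model on every incoming batch is costly.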

5. Run ADLStream & Results

Once we have created the ADLStream object, we can start it by calling its run function.

adls.run()

The processes will start and the progress will be plotted, producing a result similar to this one:

[Figure: output plot]

The complete API reference can be found at adlstream.readthedocs.io.

Related research papers

This is the original paper, which you can cite to reference ADLStream:

Lara-Benítez, Pedro, Manuel Carranza-García, et al. "Asynchronous Dual-pipeline Deep Learning Framework for Online Data Stream Classification." Integrated Computer-Aided Engineering, 1 Jan. 2020: 101-119.

Other studies using the ADLStream framework are listed here:

Lara-Benítez, Pedro, et al. "On the performance of deep learning models for time series classification in streaming." International Workshop on Soft Computing Models in Industrial and Environmental Applications. Springer, Cham, 2020.

Contributing

Read CONTRIBUTING.md. We appreciate all kinds of help.

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Authors

Pedro Lara-Benítez
Manuel Carranza-García
Jorge García-Gutiérrez
José C. Riquelme

Release history

0.1.5 (Feb 10, 2022): this version
0.1.4 (Jan 19, 2022)
0.1.3 (Feb 18, 2021)
0.1.2 (Dec 30, 2020)
0.1.1 (Sep 25, 2020)

