Orion is a machine learning library built for unsupervised time series anomaly detection.
An open source project from Data to AI Lab at MIT.
Orion
A machine learning library for unsupervised time series anomaly detection.
Important Links | |
---|---|
:computer: Website | Check out the Sintel Website for more information about the project. |
:book: Documentation | Quickstarts, User and Development Guides, and API Reference. |
:star: Tutorials | Check out our notebooks |
:octocat: Repository | The link to the GitHub Repository of this library. |
:scroll: License | The repository is published under the MIT License. |
Community | Join our Slack Workspace for announcements and discussions. |
Overview
Orion is a machine learning library built for unsupervised time series anomaly detection. Given time series data, we provide a number of “verified” ML pipelines (a.k.a. Orion pipelines) that identify rare patterns and flag them for expert review.
The library makes use of a number of automated machine learning tools developed under Data to AI Lab at MIT.
Read about using an Orion pipeline on the NYC taxi dataset in a blog series:
Part 1: Learn about unsupervised time series anomaly detection | Part 2: Learn how we use GANs to solve the problem | Part 3: How does one evaluate anomaly detection pipelines? |
---|---|---|
Notebooks: Discover Orion through Colab by launching our notebooks!
Quickstart
Install with pip
The easiest and recommended way to install Orion is using pip:
pip install orion-ml
This will pull and install the latest stable release from PyPI.
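To confirm that the installation succeeded, one option is to query the installed distribution from Python. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper written for this example and is not part of Orion.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "orion-ml" is the distribution name used in the pip command above.
version = installed_version("orion-ml")
if version is None:
    print("orion-ml is not installed in this environment")
else:
    print(f"orion-ml {version} is installed")
```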
In the following example we show how to use one of the Orion Pipelines.
Fit an Orion pipeline
We will load demo data for this example:
from orion.data import load_signal
train_data = load_signal('S-1-train')
train_data.head()
which should show a signal with timestamp and value.
timestamp value
0 1222819200 -0.366359
1 1222840800 -0.394108
2 1222862400 0.403625
3 1222884000 -0.362759
4 1222905600 -0.370746
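The rows above can be used to check how the signal is sampled. The sketch below assumes the timestamp column holds Unix epoch seconds in UTC (an assumption, the text above does not state the unit) and uses only the standard library on the five rows shown, rather than the full DataFrame.

```python
from datetime import datetime, timezone

# First rows of the demo signal shown above: (timestamp, value).
rows = [
    (1222819200, -0.366359),
    (1222840800, -0.394108),
    (1222862400, 0.403625),
    (1222884000, -0.362759),
    (1222905600, -0.370746),
]

timestamps = [ts for ts, _ in rows]

# Differences between consecutive timestamps, in seconds.
steps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

# The signal is regularly sampled if every step is the same.
is_regular = len(set(steps)) == 1

# Assumption: timestamps are Unix epoch seconds, interpreted in UTC.
first = datetime.fromtimestamp(timestamps[0], tz=timezone.utc)

print("first sample:", first.isoformat())
print("hours between samples:", steps[0] / 3600)
print("regularly sampled:", is_regular)
```

Under that assumption, these five rows are spaced six hours apart, starting on 2008-10-01.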
In this example we use the aer pipeline and set some hyperparameters (in this case, the number of training epochs is set to 5).
from orion import Orion
hyperparameters = {
    'orion.primitives.aer.AER#1': {
        'epochs': 5,
        'verbose': True
    }
}
orion = Orion(
    pipeline='aer',
    hyperparameters=hyperparameters
)
orion.fit(train_data)
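The hyperparameters dictionary is nested: the outer key names a primitive followed by a `#1` suffix, and the inner dictionary holds the values to override. As a rough sketch of that structure, the hypothetical helper below (not part of Orion) builds such a dictionary. It assumes the numeric suffix is a counter identifying which occurrence of the primitive in the pipeline is configured; that interpretation is an assumption, not something stated above.

```python
from typing import Any, Dict


def build_hyperparameters(primitive: str, counter: int = 1,
                          **values: Any) -> Dict[str, Dict[str, Any]]:
    """Build a nested hyperparameters dict keyed by '<primitive>#<counter>'.

    Hypothetical helper for illustration only; `counter` is assumed to
    select which occurrence of the primitive in the pipeline is configured.
    """
    return {f"{primitive}#{counter}": dict(values)}


# Reproduces the dictionary used in the example above.
hyperparameters = build_hyperparameters(
    "orion.primitives.aer.AER",
    epochs=5,
    verbose=True,
)
print(hyperparameters)
```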
Detect anomalies using the fitted pipeline
Once it is fitted, we are ready to use it to detect anomalies in our incoming time series:
new_data = load_signal('S-1-new')
anomalies = orion.detect(new_data)
:warning: Depending on your system and the exact versions that you have installed, some WARNINGS may be printed. These can be safely ignored as they do not interfere with the proper behavior of the pipeline.
The output of the previous command will be a pandas.DataFrame containing a table of detected anomalies:
start end severity
0 1402012800 1403870400 0.122539
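To make a detected interval easier to read, the start and end values can be converted to dates. The sketch below assumes they are Unix epoch seconds in UTC, like the input timestamps (an assumption). It works on a plain tuple copied from the table above rather than on the DataFrame, and `describe` is a hypothetical helper.

```python
from datetime import datetime, timezone

# The detected anomaly from the table above: (start, end, severity).
anomalies = [(1402012800, 1403870400, 0.122539)]


def describe(start: int, end: int, severity: float) -> str:
    """Format one anomaly interval as readable UTC dates with its duration."""
    # Assumption: start and end are Unix epoch seconds in UTC.
    start_dt = datetime.fromtimestamp(start, tz=timezone.utc)
    end_dt = datetime.fromtimestamp(end, tz=timezone.utc)
    days = (end_dt - start_dt).total_seconds() / 86400
    return (
        f"{start_dt:%Y-%m-%d %H:%M} to {end_dt:%Y-%m-%d %H:%M} UTC "
        f"({days:.1f} days), severity {severity:.3f}"
    )


for start, end, severity in anomalies:
    print(describe(start, end, severity))
```

Under that assumption, the interval above spans 21.5 days in June 2014.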
Leaderboard
In every release, we run the Orion benchmark. We maintain an up-to-date leaderboard with the current scoring of the verified pipelines according to the benchmarking procedure.
We run the benchmark on 12 datasets with known ground truth. We record the score of the pipelines on each dataset. To compute the leaderboard table, we showcase the number of wins each pipeline has over the ARIMA pipeline.
Pipeline | Outperforms ARIMA |
---|---|
AER | 11 |
TadGAN | 7 |
LSTM Dynamic Thresholding | 7 |
LSTM Autoencoder | 8 |
Dense Autoencoder | 7 |
VAE | 7 |
GANF | 7 |
Azure | 0 |
You can find the scores of each pipeline on every signal recorded in the details Google Sheets document. The summarized results can also be browsed in the following summary Google Sheets document.
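The win count behind the leaderboard can be illustrated with a short sketch. The scores, dataset names, and pipeline names below are made up for the example and are not benchmark results. The sketch also assumes that a higher score is better and that a tie does not count as a win; the exact rule used by the benchmark is not specified here.

```python
from typing import Dict

Scores = Dict[str, Dict[str, float]]


def count_wins(scores: Scores, baseline: str = "ARIMA") -> Dict[str, int]:
    """Count, per pipeline, the datasets where it scores above the baseline."""
    base = scores[baseline]
    wins = {}
    for pipeline, per_dataset in scores.items():
        if pipeline == baseline:
            continue
        # Assumption: higher is better, and ties are not wins.
        wins[pipeline] = sum(
            1 for dataset, score in per_dataset.items() if score > base[dataset]
        )
    return wins


# Made-up scores on three hypothetical datasets, for illustration only.
scores = {
    "ARIMA": {"dataset_a": 0.50, "dataset_b": 0.60, "dataset_c": 0.40},
    "pipeline_x": {"dataset_a": 0.70, "dataset_b": 0.55, "dataset_c": 0.45},
    "pipeline_y": {"dataset_a": 0.50, "dataset_b": 0.30, "dataset_c": 0.20},
}

print(count_wins(scores))
```

With these made-up numbers, pipeline_x beats the baseline on two datasets and pipeline_y on none.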
Resources
Additional resources that might be of interest:
- Learn about benchmarking pipelines.
- Read about pipeline evaluation.
- Find out more about TadGAN.
Citation
If you use AER for your research, please consider citing the following paper:
Lawrence Wong, Dongyu Liu, Laure Berti-Equille, Sarah Alnegheimish, Kalyan Veeramachaneni. AER: Auto-Encoder with Regression for Time Series Anomaly Detection.
@inproceedings{wong2022aer,
title={AER: Auto-Encoder with Regression for Time Series Anomaly Detection},
author={Wong, Lawrence and Liu, Dongyu and Berti-Equille, Laure and Alnegheimish, Sarah and Veeramachaneni, Kalyan},
booktitle={2022 IEEE International Conference on Big Data (IEEE BigData)},
pages={1152-1161},
doi={10.1109/BigData55660.2022.10020857},
organization={IEEE},
year={2022}
}
If you use TadGAN for your research, please consider citing the following paper:
Alexander Geiger, Dongyu Liu, Sarah Alnegheimish, Alfredo Cuesta-Infante, Kalyan Veeramachaneni. TadGAN - Time Series Anomaly Detection Using Generative Adversarial Networks.
@inproceedings{geiger2020tadgan,
title={TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks},
author={Geiger, Alexander and Liu, Dongyu and Alnegheimish, Sarah and Cuesta-Infante, Alfredo and Veeramachaneni, Kalyan},
booktitle={2020 IEEE International Conference on Big Data (IEEE BigData)},
pages={33-43},
doi={10.1109/BigData50022.2020.9378139},
organization={IEEE},
year={2020}
}
If you use Orion, which is part of the Sintel ecosystem, for your research, please consider citing the following paper:
Sarah Alnegheimish, Dongyu Liu, Carles Sala, Laure Berti-Equille, Kalyan Veeramachaneni. Sintel: A Machine Learning Framework to Extract Insights from Signals.
@inproceedings{alnegheimish2022sintel,
title={Sintel: A Machine Learning Framework to Extract Insights from Signals},
author={Alnegheimish, Sarah and Liu, Dongyu and Sala, Carles and Berti-Equille, Laure and Veeramachaneni, Kalyan},
booktitle={Proceedings of the 2022 International Conference on Management of Data},
pages={1855–1865},
numpages={11},
publisher={Association for Computing Machinery},
doi={10.1145/3514221.3517910},
series={SIGMOD '22},
year={2022}
}
History
0.5.1 - 2023-08-16
This version introduces a new dataset to the benchmark.
Issues resolved
- Add UCR dataset to the benchmark – Issue #443 by @sarahmish
- docker image build failed – Issue #439 by @sarahmish
- Edit interval settings in azure pipeline – Issue #436 by @sarahmish
0.5.0 - 2023-05-23
This version uses the ml-stars package instead of mlprimitives.
Issues resolved
- Migrate to ml-stars – Issue #418 by @sarahmish
- Updating best_cost in find_anomalies primitive – Issue #403 by @sarahmish
- Retire lstm_dynamic_threshold_gpu and lstm_autoencoder_gpu pipeline maintenance – Issue #373 by @sarahmish
- Typo in xlsxwriter dependency specification – Issue #394 by @sarahmish
- orion.evaluate uses fails when fitting – Issue #384 by @sarahmish
- AER pipeline with visualization option – Issue #379 by @sarahmish
0.4.1 - 2023-01-31
Issues resolved
- Move VAE from sandbox to verified – Issue #377 by @sarahmish
- Pin opencv – Issue #372 by @sarahmish
- Pin scikit-learn – Issue #367 by @sarahmish
- Fix VAE documentation – Issue #360 by @sarahmish
0.4.0 - 2022-11-08
This version introduces several new enhancements:
- Support for Python 3.8
- Migrating to TensorFlow 2.0
- New pipeline, namely VAE, a Variational AutoEncoder model.
Issues resolved
- Add python 3.8 – Issue #342 by @sarahmish
- VAE (Variational Autoencoders) pipeline implementation – Issue #349 by @dyuliu
- Add masking option for regression_errors – Issue #352 by @dyuliu
- Changes in TadGAN for tensorflow 2.0 – Issue #161 by @lcwong0928
- Add an automatic dependency checker – Issue #320 by @sarahmish
- TadGAN batch_size cannot be changed – Issue #313 by @sarahmish
0.3.2 - 2022-07-04
This version fixes some of the issues in the aer, ae, and tadgan pipelines.
Issues resolved
- Fix AER model predict error after loading – Issue #304 by @lcwong0928
- Update AE to work with any window_size – Issue #300 by @sarahmish
- Updated tadgan_viz.json – Issue #292 by @Hramir
0.3.1 - 2022-04-26
This version introduces a new pipeline, namely AER, an AutoEncoder Regressor model.
Issues resolved
- Add AER Model - Issue #286 by @lcwong0928
0.3.0 - 2022-03-31
This version deprecates the support of OrionDBExplorer, which has been migrated to sintel. As a result, Orion no longer requires mongoDB as a dependency.
Issues resolved
- Update dependency - Issue #283 by @sarahmish
- General housekeeping - Issue #278 by @sarahmish
- Fix tutorial testing issue - Issue #276 by @sarahmish
- Migrate OrionExplorer to Sintel - Issue #275 by @dyuliu
- LSTM viz JSON pipeline added - Issue #271 by @Hramir
0.2.1 - 2022-02-18
This version introduces improvements and more testing.
Issues resolved
- Adjusting builds for TadGAN - Issue #261 by @sarahmish
- Testing tutorials, dependencies, and OS - Issue #251 by @sarahmish
0.2.0 - 2021-10-11
This version supports multivariate time series as input, in addition to minor improvements and maintenance.
Issues resolved
- setuptools no longer supports lib2to3 breaking mongoengine - Issue #252 by @sarahmish
- Supporting multivariate input - Issue #248 by @sarahmish
- TadGAN pipeline with visualization option - Issue #240 by @sarahmish
- Support saving absolute path for add_signals and add_signal when using dbExplorer - Issue #202 by @sarahmish
- dynamic scalability of TadGAN primitive based on window_size - Issue #87 by @sarahmish
0.1.7 - 2021-05-04
This version adds new features to the benchmark function where users can now save pipelines, view results as they are being calculated, and allow a single evaluation to be compared multiple times.
Issues resolved
- Dask issues in benchmark function & improvements - Issue #225 by @sarahmish
- Numerical overflow when using contextual metrics - Issue #212 by @kronerte
0.1.6 - 2021-03-08
This version introduces two new pipelines: LSTM AE and Dense AE.
In addition to minor improvements, a bit of code refactoring took place to introduce a new primitive: reconstruction_errors.
Issues resolved
- Comparison of DTW library performance - Issue #205 by @sarahmish
- Not able to pickle dump tadgan pipeline - Issue #200 by @sarahmish
- New pipeline LSTM and Dense autoencoders - Issue #194 by @sarahmish
- Readme - Issue #192 by @pvk-developer
- Unable to launch cli - Issue #186 by @sarahmish
- bullet points not formatted correctly in index.rst - Issue #178 by @micahjsmith
- Update notebooks - Issue #176 by @sarahmish
- Inaccuracy in README.md file in orion/evaluation/ - Issue #157 by @sarahmish
- Dockerfile -- docker does not find orion primitives automatically - Issue #155 by @sarahmish
- Primitive documentation - Issue #151 by @sarahmish
- Variable name inconsistency in tadgan - Issue #150 by @sarahmish
- Sync leaderboard tables between BENCHMARK.md and the docs - Issue #148 by @sarahmish
0.1.5 - 2020-12-25
This version includes the new style of documentation and a revamp of the README.md, in addition to some minor improvements in the benchmark code and primitives. This release includes the transfer of the tadgan pipeline to verified.
Issues resolved
- Link with google colab - Issue #144 by @sarahmish
- Add timeseries_anomalies unittests - Issue #136 by @sarahmish
- Update find_sequences in converting series to arrays - Issue #135 by @sarahmish
- Definition of error/critic smooth window in score anomalies primitive - Issue #132 by @sarahmish
- Train-test split in benchmark enhancement - Issue #130 by @sarahmish
0.1.4 - 2020-10-16
Minor enhancements to benchmark
- Load ground truth before try-catch - Issue #124 by @sarahmish
- Converting timestamp to datetime in Azure primitive - Issue #123 by @sarahmish
- Benchmark exceptions - Issue #120 by @sarahmish
0.1.3 - 2020-09-29
New benchmark and Azure primitive.
- Implement a benchmarking function new feature - Issue #94 by @sarahmish
- Add azure anomaly detection as primitive new feature - Issue #97 by @sarahmish
- Critic and reconstruction error combination - Issue #99 by @sarahmish
- Fixed threshold for find_anomalies - Issue #101 by @sarahmish
- Add an option to have window size and window step size as percentages of error size - Issue #102 by @sarahmish
- Organize pipelines into verified and sandbox - Issue #105 by @sarahmish
- Ground truth parameter name enhancement - Issue #114 by @sarahmish
- Add benchmark dataset list and parameters to s3 bucket enhancement - Issue #118 by @sarahmish
0.1.2 - 2020-07-03
New Evaluation sub-package and refactor TadGAN.
- Two bugs when saving signalrun if there is no event detected - Issue #92 by @dyuliu
- File encoding/decoding issues about README.md and HISTORY.md - Issue #88 by @dyuliu
- Fix bottle neck of score_anomaly in Cyclegan primitive - Issue #86 by @dyuliu
- Adjust epoch meaning in Cyclegan primitive - Issue #85 by @sarahmish
- Rename evaluation to benchmark and metrics to evaluation - Issue #83 by @sarahmish
- Scoring function for intervals of size one - Issue #76 by @sarahmish
0.1.1 - 2020-05-11
New class and function based interfaces.
- Implement the Orion Class - Issue #79 by @csala
- Implement new functional interface - Issue #80 by @csala
0.1.0 - 2020-04-23
First Orion release to PyPI: https://pypi.org/project/orion-ml/