Orion is a machine learning library built for data generated by satellites.

These details have been verified by PyPI

Maintainers

csala liudy mit_dai_lab smish

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Natural Language
- English
Programming Language

Project description

An open source project from Data to AI Lab at MIT.

Orion

Orion is a machine learning library built for telemetry data generated by satellites.

License: MIT
Development Status: Pre-Alpha
Homepage: https://github.com/signals-dev/Orion
Documentation: https://signals-dev.github.io/Orion

Overview

Orion is a machine learning library built for telemetry data generated by satellites.

With this data, our interest is to develop techniques to:

identify rare patterns and flag them for expert review.
predict outcomes ahead of time.

The library makes use of a number of automated machine learning tools developed under "The human data interaction project" within the Data to AI Lab at MIT.

With the ready availability of automated machine learning tools, the focus is on:

domain expert interaction with the machine learning system;
learning from minimal labels;
explainability of model outputs;
model audit;
scalability.

Leaderboard

In this repository we maintain an up-to-date leaderboard with the current scoring of the pipelines according to the benchmarking procedure explained in the benchmark documentation.

The benchmark is run on 11 datasets, and we record the number of wins each pipeline has over the ARIMA pipeline. Results obtained during benchmarking, as well as those of previous releases, can be found in the benchmark/results folder as CSV files. Summarized results can also be browsed in the following Google Sheets document as well as the details Google Sheets document.

Pipeline	Outperforms ARIMA
LSTM Dynamic Thresholding	5
Azure	0
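
As a rough illustration of how a "wins over ARIMA" count could be derived, the sketch below compares per-dataset scores against a baseline. The pipeline names, dataset names and score values are made up, the count_wins helper is hypothetical, and treating a tie as "no win" is an assumption; the actual procedure is the one described in the benchmark documentation.

```python
# Made-up scores for illustration only; these are not benchmark results.
SCORES = {
    "arima": {"dataset_a": 0.50, "dataset_b": 0.40, "dataset_c": 0.70},
    "pipeline_x": {"dataset_a": 0.60, "dataset_b": 0.40, "dataset_c": 0.90},
    "pipeline_y": {"dataset_a": 0.10, "dataset_b": 0.30, "dataset_c": 0.20},
}


def count_wins(scores, baseline="arima"):
    """Count, per pipeline, the datasets where it scores above the baseline.

    Assumption: a tie does not count as a win.
    """
    reference = scores[baseline]
    return {
        name: sum(results[dataset] > reference[dataset] for dataset in reference)
        for name, results in scores.items()
        if name != baseline
    }


wins = count_wins(SCORES)
print(wins)
```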

Table of Contents

I. Data Format
II. Orion Pipelines
- II.1 Current Available Pipelines
III. Install
IV. Quickstart
V. Database

Data Format

Input

Orion Pipelines work on time series that are provided as a single table of telemetry observations with two columns:

timestamp: an INTEGER or FLOAT column with the time of the observation in Unix Time Format
value: an INTEGER or FLOAT column with the observed value at the indicated timestamp

This is an example of such a table:

timestamp	value
1222819200	-0.366358
1222840800	-0.394107
1222862400	0.403624
1222884000	-0.362759
1222905600	-0.370746
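
To make the input format concrete, here is a minimal sketch that parses the sample rows above from CSV text, checks that both expected columns are present, and converts the first Unix timestamp to a UTC datetime. The read_signal and to_utc helpers are hypothetical and are not part of Orion; storing the table as comma-separated text is also an assumption.

```python
import csv
import io
from datetime import datetime, timezone

# Sample rows copied from the table above, stored as CSV text.
RAW = """timestamp,value
1222819200,-0.366358
1222840800,-0.394107
1222862400,0.403624
1222884000,-0.362759
1222905600,-0.370746
"""


def read_signal(text):
    """Parse CSV text into a list of (timestamp, value) float pairs.

    Hypothetical helper for illustration; it is not part of Orion.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = set(reader.fieldnames or [])
    if not {"timestamp", "value"} <= columns:
        raise ValueError("expected 'timestamp' and 'value' columns")
    return [(float(row["timestamp"]), float(row["value"])) for row in reader]


def to_utc(timestamp):
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


signal = read_signal(RAW)
first_time = to_utc(signal[0][0])
step_seconds = signal[1][0] - signal[0][0]
print(first_time.isoformat(), step_seconds)
```

In this sample the observations are spaced 21600 seconds (six hours) apart.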

Output

The output of the Orion Pipelines is another table that contains the detected anomalous intervals and that has at least two columns:

start: timestamp where the anomalous interval starts
end: timestamp where the anomalous interval ends

Optionally, a third column called score can be included with a value that represents the severity of the detected anomaly.

An example of such a table is:

start	end	score
1222970400	1222992000	0.572643
1223013600	1223035200	0.572643
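
One way to consume such a table is to flag which observations fall inside a detected interval. The sketch below does this for the two intervals above; the is_anomalous helper and the observation times are hypothetical, and treating both interval ends as inclusive is an assumption rather than documented Orion behavior.

```python
# Detected intervals copied from the example table above: (start, end, score).
ANOMALIES = [
    (1222970400, 1222992000, 0.572643),
    (1223013600, 1223035200, 0.572643),
]


def is_anomalous(timestamp, intervals):
    """Return True if the timestamp lies inside any interval.

    Assumption: both ends of each interval are treated as inclusive.
    """
    return any(start <= timestamp <= end for start, end, _score in intervals)


# Hypothetical observation times, spaced six hours (21600 seconds) apart.
timestamps = [1222948800 + 21600 * i for i in range(6)]
flags = [is_anomalous(t, ANOMALIES) for t in timestamps]
print(list(zip(timestamps, flags)))
```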

Dataset we use in this library

For the development and evaluation of pipelines, we include a dataset containing several satellite telemetry signals already formatted as expected by the Orion Pipelines.

This formatted dataset can be browsed and downloaded directly from the d3-ai-orion AWS S3 Bucket.

This dataset is adapted from the one used for the experiments in the Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding paper. Original source data is available for download here. We thank NASA for making this data available for public use.

Orion Pipelines

The main components of the Orion project are the Orion Pipelines, which are MLBlocks Pipelines specialized in detecting anomalies in time series.

As MLPipeline instances, Orion Pipelines:

consist of a list of one or more MLPrimitives
can be fitted on some data and later on used to predict anomalies on more data
can be scored by comparing their predictions with some known anomalies
have hyperparameters that can be tuned to improve their anomaly detection performance
can be stored as a JSON file that includes all the primitives that compose them, as well as other required configuration options.
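
To illustrate the last point, the sketch below writes a pipeline-like specification to a JSON file and reads it back. It only shows the round trip: the "primitives" and "init_params" key layout is an assumption about the file format, and every primitive name except keras.Sequential.LSTMTimeSeriesRegressor is a placeholder. The real pipeline files live in the orion/pipelines folder.

```python
import json
import os
import tempfile

# Illustrative specification only. The primitive names other than the LSTM
# regressor are placeholders, and the key layout is an assumption.
spec = {
    "primitives": [
        "example.preprocessing.Placeholder",
        "keras.Sequential.LSTMTimeSeriesRegressor",
        "example.postprocessing.Placeholder",
    ],
    "init_params": {
        "keras.Sequential.LSTMTimeSeriesRegressor#1": {"epochs": 5},
    },
}

# Store the specification as a JSON file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example_pipeline.json")
with open(path, "w") as json_file:
    json.dump(spec, json_file, indent=4)

# Load it back to confirm that nothing was lost.
with open(path) as json_file:
    restored = json.load(json_file)

print(restored["primitives"])
```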

Current Available Pipelines

In the Orion Project, the pipelines are included as JSON files, which can be found in the subdirectories inside the orion/pipelines folder.

This is the list of pipelines available so far, which will grow over time:

name	location	description
ARIMA	orion/pipelines/arima	ARIMA based pipeline
LSTM Dynamic Threshold	orion/pipelines/lstm_dynamic_threshold	LSTM based pipeline inspired by the Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding paper
Dummy	orion/pipelines/dummy	Dummy pipeline to showcase the input and output format and the usage of sample primitives
TadGAN	orion/pipelines/tadgan	GAN based pipeline with reconstruction based errors
Azure	orion/pipelines/azure	Azure API for Anomaly Detector

Install

Requirements

Python

Orion has been developed and runs on Python 3.6.

Also, although it is not strictly required, the usage of a virtualenv is highly recommended in order to avoid interfering with other software installed in the system where you are trying to run Orion.

MongoDB

In order to be fully operational, Orion requires having access to a MongoDB database running version 3.6 or higher.

Install with pip

The easiest and recommended way to install Orion is using pip:

pip install orion-ml

This will pull and install the latest stable release from PyPI.

If you want to install from source or contribute to the project, please read the Contributing Guide.

Docker

Even though it is not mandatory, Orion can be distributed and run as a Docker image, which makes it easier to use in offline systems.

For more details please read the Docker Usage Documentation.

Quickstart

The following steps give a short guide on how to run one of the Orion Pipelines on one of the signals from the Demo Dataset.

1. Load the data

In the first step we will load the S-1 signal from the Demo Dataset.

We will do so in two parts, train and test, as we will use the first part to fit the pipeline and the second one to evaluate its performance.

To do so, we need to import the orion.data.load_signal function and call it twice passing the 'S-1-train' and 'S-1-test' names.

from orion.data import load_signal

train_data = load_signal('S-1-train')
test_data = load_signal('S-1-test')

The output will be a table in the format described above:

    timestamp     value
0  1222819200 -0.366359
1  1222840800 -0.394108
2  1222862400  0.403625
3  1222884000 -0.362759
4  1222905600 -0.370746

2. Detect anomalies using Orion

Once we have the data, let us try to use an Orion pipeline to analyze it and search for anomalies.

In order to do so, we will have to create an instance of the orion.Orion class.

from orion import Orion

orion = Orion()

Optionally, we might want to select a pipeline other than the default one or alter the hyperparameters of the underlying MLBlocks pipeline.

For example, let's select the lstm_dynamic_threshold pipeline, reduce the number of training epochs, and increase the verbosity of the LSTM primitive that it uses.

hyperparameters = {
    'keras.Sequential.LSTMTimeSeriesRegressor#1': {
        'epochs': 5,
        'verbose': True
    }
}
orion = Orion(
    pipeline='lstm_dynamic_threshold',
    hyperparameters=hyperparameters
)
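
The hyperparameters dictionary above is keyed by a primitive name followed by a # and a counter, which appears to identify an individual block within the pipeline. The sketch below shows, in plain Python, how such per-block overrides could be layered on top of defaults. Both helpers are hypothetical, the default values are invented for illustration, and this is not how Orion or MLBlocks implement it internally.

```python
def split_block_name(block_name):
    """Split 'primitive.name#N' into the primitive name and its counter."""
    primitive, _, counter = block_name.rpartition("#")
    return primitive, int(counter)


def merge_hyperparameters(defaults, overrides):
    """Return a copy of defaults with the per-block overrides applied on top."""
    merged = {block: dict(params) for block, params in defaults.items()}
    for block, params in overrides.items():
        merged.setdefault(block, {}).update(params)
    return merged


BLOCK = "keras.Sequential.LSTMTimeSeriesRegressor#1"

# Hypothetical default values, chosen only for illustration.
defaults = {BLOCK: {"epochs": 35, "verbose": False, "batch_size": 64}}
overrides = {BLOCK: {"epochs": 5, "verbose": True}}

merged = merge_hyperparameters(defaults, overrides)
primitive, counter = split_block_name(BLOCK)
print(primitive, counter, merged[BLOCK])
```

Only the keys named in the overrides change; every other setting of the block keeps its default.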

Once the pipeline is ready, we can proceed to fit it to our data:

orion.fit(train_data)

Once it is fitted, we are ready to use it to detect anomalies in our data:

anomalies = orion.detect(test_data)

Warning: Depending on your system and the exact versions that you have installed, some warnings may be printed. These can be safely ignored, as they do not interfere with the proper behavior of the pipeline.

The output of the previous command will be a pandas.DataFrame containing a table in the Output format described above:

        start         end     score
0  1394323200  1399701600  0.673494

3. Evaluate the performance of your pipeline

In this next step we will load some already known anomalous intervals and evaluate how good our anomaly detection was by comparing those with our detected intervals.

For this, we will first load the known anomalies for the signal that we are using:

from orion.data import load_anomalies

ground_truth = load_anomalies('S-1')

The output will be a table in the same format as the anomalies one.

        start         end
0  1392768000  1402423200

Afterwards, we can call the Orion.evaluate method, passing both the test data and the ground truth:

scores = orion.evaluate(test_data, ground_truth)

The output will be a pandas.Series containing a collection of scores indicating how good the predictions were:

accuracy     0.988131
f1           0.892193
recall       0.805369
precision    1.000000
dtype: float64
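
The reported f1 is consistent with the harmonic mean of the reported precision and recall. The sketch below checks that relation and also shows one simplified, time-weighted way of scoring detected intervals against known ones. The helper functions are hypothetical and do not reproduce Orion's own accounting, so the recall this scheme yields for the intervals above is not expected to match the reported value.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def overlap_seconds(first, second):
    """Length in seconds of the intersection of two (start, end) intervals."""
    start = max(first[0], second[0])
    end = min(first[1], second[1])
    return max(0, end - start)


def interval_scores(detected, known):
    """Time-weighted precision, recall and F1 for two lists of intervals.

    Assumption: intervals within each list do not overlap one another.
    """
    overlap = sum(overlap_seconds(d, k) for d in detected for k in known)
    detected_total = sum(end - start for start, end in detected)
    known_total = sum(end - start for start, end in known)
    precision = overlap / detected_total if detected_total else 0.0
    recall = overlap / known_total if known_total else 0.0
    return precision, recall, f1_score(precision, recall)


# Relation between the scores reported above.
reported_f1 = f1_score(1.0, 0.805369)

# Intervals copied from the detection output and the known anomalies above.
detected = [(1394323200, 1399701600)]
known = [(1392768000, 1402423200)]
precision, recall, f1 = interval_scores(detected, known)
print(reported_f1, precision, recall, f1)
```

Under this simplified scheme the precision is 1.0 because the detected interval lies entirely inside the known one.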

Database

Orion comes ready to use a MongoDB Database to easily register and explore:

Multiple Datasets based on signals from one or more satellites.
Multiple Pipelines, including historical Pipeline versions.
Pipeline executions on the registered Datasets, including any environment details required to later on reproduce the results.
Pipeline execution results and detected events.
Comments about the detected events.

This, among other things, allows:

Providing visibility about the system usage.
Keeping track of the evolution of the registered pipelines and their performance over multiple datasets.
Visualizing and browsing the detected events by the pipelines using a web application.
Collecting comments from multiple domain experts about the detected events to be able to later on curate the pipelines based on their knowledge.
Reproducing previous executions in identical environments to replicate the obtained results.
Detecting and keeping a history of system failures for later investigation.

The complete Database schema and usage instructions can be found in the database documentation.

History

0.1.4 - 2020-10-16

Minor enhancements to benchmark

Load ground truth before try-catch - Issue #124 by @sarahmish
Converting timestamp to datetime in Azure primitive - Issue #123 by @sarahmish
Benchmark exceptions - Issue #120 by @sarahmish

0.1.3 - 2020-09-29

New benchmark and Azure primitive.

Implement a benchmarking function new feature - Issue #94 by @sarahmish
Add azure anomaly detection as primitive new feature - Issue #97 by @sarahmish
Critic and reconstruction error combination - Issue #99 by @sarahmish
Fixed threshold for find_anomalies - Issue #101 by @sarahmish
Add an option to have window size and window step size as percentages of error size - Issue #102 by @sarahmish
Organize pipelines into verified and sandbox - Issue #105 by @sarahmish
Ground truth parameter name enhancement - Issue #114 by @sarahmish
Add benchmark dataset list and parameters to s3 bucket enhancement - Issue #118 by @sarahmish

0.1.2 - 2020-07-03

New Evaluation sub-package and refactor TadGAN.

Two bugs when saving signalrun if there is no event detected - Issue #92 by @dyuliu
File encoding/decoding issues about README.md and HISTORY.md - Issue #88 by @dyuliu
Fix bottle neck of score_anomaly in Cyclegan primitive - Issue #86 by @dyuliu
Adjust epoch meaning in Cyclegan primitive - Issue #85 by @sarahmish
Rename evaluation to benchmark and metrics to evaluation - Issue #83 by @sarahmish
Scoring function for intervals of size one - Issue #76 by @sarahmish

0.1.1 - 2020-05-11

New class and function based interfaces.

Implement the Orion Class - Issue #79 by @csala
Implement new functional interface - Issue #80 by @csala

0.1.0 - 2020-04-23

First Orion release to PyPI: https://pypi.org/project/orion-ml/

Release history

version	released
0.6.1	Oct 4, 2024
0.6.1.dev0 pre-release	Oct 4, 2024
0.6.0	Feb 13, 2024
0.5.3.dev1 pre-release	Feb 12, 2024
0.5.3.dev0 pre-release	Nov 10, 2023
0.5.2	Oct 19, 2023
0.5.2.dev0 pre-release	Oct 19, 2023
0.5.1	Aug 17, 2023
0.5.1.dev1 pre-release	Aug 17, 2023
0.5.1.dev0 pre-release	Aug 17, 2023
0.5.0	May 23, 2023
0.4.2.dev0 pre-release	May 22, 2023
0.4.1	Jan 31, 2023
0.4.1.dev0 pre-release	Jan 4, 2023
0.4.0	Nov 10, 2022
0.3.3.dev0 pre-release	Nov 8, 2022
0.3.2	Jul 4, 2022
0.3.2.dev0 pre-release	Jul 1, 2022
0.3.1	Apr 27, 2022
0.3.1.dev0 pre-release	Apr 20, 2022
0.3.0	Mar 31, 2022
0.2.2.dev0 pre-release	Mar 31, 2022
0.2.1	Feb 18, 2022
0.2.1.dev0 pre-release	Feb 18, 2022
0.2.0	Oct 11, 2021
0.1.8.dev0 pre-release	Oct 11, 2021
0.1.7	May 4, 2021
0.1.7.dev0 pre-release	May 4, 2021
0.1.6	Mar 8, 2021
0.1.6.dev0 pre-release	Mar 8, 2021
0.1.5	Dec 25, 2020
0.1.5.dev0 pre-release	Dec 25, 2020
0.1.4 (this version)	Oct 16, 2020
0.1.4.dev0 pre-release	Oct 16, 2020
0.1.3	Sep 29, 2020
0.1.3.dev1 pre-release	Sep 29, 2020
0.1.3.dev0 pre-release	Sep 28, 2020
0.1.2	Jul 3, 2020
0.1.2.dev0 pre-release	Jul 3, 2020
0.1.1	May 11, 2020
0.1.1.dev0 pre-release	May 11, 2020
0.1.0	Apr 23, 2020
0.1.0.dev0 pre-release	Apr 23, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

orion-ml-0.1.4.tar.gz (192.5 kB)

Uploaded Oct 16, 2020 Source

Built Distribution

orion_ml-0.1.4-py2.py3-none-any.whl (84.1 kB)

Uploaded Oct 16, 2020 Python 2 Python 3

File details

Details for the file orion-ml-0.1.4.tar.gz.

File metadata

Download URL: orion-ml-0.1.4.tar.gz
Upload date: Oct 16, 2020
Size: 192.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.3.1.post20200622 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.6.10

File hashes

Hashes for orion-ml-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`31c5c0b0b364efc4beb2ee123c1c30966790297de1549d7e3c9233b0ff199852`
MD5	`8a2147cab2aa5d0dbac73fc9dc6d705f`
BLAKE2b-256	`f020171a8af9d23330be6c1e8fe44e47d6f21cc92e30453ce74d069ce32b39a6`

See more details on using hashes here.

File details

Details for the file orion_ml-0.1.4-py2.py3-none-any.whl.

File metadata

Download URL: orion_ml-0.1.4-py2.py3-none-any.whl
Upload date: Oct 16, 2020
Size: 84.1 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.3.1.post20200622 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.6.10

File hashes

Hashes for orion_ml-0.1.4-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`a0154ef298ed62703aa109eb7e119a319204726f5b1704f6fe9c985d171116e7`
MD5	`0a8d9335a0e0cab974d800230bc50ad4`
BLAKE2b-256	`421390645657a2bfe6a440c2828162248491e3a53b4bbdc5bf8f02190cb637f7`
