Orion is a machine learning library built for data generated by satellites.
An open source project from Data to AI Lab at MIT.
Orion
Orion is a machine learning library built for telemetry data generated by satellites.
- License: MIT
- Development Status: Pre-Alpha
- Homepage: https://github.com/D3-AI/Orion
Overview
Orion is a machine learning library built for telemetry data generated by satellites.
With this data, our interest is to develop techniques to:
- identify rare patterns and flag them for expert review.
- predict outcomes ahead of time.
The library makes use of a number of automated machine learning tools developed under "The human data interaction project" within the Data to AI Lab at MIT.
With the ready availability of automated machine learning tools, the focus is on:
- domain expert interaction with the machine learning system;
- learning from minimal labels;
- explainability of model outputs;
- model audit;
- scalability;
Data Format
Input
Orion Pipelines work on time series that are provided as a single table of telemetry observations with two columns:
- timestamp: an INTEGER or FLOAT column with the time of the observation in Unix Time Format
- value: an INTEGER or FLOAT column with the observed value at the indicated timestamp
This is an example of such a table:
timestamp | value |
---|---|
1222819200 | -0.366358 |
1222840800 | -0.394107 |
1222862400 | 0.403624 |
1222884000 | -0.362759 |
1222905600 | -0.370746 |
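As a minimal sketch of the input format, the example table can be built and checked with pandas. The `check_input_format` helper is hypothetical and is not part of Orion; it only illustrates the two-column requirement described above.

```python
import pandas as pd

# Sample rows copied from the example table above.
signal = pd.DataFrame({
    'timestamp': [1222819200, 1222840800, 1222862400, 1222884000, 1222905600],
    'value': [-0.366358, -0.394107, 0.403624, -0.362759, -0.370746],
})


def check_input_format(df):
    """Hypothetical helper: verify that a table matches the input format."""
    missing = {'timestamp', 'value'} - set(df.columns)
    if missing:
        raise ValueError('missing columns: {}'.format(sorted(missing)))
    for column in ('timestamp', 'value'):
        if not pd.api.types.is_numeric_dtype(df[column]):
            raise ValueError('{} must be numeric'.format(column))
    return True


check_input_format(signal)

# Unix timestamps count seconds since 1970-01-01 UTC, so they can be
# converted to datetimes for inspection.
readable = pd.to_datetime(signal['timestamp'], unit='s')

# Spacing between consecutive observations, in seconds.
step_seconds = signal['timestamp'].diff().dropna().unique()
```

In this sample the first observation falls on 2008-10-01 00:00:00 UTC and consecutive observations are 21600 seconds (6 hours) apart.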
Output
The output of the Orion Pipelines is another table that contains the detected anomalous intervals and that has at least two columns:
- start: timestamp where the anomalous interval starts
- end: timestamp where the anomalous interval ends
Optionally, a third column called score can be included with a value that represents the severity of the detected anomaly.
An example of such a table is:
start | end | score |
---|---|---|
1222970400 | 1222992000 | 0.572643 |
1223013600 | 1223035200 | 0.572643 |
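To make the meaning of the intervals concrete, the sketch below marks which observations of a signal fall inside any detected interval. The `flag_anomalous` helper is hypothetical, and treating both interval bounds as inclusive is an assumption rather than documented Orion behavior.

```python
import pandas as pd

# Intervals copied from the example output table above.
anomalies = pd.DataFrame({
    'start': [1222970400, 1223013600],
    'end': [1222992000, 1223035200],
    'score': [0.572643, 0.572643],
})

# Illustrative observation times, one every 6 hours (21600 s),
# starting before the first interval and ending after the second.
timestamps = pd.Series(range(1222948800, 1223056801, 21600), name='timestamp')


def flag_anomalous(timestamps, intervals):
    """Return a boolean Series, True where a timestamp lies in any interval.

    Both interval bounds are treated as inclusive (an assumption).
    """
    flags = pd.Series(False, index=timestamps.index)
    for row in intervals.itertuples(index=False):
        flags |= timestamps.between(row.start, row.end)
    return flags


flags = flag_anomalous(timestamps, anomalies)
```

Here the first and last observations lie outside both intervals, so only the four observations in between are flagged.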
Dataset we use in this library
For the development and evaluation of pipelines, we include a dataset of several satellite telemetry signals already formatted as expected by the Orion Pipelines.
This formatted dataset can be browsed and downloaded directly from the d3-ai-orion AWS S3 Bucket.
This dataset is adapted from the one used for the experiments in the Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding paper. Original source data is available for download here. We thank NASA for making this data available for public use.
Orion Pipelines
The main components of the Orion project are the Orion Pipelines, which consist of MLBlocks Pipelines specialized in detecting anomalies in time series.
As MLPipeline instances, Orion Pipelines:
- consist of a list of one or more MLPrimitives
- can be fitted on some data and later used to predict anomalies on more data
- can be scored by comparing their predictions with some known anomalies
- have hyperparameters that can be tuned to improve their anomaly detection performance
- can be stored as a JSON file that includes all the primitives that compose them, as well as other required configuration options.
Currently Available Pipelines
In the Orion Project, the pipelines are included as JSON files, which can be found inside the orion/pipelines folder.
This is the list of pipelines available so far, which will grow over time:
name | location | description |
---|---|---|
Dummy | orion/pipelines/dummy.json | Dummy Pipeline to showcase the input and output format and the usage of sample primitives |
LSTM Dynamic Threshold | orion/pipelines/lstm_dynamic_threshold.json | LSTM Based pipeline inspired by the Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding paper |
Mean 24h LSTM | orion/pipelines/mean_24h_lstm.json | LSTM Based pipeline with 24h mean aggregation preprocessing |
Median 24h LSTM | orion/pipelines/median_24h_lstm.json | LSTM Based pipeline with 24h median aggregation preprocessing |
Sum 24h LSTM | orion/pipelines/sum_24h_lstm.json | LSTM Based pipeline with 24h sum aggregation preprocessing |
Skew 24h LSTM | orion/pipelines/skew_24h_lstm.json | LSTM Based pipeline with 24h skew aggregation preprocessing |
CycleGAN | orion/pipelines/cyclegan.json | CycleGAN Based pipeline |
ARIMA | orion/pipelines/arima.json | ARIMA Based pipeline |
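Judging by their descriptions, the 24h LSTM variants preprocess the signal by summarizing the observations of each day with a different statistic. The following is a minimal sketch of such an aggregation with pandas; `aggregate_24h` is a hypothetical helper written for illustration and is not the primitive used inside the pipelines.

```python
import pandas as pd


def aggregate_24h(df, method='mean'):
    """Aggregate a timestamp/value table into 24-hour bins.

    Illustrative helper, not the primitive used by the Orion pipelines.
    `method` can be any pandas aggregation name, such as 'mean',
    'median', 'sum' or 'skew'.
    """
    day = 24 * 60 * 60  # seconds in one day
    # Round every timestamp down to the start of its 24-hour bin.
    bins = (df['timestamp'] // day) * day
    aggregated = df.groupby(bins)['value'].agg(method)
    return aggregated.reset_index()


# Sample rows from the Data Format section: four observations on the
# first day and one on the second.
signal = pd.DataFrame({
    'timestamp': [1222819200, 1222840800, 1222862400, 1222884000, 1222905600],
    'value': [-0.366358, -0.394107, 0.403624, -0.362759, -0.370746],
})

daily_mean = aggregate_24h(signal, 'mean')
daily_sum = aggregate_24h(signal, 'sum')
```

Each output row keeps the timestamp of the start of its bin, so the result is still a table in the timestamp/value input format.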
Leaderboard
In this repository we maintain an up-to-date leaderboard with the current scores of the pipelines, computed according to the benchmarking procedure explained in the benchmark documentation.
pipeline | accuracy | f1 | precision | recall |
---|---|---|---|---|
CycleGAN | 0.781147 | 0.137234 | 0.147674 | 0.18173 |
LSTM Dynamic Thresholding | 0.832052 | 0.125999 | 0.178968 | 0.151298 |
Dummy | 0.818975 | 0.108436 | 0.13994 | 0.133865 |
Mean 24h LSTM | 0.667412 | 0.0420656 | 0.0775713 | 0.0456106 |
Sum 24h LSTM | 0.685844 | 0.0417817 | 0.066248 | 0.033882 |
ARIMA | 0.510343 | 0.038821 | 0.0604475 | 0.0377441 |
Median 24h LSTM | 0.673667 | 0.0237867 | 0.0604165 | 0.0178578 |
Skew 24h LSTM | 0.369548 | 0.01142 | 0.0213837 | 0.00902504 |
Getting Started
Requirements
Python
Orion has been developed and runs on Python 3.6.
Although it is not strictly required, using a virtualenv is highly recommended to avoid interfering with other software installed on the system where you run Orion.
MongoDB
In order to be fully operational, Orion requires having access to a MongoDB database running version 3.6 or higher.
Install
The easiest and recommended way to install Orion is using pip:
pip install orion-ml
This will pull and install the latest stable release from PyPI.
If you want to install from source or contribute to the project please read the Contributing Guide.
Docker
Even though it is not mandatory, Orion can be distributed and run as a Docker image, which makes it easier to use on offline systems.
For more details please read the Docker Usage Documentation.
Quickstart
The following steps show how to run one of the Orion Pipelines on one of the signals from the Demo Dataset.
1. Load the data
In the first step we will load the S-1 signal from the Demo Dataset.
We will do so in two parts, train and test, as we will use the first part to fit the pipeline and the second one to evaluate its performance.
To do so, we need to import the orion.data.load_signal function and call it twice, passing the 'S-1-train' and 'S-1-test' names.
from orion.data import load_signal
train = load_signal('S-1-train')
test = load_signal('S-1-test')
The output will be a table in the format described above:
timestamp value
0 1222819200 -0.366359
1 1222840800 -0.394108
2 1222862400 0.403625
3 1222884000 -0.362759
4 1222905600 -0.370746
2. Detect anomalies using a pipeline
Once we have the data, let us try to use the LSTM pipeline to analyze it and search for anomalies.
To do so, we will import the orion.analysis.analyze function and pass it the train and test dataframes and the name of the pipeline that we want to use:
from orion.analysis import analyze
anomalies = analyze(
    pipeline='lstm_dynamic_threshold',
    train=train,
    test=test
)
NOTE: Depending on your system and the exact versions that you might have installed some WARNINGS may be printed. These can be safely ignored as they do not interfere with the proper behavior of the pipeline.
The output of the previous command will be a pandas.DataFrame containing a table in the Output format described above:
start end score
0 1394323200 1399701600 0.673494
3. Evaluate performance
In this next step we will load some already known anomalous intervals and evaluate how good our anomaly detection was by comparing those with our detected intervals.
For this, we will first load the known anomalies for the signal that we are using:
from orion.data import load_anomalies
known_anomalies = load_anomalies('S-1')
The output will be a table in the same format as the anomalies one.
start end
0 1392768000 1402423200
Afterwards, we pass the ground truth, the detected anomalies and the original test data to the orion.metrics.accuracy_score and orion.metrics.f1_score functions in order to compute a score that indicates how good our anomaly detection was:
from orion.metrics import accuracy_score, f1_score
accuracy_score(known_anomalies, anomalies, test) # -> 0.972987721691678
f1_score(known_anomalies, anomalies, test) # -> 0.7155172413793103
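One way to reason about scores like these is to label every timestamp of the test data as anomalous or normal, once from the known intervals and once from the detected ones, and then count agreements. The sketch below does this on toy data. It is not the implementation behind orion.metrics: the helper names are hypothetical, and the inclusive handling of interval bounds is an assumption, so results may differ from those of the library.

```python
import pandas as pd


def label_timestamps(timestamps, intervals):
    """Boolean Series, True where a timestamp lies in any [start, end]."""
    labels = pd.Series(False, index=timestamps.index)
    for start, end in zip(intervals['start'], intervals['end']):
        labels |= timestamps.between(start, end)  # inclusive bounds (assumed)
    return labels


def pointwise_scores(expected, observed, data):
    """Hypothetical helper: accuracy and F1 over per-timestamp labels."""
    truth = label_timestamps(data['timestamp'], expected)
    predicted = label_timestamps(data['timestamp'], observed)

    tp = int((truth & predicted).sum())
    fp = int((~truth & predicted).sum())
    fn = int((truth & ~predicted).sum())
    tn = int((~truth & ~predicted).sum())

    accuracy = (tp + tn) / len(data)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {'accuracy': accuracy, 'f1': f1}


# Toy data: ten observations, one known and one detected interval
# that overlap on two observations.
toy_data = pd.DataFrame({'timestamp': range(10), 'value': [0.0] * 10})
toy_known = pd.DataFrame({'start': [2], 'end': [5]})
toy_detected = pd.DataFrame({'start': [4], 'end': [7]})

scores = pointwise_scores(toy_known, toy_detected, toy_data)
```

In the toy case there are 2 true positives, 2 false positives, 2 false negatives and 4 true negatives, which gives an accuracy of 0.6 and an F1 of 0.5.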
Database
Orion comes ready to use a MongoDB Database to easily register and explore:
- Multiple Datasets based on signals from one or more satellites.
- Multiple Pipelines, including historical Pipeline versions.
- Pipeline executions on the registered Datasets, including any environment details required to later on reproduce the results.
- Pipeline execution results and detected events.
- Comments about the detected events.
This, among other things, allows:
- Providing visibility about the system usage.
- Keeping track of the evolution of the registered pipelines and their performance over multiple datasets.
- Visualizing and browsing the detected events by the pipelines using a web application.
- Collecting comments from multiple domain experts about the detected events to be able to later on curate the pipelines based on their knowledge.
- Reproducing previous executions in identical environments to replicate the obtained results.
- Detecting and keeping a history of system failures for later investigation.
The complete Database schema and usage instructions can be found in the database documentation.
History
0.1.0
- First release.