flight-ad

flight-ad is a Python package for anomaly detection in the aviation domain built on top of scikit-learn.

It provides:

  • An implementation of an anomaly detection pipeline;
  • A DataBinder object for loading and transforming the data within the pipeline on the fly;
  • A DataWrangler object for building a data wrangling pipeline;
  • A StatisticalLearner object for binding scikit-learn's pipelines and integrating them into the anomaly detection workflow;
  • Visualization tools for assessing potential anomalies;
  • Reporting tools for analyzing results;
  • Sample airplane sensor data, repackaged from NASA's DASHlink for the purpose of evaluating and advancing data mining capabilities that can be used to promote aviation safety;
  • Adaptations of machine learning algorithms, such as a DBSCAN implementation that calculates the hyperparameter epsilon from the input data.
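The last item mentions deriving DBSCAN's epsilon from the input data. The source does not say how flight-ad does this, so the following is only a minimal sketch of one common heuristic (a high quantile of the k-nearest-neighbor distances), written against scikit-learn's own DBSCAN. The function name `estimate_eps` and the `quantile` parameter are assumptions made for illustration and are not part of flight-ad.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors


def estimate_eps(X, min_samples=5, quantile=0.95):
    """Hypothetical helper: pick eps as a quantile of the k-distance curve."""
    # Querying the training data returns each point as its own nearest
    # neighbor (distance 0), which matches DBSCAN counting the point itself
    # toward min_samples.
    nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
    distances, _ = nn.kneighbors(X)
    k_distances = distances[:, -1]  # distance to the farthest of the k neighbors
    return float(np.quantile(k_distances, quantile))


# Synthetic data: one dense cluster plus three isolated points.
rng = np.random.default_rng(0)
dense = rng.normal(loc=0.0, scale=0.5, size=(200, 2))
outliers = np.array([[8.0, 8.0], [-9.0, 7.0], [10.0, -10.0]])
X = np.vstack([dense, outliers])

eps = estimate_eps(X)
labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
n_noise = int(np.sum(labels == -1))  # DBSCAN marks noise points with -1
```

A lower quantile gives a smaller epsilon and flags more points as noise, so the quantile replaces epsilon as the value to tune.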

Installation

The easiest way to install flight-ad is with pip, from inside a virtual environment.

Directly from GitHub:

pip install git+https://github.com/coelhosilva/flight-ad.git

Examples

The following sample builds an anomaly detection pipeline with the package. Note that the sample dataset may take up roughly 1 GB of disk space.

from flight_ad.datasets import load_dashlink_bindings
from flight_ad.utils.data import DataBinder
from flight_ad.wrangling import DataWrangler
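# NOTE: wrangling_functions is not imported from flight_ad; it appears to be a
# local module that you supply with your own wrangling steps.
from wrangling_functions import preprocess, change_col, resample, select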
from flight_ad.transformations import reshape_df_interspersed
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from flight_ad.cluster import DBSCAN
from flight_ad.learn import FunctionTransformer
from flight_ad.learn import StatisticalLearner
from flight_ad.pipeline import AnomalyDetectionPipeline
from flight_ad.report import clustering_info, silhouette

# Binder
data_bindings = load_dashlink_bindings(download=True)
binder = DataBinder(data_bindings)

# Wrangler
wrangling_steps = [
    ('preprocess_flight', preprocess),
    ('resample_dataframe', resample),
    ('change_col', change_col),
    ('select_col', select),
]
wrangler = DataWrangler(wrangling_steps, memorize='change_col')

# Learner
learning_steps = {
    'preprocessing': [
        ('reshaper', FunctionTransformer(reshape_df_interspersed)),
        ('scaler', StandardScaler()),
        ('pca', PCA())
    ],
    'training': [
        ('dbscan', DBSCAN())
    ]
}
learner = StatisticalLearner(learning_steps, record='pca')

# Pipeline
ad_pipeline = AnomalyDetectionPipeline(binder, wrangler, learner)
ad_pipeline.fit()

# Results
labels, n_clusters, n_noise = clustering_info(learner.pipeline['dbscan'])
avg_silhouette, _ = silhouette(learner.partial_data['pca'], labels)
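The example imports preprocess, resample, change_col and select from a wrangling_functions module that is not shown. As a rough sketch only, such a module could look like the code below. It assumes that each step takes a pandas DataFrame indexed by time and returns a DataFrame, and that the steps are applied in the listed order. Neither assumption is confirmed by the source, and the column names and function bodies are hypothetical.

```python
import pandas as pd


def preprocess(df):
    """Hypothetical step: drop fully empty rows and order samples by time."""
    return df.dropna(how="all").sort_index()


def resample(df, period=pd.Timedelta(seconds=1)):
    """Hypothetical step: average the samples that fall in each period."""
    return df.resample(period).mean()


def change_col(df):
    """Hypothetical step: normalize column names to lower case."""
    return df.rename(columns=str.lower)


def select(df, columns=("altitude", "airspeed")):
    """Hypothetical step: keep only the columns used for learning."""
    return df[list(columns)]


# Same (name, function) layout as wrangling_steps in the example above.
steps = [
    ("preprocess_flight", preprocess),
    ("resample_dataframe", resample),
    ("change_col", change_col),
    ("select_col", select),
]

# Made-up sensor readings sampled every 500 ms.
flight = pd.DataFrame(
    {
        "ALTITUDE": [1000.0, 1010.0, 1020.0, 1030.0],
        "AIRSPEED": [250.0, 252.0, 254.0, 256.0],
        "UNUSED": [0.0, 0.0, 0.0, 0.0],
    },
    index=pd.date_range(
        "2020-01-01", periods=4, freq=pd.Timedelta(milliseconds=500)
    ),
)

# Apply the steps in order, which is what a wrangling pipeline is assumed to do.
for _, func in steps:
    flight = func(flight)
```

Small single-purpose functions like these can be tested on a toy DataFrame before being handed to DataWrangler.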

Package structure

TBD.

Dependencies

flight-ad requires:

  • Python (>=3.6)
  • NumPy
  • pandas
  • scikit-learn
  • matplotlib
  • tqdm

Contributions

We welcome and encourage new contributors to help test flight-ad and add new functionality. Any input, feedback, bug report or contribution is welcome.

To contact the author, email coelho@ita.br.

Citation

If you use flight-ad in a scientific publication, we would appreciate citations.

BibTeX: TBD.

Citation string: TBD.
