Tools for analyzing trending topics

Moda

Models and evaluation framework for trending topics detection and anomaly detection.

Moda provides an interface for evaluating models on either univariate or multi-category time-series datasets. It also lets the user add new models through a scikit-learn style API. All models provided in Moda were adapted to the multi-category scenario by wrapping a univariate model so that it runs on multiple categories. Models can be evaluated using either a train/test split or time-series cross-validation.
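The wrapping idea described above can be illustrated with a minimal sketch. It is not Moda's actual implementation: `ThresholdDetector` and `PerCategoryWrapper` are hypothetical names, and the index level names `date` and `category` and the `value` column are assumptions made for the example.

```python
import pandas as pd


class ThresholdDetector:
    """Hypothetical univariate model: flags values above a fixed threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def fit(self, series):
        # Nothing to learn for a fixed threshold; kept for the fit/predict shape.
        return self

    def predict(self, series):
        return (series > self.threshold).astype(int)


class PerCategoryWrapper:
    """Hypothetical wrapper: fits and runs one univariate model per category."""

    def __init__(self, make_model):
        self.make_model = make_model
        self.models_ = {}

    def fit(self, data):
        for category, group in data.groupby(level="category"):
            self.models_[category] = self.make_model().fit(group["value"])
        return self

    def predict(self, data):
        parts = [
            self.models_[category].predict(group["value"])
            for category, group in data.groupby(level="category")
        ]
        # Reassemble the per-category predictions into one MultiIndex frame.
        return pd.concat(parts).sort_index().to_frame("prediction")


# Toy data indexed by (date, category).
index = pd.MultiIndex.from_product(
    [pd.date_range("2008-01-01", periods=3, freq="D"), ["a", "b"]],
    names=["date", "category"],
)
data = pd.DataFrame({"value": [1, 2, 50, 3, 2, 1]}, index=index)

wrapper = PerCategoryWrapper(lambda: ThresholdDetector(threshold=10)).fit(data)
prediction = wrapper.predict(data)
```

Passing a factory (`make_model`) rather than a single instance keeps each category's model independent, so state learned for one category cannot leak into another.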

Usage

Turning an items dataset into a Moda dataset:

Moda uses a pandas MultiIndex to hold the datestamp and category. All models have been adapted to accept such a structure.

import pandas as pd
from moda.dataprep import raw_to_ts, ts_to_range

DATAPATH = "example/SF_data/SF311-2008.csv"
# The full dataset can be downloaded from here: https://data.sfgov.org/City-Infrastructure/311-Cases/vw6y-z8j6/data
TIME_RANGE = "24H" # Aggregate all events in the raw data into 24 hour intervals

# Read raw file
raw = pd.read_csv(DATAPATH)

# Turn the raw data into a time series (with date as a pandas DatetimeIndex)
ts = raw_to_ts(raw)

# Aggregate items per time and category, given a time interval
ranged_ts = ts_to_range(ts, time_range=TIME_RANGE)
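To make the target structure concrete, here is a rough pandas-only sketch of counting items per time interval and category. It does not reproduce `raw_to_ts` or `ts_to_range`. The column names `date`, `category`, and `value` and the toy rows are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for raw items; the column names are assumptions.
raw = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2008-01-01 08:00",
                "2008-01-01 17:30",
                "2008-01-01 09:15",
                "2008-01-02 11:00",
            ]
        ),
        "category": ["graffiti", "graffiti", "noise", "graffiti"],
    }
)

# Count items per (daily interval, category); the result is indexed by a
# two-level MultiIndex of date and category.
counts = (
    raw.groupby([pd.Grouper(key="date", freq="D"), "category"])
    .size()
    .to_frame("value")
)
```

Each row of `counts` then holds the number of items for one category in one interval, which is the shape the models above are described as accepting.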

Run a model:

Run one model, and extract metrics using a manually labeled set

from moda.evaluators import get_metrics_for_all_categories, get_final_metrics
from moda.dataprep import read_data
from moda.models import STLTrendinessDetector

# Load a labeled dataset (the same file used in the evaluation example below)
dataset = read_data("datasets/SF24H_labeled.csv")

model = STLTrendinessDetector(freq='24H',
                              min_value=10,
                              anomaly_type='residual',
                              num_of_std=3, lo_delta=0)

# Take the entire time series and evaluate anomalies on all of it or just the last window(s)
prediction = model.predict(dataset)
raw_metrics = get_metrics_for_all_categories(dataset[['value']], prediction[['prediction']], dataset[['label']],
                                             window_size_for_metrics=1)
metrics = get_final_metrics(raw_metrics)

# Plot results for each category
model.plot(labels=dataset['label'])

Model evaluation

Example for a train/test split and evaluation

from moda.evaluators import get_metrics_for_all_categories, get_final_metrics
from moda.dataprep import read_data
from moda.models import STLTrendinessDetector

dataset = read_data("datasets/SF24H_labeled.csv")
print(dataset.head())

model = STLTrendinessDetector(freq='24H', 
                              min_value=10,
                              anomaly_type='residual',
                              num_of_std=3, lo_delta=0)

# Take the entire time series and evaluate anomalies on all of it or just the last window(s)
prediction = model.predict(dataset)
raw_metrics = get_metrics_for_all_categories(dataset[['value']], prediction[['prediction']], dataset[['label']],
                                             window_size_for_metrics=1)
metrics = get_final_metrics(raw_metrics)
print('f1 = {}'.format(metrics['f1']))
print('precision = {}'.format(metrics['precision']))
print('recall = {}'.format(metrics['recall']))

# Plot results for each category
# model.plot(labels=dataset['label'])
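For readers unfamiliar with the reported metrics, the following is a minimal point-wise sketch of precision, recall, and F1 on binary labels. It is not how `get_metrics_for_all_categories` computes them (in particular, `window_size_for_metrics` is not modeled), and the function name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(labels, predictions):
    """Point-wise metrics for binary labels and predictions (1 = anomaly)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)

    # Guard against division by zero when nothing was predicted or labeled.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


labels = [0, 1, 0, 1, 1, 0]
predictions = [0, 1, 1, 0, 1, 0]
metrics = precision_recall_f1(labels, predictions)
```

In this toy case there are two true positives, one false positive, and one false negative, so all three metrics equal 2/3.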

Models currently included:

  1. Moving average based seasonality decomposition (MA adapted for trendiness detection)

A wrapper around statsmodels' seasonal_decompose. This naive decomposition uses a moving average to remove the trend and a convolution filter to detect seasonality; what remains is a time series of residuals. To detect anomalies and interesting trends in the time series, we look for outliers in the decomposed trend series and in the residual series. A point is considered an outlier if its value exceeds a given number of standard deviations of the historical values in a previous window. We evaluated four policies for trendiness prediction: (1) residual anomaly only, (2) trend anomaly only, (3) residual OR trend anomaly, and (4) residual AND trend anomaly. Figure 6 shows an example of this method and how anomalies are detected.
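The outlier rule and the four policies can be sketched with the standard library alone. This is an illustration under stated assumptions, not Moda's code: the threshold is assumed to be the mean of the previous window plus `num_of_std` sample standard deviations, the decomposition step is skipped (the trend and residual series are given as plain lists), and the function names and the policy labels `"trend"`, `"or"`, and `"and"` are hypothetical.

```python
from statistics import mean, stdev


def rolling_outliers(values, window, num_of_std):
    """Flag points above mean + num_of_std * std of the previous window.

    Assumes window >= 2 so that a sample standard deviation exists.
    The first `window` points have no full history and are never flagged.
    """
    flags = [False] * len(values)
    for i in range(window, len(values)):
        history = values[i - window:i]
        threshold = mean(history) + num_of_std * stdev(history)
        flags[i] = values[i] > threshold
    return flags


def combine(residual_flags, trend_flags, policy):
    """Combine residual and trend flags according to one of four policies."""
    if policy == "residual":
        return list(residual_flags)
    if policy == "trend":
        return list(trend_flags)
    if policy == "or":
        return [r or t for r, t in zip(residual_flags, trend_flags)]
    if policy == "and":
        return [r and t for r, t in zip(residual_flags, trend_flags)]
    raise ValueError("unknown policy: {}".format(policy))


# Toy series standing in for the output of a decomposition.
residual = [10, 11, 9, 10, 11, 10, 50, 10]
trend = [5.0, 5.1, 5.0, 5.2, 5.1, 5.0, 5.1, 5.2]

residual_flags = rolling_outliers(residual, window=5, num_of_std=3)
trend_flags = rolling_outliers(trend, window=5, num_of_std=3)

either = combine(residual_flags, trend_flags, "or")
both = combine(residual_flags, trend_flags, "and")
```

Here only the spike of 50 in the residual series is flagged, so the OR policy reports one anomaly while the AND policy reports none.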

  2. Seasonality and trend decomposition using Loess (Adapted STL)

STL uses iterative Loess smoothing to obtain an estimate of the trend, then applies Loess smoothing again to extract a changing additive seasonal component. It can handle any type of seasonality, and the seasonal component can change over time. We use the same anomaly detection mechanism as for the moving-average based seasonal decomposition. This model is a wrapper around STLDecompose (https://github.com/jrmontag/STLDecompose).

  3. Azure anomaly detector

Uses the Azure Anomaly Detector cognitive service as a black box for detecting anomalies. The service provides an upper bound that can be used to estimate the degree of anomaly.

  4. Twitter

A wrapper around Twitter's AnomalyDetection package (https://github.com/Marcnuth/AnomalyDetection).

  5. LSTMs

Trains a forecasting LSTM model and compares the predicted value at time t with the actual value at time t. The difference is then assessed by comparing it to the standard deviation of previous differences.
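The comparison step can be sketched independently of the forecasting model. The example below assumes the forecasts are already available as a list (no LSTM is trained here) and scores each forecast error by the sample standard deviation of the errors in a preceding window. The function name `forecast_error_scores`, the window length, and the toy numbers are hypothetical.

```python
from statistics import stdev


def forecast_error_scores(actual, predicted, window):
    """Score each forecast error against the spread of previous errors.

    Returns None where fewer than two previous errors exist or where the
    previous errors have zero spread.
    """
    errors = [a - p for a, p in zip(actual, predicted)]
    scores = []
    for t, err in enumerate(errors):
        history = errors[max(0, t - window):t]
        if len(history) < 2:
            scores.append(None)
            continue
        spread = stdev(history)
        scores.append(err / spread if spread > 0 else None)
    return scores


actual = [10, 12, 11, 13, 30]
predicted = [11, 11, 12, 12, 12]  # stand-in for model forecasts

scores = forecast_error_scores(actual, predicted, window=4)

# A point could then be flagged when its score exceeds a chosen number of
# standard deviations, for example 3.
flags = [s is not None and s > 3 for s in scores]
```

In this toy series the last point, where the actual value jumps far above the forecast, is the only one flagged.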
