PyCatcher

Outlier Detection for Time-series Data

This package identifies outlier(s) for a given time-series dataset in simple steps. It supports day, week, month and quarter level time-series data.

Installation

pip install pycatcher

Basic Requirements

  • PyCatcher expects a Pandas DataFrame as input to its outlier detection methods. It can convert a Spark DataFrame to a Pandas DataFrame at the data processing stage.
  • To detect outliers with the Seasonal Decomposition algorithms, the first column of the DataFrame must be a time period column (date in 'YYYY-MM-DD', month in 'YYYY-MM', or year in 'YYYY' format) and the last column must be numeric (a sum or total count for the time period).
  • To detect outliers with the Interquartile Range (IQR) and Moving Average algorithms, the last column must be numeric.
  • At present, PyCatcher does not depend on labeled observations (ground truth). Outliers are detected solely by the underlying algorithms (for example, seasonal-trend decomposition and dispersion methods such as MAD or Z-Score).
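
As an illustration of the layout described above, the following sketch builds a small daily DataFrame and checks it against the stated requirements. The column names, the sample values, and the `check_layout` helper are hypothetical and are not part of PyCatcher.

```python
import pandas as pd

# Hypothetical daily data: time period column first, numeric column last.
df = pd.DataFrame(
    {
        "date": ["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04"],
        "total_count": [120, 135, 128, 940],
    }
)


def check_layout(frame: pd.DataFrame, period_format: str = "%Y-%m-%d") -> bool:
    """Return True if the first column parses with period_format
    and the last column is numeric (illustrative helper only)."""
    first, last = frame.columns[0], frame.columns[-1]
    # Values that do not match the format become NaT instead of raising.
    parsed = pd.to_datetime(frame[first], format=period_format, errors="coerce")
    return bool(parsed.notna().all()) and pd.api.types.is_numeric_dtype(frame[last])


print(check_layout(df))
```

For monthly data the same check could be run with `period_format="%Y-%m"`.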

Summary of features

PyCatcher provides an efficient solution for detecting anomalies in time-series data using various statistical methods. Below are the available techniques for anomaly detection, each optimized for different data characteristics.

1. Seasonal-Decomposition Based Anomaly Detection

Seasonal decomposition algorithms (Classical, STL, MSTL) require at least 2 years of data. With less data, use one of the simpler methods (Interquartile Range (IQR) or Moving Average) to detect outliers.

Detect Outliers Using Classical Seasonal Decomposition

For datasets with at least two years of data, PyCatcher automatically determines whether the data follows an additive or multiplicative model to detect anomalies.

  • Method: detect_outliers_classic(df)
  • Output: DataFrame of detected anomalies or a message indicating no anomalies.
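
To make the idea concrete, here is a minimal standard-library sketch of classical decomposition in both additive and multiplicative form. It is not PyCatcher's implementation, and it does not reproduce how the package chooses between the two models; the function name `classical_decompose` and the sample series are assumptions made for illustration. The multiplicative branch assumes strictly positive values.

```python
from statistics import mean


def classical_decompose(values, period, model="additive"):
    """Split a series into trend, seasonal and residual lists.

    Positions where the centered moving average is undefined hold None.
    """
    n = len(values)
    if n < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    half = period // 2

    # Trend: centered moving average spanning one full cycle.
    trend = [None] * n
    for t in range(half, n - half):
        window = values[t - half : t + half + 1]
        total = sum(window)
        if period % 2 == 0:
            # Even period: the window holds period + 1 points, so the two
            # end points each get half weight (a 2 x m moving average).
            total -= 0.5 * (window[0] + window[-1])
        trend[t] = total / period

    def remove(x, component):
        # Subtract for an additive model, divide for a multiplicative one.
        return x - component if model == "additive" else x / component

    detrended = [None if tr is None else remove(x, tr)
                 for x, tr in zip(values, trend)]

    # Seasonal: average detrended value at each position in the cycle,
    # re-centered so the effects average to 0 (additive) or 1 (multiplicative).
    raw = [mean(d for d in detrended[pos::period] if d is not None)
           for pos in range(period)]
    centre = mean(raw)
    cycle = [remove(s, centre) for s in raw]
    seasonal = [cycle[t % period] for t in range(n)]

    # Residual: what is left after removing trend and seasonal effects.
    residual = [None if d is None else remove(d, s)
                for d, s in zip(detrended, seasonal)]
    return trend, seasonal, residual


# Hypothetical series: linear trend, repeating 4-step pattern, one spike.
pattern = [2, -1, -2, 1]
series = [t + pattern[t % 4] for t in range(16)]
series[9] += 10

trend, seasonal, residual = classical_decompose(series, period=4)
largest = max((i for i, r in enumerate(residual) if r is not None),
              key=lambda i: abs(residual[i]))
print(largest, residual[largest])
```

The spike shows up as the largest residual, which is the component an outlier rule would then inspect.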

Detect Today's Outliers

Quickly identify if there are any anomalies specifically for the current date.

  • Method: detect_outliers_today_classic(df)
  • Output: Anomaly details for today or a message indicating no outliers.

Detect the Latest Anomalies

Retrieve the most recent anomalies identified in your time-series data.

  • Method: detect_outliers_latest_classic(df)
  • Output: Details of the latest detected anomalies.

Visualize Outliers with Seasonal Decomposition

Show outliers in your data through classical seasonal decomposition.

  • Method: build_outliers_plot_classic(df)
  • Output: Outlier plot generated using classical seasonal decomposition.

Visualize Seasonal Decomposition

Understand seasonality in your data by visualizing classical seasonal decomposition.

  • Method: build_seasonal_plot_classic(df)
  • Output: Seasonal plots displaying additive or multiplicative trends.

Visualize Monthly Patterns

Show a month-wise box plot of the data.

  • Method: build_monthwise_plot(df)
  • Output: Month-wise box plots showing spread and skewness of data.

Detect Outliers Using Seasonal-Trend Decomposition using LOESS (STL)

Use the Seasonal-Trend Decomposition method (STL) to detect anomalies.

  • Method: detect_outliers_stl(df)
  • Output: Rows flagged as outliers using STL.
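
Once a decomposition such as STL has produced a residual component, a dispersion rule can flag the unusual points. The sketch below uses the modified z-score based on the median absolute deviation (MAD). It assumes the residuals are already available as a plain list; the function name `flag_by_mad` and the 3.5 threshold are illustrative assumptions, not values documented for PyCatcher.

```python
from statistics import median


def flag_by_mad(residuals, threshold=3.5):
    """Return the indices whose modified z-score exceeds the threshold."""
    med = median(residuals)
    mad = median(abs(r - med) for r in residuals)
    if mad == 0:
        # No spread around the median, so the score is undefined.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the data are roughly normal.
    return [i for i, r in enumerate(residuals)
            if abs(0.6745 * (r - med) / mad) > threshold]


# Hypothetical residuals with one obvious spike at index 5.
residuals = [0.1, -0.2, 0.05, 0.3, -0.1, 5.0, 0.0, -0.15]
print(flag_by_mad(residuals))
```

Because the median and MAD are barely moved by a single extreme value, this rule is less likely than a mean-and-standard-deviation rule to let an outlier mask itself.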

Detect Today's Outliers

Quickly identify if there are any anomalies specifically for the current date.

  • Method: detect_outliers_today_stl(df)
  • Output: Anomaly details for today or a message indicating no outliers.

Detect the Latest Anomalies

Retrieve the most recent anomalies identified in your time-series data.

  • Method: detect_outliers_latest_stl(df)
  • Output: Details of the latest detected anomalies.

Visualize STL Outliers

Show outliers using the Seasonal-Trend Decomposition using LOESS (STL).

  • Method: build_outliers_plot_stl(df)
  • Output: Outlier plot generated using STL.

Visualize Seasonal Decomposition using STL

Understand seasonality in your data by visualizing Seasonal-Trend Decomposition using LOESS (STL).

  • Method: build_seasonal_plot_stl(df)
  • Output: Seasonal plot to decompose a time series into a trend component, seasonal components, and a residual component.

Detect Outliers Using Multiple Seasonal-Trend Decomposition using LOESS (MSTL)

Use the Multiple Seasonal-Trend Decomposition method (MSTL) to detect anomalies.

  • Method: detect_outliers_mstl(df)
  • Output: Rows flagged as outliers using MSTL.

Detect Today's Outliers

Quickly identify if there are any anomalies specifically for the current date.

  • Method: detect_outliers_today_mstl(df)
  • Output: Anomaly details for today or a message indicating no outliers.

Detect the Latest Anomalies

Retrieve the most recent anomalies identified in your time-series data.

  • Method: detect_outliers_latest_mstl(df)
  • Output: Details of the latest detected anomalies.

Visualize MSTL Outliers

Show outliers using the Multiple Seasonal-Trend Decomposition using LOESS (MSTL).

  • Method: build_outliers_plot_mstl(df)
  • Output: Outlier plot generated using MSTL.

Visualize Multiple Seasonal Decomposition

Understand seasonality in your data by visualizing Multiple Seasonal-Trend Decomposition using LOESS (MSTL).

  • Method: build_seasonal_plot_mstl(df)
  • Output: Seasonal plot to decompose a time series into a trend component, multiple seasonal components, and a residual component.

2. Detect Outliers Using ESD (Extreme Studentized Deviate)

Detect anomalies in time-series data using the ESD algorithm.

  • Method: detect_outliers_esd(df)
  • Output: Rows flagged as outliers using the Generalized ESD or Seasonal ESD algorithm.
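
For reference, the generalized ESD test is usually stated as follows. The notation comes from the standard textbook formulation rather than from PyCatcher's source, and the choice of the maximum number of outliers r and the significance level α is left open here.

```latex
% Test for up to r outliers in a sample of size n at significance level \alpha.
% At step i, \bar{x} and s are the mean and standard deviation of the points
% still in the sample; the point attaining the maximum is then removed.
R_i = \frac{\max_j \lvert x_j - \bar{x} \rvert}{s}, \qquad i = 1, \dots, r

% Critical value, where t_{p,\nu} is the p-th quantile of Student's t
% distribution with \nu degrees of freedom.
\lambda_i = \frac{(n - i)\, t_{p,\, n-i-1}}
                 {\sqrt{\left(n - i - 1 + t_{p,\, n-i-1}^{2}\right)(n - i + 1)}},
\qquad p = 1 - \frac{\alpha}{2\,(n - i + 1)}

% The number of outliers is the largest i for which R_i > \lambda_i.
```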

Visualize ESD Outliers

Show outliers using the Generalized ESD or Seasonal ESD algorithm.

  • Method: build_outliers_plot_esd(df)
  • Output: Outlier plot generated using Generalized ESD or Seasonal ESD algorithm.

3. Detect Outliers Using Moving Average

Detect anomalies in time-series data using the Moving Average method.

  • Method: detect_outliers_moving_average(df)
  • Output: Rows flagged as outliers using Moving Average and Z-score algorithm.
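
A minimal sketch of a moving-average and z-score rule, using only the standard library. It compares each point with the mean and standard deviation of the preceding window. The trailing window, the window size of 5, and the threshold of 3.0 are assumptions chosen for illustration and may differ from what PyCatcher uses.

```python
from statistics import mean, stdev


def moving_average_outliers(values, window=5, threshold=3.0):
    """Flag points far from the mean of the preceding `window` points.

    Returns a list of (index, value) pairs.
    """
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window : i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # A flat window gives no scale to measure deviation against.
            continue
        z = (values[i] - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, values[i]))
    return flagged


# Hypothetical series with one spike at index 6.
values = [10, 11, 10, 12, 11, 10, 50, 11, 10, 12]
print(moving_average_outliers(values))
```

Note that the first `window` points are never scored, and that a spike inflates the standard deviation of the windows that follow it.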

Visualize Moving Average Outliers

Show outliers using the Moving Average and Z-score algorithm.

  • Method: build_outliers_plot_moving_average(df)
  • Output: Outlier plot generated using Moving Average method.

4. IQR-Based Anomaly Detection

Detect Outliers Using Interquartile Range (IQR)

For datasets spanning less than two years, the IQR method is employed.

  • Method: find_outliers_iqr(df)
  • Output: Rows flagged as outliers based on IQR.
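
The IQR rule itself can be sketched in a few lines with the standard library. The 1.5 multiplier is the conventional choice and the quartile method is one of several in common use; neither is confirmed here as PyCatcher's setting, and the function name `iqr_outliers` is hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return (index, value) pairs lying outside the IQR fences."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical values with one extreme observation at index 6.
values = [10, 12, 11, 13, 12, 11, 95, 12, 10, 13]
print(iqr_outliers(values))
```

Unlike the seasonal methods, this rule ignores the time ordering of the values, which is why it only needs the numeric column.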

Visualize IQR Plot

Build an IQR plot for a given dataframe (for less than 2 years of data).

  • Method: build_iqr_plot(df)
  • Output: IQR plot for the time-series data.

Example Usage

To see an example of how to use the pycatcher package for outlier detection in time-series data, check out the Example Notebook.

The notebook provides step-by-step guidance and demonstrates the key features of the library.
