
anomaly-pipeline

anomaly-pipeline is an ensemble framework for detecting outliers in grouped time-series data. It automates the entire workflow from data cleaning and calendar interpolation to running 8 different detection algorithms and generating visual diagnostic reports.

Key Capabilities

  • Ensemble Scoring: Combines 8 models (Statistical + ML) to provide a robust Anomaly_Score and a final is_Anomaly consensus.
  • Hierarchical Processing: Natively handles grouped data (e.g., detecting anomalies per Region, Product, or Channel).
  • Automated Preprocessing: Handles missing dates via linear interpolation and filters out "low-quality" unique_ids automatically.
  • Parallel Execution: Leverages joblib for multi-core processing of large datasets (see the sketch after this list).
  • Visual Analytics: Generates pie charts, stacked bar plots, and detailed group-level time-series breakdowns.
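
A minimal sketch of what per-group parallelism with joblib can look like. The detect_group helper here is hypothetical and the package's actual internals may differ; it assumes a dataframe df with a 'sales' column and 'category'/'region' grouping columns, as in the Quick Start below.

from joblib import Parallel, delayed
import pandas as pd

def detect_group(group_df):
    # Hypothetical per-group detector: flag points more than two standard
    # deviations from the group mean (one of the ensemble's methods).
    out = group_df.copy()
    mean, sd = out['sales'].mean(), out['sales'].std()
    out['SD_anomaly'] = (out['sales'] - mean).abs() > 2 * sd
    return out

# Run each (category, region) slice on its own core.
results = Parallel(n_jobs=-1)(
    delayed(detect_group)(g) for _, g in df.groupby(['category', 'region'])
)
anomaly_df = pd.concat(results, ignore_index=True)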

Included Models

The pipeline utilizes an ensemble of the following methodologies:

  • Statistical: Percentile (5th/95th), Standard Deviation (SD), Median Absolute Deviation (MAD), and Interquartile Range (IQR).

  • Time-Series Specific: EWMA (Exponentially Weighted Moving Average) and FB Prophet (Walk-forward validation).

  • Machine Learning: Isolation Forest (General & Time-series optimized) and DBSCAN.

Detailed Functionality

  • Robust Input Validation: Clear error messaging for missing parameters or incorrect data types.

  • Quality Control: Automatically generates a success report and an exclusion report for filtered-out groups.

  • Visual Suite: Automated rendering of Pie Charts (Summary), Stacked Bars (Distribution), and Top-5 Anomaly Heatmaps.

🚀 Quick Start

pip install anomaly-pipeline

import pandas as pd
from anomaly_pipeline import timeseries_anomaly_detection

# Load your data
df = pd.read_csv("your_data.csv")

# Run the pipeline
anomaly_df, success_report, exclusion_report = timeseries_anomaly_detection(
    master_data=df,
    unique_ids=['category', 'region'],
    variable='sales',
    date_column='timestamp',
    freq='W-MON',
    eval_period=1  # Evaluate the most recent record
)
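
The flagged rows can then be inspected directly. A sketch, assuming the identifier, date, and variable columns are carried through to the output alongside the ensemble columns described under 📤 Returns below:

# Rows the ensemble flagged as anomalous in the evaluated period
flagged = anomaly_df[anomaly_df['is_Anomaly']]
print(flagged[['category', 'region', 'timestamp', 'sales', 'Anomaly_Votes']])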

📊 Visualizing Results & Deep Dives

Inspecting a specific group: if a group shows a high anomaly rate, use the evaluation_info function to render detailed diagnostic plots.

from anomaly_pipeline import evaluation_info

# Reuse the settings from the pipeline run above
unique_ids = ['category', 'region']
variable = 'sales'
date_column = 'timestamp'

# Define the group values (must match the order in unique_ids)
group_values = ['appliances', 'TX']

# Filter the results for this group
mask = anomaly_df[unique_ids].eq(group_values).all(axis=1)
group_df = anomaly_df[mask]

# Generate detailed diagnostic plots
evaluation_info(group_df,
                unique_ids,
                variable,
                date_column,
                eval_period=1)

The Evaluation Dashboard provides:

  • Model Breakdown: Individual charts for FB Prophet, EWMA, and Isolation Forest with confidence intervals.

  • Ensemble View: A summary highlighting where multiple models overlap.

  • Statistical Thresholds: Visual markers for IQR, MAD, percentile and SD limits.

Input Parameters

Mandatory

  • master_data (pd.DataFrame) : The dataframe containing the data to be evaluated for anomalies, including the variable, date, and unique_ids columns.
  • unique_ids (list[str]) : List of columns used to segment the data, e.g. ['SKU', 'channel', 'store_id'].
  • variable (str) : The numerical target column to analyze for anomalies.
  • date_column (str) : The datetime column representing the time dimension, e.g. 'date', 'week', or 'month'.

Optional (defaults shown)

  • freq (str) : Frequency of the date column. Default: 'W-MON'. Also accepts 'D' or 'MS'.
  • eval_period (int) : Number of trailing records or periods to evaluate for anomalies. Default: 1.
  • max_records (int) : Maximum history to consider, starting from the most recent date. Default: all history.
  • imputation_method (str) : Technique to fill missing time units. Default: 'linear'. Acceptable values: 'mean', 'mode', 'zero', 'linear'.
  • mad_threshold (int) : MAD parameter; controls Median Absolute Deviation sensitivity. Default: 2.
  • mad_scale_factor (float) : MAD parameter; the scaling constant used to normalize the MAD. Default: 0.6745.
  • alpha (float) : EWMA parameter, controls the smoothing factor for EWMA trend. Default: 0.3.
  • sigma (float) : EWMA parameter, determines the standard deviation multiplier for upper and lower bounds. Default: 1.5.
  • prophet_CI (float) : Prophet parameter, determines the confidence interval. Range 0 to 1, Default: 0.9.
  • contamination (float) : Isolation Forest parameter, expected % of outliers (0 to 0.5). Default: 0.03.
  • random_state (int) : Seed for model reproducibility. Default: 42.
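
Putting the optional parameters together, a call that spells out every documented default might look like the sketch below (max_records is omitted because its default, all history, is not a concrete value):

anomaly_df, success_report, exclusion_report = timeseries_anomaly_detection(
    master_data=df,
    unique_ids=['category', 'region'],
    variable='sales',
    date_column='timestamp',
    freq='W-MON',
    eval_period=1,
    imputation_method='linear',
    mad_threshold=2,
    mad_scale_factor=0.6745,
    alpha=0.3,
    sigma=1.5,
    prophet_CI=0.9,
    contamination=0.03,
    random_state=42,
)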

📤 Returns

tuple [pd.DataFrame, pd.DataFrame, pd.DataFrame]:

  • final_results : The main output, a dataframe that identifies anomalies with Anomaly_Votes and is_Anomaly.
  • success_report : Summary of interpolation %, record counts, and anomaly rates.
  • exclusion_report : Groups filtered out during preprocessing as "low-quality" unique_ids.

Output columns of final_results (all values are reported at the "unique_ids" level):

  • MIN_value : The minimum historical "variable" value up to t−1. Fixed for train data; varies for test data.
  • MAX_value : The maximum historical "variable" value up to t−1. Fixed for train data; varies for test data.
  • Percentile_low / Percentile_high : The 5th and 95th percentile "variable" values, used to detect unusually low or high values. Computed on historical data up to t−1; fixed for train data, varies for test data.
  • Percentile_anomaly : Flag based on percentile limits: Low → value < Percentile_low; High → value > Percentile_high; None → within range.
  • Mean / SD : The average "variable" value and its standard deviation, computed on historical data up to t−1; fixed for train data, varies for test data.
  • SD2_low / SD2_high : Two-standard-deviation control limits: SD2_low = mean − 2 × SD (floored at 0); SD2_high = mean + 2 × SD.
  • SD_anomaly : Flag based on SD2 limits: Low → value < SD2_low; High → value > SD2_high; None → within range.
  • Median / MAD : Median of "variable" and the median of absolute deviations from that median, computed on historical data up to t−1; fixed for train data, varies for test data. Used for robust detection when the data contains outliers.
  • MAD_low / MAD_high : MAD-based limits: MAD_low = median − 2 × MAD / 0.6745 (floored at 0); MAD_high = median + 2 × MAD / 0.6745.
  • MAD_anomaly : Flag based on MAD limits: Low → value < MAD_low; High → value > MAD_high; None → within range.
  • Q1 / Q3 / IQR : Q1 is the 25th percentile, Q3 the 75th percentile, and IQR = Q3 − Q1. Used to detect unusually low or high "variable" values.
  • IQR_low / IQR_high : IQR-based limits: IQR_low = Q1 − 1.5 × IQR (floored at 0); IQR_high = Q3 + 1.5 × IQR.
  • IQR_anomaly : Flag based on IQR limits: Low → value < IQR_low; High → value > IQR_high; None → within range.
  • is_Percentile_anomaly / is_SD_anomaly / is_MAD_anomaly / is_IQR_anomaly : Boolean indicators stating whether each method classified the value as an anomaly (low or high).
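
The four statistical threshold families follow directly from the formulas above. A minimal sketch, assuming values is the historical series up to t−1 (the statistical_limits helper is illustrative, not the package's API):

import numpy as np

def statistical_limits(values, mad_threshold=2, mad_scale_factor=0.6745):
    values = np.asarray(values, dtype=float)
    # Percentile limits (5th / 95th)
    p_low, p_high = np.percentile(values, [5, 95])
    # Two-standard-deviation control limits, floored at 0
    mean, sd = values.mean(), values.std()
    sd2_low, sd2_high = max(mean - 2 * sd, 0), mean + 2 * sd
    # MAD limits: median ± threshold × MAD / scale factor, floored at 0
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    mad_low = max(median - mad_threshold * mad / mad_scale_factor, 0)
    mad_high = median + mad_threshold * mad / mad_scale_factor
    # IQR limits: Q1 − 1.5×IQR (floored at 0) and Q3 + 1.5×IQR
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    iqr_low, iqr_high = max(q1 - 1.5 * iqr, 0), q3 + 1.5 * iqr
    return {'Percentile': (p_low, p_high), 'SD2': (sd2_low, sd2_high),
            'MAD': (mad_low, mad_high), 'IQR': (iqr_low, iqr_high)}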


  • Alpha : Smoothing factor used in EWMA. Higher values give more weight to recent observations.
  • EWMA_forecast : Expected value estimated using the EWMA model.
  • EWMA_STD : Rolling standard deviation of residuals around the EWMA forecast.
  • EWMA_high : Upper anomaly threshold (EWMA_forecast + sigma × EWMA_STD).
  • EWMA_low : Lower anomaly threshold (EWMA_forecast − sigma × EWMA_STD).
  • Is_EWMA_anomaly : Boolean flag indicating whether the observed value falls outside the EWMA bounds.
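
A hedged sketch of the EWMA bounds with pandas, using the documented defaults (alpha=0.3, sigma=1.5). The residual window used for EWMA_STD is an assumption, not a documented value:

import pandas as pd

def ewma_bounds(series, alpha=0.3, sigma=1.5, window=8):
    # Exponentially weighted forecast of the series
    forecast = series.ewm(alpha=alpha, adjust=False).mean()
    # Rolling standard deviation of residuals around the forecast
    # (the 8-period window is assumed for illustration)
    resid_std = (series - forecast).rolling(window, min_periods=2).std()
    return forecast, forecast - sigma * resid_std, forecast + sigma * resid_std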


  • FB_forecast : Expected value estimated using the Prophet model.
  • FB_low : Lower confidence interval of the Prophet forecast.
  • FB_high : Upper confidence interval of the Prophet forecast.
  • FB_residual : Difference between the observed value and the Prophet forecast.
  • FB_anomaly : Raw anomaly indicator based on Prophet confidence bounds.
  • Is_FB_anomaly : Boolean flag indicating a Prophet-detected anomaly.
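
For reference, a minimal in-sample sketch of how Prophet produces such bounds (prophet_CI maps to Prophet's interval_width; the package's walk-forward validation and exact column mapping are not reproduced here):

from prophet import Prophet

# Prophet expects columns named 'ds' (date) and 'y' (value)
history = df.rename(columns={'timestamp': 'ds', 'sales': 'y'})[['ds', 'y']]

m = Prophet(interval_width=0.9)  # prophet_CI default
m.fit(history)
forecast = m.predict(history[['ds']])

# yhat / yhat_lower / yhat_upper correspond to FB_forecast / FB_low / FB_high
bounds = forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]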


  • isolation_forest_score : Score from the Isolation Forest model indicating anomaly severity. Typical range: −0.5 to +0.5; higher scores are more normal, lower scores more anomalous.
  • is_IsoForest_anomaly : Boolean flag based on the Isolation Forest prediction: True → anomaly (prediction = −1); False → normal (prediction = 1).
  • dbscan_score : Cluster label or distance score produced by DBSCAN (−1 indicates noise/anomaly).
  • is_DBSCAN_anomaly : Boolean flag indicating a DBSCAN-detected anomaly.
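
Both scores follow scikit-learn conventions. A minimal sketch with assumed feature preparation (the pipeline's actual features, including its time-series-optimized variant, are not documented here, and the DBSCAN eps/min_samples values are illustrative):

from sklearn.ensemble import IsolationForest
from sklearn.cluster import DBSCAN

X = df[['sales']].to_numpy()  # assumed single-feature input

iso = IsolationForest(contamination=0.03, random_state=42).fit(X)
iso_scores = iso.decision_function(X)   # higher = more normal
iso_flags = iso.predict(X) == -1        # -1 = anomaly

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
dbscan_flags = labels == -1             # -1 = noise/anomaly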


  • Anomaly_Votes : Count of anomaly-detection methods that agree a point is anomalous. Ranges from 0 to 8.
  • is_Anomaly : Final ensemble decision: True → value flagged anomalous by 4 or more methods; False → fewer than 4 methods indicate anomaly.
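
The consensus rule can be reproduced from the flag columns directly. A sketch, assuming the eight boolean columns above are present in final_results:

# Count the eight boolean method flags; 4+ votes = ensemble anomaly
flag_cols = ['is_Percentile_anomaly', 'is_SD_anomaly', 'is_MAD_anomaly',
             'is_IQR_anomaly', 'Is_EWMA_anomaly', 'Is_FB_anomaly',
             'is_IsoForest_anomaly', 'is_DBSCAN_anomaly']
anomaly_df['Anomaly_Votes'] = anomaly_df[flag_cols].sum(axis=1)
anomaly_df['is_Anomaly'] = anomaly_df['Anomaly_Votes'] >= 4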
