Robust decomposition, anomaly and changepoint detection on multiple time series for any SQL backend. Designed for traffic data.
Traffic Anomaly
traffic-anomaly is a production-ready Python package for robust decomposition, anomaly detection, and change point detection on multiple time series at once. It uses Ibis to integrate with any SQL backend in a production pipeline, and it can also run locally with the included DuckDB backend.
Tested on: Windows, macOS, and Ubuntu with Python 3.9-3.13
Designed for messy real-world traffic data (volumes, travel times), traffic-anomaly uses medians to decompose each time series into trend, daily, weekly, and residual components. Anomalies are then classified using Z-score or GEH statistics, and change points identify structural shifts in the data. Median Absolute Deviation (MAD) may be used for further robustness. Missing data are handled, and time periods without sufficient data can be dropped. Try it out; sample data is included!
Installation & Usage
pip install traffic_anomaly
import traffic_anomaly
from traffic_anomaly import sample_data
# Load sample data
travel_times = sample_data.travel_times
decomp = traffic_anomaly.decompose(
data=travel_times, # Pandas DataFrame or Ibis Table (for compatibility with any SQL backend)
datetime_column='timestamp',
value_column='travel_time',
entity_grouping_columns=['id', 'group'],
freq_minutes=60, # Frequency of the time series in minutes
rolling_window_days=7, # Rolling window size in days. Should be a multiple of 7 for traffic data
drop_days=7, # Should be at least 7 for traffic data
min_rolling_window_samples=56, # Minimum number of samples in the rolling window, set to 0 to disable.
min_time_of_day_samples=7, # Minimum number of samples for each time of day (like 2:00pm), set to 0 to disable
drop_extras=False, # Keep the seasonal/trend columns for the visualization below
to_sql=False # Set to True to return a SQL query instead of a Pandas DataFrame, for running on SQL backends
)
decomp.head(3)
| id | timestamp | travel_time | group | median | season_day | season_week | resid | prediction |
|---|---|---|---|---|---|---|---|---|
| 448838574 | 2022-09-29 06:00:00 | 24.8850 | SE SUNNYSIDE RD | 24.963749 | -4.209375 | 0.57875 | 3.5518772 | 21.333122 |
| 448838574 | 2022-09-22 06:00:00 | 20.1600 | SE SUNNYSIDE RD | 24.842501 | -4.209375 | 0.57875 | -1.0518752 | 21.211876 |
| 448838574 | 2022-09-15 06:00:00 | 22.2925 | SE SUNNYSIDE RD | 24.871250 | -4.209375 | 0.57875 | 1.0518752 | 21.240623 |
Here's a plot showing what it looks like to decompose a time series. The sum of components is equal to the original data. After extracting the trend and seasonal components, what is left are residuals that are more stationary so they're easier to work with.
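To make the idea of a median-based decomposition concrete, here is a minimal pure-Python sketch. It is not the package's implementation: the function name `decompose_sketch`, the trailing rolling window, and the synthetic data are all assumptions made for illustration. It only shows how a trend, a daily component, a weekly component, and a residual can be built from medians so that they add back up to the original value.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median


def decompose_sketch(rows, window_hours=7 * 24):
    """rows: list of (timestamp, value) tuples, hourly and sorted by time.

    Hypothetical helper for illustration only, not part of traffic_anomaly.
    """
    values = [v for _, v in rows]

    # 1) Trend: trailing rolling median over the last `window_hours` samples.
    trend = [median(values[max(0, i - window_hours + 1): i + 1])
             for i in range(len(values))]
    detrended = [v - t for v, t in zip(values, trend)]

    # 2) Daily season: median detrended value for each hour of the day.
    by_hour = defaultdict(list)
    for (ts, _), d in zip(rows, detrended):
        by_hour[ts.hour].append(d)
    season_day = {hour: median(ds) for hour, ds in by_hour.items()}

    # 3) Weekly season: median of what is left for each (weekday, hour) slot.
    by_slot = defaultdict(list)
    for (ts, _), d in zip(rows, detrended):
        by_slot[(ts.weekday(), ts.hour)].append(d - season_day[ts.hour])
    season_week = {slot: median(ds) for slot, ds in by_slot.items()}

    # 4) Residual: whatever the three components do not explain.
    result = []
    for (ts, value), t in zip(rows, trend):
        day = season_day[ts.hour]
        week = season_week[(ts.weekday(), ts.hour)]
        prediction = t + day + week
        result.append({"timestamp": ts, "value": value, "median": t,
                       "season_day": day, "season_week": week,
                       "prediction": prediction, "resid": value - prediction})
    return result


# Three weeks of synthetic hourly travel times: a flat baseline of 20,
# a morning peak of 25, and a single spike of 60.
start = datetime(2022, 9, 5)
rows = []
for i in range(21 * 24):
    ts = start + timedelta(hours=i)
    rows.append((ts, 25.0 if ts.hour in (7, 8, 9) else 20.0))
spike_index = 10 * 24 + 17
rows[spike_index] = (rows[spike_index][0], 60.0)

decomp_sketch = decompose_sketch(rows)
worst = max(decomp_sketch, key=lambda r: abs(r["resid"]))
print(worst["timestamp"], worst["resid"])
```

Because medians ignore the single spike, the trend and seasonal components stay at their typical values and the spike ends up entirely in the residual, which is what makes the residual useful for anomaly detection.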
# Apply anomaly detection
anomaly = traffic_anomaly.anomaly(
decomposed_data=decomp, # Decomposed time series as a Pandas DataFrame or Ibis Table
datetime_column='timestamp',
value_column='travel_time',
entity_grouping_columns=['id'],
entity_threshold=3.5 # Threshold for entity-level anomaly detection (z-score or GEH statistic)
)
anomaly.head(3)
| id | timestamp | travel_time | group | prediction | anomaly |
|---|---|---|---|---|---|
| 448838575 | 2022-09-09 06:00:00 | 19.3575 | SE SUNNYSIDE RD | 16.926249 | False |
| 448838575 | 2022-09-09 07:00:00 | 22.5200 | SE SUNNYSIDE RD | 20.826252 | False |
| 448838575 | 2022-09-09 08:00:00 | 23.0350 | SE SUNNYSIDE RD | 22.712502 | False |
The image below shows an example application on actual traffic counts. Note that this package does not produce plots.
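The sketch below illustrates the two kinds of scores mentioned earlier, applied to residuals. It is an illustration under assumptions, not the package's code: the helper names `zscores` and `geh` are hypothetical, the residuals are made up, and the 1.4826 factor is the usual constant that scales MAD to be comparable with a standard deviation for normally distributed data. The threshold of 3.5 mirrors `entity_threshold` in the example above.

```python
from math import sqrt
from statistics import mean, median, pstdev


def zscores(resid, robust=False):
    """Z-scores of residuals; with robust=True use median and MAD instead."""
    if robust:
        center = median(resid)
        mad = median([abs(r - center) for r in resid])
        scale = 1.4826 * mad  # makes MAD comparable to a standard deviation
    else:
        center = mean(resid)
        scale = pstdev(resid)
    return [(r - center) / scale if scale else 0.0 for r in resid]


def geh(observed, predicted):
    """GEH statistic, commonly used to compare hourly traffic volumes."""
    total = observed + predicted
    return sqrt(2 * (observed - predicted) ** 2 / total) if total > 0 else 0.0


threshold = 3.5
resid = [0.2, -0.1, 0.0, 0.3, -0.2, 0.1, -0.3, 8.0]  # last value is the outlier

plain_flags = [abs(z) > threshold for z in zscores(resid)]
robust_flags = [abs(z) > threshold for z in zscores(resid, robust=True)]
print(plain_flags)
print(robust_flags)
print(geh(observed=200, predicted=100))
```

In this small sample the outlier inflates the ordinary standard deviation enough that the plain Z-score stays under the threshold, while the MAD-based score flags it. That masking effect is the usual motivation for the robust option.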
Changepoint Detection
traffic_anomaly includes robust changepoint detection that identifies significant changes, such as when traffic patterns shift due to construction, equipment failure, or events like school starting up in the fall. Changepoints represent moments when the underlying statistical properties of the data change. This functionality is meant for detecting long-term, persistent changes, whereas anomaly detection is for short-term, transient changes.
# Load changepoint sample data
changepoint_data = sample_data.changepoints_input
# Apply change point detection
changepoints = traffic_anomaly.changepoint(
data=changepoint_data, # Pandas DataFrame or Ibis Table
value_column='travel_time_seconds',
entity_grouping_column='ID',
datetime_column='TimeStamp',
rolling_window_days=14, # Size of analysis window
robust=True, # Use robust (Winsorized) variance for better outlier handling
score_threshold=5, # Threshold for change point detection (lower = more sensitive)
min_separation_days=3 # Minimum days between detected change points
)
changepoints.head(3)
| ID | TimeStamp | score | avg_before | avg_after | avg_diff |
|---|---|---|---|---|---|
| 448838574 | 2022-09-15 14:00:00 | 2.34 | 45.2 | 52.8 | 7.6 |
| 448838575 | 2022-09-22 08:00:00 | 1.89 | 38.1 | 29.4 | -8.7 |
| 448838576 | 2022-10-01 16:00:00 | 3.12 | 41.5 | 48.9 | 7.4 |
The image below shows an example of changepoint detection on traffic data, highlighting where significant structural changes occur in the time series.
The change point detection algorithm:
- Uses variance-based scoring to identify periods where data patterns shift
- Can operate in robust mode (recommended) which uses Winsorized variance for better handling of outliers
- Provides before/after averages to quantify the magnitude and direction of changes
- Filters results to local peaks with minimum separation to avoid detecting noise
Parameters
- robust=True: Uses Winsorized variance (clips extreme values) for more stable detection
- score_threshold: Higher values detect fewer, more significant change points
- rolling_window_days: Size of the analysis window (split between before/after periods)
- min_separation_days: Prevents detecting multiple change points too close together
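The parameters above can be tied together with a small sketch. This is one plausible variance-based score, written as an assumption rather than as the package's actual algorithm: it divides the variance of the whole window by the average variance of its before and after halves, so a level shift in the middle pushes the score well above 1. The names `winsorize`, `score_at`, and `detect` are hypothetical, and the window and separation are given in samples rather than days to keep the example short.

```python
from statistics import mean, pvariance


def winsorize(values, fraction=0.05):
    """Clip the lowest and highest `fraction` of values to the nearest kept value."""
    ordered = sorted(values)
    k = int(fraction * len(ordered))
    low, high = ordered[k], ordered[-1 - k]
    return [min(max(v, low), high) for v in values]


def score_at(series, i, half, robust=True):
    """Variance of the full window relative to the pooled variance of its halves."""
    before, after = series[i - half:i], series[i:i + half]
    window = before + after
    if robust:
        before, after, window = winsorize(before), winsorize(after), winsorize(window)
    pooled = (pvariance(before) + pvariance(after)) / 2
    return pvariance(window) / max(pooled, 1e-9)


def detect(series, half, score_threshold, min_separation, robust=True):
    """Keep the highest-scoring candidates that are at least `min_separation` apart."""
    candidates = []
    for i in range(half, len(series) - half + 1):
        score = score_at(series, i, half, robust)
        if score >= score_threshold:
            candidates.append((score, i))
    kept = []
    for score, i in sorted(candidates, reverse=True):
        if all(abs(i - j) >= min_separation for _, j in kept):
            kept.append((score, i))
    hits = []
    for score, i in sorted(kept, key=lambda item: item[1]):
        avg_before = mean(series[i - half:i])
        avg_after = mean(series[i:i + half])
        hits.append({"index": i, "score": score, "avg_before": avg_before,
                     "avg_after": avg_after, "avg_diff": avg_after - avg_before})
    return hits


# Synthetic series: level shift from 40 to 50 at index 100, one outlier just before it.
noise = [-1.0, 1.0, -0.5, 0.5]
series = [(40.0 if i < 100 else 50.0) + noise[i % 4] for i in range(200)]
series[95] = 90.0

robust_hits = detect(series, half=20, score_threshold=5, min_separation=24, robust=True)
plain_hits = detect(series, half=20, score_threshold=5, min_separation=24, robust=False)
print(robust_hits)
print(plain_hits)
```

In this synthetic case the outlier inflates the plain variance of every window that contains it, so the non-robust run reports nothing, while the Winsorized run still finds the shift near index 100.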
Considerations
The seasonal components are not allowed to change over time. It is therefore important to limit the number of weeks included in the model, especially if the data has yearly seasonality (and traffic data does). For application over a long date range, the recommended approach is to run the model incrementally over a rolling window of about 6 weeks.
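One way to organize such an incremental run is to generate the window boundaries first and then call the model once per window. The helper below is a hypothetical sketch using only the standard library. How each window is filtered and passed to `traffic_anomaly.decompose` is left out, since that depends on your pipeline, and the 7-day step is an assumption.

```python
from datetime import datetime, timedelta


def rolling_windows(first, last, window_weeks=6, step_days=7):
    """Yield (start, end) pairs covering [first, last] with a fixed-size window."""
    window = timedelta(weeks=window_weeks)
    step = timedelta(days=step_days)
    end = first + window
    while end <= last:
        yield end - window, end
        end += step


windows = list(rolling_windows(datetime(2022, 1, 3), datetime(2022, 3, 28)))
for window_start, window_end in windows:
    # Assumed workflow: filter the data to [window_start, window_end), run the
    # model on that slice, and keep only the results for the newest step.
    print(window_start.date(), "to", window_end.date())
```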
Because traffic data anomalies usually skew higher, forecasts made by this model are systematically low: in a right-tailed distribution, the median is lower than the mean. This is by design, as the model is meant primarily for anomaly detection, not forecasting.
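A quick numeric illustration of that median-versus-mean gap, using made-up travel times with a right tail (the values are assumptions chosen only to show the effect):

```python
from statistics import mean, median

# Mostly typical travel times, plus a few congested periods in the right tail.
travel_times = [20, 21, 22, 22, 23, 24, 25, 30, 45, 70]

typical = median(travel_times)
average = mean(travel_times)
print(typical)  # 23.5
print(average)  # 30.2
```

A median-based prediction tracks the typical value (23.5 here) and so sits below the mean (30.2), which is fine for flagging unusual periods but would under-forecast average conditions.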
Notes On Anomaly Detection
traffic_anomaly can classify two separate types of anomalies:
- Entity-Level Anomalies are detected for individual entities based on their own historical patterns, without considering the group context.
- Group-Level Anomalies are detected for entities when compared to the behavior of other entities within the same group. Group-level anomalies are rarer because a time period must first be classified as an entity-level anomaly before it can be considered as a group-level anomaly.
Why is that needed? Say your data is vehicle travel times within a city and there is a snow storm. Travel times across the city rise, and if you're looking at roadway segments in isolation, everything is an anomaly. That's nice, but what if you're only interested in things that are broken? That's where group-level anomalies come in. They are rarer, but they are more likely to be actionable. Probably not much you can do about that snow storm...
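The snow storm scenario can be sketched in a few lines. This is an illustration of the two-stage idea only, under assumptions: the entity-level scores are made up, the `group_threshold` name and the MAD-based comparison within the group are hypothetical, and the package may compute group-level anomalies differently.

```python
from statistics import median


def robust_z(value, population):
    """Distance of `value` from the population median, in scaled MAD units."""
    center = median(population)
    mad = median([abs(p - center) for p in population])
    if mad == 0:
        return 0.0
    return (value - center) / (1.4826 * mad)


entity_threshold = 3.5
group_threshold = 3.5  # hypothetical name and value, for illustration

# Entity-level scores for one time period during the storm: every segment is
# unusual compared to its own history, but segment E is far worse than the rest.
entity_z = {"A": 4.2, "B": 4.8, "C": 4.5, "D": 5.1, "E": 15.0}
group_scores = list(entity_z.values())

entity_anomaly = {seg: abs(z) > entity_threshold for seg, z in entity_z.items()}
group_anomaly = {
    seg: entity_anomaly[seg] and abs(robust_z(z, group_scores)) > group_threshold
    for seg, z in entity_z.items()
}
print(entity_anomaly)
print(group_anomaly)
```

Every segment is an entity-level anomaly during the storm, but only segment E also stands out from its group, so it is the only one left as a group-level anomaly.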
Future Plans/Support
Holiday support and a yearly seasonal component may be added in the future. Additional changes are not likely unless there is a specific need. Please open an issue if you have a feature request or find a bug.