Robust decomposition and anomaly detection on multiple time series for any SQL backend. Designed for traffic data.

Traffic Anomaly

traffic-anomaly is a production-ready Python package for robust decomposition and anomaly detection on many time series at once. It uses Ibis to integrate with any SQL backend in a production pipeline, or runs locally with the included DuckDB backend.

Designed for real-world, messy traffic data (volumes, travel times), traffic-anomaly uses medians to decompose each time series into trend, daily, weekly, and residual components. Anomalies are then classified, and Median Absolute Deviation can be used for further robustness. Missing data are handled, and time periods without sufficient data can be dropped. Try it out; sample data is included! Open In Colab

Installation & Usage

pip install traffic-anomaly
import traffic_anomaly
from traffic_anomaly import sample_data  # bundled example datasets
# travel_times: a DataFrame of travel times, e.g. loaded from sample_data

decomp = traffic_anomaly.median_decompose(
    data=travel_times, # Pandas DataFrame or Ibis Table (for compatibility with any SQL backend)
    datetime_column='timestamp',
    value_column='travel_time',
    entity_grouping_columns=['id', 'group'],
    freq_minutes=60, # Frequency of the time series in minutes
    rolling_window_days=7, # Rolling window size in days. Should be a multiple of 7 for traffic data
    drop_days=7, # Should be at least 7 for traffic data
    min_rolling_window_samples=56, # Minimum number of samples in the rolling window, set to 0 to disable.
    min_time_of_day_samples=7, # Minimum number of samples for each time of day (like 2:00pm), set to 0 to disable
    drop_extras=False, # let's keep seasonal/trend components for visualization below
    to_sql=False # Return SQL queries instead of Pandas DataFrames for running on SQL backends
)
decomp.head(3)
| id | timestamp | travel_time | group | median | season_day | season_week | resid | prediction |
|-----------|---------------------|-------------|-----------------|-----------|------------|-------------|------------|------------|
| 448838574 | 2022-09-29 06:00:00 | 24.8850 | SE SUNNYSIDE RD | 24.963749 | -4.209375 | 0.57875 | 3.5518772 | 21.333122 |
| 448838574 | 2022-09-22 06:00:00 | 20.1600 | SE SUNNYSIDE RD | 24.842501 | -4.209375 | 0.57875 | -1.0518752 | 21.211876 |
| 448838574 | 2022-09-15 06:00:00 | 22.2925 | SE SUNNYSIDE RD | 24.871250 | -4.209375 | 0.57875 | 1.0518752 | 21.240623 |
# Apply anomaly detection
anomaly = traffic_anomaly.find_anomaly(
    decomposed_data=decomp, # Decomposed time series as a Pandas DataFrame or Ibis Table
    datetime_column='timestamp',
    value_column='travel_time',
    entity_grouping_columns=['id'],
    entity_threshold=3.5 # Threshold for entity-level anomaly detection (z-score or GEH statistic)
)
anomaly.head(3)
| id | timestamp | travel_time | group | prediction | anomaly |
|-----------|---------------------|-------------|-----------------|------------|---------|
| 448838575 | 2022-09-09 06:00:00 | 19.3575 | SE SUNNYSIDE RD | 16.926249 | False |
| 448838575 | 2022-09-09 07:00:00 | 22.5200 | SE SUNNYSIDE RD | 20.826252 | False |
| 448838575 | 2022-09-09 08:00:00 | 23.0350 | SE SUNNYSIDE RD | 22.712502 | False |
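The two entity-level scores mentioned above (a robust z-score and the GEH statistic) can be sketched as follows. This is an illustrative sketch only; the names and the standard 0.6745 MAD scale factor are assumptions, and the package's internal implementation may differ.

```python
import numpy as np

def robust_zscore(resid: np.ndarray) -> np.ndarray:
    # Robust z-score: center on the median, scale by the Median Absolute
    # Deviation; 0.6745 makes MAD consistent with the standard deviation
    # for normally distributed data.
    med = np.median(resid)
    mad = np.median(np.abs(resid - med))
    return 0.6745 * (resid - med) / mad

def geh(observed: np.ndarray, predicted: np.ndarray) -> np.ndarray:
    # GEH statistic from traffic engineering:
    # GEH = sqrt(2 * (M - C)^2 / (M + C)); larger values mean a worse match.
    return np.sqrt(2.0 * (observed - predicted) ** 2 / (observed + predicted))
```

A point would then be flagged as an entity-level anomaly when its score exceeds `entity_threshold` (3.5 in the example above).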

The image below shows an example application to actual traffic counts. Note that this package does not produce plots.

(Figure: example anomaly detection on real traffic counts)

Here's a plot showing what it looks like to decompose a time series. The sum of the components equals the original data. After extracting the trend and seasonal components, what remains are residuals that are more stationary and therefore easier to work with.

(Figure: example decomposition into trend, seasonal, and residual components)
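The decomposition idea can be sketched in plain pandas, assuming hourly data with `timestamp` and `value` columns. This is a minimal stand-in, not the package's implementation (which uses Ibis so the same logic runs on any SQL backend); by construction the components sum back to the original series.

```python
import pandas as pd

def median_decompose_sketch(df: pd.DataFrame) -> pd.DataFrame:
    df = df.sort_values("timestamp").copy()
    # Trend: rolling median over a 7-day window (168 hourly samples)
    df["median"] = df["value"].rolling(168, min_periods=1).median()
    detrended = df["value"] - df["median"]
    # Daily seasonality: median of the detrended series by hour of day
    hour = df["timestamp"].dt.hour
    df["season_day"] = detrended.groupby(hour).transform("median")
    # Weekly seasonality: median of the remainder by day of week
    remainder = detrended - df["season_day"]
    dow = df["timestamp"].dt.dayofweek
    df["season_week"] = remainder.groupby(dow).transform("median")
    # Residual: whatever the trend and seasonal terms do not explain
    df["resid"] = remainder - df["season_week"]
    return df
```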

Considerations

The seasonal components are not allowed to change over time; it is therefore important to limit the number of weeks included in the model, especially when there is yearly seasonality (and there is). For application over a long date range, the recommended approach is to run the model incrementally over a rolling window of about 6 weeks.
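The incremental scheme above can be sketched as a sliding window loop that re-fits on roughly 6 weeks of data and keeps only the newest week's scores. `fit_model` here is a hypothetical stand-in for a call chain like `median_decompose` followed by `find_anomaly`.

```python
import pandas as pd

def run_incremental(df, fit_model, window_weeks=6, step_days=7):
    results = []
    end = df["timestamp"].max()
    # first window ends one full window after the earliest timestamp
    cursor = df["timestamp"].min() + pd.Timedelta(weeks=window_weeks)
    while cursor <= end:
        window = df[(df["timestamp"] > cursor - pd.Timedelta(weeks=window_weeks))
                    & (df["timestamp"] <= cursor)]
        scored = fit_model(window)
        # keep only the newest step so overlapping periods are not re-scored
        results.append(scored[scored["timestamp"] > cursor - pd.Timedelta(days=step_days)])
        cursor += pd.Timedelta(days=step_days)
    return pd.concat(results, ignore_index=True)
```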

Because traffic data anomalies usually skew higher, forecasts made by this model are systematically low: in a right-tailed distribution the median falls below the mean. This is by design, as the model is meant primarily for anomaly detection, not forecasting.
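A quick numeric check of that claim, using a lognormal (right-skewed) distribution as a stand-in for travel times:

```python
import numpy as np

rng = np.random.default_rng(0)
# lognormal samples are right-skewed, like congestion-driven travel times
samples = rng.lognormal(mean=3.0, sigma=0.5, size=100_000)
print(np.median(samples) < np.mean(samples))  # → True: median sits below the mean
```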

Notes On Anomaly Detection

traffic_anomaly can classify two separate types of anomalies:

  1. Entity-Level Anomalies are detected for individual entities based on their own historical patterns, without considering the group context.
  2. Group-Level Anomalies are detected for entities when compared to the behavior of other entities within the same group. Group-level anomalies are rarer because, to be considered for classification as a group-level anomaly, a time period must also have been classified as an entity-level anomaly.

Why is that needed? Well, say your data is vehicle travel times within a city and there is a snowstorm. Travel times across the city drop, and if you look at roadway segments in isolation, everything is an anomaly. That's nice, but what if you're only interested in things that are broken? That's where group-level anomalies come in. They are rarer, but they are more likely to be actionable. There's probably not much you can do about that snowstorm...
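The two-stage logic described above can be sketched as follows: a point can only become a group-level anomaly if it was already flagged at the entity level AND its residual stands out against its (group, timestamp) peers. Column names and the MAD-based peer score are assumptions for illustration, not the package's internals.

```python
import numpy as np
import pandas as pd

def flag_group_anomalies(df: pd.DataFrame, group_threshold: float = 3.5) -> pd.DataFrame:
    df = df.copy()
    # robust z-score of each residual against other entities in the same
    # group at the same timestamp
    grp = df.groupby(["group", "timestamp"])["resid"]
    med = grp.transform("median")
    mad = grp.transform(lambda s: (s - s.median()).abs().median())
    group_score = 0.6745 * (df["resid"] - med) / mad.replace(0, np.nan)
    # stage 2 requires stage 1: entity-level flag AND peer-relative outlier
    df["group_anomaly"] = df["anomaly"] & (group_score.abs() > group_threshold)
    return df
```

In the snowstorm case, every segment's residual moves together, so peer-relative scores stay small and nothing is escalated to a group-level anomaly.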

Future Plans/Support

It would be nice to add support for holidays and a yearly seasonal component... please help?

Change Point Detection

I have working code from the ruptures package but it's not integrated here yet, and it's slower than molasses. I'll get to it eventually.
