
Generate ATSPM anomaly detection reports with CUSUM analysis and PDF output.

ATSPM Report Package

A Python package for generating daily reports on new traffic signal issues. The generated report highlights issues that just occurred and filters out previously flagged issues.

Example Report

Features & Alert Types

This tool uses aggregate data produced by the atspm Python package to identify 6 key types of traffic signal performance issues.

  • Multi-region reporting: Automatically generates separate PDF reports for each region.
  • Alert suppression: Configurable alert retention to prevent duplicate alerts.
  • Custom branding: Support for custom logos in generated PDFs.
  • Date-based jokes: Rotating collection of jokes in reports based on current date.

1. Max-Out Alerts

Detects increased percent max-out compared to historical baseline.

Example Max-Out Alert

2. Actuation Alerts

Detects worsening detector performance compared to historical baseline.

Example Detector Alert

3. Pedestrian Alerts

Detects significant decreases in pedestrian services or anomalous actuations per service ratio compared to historical baseline.

Example Pedestrian Alert

4. Missing Data Alerts

Detects when signals are offline or missing data more than usual.

5. Phase Skip Alerts

Detects when phase wait times (without preempt present) are more than 1.5x the cycle length, indicating a skipped phase.

Example Phase Skip Alert

6. System Outage Alerts

Detects system-wide outage or data loss.

Installation

pip install atspm-report

Quick Start

The ReportGenerator is the main entry point. It accepts configuration options and a set of DataFrames (pandas or Ibis) to generate PDF reports.

import pandas as pd
from pathlib import Path
from atspm_report import ReportGenerator

# 1. Configure the generator
config = {
    'verbosity': 1,
    'alert_suppression_days': 14,
    'alert_retention_weeks': 3,
}

# 2. Load your data
# See "Input Data Schemas" below for required columns
test_data_dir = Path('tests/data')
signals = pd.read_parquet(test_data_dir / 'signals.parquet')
terminations = pd.read_parquet(test_data_dir / 'terminations.parquet')
detector_health = pd.read_parquet(test_data_dir / 'detector_health.parquet')
has_data = pd.read_parquet(test_data_dir / 'has_data.parquet')
pedestrian = pd.read_parquet(test_data_dir / 'full_ped.parquet')

# 2b. Load phase wait and coordination data (for phase skip detection)
# These come from the atspm package's phase_wait and coordination_agg aggregations
phase_wait = pd.read_parquet(test_data_dir / 'phase_wait.parquet')  # Optional
coordination_agg = pd.read_parquet(test_data_dir / 'coordination_agg.parquet')  # Optional

# 3. Load past alerts for suppression (optional but recommended)
past_alerts = {}
for alert_type in ['maxout', 'actuations', 'missing_data', 'pedestrian', 'phase_skips', 'system_outages']:
    file_path = Path(f'past_{alert_type}_alerts.parquet')
    past_alerts[alert_type] = pd.read_parquet(file_path) if file_path.exists() else pd.DataFrame()

# 4. Generate reports
generator = ReportGenerator(config)
result = generator.generate(
    signals=signals,
    terminations=terminations,
    detector_health=detector_health,
    has_data=has_data,
    pedestrian=pedestrian,
    phase_wait=phase_wait,  # For phase skip detection
    coordination_agg=coordination_agg,  # For cycle length visualization
    past_alerts=past_alerts
)

# 5. Save PDF reports
for region, pdf_bytes in result['reports'].items():
    with open(f'report_{region}.pdf', 'wb') as f:
        pdf_bytes.seek(0)
        f.write(pdf_bytes.read())
    print(f"Generated report for {region}")

# 6. Save updated alert history for next run
for alert_type, df in result['updated_past_alerts'].items():
    if not df.empty:
        df.to_parquet(f'past_{alert_type}_alerts.parquet', index=False)

# 7. Access alerts directly if needed
for alert_type, alerts_df in result['alerts'].items():
    if not alerts_df.empty:
        print(f"{alert_type}: {len(alerts_df)} alerts")

Using Ibis for Large Datasets

For large datasets, you can pass Ibis tables instead of pandas DataFrames. This enables lazy evaluation and support for backends like DuckDB, Polars, and Spark.

import ibis
from atspm_report import ReportGenerator

con = ibis.duckdb.connect()
signals = con.read_parquet('signals.parquet')
# ... load other tables ...

generator = ReportGenerator({'verbosity': 1})
result = generator.generate(
    signals=signals,
    # ... pass other ibis tables ...
)

Configuration Options

Pass these keys in the config dictionary to ReportGenerator.

| Option | Type | Default | Description |
|---|---|---|---|
| custom_logo_path | str or None | None | Path to custom logo image (PNG/JPG). If None, uses the default ODOT logo |
| verbosity | int | 1 | Output verbosity: 0=silent, 1=info, 2=debug |
| alert_suppression_days | int | 21 | Days to suppress repeat alerts for the same signal/issue |
| alert_retention_weeks | int | 104 | Weeks to retain past alerts before cleanup |
| historical_window_days | int | 21 | Days of historical data to analyze |
| alert_flagging_days | int | 7 | Maximum age (days) for new alerts to be flagged |
| suppress_repeated_alerts | bool | True | Enable alert suppression logic |
| figures_per_device | int | 3 | Number of plots per device in reports |
| phase_skip_alert_threshold | int | 1 | Minimum skips to trigger a phase skip alert |
| phase_skip_retention_days | int | 14 | Days to retain phase skip data |
| joke_index | int or None | None | Specific joke index (0-based). If None, auto-cycles by date |
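
As a concrete reference, here is an illustrative config dict that exercises every option above (the values shown are examples, not tuning recommendations); pass it to ReportGenerator as in the Quick Start:

```python
# Illustrative configuration; every key is optional and falls back to
# the default listed in the table above.
config = {
    'custom_logo_path': 'assets/agency_logo.png',  # PNG/JPG; None uses the default ODOT logo
    'verbosity': 2,                    # 0=silent, 1=info, 2=debug
    'alert_suppression_days': 21,      # suppress repeat alerts for the same signal/issue
    'alert_retention_weeks': 104,      # keep past alerts ~2 years before cleanup
    'historical_window_days': 21,      # baseline window for anomaly detection
    'alert_flagging_days': 7,          # only flag alerts newer than this
    'suppress_repeated_alerts': True,
    'figures_per_device': 3,           # plots per device in the PDF
    'phase_skip_alert_threshold': 1,   # minimum skips to trigger an alert
    'phase_skip_retention_days': 14,
    'joke_index': None,                # auto-cycle jokes by date
}
```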

Input Data Schemas

The generate() method accepts pandas DataFrames or Ibis tables.

signals (Required)

Signal metadata including location and regional assignment.

| Column | Type | Description | Example |
|---|---|---|---|
| DeviceId | str or int | Unique signal identifier (converted to string internally) | signal_1 or 12345 |
| Name | str | Signal location name | 04100-Pacific at Hill |
| Region | str | Geographic region assignment | Region 2 |

Sample:

signals = pd.DataFrame({
    'DeviceId': ['signal_1', 'signal_2'],
    'Name': ['04100-Pacific at Hill', '2B528-(OR8) Adair St @ 4th Av'],
    'Region': ['Region 2', 'Region 1']
})

terminations (Optional)

Phase termination data for detecting max-out conditions.

| Column | Type | Description | Example |
|---|---|---|---|
| TimeStamp | datetime | Event timestamp | 2024-01-15 08:30:00 |
| DeviceId | str or int | Signal identifier (converted to string internally) | signal_1 or 12345 |
| Phase | int | Phase number (1-8) | 2 |
| PerformanceMeasure | str | Termination type | MaxOut, ForceOff, GapOut |
| Total | int | Number of occurrences | 45 |

Sample:

terminations = pd.DataFrame({
    'TimeStamp': pd.to_datetime(['2024-01-15 08:30:00', '2024-01-15 08:35:00', '2024-01-15 08:35:00']),
    'DeviceId': ['signal_1'] * 3,
    'Phase': [2, 2, 4],
    'PerformanceMeasure': ['MaxOut', 'GapOut', 'ForceOff'],
    'Total': [30, 15, 12]
})

detector_health (Optional)

Detector actuation counts for health monitoring.

| Column | Type | Description | Example |
|---|---|---|---|
| TimeStamp | datetime | Event timestamp | 2024-01-15 00:00:00 |
| DeviceId | str or int | Signal identifier (converted to string internally) | signal_1 or 12345 |
| Detector | int | Detector number | 1 |
| Total | int | Actuation count | 150 |
| prediction | float | Predicted actuation count | 145.0 |
| anomaly | bool | Anomaly indicator | False |

Sample:

detector_health = pd.DataFrame({
    'TimeStamp': pd.to_datetime(['2024-01-15 08:00:00', '2024-01-15 08:00:00']),
    'DeviceId': ['signal_1', 'signal_1'],
    'Detector': [1, 2],
    'Total': [150, 5],
    'prediction': [145.0, 150.0],
    'anomaly': [False, True]
})

has_data (Optional)

Records of data availability (presence of any record indicates data exists for that timestamp). Data is expected at 15-minute intervals (96 records per day = full availability).

| Column | Type | Description | Example |
|---|---|---|---|
| TimeStamp | datetime | Event timestamp | 2024-01-15 00:00:00 |
| DeviceId | str or int | Signal identifier (converted to string internally) | signal_1 or 12345 |

Sample:

has_data = pd.DataFrame({
    'TimeStamp': pd.to_datetime(['2024-01-15 00:00:00', '2024-01-15 00:15:00', '2024-01-15 00:30:00']),
    'DeviceId': ['signal_1'] * 3
})
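
For intuition, here is a minimal sketch (not the package's internals) of how daily availability could be derived from this table, using the 96-records-per-day convention:

```python
import pandas as pd

# Count 15-minute records per device per day and divide by 96,
# the number of bins in a full day.
has_data = pd.DataFrame({
    'TimeStamp': pd.to_datetime(['2024-01-15 00:00:00', '2024-01-15 00:15:00', '2024-01-15 00:30:00']),
    'DeviceId': ['signal_1'] * 3
})

counts = (
    has_data
    .assign(Date=has_data['TimeStamp'].dt.date)
    .groupby(['DeviceId', 'Date'])
    .size()
)
availability = counts / 96  # 1.0 means a full day of data
```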

pedestrian (Optional)

Pedestrian button press and service data.

| Column | Type | Description | Example |
|---|---|---|---|
| TimeStamp | datetime | Event timestamp | 2024-01-15 12:30:00 |
| DeviceId | str or int | Signal identifier (converted to string internally) | signal_1 or 12345 |
| Phase | int | Pedestrian phase number | 2 |
| PedActuation | int | Button press count | 5 |
| PedServices | int | Service events (walk signal) | 1 |

Sample:

pedestrian = pd.DataFrame({
    'TimeStamp': pd.to_datetime(['2024-01-15 12:30:00', '2024-01-15 12:30:00']),
    'DeviceId': ['signal_1', 'signal_2'],
    'Phase': [2, 4],
    'PedActuation': [5, 10],
    'PedServices': [1, 2]
})

phase_wait (Optional)

Pre-aggregated phase wait data from the atspm package for phase skip detection.

| Column | Type | Description | Example |
|---|---|---|---|
| TimeStamp | datetime | Bin start time | 2024-01-15 14:00:00 |
| DeviceId | str or int | Signal identifier (converted to string internally) | signal_1 or 12345 |
| Phase | int | Phase number (1-16) | 1 |
| AvgPhaseWait | float | Average wait time in seconds | 150.0 |
| MaxPhaseWait | float | Maximum wait time in seconds | 200.0 |
| TotalSkips | int | Count of skipped phases in this bin | 2 |

Sample:

phase_wait = pd.DataFrame({
    'TimeStamp': pd.to_datetime(['2024-01-15 14:00:00', '2024-01-15 14:15:00', '2024-01-15 14:30:00']),
    'DeviceId': ['signal_1'] * 3,
    'Phase': [1, 1, 2],
    'AvgPhaseWait': [150.0, 160.0, 50.0],
    'MaxPhaseWait': [200.0, 210.0, 70.0],
    'TotalSkips': [2, 3, 0]
})

coordination_agg (Optional)

Pre-aggregated coordination data for cycle length visualization (15-minute bins).

| Column | Type | Description | Example |
|---|---|---|---|
| TimeStamp | datetime | Bin start time (15-minute intervals) | 2024-01-15 14:00:00 |
| DeviceId | str or int | Signal identifier (converted to string internally) | signal_1 or 12345 |
| ActualCycleLength | float | Actual cycle length in seconds | 120.0 |

Sample:

coordination_agg = pd.DataFrame({
    'TimeStamp': pd.to_datetime(['2024-01-15 14:00:00', '2024-01-15 14:15:00', '2024-01-15 14:30:00']),
    'DeviceId': ['signal_1', 'signal_1', 'signal_1'],
    'ActualCycleLength': [100.0, 120.0, 120.0]
})

past_alerts (Optional)

Dictionary of past alerts by type for suppression logic.

Structure:

past_alerts = {
    'maxout': pd.DataFrame,        # Past max-out alerts
    'actuations': pd.DataFrame,     # Past actuation alerts
    'missing_data': pd.DataFrame,   # Past missing data alerts
    'pedestrian': pd.DataFrame,     # Past pedestrian alerts
    'phase_skips': pd.DataFrame,    # Past phase skip alerts
    'system_outages': pd.DataFrame  # Past system outage alerts
}

Sample:

past_alerts = {
    'maxout': pd.DataFrame({
        'DeviceId': ['signal_1', 'signal_2'],
        'Phase': [2, 4],
        'Date': pd.to_datetime(['2024-01-14', '2024-01-14'])
    }),
    'actuations': pd.DataFrame(),  # Empty if no past actuation alerts
    # ... other types
}

Statistical Analysis

The package uses CUSUM (Cumulative Sum) with z-score thresholds to detect anomalies for max-out, actuations, and missing data alerts. Pedestrian and phase skip alerts use different methodologies.

CUSUM Detection Method (Max-Out, Actuations, Missing Data)

The CUSUM algorithm detects sustained deviations from historical baselines:

  1. Baseline Calculation: For each signal component (DeviceId + Phase/Detector), calculate:

    • Historical mean ($\bar{x}$) over all available data
    • Historical standard deviation ($\sigma$) over all available data
  2. Time-Weighted CUSUM: Over a 7-day rolling window, accumulate deviations with a "forgetfulness" weighting that emphasizes recent days:

    • Date Weight: $w_d = (d + 1)^{f}$, where $d$ is days since the window start and $f = 2$ (forgetfulness)
    • Daily Deviation: $\mathrm{dev}_d = \max(0,\, x_d - \bar{x} - k\sigma)$, where $k = 1$ (allowance factor)
    • CUSUM Score: $\frac{\sum_d \mathrm{dev}_d \cdot w_d}{\sum_d w_d} \times 7$
  3. Z-Score: Standard z-score calculated as $(value - \bar{x}) / \sigma$

  4. Alert Trigger: An alert fires when all of the following conditions are met simultaneously:

    • CUSUM exceeds threshold
    • Z-score exceeds threshold
    • Current value exceeds minimum threshold
    • (For max-out only) Services count exceeds minimum
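
The scoring above can be sketched in a few lines; the function names and signatures here are illustrative, not the package's API:

```python
import numpy as np

def cusum_score(window, mean, sigma, k=1.0, forgetfulness=2.0):
    """Time-weighted CUSUM over a rolling window of daily values
    (oldest first), against a historical mean and standard deviation."""
    x = np.asarray(window, dtype=float)
    days = np.arange(len(x))                  # days since window start
    weights = (days + 1) ** forgetfulness     # emphasize recent days
    deviations = np.maximum(0.0, x - mean - k * sigma)
    return deviations @ weights / weights.sum() * 7

def z_score(value, mean, sigma):
    """Standard z-score of the current value against the baseline."""
    return (value - mean) / sigma
```

A flat week at the baseline scores zero, while a sustained jump above the allowance band accumulates quickly; an alert additionally requires the z-score and minimum-value conditions listed above to hold at the same time.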

Alert Thresholds

| Alert Type | CUSUM Threshold | Z-Score Threshold | Min Value Threshold | Extra Condition |
|---|---|---|---|---|
| Max-Out | > 0.25 | > 4 | > 20% | Services > 30 |
| Actuations | > 0.20 | > 3.5 | > 10% | |
| Missing Data | > 0.10 | > 3 | > 5% | |

Pedestrian Alert Detection

Pedestrian alerts use a modified GEH statistic combined with regional z-score normalization:

  1. Calculate Metrics: For each DeviceId/Phase/Date:

    • Ped_Percent: Pedestrian services / Total phase services
    • Ped_APS: Pedestrian actuations / Pedestrian services (actuations per service)
  2. Signed GEH Calculation: Modified GEH that preserves the direction of change: $$GEH_{signed} = \frac{2(V - M)^2}{V + M} \times \operatorname{sign}(V - M)$$ where $V$ = observed value, $M$ = historical median for the device/phase

  3. Regional Z-Score Normalization: GEH values are normalized within each region/date to produce z-scores

  4. Combined Z-Score: Combines the percent and APS z-scores (write $Z_{\%}$ for the Ped_Percent z-score and $Z_{APS}$ for the Ped_APS z-score):

    • If $Z_{\%} < 0$: $|Z_{\%} \times Z_{APS}|$
    • Otherwise: $Z_{\%} \times Z_{APS}$
  5. Alert Trigger: Combined z-score ≤ -11 (indicates significant decrease in pedestrian activity relative to historical patterns)
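
Steps 2 and 4 can be sketched as plain functions (names illustrative; the combination rule mirrors the cases listed above):

```python
def signed_geh(v, m):
    """Modified GEH preserving direction of change:
    v = observed value, m = historical median for the device/phase."""
    if v + m == 0:
        return 0.0
    sign = 1 if v > m else -1 if v < m else 0
    return 2 * (v - m) ** 2 / (v + m) * sign

def combined_z(z_percent, z_aps):
    """Combine the regional z-scores for Ped_Percent and Ped_APS."""
    product = z_percent * z_aps
    return abs(product) if z_percent < 0 else product
```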

Phase Skip Alert Detection

Phase skip detection is based on pre-aggregated data from the atspm package:

  1. Data Source: Uses the TotalSkips column from phase_wait data, which counts phases where wait time exceeded expected cycle time (excluding preemption events)

  2. Aggregation: Daily totals of skips are summed by DeviceId/Phase

  3. Alert Trigger: Total aggregated skips exceed phase_skip_alert_threshold (default: 1)
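
As a sketch, the aggregation and trigger reduce to a daily groupby over the phase_wait schema shown earlier (the threshold value comes from phase_skip_alert_threshold in the config):

```python
import pandas as pd

phase_wait = pd.DataFrame({
    'TimeStamp': pd.to_datetime(['2024-01-15 14:00:00', '2024-01-15 14:15:00', '2024-01-15 14:30:00']),
    'DeviceId': ['signal_1'] * 3,
    'Phase': [1, 1, 2],
    'TotalSkips': [2, 3, 0]
})

# Sum skips per DeviceId/Phase per day, then flag groups over the threshold.
daily = (
    phase_wait
    .assign(Date=phase_wait['TimeStamp'].dt.date)
    .groupby(['DeviceId', 'Phase', 'Date'], as_index=False)['TotalSkips']
    .sum()
)
alerts = daily[daily['TotalSkips'] > 1]  # default phase_skip_alert_threshold
```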

System Outage Detection

  1. Missing Data Threshold: When average missing data across a region exceeds 30% for a given date
  2. Output: Alerts grouped by Date and Region
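
A minimal sketch of this check, assuming per-signal daily missing-data fractions have already been computed (column names here are illustrative):

```python
import pandas as pd

missing = pd.DataFrame({
    'Date': ['2024-01-15'] * 4,
    'Region': ['Region 1', 'Region 1', 'Region 2', 'Region 2'],
    'MissingPct': [0.40, 0.35, 0.05, 0.10],  # per-signal fraction of missing data
})

# Average missing data across each region per date; flag regions over 30%.
regional = missing.groupby(['Date', 'Region'], as_index=False)['MissingPct'].mean()
outages = regional[regional['MissingPct'] > 0.30]
```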

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

Contributions are welcome. Open an issue to report problems or ask for help.
