Classify timeseries rows as Good / Sus / Bad and render a multi-tag quality timeline chart.

timeseries-qc

PyPI License: MIT

The open-source data quality-control layer for SCADA, DCS, IoT, and historian timeseries data.

Add good / sus (suspect) / bad quality labels to every row of a pandas DataFrame in five lines. Then render a multi-tag horizontal status timeline, the chart that no other open-source library produces.

A timeseries data quality check that is simple to digest and understand. Catch issues in your process data before they affect your downstream analytics and business decisions. Build data quality checks from business rules and monitor them through interactive graph components.

Sample Input - Solar farm SCADA data:

| timestamp                 | tag_name       | value   |
| :------------------------ | :------------- | :------ |
| 2026-01-01 00:00:00+00:00 | INVERTER.MW    | 42.1    |
| 2026-01-01 01:00:00+00:00 | INVERTER.MW    | NULL    |  <-- timeseries_qc will catch this (Null value)
| 2026-01-01 02:00:00+00:00 | INVERTER.MW    | 52.3    |
| 2026-01-01 00:00:00+00:00 | MET.IRRADIANCE | 600.001 |
| 2026-01-01 01:00:00+00:00 | MET.IRRADIANCE | 600.001 |  <-- timeseries_qc will catch this (Stale/Frozen value)
| 2026-01-01 02:00:00+00:00 | MET.IRRADIANCE | 810.818 |
| 2026-01-01 00:00:00+00:00 | TRACKER.ANGLE  | 30.22   |
| 2026-01-01 01:00:00+00:00 | TRACKER.ANGLE  | 45.31   |
| 2026-01-01 02:00:00+00:00 | TRACKER.ANGLE  | 60.22   |

Sample Output - Solar farm SCADA data:

Solar farm SCADA data quality example
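
The two annotated rows in the solar table can be reproduced with a small dependency-free sketch. The helper names `is_null` and `frozen_flags` and the default `min_delta` are illustrative assumptions, not part of the tsqc API. The library's NullRule and FlatlineRule may behave differently; FlatlineRule, for example, is configured with a time window.

```python
import math

def is_null(value):
    """True when a reading is missing (None or NaN)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def frozen_flags(values, min_delta=0.001):
    """Flag a reading that moved less than min_delta from the previous reading."""
    if not values:
        return []
    flags = [False]  # the first reading has no predecessor to compare against
    for prev, cur in zip(values, values[1:]):
        if is_null(prev) or is_null(cur):
            flags.append(False)  # missing data is left to the null check
        else:
            flags.append(abs(cur - prev) < min_delta)
    return flags

inverter_mw = [42.1, None, 52.3]            # INVERTER.MW values from the table
irradiance = [600.001, 600.001, 810.818]    # MET.IRRADIANCE values from the table

null_flags = [is_null(v) for v in inverter_mw]
stale_flags = frozen_flags(irradiance)
print(null_flags)
print(stale_flags)
```

In both lists only the 01:00 reading is flagged, matching the annotations in the table.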

Sample Input - Oil field SCADA data:

| timestamp                 | tag_name     | value  |
| :------------------------ | :----------- | :----- |
| 2026-01-01 00:00:00+00:00 | WHP.PSIG     | 0      |  <-- timeseries_qc will catch this (Flatline/Zero)
| 2026-01-01 01:00:00+00:00 | WHP.PSIG     | 0      |  <-- timeseries_qc will catch this (Flatline/Zero)
| 2026-01-01 02:00:00+00:00 | WHP.PSIG     | 0      |  <-- timeseries_qc will catch this (Flatline/Zero)
| 2026-01-01 00:00:00+00:00 | FMRATE.MSCFD | 12.1   |
| 2026-01-01 01:00:00+00:00 | FMRATE.MSCFD | 90.99  |  <-- timeseries_qc will catch this (Rate-of-change spike)
| 2026-01-01 02:00:00+00:00 | FMRATE.MSCFD | 12.3   |
| 2026-01-01 00:00:00+00:00 | OHT.TEMP_F   | 30.2   |
| 2026-01-01 01:00:00+00:00 | OHT.TEMP_F   | 45.2   |
| 2026-01-01 02:00:00+00:00 | OHT.TEMP_F   | 6000.2 |  <-- timeseries_qc will catch this (Out of bounds)

Sample Output - Oil field SCADA data:

Oil field SCADA data quality example
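
As a rough illustration of the rate-of-change and out-of-bounds annotations in the oil field table, the sketch below applies a point-to-point difference check and a range check in plain Python. The function names and the temperature bounds are assumptions made for this example; the 50.0 threshold is borrowed from the YAML example later in this README. This is not the library's implementation.

```python
def delta_flags(values, threshold):
    """Flag a reading whose absolute change from the previous reading exceeds threshold."""
    if not values:
        return []
    flags = [False]  # nothing to compare the first reading against
    for prev, cur in zip(values, values[1:]):
        flags.append(abs(cur - prev) > threshold)
    return flags

def range_flags(values, lo, hi):
    """Flag a reading that falls outside the closed interval [lo, hi]."""
    return [not (lo <= v <= hi) for v in values]

fmrate_mscfd = [12.1, 90.99, 12.3]    # FMRATE.MSCFD values from the table
oht_temp_f = [30.2, 45.2, 6000.2]     # OHT.TEMP_F values from the table

spike_flags = delta_flags(fmrate_mscfd, threshold=50.0)
bounds_flags = range_flags(oht_temp_f, lo=-40.0, hi=400.0)  # bounds are hypothetical
print(spike_flags)
print(bounds_flags)
```

A plain point-to-point difference flags the return to normal at 02:00 as well as the spike at 01:00, because both steps exceed the threshold. How timeseries-qc treats the recovery step is not stated here.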

Features

  • Four built-in rules cover ≥80% of real-world bad data: NullRule, FlatlineRule, DeltaRule, RangeRule
  • Timeline chart (result.plot()) — Plotly Gantt-style, one row per tag, Green/Yellow/Red, hover tooltips
  • YAML config — non-coders set thresholds in a text file, no Python required
  • Timestamp health (result.check_timestamps()) — detects gaps, duplicates, non-monotonic, freq drift, DST ambiguity
  • Self-contained HTML export (result.export_report("report.html")) — offline, no CDN, includes per-issue summary table
  • Per-issue breakdown (result.issue_summary()) — start/end times, row count, duration, and status for each contiguous bad/sus segment
  • Pandas-native — works with any DataFrame that has timestamp, tag_name, value columns
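
The per-issue breakdown groups consecutive non-good rows into segments. A minimal sketch of that grouping using only the standard library is shown here. `issue_runs` and its output keys are hypothetical names, and measuring duration from the first to the last timestamp of a run is an assumption about how the library defines it.

```python
from datetime import datetime, timedelta, timezone
from itertools import groupby

def issue_runs(timestamps, statuses):
    """Collapse consecutive rows that share a non-good status into one run each."""
    runs = []
    rows = list(zip(timestamps, statuses))
    for status, group in groupby(rows, key=lambda row: row[1]):
        group = list(group)
        if status == "good":
            continue  # only sus/bad segments are reported
        start, end = group[0][0], group[-1][0]
        runs.append({
            "status": status,
            "start": start,
            "end": end,
            "rows": len(group),
            "duration": end - start,  # first-to-last timestamp (assumed definition)
        })
    return runs

t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
timestamps = [t0 + timedelta(hours=h) for h in range(6)]
statuses = ["good", "sus", "sus", "good", "bad", "good"]

runs = issue_runs(timestamps, statuses)
for run in runs:
    print(run["status"], run["rows"], run["duration"])
```

The input must already be sorted by time for a single tag; a multi-tag frame would need the same grouping applied per tag.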

Installation

pip install timeseries-qc

Quickstart (5 lines)

import tsqc
import pandas as pd

df = pd.read_csv("sensor_data.csv")          # columns: timestamp, tag_name, value
result = tsqc.check(df, assume_tz="UTC")     # assume_tz required for tz-naive CSVs
result.plot().show()                          # renders the multi-tag quality timeline

If your CSV already contains tz-aware timestamps (ISO 8601 with +00:00), omit assume_tz.
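
To decide whether assume_tz is needed, it can help to check whether the timestamp strings carry a UTC offset. The sketch below does this with the standard library. `is_tz_aware` and `localize_if_naive` are illustrative helpers, not tsqc functions, and what assume_tz does internally is not documented here.

```python
from datetime import datetime, timezone

def is_tz_aware(stamp):
    """True when an ISO 8601 timestamp string carries a UTC offset."""
    parsed = datetime.fromisoformat(stamp)
    return parsed.tzinfo is not None and parsed.utcoffset() is not None

def localize_if_naive(stamp, tz=timezone.utc):
    """Parse a timestamp and attach tz only when the string has no offset."""
    parsed = datetime.fromisoformat(stamp)
    if parsed.tzinfo is None:
        return parsed.replace(tzinfo=tz)
    return parsed

aware = is_tz_aware("2026-01-01 00:00:00+00:00")
naive = is_tz_aware("2026-01-01 00:00:00")
localized = localize_if_naive("2026-01-01 00:00:00")
print(aware, naive, localized.isoformat())
```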


YAML Config Example

# tsqc_rules.yaml
default_rules:
  - check: null
    level: bad
  - check: flatline
    window: 1h
    min_delta: 0.001
    level: sus
  - check: delta
    threshold: 50.0
    level: sus

tag_rules:
  FOREBAY.LEVEL:
    - check: range
      min: 900
      max: 1100
      level: bad
  "GENERATOR.*":
    - check: range
      min: 0
      max: 200
      level: bad
    - check: flatline
      window: 30min
      min_delta: 0.5   # 0 MW for <30min is valid; longer flatline at non-zero is suspect
      level: sus

Pass the rules file to tsqc.check and inspect the results:

result = tsqc.check(df, rules="tsqc_rules.yaml")
result.summary()           # DataFrame: pct_good/sus/bad per tag
result.issue_summary()     # DataFrame: per-issue runs (start, end, rows, duration)
result.check_timestamps()  # DataFrame: gap/duplicate/non_monotonic issues
result.export_report("report.html")  # Full HTML with chart + all tables
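
To make the relationship between default_rules and tag_rules concrete, here is a sketch of how the effective rule list for one tag could be assembled. It assumes tag patterns such as "GENERATOR.*" are shell-style globs; the source does not say whether they are globs or regular expressions. `rules_for` is a hypothetical helper, and the dictionaries mirror the YAML above. A YAML loader that follows the core schema reads an unquoted null as the null value rather than the string "null". The sketch uses the string for readability, and how tsqc interprets that entry is not shown here.

```python
from fnmatch import fnmatchcase

DEFAULT_RULES = [
    {"check": "null", "level": "bad"},
    {"check": "flatline", "window": "1h", "min_delta": 0.001, "level": "sus"},
    {"check": "delta", "threshold": 50.0, "level": "sus"},
]

TAG_RULES = {
    "FOREBAY.LEVEL": [
        {"check": "range", "min": 900, "max": 1100, "level": "bad"},
    ],
    "GENERATOR.*": [
        {"check": "range", "min": 0, "max": 200, "level": "bad"},
        {"check": "flatline", "window": "30min", "min_delta": 0.5, "level": "sus"},
    ],
}

def rules_for(tag):
    """Default rules first, then the rules of every tag pattern that matches."""
    rules = list(DEFAULT_RULES)
    for pattern, extra in TAG_RULES.items():
        if fnmatchcase(tag, pattern):  # glob matching is an assumption
            rules.extend(extra)
    return rules

generator_checks = [rule["check"] for rule in rules_for("GENERATOR.MW")]
forebay_checks = [rule["check"] for rule in rules_for("FOREBAY.LEVEL")]
other_checks = [rule["check"] for rule in rules_for("TRACKER.ANGLE")]
print(generator_checks)
print(forebay_checks)
print(other_checks)
```

Under this reading a GENERATOR tag is checked by both flatline rules, the 1h default and the 30min tag-specific one, because tag rules extend the defaults instead of replacing them.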

Output Schema

result.df adds two columns to your DataFrame:

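| Column          | Values                 | Notes                               |
| :-------------- | :--------------------- | :---------------------------------- |
| quality         | "good", "sus", "bad"   | Worst-level rule wins               |
| quality_reasons | e.g. "flatline\|range" | Pipe-delimited triggered rule names |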
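
The "worst-level rule wins" behaviour and the pipe-delimited reasons can be sketched as follows. `combine` and the `SEVERITY` ordering are illustrative names. Returning an empty string for rows with no triggered rule, and keeping the rule names in trigger order, are assumptions about details the schema does not specify.

```python
SEVERITY = {"good": 0, "sus": 1, "bad": 2}

def combine(triggered):
    """Reduce the (rule_name, level) pairs triggered on one row to two column values."""
    if not triggered:
        return "good", ""  # empty reasons for clean rows (assumed)
    worst = max((level for _, level in triggered), key=SEVERITY.__getitem__)
    reasons = "|".join(name for name, _ in triggered)
    return worst, reasons

both = combine([("flatline", "sus"), ("range", "bad")])
only_sus = combine([("delta", "sus")])
clean = combine([])
print(both)
print(only_sus)
print(clean)
```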

Comparison with Alternatives

Pecos (Sandia Labs) offers binary pass/fail and has been in maintenance mode since 2021 — no timeline chart and no YAML config. SaQC (Helmholtz UFZ) is a rich flagging engine for environmental science but has an environmental-domain API, no timeline visualization, and an LGPL license. Great Expectations is not timeseries-native and produces no visualization. timeseries-qc is the only library that combines (1) Good/Sus/Bad classification, (2) the multi-tag horizontal status timeline, and (3) YAML-driven configuration in a single pip install.


Examples


Known Limitations (v0.1.0)

  1. Pandas only. PySpark and Polars support are deferred.
  2. No YAML override of default rules. Tag-specific rules are added to the default rules rather than replacing them.
  3. Visualization requires Plotly ≥ 5.0. Matplotlib output not supported.
  4. DeltaRule is point-to-point diff only. Rolling-window delta is a v0.2 feature.
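
The practical effect of limitation 4 is that a steady ramp can pass a point-to-point check even when the total change is large. The sketch below shows this in plain Python and contrasts it with one possible reading of a rolling-window delta, the max-minus-min span over the last few readings. Both functions are hypothetical, and the v0.2 design may differ.

```python
def point_to_point_flags(values, threshold):
    """Flag a reading whose change from the previous reading exceeds threshold."""
    if not values:
        return []
    flags = [False]  # the first reading has no predecessor
    for prev, cur in zip(values, values[1:]):
        flags.append(abs(cur - prev) > threshold)
    return flags

def rolling_span_flags(values, window, threshold):
    """Flag a reading when max - min over the last `window` readings exceeds threshold."""
    flags = []
    for i in range(len(values)):
        recent = values[max(0, i - window + 1): i + 1]
        flags.append(max(recent) - min(recent) > threshold)
    return flags

ramp = [10.0, 40.0, 70.0, 100.0, 130.0]   # each step is 30, below a threshold of 50

step_flags = point_to_point_flags(ramp, threshold=50.0)
span_flags = rolling_span_flags(ramp, window=3, threshold=50.0)
print(step_flags)
print(span_flags)
```

No single step exceeds the threshold, so the point-to-point check flags nothing, while the three-reading span reaches 60 from the third reading onward.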

License

MIT © timeseries-qc contributors
