
A time series data quality control and processing tool/framework




Project Status: Active – The project has reached a stable, usable state and is being actively developed.

SaQC: System for automated Quality Control

SaQC is a flexible framework for quality control of time series data. It offers a growing collection of algorithms and methods to analyze, annotate, and process time series, while supporting end-to-end metadata enrichment.

SaQC provides multiple user interfaces to fit different workflows:

  • A Python API for programmatic integration
  • A command-line interface with a text-based configuration system
  • A Galaxy Tool for integration into data analysis pipelines
  • A web-based user interface for interactive exploration

Designed with the needs of data professionals in mind — ranging from sensor engineers to domain experts and data scientists — SaQC helps improve and standardize the quality of data products across disciplines.

For an up-to-date overview of features, usage patterns, system components, and customization options, please refer to the online documentation.

Installation

SaQC is available on the Python Package Index (PyPI) and can be installed using pip:

python -m pip install saqc

Additionally, SaQC is available via conda and can be installed with:

conda create -c conda-forge -n saqc saqc

For more details, see the installation guide.
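
To confirm that the installation succeeded, the installed version can be queried from Python. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of SaQC.

```python
from importlib import metadata


def installed_version(package: str = "saqc"):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version is not None else "saqc is not installed")
```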

Usage

SaQC can be used as a command-line application driven by a text-based configuration, or as a Python module with a consistent and simple API.

SaQC as a command-line application

The command-line application is controlled through a semicolon-separated configuration file. This file lists the dataset variables along with the routines to inspect, quality control, and process them.

An example configuration file:

varname    ; test
#----------; ---------------------------------------------------------------------
SM2        ; align(freq="15Min")
'SM(1|2)+' ; flagMissing()
SM1        ; flagRange(min=10, max=60)
SM2        ; flagRange(min=10, max=40)
SM2        ; flagZScore(window="30d", thresh=3.5, method='modified', center=False)
Dummy      ; flagGeneric(field=["SM1", "SM2"], func=(isflagged(x) | isflagged(y)))
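
To make the two-column layout concrete, the sketch below splits such a file into (variable, test) pairs in plain Python. It is not SaQC's own parser: the function name `parse_config` is hypothetical, and the handling of the header row and of `#` comment lines is an assumption inferred from the example above.

```python
def parse_config(text: str):
    """Split a semicolon-separated configuration into (varname, test) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines such as the '#-----' ruler (assumed).
        if not line or line.startswith("#"):
            continue
        # Split on the first semicolon only, so the test expression stays intact.
        varname, _, test = line.partition(";")
        varname, test = varname.strip(), test.strip()
        # Skip the header row (assumed to read "varname ; test").
        if (varname, test) == ("varname", "test"):
            continue
        entries.append((varname, test))
    return entries


config = """
varname    ; test
#----------; ------------------------------
SM2        ; align(freq="15Min")
'SM(1|2)+' ; flagMissing()
SM1        ; flagRange(min=10, max=60)
"""

for varname, test in parse_config(config):
    print(f"{varname!r:14} -> {test}")
```

Each row names a variable (or a quoted regular expression matching several variables) on the left and the routine applied to it on the right.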

Once the two basic inputs, a dataset and a configuration file, are prepared, run SaQC:

saqc \
    --config PATH_TO_CONFIGURATION \
    --data PATH_TO_DATA \
    --outfile PATH_TO_OUTPUT

A full SaQC run against the provided example data can be invoked with:

saqc \
    --config https://git.ufz.de/rdm-software/saqc/raw/main/docs/resources/data/config.csv \
    --data https://git.ufz.de/rdm-software/saqc/raw/main/docs/resources/data/data.csv \
    --outfile saqc_test.csv
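
When the command-line application has to be started from a script, the same call can be assembled in Python. This is a minimal sketch that assumes the `saqc` executable is on the `PATH` and uses only the three options shown above; the helper names `build_saqc_command` and `run_saqc` are hypothetical.

```python
import subprocess


def build_saqc_command(config: str, data: str, outfile: str) -> list[str]:
    """Assemble the argument list for the `saqc` command-line call."""
    return ["saqc", "--config", config, "--data", data, "--outfile", outfile]


def run_saqc(config: str, data: str, outfile: str) -> None:
    """Run saqc and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_saqc_command(config, data, outfile), check=True)


# Only build and display the command here; call run_saqc(...) to execute it.
command = build_saqc_command("config.csv", "data.csv", "saqc_test.csv")
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.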

SaQC as a Python module

When used as a Python module, SaQC provides a consistent and simple API for programmatic access. This makes it straightforward to define, execute, and customize quality control routines directly within Python, and to integrate SaQC into data analysis workflows and scripts. The following snippet implements the same configuration given above through the Python API:

import pandas as pd
from saqc import SaQC

# Load the example data; the first column is parsed as the datetime index.
data = pd.read_csv(
    "https://git.ufz.de/rdm-software/saqc/raw/main/docs/resources/data/data.csv",
    index_col=0, parse_dates=True,
)

qc = SaQC(data=data)
qc = (
    qc
    # Each call mirrors one line of the configuration file above.
    .align("SM2", freq="15Min")
    .flagMissing("SM(1|2)+", regex=True)
    .flagRange("SM1", min=10, max=60)
    .flagRange("SM2", min=10, max=40)
    .flagZScore("SM2", window="30d", thresh=3.5, method="modified", center=False)
    # `isflagged` is used without an import, as in the configuration file.
    .flagGeneric(
        field=["SM1", "SM2"],
        target="Dummy",
        func=lambda x, y: isflagged(x) | isflagged(y),
    )
)

A more detailed description of the Python API is available in the respective section of the documentation.
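
The logic that the example pipeline expresses can be illustrated without SaQC. The following plain-Python sketch is not SaQC's implementation: booleans stand in for flags, `None` stands in for a missing value, the helper names are hypothetical, and treating the range boundaries themselves as valid is an assumption.

```python
def flag_missing(values):
    """Mark entries that are missing (represented here as None)."""
    return [v is None for v in values]


def flag_range(values, lower, upper):
    """Mark present entries that fall outside the closed interval [lower, upper]."""
    return [v is not None and not (lower <= v <= upper) for v in values]


def any_flagged(*flag_lists):
    """Element-wise OR over several flag lists of equal length."""
    return [any(flags) for flags in zip(*flag_lists)]


sm1 = [12.0, None, 75.0, 30.0]
sm2 = [15.0, 20.0, 35.0, 45.0]

# Per-variable flags: missing values plus range violations.
sm1_flags = any_flagged(flag_missing(sm1), flag_range(sm1, 10, 60))
sm2_flags = any_flagged(flag_missing(sm2), flag_range(sm2, 10, 40))

# The "Dummy" variable is flagged wherever SM1 or SM2 is flagged.
dummy_flags = any_flagged(sm1_flags, sm2_flags)

print(dummy_flags)  # [False, True, True, True]
```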

Get involved

Contributing

Found a bug or want to suggest a new feature? Please refer to our contributing guidelines to see how you can contribute to SaQC.

User support

If you need help or have questions, send an email to saqc-support@ufz.de.

Copyright and License

Copyright(c) 2021, Helmholtz-Zentrum für Umweltforschung GmbH -- UFZ. All rights reserved.

For full details, see LICENSE.

Publications

Lennart Schmidt, David Schäfer, Juliane Geller, Peter Lünenschloss, Bert Palm, Karsten Rinke, Corinna Rebmann, Michael Rode, Jan Bumberger, System for automated Quality Control (SaQC) to enable traceable and reproducible data streams in environmental science, Environmental Modelling & Software, 2023, 105809, ISSN 1364-8152, https://doi.org/10.1016/j.envsoft.2023.105809. (https://www.sciencedirect.com/science/article/pii/S1364815223001950)

How to cite SaQC

If SaQC is advancing your research, please cite as:

Schäfer, David, Palm, Bert, Lünenschloß, Peter, Schmidt, Lennart, & Bumberger, Jan. (2023). System for automated Quality Control - SaQC (2.3.0). Zenodo. https://doi.org/10.5281/zenodo.5888547

or

Lennart Schmidt, David Schäfer, Juliane Geller, Peter Lünenschloss, Bert Palm, Karsten Rinke, Corinna Rebmann, Michael Rode, Jan Bumberger, System for automated Quality Control (SaQC) to enable traceable and reproducible data streams in environmental science, Environmental Modelling & Software, 2023, 105809, ISSN 1364-8152, https://doi.org/10.1016/j.envsoft.2023.105809. (https://www.sciencedirect.com/science/article/pii/S1364815223001950)

