
A timeseries data quality control and processing tool/framework


Project Status: Active – The project has reached a stable, usable state and is being actively developed.

System for automated Quality Control (SaQC)

Anomalies and errors are the rule, not the exception, when working with time series data. This is especially true if such data originates from in-situ measurements of environmental properties. Almost all applications, however, implicitly rely on data that complies with some definition of 'correct'. In order to derive reliable data products and tools, there is no alternative to quality control. SaQC provides all the building blocks to comfortably bridge the gap between 'usually faulty' and 'expected to be corrected' in an accessible, consistent, objective and reproducible way.

For a (continuously improving) overview of features, typical usage patterns, the specific system components and how to customize SaQC to your specific needs, please refer to our online documentation.

Installation

SaQC is available on the Python Package Index (PyPI) and can be installed using pip:

python -m pip install saqc

For more detailed instructions, see the installation guide.

Usage

SaQC is both a command line application controlled by a text-based configuration and a Python module with a simple API.

SaQC as a command line application

The command line application is controlled by a semicolon-separated text file listing the variables in the dataset and the routines to inspect, quality control and/or process them. The content of such a configuration could look like this:

varname    ; test
#----------; ---------------------------------------------------------------------
SM2        ; shift(freq="15Min")
'SM(1|2)+' ; flagMissing()
SM1        ; flagRange(min=10, max=60)
SM2        ; flagRange(min=10, max=40)
SM2        ; flagMAD(window="30d", z=3.5)
Dummy      ; flagGeneric(field=["SM1", "SM2"], func=(isflagged(x) | isflagged(y)))
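
To make the layout of such a file concrete, here is a minimal sketch that splits a configuration into (varname, test) pairs using only the Python standard library. It is not SaQC's own parser: the helper name `read_config` is hypothetical, and the sketch assumes that lines starting with `#` are comments and that the first remaining line is the header.

```python
# Excerpt of the configuration shown above.
CONFIG_TEXT = """\
varname    ; test
#----------; ---------------------------------------------------------------------
SM2        ; shift(freq="15Min")
'SM(1|2)+' ; flagMissing()
SM1        ; flagRange(min=10, max=60)
"""


def read_config(text):
    """Return (varname, test) pairs from a semicolon-separated configuration.

    Hypothetical helper for illustration only; blank lines and lines
    starting with '#' are skipped, and the first remaining line is
    assumed to be the header.
    """
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # Split on the first semicolon only, so the test expression stays intact.
        varname, _, test = stripped.partition(";")
        rows.append((varname.strip(), test.strip()))
    return rows[1:]  # drop the header row


for varname, test in read_config(CONFIG_TEXT):
    print(f"{varname!r} -> {test}")
```

The quoted variable name is kept verbatim here; in the Python API example further down, the same pattern is passed together with `regex=True`.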

Once the basic inputs, a dataset and a configuration file, are prepared, run SaQC:

saqc \
    --config PATH_TO_CONFIGURATION \
    --data PATH_TO_DATA \
    --outfile PATH_TO_OUTPUT
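
If the command line application needs to be launched from a Python script, for example in a batch job, the call above could be assembled as an argument list. This is only a sketch: the helper `build_saqc_command` is hypothetical, it uses just the three options shown above, and the actual `subprocess.run` call is left commented out because it requires SaQC to be installed.

```python
import shlex


def build_saqc_command(config, data, outfile):
    """Assemble the argument list for the saqc call shown above (hypothetical helper)."""
    return [
        "saqc",
        "--config", str(config),
        "--data", str(data),
        "--outfile", str(outfile),
    ]


cmd = build_saqc_command("config.csv", "data.csv", "saqc_test.csv")
print(shlex.join(cmd))  # shell-quoted form, handy for logging

# To actually run it (requires an installed saqc executable):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.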

A full SaQC run against provided example data can be invoked with:

saqc \
    --config https://git.ufz.de/rdm-software/saqc/raw/develop/docs/resources/data/config.csv \
    --data https://git.ufz.de/rdm-software/saqc/raw/develop/docs/resources/data/data.csv \
    --outfile saqc_test.csv

SaQC as a Python module

The following snippet implements the same configuration given above through the Python API:

import pandas as pd
from saqc import SaQC

# load the example dataset, using the first column as a datetime index
data = pd.read_csv(
    "https://git.ufz.de/rdm-software/saqc/raw/develop/docs/resources/data/data.csv",
    index_col=0, parse_dates=True,
)

saqc = SaQC(data=data)
# each call mirrors one line of the configuration file above
saqc = (saqc
        .shift("SM2", freq="15Min")
        .flagMissing("SM(1|2)+", regex=True)  # the variable name is a regular expression
        .flagRange("SM1", min=10, max=60)
        .flagRange("SM2", min=10, max=40)
        .flagMAD("SM2", window="30d", z=3.5)
        .flagGeneric(field=["SM1", "SM2"], target="Dummy", func=lambda x, y: (isflagged(x) | isflagged(y))))

A more detailed description of the Python API is available in the respective section of the documentation.
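
To illustrate the idea behind a range check such as `flagRange("SM1", min=10, max=60)`, the following plain Python sketch marks every value outside a closed interval. It is not SaQC's implementation: the function name `out_of_range` and the sample values are made up for illustration, the bounds are assumed to be inclusive, and missing values are not treated separately.

```python
def out_of_range(values, lower, upper):
    """Return one boolean per value: True if the value lies outside [lower, upper].

    Hypothetical helper; bounds are assumed to be inclusive.
    """
    return [not (lower <= value <= upper) for value in values]


# made-up sample readings, not taken from the SaQC example data
sm1 = [12.0, 9.5, 60.0, 61.2]
flags = out_of_range(sm1, lower=10, upper=60)
print(list(zip(sm1, flags)))
```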

Changelog

All notable changes to this project will be documented in CHANGELOG.md.

Get involved

Contributing

Found a bug or want to suggest a new feature? Please refer to our contributing guidelines to see how you can contribute to SaQC.

User support

If you need help or have a question, you can use the SaQC user support mailing list: saqc-support@ufz.de

Copyright and License

Copyright (c) 2021, Helmholtz-Zentrum für Umweltforschung GmbH -- UFZ. All rights reserved.

For full details, see LICENSE.

Acknowledgements

...

Publications

coming soon...

How to cite SaQC

If SaQC is advancing your research, please cite as:

Schäfer, David; Palm, Bert; Lünenschloß, Peter. (2021). System for automated Quality Control - SaQC. Zenodo. https://doi.org/10.5281/zenodo.5888547
