A time series data quality control and processing tool/framework
SaQC: System for automated Quality Control
SaQC is a flexible framework for quality control of time series data. It offers a growing collection of algorithms and methods to analyze, annotate, and process time series, while supporting end-to-end metadata enrichment.
SaQC provides multiple user interfaces to fit different workflows:
- A Python API for programmatic integration
- A command-line interface with a text-based configuration system
- A Galaxy Tool for integration into data analysis pipelines
- A web-based user interface for interactive exploration
Designed with the needs of data professionals in mind, from sensor engineers to domain experts and data scientists, SaQC helps improve and standardize the quality of data products across disciplines.
For an up-to-date overview of features, usage patterns, system components, and customization options, please refer to the online documentation.
Installation
SaQC is available on the Python Package Index (PyPI) and can be installed using pip:
```shell
python -m pip install saqc
```
SaQC is also available via conda-forge and can be installed into a new conda environment named saqc with:
```shell
conda create -c conda-forge -n saqc saqc
```
For more details, see the installation guide.
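To confirm that either installation route worked, a quick check from Python can read the installed package metadata. This is a minimal sketch using only the standard library's `importlib.metadata`; the helper name `installed_saqc_version` is hypothetical and not part of SaQC.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_saqc_version():
    """Return the installed SaQC version string, or None if SaQC is not installed."""
    try:
        # Reads the version from the metadata of the installed "saqc" distribution
        return version("saqc")
    except PackageNotFoundError:
        return None


v = installed_saqc_version()
print(v if v is not None else "saqc is not installed")
```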
Usage
SaQC can be used as a command-line application driven by a text-based configuration, or as a Python module with a consistent and simple API.
SaQC as a command-line application
The command-line application is controlled through a semicolon-separated configuration file. This file lists the dataset variables along with the routines to inspect, quality control, and process them.
An example configuration file looks like this:
```
varname    ; test
#----------; ---------------------------------------------------------------------
SM2        ; align(freq="15Min")
'SM(1|2)+' ; flagMissing()
SM1        ; flagRange(min=10, max=60)
SM2        ; flagRange(min=10, max=40)
SM2        ; flagZScore(window="30d", thresh=3.5, method='modified', center=False)
Dummy      ; flagGeneric(field=["SM1", "SM2"], func=(isflagged(x) | isflagged(y)))
```
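The layout of such a file is simple enough to illustrate in a few lines of Python. The sketch below is not SaQC's own parser; `parse_config` is a hypothetical helper that assumes the first line is the header, lines starting with `#` are separators or comments, and a single-quoted varname such as `'SM(1|2)+'` is a regular expression that must fully match a column name. The column names `SM1`, `SM2` and `SM3` are made up for the example.

```python
import re


def parse_config(text, columns):
    """Split a SaQC-style configuration into (variable, test) pairs.

    Hypothetical helper, not SaQC's parser. Assumptions:
    - the first non-empty line is the "varname ; test" header,
    - lines starting with "#" are separators or comments,
    - a single-quoted varname is a regular expression that must fully
      match a column name.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    pairs = []
    for line in lines[1:]:  # skip the header row
        if line.lstrip().startswith("#"):
            continue
        # Split only on the first semicolon: variable on the left, test on the right
        varname, test = (part.strip() for part in line.split(";", 1))
        if len(varname) > 1 and varname.startswith("'") and varname.endswith("'"):
            # Quoted name: expand the regular expression against the known columns
            pattern = re.compile(varname[1:-1])
            targets = [col for col in columns if pattern.fullmatch(col)]
        else:
            targets = [varname]
        pairs.extend((target, test) for target in targets)
    return pairs


config = """\
varname    ; test
#----------; ---------------------------------------------------------------------
SM2        ; align(freq="15Min")
'SM(1|2)+' ; flagMissing()
SM1        ; flagRange(min=10, max=60)
"""

pairs = parse_config(config, columns=["SM1", "SM2", "SM3"])
for var, test in pairs:
    print(f"{var}: {test}")
```

Under these assumptions the regular expression expands to `SM1` and `SM2`, while `SM3` is left untouched.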
Once the two basic inputs, the dataset and the configuration file, are prepared, run SaQC:
```shell
saqc \
    --config PATH_TO_CONFIGURATION \
    --data PATH_TO_DATA \
    --outfile PATH_TO_OUTPUT
```
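When SaQC runs are driven from a larger Python script, the same invocation can be assembled as an argument list and passed to `subprocess`, which avoids shell-quoting issues with paths. This is a minimal sketch: the helpers `build_saqc_command` and `run_saqc` and the file names `config.csv` and `data.csv` are hypothetical placeholders, and only the `--config`, `--data` and `--outfile` options documented here are used.

```python
import shlex
import shutil
import subprocess


def build_saqc_command(config, data, outfile):
    """Assemble the saqc call as an argument list (no shell quoting needed)."""
    return ["saqc", "--config", config, "--data", data, "--outfile", outfile]


def run_saqc(cmd):
    """Run the command; raise if saqc is missing or exits with an error status."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]!r} was not found on PATH")
    subprocess.run(cmd, check=True)


cmd = build_saqc_command("config.csv", "data.csv", "saqc_test.csv")
print(shlex.join(cmd))  # saqc --config config.csv --data data.csv --outfile saqc_test.csv
```

Calling `run_saqc(cmd)` then performs the run, provided the placeholder files exist.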
A full SaQC run against the provided example data can be invoked with:
```shell
saqc \
    --config https://git.ufz.de/rdm-software/saqc/raw/main/docs/resources/data/config.csv \
    --data https://git.ufz.de/rdm-software/saqc/raw/main/docs/resources/data/data.csv \
    --outfile saqc_test.csv
```
SaQC as a Python module
When used as a Python module, SaQC offers a consistent and simple API for programmatic access: quality control routines can be defined, executed, and customized directly in Python and integrated into data analysis workflows and scripts. The following snippet implements the same configuration as above through the Python API:
```python
import pandas as pd
from saqc import SaQC

# Load the example dataset, using the first column as a datetime index
data = pd.read_csv(
    "https://git.ufz.de/rdm-software/saqc/raw/main/docs/resources/data/data.csv",
    index_col=0, parse_dates=True,
)

qc = SaQC(data=data)

# Each method returns a SaQC object, so the tests can be chained
qc = (qc
      .align("SM2", freq="15Min")
      .flagMissing("SM(1|2)+", regex=True)
      .flagRange("SM1", min=10, max=60)
      .flagRange("SM2", min=10, max=40)
      .flagZScore("SM2", window="30d", thresh=3.5, method='modified', center=False)
      .flagGeneric(field=["SM1", "SM2"], target="Dummy", func=lambda x, y: (isflagged(x) | isflagged(y))))
```
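The `flagZScore` step with `method='modified'` is assumed here to use the modified z-score of Iglewicz and Hoaglin, M = 0.6745 (x - median) / MAD, where MAD is the median absolute deviation from the median; values with |M| > `thresh` are flagged. The plain-Python sketch below computes that score once over a whole list, whereas SaQC applies its test over a window (`window="30d"` above). The helper `modified_zscore_flags` and the sample readings are hypothetical.

```python
from statistics import median


def modified_zscore_flags(values, thresh=3.5):
    """Return one boolean per value: True where |modified z-score| > thresh."""
    med = median(values)
    # Median absolute deviation from the median
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the values equal the median, so MAD is zero and the
        # score is undefined; this sketch flags nothing instead of dividing by zero
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > thresh for v in values]


soil_moisture = [20.1, 20.4, 19.8, 20.0, 35.0, 20.2, 19.9]
flags = modified_zscore_flags(soil_moisture)
print(flags)  # the 35.0 reading is the only flagged value
```

Because the median and MAD are barely affected by a single extreme reading, the outlier stands out clearly instead of inflating the scale it is measured against.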
A more detailed description of the Python API is available in the respective section of the documentation.
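To make the intent of the `flagRange` and `flagGeneric` steps concrete, the following plain-Python sketch mimics their logic on already aligned lists: values outside the allowed range are flagged, and `Dummy` is flagged wherever `SM1` or `SM2` is flagged. It is an illustration only; the helpers `range_flags` and `or_flags` and the sample values are hypothetical, and treating the `min`/`max` bounds as inclusive is an assumption about `flagRange`.

```python
def range_flags(values, lower, upper):
    """Flag every value outside the closed interval [lower, upper]."""
    return [not (lower <= v <= upper) for v in values]


def or_flags(*flag_lists):
    """Element-wise logical OR of several equally long flag lists."""
    if len({len(f) for f in flag_lists}) > 1:
        raise ValueError("flag lists must have the same length")
    return [any(row) for row in zip(*flag_lists)]


# Hypothetical, already aligned readings for the two sensors
sm1 = [12.0, 65.0, 30.0, 8.0]
sm2 = [15.0, 20.0, 45.0, 25.0]

flags_sm1 = range_flags(sm1, 10, 60)    # mirrors flagRange("SM1", min=10, max=60)
flags_sm2 = range_flags(sm2, 10, 40)    # mirrors flagRange("SM2", min=10, max=40)
dummy = or_flags(flags_sm1, flags_sm2)  # mirrors the flagGeneric OR into "Dummy"
print(dummy)  # [False, True, True, True]
```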
Get involved
Contributing
Found a bug or want to suggest a new feature? Please refer to our contributing guidelines to see how you can contribute to SaQC.
User support
If you need help or have questions, email us at saqc-support@ufz.de.
Copyright and License
Copyright (c) 2021, Helmholtz-Zentrum für Umweltforschung GmbH -- UFZ. All rights reserved.
- Documentation: Creative Commons Attribution 4.0 International
- Source code: GNU General Public License 3
For full details, see LICENSE.
Publications
Lennart Schmidt, David Schäfer, Juliane Geller, Peter Lünenschloss, Bert Palm, Karsten Rinke, Corinna Rebmann, Michael Rode, Jan Bumberger, System for automated Quality Control (SaQC) to enable traceable and reproducible data streams in environmental science, Environmental Modelling & Software, 2023, 105809, ISSN 1364-8152, https://doi.org/10.1016/j.envsoft.2023.105809. (https://www.sciencedirect.com/science/article/pii/S1364815223001950)
How to cite SaQC
If SaQC is advancing your research, please cite as:
Schäfer, David, Palm, Bert, Lünenschloß, Peter, Schmidt, Lennart, & Bumberger, Jan. (2023). System for automated Quality Control - SaQC (2.3.0). Zenodo. https://doi.org/10.5281/zenodo.5888547
or
Lennart Schmidt, David Schäfer, Juliane Geller, Peter Lünenschloss, Bert Palm, Karsten Rinke, Corinna Rebmann, Michael Rode, Jan Bumberger, System for automated Quality Control (SaQC) to enable traceable and reproducible data streams in environmental science, Environmental Modelling & Software, 2023, 105809, ISSN 1364-8152, https://doi.org/10.1016/j.envsoft.2023.105809. (https://www.sciencedirect.com/science/article/pii/S1364815223001950)
Download files
File details
Details for the file saqc-2.7.0.post3.tar.gz.
File metadata
- Download URL: saqc-2.7.0.post3.tar.gz
- Upload date:
- Size: 185.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e37ae529ba40a939032e5f080899df13f5bf75b9acf41c44aace31e10b30fcd6 |
| MD5 | 3b24d9fe828ba1ff03fbf090b6ff8efc |
| BLAKE2b-256 | 4087ed43ded95ba343423ad1fa9e3deedcf73fa00cf14ff342fb5a28ec5b96ec |
File details
Details for the file saqc-2.7.0.post3-py3-none-any.whl.
File metadata
- Download URL: saqc-2.7.0.post3-py3-none-any.whl
- Upload date:
- Size: 181.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9dda31736343bd859539df64ebe630601453f7fa309b7ea8ccc8661a8db5fb21 |
| MD5 | bc26f6614bd633013a59daa86ce4abcf |
| BLAKE2b-256 | 5204c2e0ab85668bffe5a680cb16fba7706f6b19cc703a35284d919a15d7496e |
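The SHA256 digests listed above can be used to check that a downloaded file is intact before installing it. This is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `verify_download` are hypothetical, and the expected value is the SHA256 digest listed for the wheel file.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 16):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns b"" at end of file
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True if the file's SHA256 digest matches the expected digest."""
    return sha256_of(path) == expected_sha256.lower()


# SHA256 digest listed above for saqc-2.7.0.post3-py3-none-any.whl
EXPECTED_WHEEL_SHA256 = "9dda31736343bd859539df64ebe630601453f7fa309b7ea8ccc8661a8db5fb21"
```

For example, `verify_download("saqc-2.7.0.post3-py3-none-any.whl", EXPECTED_WHEEL_SHA256)` should return `True` for an intact download. pip can also enforce digests itself through its hash-checking mode (`--require-hashes`).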