Library to generate quicklooks and data quality checks on Helikite campaigns
helikite-data-processing
This library supports Helikite campaigns by unifying field-collected data, generating quicklooks, and performing quality control on instrument recordings. It is available on PyPI, can be used via a command-line interface (CLI), and also runs in Docker containers if needed.
Table of Contents
- Getting Started
- Using the Library
- Cleaner
- Documentation & Examples
- Command-line Usage
- Development
- Configuration
Getting Started
Pip Installation
Helikite is published on PyPI. To install it via pip, run:
pip install helikite-data-processing
After installation, the CLI is available as a system command:
helikite --help
Docker
Note: Docker usage is now optional. For most users, installing via pip is the recommended approach.
Building and Running with Docker
- Build the Docker image:
  docker build -t helikite .
- Generate project folders and create the configuration file:
  docker run \
  -v ./inputs:/app/inputs \
  -v ./outputs:/app/outputs \
  helikite:latest generate_config
- Preprocess the configuration file:
  docker run \
  -v ./inputs:/app/inputs \
  -v ./outputs:/app/outputs \
  helikite:latest preprocess
- Process data and generate plots:
  docker run \
  -v ./inputs:/app/inputs \
  -v ./outputs:/app/outputs \
  helikite:latest
You can also use the pre-built image from GitHub Packages:
docker run \
-v ./inputs:/app/inputs \
-v ./outputs:/app/outputs \
ghcr.io/eerl-epfl/helikite-data-processing:latest generate_config
Makefile
The Makefile provides simple commands for common tasks:
make build # Build the Docker image
make generate_config # Generate the configuration file in the inputs folder
make preprocess # Preprocess data and update the configuration file
make process # Process data and generate plots (output goes into a timestamped folder)
Using the Library
Helikite can be used both as a standalone CLI tool and as an importable Python package. For non-programmers, the CLI is the simplest way to use the library. For programmers, the library can be imported and used in your own scripts:
import helikite
from helikite.processing import preprocess, sorting
from helikite.constants import constants
# For example, to generate a configuration file programmatically:
preprocess.generate_config()
A complete list of available functions and modules is documented on the auto-published documentation site.
Cleaner
The cleaner module is designed to tidy up output folders generated by the application. For instructions on how to use it, refer to the Level 0 notebook.
Documentation & Examples
For full API documentation, usage examples, and tutorials, please visit the Helikite Data Processing Documentation.
The notebooks folder also contains a Level 0 processing example that demonstrates how to use the library for basic data processing tasks.
Command-line Usage
Once installed (via pip or Docker), you can use the CLI to run the three main stages of the application:
- Generate a configuration file: This creates a config file in your inputs folder.
  helikite generate-config
- Preprocess: Scans the input folder, associates raw instrument files to configurations, and updates the config file.
  helikite preprocess
- Process: Processes the input data based on the configuration, normalizes timestamps, and generates plots. (Running without any command runs this stage.)
  helikite
For detailed help on any command, append --help (e.g., helikite preprocess --help).
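The three stages can also be driven from a script. The following is a minimal sketch, assuming the helikite command is on your PATH; the helper names stage_command and run_all are hypothetical and not part of the library. It defaults to a dry run that only returns the commands, because in practice you would normally place raw files in the inputs folder and review the config file between stages.

```python
import shutil
import subprocess

# CLI invocations as listed above; "process" is the default stage,
# so it is run by calling helikite with no sub-command.
STAGES = {
    "generate-config": ["helikite", "generate-config"],
    "preprocess": ["helikite", "preprocess"],
    "process": ["helikite"],
}


def stage_command(stage: str) -> list[str]:
    """Return a copy of the argument list for one stage."""
    return list(STAGES[stage])


def run_all(dry_run: bool = True) -> list[list[str]]:
    """Build the three stage commands in order and optionally execute them."""
    commands = [
        stage_command(name)
        for name in ("generate-config", "preprocess", "process")
    ]
    if not dry_run:
        if shutil.which("helikite") is None:
            raise RuntimeError("helikite CLI not found on PATH")
        for command in commands:
            # check=True stops the sequence if a stage exits with an error.
            subprocess.run(command, check=True)
    return commands
```

Calling run_all() returns the command lists without executing anything; run_all(dry_run=False) executes them in sequence.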
Development
The Instrument class
The structure of the Instrument class allows specific data cleaning activities to be overridden for each instrument that inherits from it. The main application (in helikite.py) calls these class methods to process the data.
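The override pattern described above can be pictured with a small stand-in. This is only a sketch: the Instrument class below is a simplified placeholder rather than the library's real base class, and the method names clean and process are hypothetical.

```python
class Instrument:
    """Simplified stand-in for a base class with an overridable cleaning step."""

    def __init__(self, name: str) -> None:
        self.name = name

    def clean(self, values: list[float]) -> list[float]:
        # Default behaviour: leave the data untouched.
        return values

    def process(self, values: list[float]) -> list[float]:
        # The caller only uses process(); the instrument-specific
        # behaviour comes from whichever clean() the subclass provides.
        return self.clean(list(values))


class NonNegativeInstrument(Instrument):
    """Hypothetical instrument whose negative readings are invalid."""

    def clean(self, values: list[float]) -> list[float]:
        return [value for value in values if value >= 0]
```

Because the main application calls the same method on every instrument, a new cleaning rule can be added in a subclass without changing the calling code.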
Adding more instruments
The configuration file is generated during the generate_config/preprocess steps by iterating over the instantiated classes imported in helikite/instruments/__init__.py. To add a new instrument, create a subclass of Instrument and import it in __init__.py.
Firstly, the class should inherit from Instrument and set a unique name (e.g., for the MCPC instrument):
def __init__(self, *args, **kwargs) -> None:
    super().__init__(*args, **kwargs)
    self.name = 'mcpc'
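To illustrate why the name must be unique, here is a minimal sketch of how a configuration skeleton could be built by iterating over instrument instances. The base class is a stand-in, build_instrument_config is a hypothetical helper, and the per-instrument keys (config, file, date) are borrowed from the config.yaml example in the Configuration section; the library's actual generation logic may differ.

```python
class Instrument:
    """Stand-in base class; the real one lives in the helikite package."""

    def __init__(self, *args, **kwargs) -> None:
        self.name = None


class MCPC(Instrument):
    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self.name = 'mcpc'


class Pico(Instrument):
    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self.name = 'pico'


def build_instrument_config(instruments: list[Instrument]) -> dict:
    """Create one empty config entry per instrument, keyed by its name."""
    config = {}
    for instrument in instruments:
        if instrument.name in config:
            # Two instruments sharing a name would overwrite each other.
            raise ValueError(f"duplicate instrument name: {instrument.name}")
        config[instrument.name] = {
            "config": instrument.name,
            "file": None,
            "date": None,
        }
    return config
```

With this sketch, build_instrument_config([MCPC(), Pico()]) yields one entry for mcpc and one for pico, while passing two instruments with the same name raises a ValueError.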
The minimum functions required are:
- file_identifier(): Accepts the first 50 lines of a CSV file and returns True if it matches the instrument’s criteria (typically checking header content).
  # Example for the pico instrument:
  def file_identifier(self, first_lines_of_csv) -> bool:
      if ("win0Fit0,win0Fit1,win0Fit2,win0Fit3,win0Fit4,win0Fit5,win0Fit6,"
              "win0Fit7,win0Fit8,win0Fit9,win1Fit0,win1Fit1,win1Fit2") in first_lines_of_csv[0]:
          return True
      return False
- set_time_as_index(): Converts the instrument's timestamp information into a common pandas DatetimeIndex.
  # Example for the filter instrument:
  def set_time_as_index(self, df: pd.DataFrame) -> pd.DataFrame:
      df['DateTime'] = pd.to_datetime(
          df['#YY/MM/DD'].str.strip() + ' ' + df['HR:MN:SC'].str.strip(),
          format='%y/%m/%d %H:%M:%S'
      )
      df.drop(columns=["#YY/MM/DD", "HR:MN:SC"], inplace=True)
      df.set_index('DateTime', inplace=True)
      return df
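The two required methods can be tried out without the library or pandas. The sketch below uses only the standard library; the Instrument base class is a stand-in, FilterLike and match_instrument are hypothetical names, and the assumption that the "#YY/MM/DD" column label appears on the first line of the file is made for illustration only. The timestamp format string is the one used in the filter example above.

```python
from datetime import datetime
from typing import Optional


class Instrument:
    """Stand-in base class (hypothetical, for illustration)."""

    def __init__(self, name: str = "") -> None:
        self.name = name

    def file_identifier(self, first_lines_of_csv: list[str]) -> bool:
        return False


class FilterLike(Instrument):
    """Hypothetical instrument recognised by a filter-style header."""

    HEADER_MARKER = "#YY/MM/DD"  # assumed to appear on the first line

    def __init__(self) -> None:
        super().__init__(name="filter_like")

    def file_identifier(self, first_lines_of_csv: list[str]) -> bool:
        # Guard against empty files before inspecting the header line.
        if not first_lines_of_csv:
            return False
        return self.HEADER_MARKER in first_lines_of_csv[0]

    def parse_timestamp(self, date_text: str, time_text: str) -> datetime:
        # Same format as the pandas example: two-digit year, then time.
        combined = f"{date_text.strip()} {time_text.strip()}"
        return datetime.strptime(combined, "%y/%m/%d %H:%M:%S")


def match_instrument(
    first_lines: list[str], instruments: list[Instrument]
) -> Optional[Instrument]:
    """Return the first instrument whose file_identifier accepts the lines."""
    for instrument in instruments:
        if instrument.file_identifier(first_lines):
            return instrument
    return None
```

Returning the first match keeps the sketch simple; it relies on each instrument's header check being specific enough that two instruments do not claim the same file.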
For more details and examples, refer to the auto-published documentation.
Configuration
There are three sources of configuration parameters:
Application constants
These are defined in helikite/constants.py and include settings such as filenames, folder paths for inputs/outputs, logging formats, and default plotting parameters.
Runtime configuration
The runtime configuration is stored in config.yaml (located in your inputs folder). This file is generated during the generate_config or preprocess steps. It holds runtime arguments for each instrument (e.g., file locations, time adjustments, and plotting settings).
Below is an example snippet from a generated config.yaml:
global:
  time_trim:
    start: 2022-09-29 10:21:58
    end: 2022-09-29 12:34:36
ground_station:
  altitude: null
  pressure: null
  temperature: 7.8
instruments:
  filter:
    config: filter
    date: null
    file: /app/inputs/220209A3.TXT
    pressure_offset: null
    time_offset:
      hour: 5555
      minute: 0
      second: 0
plots:
  altitude_ground_level: false
  grid:
    resample_seconds: 60
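To make the time_offset and time_trim settings more concrete, here is a minimal standard-library sketch of how such values could be applied to a list of timestamps. It is an illustration under stated assumptions, not the library's implementation: the helper names are hypothetical, the offset is assumed to be added to the raw timestamps, the trim window is assumed to be inclusive at both ends, and the one-hour offset is an illustrative value.

```python
from datetime import datetime, timedelta

# Plain-dict mirror of the relevant config.yaml keys (illustrative values).
config = {
    "global": {
        "time_trim": {
            "start": datetime(2022, 9, 29, 10, 21, 58),
            "end": datetime(2022, 9, 29, 12, 34, 36),
        }
    },
    "instruments": {
        "filter": {"time_offset": {"hour": 1, "minute": 0, "second": 0}}
    },
}


def apply_time_offset(timestamps: list[datetime], offset: dict) -> list[datetime]:
    """Shift every timestamp by the configured hour/minute/second offset."""
    delta = timedelta(
        hours=offset["hour"],
        minutes=offset["minute"],
        seconds=offset["second"],
    )
    return [timestamp + delta for timestamp in timestamps]


def trim(timestamps: list[datetime], start: datetime, end: datetime) -> list[datetime]:
    """Keep only timestamps inside the inclusive [start, end] window."""
    return [timestamp for timestamp in timestamps if start <= timestamp <= end]


raw = [datetime(2022, 9, 29, 9, 30, 0), datetime(2022, 9, 29, 12, 0, 0)]
shifted = apply_time_offset(raw, config["instruments"]["filter"]["time_offset"])
window = config["global"]["time_trim"]
kept = trim(shifted, window["start"], window["end"])
```

In this example the first reading moves to 10:30:00 and stays inside the window, while the second moves to 13:00:00 and is trimmed.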