This is an example package.

Development Status
- 4 - Beta
Intended Audience
- Science/Research
Programming Language
Topic
- Scientific/Engineering

Project description

Contents
Overview
Installation
Basic usage
The database file
- On data types and database size
Source data file format
- Notes on timestamps
How the flux estimation works
Known limitations

Overview

OpenToolFlux is software for estimating gas fluxes from soil using data from automatic chambers. It is built for data from Picarro equipment but could in principle work with, or be adapted to, data from similar equipment of other brands.

The software analyzes time series of gas concentrations measured in gas sampled from multiple automated chambers connected to a single gas analyzer. The chambers close one at a time in sequence (e.g., in 20-minute intervals), and gas is continuously pumped into the analyzer from the currently closed chamber. The profile of the concentration change during each chamber closure is used to estimate the flux from the soil under that chamber. A separate section below describes in further detail how the calculation works and what it assumes.

The software is used through a command-line interface (CLI) which allows the user to:

Import data from one or more source data files into a database file.
Filter the database based on time period, alarm status values, and other criteria.
Identify segments of the database corresponding to closures of chambers.
Estimate gas fluxes from the concentration time series during closure, and export these to a tidy data file.
Generate diagrams for diagnostics.

Installation

OpenToolFlux is written in Python and works with Python 3.8+.

This installation guide assumes that you already have Python 3.8+ and that you know how to use a terminal and install Python packages using pip. If you are unsure about any of these points, you might find it helpful to follow the instructions given by any of these sources:

Instructions from RealPython on installing Python and using pip
Instructions from Python.org on getting started with Python
Instructions from PyPA on using pip and virtual environments
Instructions from Python for Data Analysis, 3rd Edition, on using miniconda

How to install:

Optionally, create and activate a virtual environment.
Install using pip install opentoolflux
Verify that the installation was successful by running the command opentoolflux --help. If the installation has succeeded, this will show a list of the available commands.

If installation according to these instructions fails, please submit a bug report using the issue tracker.

Basic usage

The command-line interface consists of a command opentoolflux with a number of subcommands such as opentoolflux import and opentoolflux fluxes. Here is a quick introduction to basic usage of the command-line interface.

Run an example

You might find it instructive to get started using an example. An example of a configuration file and input data can be downloaded from here: example.

Use the built-in documentation

The command-line interface has built-in documentation which is accessed by calling commands such as:

opentoolflux --help
opentoolflux import --help

When --help is added to a command, the command does nothing except print a help message.

We recommend exploring the software by reading the built-in help and experimenting with commands on some test data. OpenToolFlux will never change or delete your source data files, so you can safely play around. (And in any case, you do have a backup of your important research data, right?)

Configure OpenToolFlux using a configuration file

Most configuration is done in a configuration file written in TOML. All the configuration options are listed and explained in the example file found here: example/opentoolflux.toml.

A small number of configuration options can be set on the command line. These are listed and explained in the built-in documentation.

OpenToolFlux by default looks for a configuration file named opentoolflux.toml in the working directory. This default can be overridden using the --config flag as follows:

opentoolflux --config my_config.toml [command]

Import data from source files

When the configuration file is in place, import data from source files using the following command:

opentoolflux import

This will create a new database, or add data to an existing one, located at opentoolflux/database.feather. The data files to read are specified in the import section of the config file. Read more below about the source data file format.

Note: opentoolflux will never change or remove the source data files, so you can safely try commands to see what happens. If you want to start over from zero, simply remove the database.feather file and run opentoolflux import again.

Other ways to get a database

It is also possible to copy the opentoolflux/database.feather file between computers. To "export" the database, simply copy the database.feather file and save it on a USB stick or network drive, or even send it by email if it's not too large.

You can also create the database file in any other way you like, following the technical specification of the database below.

Estimate gas fluxes

When the database.feather file is in place, estimate gas fluxes using the following command:

opentoolflux fluxes

This will do several things:

Optionally, filter the database as specified in the filters section of the config file.
Split the remaining data into segments by chamber as specified in the measurements section of the config file.
Ignore any segments that are too short or too long, following the measurements section of the config file. Unexpectedly short segments can be created, for example, if the equipment is shut down or restarted.
For each of the remaining measurements, estimate the flux as specified in the fluxes section of the config file. (Read more about how the flux estimation works below.)
Save all the results to a file opentoolflux/fluxes.csv.
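
The segmentation and length-filtering steps above can be pictured with a minimal Python sketch. This is not the OpenToolFlux implementation: the function name `split_into_segments` and the parameters `min_duration` and `max_duration` are hypothetical, and the sample rows are made up for illustration.

```python
from itertools import groupby


def split_into_segments(samples, min_duration, max_duration):
    """Group consecutive samples by chamber and keep segments of plausible length.

    samples: list of (timestamp_seconds, chamber) tuples, sorted by timestamp.
    Returns a list of (chamber, first_timestamp, last_timestamp) tuples.
    """
    segments = []
    # groupby only merges adjacent rows with equal keys, so a chamber that
    # closes twice yields two separate segments.
    for chamber, rows in groupby(samples, key=lambda row: row[1]):
        timestamps = [t for t, _ in rows]
        duration = timestamps[-1] - timestamps[0]
        if min_duration <= duration <= max_duration:
            segments.append((chamber, timestamps[0], timestamps[-1]))
    return segments


# Made-up data, one sample per minute: chamber 5 for 20 minutes, chamber 6
# interrupted after 2 minutes (e.g., a restart), then chamber 5 again.
samples = (
    [(t, 5) for t in range(0, 1200, 60)]
    + [(t, 6) for t in range(1200, 1320, 60)]
    + [(t, 5) for t in range(1320, 2520, 60)]
)
segments = split_into_segments(samples, min_duration=600, max_duration=1800)
print(segments)
```

In this sketch the interrupted chamber 6 segment is dropped because it is shorter than the assumed minimum of 600 seconds, while both chamber 5 closures are kept as separate measurements.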

Plot results

The command group opentoolflux plot can be used to generate the following figures:

`flux-fits`: flux estimation diagnostics

To visualize the gas flux estimation and identify potential problems, run the following command:

opentoolflux plot flux-fits

This command estimates gas fluxes following the same steps as the opentoolflux fluxes command, but instead of a results table it outputs one figure for each measurement in the folder opentoolflux/plots/flux-fits. Each figure shows the gas concentration(s) over time during a chamber closure, and the curve that has been fit to estimate the gas flux(es).

The database file

OpenToolFlux uses a database file, which is simply a table stored as a Feather file. The default path to the database is opentoolflux/database.feather, the file created by the opentoolflux import command.

The database has one row per sample and normally contains the following columns:

__TIMESTAMP__: a timestamp of the sample, in UTC. This column is used as primary key in the database, so the timestamps must be unique. The table must be sorted by timestamp in ascending order. The __TIMESTAMP__ column is the only mandatory column in the database (although a database with only timestamps is not really useful).
One column identifying the current chamber. With the Picarro data, this is a number, but any data type will work.
One column giving sample concentration of each gas to analyze, e.g., CO2, CH4_dry and/or N2O_dry.
Optionally, additional columns used to filter out samples. For example ALARM_STATUS in the case of Picarro data files.

The command opentoolflux import (see above) can be used to create the database from Picarro (or other) data files. It is also possible to create the database in any other way (e.g., using a custom Python or R script). The database file can also be copied between folders or computers without any problem.
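
As a rough illustration of the custom-script route, here is a minimal sketch using pandas. It only enforces the requirements stated above (a `__TIMESTAMP__` column, unique timestamps, ascending order). The helper name `make_database`, the source column name `time`, and the choice of timezone-aware UTC timestamps are assumptions made for this example, not something the specification above confirms.

```python
import pandas as pd


def make_database(df, timestamp_col):
    """Return a copy of df shaped like the database table described above."""
    db = df.rename(columns={timestamp_col: "__TIMESTAMP__"})
    # The table must be sorted by timestamp in ascending order.
    db = db.sort_values("__TIMESTAMP__").reset_index(drop=True)
    # The timestamp acts as primary key, so duplicates are not allowed.
    if db["__TIMESTAMP__"].duplicated().any():
        raise ValueError("timestamps must be unique")
    return db


# Made-up rows, deliberately out of order.
raw = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2021-05-07 00:01:16", "2021-05-07 00:01:15"], utc=True
        ),
        "solenoid_valves": pd.Series([6, 5], dtype="uint8"),
        "N2O_dry": pd.Series([0.3349, 0.3393], dtype="float32"),
    }
)
db = make_database(raw, "time")
print(db)

# Writing the Feather file requires the pyarrow package:
# db.to_feather("opentoolflux/database.feather")
```

It is probably wise to check a hand-made file by running one of the opentoolflux commands on it before relying on it.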

On data types and database size

The database can contain columns of different data types: unsigned (nonnegative) integers (uint), signed integers (int), floating-point numbers (float), booleans (bool), and strings (str). The numeric datatypes (uint, int, and float) come in different precisions/sizes:

uint8: Integers 0 to 255
uint16: Integers 0 to 65,535
uint32: Integers 0 to 4,294,967,295
uint64: Integers 0 to 18,446,744,073,709,551,615
int8: Integers -128 to 127
int16: Integers -32,768 to 32,767
int32: Integers -2,147,483,648 to 2,147,483,647
int64: Integers -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
float16: 16-bit floating point (a.k.a. "half precision")
float32: 32-bit floating point (a.k.a. "single precision")
float64: 64-bit floating point (a.k.a. "double precision")

If capacity and precision were the only concerns, it would make sense to always choose uint64, int64 and float64. However, with long time series the database can grow fairly large, and it may make sense to choose a more restrictive data type.

For example, if the chambers are encoded as integers 1-12, or even 1-100, it is more than enough to use an int8 or uint8, which takes only 1/8 of the space compared to an int64.

For the float data types, the choice is not as obvious because there is a loss of precision going from float64 to float32 or float16. But practically speaking, even a float16 in many cases will be sufficient. In technical terms, a float16 significand has 3-4 significant decimal digits.

Therefore, for example, when we work with Picarro data on N2O concentrations in ppmv (roughly 0.3 ppmv N2O), a float16 can encode the difference between 0.300 ppmv and 0.301 ppmv without problem (a difference of 1 ppbv). This precision is much better than the second-to-second noise in the Picarro concentration data. The example shows that for many purposes, using float16 instead of float64 for gas concentrations makes very little practical difference to the results. (By the way, OpenToolFlux converts all gas concentrations to float64 before doing the flux estimation, so only the value stored in the database is limited by the float16 encoding.)
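
The float16 claim can be checked with a few lines of Python. This sketch uses the standard library's `struct` module, whose `"e"` format is IEEE 754 half precision; the helper name `to_float16` is just an illustration and has nothing to do with how OpenToolFlux stores data internally.

```python
import struct


def to_float16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


low = to_float16(0.300)
high = to_float16(0.301)

# Between 0.25 and 0.5, adjacent float16 values are 2**-12 (about 0.00024) apart,
# so two concentrations 0.001 ppmv apart are stored as different numbers.
print(low, high, high - low)
```

Both stored values differ from the originals by less than 0.0002 ppmv, which is the rounding error to expect at this concentration level.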

If in doubt about float data types, we suggest using float32, which has at least 6 decimal digits of precision in the significand and should therefore be far more precise than practically any gas analyzer out there.

How much space can be saved by choosing smaller data types? As an example, consider a database with the following data columns:

Timestamp (always 64 bits in a Feather file)
Chamber number (uint8 or uint64)
Alarm status (int8 or int64)
Five gas concentrations: N2O, NO2, CH4, CO2, H2O (float16 or float64)

With the smaller data types, each row will take 64 + 2 * 8 + 5 * 16 = 160 bits = 20 bytes.

With the larger data types, each row will take 64 + 2 * 64 + 5 * 64 = 512 bits = 64 bytes, or 3.2 times as much.

If we collect one data row per second during one year, the resulting database sizes will be either about 602 or 1,925 mebibytes (MiB). If your computer has less than 4 gigabytes of RAM, the program might get slow or even crash with the larger database, and in any case you might care about the difference in space on disk.
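
The arithmetic behind these numbers can be reproduced with a short script. It is a back-of-the-envelope sketch that assumes a 365-day year and ignores file metadata and any compression; the function name `row_bits` is made up for this example.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # one row per second, 365-day year


def row_bits(chamber_bits, alarm_bits, gas_bits, n_gases=5, timestamp_bits=64):
    """Bits per row: timestamp + chamber + alarm status + gas concentrations."""
    return timestamp_bits + chamber_bits + alarm_bits + n_gases * gas_bits


small = row_bits(8, 8, 16) // 8     # bytes per row with uint8, int8, float16
large = row_bits(64, 64, 64) // 8   # bytes per row with uint64, int64, float64

for label, nbytes in [("small", small), ("large", large)]:
    mib = nbytes * SECONDS_PER_YEAR / 2**20
    print(f"{label}: {nbytes} bytes/row, {mib:.1f} MiB/year")
```

Changing `SECONDS_PER_YEAR` to the number of rows you expect gives a quick estimate for your own dataset.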

This example also shows that if your dataset is perhaps only a few weeks long with frequency 1 second, or maybe one year with frequency 1 minute, the database file will anyway be so small that there are probably very few reasons to worry about database size.

A final related note is that floating-point data to be converted to timestamps in the opentoolflux import command should always be float64. The conversion of floating-point Unix timestamps (in seconds) is designed to preserve 6 decimal places (microseconds), something which requires float64. The __TIMESTAMP__ column in the end is always encoded using 64 bits anyway, so there is no space to be saved by parsing the timestamp column as a float32. Failure to specify float64 as data type for the timestamp column raises a helpful error message.

Source data file format

Here is an example of what the default source file format looks like:

EPOCH_TIME      ALARM_STATUS   solenoid_valves     N2O_dry
1620345675.170  0              5.0000000000E+00    3.3926340875E-01
1620345675.991  0              5.0000000000E+00    3.3928078030E-01
1620345676.605  2              5.0000000000E+00    3.5087647532E-01
1620345677.312  0              6.0000000000E+00    3.3491837412E-01

The data files from our Picarro equipment look like this, but with many more columns containing various information. A full example is given here: example/indata.

Source data files must be delimited text files. The default delimiter, following the data files we get from our Picarro equipment, is one or more whitespace characters (sep = '\s+'), but other delimiters can be specified using the sep setting (e.g., sep = ',' for standard csv files).

(Technical note: The Picarro source files, following roughly the format shown above, can also be seen as fixed-width files, but since the data fields do not contain whitespace, they can also be parsed as delimited files as described here.)

Each source data file must have a one-line header specifying the column names.
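
To see what whitespace-delimited parsing with a one-line header means in practice, here is a small standard-library sketch applied to the first rows of the example above. The function `parse_delimited` is hypothetical and only mimics the idea of the sep setting; it is not how OpenToolFlux reads files.

```python
import re

SAMPLE = """\
EPOCH_TIME      ALARM_STATUS   solenoid_valves     N2O_dry
1620345675.170  0              5.0000000000E+00    3.3926340875E-01
1620345675.991  0              5.0000000000E+00    3.3928078030E-01
"""


def parse_delimited(text, sep=r"\s+"):
    """Split each non-empty line on the regex `sep`; the first line is the header."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = re.split(sep, lines[0])
    return [dict(zip(header, re.split(sep, line))) for line in lines[1:]]


rows = parse_delimited(SAMPLE)
print(rows[0]["N2O_dry"])
```

Because the separator is a regular expression, the varying number of spaces between columns does not matter; passing `sep=","` to the same function would handle a plain comma-separated file.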

Notes on timestamps

One of the columns must contain timestamps of the measurements. The timestamps can be encoded as:

Numeric values, which are interpreted to be Unix timestamps expressed in seconds. In Picarro data files, the EPOCH_TIME column is a Unix timestamp in seconds (with three decimals, giving millisecond resolution).
String values, which are parsed using pandas.to_datetime() and then converted to UTC timestamps. This means that timestamp strings can be expressed
- in UTC using a string such as "2021-12-07T11:00:24.123Z",
- in any other timezone, e.g., "2021-12-07 13:00:24.123+0200", or
- without timezone, e.g., "2021-12-07 13:00:24", which will be interpreted as UTC.

When running the opentoolflux import command, the timestamp source column is converted to UTC timestamps following the above rules, and renamed to __TIMESTAMP__.
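
The rules above can be tried out directly with pandas. This sketch assumes that passing `utc=True` to `pandas.to_datetime()` reproduces the described behavior (timezone-aware strings are converted to UTC, strings without timezone are taken to be UTC); the exact calls made inside OpenToolFlux may differ. Rounding the numeric example to milliseconds is a choice made here to hide floating-point noise.

```python
import pandas as pd

examples = [
    "2021-12-07T11:00:24.123Z",      # explicit UTC
    "2021-12-07 13:00:24.123+0200",  # other timezone, converted to UTC
    "2021-12-07 11:00:24.123",       # no timezone, taken to be UTC
]
# Parse one string at a time so that the differing formats do not interfere.
parsed = [pd.to_datetime(s, utc=True) for s in examples]
for s, ts in zip(examples, parsed):
    print(f"{s!r:34} -> {ts}")

# A numeric Unix timestamp in seconds, as in the Picarro EPOCH_TIME column.
epoch = pd.to_datetime(1620345675.170, unit="s", utc=True).round("ms")
print(epoch)
```

All three strings describe the same instant, so they end up as identical UTC timestamps.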

How the flux estimation works

This section will explain

Assumptions about chambers, closure, no recirculation
Derivation and solution of differential equation
How to estimate flux using linear regression on transformed data
Complication caused by delay from chamber closure to arrival of gas; parameters t0_delay and t0_margin.
Sensitivity to parameter errors

Known limitations

Only supports setup with no recirculation
All the settings for measurements and fluxes are the same for all chambers

Release history

0.3.0

Sep 19, 2023

0.2.3

Mar 21, 2023

0.2.2

Mar 21, 2023

0.2.1

Mar 21, 2023

0.2.0

Mar 21, 2023

0.1.0

Mar 20, 2023

This version

0.1.0.dev0 pre-release

Mar 20, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distribution

opentoolflux-0.1.0.dev0-py3-none-any.whl (32.3 kB view hashes)

Uploaded Mar 20, 2023 Python 3

Hashes for opentoolflux-0.1.0.dev0-py3-none-any.whl

Algorithm	Hash digest
SHA256	`81e35c3c78d5aad4c87eb01caa03118d54db0e41c275a4941e08418eda100fe0`
MD5	`441e9aaa4d4d8ff896befadebcb63d06`
BLAKE2b-256	`b0843a457cd81071ff9c716cc821a389bf12fcffe97e59af6c00d40807e11501`
