Use machine learning clustering methods to perform quality control over NetAtmo data
NetAtmoQC
About
netatmoqc
is a Python package that uses machine learning clustering methods to
quality-control observations collected from NetAtmo
weather stations. It has so far been developed at SMHI
as part of the iObs project.
Please note that this package is still in its development/implementation stage. As such, it may (certainly does) contain (hopefully minor) bugs and lack documentation. If you wish to collaborate, suggest features or report issues, please contact Paulo Medeiros (SMHI).
A note about this file: We have used the Markdown format throughout. You should, however, be able to read it reasonably well with your plain text processor of choice. Please disregard the formatting marks in this case.
System Requirements
-
python >=3.6.10
-
A C compiler
-
Optional: Ability to compile and run MPI applications.
The system needs to have a working installation of an MPI library. Having Open MPI should be fine, but there are other options.
This requirement is usually already fulfilled in HPC facilities, although, in some cases, you might need to load a module (e.g.,
module load openmpi
). Please check with your HPC support if you have doubts about this.
N.B.: If this requirement is not fulfilled, you won't be able to run
netatmoqc
using MPI, even if you manage to follow the MPI-related installation instructions presented later on in this file.
If you don't have a working MPI library installed in your system but use, for instance, conda to manage your environments/source packages, then running the following commands should get it working:
conda install -c conda-forge openmpi
conda install gxx_linux-64
-
Only for Developer Mode Installation:
-
poetry
, which can be installed by running
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python3
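The requirements above can be checked before installing with a short script. The following is only a minimal sketch and not part of netatmoqc: the function name `check_requirements` is hypothetical, and it assumes that a C compiler is reachable on the `PATH` as `cc`, `gcc` or `clang` and that an MPI launcher is reachable as `mpiexec` or `mpirun`. Finding a launcher on the `PATH` does not prove that the MPI installation actually works.

```python
import shutil
import sys

MIN_PYTHON = (3, 6, 10)  # minimum Python version stated in the requirements


def check_requirements(which=shutil.which, version=None):
    """Return a dict describing which requirements appear to be met.

    `which` and `version` are injectable only to make the sketch testable.
    """
    version = tuple(sys.version_info[:3]) if version is None else tuple(version)
    # Assumption: the C compiler is available under one of these common names.
    compiler = next((name for name in ("cc", "gcc", "clang") if which(name)), None)
    # Assumption: the MPI launcher is available as mpiexec or mpirun (optional).
    launcher = next((name for name in ("mpiexec", "mpirun") if which(name)), None)
    return {
        "python_ok": version >= MIN_PYTHON,
        "c_compiler": compiler,
        "mpi_launcher": launcher,
    }


if __name__ == "__main__":
    for item, status in check_requirements().items():
        print(f"{item}: {status}")
```

A value of `None` for `mpi_launcher` only matters if you plan to install with MPI support.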
Installation
Before proceeding, please make sure that your system fulfils the appropriate system requirements. If you plan to just use the code without modifying it, please follow one of the installation methods presented in the Regular Installation section. However, if you need/wish to modify the code in any way, then please proceed as indicated in the Developer Mode Installation section.
N.B.: In any case, you will be presented with instructions for installation
with and without MPI support. You only need to follow one such set of
instructions. Mind, however, the extra system requirements
that apply if you choose to install netatmoqc
with MPI support.
Regular Installation
Regular Installation from PyPI
:point_right: Easiest method if you just want to use the code and don't want to look at the source code at all.
-
Install without MPI support:
pip install netatmoqc
-
Install with MPI support:
pip install netatmoqc[mpi]
Regular Installation Directly From The Git Repo
:point_right: Similar to a regular installation from PyPI, but retrieves the code from the git repo instead (which is usually updated more often).
-
Install without MPI support:
pip install "git+https://source.coderefinery.org/iOBS/wp2/task-2-3/netatmoqc"
-
Install with MPI support:
pip install "netatmoqc[mpi] @ git+https://source.coderefinery.org/iOBS/wp2/task-2-3/netatmoqc"
Regular Installation From Downloaded Source
:point_right: For those who have netatmoqc
's source code in a local directory,
wish to install it from there, but also don't want to modify any code.
-
Install without MPI support:
pip install .
-
Install with MPI support:
pip install ".[mpi]"
Developer Mode Installation
:point_right: For those who need/wish to make changes to netatmoqc
's
source code.
-
Install without MPI support:
poetry install
-
Install with MPI support:
poetry install --extras "mpi"
Installing in "developer mode" means that changes made in any of the package's source files become visible as soon as the package is reloaded.
:wrench: Recommendation to contributors: Before making your first commit to the repo, please also run the following:
pre-commit install
This sets up the git hook scripts defined in the
.pre-commit-config.yaml file and only needs to be
done once within the repo's directory. The pre-commit
package is installed when you run any of the poetry install
commands listed
above.
After Installation: Configuration File
After successful installation, a netatmoqc
command will become available in
your environment. However, before you can use netatmoqc
(apart from the -h
option), you will need a configuration file written in the
TOML format.
Until proper documentation for this file is ready, please take a look at the docs/config_template.toml file for further information on what the configuration file should look like.
netatmoqc
assumes that one of the following (whichever is first encountered)
is your configuration file:
- A full file path specified via the NETATMOQC_CONFIG_PATH environment variable
- A config.toml file located in the directory where netatmoqc is called
- $HOME/.netatmoqc/config.toml
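The lookup order above can be sketched as follows. This is an illustration only, not netatmoqc's actual code: the function name `find_config` is hypothetical, and the sketch assumes that a candidate that does not exist on disk is skipped in favour of the next one, which the text above does not state explicitly.

```python
import os
from pathlib import Path


def find_config(env=None, cwd=None, home=None):
    """Return the first existing configuration file, or None if none is found.

    `env`, `cwd` and `home` are injectable only to make the sketch testable.
    """
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    home = Path.home() if home is None else Path(home)

    candidates = []
    # 1. Full file path given via the NETATMOQC_CONFIG_PATH environment variable
    if env.get("NETATMOQC_CONFIG_PATH"):
        candidates.append(Path(env["NETATMOQC_CONFIG_PATH"]))
    # 2. config.toml in the directory where the command is called
    candidates.append(cwd / "config.toml")
    # 3. $HOME/.netatmoqc/config.toml
    candidates.append(home / ".netatmoqc" / "config.toml")

    # Assumption: candidates that are not existing files are skipped.
    for path in candidates:
        if path.is_file():
            return path
    return None
```

Calling `find_config()` with no arguments uses the real environment, the current directory and the current user's home directory.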
Usage
After completing the setup, you should be able to run
netatmoqc [opts] SUBCOMMAND [subcommand_opts]
where [opts]
and [subcommand_opts]
denote optional command line arguments
that apply, respectively, to netatmoqc
in general and to SUBCOMMAND
specifically.
Please run netatmoqc -h
for information about the supported subcommands
and general netatmoqc
options. For info about specific subcommands and the
options that apply to them only, please run netatmoqc SUBCOMMAND -h
(note
that the -h
goes after the subcommand in this case).
N.B.: A typical netatmoqc
run with the (preferred) clustering method
HDBSCAN seems to need ca.
20 GB of RAM and takes a couple of minutes to finish. Other implemented
clustering strategies have more modest RAM requirements, but:
- DBSCAN results are not as good as HDBSCAN's in our context
- OPTICS produces results similar to HDBSCAN's but runs much slower
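As back-of-the-envelope arithmetic on the memory figure above, the sketch below estimates how many HDBSCAN runs could execute at the same time on one machine, for instance when using the parallelism described in the next section. It is not part of netatmoqc: the function name `suggested_max_procs` is hypothetical, and it assumes that every concurrent run needs roughly the same ca. 20 GB quoted for a typical single run.

```python
import os

HDBSCAN_RAM_GB = 20  # approximate figure quoted above for a typical run


def suggested_max_procs(available_ram_gb, n_cpus=None, ram_per_run_gb=HDBSCAN_RAM_GB):
    """Estimate how many runs fit at once, limited by both RAM and CPU count.

    Always returns at least 1, even when the available memory is below the
    estimate for a single run; in that case the run may still fail.
    """
    n_cpus = (os.cpu_count() or 1) if n_cpus is None else n_cpus
    by_memory = int(available_ram_gb // ram_per_run_gb)
    return max(1, min(n_cpus, by_memory))
```

For example, `suggested_max_procs(64, n_cpus=16)` is limited by memory rather than by the number of CPUs.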
Parallelism (single-host or MPI)
The select
subcommand supports parallelism over DTGs. How to activate it
depends on whether you wish to run netatmoqc
on a single host or if you wish
to distribute computations over different computers (e.g. on an HPC cluster).
-
If you are running
netatmoqc
on a single host, then you can export the environment variable NETATMOQC_MAX_PYTHON_PROCS
to any value larger than 0 and run the code as usual. Don't forget, however, to take into account the memory requirements discussed in the Usage section!
-
If you wish to run
netatmoqc
with MPI, then you must have installed it with MPI support. Assuming this is the case, you can then run the code as
mpiexec -n 1 [-usize N] netatmoqc --mpi [opts] select [subcommand_opts]
Notice that:
- Arguments between square brackets are optional
- The --mpi switch must come before any subcommand
- The value "1" in -n 1 is mandatory. The code will always start with one "manager" task, which will dynamically spawn new worker tasks as needed (up to a maximum number).
- If -usize N is passed, then N should be an integer greater than zero. N defines the maximum number of extra workers that the manager task is allowed to spawn if necessary.
- If -usize N is not passed, then:
  - If the run is part of a submitted job managed by SLURM or PBS, then N will be automatically determined from the options passed to the scheduler (e.g. --nodes, --ntasks, --mem-per-cpu, etc. for SLURM).
  - If the run is interactive, then N will take the value of the environment variable NETATMOQC_MAX_PYTHON_PROCS if set; otherwise, it will be set to 1.
- No more than length(DTGs) new worker tasks will be spawned
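The rules above can be summarised in a short sketch. It is an illustration only, not netatmoqc's actual code: the function names `worker_limit` and `mpiexec_command` are hypothetical, and the sketch covers only the interactive case, since how N is derived from SLURM or PBS options is not described here.

```python
import os


def worker_limit(n_dtgs, usize=None, env=None):
    """Upper bound on spawned workers for an interactive (non-scheduler) run."""
    env = os.environ if env is None else env
    if usize is None:
        # -usize not passed: use NETATMOQC_MAX_PYTHON_PROCS if set, else 1
        raw = env.get("NETATMOQC_MAX_PYTHON_PROCS")
        usize = int(raw) if raw else 1
    if usize < 1:
        raise ValueError("N must be an integer greater than zero")
    # No more than length(DTGs) workers are spawned
    return min(usize, n_dtgs)


def mpiexec_command(usize=None, opts=(), subcommand_opts=()):
    """Build the argument list for an MPI run of the select subcommand."""
    cmd = ["mpiexec", "-n", "1"]  # exactly one manager task is started
    if usize is not None:
        cmd += ["-usize", str(usize)]
    # --mpi must come before the subcommand
    cmd += ["netatmoqc", "--mpi", *opts, "select", *subcommand_opts]
    return cmd
```

The list returned by `mpiexec_command` could be passed to `subprocess.run` on a system where both `mpiexec` and netatmoqc with MPI support are installed.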