Use machine learning clustering methods to perform quality control over NetAtmo data

NetAtmoQC

About

netatmoqc is a Python package that uses machine learning clustering methods to quality-control observations collected from NetAtmo weather stations. It has so far been developed at SMHI as part of the iObs project.

Please note that this package is still in its development/implementation stage. As such, it may (certainly does) contain (hopefully minor) bugs and lacks documentation. If you wish to collaborate, suggest features or report issues, please contact Paulo Medeiros (SMHI).

A note about this file: We have used the Markdown format throughout. You should, however, be able to read it reasonably well with your plain text processor of choice. Please disregard the formatting marks in this case.

System Requirements

  • python >=3.6.10

  • A C compiler

  • Optional: Ability to compile and run MPI applications.

    The system needs to have a working installation of an MPI library. Having Open MPI should be fine, but there are other options.

    This requirement is usually already fulfilled in HPC facilities, although, in some cases, you might need to load a module (e.g., module load openmpi). Please check with your HPC support if you have doubts about this.

    N.B.: If this requirement is not fulfilled, you won't be able to run netatmoqc using MPI, even if you manage to follow the MPI-related installation instructions presented later on in this file.

    If you don't have a working MPI library installed on your system but use, for instance, conda to manage your environments/source packages, then running the following commands should get it working:

      conda install -c conda-forge openmpi
      conda install gxx_linux-64
    
  • Only for Developer Mode Installation:

    • poetry, which can be installed by running

        curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python3
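As a rough aid, the requirements above can be probed from Python before installing. This is only a sketch under stated assumptions: the executable names searched for (`cc`, `gcc`, `clang`, `mpiexec`, `mpirun`) are common defaults rather than anything netatmoqc mandates, and finding an executable on the `PATH` does not prove that the compiler or MPI library actually works.

```python
import shutil
import sys

# Minimum Python version listed in the requirements above.
MIN_PYTHON = (3, 6, 10)


def check_requirements(compilers=("cc", "gcc", "clang"),
                       mpi_launchers=("mpiexec", "mpirun")):
    """Report which requirements appear to be met on this machine.

    The executable names are assumptions (common defaults); adjust them
    to whatever your system or HPC module environment provides.
    """
    return {
        "python": sys.version_info[:3] >= MIN_PYTHON,
        "c_compiler": any(shutil.which(name) for name in compilers),
        "mpi (optional)": any(shutil.which(name) for name in mpi_launchers),
    }


if __name__ == "__main__":
    for requirement, found in check_requirements().items():
        print("{}: {}".format(requirement, "ok" if found else "not found"))
```

A "not found" for MPI only matters if you intend to install netatmoqc with MPI support.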
      

Installation

Before proceeding, please make sure that your system fulfils the appropriate system requirements. If you plan to just use the code without modifying it, please follow one of the installation methods presented in the Regular Installation section. However, if you need/wish to modify the code in any way, then please proceed as indicated in the Developer Mode Installation section.

N.B.: In any case, you will be presented with instructions for installation with and without MPI support. You only need to follow one such set of instructions. Mind, however, the extra system requirements that apply if you choose to install netatmoqc with MPI support.

Regular Installation

Regular Installation from PyPI

:point_right: Easiest method if you just want to use the code and don't want to look at the source code at all.

  • Install without MPI support:

      pip install netatmoqc
    
  • Install with MPI support:

      pip install netatmoqc[mpi]
    
Regular Installation Directly From The Git Repo

:point_right: Similar to a regular installation from PyPI, but retrieves the code from the git repo instead (which is usually updated more often).

  • Install without MPI support:

      pip install "git+https://source.coderefinery.org/iOBS/wp2/task-2-3/netatmoqc"
    
  • Install with MPI support:

      pip install "netatmoqc[mpi] @ git+https://source.coderefinery.org/iOBS/wp2/task-2-3/netatmoqc"
    
Regular Installation From Downloaded Source

:point_right: For those who have netatmoqc's source code in a local directory, wish to install it from there, but also don't want to modify any code.

  • Install without MPI support:

      pip install .
    
  • Install with MPI support:

      pip install ".[mpi]"
    

Developer Mode Installation

:point_right: For those who need/wish to make changes to netatmoqc's source code.

  • Install without MPI support:

      poetry install
    
  • Install with MPI support:

      poetry install --extras "mpi"
    

Installing in "developer mode" means that changes made in any of the package's source files become visible as soon as the package is reloaded.

:wrench: Recommendation to contributors: Before making your first commit to the repo, please also run the following:

pre-commit install

This sets up the git hook scripts defined in the .pre-commit-config.yaml file and only needs to be done once within the repo's directory. The pre-commit package is installed when you run any of the poetry install commands listed above.

After Installation: Configuration File

After successful installation, a netatmoqc command will become available in your environment. However, before you can use netatmoqc (apart from the -h option), you will need a configuration file written in the TOML format.

Until proper documentation for this file is ready, please take a look at the docs/config_template.toml file for further information on what the configuration file should look like.

netatmoqc assumes that one of the following (whichever is first encountered) is your configuration file:

  1. A full file path specified via the NETATMOQC_CONFIG_PATH envvar
  2. A config.toml file located in the directory where netatmoqc is called
  3. $HOME/.netatmoqc/config.toml
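The lookup order above can be sketched as follows. This is an illustration, not netatmoqc's actual code: the function name `find_config` is hypothetical, and the sketch assumes that a candidate counts as "encountered" only if the file exists, so a `NETATMOQC_CONFIG_PATH` that points to a missing file falls through to the next candidate. The real tool may instead report an error in that case.

```python
import os
from pathlib import Path


def find_config(cwd=None, env=None, home=None):
    """Return the first existing candidate config file, or None.

    Hypothetical helper mirroring the documented search order:
    1. the path in NETATMOQC_CONFIG_PATH,
    2. config.toml in the calling directory,
    3. $HOME/.netatmoqc/config.toml.
    """
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    home = Path.home() if home is None else Path(home)

    candidates = []
    env_path = env.get("NETATMOQC_CONFIG_PATH")
    if env_path:
        candidates.append(Path(env_path))
    candidates.append(cwd / "config.toml")
    candidates.append(home / ".netatmoqc" / "config.toml")

    # Assumption: a candidate is only used if the file actually exists.
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Calling `find_config()` with no arguments inspects the real environment, current directory and home directory; the keyword arguments exist only to make the function easy to try out in isolation.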

Usage

After completing the setup, you should be able to run

netatmoqc [opts] SUBCOMMAND [subcommand_opts]

where [opts] and [subcommand_opts] denote optional command line arguments that apply, respectively, to netatmoqc in general and to SUBCOMMAND specifically.

Please run netatmoqc -h for information about the supported subcommands and general netatmoqc options. For info about specific subcommands and the options that apply to them only, please run netatmoqc SUBCOMMAND -h (note that the -h goes after the subcommand in this case).
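If you drive netatmoqc from a script, the ordering rule above (general options before the subcommand, subcommand options after it) can be captured in a small helper. This is a minimal sketch: `build_command` is a hypothetical name, and `-h` and `select` are the only option and subcommand used here because they are the ones mentioned in this file.

```python
def build_command(subcommand, opts=(), subcommand_opts=()):
    """Assemble an argument list for a netatmoqc call.

    General options go before the subcommand; options that apply to the
    subcommand only go after it.
    """
    return ["netatmoqc"] + list(opts) + [subcommand] + list(subcommand_opts)


def general_help_command():
    """Argument list for the general help (-h before any subcommand)."""
    return ["netatmoqc", "-h"]


if __name__ == "__main__":
    # Help for a specific subcommand: note that -h comes after it.
    print(" ".join(build_command("select", subcommand_opts=["-h"])))
```

The returned list can be passed as-is to `subprocess.run` from the standard library.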

N.B.: A typical netatmoqc run with the (preferred) clustering method HDBSCAN seems to need ca. 20 GB of RAM and takes a couple of minutes to finish. Other implemented clustering strategies have more modest RAM requirements, but:

  • DBSCAN results are not as good as HDBSCAN's in our context
  • OPTICS produces results similar to HDBSCAN's but runs much slower
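To give a feel for how clustering can act as quality control, here is a toy, pure-Python DBSCAN in which points that end up in no cluster are treated as suspect. DBSCAN is used only because it is the simplest of the methods named above. This is not netatmoqc's implementation, and the feature values, `eps` and `min_samples` below are made up for illustration.

```python
import math

NOISE = -1  # label given to points that belong to no cluster


def euclidean(a, b):
    """Plain Euclidean distance between two equally sized tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    n = len(points)
    # Neighbourhoods include the point itself, as in the usual definition.
    neighbours = [
        [j for j in range(n) if euclidean(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, never expands
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # core point: keep expanding
    return labels


# Made-up (temperature, pressure) pairs; the fifth one is clearly off.
observations = [(10.1, 1012.0), (10.3, 1012.4), (9.9, 1011.8),
                (10.2, 1012.1), (25.0, 1012.0), (10.0, 1012.3)]
labels = dbscan(observations, eps=0.6, min_samples=3)
suspect = [i for i, label in enumerate(labels) if label == NOISE]
print(labels)   # [0, 0, 0, 0, -1, 0]
print(suspect)  # [4]
```

Real observations would need sensible feature scaling and far more data; the memory and run-time figures quoted above refer to netatmoqc itself, not to this sketch.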

Parallelism (single-host or MPI)

The select subcommand supports parallelism over DTGs. How to activate it depends on whether you wish to run netatmoqc on a single host or if you wish to distribute computations over different computers (e.g. on an HPC cluster).

  • If you are running netatmoqc on a single host, then you can export the environment variable NETATMOQC_MAX_PYTHON_PROCS to any value larger than 0 and run the code as usual. Don't forget, however, to take into account the memory requirements discussed in the Usage section!

  • If you wish to run netatmoqc with MPI, then you must have installed it with MPI support. Assuming this is the case, you can then run the code as

    mpiexec -n 1 [-usize N] netatmoqc --mpi [opts] select [subcommand_opts]
    

    Notice that:

    • Arguments between square brackets are optional
    • The --mpi switch must come before any subcommand
    • The value "1" in -n 1 is mandatory. The code will always start with one "manager" task which will dynamically spawn new worker tasks as needed (up to a maximum number).
    • If -usize N is passed, then N should be an integer greater than zero. N defines the maximum number of extra workers that the manager task is allowed to spawn if necessary.
    • If -usize N is not passed, then:
      • If the run is part of a submitted job managed by SLURM or PBS, then N will be automatically determined from the options passed to the scheduler (e.g. --nodes, --ntasks, --mem-per-cpu, etc. for SLURM).
      • If the run is interactive, then N will take the value of the environment variable NETATMOQC_MAX_PYTHON_PROCS if set or, otherwise, will be set to 1.
    • No more than length(DTGs) new worker tasks will be spawned
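The rules above for the maximum number of extra workers can be summarised in a few lines. This is a sketch of the documented behaviour, not the package's code: `max_extra_workers` and its parameters are hypothetical names, and the value derived from the scheduler options is taken as an input (`scheduler_n`) because how it is computed is not described here.

```python
import os


def max_extra_workers(n_dtgs, usize=None, scheduler_n=None, env=None):
    """Upper bound on the worker tasks the manager may spawn.

    n_dtgs      -- number of DTGs to process
    usize       -- N from "-usize N", if passed
    scheduler_n -- N derived from SLURM/PBS options (assumed to be given)
    env         -- environment mapping (defaults to os.environ)
    """
    env = os.environ if env is None else env
    if usize is not None:
        if usize < 1:
            raise ValueError("-usize N requires an integer N greater than zero")
        n = usize
    elif scheduler_n is not None:
        n = scheduler_n
    else:
        # Interactive run: use the environment variable if set, else 1.
        n = int(env.get("NETATMOQC_MAX_PYTHON_PROCS", 1))
    # Never more workers than there are DTGs.
    return min(n, n_dtgs)
```

For example, `max_extra_workers(2, usize=8)` gives 2, since only two DTGs are available to distribute.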
