This library provides tools to transform solar irradiation data from various networks into uniform NetCDF files. It also provides tools to query and manipulate those NetCDF files.
# Introduction

This repository holds Python code and tools to transform in-situ irradiation data to NetCDF, and to load / manipulate the resulting files locally or over the OpenDAP protocol.
# Installation

This code is available as a PIP package:

```
pip install libinsitu
```

The pip setup exposes each script in `./bin/` as an `ins-<script>` command.
# Structure

- `bin`: CLI utilities. The main ones are:
  - `transform.py` (`ins-transform`): Transform raw in-situ data files to NetCDF
  - `cat.py` (`ins-cat`): Extract / filter data from NetCDF (local or OpenDAP) to CSV
  - `ls.py` (`ins-ls`): Explore the content of a TDS (Thredds) catalog
  - ...
- `libinsitu`: Main files of the library
  - `res`: Resource files
    - `base.cdl`: Base CDL template
    - `station-info`: Metadata for each network, as `[network].csv`
  - `cli`: Code for the CLI entry points
  - `test`: Test suite
  - `handlers`: Data readers for each network
# Manual

## CLI

This section documents the main scripts in `./bin`. Each script is made available by pip as an `ins-<script>` command.
### transform.py (ins-transform)
Transforms raw input files into a NetCDF output file (or updates it), following the CF convention.
#### Usage

```
ins-transform [-h] --network {BSRN,enerMENA,ABOM,SAURAN} --station-id <SID> [--incremental]
              [--strict-resolution] [--check] [--status-folder <folder>]
              <out.nc> <file|dir> [<file|dir> ...]

positional arguments:
  <out.nc>              Output file
  <file|dir>            Input files or folders

optional arguments:
  -h, --help            show this help message and exit
  --network {BSRN,enerMENA,ABOM,SAURAN}, -n {BSRN,enerMENA,ABOM,SAURAN}
                        Network name
  --station-id <SID>, -s <SID>
                        Station ID
  --incremental, -i     Incremental mode, skipping input files having a '.done' status file
  --strict-resolution, -sr
                        Skip chunks having a different resolution
  --check, -c           Check for potential overrides of data
  --status-folder <folder>, -f <folder>
                        Separate folder for .done/.err status files
```
#### Example

```
> ins-transform -n BSRN -s ENA -i ENA.nc data/ena/
```
The resulting NetCDF file is created following the CDL schema. The network and station ID should be described in `networks.csv` and in the corresponding `station-info/{network}.csv`.
### cat.py (ins-cat)

Query / filter in-situ data from local or remote (over OpenDAP) NetCDF files.
#### Usage

```
ins-cat [-h] [--type {csv,text}] [--skip-na]
        [--filter <time>|<from_time>~<to_time>]
        [--cols <col1>,<col2>,..] [--user USER] [--password PASSWORD] [--steps STEPS]
        [--chunk_size CHUNK_SIZE]
        <file.nc> or <url.nc>

positional arguments:
  <file.nc> or <url.nc>
                        Input file or URL

optional arguments:
  -h, --help            show this help message and exit
  --type {csv,text}, -t {csv,text}
                        Output type
  --skip-na, -s         Skip lines with only NA values
  --filter <filter>, -f <filter>
                        Time filter: a single '<time>' or a range '<from_time>~<to_time>',
                        where each part may be any prefix of 'YYYY-mm-ddTHH:MM:SS'
  --cols <col1>,<col2>,.., -c <col1>,<col2>,..
                        Selection of columns (all by default)
  --user USER, -u USER  User login (or TDS_USER env var), for URLs
  --password PASSWORD, -p PASSWORD
                        User password (or TDS_PASS env var), for URLs
  --steps STEPS, -st STEPS
                        Downsampling factor (default = 1: no downsampling)
  --chunk_size CHUNK_SIZE, -cs CHUNK_SIZE
                        Size of chunks (5000 by default)
```
#### Example

Extract GHI data from the XIA station, for January 2005, over OpenDAP:

```
> export TDS_USER=<user>
> export TDS_PASS=<pass>
> ins-cat http://tds.webservice-energy.org/thredds/dodsC/bsrn-stations/BSRN-XIA.nc -c GHI -s --filter 2005-01 -t csv
```
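The same extraction can also be done from Python with `nc2df` (documented in the Python API section below). A minimal sketch, assuming the same credentials and station URL:

```python
from datetime import datetime
from libinsitu.common import nc2df

# GHI for January 2005 from the XIA station, fetched over OpenDAP
df = nc2df(
    "http://tds.webservice-energy.org/thredds/dodsC/bsrn-stations/BSRN-XIA.nc",
    start_time=datetime(2005, 1, 1),
    end_time=datetime(2005, 2, 1),
    vars=["GHI"],
    skip_na=True,
    user="<user>",
    password="<pass>")
```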
### ls.py (ins-ls)
Lists contents of a remote TDS (Thredds) server.
#### Usage

```
ins-ls [-h] [--user USER] [--password PASSWORD] <http://host/catalog.xml>

positional arguments:
  <http://host/catalog.xml>
                        Start URL (catalog.xml)

optional arguments:
  -h, --help            show this help message and exit
  --user USER, -u USER  User login (or TDS_USER env var)
  --password PASSWORD, -p PASSWORD
                        User password (or TDS_PASS env var)
```
#### Example

List all in-situ networks:

```
> ins-ls http://tds.webservice-energy.org/thredds/in-situ.xml
```
## Python API
This section documents the main functions of the library.
### nc2df(...)

Loads a NetCDF in-situ file (or part of it) into a pandas DataFrame, with time as the index.

Module: `libinsitu.common`
#### Signature

```python
nc2df(
    ncfile: Union[Dataset, str],
    start_time: Union[datetime, datetime64] = None,
    end_time: Union[datetime, datetime64] = None,
    drop_duplicates=True,
    skip_na=False,
    vars=None,
    user=None,
    password=None,
    chunked=False,
    chunk_size=CHUNK_SIZE,
    steps=1)
```
- ncfile: NetCDF Dataset, filename, or URL
- drop_duplicates: If True (default), duplicate rows with the same time are dropped
- skip_na: If True, drop rows containing only NaN values
- start_time: Start time (first one by default): datetime or datetime64
- end_time: End time (last one by default): datetime or datetime64
- vars: List of column names to convert (all by default)
- user: Optional login, for URLs
- password: Optional password, for URLs
- chunk_size: Size of chunks for chunked data
- steps: Downsampling factor (1 by default)
#### Example

```python
from libinsitu.common import nc2df

df = nc2df("data/station.nc")
```
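To load only part of a file, the filtering parameters documented above can be combined; a short sketch:

```python
from datetime import datetime
from libinsitu.common import nc2df

# Load only the GHI and DHI columns for the year 2005,
# keeping every 10th row and dropping all-NaN rows
df = nc2df(
    "data/station.nc",
    start_time=datetime(2005, 1, 1),
    end_time=datetime(2006, 1, 1),
    vars=["GHI", "DHI"],
    skip_na=True,
    steps=10)
```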
### fetch_catalog(...)

Fetches and parses the XML catalog of a TDS (Thredds) server.

Module: `libinsitu.catalog`
#### Signature

```python
fetch_catalog(url, session, recursive=True)
```
- url: URL of catalog.xml
- session: HTTP session (possibly with user/password)
- recursive: Whether to fetch sub-catalogs
#### Example

```python
from requests import Session

from libinsitu.catalog import fetch_catalog

session = Session()
session.auth = ("user", "password")
catalog = fetch_catalog("http://tds.webservice-energy.org/thredds/in-situ.xml", session, recursive=False)
```
# Adding a new Network

To support a new network, one should:

- Add one line of metadata for the network in `res/networks.csv`
- Add a CSV file of metadata for its stations in `res/station-info/{network}.csv`
- Add a handler implementation in `libinsitu/handlers/` and register it in `libinsitu/handlers/__init__.py` (a sketch is given after the column table below)
## Input file patterns

In particular, one should fill the `RawDataPath` column of `networks.csv`.
This column contains a file pattern used to find the proper input files for a given station.
The pattern supports:

- Placeholders for station metadata (with `{Station_<Attribute>}`)
- Date ranges (`{YYYY}`, `{MM}`, ...)
- Looking within zip files (after the `!` separator)
- Wildcards: `*`

Here are some examples of patterns:
```
pvlive_{YYYY}-{MM}.zip!{YYYY}-{MM}/{Station_UID}_{YYYY}-{MM}.tsv
{station_id}/{station_id}{MM}{YY}*.dat.gz
```
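For instance, with a hypothetical station whose `Station_UID` is `ABC`, the first pattern would resolve, for January 2005, to the file `2005-01/ABC_2005-01.tsv` inside the archive `pvlive_2005-01.zip`.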
## Main method

The handler should implement the method `read_chunk(filename)` of the abstract class `InSituHandler`.
It takes a filename as input and returns a pandas DataFrame with the following (optional) columns:
| Name | Type | Unit | Role |
|---|---|---|---|
| Time (index) | Datetime | UTC time | Time |
| GHI | float | W.m^-2 | Global Horizontal Irradiance |
| DHI | float | W.m^-2 | Diffuse radiation |
| DNI | float | W.m^-2 | Direct radiation |
| T2 | float | K | Temperature |
| RH | float | ratio: 0.0-1.0 | Relative humidity |
| P | float | Pa | Pressure |
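Below is a minimal handler sketch. The import path and the surrounding `InSituHandler` interface are assumptions to be checked against the source, and the raw CSV layout (column names and units) is invented for illustration:

```python
import pandas as pd

from libinsitu.handlers import InSituHandler  # assumed import path


class MyNetworkHandler(InSituHandler):
    """Sketch of a handler for a hypothetical network shipping CSV files."""

    def read_chunk(self, filename):
        # Hypothetical raw layout: a 'timestamp' column (UTC) plus measurement columns
        raw = pd.read_csv(filename, parse_dates=["timestamp"], index_col="timestamp")

        # Map raw columns to the expected names and units (see the table above)
        return pd.DataFrame({
            "GHI": raw["ghi_wm2"],          # W.m^-2
            "DHI": raw["dhi_wm2"],          # W.m^-2
            "DNI": raw["dni_wm2"],          # W.m^-2
            "T2": raw["temp_c"] + 273.15,   # convert °C to K
        })
```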
## CDL

Each new NetCDF file is created from the CDL template `res/cdl/base.cdl`.
This template contains placeholders that are replaced by the values found in the corresponding station info file, `libinsitu/res/station-info/{network}.csv`.
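Conceptually, this works like simple string templating. A rough, hypothetical illustration (not the library's actual code, and the placeholder names are invented):

```python
# Hypothetical illustration of CDL placeholder substitution; the real template
# and attribute names live in res/cdl/base.cdl and station-info/{network}.csv.
cdl_fragment = ':station_name = "{Name}" ;\n:latitude = {Latitude} ;'
station_info = {"Name": "Xianghe", "Latitude": 39.754}  # invented metadata row
print(cdl_fragment.format(**station_info))
```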